Software module classification for commercial bug reports

Authors: Öztürk, Ceyhun Emre; Yilmaz, E. H.; Koksal, O.; Koç, Aykut
Dates: 2024-03-12; 2024-03-12; 2023-08-02
ISBN: 9798350302615
URI: https://hdl.handle.net/11693/114551
Conference Name: 2023 IEEE International Conference on Acoustics, Speech and Signal Processing Workshops, ICASSPW
Date of Conference: 04-10 June 2023

Abstract: In this work, we curate and investigate a dataset named Turkish Software Report - Module Classification (TSRMC), consisting of commercial software bug reports of a company. Automated bug classification is required in large-scale software projects due to the vast amount of bugs. We analyze and report the statistical features and classification difficulty of the dataset. We use several methods from the text classification literature to assign each bug report of the TSRMC dataset a suitable software module. The utilized methods include traditional machine learning (ML) methods, such as support vector machine (SVM) and logistic regression; sequential deep learning (DL) models, such as gated recurrent unit (GRU) and convolutional neural networks (CNN); and Bidirectional Encoder Representations from Transformers (BERT)-based pre-trained language models (PLMs). Our work is one of the first efforts in automated bug report classification literature that focuses on commercial bugs and uses bilingual (Turkish and English) texts.

Language: en
Keywords: Bug triaging; Machine learning; Natural language processing; Software bug report classification; Software engineering
Type: Conference Paper
DOI: 10.1109/ICASSPW59220.2023.10193706