Software module classification for commercial bug reports
Abstract
In this work, we curate and investigate a dataset named Turkish Software Report - Module Classification (TSRMC), which consists of commercial software bug reports from a single company. Large-scale software projects generate so many bug reports that automated bug classification becomes necessary. We analyze and report the statistical features and classification difficulty of the dataset. We apply several methods from the text classification literature to assign each bug report in the TSRMC dataset to a suitable software module. These methods include traditional machine learning (ML) methods, such as support vector machines (SVMs) and logistic regression; sequential deep learning (DL) models, such as gated recurrent units (GRUs) and convolutional neural networks (CNNs); and pre-trained language models (PLMs) based on Bidirectional Encoder Representations from Transformers (BERT). Our work is one of the first efforts in the automated bug report classification literature to focus on commercial bugs and to use bilingual (Turkish and English) texts.
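To make the module classification task concrete, the following is a minimal sketch of one of the traditional ML baselines named in the abstract, multinomial logistic regression, written in plain Python. It is not the authors' implementation. The bag-of-words features, the hyperparameters, the six toy reports, and the module names (`auth`, `billing`, `reporting`) are all hypothetical assumptions chosen for illustration; no TSRMC data is reproduced here.

```python
import math
from collections import Counter

# Hypothetical bilingual (Turkish/English) toy reports; NOT taken from TSRMC.
REPORTS = [
    ("login ekranı açılmıyor timeout hatası", "auth"),
    ("kullanıcı şifre reset e-postası gelmiyor", "auth"),
    ("fatura PDF export hatalı toplam gösteriyor", "billing"),
    ("ödeme sayfasında invoice tutarı yanlış", "billing"),
    ("rapor grafiği dashboard üzerinde yüklenmiyor", "reporting"),
    ("aylık rapor export sırasında crash", "reporting"),
]

def tokenize(text):
    # Assumption: lowercase whitespace tokenization is enough for a sketch.
    return text.lower().split()

def build_vocab(texts):
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab

def featurize(text, vocab):
    # Sparse bag-of-words vector: {feature index: token count}.
    counts = Counter(tok for tok in tokenize(text) if tok in vocab)
    return {vocab[tok]: float(c) for tok, c in counts.items()}

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def class_scores(x, W, b):
    return [b[k] + sum(W[k][j] * v for j, v in x.items()) for k in range(len(b))]

def train(texts, labels, epochs=200, lr=0.5):
    # Stochastic gradient descent on the cross-entropy loss.
    vocab = build_vocab(texts)
    classes = sorted(set(labels))
    W = [[0.0] * len(vocab) for _ in classes]
    b = [0.0] * len(classes)
    data = [(featurize(t, vocab), classes.index(y)) for t, y in zip(texts, labels)]
    for _ in range(epochs):
        for x, y in data:
            probs = softmax(class_scores(x, W, b))
            for k in range(len(classes)):
                grad = probs[k] - (1.0 if k == y else 0.0)
                b[k] -= lr * grad
                for j, v in x.items():
                    W[k][j] -= lr * grad * v
    return vocab, classes, W, b

def predict(model, text):
    vocab, classes, W, b = model
    scores = class_scores(featurize(text, vocab), W, b)
    return classes[max(range(len(classes)), key=scores.__getitem__)]

texts, labels = zip(*REPORTS)
model = train(texts, labels)
print(predict(model, "şifre reset linki çalışmıyor"))
```

A realistic baseline would likely replace the raw counts with TF-IDF weighting, add regularization, and evaluate on a held-out split; those choices are left out here to keep the sketch short.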