• About
  • Policies
  • What is open access
  • Library
  • Contact
Advanced search
      View Item 
      •   BUIR Home
      • Scholarly Publications
      • Faculty of Engineering
      • Department of Computer Engineering
      • View Item
      •   BUIR Home
      • Scholarly Publications
      • Faculty of Engineering
      • Department of Computer Engineering
      • View Item
      JavaScript is disabled for your browser. Some features of this site may not work without it.

      Cleaning ground truth data in software task assignment

      Thumbnail
      Embargo Lift Date: 2024-05-25
      View / Download
      1.8 Mb
      Author(s)
      Tecimer, K. A.
      Tüzün, Eray
      Moran, Cansu
      Erdogmus, H.
      Date
      2022-05-25
      Source Title
      Information and Software Technology
      Print ISSN
      0950-5849
      Electronic ISSN
      1873-6025
      Publisher
      Elsevier BV
      Volume
      149
      Pages
      106956- 1 - 106956- 14
      Language
      English
      Type
      Article
      Item Usage Stats
      7
      views
      0
      downloads
      Abstract
      Context: In the context of collaborative software development, there are many application areas of task assignment such as assigning a developer to fix a bug, or assigning a code reviewer to a pull request. Most task assignment techniques in the literature build and evaluate their models based on datasets collected from real projects. The techniques invariably presume that these datasets reliably represent the “ground truth”. In a project dataset used to build an automated task assignment system, the recommended assignee for the task is usually assumed to be the best assignee for that task. However, in practice, the task assignee may not be the best possible task assignee, or even a sufficiently qualified one. Objective: We aim to clean up the ground truth by removing the samples that are potentially problematic or suspect with the assumption that removing such samples would reduce any systematic labeling bias in the dataset and lead to performance improvements. Method: We devised a debiasing method to detect potentially problematic samples in task assignment datasets. We then evaluated the method’s impact on the performance of seven task assignment techniques by comparing the Mean Reciprocal Rank (MRR) scores before and after debiasing. We used two different task assignment applications for this purpose: Code Reviewer Recommendation (CRR) and Bug Assignment (BA). Results: In the CRR application, we achieved an average MRR improvement of 18.17% for the three learning-based techniques tested on two datasets. No significant improvements were observed for the two optimization-based techniques tested on the same datasets. In the BA application, we achieved a similar average MRR improvement of 18.40% for the two learning-based techniques tested on four different datasets. Conclusion: Debiasing the ground truth data by removing suspect samples can help improve the performance of learning-based techniques in software task assignment applications.
      Keywords
      Task assignment
      Code reviewer recommendation
      Bug assignment
      Ground truth
      Labeling bias elimination
      Systematic labeling bias
      Data cleaning
      Permalink
      http://hdl.handle.net/11693/111501
      Published Version (Please cite this version)
      https://doi.org/10.1016/j.infsof.2022.106956
      Collections
      • Department of Computer Engineering 1561
      Show full item record

      Browse

      All of BUIRCommunities & CollectionsTitlesAuthorsAdvisorsBy Issue DateKeywordsTypeDepartmentsCoursesThis CollectionTitlesAuthorsAdvisorsBy Issue DateKeywordsTypeDepartmentsCourses

      My Account

      Login

      Statistics

      View Usage StatisticsView Google Analytics Statistics

      Bilkent University

      If you have trouble accessing this page and need to request an alternate format, contact the site administrator. Phone: (312) 290 2976
      © Bilkent University - Library IT

      Contact Us | Send Feedback | Off-Campus Access | Admin | Privacy