Detection and elimination of systematic labeling bias in code reviewer recommendation systems

buir.contributor.author: Tüzün, Eray
buir.contributor.author: Dibeklioğlu, Hamdi
buir.contributor.orcid: Tüzün, Eray | 0000-0002-5550-7816
dc.citation.epage: 191
dc.citation.spage: 181
dc.contributor.author: Tecimer, K. Ayberk
dc.contributor.author: Tüzün, Eray
dc.contributor.author: Dibeklioğlu, Hamdi
dc.contributor.author: Erdoğmuş, Hakan
dc.coverage.spatial: Trondheim, Norway
dc.date.accessioned: 2022-02-03T06:08:29Z
dc.date.available: 2022-02-03T06:08:29Z
dc.date.issued: 2021-06-21
dc.department: Department of Computer Engineering
dc.description: Conference Name: EASE 2021: Evaluation and Assessment in Software Engineering
dc.description: Date of Conference: June 2021
dc.description.abstract: Reviewer selection in modern code review is crucial for effective code reviews. Several techniques exist for recommending reviewers appropriate for a given pull request (PR). Most code reviewer recommendation techniques in the literature build and evaluate their models on datasets collected from real open-source or industrial projects. These techniques invariably presume that such datasets reliably represent the "ground truth." In the context of a classification problem, ground truth refers to the objectively correct class labels used to build models from a dataset or to evaluate a model's performance. In a project dataset used to build a code reviewer recommendation system, the reviewer picked for a PR is usually assumed to be the best code reviewer for that PR. In practice, however, the picked reviewer may not be the best possible reviewer, or even a qualified one. Recent code reviewer recommendation studies suggest that the datasets used tend to suffer from systematic labeling bias, making the ground truth unreliable. Therefore, models and recommendation systems built on such datasets may perform poorly in real practice. In this study, we introduce a novel approach to automatically detect and eliminate systematic labeling bias in code reviewer recommendation systems. The bias that we remove results from selecting reviewers who do not ensure a permanently successful fix for a bug-related PR. To demonstrate the effectiveness of our approach, we evaluated it on two open-source project datasets (HIVE and QT Creator) and with five code reviewer recommendation techniques (Profile-Based, RSTrace, Naive Bayes, k-NN, and Decision Tree). Our debiasing approach appears promising, since it improved the Mean Reciprocal Rank (MRR) of the evaluated techniques by up to 26% on the datasets used.
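For context on the headline result, Mean Reciprocal Rank is a standard metric for ranked recommendation lists: for each PR it scores the reciprocal of the rank at which the ground-truth reviewer appears, then averages over all PRs. The sketch below illustrates the standard formula only; the function name, reviewer names, and data are illustrative, not taken from the paper.

```python
def mean_reciprocal_rank(recommendations, actual_reviewers):
    """Standard MRR over ranked recommendation lists.

    recommendations: one ranked list of candidate reviewers per PR.
    actual_reviewers: the ground-truth reviewer label for each PR.
    """
    total = 0.0
    for ranked, actual in zip(recommendations, actual_reviewers):
        if actual in ranked:
            # rank is 1-based, so reciprocal rank = 1 / (index + 1)
            total += 1.0 / (ranked.index(actual) + 1)
        # a ground-truth reviewer absent from the list contributes 0
    return total / len(actual_reviewers)

# Illustrative data: the actual reviewer is ranked 1st, then 2nd,
# then absent, so MRR = (1/1 + 1/2 + 0) / 3 = 0.5
print(mean_reciprocal_rank(
    [["alice", "bob"], ["carol", "alice"], ["dave"]],
    ["alice", "alice", "eve"],
))  # → 0.5
```

A systematic labeling bias in `actual_reviewers` (e.g., labels drawn from fixes that were later reopened) would distort this score for every technique evaluated against it, which is why the paper cleans the labels before comparing recommenders.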
dc.description.provenance: Submitted by Betül Özen (ozen@bilkent.edu.tr) on 2022-02-03T06:08:29Z. No. of bitstreams: 1. Detection_and_Elimination_of_Systematic_Labeling_Bias_in_Code_Reviewer_Recommendation_Systems.pdf: 1283429 bytes, checksum 32ef700fe268d1cd66681494646b82ef (MD5).
dc.identifier.doi: 10.1145/3463274.3463336
dc.identifier.isbn: 978-145039053-8
dc.identifier.uri: http://hdl.handle.net/11693/76981
dc.language.iso: English
dc.publisher: Association for Computing Machinery
dc.relation.isversionof: https://doi.org/10.1145/3463274.3463336
dc.source.title: EASE 2021: Evaluation and Assessment in Software Engineering
dc.subject: Modern code review
dc.subject: Ground truth
dc.subject: Labeling bias elimination
dc.subject: Systematic labeling bias
dc.subject: Data cleaning
dc.subject: Code review recommendation
dc.title: Detection and elimination of systematic labeling bias in code reviewer recommendation systems
dc.type: Conference Paper

Files

Original bundle

Name: Detection_and_Elimination_of_Systematic_Labeling_Bias_in_Code_Reviewer_Recommendation_Systems.pdf
Size: 1.22 MB
Format: Adobe Portable Document Format
License bundle

Name: license.txt
Size: 1.69 KB
Format: Item-specific license agreed upon to submission