Detection and elimination of systematic labeling bias in code reviewer recommendation systems

Tecimer, K. Ayberk; Tüzün, Eray; Dibeklioğlu, Hamdi; Erdoğmuş, Hakan

Date

2021-06-21

Abstract

Reviewer selection in modern code review is crucial for effective code reviews. Several techniques exist for recommending reviewers appropriate for a given pull request (PR). Most code reviewer recommendation techniques in the literature build and evaluate their models based on datasets collected from real projects using open-source or industrial practices. The techniques invariably presume that these datasets reliably represent the “ground truth.”

In the context of a classification problem, ground truth refers to the objectively correct labels of a class used to build models from a dataset or evaluate a model’s performance. In a project dataset used to build a code reviewer recommendation system, the recommended code reviewer picked for a PR is usually assumed to be the best code reviewer for that PR. However, in practice, the recommended code reviewer may not be the best possible code reviewer, or even a qualified one. Recent code reviewer recommendation studies suggest that the datasets used tend to suffer from systematic labeling bias, making the ground truth unreliable. Therefore, models and recommendation systems built on such datasets may perform poorly in real practice.

In this study, we introduce a novel approach to automatically detect and eliminate systematic labeling bias in code reviewer recommendation systems. The bias that we remove results from selecting reviewers who do not ensure a permanently successful fix for a bug-related PR. To demonstrate the effectiveness of our approach, we evaluated it on two open-source project datasets (HIVE and QT Creator) with five code reviewer recommendation techniques (Profile-Based, RSTrace, Naive Bayes, k-NN, and Decision Tree). Our debiasing approach appears promising, since it improved the Mean Reciprocal Rank (MRR) of the evaluated techniques by up to 26% on the datasets used.
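
For readers unfamiliar with the metric, MRR averages, over all PRs, the reciprocal of the rank at which the first correct reviewer appears in the recommended list. The following is a minimal sketch in Python, assuming the standard definition of MRR; the function names and the reviewer data are hypothetical and are not taken from the paper or its datasets.

```python
from typing import Hashable, Sequence, Set


def reciprocal_rank(ranked: Sequence[Hashable], actual: Set[Hashable]) -> float:
    """Return 1/rank of the first recommended reviewer found in `actual`, else 0.0."""
    for position, reviewer in enumerate(ranked, start=1):
        if reviewer in actual:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(
    recommendations: Sequence[Sequence[Hashable]],
    ground_truth: Sequence[Set[Hashable]],
) -> float:
    """Average the reciprocal rank over all PRs (standard MRR definition)."""
    if len(recommendations) != len(ground_truth):
        raise ValueError("one ranked list is required per ground-truth entry")
    if not recommendations:
        return 0.0
    total = sum(
        reciprocal_rank(ranked, actual)
        for ranked, actual in zip(recommendations, ground_truth)
    )
    return total / len(recommendations)


# Hypothetical data: three PRs, each with a ranked list of recommended reviewers
# and the set of reviewers recorded as the label for that PR.
recommendations = [
    ["alice", "bob", "carol"],   # correct reviewer at rank 1 -> 1
    ["bob", "carol", "alice"],   # correct reviewer at rank 3 -> 1/3
    ["carol", "alice", "bob"],   # correct reviewer not listed -> 0
]
ground_truth = [{"alice"}, {"alice"}, {"dave"}]

mrr = mean_reciprocal_rank(recommendations, ground_truth)
print(f"MRR = {mrr:.4f}")
```

Because the score depends entirely on the labels in `ground_truth`, a biased label changes the reciprocal rank for that PR, which is why the reliability of the ground truth matters for any MRR comparison.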

Source Title

EASE 2021: Evaluation and Assessment in Software Engineering

Publisher

Association for Computing Machinery

Keywords

Modern code review, Ground truth, Labeling bias elimination, Systematic labeling bias, Data cleaning, Code review recommendation

Permalink

http://hdl.handle.net/11693/76981

Published Version (Please cite this version)

https://doi.org/10.1145/3463274.3463336

Collections

Scholarly Publications - Computer Engineering

Language

English

Type

Conference Paper
