Leveraging file significance in bus factor estimation
Abstract
Software projects often face developer turnover for various reasons. Since developers are key sources of knowledge in these projects, their absence inevitably leads to some degree of knowledge loss. The Bus Factor (BF) is a metric used to assess the impact of this knowledge loss on a project’s continuity. Traditionally, BF is defined as the smallest group of developers whose departure would result in a loss of more than half of the project’s knowledge. Current state-of-the-art methods calculate developers’ knowledge based on the number of files they have authored, using data from version control systems (VCS). However, numerous studies have highlighted that not all files in software projects hold the same level of significance. In this study, we investigate the impact of weighting files based on their significance on the performance of two widely used BF estimators. Significance scores are calculated using five established graph metrics derived from the project’s Dependency Graph: PageRank, In-/Out-/All-Degree, and Betweenness Centralities. Additionally, we introduce BFSig, a prototype implementing our approach. Lastly, we present a new dataset featuring BF scores reported by software practitioners from five prominent GitHub repositories. Our findings show that BFSig surpasses the baseline methods, achieving up to an 18% reduction in Normalized Mean Absolute Error (NMAE). Additionally, BFSig reduces False Negatives by 18% when identifying potential risks linked to low BF. Furthermore, our respondents validated BFSig’s versatility, highlighting its capability to evaluate the BF of individual project subfolders. In conclusion, we believe that when estimating BF from authorship, software components of greater significance should be given higher weight.
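The idea of weighting files by significance can be illustrated with a minimal Python sketch. It is not BFSig and not either of the two baseline estimators; it only shows, under simplifying assumptions, how a significance weight can change a BF estimate. The assumptions are: each file has exactly one author, significance is approximated by in-degree in the dependency graph plus one (so that files nobody depends on still count), and BF is found by removing the most knowledgeable developers first. All function names, file names, and developer names are hypothetical.

```python
from collections import defaultdict


def in_degree_significance(files, deps):
    """Score each file as 1 + number of files that depend on it.

    `deps` is a list of (source, target) edges meaning "source depends on
    target". The +1 baseline is an assumption of this sketch.
    """
    score = {f: 1.0 for f in files}
    for _src, dst in deps:
        score[dst] += 1.0
    return score


def bus_factor(authors, weights=None):
    """Smallest number of developers whose departure loses more than half
    of the (optionally weighted) knowledge.

    `authors` maps each file to its single author (a simplification).
    With one author per file, knowledge is additive, so removing the
    largest contributors first yields the smallest such group.
    """
    if weights is None:
        weights = {f: 1.0 for f in authors}  # unweighted: every file counts once
    knowledge = defaultdict(float)
    for f, dev in authors.items():
        knowledge[dev] += weights[f]
    total = sum(knowledge.values())
    lost, removed = 0.0, 0
    for share in sorted(knowledge.values(), reverse=True):
        if lost > total / 2:
            break
        lost += share
        removed += 1
    return removed


# Hypothetical project: one core file that every other file depends on.
authors = {
    "core.py": "alice",
    "a.py": "bob", "b.py": "bob",
    "c.py": "carol", "d.py": "carol",
    "e.py": "dave",
}
deps = [(f, "core.py") for f in authors if f != "core.py"]

unweighted_bf = bus_factor(authors)
weighted_bf = bus_factor(authors, in_degree_significance(authors, deps))
print(unweighted_bf, weighted_bf)
```

In this toy project the unweighted estimate is 2, while the weighted estimate is 1, because the single author of the heavily depended-upon file holds more than half of the weighted knowledge. This is the kind of low-BF risk that counting files alone can miss.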