Leveraging file significance in bus factor estimation

Haratian, Vahid

Files

B149023.pdf (1.63 MB)

Date

2025-01

Authors

Haratian, Vahid

Advisor

Tüzün, Eray

Abstract

Software projects often face developer turnover for various reasons. Since developers are key sources of knowledge in these projects, their absence inevitably leads to some degree of knowledge loss. The Bus Factor (BF) is a metric used to assess the impact of this knowledge loss on a project’s continuity. Traditionally, BF is defined as the smallest group of developers whose departure would result in a loss of more than half of the project’s knowledge. Current state-of-the-art methods calculate developers’ knowledge based on the number of files they have authored, using data from version control systems (VCS). However, numerous studies have highlighted that not all files in software projects hold the same level of significance. In this study, we investigate the impact of weighting files based on their significance on the performance of two widely used BF estimators. Significance scores are calculated using five established graph metrics derived from the project’s Dependency Graph: PageRank, In-/Out-/All-Degree, and Betweenness Centralities. Additionally, we introduce BFSig, a prototype implementing our approach. Lastly, we present a new dataset featuring BF scores reported by software practitioners from five prominent GitHub repositories. Our findings show that BFSig surpasses the baseline methods, achieving up to an 18% reduction in Normalized Mean Absolute Error (NMAE). Additionally, BFSig reduces False Negatives by 18% when identifying potential risks linked to low BF. Furthermore, our respondents validated BFSig’s versatility, highlighting its capability to evaluate the BF of individual project subfolders. In conclusion, we believe that when estimating BF from authorship, software components of greater significance should be given higher weight.
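
The idea in the abstract can be illustrated with a minimal sketch. This is not BFSig's implementation, and it is not either of the baseline estimators the thesis evaluates. It assumes a greedy heuristic (which is not guaranteed to find the smallest group), a hypothetical `authorship` mapping from each file to its knowledgeable developers, and precomputed significance weights with made-up values standing in for scores such as PageRank on the dependency graph. The function name `weighted_bus_factor` and all file and developer names are illustrative.

```python
from typing import Dict, Set


def weighted_bus_factor(
    authorship: Dict[str, Set[str]],
    weights: Dict[str, float],
    threshold: float = 0.5,
) -> int:
    """Greedy estimate: remove developers until the files left without any
    author account for more than `threshold` of the total file weight."""
    total = sum(weights[f] for f in authorship)
    # Copy the author sets so the caller's data is not modified.
    remaining = {f: set(devs) for f, devs in authorship.items()}
    removed = 0

    def lost_weight() -> float:
        # Weight of files that no remaining developer knows.
        return sum(weights[f] for f, devs in remaining.items() if not devs)

    while total > 0 and lost_weight() <= threshold * total:
        # Total weight of the files each remaining developer still covers.
        coverage: Dict[str, float] = {}
        for f, devs in remaining.items():
            for d in devs:
                coverage[d] = coverage.get(d, 0.0) + weights[f]
        if not coverage:
            break
        # Remove the developer covering the most weight (ties: alphabetical).
        top = max(sorted(coverage), key=coverage.get)
        for devs in remaining.values():
            devs.discard(top)
        removed += 1
    return removed


# Hypothetical project: four files and three developers.
authorship = {
    "core.py": {"alice"},
    "util.py": {"alice", "bob"},
    "cli.py": {"bob"},
    "docs.py": {"carol"},
}
# Baseline-style weighting: every file counts equally.
uniform = {f: 1.0 for f in authorship}
# Assumed significance scores: core.py is heavily depended upon.
significance = {"core.py": 6.0, "util.py": 1.0, "cli.py": 1.0, "docs.py": 1.0}

print(weighted_bus_factor(authorship, uniform))       # 2
print(weighted_bus_factor(authorship, significance))  # 1
```

With equal weights, two developers must leave before more than half of the files are orphaned. Once `core.py` carries most of the weight, the departure of its sole author alone crosses the threshold, so the estimate drops to 1. This is the kind of shift that significance weighting is meant to surface.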

Keywords

Bus factor, Truck factor, File significance, Knowledge management, Intelligent collaboration tools, Dependency graph, Code referencing

Degree Discipline

Computer Engineering

Degree Level

Master's

Degree Name

MS (Master of Science)

Permalink

https://hdl.handle.net/11693/115953

Collections

Graduate School of Engineering and Science

Language

English

Type

Thesis
