Leveraging file significance in bus factor estimation

buir.advisorTüzün, Eray
dc.contributor.authorHaratian, Vahid
dc.date.accessioned2025-01-17T06:13:37Z
dc.date.available2025-01-17T06:13:37Z
dc.date.copyright2025-01
dc.date.issued2025-01
dc.date.submitted2025-01-15
dc.descriptionCataloged from PDF version of article.
dc.descriptionIncludes bibliographical references (leaves 64-70).
dc.description.abstractSoftware projects often face developer turnover for various reasons. Since developers are key sources of knowledge in these projects, their absence inevitably leads to some degree of knowledge loss. The Bus Factor (BF) is a metric used to assess the impact of this knowledge loss on a project’s continuity. Traditionally, BF is defined as the smallest group of developers whose departure would result in a loss of more than half of the project’s knowledge. Current state-of-the-art methods calculate developers’ knowledge based on the number of files they have authored, using data from version control systems (VCS). However, numerous studies have highlighted that not all files in software projects hold the same level of significance. In this study, we investigate the impact of weighting files based on their significance on the performance of two widely used BF estimators. Significance scores are calculated using five established graph metrics derived from the project’s Dependency Graph: PageRank, In-/Out-/All-Degree, and Betweenness Centralities. Additionally, we introduce BFSig, a prototype implementing our approach. Lastly, we present a new dataset featuring BF scores reported by software practitioners from five prominent GitHub repositories. Our findings show that BFSig surpasses the baseline methods, achieving up to an 18% reduction in Normalized Mean Absolute Error (NMAE). Additionally, BFSig reduces False Negatives by 18% when identifying potential risks linked to low BF. Furthermore, our respondents validated BFSig’s versatility, highlighting its capability to evaluate the BF of individual project subfolders. In conclusion, we believe that when estimating BF from authorship, software components of greater significance should be given higher weight.
dc.description.provenanceSubmitted by Betül Özen (ozen@bilkent.edu.tr) on 2025-01-17T06:13:37Z No. of bitstreams: 1 B149023.pdf: 1708035 bytes, checksum: 49eb860728efdd49235326f92fd85816 (MD5)en
dc.description.provenanceMade available in DSpace on 2025-01-17T06:13:37Z (GMT). No. of bitstreams: 1 B149023.pdf: 1708035 bytes, checksum: 49eb860728efdd49235326f92fd85816 (MD5) Previous issue date: 2025-01en
dc.description.statementofresponsibilityby Vahid Haratian
dc.format.extentxi, 70 leaves : illustrations, charts ; 30 cm.
dc.identifier.itemidB149023
dc.identifier.urihttps://hdl.handle.net/11693/115953
dc.language.isoEnglish
dc.subjectBus factor
dc.subjectTruck factor
dc.subjectFile significance
dc.subjectKnowledge management
dc.subjectIntelligent collaboration tools
dc.subjectDependency graph
dc.subjectCode referencing
dc.titleLeveraging file significance in bus factor estimation
dc.title.alternativeDosya öneminin otobüs faktörü tahminindeki rolü
dc.typeThesis
thesis.degree.disciplineComputer Engineering
thesis.degree.grantorBilkent University
thesis.degree.levelMaster's
thesis.degree.nameMS (Master of Science)
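
The definition of BF given in the abstract (the smallest group of developers whose departure loses more than half of the project's knowledge) and the idea of weighting files by significance can be illustrated with a minimal sketch. This is not the BFSig prototype or either of the baseline estimators the thesis evaluates; it is a simple greedy heuristic written for illustration. The function name `weighted_bus_factor`, the input dictionaries, and the example weights are all hypothetical; in practice the weights could stand in for a significance score such as PageRank computed on the dependency graph.

```python
def weighted_bus_factor(authors_by_file, weight_by_file=None, threshold=0.5):
    """Greedy estimate of the bus factor (illustrative sketch only).

    authors_by_file: dict mapping file name -> set of developers who know it.
    weight_by_file:  dict mapping file name -> significance weight
                     (hypothetical input; defaults to 1 per file, i.e. unweighted).
    threshold:       fraction of total weight that must be lost (0.5 = "more than half").
    """
    if weight_by_file is None:
        weight_by_file = {f: 1.0 for f in authors_by_file}

    total = sum(weight_by_file[f] for f in authors_by_file)
    if total <= 0:
        return 0

    # Work on copies so the caller's sets are not modified.
    remaining = {f: set(devs) for f, devs in authors_by_file.items()}

    def lost_weight():
        # A file's knowledge is lost once no developer who knows it remains.
        return sum(weight_by_file[f] for f, devs in remaining.items() if not devs)

    removed = 0
    while lost_weight() <= threshold * total:
        # Weighted coverage: total significance of the files each developer knows.
        coverage = {}
        for f, devs in remaining.items():
            for dev in devs:
                coverage[dev] = coverage.get(dev, 0.0) + weight_by_file[f]
        if not coverage:
            break
        # Remove the developer with the highest coverage (ties broken by name).
        top = max(sorted(coverage), key=coverage.get)
        for devs in remaining.values():
            devs.discard(top)
        removed += 1
    return removed


# Hypothetical toy project: four files, three developers.
authors = {
    "core.py": {"alice"},
    "util.py": {"bob", "carol"},
    "cli.py": {"bob"},
    "docs.py": {"carol"},
}
# Hypothetical significance weights: core.py dominates the dependency graph.
weights = {"core.py": 7, "util.py": 1, "cli.py": 1, "docs.py": 1}

unweighted_bf = weighted_bus_factor(authors)
weighted_bf = weighted_bus_factor(authors, weights)
```

In this toy example the unweighted estimate is 2, while the weighted estimate drops to 1 because a single developer holds the most significant file. That is the kind of low-BF risk the abstract argues an unweighted count can miss.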

Files

Original bundle

Name:
B149023.pdf
Size:
1.63 MB
Format:
Adobe Portable Document Format

License bundle

Name:
license.txt
Size:
1.71 KB
Format:
Item-specific license agreed upon to submission