Novel models and methods for accelerating parallel full-batch GNN training on distributed-memory systems

buir.advisor: Aykanat, Cevdet
dc.contributor.author: Bağırgan, Ahmet Can
dc.date.accessioned: 2025-08-04T12:43:12Z
dc.date.available: 2025-08-04T12:43:12Z
dc.date.issued: 2025-07
dc.date.submitted: 2025-08-01
dc.description: Cataloged from PDF version of article.
dc.description: Includes bibliographical references (leaves 57-63)
dc.description.abstract: Graph Neural Networks (GNNs) have emerged as effective tools for learning from graph-structured data across diverse application domains. Despite their success, the scalability of GNNs remains a critical challenge, particularly in full-batch training on large-scale, irregularly sparse, and scale-free graphs. Traditional one-dimensional (1D) vertex-parallel training strategies, while widely adopted, often suffer from severe load imbalance and excessive communication overhead, limiting their performance on distributed-memory systems. This thesis addresses the scalability limitations of 1D approaches by investigating alternative partitioning strategies for parallelization that better exploit the structure of modern graph workloads. A systematic evaluation framework is developed to assess parallel GNN training performance across a range of datasets with varying sparsity and degree distributions. The framework captures key performance indicators such as computational load balance, inter-process communication volume, and parallel runtime. Extensive experiments are conducted on two Tier-0 supercomputers, LUMI and MareNostrum5, using hundreds of real-world graph instances. Across 22 well-known GNN datasets, the results show up to a 61% decrease in total communication volume and up to a 39% decrease in parallel runtime compared to 1D partitioning strategies on 1024 processes. These improvements are consistent across graphs with high variance in degree and sparsity, confirming the robustness of the proposed approaches. The findings demonstrate the potential of moving beyond traditional 1D paradigms and provide practical insights into scalable and communication-efficient GNN training on distributed platforms.
dc.description.statementofresponsibility: by Ahmet Can Bağırgan
dc.embargo.release: 2026-02-01
dc.format.extent: x, 66 leaves : illustrations, charts ; 30 cm.
dc.identifier.itemid: B163140
dc.identifier.uri: https://hdl.handle.net/11693/117413
dc.language.iso: English
dc.subject: Graph neural networks
dc.subject: Graph partitioning
dc.subject: Parallel computing
dc.subject: Distributed systems
dc.title: Novel models and methods for accelerating parallel full-batch GNN training on distributed-memory systems
dc.title.alternative: Dağıtık-bellekli sistemlerde paralel çizge sinir ağları eğitimini hızlandırmak için yeni model ve yöntemler
dc.type: Thesis
thesis.degree.discipline: Computer Engineering
thesis.degree.grantor: Bilkent University
thesis.degree.level: Master's
thesis.degree.name: MS (Master of Science)
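For readers unfamiliar with the 1D vertex-parallel baseline that the abstract contrasts against, the sketch below illustrates the idea. It is not taken from the thesis and its variable names are hypothetical; it assumes mpi4py and NumPy, assigns each process a row stripe of a toy adjacency matrix, and models the feature exchange of one full-batch GNN layer with an allgather, which is the kind of communication volume the thesis aims to reduce.

```python
# Minimal sketch of 1D (row-wise) vertex-parallel full-batch GNN layer forward pass.
# Illustrative only; toy sizes, dense adjacency, hypothetical names.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, nprocs = comm.Get_rank(), comm.Get_size()

n_vertices, f_in, f_out = 1024, 16, 8
assert n_vertices % nprocs == 0, "toy example assumes an even vertex split"
rows_per_proc = n_vertices // nprocs

rng = np.random.default_rng(seed=rank)
# Row stripe of the (normally sparse) adjacency matrix owned by this process.
A_block = (rng.random((rows_per_proc, n_vertices)) < 0.01).astype(np.float32)
# Feature rows of the owned vertices and a replicated weight matrix.
H_local = rng.random((rows_per_proc, f_in)).astype(np.float32)
W = np.ones((f_in, f_out), dtype=np.float32) / f_in

# Communication step: assemble the full feature matrix on every process.
# Optimized 1D codes exchange only the rows they need, but the total volume
# still grows with the number of edges cut by the row-wise partition.
H_full = np.vstack(comm.allgather(H_local))

# Local aggregation and transform for one GNN layer.
Z_local = A_block @ H_full @ W
print(f"rank {rank}: output block shape {Z_local.shape}")
```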

Files

Original bundle
Name: B163140.pdf
Size: 6.83 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 2.1 KB
Format: Item-specific license agreed upon to submission