Novel models and methods for accelerating parallel full-batch GNN training on distributed-memory systems

buir.advisor: Aykanat, Cevdet
dc.contributor.author: Bağırgan, Ahmet Can
dc.date.accessioned: 2025-08-04T12:43:12Z
dc.date.available: 2025-08-04T12:43:12Z
dc.date.issued: 2025-07
dc.date.submitted: 2025-08-01
dc.description: Cataloged from PDF version of article.
dc.description: Includes bibliographical references (leaves 57-63)
dc.description.abstract: Graph Neural Networks (GNNs) have emerged as effective tools for learning from graph-structured data across diverse application domains. Despite their success, the scalability of GNNs remains a critical challenge, particularly in full-batch training on large-scale, irregularly sparse, and scale-free graphs. Traditional one-dimensional (1D) vertex-parallel training strategies, while widely adopted, often suffer from severe load imbalance and excessive communication overhead, limiting their performance on distributed-memory systems. This thesis addresses the scalability limitations of 1D approaches by investigating alternative partitioning strategies for parallelization that better exploit the structure of modern graph workloads. A systematic evaluation framework is developed to assess parallel GNN training performance across a range of datasets with varying sparsity and degree distributions. The framework captures key performance indicators such as computational load balance, inter-process communication volume, and parallel runtime. Extensive experiments are conducted on two Tier-0 supercomputers, LUMI and MareNostrum5, using hundreds of real-world graph instances. Across 22 well-known GNN datasets, the results show up to a 61% decrease in total communication volume and up to a 39% decrease in parallel runtime compared to 1D partitioning strategies on 1024 processes. These improvements are consistent across graphs with high variance in degree and sparsity, confirming the robustness of the proposed approaches. The findings demonstrate the potential of moving beyond traditional 1D paradigms and provide practical insights into scalable and communication-efficient GNN training on distributed platforms.
dc.description.statementofresponsibility: by Ahmet Can Bağırgan
dc.embargo.release: 2026-02-01
dc.format.extent: x, 66 leaves : illustrations, charts ; 30 cm.
dc.identifier.itemid: B163140
dc.identifier.uri: https://hdl.handle.net/11693/117413
dc.language.iso: English
dc.subject: Graph neural networks
dc.subject: Graph partitioning
dc.subject: Parallel computing
dc.subject: Distributed systems
dc.title: Novel models and methods for accelerating parallel full-batch GNN training on distributed-memory systems
dc.title.alternative: Dağıtık-bellekli sistemlerde paralel çizge sinir ağları eğitimini hızlandırmak için yeni model ve yöntemler
dc.type: Thesis
thesis.degree.discipline: Computer Engineering
thesis.degree.grantor: Bilkent University
thesis.degree.level: Master's
thesis.degree.name: MS (Master of Science)
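For readers unfamiliar with the 1D vertex-parallel baseline that the abstract contrasts against, the sketch below illustrates the idea. It is not taken from the thesis and its variable names are hypothetical; it assumes mpi4py and NumPy, assigns each process a row stripe of a toy adjacency matrix, and models the feature exchange of one full-batch GNN layer with an allgather, which is the kind of communication volume the thesis aims to reduce.

```python
# Minimal sketch of 1D (row-wise) vertex-parallel full-batch GNN layer forward pass.
# Illustrative only; toy sizes, dense adjacency, hypothetical names.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, nprocs = comm.Get_rank(), comm.Get_size()

n_vertices, f_in, f_out = 1024, 16, 8
assert n_vertices % nprocs == 0, "toy example assumes an even vertex split"
rows_per_proc = n_vertices // nprocs

rng = np.random.default_rng(seed=rank)
# Row stripe of the (normally sparse) adjacency matrix owned by this process.
A_block = (rng.random((rows_per_proc, n_vertices)) < 0.01).astype(np.float32)
# Feature rows of the owned vertices and a replicated weight matrix.
H_local = rng.random((rows_per_proc, f_in)).astype(np.float32)
W = np.ones((f_in, f_out), dtype=np.float32) / f_in

# Communication step: assemble the full feature matrix on every process.
# Optimized 1D codes exchange only the rows they need, but the total volume
# still grows with the number of edges cut by the row-wise partition.
H_full = np.vstack(comm.allgather(H_local))

# Local aggregation and transform for one GNN layer.
Z_local = A_block @ H_full @ W
print(f"rank {rank}: output block shape {Z_local.shape}")
```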

Files

Original bundle
Name: B163140.pdf
Size: 6.83 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 2.1 KB
Format: Item-specific license agreed upon to submission