Improving the performance of 1D vertex parallel GNN training on distributed memory systems

Graph Neural Networks (GNNs) are pivotal for analyzing data within graphstructured domains such as social media, biological networks, and recommendation systems. Despite their advantages, scaling GNN training to large datasets in distributed settings poses significant challenges due to the complex task of managing computation and communication costs. The objective of this work is to scale 1D vertex-parallel GNN training on distributed memory systems via (i) twoconstraint partitioning formulation for better computational load balancing and (ii) overlapping communication with computation for reducing communication overhead. In the proposed two-constraint formulation, one constraint encodes the computational load balance during forward propagation, whereas the second constraint encodes the computational load balance during backward propagation. We propose three communication and computation overlapping methods that perform overlapping at three different levels. These methods were tested against traditional approaches using benchmark datasets, demonstrating improved training efficiency without altering the model structure. The outcomes indicate that multi-constraint graph partitioning and the integration of communication and computation overlapping schemes can significantly mitigate the challenges of distributed GNN training. The research concludes with recommendations for future work, including adapting these techniques to dynamic and more complex GNN architectures, promising further improvements in the efficiency and applicability of GNNs in real-world scenarios.

Keywords

Graph neural networks, Parallel and distributed memory systems, Graph partitioning, Load balancing, Overlapping communication with computation

Degree Discipline

Computer Engineering

Degree Level

Master's

Degree Name

MS (Master of Science)

Permalink

https://hdl.handle.net/11693/115727

Collections

Graduate School of Economics and Social Sciences

Full item page

Improving the performance of 1D vertex parallel GNN training on distributed memory systems

Files

Date

Authors

Editor(s)

Advisor

Supervisor

Co-Advisor

Co-Supervisor

Instructor

Source Title

Print ISSN

Electronic ISSN

Publisher

Volume

Issue

Pages

Language

Type

Journal Title

Journal ISSN

Volume Title

Attention Stats

Usage Stats

Series

Abstract

Course

Other identifiers

Book Title