Partitioning models for scaling distributed graph computations
Author
Demirci, Gündüz Vehbi
Advisor
Aykanat, Cevdet
Date
2019-09Publisher
Bilkent University
Language
English
Type
ThesisItem Usage Stats
132
views
views
125
downloads
downloads
Abstract
The focus of this thesis is intelligent partitioning models and methods for
scaling the performance of parallel graph computations on distributed-memory
systems. Distributed databases utilize graph partitioning to provide servers with
data-locality and workload-balance. Some queries performed on a database may
form cascades due to the queries triggering each other. The current partitioning
methods consider the graph structure and logs of query workload. We introduce
the cascade-aware graph partitioning problem with the objective of minimizing
the overall cost of communication operations between servers during cascade processes.
We propose a randomized algorithm that integrates the graph structure
and cascade processes to use as input for large-scale partitioning. Experiments
on graphs representing real social networks demonstrate the e ectiveness of the
proposed solution in terms of the partitioning objectives.
Sparse-general-matrix-multiplication (SpGEMM) is a key computational kernel
used in scienti c computing and high-performance graph computations. We
propose an SpGEMM algorithm for Accumulo database which enables high performance
distributed parallelism through its iterator framework. The proposed
algorithm provides write-locality and avoids scanning input matrices multiple
times by utilizing Accumulo's batch scanning capability and node-level parallelism
structures. We also propose a matrix partitioning scheme that reduces
the total communication volume and provides a workload-balance among servers.
Extensive experiments performed on both real-world and synthetic sparse matrices
show that the proposed algorithm and matrix partitioning scheme provide
signi cant performance improvements.
Scalability of parallel SpGEMM algorithms are heavily communication bound.
Multidimensional partitioning of SpGEMM's workload is essential to achieve
higher scalability. We propose hypergraph models that utilize the arrangement of processors and also attain a multidimensional partitioning on SpGEMM's workload.
Thorough experimentation performed on both realistic as well as synthetically
generated SpGEMM instances demonstrates the e ectiveness of the
proposed partitioning models.
Keywords
Graph partitioningPropagation models
Information cascade
Social networks
Randomized algorithms
Scalability
Databases
Accumulo
Graphulo
Parallel and distributed computing
Sparse matrices
Sparse matrix-matrix multiplication
SpGEMM
Matrix partitioning
Data locality
Hypergraph partitioning
Numerical linear algebra