Reducing communication volume overhead in large-scale parallel SpGEMM
Sparse matrix-matrix multiplication (SpGEMM) of the forms C = A x B, C = A x A and C = A x A^T is a key operation in various domains and is characterized by high computational complexity and runtime overhead. Several models exist for parallelizing this operation on distributed-memory architectures, such as outer-product (OP), inner-product (IP), row-by-row-product (RRP) and column-by-column-product (CCP). We focus on the row-by-row-product model because of its convincing performance, its low preprocessing overhead and the fact that it requires no symbolic multiplication. Parallelization via the row-by-row-product model can be achieved using bipartite graphs or hypergraphs. For an efficient parallelization, multiple volume-based metrics, such as total volume and maximum volume, can be targeted for reduction. Existing approaches for the RRP model do not encapsulate multiple volume-based metrics. In this thesis, we propose a two-phase approach to reduce multiple volume-based cost metrics. In the first phase, total volume is reduced with a bipartite graph model. In the second phase, we reduce maximum volume while trying to keep the increase in total volume as small as possible. Our experiments show that the proposed approach is effective at reducing multiple volume-based metrics for different forms of SpGEMM operations.
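To make the terms above concrete, the following is a minimal Python sketch of the row-by-row formulation and of the two volume metrics. It is an illustration only, not the thesis's implementation: it does not perform any partitioning and does not reproduce the two-phase approach. The function names `rrp_spgemm` and `communication_volume`, the dict-of-dicts matrix representation, and the cost model are all assumptions made for this example. In particular, it assumes that row i of C is computed by the processor owning row i of A, that each needed row of B is sent once per remote processor that requires it, that volume is counted in nonzeros, and that "maximum volume" means the largest send volume of any single processor.

```python
from collections import defaultdict


def rrp_spgemm(A, B):
    """Row-by-row SpGEMM: C(i,:) = sum over nonzeros a_ik of a_ik * B(k,:).

    A and B map a row index to a {column index: value} dict.
    """
    C = {}
    for i, a_row in A.items():
        acc = defaultdict(float)  # sparse accumulator for row i of C
        for k, a_ik in a_row.items():
            for j, b_kj in B.get(k, {}).items():
                acc[j] += a_ik * b_kj
        C[i] = dict(acc)
    return C


def communication_volume(A, B, part_a, part_b, num_parts):
    """Return (total volume, maximum send volume) under the assumed model.

    part_a[i] / part_b[k] give the processor owning row i of A / row k of B.
    """
    # receivers[k]: processors that need row k of B but do not own it
    receivers = defaultdict(set)
    for i, a_row in A.items():
        p = part_a[i]
        for k in a_row:
            if k in B and part_b[k] != p:
                receivers[k].add(p)

    send = [0] * num_parts
    for k, procs in receivers.items():
        # the owner sends all nonzeros of row k to each remote receiver
        send[part_b[k]] += len(B[k]) * len(procs)
    return sum(send), max(send)
```

Under this model, a partition that minimizes the total can still concentrate most of the sending on one processor, which is why the two metrics are worth tracking separately.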