Partitioning models for scaling parallel sparse matrix-matrix multiplication

Authors: Kadir Akbudak, Oğuz Selvitopi, Cevdet Aykanat
Volume: 4, Issue: 3, Pages: 13:1–13:34
Date available in repository: 2019-02-12
Date issued: 2018
Department: Department of Computer Engineering
Abstract: We investigate outer-product-parallel, inner-product-parallel, and row-by-row-product-parallel formulations of sparse matrix-matrix multiplication (SpGEMM) on distributed-memory architectures. For each of these three formulations, we propose a hypergraph model and a bipartite graph model for distributing SpGEMM computations based on one-dimensional (1D) partitioning of the input matrices. We also propose a communication hypergraph model for each formulation for distributing communication operations. The computational graph and hypergraph models adopted in the first phase aim at minimizing the total message volume and balancing the computational loads of processors, whereas the communication hypergraph models adopted in the second phase aim at minimizing the total message count and balancing the message-volume loads of processors. That is, the computational partitioning models reduce the bandwidth cost and the communication hypergraph models reduce the latency cost. Our extensive parallel experiments on up to 2048 processors for a wide range of realistic SpGEMM instances show that although the outer-product-parallel formulation scales better, the row-by-row-product-parallel formulation is more viable due to its significantly lower partitioning overhead and competitive scalability. For the computational partitioning models, our experimental findings indicate that the proposed bipartite graph models are attractive alternatives to their hypergraph counterparts because of their lower partitioning overhead. Finally, we show that by reducing the latency cost in addition to the bandwidth cost through the communication hypergraph models, the parallel SpGEMM time can be improved by up to 32%.
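To make the row-by-row-product formulation discussed in the abstract concrete, the following is a minimal sequential sketch (not the paper's code) of row-by-row SpGEMM, also known as Gustavson's algorithm. Matrices are represented as hypothetical dict-of-dicts sparse rows; in the 1D row-parallel scheme the paper studies, each processor would own a block of rows of A and compute the corresponding rows of C, communicating the rows of B it needs.

```python
def spgemm_row_by_row(A, B):
    """Row-by-row-product SpGEMM: C[i,:] = sum_k A[i,k] * B[k,:].

    A and B are sparse matrices stored as {row: {col: value}}.
    Returns C in the same format, keeping only nonzero rows.
    """
    C = {}
    for i, a_row in A.items():
        c_row = {}
        # Row i of C is a sparse linear combination of the rows of B
        # selected by the nonzeros in row i of A.
        for k, a_ik in a_row.items():
            for j, b_kj in B.get(k, {}).items():
                c_row[j] = c_row.get(j, 0) + a_ik * b_kj
        if c_row:
            C[i] = c_row
    return C
```

In the parallel setting, the partitioning models in the paper decide which processor owns which rows so that the accesses `B.get(k, ...)` above mostly hit locally owned rows, reducing communication.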
Provenance: Submitted by Türkan Cesur (cturkan@bilkent.edu.tr) and made available in DSpace on 2019-02-12; previous issue date: 2018-04.
DOI: 10.1145/3155292
eISSN: 2329-4957
ISSN: 2329-4949
URI: http://hdl.handle.net/11693/49306
Language: English
Publisher: Association for Computing Machinery
Is version of: http://doi.org/10.1145/3155292
Source title: ACM Transactions on Parallel Computing
Keywords: Sparse matrix-matrix multiplication; SpGEMM; Hypergraph partitioning; Graph partitioning; Communication cost; Bandwidth; Latency
Type: Article

Files

Original bundle:
Partitioning_models_for_scaling_parallel_sparse_matrix-matrix_multiplicatio.pdf (1.9 MB, Adobe Portable Document Format): full printable version

License bundle:
license.txt (1.71 KB): item-specific license agreed upon at submission