Simultaneous input and output matrix partitioning for outer-product-parallel sparse matrix-matrix multiplication

buir.contributor.author: Aykanat, Cevdet
dc.citation.epage: C590
dc.citation.issueNumber: 5
dc.citation.spage: C568
dc.citation.volumeNumber: 36
dc.contributor.author: Akbudak, K.
dc.contributor.author: Aykanat, Cevdet
dc.date.accessioned: 2015-07-28T12:02:37Z
dc.date.available: 2015-07-28T12:02:37Z
dc.date.issued: 2014-10-23
dc.department: Department of Computer Engineering
dc.description.abstract: For outer-product-parallel sparse matrix-matrix multiplication (SpGEMM) of the form C = A × B, we propose three hypergraph models that achieve simultaneous partitioning of input and output matrices without any replication of input data. All three hypergraph models perform conformable one-dimensional (1D) columnwise and 1D rowwise partitioning of the input matrices A and B, respectively. The first hypergraph model performs two-dimensional (2D) nonzero-based partitioning of the output matrix, whereas the second and third models perform 1D rowwise and 1D columnwise partitioning of the output matrix, respectively. This partitioning scheme induces a two-phase parallel SpGEMM algorithm, where communication-free local SpGEMM computations constitute the first phase and the multiple single-node-accumulation operations on the local SpGEMM results constitute the second phase. In these models, the two partitioning constraints defined on weights of vertices encode balancing the computational loads of processors during the two separate phases of the parallel SpGEMM algorithm. The partitioning objective of minimizing the cutsize defined over the cut nets encodes minimizing the total volume of communication that will occur during the second phase of the parallel SpGEMM algorithm. An MPI-based parallel SpGEMM library is developed to verify the validity of our models in practice. Parallel runs of the library for a wide range of realistic SpGEMM instances on two large-scale parallel systems, JUQUEEN (an IBM BlueGene/Q system) and SuperMUC (an Intel-based cluster), show that the proposed hypergraph models attain high speedup values. © 2014 Society for Industrial and Applied Mathematics.
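The outer-product formulation and the two-phase structure described in the abstract can be illustrated with a small sketch. The following Python/SciPy snippet is not the authors' MPI-based library; the splitting of the k-indices into `parts` contiguous blocks is a hypothetical stand-in for the hypergraph-driven partition, used only to show that summing per-block outer-product results reproduces C = A × B.

```python
# Minimal sketch of outer-product SpGEMM: C = A x B equals the sum over k of
# the outer product of column k of A with row k of B. Conformable columnwise
# partitioning of A and rowwise partitioning of B assigns each outer product
# a_k * b_k^T wholly to one "processor" (phase 1, communication-free); the
# partial C matrices are then accumulated (phase 2). The block partition of
# k-indices below is illustrative, not the paper's hypergraph partition.
import numpy as np
from scipy import sparse

def outer_product_spgemm(A, B, parts=2):
    A = sparse.csc_matrix(A)            # columnwise access to A
    B = sparse.csr_matrix(B)            # rowwise access to B
    assert A.shape[1] == B.shape[0]
    # Phase 1: each block of k-indices produces a local partial result.
    partials = []
    for ks in np.array_split(np.arange(A.shape[1]), parts):
        Ck = sparse.csr_matrix((A.shape[0], B.shape[1]))
        for k in ks:
            Ck = Ck + A[:, [k]] @ B[[k], :]   # outer product a_k * b_k^T
        partials.append(Ck)
    # Phase 2: accumulate the partial results (the communication step in
    # the parallel setting).
    C = partials[0]
    for Cp in partials[1:]:
        C = C + Cp
    return C

if __name__ == "__main__":
    A = sparse.random(6, 5, density=0.4, random_state=0)
    B = sparse.random(5, 7, density=0.4, random_state=1)
    C = outer_product_spgemm(A, B, parts=2)
    assert np.allclose(C.toarray(), (A @ B).toarray())
```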
dc.identifier.doi: 10.1137/13092589X
dc.identifier.issn: 1064-8275
dc.identifier.uri: http://hdl.handle.net/11693/12688
dc.language.iso: English
dc.publisher: Society for Industrial and Applied Mathematics
dc.relation.isversionof: http://dx.doi.org/10.1137/13092589X
dc.source.title: SIAM Journal on Scientific Computing
dc.subject: Matrix partitioning
dc.subject: Parallel computing
dc.subject: Sparse matrices
dc.subject: Sparse matrix-matrix multiplication
dc.subject: SpGEMM
dc.subject: Hypergraph partitioning
dc.title: Simultaneous input and output matrix partitioning for outer-product-parallel sparse matrix-matrix multiplication
dc.type: Article

Files

Original bundle
Name: 8273.pdf
Size: 424.06 KB
Format: Adobe Portable Document Format