Improving performance of sparse matrix dense matrix multiplication on large-scale parallel systems

buir.contributor.author: Aykanat, Cevdet
dc.citation.epage: 96
dc.citation.spage: 71
dc.citation.volumeNumber: 59
dc.contributor.author: Acer, S.
dc.contributor.author: Selvitopi, O.
dc.contributor.author: Aykanat, Cevdet
dc.date.accessioned: 2018-04-12T10:53:28Z
dc.date.available: 2018-04-12T10:53:28Z
dc.date.issued: 2016
dc.department: Department of Computer Engineering
dc.description.abstract: We propose a comprehensive and generic framework to minimize multiple and different volume-based communication cost metrics for sparse matrix dense matrix multiplication (SpMM). SpMM is an important kernel that finds application in computational linear algebra and big data analytics. On distributed memory systems, this kernel is usually characterized by its high communication volume requirements. Our approach targets irregularly sparse matrices and is based on both graph and hypergraph partitioning models that rely on the widely adopted recursive bipartitioning paradigm. The proposed models are lightweight, portable (they can be realized using any graph and hypergraph partitioning tool) and can simultaneously optimize different cost metrics besides total volume, such as maximum send/receive volume, maximum sum of send and receive volumes, etc., in a single partitioning phase. They allow one to define and optimize as many custom volume-based metrics as desired through a flexible formulation. Experiments on a wide range of about a thousand matrices show that the proposed models drastically reduce the maximum communication volume compared to the standard partitioning models that only address the minimization of total volume. The improvements obtained on volume-based partition quality metrics using our models are validated with parallel SpMM as well as parallel multi-source BFS experiments on two large-scale systems. For parallel SpMM, compared to the standard partitioning models, our graph and hypergraph partitioning models respectively achieve reductions of 14% and 22% in runtime, on average. Compared to the state-of-the-art partitioner UMPa, our graph model is overall 14.5× faster and achieves an average improvement of 19% in partition quality on instances that are bounded by maximum volume. For parallel BFS, we show on graphs with more than a billion edges that scalability can be significantly improved with our models compared to a recently proposed two-dimensional partitioning model.
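The SpMM kernel the abstract refers to computes C = A·B, where A is sparse and B is dense. A minimal serial sketch in plain Python, assuming A is stored in the common CSR (compressed sparse row) layout, may help fix ideas; the `spmm` helper and variable names here are illustrative, not taken from the paper:

```python
def spmm(indptr, indices, data, B):
    """Multiply a CSR sparse matrix A (indptr/indices/data arrays)
    with a dense matrix B (list of rows), returning C = A @ B densely."""
    n_cols_b = len(B[0])
    n_rows = len(indptr) - 1
    C = [[0.0] * n_cols_b for _ in range(n_rows)]
    for i in range(n_rows):
        # indptr[i]..indptr[i+1] delimits the nonzeros of row i of A
        for k in range(indptr[i], indptr[i + 1]):
            a, j = data[k], indices[k]
            # row i of C accumulates a * (row j of B)
            for c in range(n_cols_b):
                C[i][c] += a * B[j][c]
    return C

# A = [[1, 0, 2],
#      [0, 3, 0]]  stored in CSR form
indptr, indices, data = [0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0]
B = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
print(spmm(indptr, indices, data, B))  # [[3.0, 3.0], [3.0, 3.0]]
```

In the distributed-memory setting studied in the paper, rows of A and B are partitioned across processors, and each nonzero A[i][j] owned by one processor may require the corresponding row B[j] from another; the communication volume this induces is what the proposed partitioning models minimize.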
dc.description.provenance: Made available in DSpace on 2018-04-12T10:53:28Z (GMT). No. of bitstreams: 1; bilkent-research-paper.pdf: 179475 bytes, checksum: ea0bedeb05ac9ccfb983c327e155f0c2 (MD5). Previous issue date: 2016
dc.identifier.doi: 10.1016/j.parco.2016.10.001
dc.identifier.issn: 0167-8191
dc.identifier.uri: http://hdl.handle.net/11693/36791
dc.language.iso: English
dc.publisher: Elsevier BV
dc.relation.isversionof: http://dx.doi.org/10.1016/j.parco.2016.10.001
dc.source.title: Parallel Computing
dc.subject: Combinatorial scientific computing
dc.subject: Communication volume balancing
dc.subject: Graph partitioning
dc.subject: Hypergraph partitioning
dc.subject: Irregular applications
dc.subject: Load balancing
dc.subject: Matrix partitioning
dc.subject: Recursive bipartitioning
dc.subject: Sparse matrices
dc.subject: Sparse matrix dense matrix multiplication
dc.subject: Big data
dc.subject: Graph theory
dc.subject: Large scale systems
dc.subject: Linear algebra
dc.subject: Resource allocation
dc.subject: Dense matrices
dc.subject: Matrix algebra
dc.title: Improving performance of sparse matrix dense matrix multiplication on large-scale parallel systems
dc.type: Article

Files

Original bundle
Name: Improving performance of sparse matrix dense matrix multiplication on large-scale parallel systems.pdf
Size: 3.53 MB
Format: Adobe Portable Document Format
Description: Full printable version