Improving performance of sparse matrix dense matrix multiplication on large-scale parallel systems

buir.contributor.author: Aykanat, Cevdet
dc.citation.epage: 96
dc.citation.spage: 71
dc.citation.volumeNumber: 59
dc.contributor.author: Acer, S.
dc.contributor.author: Selvitopi, O.
dc.contributor.author: Aykanat, Cevdet
dc.date.accessioned: 2018-04-12T10:53:28Z
dc.date.available: 2018-04-12T10:53:28Z
dc.date.issued: 2016
dc.department: Department of Computer Engineering
dc.description.abstract: We propose a comprehensive and generic framework to minimize multiple and different volume-based communication cost metrics for sparse matrix dense matrix multiplication (SpMM). SpMM is an important kernel that finds application in computational linear algebra and big data analytics. On distributed memory systems, this kernel is usually characterized by its high communication volume requirements. Our approach targets irregularly sparse matrices and is based on both graph and hypergraph partitioning models that rely on the widely adopted recursive bipartitioning paradigm. The proposed models are lightweight, portable (they can be realized using any graph and hypergraph partitioning tool) and can simultaneously optimize different cost metrics besides total volume, such as maximum send/receive volume, maximum sum of send and receive volumes, etc., in a single partitioning phase. They allow one to define and optimize as many custom volume-based metrics as desired through a flexible formulation. Experiments on a wide range of about a thousand matrices show that the proposed models drastically reduce the maximum communication volume compared to the standard partitioning models that only address the minimization of total volume. The improvements obtained on volume-based partition quality metrics using our models are validated with parallel SpMM as well as parallel multi-source BFS experiments on two large-scale systems. For parallel SpMM, compared to the standard partitioning models, our graph and hypergraph partitioning models respectively achieve reductions of 14% and 22% in runtime, on average. Compared to the state-of-the-art partitioner UMPa, our graph model is overall 14.5× faster and achieves an average improvement of 19% in partition quality on instances that are bounded by maximum volume. For parallel BFS, we show on graphs with more than a billion edges that scalability can be significantly improved with our models compared to a recently proposed two-dimensional partitioning model.
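The SpMM kernel the abstract refers to computes C = A·B, where A is sparse and B is dense. A minimal serial sketch in plain Python, assuming A is stored in the common CSR (compressed sparse row) layout, may help fix ideas; the `spmm` helper and variable names here are illustrative, not taken from the paper:

```python
def spmm(indptr, indices, data, B):
    """Multiply a CSR sparse matrix A (indptr/indices/data arrays)
    with a dense matrix B (list of rows), returning C = A @ B densely."""
    n_cols_b = len(B[0])
    n_rows = len(indptr) - 1
    C = [[0.0] * n_cols_b for _ in range(n_rows)]
    for i in range(n_rows):
        # indptr[i]..indptr[i+1] delimits the nonzeros of row i of A
        for k in range(indptr[i], indptr[i + 1]):
            a, j = data[k], indices[k]
            # row i of C accumulates a * (row j of B)
            for c in range(n_cols_b):
                C[i][c] += a * B[j][c]
    return C

# A = [[1, 0, 2],
#      [0, 3, 0]]  stored in CSR form
indptr, indices, data = [0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0]
B = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
print(spmm(indptr, indices, data, B))  # [[3.0, 3.0], [3.0, 3.0]]
```

In the distributed-memory setting studied in the paper, rows of A and B are partitioned across processors, and each nonzero A[i][j] owned by one processor may require the corresponding row B[j] from another; the communication volume this induces is what the proposed partitioning models minimize.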
dc.description.provenance: Made available in DSpace on 2018-04-12T10:53:28Z (GMT). No. of bitstreams: 1; bilkent-research-paper.pdf: 179475 bytes, checksum: ea0bedeb05ac9ccfb983c327e155f0c2 (MD5). Previous issue date: 2016
dc.identifier.doi: 10.1016/j.parco.2016.10.001
dc.identifier.issn: 0167-8191
dc.identifier.uri: http://hdl.handle.net/11693/36791
dc.language.iso: English
dc.publisher: Elsevier BV
dc.relation.isversionof: http://dx.doi.org/10.1016/j.parco.2016.10.001
dc.source.title: Parallel Computing
dc.subject: Combinatorial scientific computing
dc.subject: Communication volume balancing
dc.subject: Graph partitioning
dc.subject: Hypergraph partitioning
dc.subject: Irregular applications
dc.subject: Load balancing
dc.subject: Matrix partitioning
dc.subject: Recursive bipartitioning
dc.subject: Sparse matrices
dc.subject: Sparse matrix dense matrix multiplication
dc.subject: Big data
dc.subject: Graph theory
dc.subject: Large scale systems
dc.subject: Linear algebra
dc.subject: Resource allocation
dc.subject: Dense matrices
dc.subject: Matrix algebra
dc.title: Improving performance of sparse matrix dense matrix multiplication on large-scale parallel systems
dc.type: Article

Files

Original bundle
Name: Improving performance of sparse matrix dense matrix multiplication on large-scale parallel systems.pdf
Size: 3.53 MB
Format: Adobe Portable Document Format
Description: Full printable version