      View Item 
      •   BUIR Home
      • Scholarly Publications
      • Faculty of Engineering
      • Department of Computer Engineering
      • View Item

      Improving performance of sparse matrix dense matrix multiplication on large-scale parallel systems

      Author(s)
      Acer, S.
      Selvitopi, O.
      Aykanat, Cevdet
      Date
      2016
      Source Title
      Parallel Computing
      Print ISSN
      0167-8191
      Publisher
      Elsevier BV
      Volume
      59
      Pages
      71 - 96
      Language
      English
      Type
      Article
      Abstract
      We propose a comprehensive and generic framework to minimize multiple and different volume-based communication cost metrics for sparse matrix dense matrix multiplication (SpMM). SpMM is an important kernel that finds application in computational linear algebra and big data analytics. On distributed memory systems, this kernel is usually characterized by high communication volume requirements. Our approach targets irregularly sparse matrices and is based on both graph and hypergraph partitioning models that rely on the widely adopted recursive bipartitioning paradigm. The proposed models are lightweight, portable (they can be realized using any graph and hypergraph partitioning tool) and can simultaneously optimize different cost metrics besides total volume, such as maximum send/receive volume and maximum sum of send and receive volumes, in a single partitioning phase. They allow one to define and optimize as many custom volume-based metrics as desired through a flexible formulation. Experiments on a wide range of about a thousand matrices show that the proposed models drastically reduce the maximum communication volume compared to the standard partitioning models that only address the minimization of total volume. The improvements obtained on volume-based partition quality metrics using our models are validated with parallel SpMM as well as parallel multi-source BFS experiments on two large-scale systems. For parallel SpMM, compared to the standard partitioning models, our graph and hypergraph partitioning models achieve average runtime reductions of 14% and 22%, respectively. Compared to the state-of-the-art partitioner UMPa, our graph model is overall 14.5× faster and achieves an average improvement of 19% in partition quality on instances that are bounded by maximum volume. For parallel BFS, we show on graphs with more than a billion edges that scalability can be significantly improved with our models compared to a recently proposed two-dimensional partitioning model.
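The volume-based metrics named in the abstract (total volume, maximum send volume, maximum receive volume, maximum send plus receive volume) can be made concrete with a small evaluator. This is a minimal sketch under stated assumptions, not the authors' graph or hypergraph partitioning model: it only scores a partition that is already given. It assumes a square sparse matrix A partitioned row-wise, with row i of the dense input B and row i of the output owned by the same processor as row i of A, and B having k columns so that shipping one row of B costs k words. The function name `spmm_volume_metrics` and the toy matrix are hypothetical.

```python
from collections import defaultdict


def spmm_volume_metrics(rows, part, k):
    """Evaluate volume-based communication metrics for a given partition.

    Assumptions (illustrative, not the paper's model):
      * A is square and stored row by row: rows[i] lists the column
        indices of the nonzeros in row i.
      * part[i] is the processor that owns row i of A, row i of the
        dense input B, and row i of the output C (a conformal 1D partition).
      * B has k columns, so shipping one row of B costs k words.
    """
    num_parts = max(part) + 1
    # needed[p] = set of B rows that processor p must fetch from others.
    # Using a set means a row is sent at most once per receiver, even if
    # several local rows of A reference the same column.
    needed = defaultdict(set)
    for i, cols in enumerate(rows):
        p = part[i]
        for j in cols:
            if part[j] != p:
                needed[p].add(j)

    send = [0] * num_parts
    recv = [0] * num_parts
    for p, js in needed.items():
        for j in js:
            send[part[j]] += k  # owner of B row j sends it
            recv[p] += k        # processor p receives it

    return {
        "total": sum(send),
        "max_send": max(send),
        "max_recv": max(recv),
        "max_send_recv": max(s + r for s, r in zip(send, recv)),
    }


# Toy 6x6 matrix split row-wise across 3 processors (hypothetical data).
rows = [[0, 2, 4], [1, 3], [2], [3, 0], [4, 5], [5, 1, 2]]
part = [0, 0, 1, 1, 2, 2]
metrics = spmm_volume_metrics(rows, part, k=2)
print(metrics)
# {'total': 12, 'max_send': 6, 'max_recv': 6, 'max_send_recv': 10}
```

In this toy instance the total volume is 12 words, yet processor 1 alone sends 6 words and processor 0 alone receives 6. An objective that only minimizes the total does not see this concentration; the maximum-based metrics do, which is the distinction between total-volume and maximum-volume objectives that the abstract draws.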
      Keywords
      Combinatorial scientific computing
      Communication volume balancing
      Graph partitioning
      Hypergraph partitioning
      Irregular applications
      Load balancing
      Matrix partitioning
      Recursive bipartitioning
      Sparse matrices
      Sparse matrix dense matrix multiplication
      Big data
      Graph theory
      Large scale systems
      Linear algebra
      Resource allocation
      Dense matrices
      Matrix algebra
      Permalink
      http://hdl.handle.net/11693/36791
      Published Version (Please cite this version)
      http://dx.doi.org/10.1016/j.parco.2016.10.001
      Collections
      • Department of Computer Engineering

      Related items

      Showing items related by title, author, creator and subject.

      • Encapsulating multiple communication-cost metrics in partitioning sparse rectangular matrices for parallel matrix-vector multiplies

        Uçar, B.; Aykanat, Cevdet (SIAM, 2004)
        This paper addresses the problem of one-dimensional partitioning of structurally unsymmetric square and rectangular sparse matrices for parallel matrix-vector and matrix-transpose-vector multiplies. The objective is to ...
      • On two-dimensional sparse matrix partitioning: models, methods, and a recipe

        Çatalyürek, U. V.; Aykanat, Cevdet; Uçar, A. (Society for Industrial and Applied Mathematics, 2010)
        We consider two-dimensional partitioning of general sparse matrices for parallel sparse matrix-vector multiply operation. We present three hypergraph-partitioning-based methods, each having unique advantages. The first one ...
      • Fast optimal load balancing algorithms for 1D partitioning

        Pınar, A.; Aykanat, Cevdet (Academic Press, 2004)
        The one-dimensional decomposition of nonuniform workload arrays with optimal load balancing is investigated. The problem has been studied in the literature as the "chains-on-chains partitioning" problem. Despite the rich ...


      Bilkent University

      © Bilkent University - Library IT
