Browsing by Subject "One-dimensional partitioning"
Addressing volume and latency overheads in 1D-parallel sparse matrix-vector multiplication
Open Access (Springer, 2017-08-09)
Acer, Seher; Selvitopi, Oğuz; Aykanat, Cevdet
The scalability of sparse matrix-vector multiplication (SpMV) on distributed memory systems depends on multiple factors that involve different communication cost metrics. The irregular sparsity pattern of the coefficient matrix manifests itself as high bandwidth (total and/or maximum volume) and/or high latency (total and/or maximum message count) overhead. In this work, we propose a hypergraph partitioning model which combines two earlier models for one-dimensional partitioning, one addressing total and maximum volume, and the other one addressing total volume and total message count. Our model relies on the recursive bipartitioning paradigm and simultaneously addresses three cost metrics in a single partitioning phase in order to reduce volume and latency overheads. We demonstrate the validity of our model on a large dataset that contains more than 300 matrices. The results indicate that compared to the earlier models, our model significantly improves the scalability of SpMV. © 2017, Springer International Publishing AG.

Fast optimal load balancing algorithms for 1D partitioning
Open Access (Academic Press, 2004)
Pınar, A.; Aykanat, Cevdet
The one-dimensional decomposition of nonuniform workload arrays with optimal load balancing is investigated. The problem has been studied in the literature as the "chains-on-chains partitioning" problem. Despite the rich literature on exact algorithms, heuristics are still used in the parallel computing community with the "hope" of good decompositions and the "myth" of exact algorithms being hard to implement and not runtime efficient. We show that exact algorithms yield significant improvements in load balance over heuristics with negligible overhead. Detailed pseudocodes of the proposed algorithms are provided for reproducibility. We start with a literature review and propose improvements and efficient implementation tips for these algorithms. We also introduce novel algorithms that are asymptotically and runtime efficient. Our experiments on sparse matrix and direct volume rendering datasets verify that balance can be significantly improved by using exact algorithms. The proposed exact algorithms are 100 times faster than a single sparse matrix-vector multiplication for 64-way decompositions on average. We conclude that exact algorithms with the proposed efficient implementations can effectively replace heuristics. © 2004 Elsevier Inc. All rights reserved.

On two-dimensional sparse matrix partitioning: models, methods, and a recipe
Open Access (Society for Industrial and Applied Mathematics, 2010)
Çatalyürek, U. V.; Aykanat, Cevdet; Uçar, A.
We consider two-dimensional partitioning of general sparse matrices for parallel sparse matrix-vector multiply operation. We present three hypergraph-partitioning-based methods, each having unique advantages. The first one treats the nonzeros of the matrix individually and hence produces fine-grain partitions. The other two produce coarser partitions, where one of them imposes a limit on the number of messages sent and received by a single processor, and the other trades that limit for a lower communication volume. We also present a thorough experimental evaluation of the proposed two-dimensional partitioning methods together with the hypergraph-based one-dimensional partitioning methods, using an extensive set of public domain matrices. Furthermore, for the users of these partitioning methods, we present a partitioning recipe that chooses one of the partitioning methods according to some matrix characteristics. © 2010 Society for Industrial and Applied Mathematics.

One-dimensional partitioning for heterogeneous systems: theory and practice
Open Access (Academic Press, 2008-11)
Pınar, A.; Tabak, E. K.; Aykanat, Cevdet
We study the problem of one-dimensional partitioning of nonuniform workload arrays, with optimal load balancing for heterogeneous systems. We look at two cases: chain-on-chain partitioning, where the order of the processors is specified, and chain partitioning, where processor permutation is allowed. We present polynomial time algorithms to solve the chain-on-chain partitioning problem optimally, while we prove that the chain partitioning problem is NP-complete. Our empirical studies show that our proposed exact algorithms produce substantially better results than heuristics, while solution times remain comparable. © 2008 Elsevier Inc. All rights reserved.
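
For readers unfamiliar with the chains-on-chains partitioning problem named in the entries above, the following is a minimal illustrative sketch, not the algorithms proposed in any of the listed papers. It assumes nonnegative integer workloads and identical (homogeneous) processors, and it finds the optimal bottleneck load by bisecting on the bottleneck value with a greedy feasibility probe. The function names `probe` and `partition_chain` are hypothetical and chosen only for this example.

```python
from typing import List, Tuple


def probe(weights: List[int], parts: int, bottleneck: int) -> bool:
    """Greedy check: can the chain be cut into at most `parts`
    consecutive intervals, each with total load <= `bottleneck`?"""
    used, load = 1, 0
    for w in weights:
        if w > bottleneck:
            return False  # a single task already exceeds the limit
        if load + w > bottleneck:
            used += 1     # start a new interval at this task
            load = w
            if used > parts:
                return False
        else:
            load += w
    return True


def partition_chain(weights: List[int], parts: int) -> Tuple[int, List[int]]:
    """Return (optimal bottleneck, cut indices) for splitting `weights`
    into at most `parts` consecutive intervals.

    Assumes nonnegative integer weights, so bisection over integer
    bottleneck values is exact. A cut index i means a new interval
    starts at weights[i].
    """
    if parts < 1:
        raise ValueError("parts must be at least 1")
    if not weights:
        return 0, []
    lo, hi = max(weights), sum(weights)
    while lo < hi:
        mid = (lo + hi) // 2
        if probe(weights, parts, mid):
            hi = mid          # mid is achievable; try a smaller value
        else:
            lo = mid + 1      # mid is too small
    # Recover one optimal set of cuts with the same greedy rule.
    cuts, load = [], 0
    for i, w in enumerate(weights):
        if load + w > lo:
            cuts.append(i)
            load = w
        else:
            load += w
    return lo, cuts


if __name__ == "__main__":
    best, cuts = partition_chain([1, 2, 3, 4, 5, 6, 7, 8, 9], 3)
    print(best, cuts)
```

The probe runs in time linear in the array length, and the bisection needs a number of probes logarithmic in the total workload. The heterogeneous case described in the last entry, where processors differ in speed, is not covered by this sketch.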