Browsing by Subject "Distributed-memory architectures"
Now showing 1 - 2 of 2
Item (Open Access): Hypergraph partitioning and reordering for parallel sparse triangular solves and tensor decomposition (2021-07), by Torun, Tuğba

Several scientific and real-world problems require computations with sparse matrices, or more generally, sparse tensors, which are multi-dimensional arrays. For sparse matrix computations, parallelization of sparse triangular systems introduces significant challenges because of the sequential nature of the computations involved. One approach to parallelizing sparse triangular systems is the sparse triangular SPIKE (stSPIKE) algorithm, which was originally proposed for shared-memory architectures. stSPIKE decouples the problem into independent smaller systems and requires the solution of a much smaller reduced sparse triangular system. We extend and implement stSPIKE for distributed-memory architectures. We then propose distributed-memory parallel Gauss-Seidel (dmpGS) and ILU (dmpILU) algorithms by means of stSPIKE. Furthermore, we propose novel hypergraph partitioning models and in-block reordering methods for minimizing the size and nonzero count of the reduced systems that arise in dmpGS and dmpILU.

For sparse tensor computations, tensor decomposition is widely used in the analysis of multi-dimensional data. The canonical polyadic decomposition (CPD) is one of the most popular tensor decomposition methods and is commonly computed by the CPD-ALS algorithm. Due to the high computational and memory demands of CPD-ALS, a distributed-memory-parallel algorithm is essential for efficiency. The medium-grain CPD-ALS algorithm, which adopts multi-dimensional Cartesian tensor partitioning, is one of the most successful distributed CPD-ALS algorithms for sparse tensors. We propose a novel hypergraph partitioning model, CartHP, whose partitioning objective correctly encapsulates the minimization of the total communication volume of multi-dimensional Cartesian tensor partitioning. Extensive experiments on real-world sparse matrices and tensors validate the parallel scalability of the proposed algorithms as well as the effectiveness of the proposed hypergraph partitioning and reordering models.
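The SPIKE-style decoupling described in this abstract can be illustrated with a small serial toy. The sketch below is a hedged illustration, not the thesis's distributed stSPIKE code: it solves a lower triangular system L x = b by factoring out the block diagonal D. The per-block solves in phase 1 are mutually independent and are the part that parallelizes; the remaining sequential sweep involves only the coupling term, mirroring the "reduced system" idea. Dense NumPy arrays and all function names are assumptions made for brevity.

```python
import numpy as np

def lower_solve(L, b):
    # Plain forward substitution for a dense lower triangular system L x = b.
    n = len(b)
    x = np.zeros(n)
    for i in range(n):
        x[i] = (b[i] - L[i, :i] @ x[:i]) / L[i, i]
    return x

def spike_style_lower_solve(L, b, block_size):
    # Toy two-phase solve of L x = b (L lower triangular), mirroring the
    # SPIKE idea: write L = D + R with block-diagonal D and coupling blocks R.
    # Phase 1 solves D g = b and D S = R block by block (independent per
    # block, hence parallelizable); phase 2 sweeps through the equivalent
    # system (I + S) x = g, which plays the role of the reduced system.
    n = len(b)
    g = np.zeros(n)
    S = np.zeros_like(L, dtype=float)
    for s in range(0, n, block_size):          # independent per-block work
        e = min(s + block_size, n)
        D = L[s:e, s:e]
        g[s:e] = lower_solve(D, b[s:e])
        for j in range(s):                     # spike columns D^{-1} R[:, j]
            S[s:e, j] = lower_solve(D, L[s:e, j])
    x = np.zeros(n)                            # remaining sequential sweep
    for i in range(n):
        x[i] = g[i] - S[i, :i] @ x[:i]
    return x

# Tiny sanity check against plain forward substitution.
rng = np.random.default_rng(0)
L = np.tril(rng.random((8, 8))) + 4 * np.eye(8)
b = rng.random(8)
assert np.allclose(spike_style_lower_solve(L, b, block_size=3), lower_solve(L, b))
```

In a distributed setting each block row would live on a different processor, only the interface unknowns would be communicated, and the sparsity of the coupling blocks (the quantity the thesis's partitioning and reordering models minimize) would determine the size of the reduced system.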
Item (Open Access): Recursive bipartitioning models for performance improvement in sparse matrix computations (2017-08), by Acer, Seher

Sparse matrix computations are among the most important building blocks of linear algebra and arise in many scientific and engineering problems. Depending on the problem type, these computations may be in the form of sparse matrix-dense matrix multiplication (SpMM), sparse matrix-vector multiplication (SpMV), or factorization of a sparse symmetric matrix. For both SpMM and SpMV performed on distributed-memory architectures, the associated data and task partitions among processors affect the parallel performance to a great extent, especially for sparse matrices with an irregular sparsity pattern. Parallel SpMM is characterized by high volumes of data communicated among processors, whereas both the volume and the number of messages are important for parallel SpMV. For the factorization performed in envelope methods, the envelope size (i.e., profile) is an important factor that determines the performance. To improve the performance of each of these sparse matrix computations, we propose graph/hypergraph partitioning models that exploit the advantages provided by the recursive bipartitioning (RB) paradigm in order to meet the specific needs of the respective computation.

In the models proposed for SpMM and SpMV, we utilize the RB process to target multiple volume-based communication cost metrics and a combination of volume- and number-based communication cost metrics in their respective partitioning objectives. In the model proposed for the factorization in envelope methods, the input matrix is reordered by utilizing the RB process, in which two new quality metrics relating to profile minimization are defined and maintained. The experimental results show that the proposed RB-based approach outperforms the state of the art for each of these computations.
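The abstract above builds all of its models on the recursive bipartitioning (RB) paradigm. The sketch below shows only the generic RB driver: k parts are obtained by recursively applying a 2-way split until the desired number of parts is reached. The `greedy_bisect` stand-in and the row/nonzero-count example are illustrative assumptions; the thesis models instead invoke a hypergraph bipartitioner (such as PaToH) at each step and maintain the relevant cut metrics (communication volume, message counts, or profile) throughout the recursion.

```python
def greedy_bisect(items, weight):
    # Trivial 2-way split that balances total weight; a placeholder for a
    # real hypergraph bipartitioner in this sketch.
    left, right = [], []
    wl = wr = 0
    for v in sorted(items, key=weight, reverse=True):
        if wl <= wr:
            left.append(v); wl += weight(v)
        else:
            right.append(v); wr += weight(v)
    return left, right

def recursive_bipartition(items, k, weight, bisect=greedy_bisect):
    # RB paradigm: obtain k parts by recursive 2-way partitioning.
    # k is assumed to be a power of two, as is common in RB-based tools;
    # objective-specific bookkeeping would be updated at every bisection.
    if k == 1:
        return [items]
    left, right = bisect(items, weight)
    return (recursive_bipartition(left, k // 2, weight, bisect) +
            recursive_bipartition(right, k // 2, weight, bisect))

# Example: split the rows of a sparse matrix into 4 parts, using each
# row's nonzero count as its computational load (made-up numbers).
rows = {0: 5, 1: 2, 2: 7, 3: 3, 4: 4, 5: 1, 6: 6, 7: 2}   # row id -> nnz
print(recursive_bipartition(list(rows), 4, weight=rows.get))
```

The value of the RB framing, as exploited in the thesis, is that partitioning objectives which are hard to express in a single flat k-way formulation (multiple volume-based metrics, message counts, profile) can be encoded and refined incrementally at each bisection step.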