Browsing by Subject "Tensor decomposition"
Item (Open Access): Hypergraph partitioning and reordering for parallel sparse triangular solves and tensor decomposition (2021-07). Torun, Tuğba.

Several scientific and real-world problems require computations with sparse matrices, or more generally, sparse tensors, which are multi-dimensional arrays. For sparse matrix computations, the parallelization of sparse triangular systems introduces significant challenges because of the sequential nature of the computations involved. One approach to parallelizing sparse triangular systems is the sparse triangular SPIKE (stSPIKE) algorithm, which was originally proposed for shared-memory architectures. stSPIKE decouples the problem into independent smaller systems and requires the solution of a much smaller reduced sparse triangular system. We extend and implement stSPIKE for distributed-memory architectures. Then we propose distributed-memory parallel Gauss-Seidel (dmpGS) and ILU (dmpILU) algorithms by means of stSPIKE. Furthermore, we propose novel hypergraph partitioning models and in-block reordering methods for minimizing the size and nonzero count of the reduced systems that arise in dmpGS and dmpILU. For sparse tensor computations, tensor decomposition is widely used in the analysis of multi-dimensional data. The canonical polyadic decomposition (CPD) is one of the most popular tensor decomposition methods, and it is commonly computed by the CPD-ALS algorithm. Due to the high computational and memory demands of CPD-ALS, a distributed-memory-parallel algorithm is essential for efficiency. The medium-grain CPD-ALS algorithm, which adopts multi-dimensional Cartesian tensor partitioning, is one of the most successful distributed CPD-ALS algorithms for sparse tensors. We propose a novel hypergraph partitioning model, CartHP, whose partitioning objective correctly encapsulates the minimization of the total communication volume of multi-dimensional Cartesian tensor partitioning. Extensive experiments on real-world sparse matrices and tensors validate the parallel scalability of the proposed algorithms as well as the effectiveness of the proposed hypergraph partitioning and reordering models.
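To make the decoupling idea behind SPIKE-style triangular solvers concrete: every block first solves its own diagonal system independently, and the inter-block coupling is then resolved in a separate correction step. Below is a minimal dense, serial Python/NumPy sketch of that two-phase structure. The block layout, the name spike_style_lower_solve, and the dense correction step are illustrative assumptions, not the stSPIKE implementation from the thesis, which works on sparse systems and collapses the coupling into a small reduced triangular system.

import numpy as np

def spike_style_lower_solve(L, b, K):
    # Illustrative SPIKE-style solve of L x = b for lower-triangular L,
    # split into K diagonal blocks. Phase 1 is embarrassingly parallel;
    # phase 2 plays the role of the reduced system coupling the blocks.
    n = L.shape[0]
    bounds = np.linspace(0, n, K + 1, dtype=int)
    # Phase 1 (one block per processor): solve the diagonal blocks.
    g = np.empty(n)
    for k in range(K):
        s, e = bounds[k], bounds[k + 1]
        g[s:e] = np.linalg.solve(L[s:e, s:e], b[s:e])
    # Phase 2: correct each block for the off-diagonal coupling.
    # In the sparse case only a few 'interface' unknowns appear here,
    # which is what makes the reduced system much smaller than L.
    x = g.copy()
    for k in range(1, K):
        s, e = bounds[k], bounds[k + 1]
        coupling = L[s:e, :s] @ x[:s]
        x[s:e] = g[s:e] - np.linalg.solve(L[s:e, s:e], coupling)
    return x

# Sanity check against a direct solve.
rng = np.random.default_rng(0)
A = np.tril(rng.standard_normal((12, 12))) + 12 * np.eye(12)
b = rng.standard_normal(12)
assert np.allclose(spike_style_lower_solve(A, b, K=3), np.linalg.solve(A, b))

Phase 2 here still sweeps the blocks in order; the point of stSPIKE is that, for sparse systems, this coupling reduces to a triangular system far smaller than the original, so the sequential part becomes cheap.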
Item (Open Access): Partitioning models for general medium-grain parallel sparse tensor decomposition (IEEE, 2021). Karsavuran, M. Ozan; Acer, S.; Aykanat, Cevdet.

The focus of this article is the efficient parallelization of the canonical polyadic decomposition algorithm utilizing the alternating least squares method for sparse tensors on distributed-memory architectures. We propose a hypergraph model for general medium-grain partitioning which does not enforce any topological constraint on the partitioning. The proposed model is based on splitting the given tensor into nonzero-disjoint component tensors. Then a mode-dependent coarse-grain hypergraph is constructed for each component tensor. A net amalgamation operation is proposed to form a composite medium-grain hypergraph from these mode-dependent coarse-grain hypergraphs that correctly encapsulates the minimization of the communication volume. We propose a heuristic which splits the nonzeros of dense slices to obtain sparse slices in component tensors, thus partially attaining slice coherency at the (sub)slice level, since partitioning is performed on (sub)slices instead of on individual nonzeros. We also utilize the well-known recursive-bipartitioning framework to improve the quality of the splitting heuristic. Finally, we propose a medium-grain tripartite graph model that aims at faster partitioning at the expense of an increase in total communication volume. Parallel experiments conducted on 10 real-world tensors on up to 1024 processors confirm the validity of the proposed hypergraph and graph models.

Item (Open Access): Scalable unsupervised ML: Latency hiding in distributed sparse tensor decomposition (IEEE Computer Society, 2022-11-01). Abubaker, Nabil; Karsavuran, M. Ozan; Aykanat, Cevdet.

The latency overhead in distributed-memory parallel CPD-ALS scales with the number of processors, limiting the scalability of computing the CPD of large, irregularly sparse tensors. This overhead comes in the form of sparse reduce and expand operations performed on factor-matrix rows via point-to-point messages. We propose to hide the latency overhead by embedding all of the point-to-point messages incurred by the sparse reduce and expand into dense collective operations that already exist in CPD-ALS. The conventional parallel CPD-ALS algorithm is not amenable to such embedding, so we propose a computation/communication rearrangement to enable it. We embed the sparse expand and reduce into a hypercube-based ALL-REDUCE operation to limit the latency overhead to O(log2 K) for a K-processor system. The embedding comes at the cost of increased bandwidth overhead due to the multi-hop routing of factor-matrix rows during the embedded ALL-REDUCE. We propose an embedding scheme that takes advantage of the expand/reduce properties to reduce this overhead. Furthermore, we propose a novel recursive-bipartitioning framework that enables simultaneous hypergraph partitioning and subhypergraph-to-subhypercube mapping to achieve a subtensor-to-processor assignment with the objective of reducing the bandwidth overhead during the embedded ALL-REDUCE. We also propose a bin-packing-based algorithm for factor-matrix-row-to-processor assignment, aiming to reduce the processors' maximum send and receive volumes during the embedded ALL-REDUCE. Experiments on up to 4096 processors show that the proposed framework scales significantly better than the state-of-the-art point-to-point method.
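Both IEEE items above build on CPD-ALS. As background, each ALS step updates one factor matrix at a time: an MTTKRP (matricized tensor times Khatri-Rao product) against the sparse tensor, followed by a small least-squares solve. Below is a minimal serial sketch for a 3-mode COO tensor; the function names mttkrp and cpd_als_sweep are illustrative, and this is not the papers' distributed implementation, in which nonzeros and factor rows are partitioned across processors and the MTTKRP results are communicated via the expand/reduce operations that the hypergraph models aim to minimize.

import numpy as np

def mttkrp(coords, vals, factors, mode):
    # MTTKRP for a 3-mode COO tensor: for each nonzero, scale the
    # elementwise product of the other two factor rows and accumulate
    # into the corresponding row of the updated mode.
    out = np.zeros_like(factors[mode])
    others = [m for m in range(3) if m != mode]
    for idx, v in zip(coords, vals):
        out[idx[mode]] += v * factors[others[0]][idx[others[0]]] \
                            * factors[others[1]][idx[others[1]]]
    return out

def cpd_als_sweep(coords, vals, factors):
    # One ALS sweep: update each factor matrix in turn. G is the
    # Hadamard product of the Gram matrices of the other two factors.
    R = factors[0].shape[1]
    for mode in range(3):
        G = np.ones((R, R))
        for m in range(3):
            if m != mode:
                G *= factors[m].T @ factors[m]
        factors[mode] = mttkrp(coords, vals, factors, mode) @ np.linalg.pinv(G)
    return factors

In the distributed medium-grain setting, each processor owns a subtensor and partial MTTKRP results; factor rows needed by several processors trigger the expand/reduce communication whose volume the first paper's partitioning models minimize and whose latency the second paper hides.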
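The O(log2 K) bound in the last item comes from the hypercube (recursive-doubling) ALL-REDUCE pattern, in which each of the log2 K rounds exchanges data between processors whose ranks differ in one bit. A tiny simulation of the pattern (sums only; the paper's contribution is embedding the sparse expand/reduce messages into such an existing collective, which this sketch does not attempt):

def hypercube_allreduce(values):
    # Simulate recursive-doubling ALL-REDUCE on K = 2^d 'processors'.
    # Round r pairs each processor p with p XOR 2^r, so every processor
    # holds the global sum after d = log2(K) rounds of messages.
    K = len(values)
    d = K.bit_length() - 1
    assert K == 1 << d, "K must be a power of two"
    vals = list(values)
    for r in range(d):
        vals = [vals[p] + vals[p ^ (1 << r)] for p in range(K)]
    return vals

print(hypercube_allreduce([1, 2, 3, 4, 5, 6, 7, 8]))  # every entry becomes 36

During the embedded ALL-REDUCE, factor-matrix rows ride along these hops and may be forwarded through intermediate processors; that multi-hop routing is the bandwidth overhead the proposed embedding scheme and mapping framework are designed to reduce.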