BUIR Repository :: Browsing by Subject "SpGEMM"

Browsing by Subject "SpGEMM"

Now showing 1 - 7 of 7

Open Access
Cartesian partitioning models for 2D and 3D parallel SpGEMM algorithms
(IEEE, 2020) Demirci, Gündüz Vehbi; Aykanat, Cevdet
The focus is distributed-memory parallelization of sparse-general-matrix-multiplication (SpGEMM). Parallel SpGEMM algorithms are classified under one-dimensional (1D), 2D, and 3D categories denoting the number of dimensions by which the 3D sparse workcube representing the iteration space of SpGEMM is partitioned. Recently proposed successful 2D- and 3D-parallel SpGEMM algorithms benefit from upper bounds on communication overheads enforced by 2D and 3D cartesian partitioning of the workcube on 2D and 3D virtual processor grids, respectively. However, these methods are based on random cartesian partitioning and do not utilize sparsity patterns of SpGEMM instances for reducing the communication overheads. We propose hypergraph models for 2D and 3D cartesian partitioning of the workcube for further reducing the communication overheads of these 2D- and 3D- parallel SpGEMM algorithms. The proposed models utilize two- and three-phase partitioning that exploit multi-constraint hypergraph partitioning formulations. Extensive experimentation performed on 20 SpGEMM instances by using upto 900 processors demonstrate that proposed partitioning models significantly improve the scalability of 2D and 3D algorithms. For example, in 2D-parallel SpGEMM algorithm on 900 processors, the proposed partitioning model respectively achieves 85 and 42 percent decrease in total volume and total number of messages, leading to 1.63 times higher speedup compared to random partitioning, on average.
Open Access
Exploiting locality in sparse matrix-matrix multiplication on many-core rchitectures
(IEEE Computer Society, 2017) Akbudak K.; Aykanat, Cevdet
Exploiting spatial and temporal localities is investigated for efficient row-by-row parallelization of general sparse matrix-matrix multiplication (SpGEMM) operation of the form C=A,B on many-core architectures. Hypergraph and bipartite graph models are proposed for 1D rowwise partitioning of matrix A to evenly partition the work across threads with the objective of reducing the number of B-matrix words to be transferred from the memory and between different caches. A hypergraph model is proposed for B-matrix column reordering to exploit spatial locality in accessing entries of thread-private temporary arrays, which are used to accumulate results for C-matrix rows. A similarity graph model is proposed for B-matrix row reordering to increase temporal reuse of these accumulation array entries. The proposed models and methods are tested on a wide range of sparse matrices from real applications and the experiments were carried on a 60-core Intel Xeon Phi processor, as well as a two-socket Xeon processor. Results show the validity of the models and methods proposed for enhancing the locality in parallel SpGEMM operations. © 1990-2012 IEEE.
Open Access
Hypergraph models for parallel sparse matrix-matrix multiplication
(Bilkent University, 2015-09) Akbudak, Kadir
Multiplication of two sparse matrices (i.e., sparse matrix-matrix multiplication, which is abbreviated as SpGEMM) is a widely used kernel in many applications such as molecular dynamics simulations, graph operations, and linear programming. We identify parallel formulations of SpGEMM operation in the form of C = AB for distributed-memory architectures. Using these formulations, we propose parallel SpGEMM algorithms that have the multiplication and communication phases: The multiplication phase consists of local SpGEMM computations without any communication and the communication phase consists of transferring required input/output matrices. For these algorithms, three hypergraph models are proposed. These models are used to partition input and output matrices simultaneously. The input matrices A and B are partitioned in one dimension in all of these hypergraph models. The output matrix C is partitioned in two dimensions, which is nonzero-based in the rst hypergraph model, and it is partitioned in one dimension in the second and third models. In partitioning of these hypergraph models, the constraint on vertex weights corresponds to computational load balancing among processors for the multiplication phase of the proposed SpGEMM algorithms, and the objective, which is minimizing cutsize de ned in terms of costs of the cut hyperedges, corresponds to minimizing the communication volume due to transferring required matrix entries in the communication phase of the SpGEMM algorithms. We also propose models for reducing the total number of messages while maintaining balance on communication volumes handled by processors during the communication phase of the SpGEMM algorithms. An SpGEMM library for distributed memory architectures is developed in order to verify the empirical validity of our models. The library uses MPI (Message Passing Interface) for performing communication in the parallel setting. The developed SpGEMM library is run on SpGEMM instances from various realistic applications and the experiments are carried out on a large parallel IBM BlueGene/Q system, named JUQUEEN. In the experimentation of the proposed hypergraph models, high speedup values are observed.
Open Access
Partitioning models for scaling distributed graph computations
(Bilkent University, 2019-08) Demirci, Gündüz Vehbi
The focus of this thesis is intelligent partitioning models and methods for scaling the performance of parallel graph computations on distributed-memory systems. Distributed databases utilize graph partitioning to provide servers with data-locality and workload-balance. Some queries performed on a database may form cascades due to the queries triggering each other. The current partitioning methods consider the graph structure and logs of query workload. We introduce the cascade-aware graph partitioning problem with the objective of minimizing the overall cost of communication operations between servers during cascade processes. We propose a randomized algorithm that integrates the graph structure and cascade processes to use as input for large-scale partitioning. Experiments on graphs representing real social networks demonstrate the e ectiveness of the proposed solution in terms of the partitioning objectives. Sparse-general-matrix-multiplication (SpGEMM) is a key computational kernel used in scienti c computing and high-performance graph computations. We propose an SpGEMM algorithm for Accumulo database which enables high performance distributed parallelism through its iterator framework. The proposed algorithm provides write-locality and avoids scanning input matrices multiple times by utilizing Accumulo's batch scanning capability and node-level parallelism structures. We also propose a matrix partitioning scheme that reduces the total communication volume and provides a workload-balance among servers. Extensive experiments performed on both real-world and synthetic sparse matrices show that the proposed algorithm and matrix partitioning scheme provide signi cant performance improvements. Scalability of parallel SpGEMM algorithms are heavily communication bound. Multidimensional partitioning of SpGEMM's workload is essential to achieve higher scalability. We propose hypergraph models that utilize the arrangement of processors and also attain a multidimensional partitioning on SpGEMM's workload. Thorough experimentation performed on both realistic as well as synthetically generated SpGEMM instances demonstrates the e ectiveness of the proposed partitioning models.
Open Access
Partitioning models for scaling parallel sparse matrix-matrix multiplication
(Association for Computing Machinery, 2018) Akbudak, Kadir; Selvitopi, Oğuz; Aykanat, Cevdet
We investigate outer-product--parallel, inner-product--parallel, and row-by-row-product--parallel formulations of sparse matrix-matrix multiplication (SpGEMM) on distributed memory architectures. For each of these three formulations, we propose a hypergraph model and a bipartite graph model for distributing SpGEMM computations based on one-dimensional (1D) partitioning of input matrices. We also propose a communication hypergraph model for each formulation for distributing communication operations. The computational graph and hypergraph models adopted in the first phase aim at minimizing the total message volume and balancing the computational loads of processors, whereas the communication hypergraph models adopted in the second phase aim at minimizing the total message count and balancing the message volume loads of processors. That is, the computational partitioning models reduce the bandwidth cost and the communication hypergraph models reduce the latency cost. Our extensive parallel experiments on up to 2048 processors for a wide range of realistic SpGEMM instances show that although the outer-product--parallel formulation scales better, the row-by-row-product--parallel formulation is more viable due to its significantly lower partitioning overhead and competitive scalability. For computational partitioning models, our experimental findings indicate that the proposed bipartite graph models are attractive alternatives to their hypergraph counterparts because of their lower partitioning overhead. Finally, we show that by reducing the latency cost besides the bandwidth cost through using the communication hypergraph models, the parallel SpGEMM time can be further improved up to 32%.
Open Access
Scaling sparse matrix-matrix multiplication in the accumulo database
(Springer, 2020) Demirci, Gündüz Vehbi; Aykanat, Cevdet
We propose and implement a sparse matrix-matrix multiplication (SpGEMM) algorithm running on top of Accumulo’s iterator framework which enables high performance distributed parallelism. The proposed algorithm provides write-locality while ingesting the output matrix back to database via utilizing row-by-row parallel SpGEMM. The proposed solution also alleviates scanning of input matrices multiple times by making use of Accumulo’s batch scanning capability which is used for accessing multiple ranges of key-value pairs in parallel. Even though the use of batch-scanning introduces some latency overheads, these overheads are alleviated by the proposed solution and by using node-level parallelism structures. We also propose a matrix partitioning scheme which reduces the total communication volume and provides a balance of workload among servers. The results of extensive experiments performed on both real-world and synthetic sparse matrices show that the proposed algorithm scales significantly better than the outer-product parallel SpGEMM algorithm available in the Graphulo library. By applying the proposed matrix partitioning, the performance of the proposed algorithm is further improved considerably.
Open Access
Simultaneous input and output matrix partitioning for outer-product-parallel sparse matrix-matrix multiplication
(Society for Industrial and Applied Mathematics, 2014-10-23) Akbudak K.; Aykanat, Cevdet
FFor outer-product-parallel sparse matrix-matrix multiplication (SpGEMM) of the form C=A×B, we propose three hypergraph models that achieve simultaneous partitioning of input and output matrices without any replication of input data. All three hypergraph models perform conformable one-dimensional (1D) columnwise and 1D rowwise partitioning of the input matrices A and B, respectively. The first hypergraph model performs two-dimensional (2D) nonzero-based partitioning of the output matrix, whereas the second and third models perform 1D rowwise and 1D columnwise partitioning of the output matrix, respectively. This partitioning scheme induces a two-phase parallel SpGEMM algorithm, where communication-free local SpGEMM computations constitute the first phase and the multiple single-node-accumulation operations on the local SpGEMM results constitute the second phase. In these models, the two partitioning constraints defined on weights of vertices encode balancing computational loads of processors during the two separate phases of the parallel SpGEMM algorithm. The partitioning objective of minimizing the cutsize defined over the cut nets encodes minimizing the total volume of communication that will occur during the second phase of the parallel SpGEMM algorithm. An MPI-based parallel SpGEMM library is developed to verify the validity of our models in practice. Parallel runs of the library for a wide range of realistic SpGEMM instances on two large-scale parallel systems JUQUEEN (an IBM BlueGene/Q system) and SuperMUC (an Intel-based cluster) show that the proposed hypergraph models attain high speedup values. © 2014 Society for Industrial and Applied Mathematics.

Browsing by Subject "SpGEMM"

Results Per Page

Sort Options