Browsing by Subject "Matrix reordering"

Open Access
Cache locality exploiting methods and models for sparse matrix-vector multiplication
(2009) Akbudak, Kadir
The sparse matrix-vector multiplication (SpMxV) is an important kernel operation widely used in linear solvers. The same sparse matrix is multiplied by a dense vector repeatedly in these solvers to solve a system of linear equations. High performance gains can be obtained if we can take advantage of today’s deep cache hierarchy in SpMxV operations. Matrices with irregular sparsity patterns make it difficult to utilize data locality effectively in SpMxV computations. Different techniques are proposed in the literature to utilize the cache hierarchy effectively by exploiting data locality during SpMxV. In this work, we investigate two distinct frameworks for cache-aware/oblivious SpMxV: single matrix-vector multiply and multiple submatrix-vector multiplies. For the single matrix-vector multiply framework, we propose a cache-size-aware top-down row/column-reordering approach based on 1D sparse matrix partitioning that utilizes the recently proposed hypergraph models of sparse matrices, and a cache-oblivious bottom-up approach based on hierarchical clustering of rows/columns with similar sparsity patterns. We also propose a column compression scheme as a preprocessing step, which makes these two approaches cache-line-size aware. The multiple submatrix-vector multiplies framework depends on partitioning the matrix into multiple nonzero-disjoint submatrices. For the effective matrix-to-submatrix partitioning required in this framework, we propose a cache-size-aware top-down approach based on 2D sparse matrix partitioning that utilizes the recently proposed fine-grain hypergraph model. For this framework, we also propose a traveling salesman formulation for an effective ordering of individual submatrix-vector multiply operations. We evaluate the validity of our models and methods on a wide range of sparse matrices. Experimental results show that the proposed methods and models outperform state-of-the-art schemes.
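For readers unfamiliar with the kernel, the following is a minimal sketch of SpMxV on a matrix stored in compressed sparse row (CSR) form. It is not taken from the thesis: the storage format, the function name `spmv_csr`, and the small example matrix are assumptions chosen for illustration. The indexed read `x[col_idx[k]]` is the irregular access whose locality the reordering methods above aim to improve.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A x for a matrix A stored in CSR form (illustrative sketch)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i + 1] - 1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # x is read at positions dictated by the sparsity pattern,
            # so consecutive reads may land in different cache lines.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Hypothetical 3x3 example: A = [[2, 0, 1], [0, 3, 0], [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [2.0, 1.0, 3.0, 4.0, 5.0]
y = spmv_csr(row_ptr, col_idx, values, [1.0, 1.0, 1.0])
```

With the all-ones input vector, each entry of `y` is simply the corresponding row sum of the example matrix.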
Open Access
Locality-aware parallel sparse matrix-vector and matrix-transpose-vector multiplication on many-core processors
(Institute of Electrical and Electronics Engineers, 2016) Karsavuran, M. O.; Akbudak, K.; Aykanat, Cevdet
Sparse matrix-vector and matrix-transpose-vector multiplication (SpMMTV) repeatedly performed as z ← A^T x and y ← A z (or y ← A w) for the same sparse matrix A is a kernel operation widely used in various iterative solvers. One important optimization for serial SpMMTV is reusing A-matrix nonzeros, which halves the memory bandwidth requirement. However, thread-level parallelization of SpMMTV that reuses A-matrix nonzeros necessitates concurrent writes to the same output-vector entries. These concurrent writes can be handled in two ways: via atomic updates or via thread-local temporary output vectors that undergo a reduction operation, neither of which is efficient or scalable on processors with many cores and complicated cache-coherency protocols. In this work, we identify five quality criteria for efficient and scalable thread-level parallelization of SpMMTV that utilizes one-dimensional (1D) matrix partitioning. We also propose two locality-aware 1D partitioning methods, which achieve reusing A-matrix nonzeros and intermediate z-vector entries; exploiting locality in accessing x-, y-, and z-vector entries; and reducing the number of concurrent writes to the same output-vector entries. These two methods utilize rowwise and columnwise singly bordered block-diagonal (SB) forms of A. We evaluate the validity of our methods on a wide range of sparse matrices. Experiments on the 60-core cache-coherent Intel Xeon Phi processor show the validity of the identified quality criteria and the validity of the proposed methods in practice. The results also show that the performance improvement from reusing A-matrix nonzeros compensates for the overhead of concurrent writes through the proposed SB-based methods.
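The nonzero-reuse optimization mentioned above can be sketched serially as follows. This is an illustrative assumption about one way to fuse the two products, not the authors' implementation: A is assumed to be stored column by column (CSC), and `spmmtv_fused` and the example data are hypothetical names. Each column's nonzeros are loaded once and used both for the dot product that yields one entry of z and for the scatter into y.

```python
def spmmtv_fused(col_ptr, row_idx, values, x, n_rows):
    """Compute z = A^T x and y = A z in one sweep over A's columns (CSC storage)."""
    n_cols = len(col_ptr) - 1
    z = [0.0] * n_cols
    y = [0.0] * n_rows
    for j in range(n_cols):
        start, end = col_ptr[j], col_ptr[j + 1]
        # z[j] is the dot product of column j of A with x.
        zj = 0.0
        for k in range(start, end):
            zj += values[k] * x[row_idx[k]]
        z[j] = zj
        # Reuse the same nonzeros to accumulate column j's contribution to y.
        # In a threaded version, different columns may update the same y entry,
        # which is the source of the concurrent writes discussed above.
        for k in range(start, end):
            y[row_idx[k]] += values[k] * zj
    return z, y


# Hypothetical 2x2 example: A = [[1, 2], [0, 3]]
col_ptr = [0, 1, 3]
row_idx = [0, 0, 1]
values = [1.0, 2.0, 3.0]
z, y = spmmtv_fused(col_ptr, row_idx, values, [1.0, 1.0], n_rows=2)
```

For this example, z equals A^T x = [1, 5] and y equals A z = [11, 15], matching what two separate multiplications would produce.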
Open Access
Reordering methods for exploiting spatial and temporal localities in parallel sparse matrix-vector multiplication
(2016-08) AbuBaker, Nabil
Sparse matrix-vector multiplication (SpMV) is a very important kernel operation for many scientific applications. For irregular sparse matrices, the SpMV operation suffers from poor cache performance due to irregular accesses to the input vector entries. In this work, we propose row and column reordering methods based on graph partitioning (GP) and hypergraph partitioning (HP) in order to exploit spatial and temporal localities in accessing input vector entries by clustering rows/columns with similar sparsity patterns close to each other. The proposed methods exploit spatial and temporal localities separately (using either rows or columns of the matrix in a GP or HP method), simultaneously (using both rows and columns), and in a two-phased manner (using either rows or columns in each phase). We evaluate the validity of the proposed models on a 60-core Xeon Phi co-processor for a large set of sparse matrices arising from different applications. The performance results confirm the validity and the effectiveness of the proposed methods and models.
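To make the effect of reordering concrete, the sketch below applies a given row and column ordering to a CSR matrix and evaluates a crude spatial-locality proxy: the number of distinct input-vector cache lines each row reads. Everything here is an assumption for illustration; the helper names `permute_csr` and `lines_touched`, the line size, and the hand-picked ordering are hypothetical, and the ordering is not produced by the GP or HP methods described above.

```python
def permute_csr(row_ptr, col_idx, values, row_order, col_order):
    """Reorder a CSR matrix: new row i is old row row_order[i],
    new column j is old column col_order[j]."""
    new_col_of = [0] * len(col_order)
    for new_j, old_j in enumerate(col_order):
        new_col_of[old_j] = new_j
    new_ptr, new_idx, new_val = [0], [], []
    for old_i in row_order:
        # Relabel the row's column indices and keep them sorted.
        entries = sorted(
            (new_col_of[col_idx[k]], values[k])
            for k in range(row_ptr[old_i], row_ptr[old_i + 1])
        )
        new_idx.extend(c for c, _ in entries)
        new_val.extend(v for _, v in entries)
        new_ptr.append(len(new_idx))
    return new_ptr, new_idx, new_val


def lines_touched(row_ptr, col_idx, line_size):
    """Sum over rows of the number of distinct x cache lines read (a rough proxy)."""
    total = 0
    for i in range(len(row_ptr) - 1):
        lines = {col_idx[k] // line_size for k in range(row_ptr[i], row_ptr[i + 1])}
        total += len(lines)
    return total


# Hypothetical 2x4 matrix: row 0 has nonzeros in columns 0 and 3,
# row 1 has nonzeros in columns 1 and 2.
row_ptr = [0, 2, 4]
col_idx = [0, 3, 1, 2]
values = [1.0, 1.0, 1.0, 1.0]

before = lines_touched(row_ptr, col_idx, line_size=2)
# Hand-picked column order that places each row's columns next to each other.
p_ptr, p_idx, p_val = permute_csr(row_ptr, col_idx, values, [0, 1], [0, 3, 1, 2])
after = lines_touched(p_ptr, p_idx, line_size=2)
```

In this toy case, grouping each row's columns together halves the proxy count, from 4 lines to 2. A real evaluation would measure cache misses or run time instead of this count.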