Increasing data reuse in parallel sparse matrix-vector and matrix-transpose-vector multiply on shared-memory architectures
Embargo Lift Date: 2016-09-05
Karsavuran, Mustafa Ozan
Sparse matrix-vector and matrix-transpose-vector multiplications (Sparse AA^T x) are kernel operations used in iterative solvers. The sparsity pattern of the input matrix A, as well as that of its transpose, remains the same throughout the iterations. The CPU cache cannot be used effectively during these Sparse AA^T x operations because of the irregular sparsity pattern of the matrix. We propose two parallelization strategies for Sparse AA^T x. Our methods partition the matrix A in order to exploit cache locality for matrix nonzeros and vector entries. We conduct experiments on the recently released Intel® Xeon Phi™ coprocessor involving a large variety of sparse matrices. Experimental results show that the proposed methods achieve higher performance improvement than the state-of-the-art methods in the literature.
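For readers unfamiliar with the kernel, the following is a minimal serial sketch of computing y = A(A^T x) from a single stored copy of A. It assumes a compressed sparse row (CSR) layout, and the function names are hypothetical. It is not the partitioning or parallelization method proposed in the thesis; it only illustrates that the same nonzero arrays can serve both multiplies, which is the data reuse the work aims to exploit in cache.

```python
def spmv_transpose(row_ptr, col_idx, vals, x, n_cols):
    """Compute z = A^T x from the CSR arrays of A (no stored transpose)."""
    z = [0.0] * n_cols
    for i in range(len(row_ptr) - 1):
        xi = x[i]
        # Scatter row i of A into z: z[j] += A[i][j] * x[i]
        for k in range(row_ptr[i], row_ptr[i + 1]):
            z[col_idx[k]] += vals[k] * xi
    return z


def spmv(row_ptr, col_idx, vals, z):
    """Compute y = A z from the CSR arrays of A."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Gather along row i of A: y[i] = sum_j A[i][j] * z[j]
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * z[col_idx[k]]
        y[i] = acc
    return y


def sparse_aat_x(row_ptr, col_idx, vals, x, n_cols):
    """Compute y = A (A^T x), reading the same CSR arrays in both passes."""
    z = spmv_transpose(row_ptr, col_idx, vals, x, n_cols)
    return spmv(row_ptr, col_idx, vals, z)


# Illustrative 2x3 matrix A = [[1, 0, 2], [0, 3, 0]] in CSR form.
row_ptr = [0, 2, 3]
col_idx = [0, 2, 1]
vals = [1.0, 2.0, 3.0]
y = sparse_aat_x(row_ptr, col_idx, vals, [1.0, 1.0], n_cols=3)
```

In this serial form, the nonzeros of A are streamed twice and the accesses to z follow the irregular column indices. That is the access pattern that makes cache behavior difficult for this kernel.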
Keywords
Intel Many Integrated Core Architecture (Intel MIC)
Intel Xeon Phi
Sparse Matrix-Vector Multiplication
Sparse Matrix-Vector and Matrix-Transpose-Vector Multiplication
Showing items related by title, author, creator and subject.
Encapsulating multiple communication-cost metrics in partitioning sparse rectangular matrices for parallel matrix-vector multiplies
Uçar, B.; Aykanat, Cevdet (SIAM, 2004)
This paper addresses the problem of one-dimensional partitioning of structurally unsymmetric square and rectangular sparse matrices for parallel matrix-vector and matrix-transpose-vector multiplies. The objective is to ...
Locality-aware parallel sparse matrix-vector and matrix-transpose-vector multiplication on many-core processors
Karsavuran, M. O.; Akbudak, K.; Aykanat, Cevdet (Institute of Electrical and Electronics Engineers, 2016)
Sparse matrix-vector and matrix-transpose-vector multiplication (SpMMTV) repeatedly performed as z ← A^T x and y ← A z (or y ← A w) for the same sparse matrix A is a kernel operation widely used in various iterative solvers. ...
Pınar, A.; Aykanat, Cevdet (Academic Press, 2004)
The one-dimensional decomposition of nonuniform workload arrays with optimal load balancing is investigated. The problem has been studied in the literature as the "chains-on-chains partitioning" problem. Despite the rich ...