Exploiting locality in sparse matrix-matrix multiplication on many-core rchitectures
Exploiting spatial and temporal localities is investigated for efficient row-by-row parallelization of general sparse matrix-matrix multiplication (SpGEMM) operation of the form C=A,B on many-core architectures. Hypergraph and bipartite graph models are proposed for 1D rowwise partitioning of matrix A to evenly partition the work across threads with the objective of reducing the number of B-matrix words to be transferred from the memory and between different caches. A hypergraph model is proposed for B-matrix column reordering to exploit spatial locality in accessing entries of thread-private temporary arrays, which are used to accumulate results for C-matrix rows. A similarity graph model is proposed for B-matrix row reordering to increase temporal reuse of these accumulation array entries. The proposed models and methods are tested on a wide range of sparse matrices from real applications and the experiments were carried on a 60-core Intel Xeon Phi processor, as well as a two-socket Xeon processor. Results show the validity of the models and methods proposed for enhancing the locality in parallel SpGEMM operations. © 1990-2012 IEEE.