A novel method for scaling iterative solvers: avoiding latency overhead of parallel sparse-matrix vector multiplies

Selvitopi, R. O.; Ozdal, M. M.; Aykanat, Cevdet


Files

A novel method for scaling iterative solvers Avoiding latency overhead of parallel sparse-matrix vector multiplies.pdf (1.73 MB)

Date

2015

Authors

Selvitopi, R. O.

Ozdal, M. M.

Aykanat, Cevdet

Abstract

In parallel linear iterative solvers, sparse matrix-vector multiplication (SpMxV) incurs irregular point-to-point (P2P) communications, whereas inner product computations incur regular collective communications. These P2P communications introduce an additional synchronization point and have relatively high message latency costs because the messages are small. In these solvers, each SpMxV is usually followed by an inner product computation that involves the output vector of the SpMxV. Here, we exploit this property to propose a novel parallelization method that avoids the latency costs and synchronization overhead of P2P communications. Our method involves a computational rearrangement scheme and a communication rearrangement scheme. The computational rearrangement provides an alternative method for forming the input vector of SpMxV and allows P2P and collective communications to be performed in a single phase. The communication rearrangement realizes this opportunity by embedding P2P communications into global collective communication operations. The proposed method guarantees a bound on the maximum number of messages communicated, regardless of the sparsity pattern of the matrix. The downside, however, is an increase in message volume and a negligible amount of redundant computation. We favor reducing the message latency costs at the expense of increasing message volume. Nevertheless, we propose two iterative-improvement-based heuristics to alleviate the increase in volume through one-to-one task-to-processor mapping. Our experiments on two supercomputers, Cray XE6 and IBM BlueGene/Q, with up to 2,048 processors show that the proposed parallelization method scales better than the conventional parallelization method.
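
The idea of embedding P2P messages into a collective can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' implementation: the abstract does not say which collective algorithm is used, so the code assumes a recursive-doubling (hypercube) exchange among a power-of-two number of processors, and the function name `simulate_embedded_exchange` and its parameters are hypothetical. In each of the log2(K) steps a processor sends one combined message to its partner: its running inner-product sum plus every stored P2P item that must cross that dimension to reach its destination.

```python
def simulate_embedded_exchange(num_procs, partial_sums, p2p_messages):
    """Simulate log2(K) pairwise exchange steps on K = num_procs processors.

    Assumption (not stated in the abstract): a hypercube / recursive-doubling
    pattern in which processor p talks to p XOR 2^d in step d.

    partial_sums[p] : local contribution of processor p to the inner product
    p2p_messages[p] : list of (destination, payload) pairs originating at p
    Returns (sums, held, sent): the reduced sum at every processor, the
    items finally stored at every processor, and messages sent per processor.
    """
    if num_procs <= 0 or num_procs & (num_procs - 1):
        raise ValueError("num_procs must be a power of two")
    steps = num_procs.bit_length() - 1
    sums = list(partial_sums)
    held = [list(msgs) for msgs in p2p_messages]  # items stored at each processor
    sent = [0] * num_procs

    for d in range(steps):
        bit = 1 << d
        outgoing = []
        for p in range(num_procs):
            # Items whose destination differs from p in bit d must cross
            # this dimension, so they ride along with the reduction value.
            forward = [m for m in held[p] if (m[0] ^ p) & bit]
            held[p] = [m for m in held[p] if not ((m[0] ^ p) & bit)]
            outgoing.append((sums[p], forward))
            sent[p] += 1  # exactly one combined message per step
        for p in range(num_procs):
            partner_sum, forwarded = outgoing[p ^ bit]
            sums[p] += partner_sum      # reduction part of the collective
            held[p].extend(forwarded)   # store-and-forward part
    return sums, held, sent
```

In this sketch every processor sends exactly log2(K) messages no matter how many P2P destinations it has, which mirrors the bound on message count described above. The cost is also visible: items pass through intermediate processors on their way to the destination, which corresponds to the increased message volume the abstract mentions.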

Source Title

IEEE Transactions on Parallel and Distributed Systems

Publisher

Institute of Electrical and Electronics Engineers

Permalink

http://hdl.handle.net/11693/22358

Published Version (Please cite this version)

http://dx.doi.org/10.1109/TPDS.2014.2311804

Collections

Scholarly Publications - Computer Engineering

Language

English

Type

Article
