Show simple item record

dc.contributor.author	Selvitopi, R. O.	en_US
dc.contributor.author	Ozdal, M. M.	en_US
dc.contributor.author	Aykanat, Cevdet	en_US
dc.date.accessioned	2016-02-08T09:59:04Z
dc.date.available	2016-02-08T09:59:04Z
dc.date.issued	2015	en_US
dc.identifier.issn	1045-9219
dc.identifier.uri	http://hdl.handle.net/11693/22358
dc.description.abstract	In parallel linear iterative solvers, sparse matrix-vector multiplication (SpMxV) incurs irregular point-to-point (P2P) communications, whereas inner product computations incur regular collective communications. These P2P communications cause an additional synchronization point with relatively high message latency costs due to small message sizes. In these solvers, each SpMxV is usually followed by an inner product computation that involves the output vector of SpMxV. Here, we exploit this property to propose a novel parallelization method that avoids the latency costs and synchronization overhead of P2P communications. Our method involves a computational and a communication rearrangement scheme. The computational rearrangement provides an alternative method for forming the input vector of SpMxV and allows P2P and collective communications to be performed in a single phase. The communication rearrangement realizes this opportunity by embedding P2P communications into global collective communication operations. The proposed method guarantees an upper bound on the maximum number of messages communicated, regardless of the sparsity pattern of the matrix. The downside, however, is increased message volume and negligible redundant computation. We favor reducing the message latency costs at the expense of increasing message volume. Yet, we propose two iterative-improvement-based heuristics to alleviate the increase in volume through one-to-one task-to-processor mapping. Our experiments on two supercomputers, Cray XE6 and IBM BlueGene/Q, with up to 2,048 processors show that the proposed parallelization method exhibits superior scalable performance compared to the conventional parallelization method.	en_US
dc.language.iso	English	en_US
dc.source.title	IEEE Transactions on Parallel and Distributed Systems	en_US
dc.relation.isversionof	http://dx.doi.org/10.1109/TPDS.2014.2311804	en_US
dc.subject	Avoiding latency	en_US
dc.subject	Conjugate gradient	en_US
dc.subject	Inner product computation	en_US
dc.subject	Iterative improvement heuristic	en_US
dc.subject	Message latency overhead	en_US
dc.subject	Conjugate gradient method	en_US
dc.subject	Costs	en_US
dc.subject	Matrix algebra	en_US
dc.subject	Parallel processing systems	en_US
dc.subject	Supercomputers	en_US
dc.subject	Vectors	en_US
dc.subject	Collective communications	en_US
dc.subject	Hiding latency	en_US
dc.subject	Inner product	en_US
dc.subject	Iterative improvements	en_US
dc.subject	Iterative solvers	en_US
dc.subject	Message latency	en_US
dc.subject	Point-to-point communication	en_US
dc.subject	Sparse matrix-vector multiplication	en_US
dc.subject	Iterative methods	en_US
dc.title	A novel method for scaling iterative solvers: avoiding latency overhead of parallel sparse-matrix vector multiplies	en_US
dc.type	Article	en_US
dc.department	Department of Computer Engineering	en_US
dc.citation.spage	632	en_US
dc.citation.epage	645	en_US
dc.citation.volumeNumber	26	en_US
dc.citation.issueNumber	3	en_US
dc.identifier.doi	10.1109/TPDS.2014.2311804	en_US
dc.publisher	Institute of Electrical and Electronics Engineers	en_US
dc.contributor.bilkentauthor	Aykanat, Cevdet
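The abstract describes embedding the irregular P2P messages of SpMxV into the collective operation already needed for the inner product, so that both complete in a single communication phase at the cost of extra message volume. The following pure-Python sketch is only an illustration of that single-phase idea, not the authors' implementation: it simulates ranks piggy-backing their boundary vector entries onto an allreduce-style collective that also carries the partial inner products. The function name `combined_exchange` and the toy data are hypothetical.

```python
# Hedged sketch (assumption: not the paper's actual code).
# Conventional scheme: (1) P2P exchange of boundary x-entries, then
# (2) an all-reduce for the inner product -> two latency-bound phases.
# Single-phase idea per the abstract: the collective carries every
# rank's (partial inner product, P2P payload) pair, trading increased
# volume for one fewer synchronization point.

def combined_exchange(partials, p2p_payloads):
    """Simulate one collective phase in which every rank receives the
    global inner product AND the boundary entries destined for it.

    partials[r]     -- rank r's partial inner product (float)
    p2p_payloads[r] -- dict {dest_rank: {vector_index: value}} that
                       rank r would otherwise send via P2P messages
    Returns (global_dot, delivered), where delivered[r] merges all
    boundary entries addressed to rank r.
    """
    nranks = len(partials)
    # Reduction part of the collective: sum the partial inner products.
    global_dot = sum(partials)
    # Embedded P2P part: route each piggy-backed payload to its target.
    delivered = [dict() for _ in range(nranks)]
    for src in range(nranks):
        for dest, entries in p2p_payloads[src].items():
            delivered[dest].update(entries)
    return global_dot, delivered

# Toy run with 3 simulated ranks:
partials = [1.0, 2.5, 0.5]
payloads = [
    {1: {4: 0.9}},                 # rank 0 sends x[4] to rank 1
    {0: {7: -1.2}, 2: {3: 2.0}},   # rank 1 sends to ranks 0 and 2
    {},                            # rank 2 has nothing to send
]
dot, recvd = combined_exchange(partials, payloads)
print(dot)       # 4.0
print(recvd[1])  # {4: 0.9}
```

Note that every rank's payload travels through the collective even when only one destination needs it, which mirrors the volume-versus-latency trade-off the abstract states the paper's mapping heuristics then try to mitigate.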

