Scalable unsupervised ML: Latency hiding in distributed sparse tensor decomposition

buir.contributor.author: Abubaker, Nabil
buir.contributor.author: Karsavuran, M. Ozan
buir.contributor.author: Aykanat, Cevdet
buir.contributor.orcid: Abubaker, Nabil | 0000-0002-5060-3059
buir.contributor.orcid: Karsavuran, M. Ozan | 0000-0002-0298-3034
buir.contributor.orcid: Aykanat, Cevdet | 0000-0002-4559-1321
dc.citation.epage: 3040
dc.citation.issueNumber: 11
dc.citation.spage: 3028
dc.citation.volumeNumber: 33
dc.contributor.author: Abubaker, Nabil
dc.contributor.author: Karsavuran, M. Ozan
dc.contributor.author: Aykanat, Cevdet
dc.date.accessioned: 2023-02-24T14:17:58Z
dc.date.available: 2023-02-24T14:17:58Z
dc.date.issued: 2022-11-01
dc.department: Department of Computer Engineering
dc.description.abstract: Latency overhead in distributed-memory parallel CPD-ALS scales with the number of processors, limiting the scalability of computing the CPD of large, irregularly sparse tensors. This overhead takes the form of sparse reduce and expand operations performed on factor-matrix rows via point-to-point messages. We propose to hide the latency overhead by embedding all of the point-to-point messages incurred by the sparse reduce and expand operations into the dense collective operations that already exist in CPD-ALS. The conventional parallel CPD-ALS algorithm is not amenable to this embedding, so we propose a computation/communication rearrangement to enable it. We embed the sparse expand and reduce into a hypercube-based ALL-REDUCE operation to limit the latency overhead to O(log2 K) for a K-processor system. The embedding comes at the cost of increased bandwidth overhead due to the multi-hop routing of factor-matrix rows during the embedded ALL-REDUCE. We propose an embedding scheme that exploits the properties of the expand/reduce operations to reduce this overhead. Furthermore, we propose a novel recursive bipartitioning framework that enables simultaneous hypergraph partitioning and subhypergraph-to-subhypercube mapping, achieving a subtensor-to-processor assignment that reduces the bandwidth overhead during the embedded ALL-REDUCE. We also propose a bin-packing-based algorithm for assigning factor-matrix rows to processors, aiming to reduce processors' maximum send and receive volumes during the embedded ALL-REDUCE. Experiments on up to 4096 processors show that the proposed framework scales significantly better than the state-of-the-art point-to-point method.
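
A minimal sketch (in Python, not taken from the paper) of the hypercube-based ALL-REDUCE pattern the abstract refers to: with K = 2^d processors, each processor exchanges and reduces data with one hypercube neighbor per dimension, so the whole reduction completes in log2(K) communication steps, which is the source of the O(log2 K) latency bound. The simulated ranks, the per-rank vectors, and the element-wise-sum reduction operator below are illustrative assumptions, not the authors' implementation.

    import math

    def hypercube_allreduce(data):
        """Simulate an all-reduce over K = 2^d 'processors'.

        `data` is a list of K equal-length vectors (one per rank). Returns
        the per-rank results; after log2(K) exchange steps every rank holds
        the element-wise sum of all K input vectors.
        """
        K = len(data)
        d = int(math.log2(K))
        assert 2 ** d == K, "K must be a power of two for a hypercube"
        # Copy so each simulated rank owns its buffer.
        buf = [list(v) for v in data]
        for step in range(d):  # log2(K) latency: one exchange per dimension
            partner_mask = 1 << step
            new_buf = [None] * K
            for rank in range(K):
                partner = rank ^ partner_mask  # neighbor along this dimension
                # Each pair exchanges buffers and reduces (element-wise sum).
                new_buf[rank] = [a + b for a, b in zip(buf[rank], buf[partner])]
            buf = new_buf
        return buf

    # Example: 8 ranks, each contributing a one-hot vector; every rank ends
    # up with the all-ones sum after exactly 3 (= log2 8) steps.
    K = 8
    inputs = [[1 if i == r else 0 for i in range(K)] for r in range(K)]
    results = hypercube_allreduce(inputs)
    assert all(v == [1] * K for v in results)

The paper's contribution piggybacks the sparse expand/reduce messages onto these log2(K) dense exchange steps; the multi-hop routing of factor-matrix rows and the bin-packing row-to-processor assignment are beyond this sketch.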
dc.description.provenance: Submitted by Cem Çağatay Akgün (cem.akgun@bilkent.edu.tr) on 2023-02-24T14:17:58Z. No. of bitstreams: 1. Scalable_Unsupervised_ML_Latency_Hiding_in_Distributed_Sparse_Tensor_Decomposition.pdf: 923175 bytes, checksum: 8251a4fdd26ac74c5389231350586b99 (MD5)
dc.description.provenance: Made available in DSpace on 2023-02-24T14:17:58Z (GMT). Previous issue date: 2022-11-01
dc.identifier.doi: 10.1109/TPDS.2021.3128827
dc.identifier.issn: 1045-9219
dc.identifier.uri: http://hdl.handle.net/11693/111701
dc.language.iso: English
dc.publisher: IEEE Computer Society
dc.relation.isversionof: https://dx.doi.org/10.1109/TPDS.2021.3128827
dc.source.title: IEEE Transactions on Parallel and Distributed Systems (TPDS)
dc.subject: Sparse tensor
dc.subject: Tensor decomposition
dc.subject: CANDECOMP/PARAFAC
dc.subject: Canonical polyadic decomposition
dc.subject: Latency hiding
dc.subject: Embedded communication
dc.subject: Communication cost
dc.subject: Concurrent communication
dc.subject: Recursive bipartitioning
dc.subject: Hypergraph partitioning
dc.title: Scalable unsupervised ML: Latency hiding in distributed sparse tensor decomposition
dc.type: Article

Files

Original bundle
Name: Scalable_Unsupervised_ML_Latency_Hiding_in_Distributed_Sparse_Tensor_Decomposition.pdf
Size: 901.54 KB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.69 KB
Format: Item-specific license agreed upon at submission