Novel algorithms and models for scaling parallel sparse tensor and matrix factorizations

buir.advisorAykanat, Cevdet
dc.contributor.authorAbubaker, Nabil F. T.
dc.date.accessioned2022-08-29T11:36:39Z
dc.date.available2022-08-29T11:36:39Z
dc.date.copyright2022-07
dc.date.issued2022-07
dc.date.submitted2022-08-26
dc.descriptionCataloged from PDF version of article.en_US
dc.descriptionThesis (Ph.D.): İhsan Doğramacı Bilkent University, Department of Computer Engineering, 2022.en_US
dc.descriptionIncludes bibliographical references (leaves 106-117).en_US
dc.description.abstractTwo important and widely used factorization algorithms, CPD-ALS for sparse tensor decomposition and distributed stratified SGD (SSGD) for low-rank matrix factorization, suffer from limited scalability. In CPD-ALS, the computational load associated with a tensor/subtensor assigned to a processor is a function of both the nonzero count and the fiber count of the tensor when CSF storage is utilized. Tensor fibers fragment as a result of the nonzero distribution, which makes balancing the computational loads a hard problem. Two strategies are proposed to tackle this balancing problem on an existing fine-grain hypergraph model: a novel weighting scheme that covers the cost of fibers in the true load, and an augmentation of the hypergraph with fiber nets that encodes, and thereby reduces, the increase in computational load due to fiber fragmentation. CPD-ALS also suffers from high latency overhead due to the large number of point-to-point messages incurred as the processor count increases. A framework is proposed that limits the number of messages to O(log2 K), for a K-processor system, exchanged in log2 K stages, together with a hypergraph-based method that encapsulates the communication of the new log2 K-stage algorithm. In existing stratified SGD implementations, the communication volume is proportional to one of the dimensions of the input matrix, which prohibits scalability. Exchanging only the data essential for the correctness of the SSGD algorithm as point-to-point messages is proposed to reduce this volume. Although invaluable for reducing the bandwidth overhead, this would increase the upper bound on the number of exchanged messages from O(K) to O(K^2), rendering the algorithm latency-bound. A novel Hold-and-Combine algorithm is therefore proposed to exchange the essential communication volume with up to O(K log K) messages. Extensive experiments on HPC systems demonstrate the importance of the proposed algorithms and models in scaling CPD-ALS and stratified SGD.en_US
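The abstract's O(log2 K)-message framework can be illustrated with the generic recursive-doubling (hypercube) exchange pattern that such schemes are typically built on. The sketch below is an assumption for illustration only, not the thesis's actual algorithm: function names and the data model (each processor holding a set of items) are hypothetical. It shows how, among K = 2^d processors, each processor sends exactly one combined message per stage, so after log2 K stages every processor has received all data using log2 K messages instead of up to K - 1.

```python
# Hypothetical sketch of a log2(K)-stage hypercube exchange
# (illustrative assumption; the thesis's algorithm may differ).
def hypercube_exchange(K, initial_data):
    """Simulate log2(K) stages of recursive doubling.

    In stage s, processor p exchanges its accumulated buffer with
    partner p XOR 2^s, so each processor sends exactly one message
    per stage: log2(K) messages in total per processor.
    """
    assert K & (K - 1) == 0 and K > 0, "K must be a power of two"
    # Each processor starts with only its own data items.
    buffers = [set(d) for d in initial_data]
    stages = K.bit_length() - 1  # log2(K)
    for s in range(stages):
        new_buffers = [set(b) for b in buffers]
        for p in range(K):
            partner = p ^ (1 << s)
            # One message per processor per stage: the whole buffer
            # accumulated so far is combined and forwarded.
            new_buffers[p] |= buffers[partner]
        buffers = new_buffers
    return buffers

# After log2(8) = 3 stages, every processor holds all 8 items.
result = hypercube_exchange(8, [[i] for i in range(8)])
```

In a distributed CPD-ALS setting the "items" would be partial factor-matrix rows to be reduced, and the per-stage combining is what keeps the message count at O(log2 K) rather than the O(K) of naive point-to-point exchange.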
dc.description.provenanceSubmitted by Betül Özen (ozen@bilkent.edu.tr) on 2022-08-29T11:36:39Z No. of bitstreams: 1 B161226.pdf: 2667322 bytes, checksum: 1d935dd3f8f7f55029a1a63a61c05bee (MD5)en
dc.description.provenanceMade available in DSpace on 2022-08-29T11:36:39Z (GMT). No. of bitstreams: 1 B161226.pdf: 2667322 bytes, checksum: 1d935dd3f8f7f55029a1a63a61c05bee (MD5) Previous issue date: 2022-07en
dc.description.statementofresponsibilityby Nabil F. T. Abubakeren_US
dc.embargo.release2023-01-26
dc.format.extentxiv, 117 leaves : charts ; 30 cm.en_US
dc.identifier.itemidB161226
dc.identifier.urihttp://hdl.handle.net/11693/110477
dc.language.isoEnglishen_US
dc.rightsinfo:eu-repo/semantics/openAccessen_US
dc.subjectParallel algorithmsen_US
dc.subjectCombinatorial algorithmsen_US
dc.subjectHPCen_US
dc.subjectTensor decompositionen_US
dc.subjectMatrix completionen_US
dc.subjectHypergraph partitioningen_US
dc.subjectCommunication cost minimizationen_US
dc.titleNovel algorithms and models for scaling parallel sparse tensor and matrix factorizationsen_US
dc.title.alternativeParalel seyrek tensör ve matris ayrışımı için yeni yöntem ve modelleren_US
dc.typeThesisen_US
thesis.degree.disciplineComputer Engineering
thesis.degree.grantorBilkent University
thesis.degree.levelDoctoral
thesis.degree.namePh.D. (Doctor of Philosophy)

Files

Original bundle

Name: B161226.pdf
Size: 2.54 MB
Format: Adobe Portable Document Format
Description: Full printable version

License bundle

Name: license.txt
Size: 1.69 KB
Description: Item-specific license agreed upon to submission