Novel algorithms and models for scaling parallel sparse tensor and matrix factorizations

Abubaker, Nabil F. T.

Files

B161226.pdf (2.54 MB)

Date

2022-07

Authors

Abubaker, Nabil F. T.

Advisor

Aykanat, Cevdet

Abstract

Two important and widely used factorization algorithms, namely CPD-ALS for sparse tensor decomposition and distributed stratified SGD for low-rank matrix factorization, suffer from limited scalability. In CPD-ALS, when the CSF storage format is used, the computational load of a tensor/subtensor assigned to a processor is a function of both its nonzero count and its fiber counts. Because tensor fibers fragment as a result of how the nonzeros are distributed, balancing the computational loads is a hard problem. Two strategies are proposed to tackle this balancing problem on an existing fine-grain hypergraph model: a novel weighting scheme that accounts for the cost of fibers in the true load, and an augmentation of the hypergraph with fiber nets that encode the objective of reducing the increase in computational load caused by fiber fragmentation. CPD-ALS also suffers from high latency overhead, because the number of point-to-point messages grows as the processor count increases. A framework is proposed that limits the number of messages to O(log₂ K) for a K-processor system, exchanged in log₂ K stages, and a hypergraph-based method is proposed to encapsulate the communication of this new log₂ K-stage algorithm. In existing stratified SGD implementations, the communication volume is proportional to one of the dimensions of the input matrix, which prohibits scalability. To reduce the volume, it is proposed to exchange only the data essential for the correctness of the SSGD algorithm as point-to-point messages. Although invaluable for reducing the bandwidth overhead, this increases the upper bound on the number of exchanged messages from O(K) to O(K²), rendering the algorithm latency-bound. A novel Hold-and-Combine algorithm is proposed to exchange the essential communication volume with up to O(K log K) messages. Extensive experiments on HPC systems demonstrate the importance of the proposed algorithms and models in scaling CPD-ALS and stratified SGD.
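To make the fiber-fragmentation point concrete, the sketch below uses a deliberately simplified, hypothetical cost model: one unit of work per nonzero plus one per leaf-level fiber (a distinct (i, j) prefix in a CSF tree rooted at mode 0). It ignores slice-level overhead and is not the weighting scheme proposed in the thesis; the function names `csf_counts`, `simple_load`, and `part_loads` are illustrative only. Two partitions that give each processor the same number of nonzeros can still produce different loads, because splitting a fiber makes both processors pay for it.

```python
from typing import Iterable, List, Set, Tuple

Coord = Tuple[int, int, int]  # (i, j, k) index of one nonzero


def csf_counts(nonzeros: Iterable[Coord]) -> Tuple[int, int]:
    """Count (fibers, nonzeros) for a CSF tree rooted at mode 0.

    A fiber is a distinct (i, j) prefix: all nonzeros sharing it lie on
    one mode-2 fiber of the CSF tree.
    """
    nz: Set[Coord] = set(nonzeros)
    fibers = {(i, j) for i, j, _ in nz}
    return len(fibers), len(nz)


def simple_load(nonzeros: Iterable[Coord]) -> int:
    # Hypothetical cost model: one unit per nonzero plus one unit per fiber.
    fibers, nnz = csf_counts(nonzeros)
    return nnz + fibers


def part_loads(parts: List[List[Coord]]) -> List[int]:
    # Load of each processor's subtensor under the same cost model.
    return [simple_load(p) for p in parts]


tensor = [(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 1, 0), (1, 0, 0), (1, 0, 1)]

# Both partitions give each of the two processors exactly 3 nonzeros.
split_fibers = [[(0, 0, 0), (0, 1, 0), (1, 0, 0)],
                [(0, 0, 1), (0, 0, 2), (1, 0, 1)]]
keep_fibers = [[(0, 0, 0), (0, 0, 1), (0, 0, 2)],
               [(0, 1, 0), (1, 0, 0), (1, 0, 1)]]

print(simple_load(tensor))       # 9: 6 nonzeros + 3 fibers
print(part_loads(split_fibers))  # [6, 5]: fibers (0,0) and (1,0) are split
print(part_loads(keep_fibers))   # [4, 5]: no fiber is split
```

With split fibers the total work rises from 9 to 11 units even though the nonzero counts are perfectly balanced, which is why a nonzero-only weighting underestimates the true load.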

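The O(log₂ K) message bound can be illustrated with a generic hypercube-style store-and-forward exchange, a well-known scheme assumed here for illustration only; the thesis's own framework and its hypergraph model may organize the stages differently. In stage d, processor p communicates only with p XOR 2^d and forwards every item whose destination differs from p in bit d, so after log₂ K stages each item has reached its destination. Each processor sends exactly log₂ K messages instead of up to K − 1 direct ones, at the cost of some items being forwarded several times. The function name `hypercube_exchange` and the (destination, payload) item format are hypothetical.

```python
from typing import Dict, List, Tuple

Item = Tuple[int, str]  # (destination rank, payload)


def hypercube_exchange(K: int, outbox: Dict[int, List[Item]]):
    """Route every item to its destination in log2(K) pairwise stages."""
    assert K > 0 and K & (K - 1) == 0, "K must be a power of two"
    stages = K.bit_length() - 1  # log2(K)
    held = {p: list(outbox.get(p, [])) for p in range(K)}
    messages = {p: 0 for p in range(K)}
    for d in range(stages):
        bit = 1 << d
        sends: Dict[int, List[Item]] = {}
        for p in range(K):
            partner = p ^ bit
            # Forward items whose destination disagrees with p in bit d;
            # keep the rest, whose bit d already matches p.
            sends[partner] = [it for it in held[p] if (it[0] & bit) != (p & bit)]
            held[p] = [it for it in held[p] if (it[0] & bit) == (p & bit)]
            messages[p] += 1  # one (possibly empty) message per stage
        # Deliver after all sends, as if the stage ran concurrently.
        for p, items in sends.items():
            held[p].extend(items)
    return held, messages


# Usage: 8 processors, each with one item for every other processor.
K = 8
outbox = {p: [(q, f"{p}->{q}") for q in range(K) if q != p] for p in range(K)}
held, msgs = hypercube_exchange(K, outbox)
print(msgs[0])  # 3 messages per processor (log2 8), versus 7 direct ones
```

After stage d, every item held by processor p agrees with p in bits 0 through d, because the partner differs from p only in bit d; this invariant is what guarantees delivery once all bits have been processed.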
Keywords

Parallel algorithms, Combinatorial algorithms, HPC, Tensor decomposition, Matrix completion, Hypergraph partitioning, Communication cost minimization

Degree Discipline

Computer Engineering

Degree Level

Doctoral

Degree Name

Ph.D. (Doctor of Philosophy)

Permalink

http://hdl.handle.net/11693/110477

Collections

Graduate School of Engineering and Science

Language

English

Type

Thesis
