Minimizing staleness and communication overhead in distributed SGD for collaborative filtering
Abstract
Distributed asynchronous stochastic gradient descent (ASGD) algorithms that approximate low-rank matrix factorizations for collaborative filtering perform one or more synchronizations per epoch, where staleness decreases as the number of synchronizations increases. However, a high number of synchronizations hinders the scalability of the algorithm. We propose a parallel ASGD algorithm, η-PASGD, for efficiently handling η synchronizations per epoch in a scalable fashion. The proposed algorithm puts an upper limit of K on η for a K-processor system, such that performing η = K synchronizations per epoch eliminates the staleness completely. The rating data used in collaborative filtering are usually represented as sparse matrices. This sparsity allows the staleness and communication overhead to be reduced combinatorially by intelligently distributing the data among processors. We analyze the staleness and the total communication volume incurred during an epoch of η-PASGD. Following this analysis, we propose a hypergraph partitioning model that encapsulates the reduction of staleness and volume while minimizing the maximum number of synchronizations required for a stale-free SGD. This encapsulation is achieved with a novel cutsize metric that is realized via a new recursive-bipartitioning-based algorithm. Experiments on up to 512 processors show the importance of the proposed partitioning method in improving staleness, volume, RMSE and parallel runtime.
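
To make the underlying computation concrete, the following is a minimal sequential sketch of the SGD kernel for low-rank matrix factorization over a sparse rating matrix, i.e., the kernel that η-PASGD distributes across K processors. It is not the paper's parallel algorithm: the function name, hyperparameters (lr, reg, rank) and toy data are illustrative assumptions, and a standard simultaneous update rule is used.

import numpy as np

def sgd_mf_epoch(ratings, W, H, lr=0.01, reg=0.05):
    """One epoch of SGD over the nonzeros of a sparse rating matrix.

    ratings : iterable of (i, j, r) triples, the nonzeros of the matrix
    W, H    : factor matrices, shapes (num_users, rank) and (num_items, rank)
    """
    for i, j, r in ratings:
        err = r - W[i] @ H[j]                    # prediction error on rating r_ij
        w_old = W[i].copy()                      # keep old row for a simultaneous update
        W[i] += lr * (err * H[j] - reg * W[i])   # update user-i factor row
        H[j] += lr * (err * w_old - reg * H[j])  # update item-j factor row
    return W, H

# Toy usage: 3 users, 4 items, rank-2 factors (illustrative data).
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 2)) * 0.1
H = rng.standard_normal((4, 2)) * 0.1
nonzeros = [(0, 1, 4.0), (1, 3, 2.0), (2, 0, 5.0)]
for _ in range(500):
    sgd_mf_epoch(nonzeros, W, H)
print(W[0] @ H[1])  # approaches 4.0 (slightly below, due to regularization)

In a distributed run, the nonzeros are partitioned among processors, and staleness arises when a processor updates W or H rows using remote copies that have not yet been synchronized; this is the quantity the paper's partitioning model is designed to reduce.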