Minimizing staleness and communication overhead in distributed SGD for collaborative filtering

Abubaker, Nabil; Caglayan, O.; Karsavuran, M. O.; Aykanat, Cevdet

buir.contributor.author	Abubaker, Nabil
buir.contributor.author	Aykanat, Cevdet
buir.contributor.orcid	Abubaker, Nabil|0000-0002-5060-3059
buir.contributor.orcid	Aykanat, Cevdet|0000-0002-4559-1321
dc.citation.epage	2937	en_US
dc.citation.issueNumber	10
dc.citation.spage	2925
dc.citation.volumeNumber	72
dc.contributor.author	Abubaker, Nabil
dc.contributor.author	Caglayan, O.
dc.contributor.author	Karsavuran, M. O.
dc.contributor.author	Aykanat, Cevdet
dc.date.accessioned	2024-03-18T13:10:40Z
dc.date.available	2024-03-18T13:10:40Z
dc.date.issued	2023-09-06
dc.department	Department of Computer Engineering
dc.description.abstract	Distributed asynchronous stochastic gradient descent (ASGD) algorithms that approximate low-rank matrix factorizations for collaborative filtering perform one or more synchronizations per epoch, where staleness is reduced with more synchronizations. However, a high number of synchronizations would prohibit the scalability of the algorithm. We propose a parallel ASGD algorithm, η-PASGD, for efficiently handling η synchronizations per epoch in a scalable fashion. The proposed algorithm puts an upper limit of K on η, for a K-processor system, such that performing η = K synchronizations per epoch would eliminate the staleness completely. The rating data used in collaborative filtering are usually represented as sparse matrices. The sparsity allows for a combinatorial reduction in the staleness and communication overhead by intelligently distributing the data to processors. We analyze the staleness and the total volume incurred during an epoch of η-PASGD. Following this analysis, we propose a hypergraph partitioning model to encapsulate reducing staleness and volume while minimizing the maximum number of synchronizations required for a stale-free SGD. This encapsulation is achieved with a novel cutsize metric that is realized via a new recursive-bipartitioning-based algorithm. Experiments on up to 512 processors show the importance of the proposed partitioning method in improving staleness, volume, RMSE and parallel runtime.
dc.identifier.doi	10.1109/TC.2023.3275107	en_US
dc.identifier.eissn	1557-9956	en_US
dc.identifier.issn	0018-9340	en_US
dc.identifier.uri	https://hdl.handle.net/11693/114904	en_US
dc.language.iso	English	en_US
dc.publisher	IEEE Computer Society	en_US
dc.relation.isversionof	https://dx.doi.org/10.1109/TC.2023.3275107
dc.source.title	IEEE Transactions on Computers
dc.subject	Recommender systems
dc.subject	Collaborative filtering
dc.subject	Matrix completion
dc.subject	Distributed-memory parallel stochastic gradient descent
dc.subject	Communication-efficient algorithms
dc.subject	MPI
dc.subject	Hypergraph partitioning
dc.title	Minimizing staleness and communication overhead in distributed SGD for collaborative filtering
dc.type	Article
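
For readers unfamiliar with the method named in the abstract, the following is a minimal sequential sketch of SGD for low-rank matrix factorization on sparse rating data. It is not the η-PASGD algorithm from the article and involves no distribution or hypergraph partitioning; the function names, learning rate, regularization constant and initialization range are all illustrative assumptions. In a distributed setting, each processor would run the inner loop on its own share of the ratings and exchange updated factor rows at synchronization points, so a row read between two synchronizations may be stale.

```python
import random

def predict(p, q):
    """Predicted rating: dot product of a user factor row and an item factor row."""
    return sum(a * b for a, b in zip(p, q))

def sgd_update(p, q, r, lr=0.02, reg=0.05):
    """One SGD step on a single observed rating r; updates p and q in place.
    lr and reg are assumed values chosen for illustration."""
    err = r - predict(p, q)
    for f in range(len(p)):
        pf, qf = p[f], q[f]  # keep old values so both updates use the same state
        p[f] += lr * (err * qf - reg * pf)
        q[f] += lr * (err * pf - reg * qf)

def rmse(ratings, P, Q):
    """Root-mean-square error over the observed (user, item, rating) triples."""
    se = sum((r - predict(P[u], Q[i])) ** 2 for u, i, r in ratings)
    return (se / len(ratings)) ** 0.5

def init_factors(n_rows, rank, rng):
    """Small positive random factors (an assumed initialization)."""
    return [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_rows)]

def train(ratings, P, Q, epochs=200, seed=0):
    """Each epoch visits every observed rating once, in random order."""
    rng = random.Random(seed)
    order = list(ratings)
    for _ in range(epochs):
        rng.shuffle(order)
        for u, i, r in order:
            sgd_update(P[u], Q[i], r)
```

A possible usage: build `P` and `Q` with `init_factors`, call `train` on a list of `(user, item, rating)` triples, and compare `rmse` before and after training. Only observed ratings are touched, which is why the sparsity of the rating matrix matters for how much data processors would need to exchange.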

Files

Original bundle

Name: Minimizing_staleness_and_communication_overhead_in_distributed_SGD_for_collaborative_filtering.pdf
Size: 1.42 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.01 KB
Format: Item-specific license agreed upon to submission

Collections

Scholarly Publications - Computer Engineering