True load balancing for Matricized Tensor Times Khatri-Rao Product
buir.contributor.author | Abubaker, Nabil | |
buir.contributor.author | Aykanat, Cevdet | |
buir.contributor.orcid | Abubaker, Nabil|0000-0002-5060-3059 | |
buir.contributor.orcid | Aykanat, Cevdet|0000-0002-4559-1321 | |
dc.citation.epage | 1986 | en_US |
dc.citation.issueNumber | 8 | en_US |
dc.citation.spage | 1974 | en_US |
dc.citation.volumeNumber | 32 | en_US |
dc.contributor.author | Abubaker, Nabil | |
dc.contributor.author | Aykanat, Cevdet | |
dc.contributor.author | Acer, S. | |
dc.date.accessioned | 2022-01-31T11:22:12Z | |
dc.date.available | 2022-01-31T11:22:12Z | |
dc.date.issued | 2021-01-22 | |
dc.department | Department of Computer Engineering | en_US |
dc.description.abstract | MTTKRP is the bottleneck operation in algorithms used to compute the CP tensor decomposition. For sparse tensors, utilizing the compressed sparse fibers (CSF) storage format and the CSF-oriented MTTKRP algorithms is important for both memory and computational efficiency on distributed-memory architectures. Existing intelligent tensor partitioning models assume the computational cost of MTTKRP to be proportional to the total number of nonzeros in the tensor. However, this is not the case for the CSF-oriented MTTKRP on distributed-memory architectures. We outline two deficiencies of nonzero-based intelligent partitioning models when CSF-oriented MTTKRP operations are performed locally: failure to encode processors' computational loads and an increase in total computation due to fiber fragmentation. We focus on the existing fine-grain hypergraph model and propose a novel vertex weighting scheme that enables this model to encode the correct computational loads of processors. We also propose to augment the fine-grain model with fiber nets, which reduce the increase in total computational load by minimizing fiber fragmentation. In this way, the proposed model encodes the minimization of the bottleneck processor's load. Parallel experiments with real-world sparse tensors on up to 1024 processors confirm the outlined deficiencies and demonstrate the merit of our proposed improvements in terms of parallel runtimes. | en_US |
dc.description.provenance | Submitted by Evrim Ergin (eergin@bilkent.edu.tr) on 2022-01-31T11:22:12Z No. of bitstreams: 1 True_load_balancing_for_Matricized_Tensor_Times_Khatri-Rao_Product.pdf: 1602420 bytes, checksum: dfec07c5cd0d8ec5398b2ad2ede3e584 (MD5) | en |
dc.description.provenance | Made available in DSpace on 2022-01-31T11:22:12Z (GMT). No. of bitstreams: 1 True_load_balancing_for_Matricized_Tensor_Times_Khatri-Rao_Product.pdf: 1602420 bytes, checksum: dfec07c5cd0d8ec5398b2ad2ede3e584 (MD5) Previous issue date: 2021-01-22 | en |
dc.identifier.doi | 10.1109/TPDS.2021.3053836 | en_US |
dc.identifier.eissn | 1558-2183 | |
dc.identifier.issn | 1045-9219 | |
dc.identifier.uri | http://hdl.handle.net/11693/76913 | |
dc.language.iso | English | en_US |
dc.publisher | IEEE | en_US |
dc.relation.isversionof | https://doi.org/10.1109/TPDS.2021.3053836 | en_US |
dc.source.title | IEEE Transactions on Parallel and Distributed Systems | en_US |
dc.subject | Load balancing | en_US |
dc.subject | Sparse tensors | en_US |
dc.subject | MTTKRP | en_US |
dc.subject | CP decomposition | en_US |
dc.subject | Fine-grain hypergraph partitioning | en_US |
dc.title | True load balancing for Matricized Tensor Times Khatri-Rao Product | en_US |
dc.type | Article | en_US |
Files
Original bundle
- Name: True_load_balancing_for_Matricized_Tensor_Times_Khatri-Rao_Product.pdf
- Size: 1.53 MB
- Format: Adobe Portable Document Format
License bundle
- Name: license.txt
- Size: 1.69 KB
- Format: Item-specific license agreed upon to submission