Parafac-spark: parallel tensor decompositions on spark
Bekçe, Selim Eren
Item Usage Stats
MetadataShow full item record
Tensors are higher order matrices, widely used in many data science applications and scienti c disciplines. The Canonical Polyadic Decomposition (also known as CPD/PARAFAC) is a widely adopted tensor factorization to discover and extract latent features of tensors usually applied via alternating squares (ALS) method. Developing e cient parallelization methods of PARAFAC on commodity clusters is important because as common tensor sizes reach billions of nonzeros, a naive implementation would require infeasibly huge intermediate memory sizes. Implementations of PARAFAC-ALS on shared and distributedmemory systems are available, but these systems require expensive cluster setups, are too low level, not compatible with modern tooling and not fault tolerant by design. Many companies and data science communities widely prefer Apache Spark, a modern distributed computing framework with in-memory caching, and Hadoop ecosystem of tools for their ease of use, compatibility, ability to run on commodity hardware and fault tolerance. We developed PARAFAC-SPARK, an e cient, parallel, open-source implementation of PARAFAC on Spark, written in Scala. It can decompose 3D tensors stored in common coordinate format in parallel with low memory footprint by partitioning them as grids and utilizing compressed sparse rows (CSR) format for e cient traversals. We followed and combined many of the algorithmic and methodological improvements of its predecessor implementations on Hadoop and distributed memory, and adapted them for Spark. During the kernel MTTKRP operation, by applying a multi-way dynamic partitioning scheme, we were also able to increase the number of reducers to be on par with the number of cores to achieve better utilization and reduced memory footprint. We ran PARAFAC-SPARK with some real world tensors and evaluated the e ectiveness of each improvement as a series of variants compared with each other, as well as with some synthetically generated tensors up to billions of rows to measure its scalability. Our fastest variant (PS-CSRSX ) is up to 67% faster than our baseline Spark implementation (PS-COO) and up to 10 times faster than the state of art Hadoop implementations.