High level synthesis based FPGA implementation of Matricized Tensor Times Khatri-Rao Product to accelerate canonical polyadic decomposition
Item Usage Stats
Tensor factorization has many applications such as network anomaly detection, structural damage detection and music genre classification. Most time consuming part of the CPD-ALS based tensor factorization is the Matricized Tensor Times Khatri-Rao Product (MTTKRP). In this thesis, the goal was to show that an FPGA implementation of the MTTKRP kernel can be comparable with the state of the art software implementations. To achieve this goal, a flat design consisting of a single loop is developed using Vivado HLS. In order to process the large tensors with the limited BRAM capacity of the FPGA board, a tiling methodology with optimized processing order is introduced. It has been shown that tiling has a negative impact on the general performance because of increasing DRAM access per subtensor. On the other hand, with the minimum tiling possible to process the tensors, the FPGA implementation achieves up to 3.40 speedup against the single threaded software.