Locality aware reordering for sparse triangular solve
Item Usage Stats
MetadataShow full item record
Sparse Triangular Solve (SpTS) is a commonly used kernel in a wide variety of scientific and engineering applications. Efficient implementation of this kernel on current architectures that involve deep cache hierarchy is crucial for attaining high performance. In this work, we propose an effective framework for cache-aware SpTS. Solution of sparse linear symmetric systems utilizing the direct methods require the triangular solve of the form LUz = b, where L is lower triangular factor and U is upper triangular factor. For cache utilization, we reorder the rows and columns of the L factor regarding the data dependencies of the triangular solve. We represent the data dependencies of the triangular solve as a directed hypergraph and construct an ordered partitioning model on this structure. For this purpose, we developed a variant of Fiduccia-Mattheyses (FM) algorithm which respects the dependency constraints. We also adopt the idea of splitting L factors into dense and sparse components and solving them seperately with different autotuned kernels for achieving more flexibility in this process. We investigate the performance variation of different storage schemes of L factors and the corresponding sparse and dense components. We utilize autotuning provided by Optimized Sparse Kernel Interface (OSKI) to reduce performance degradation that incurs due to the gap between processors and memory speeds. Experiments performed on real-world datasets verify the effectiveness of the proposed framework.