Browsing by Subject "Hypergraph Partitioning"
Now showing 1 - 7 of 7
- Results Per Page
- Sort Options
Item Open Access A constructive multi-way circuit partitioning algorithm based on minimum degree ordering(1994) Çatalyürek, Ümit VCircuit partitioning has many important applications in VLSI. Circuit partitioning problem can be most properly modeled as hypergraph partitioning. In this work, we propose a novel k-v/ay hypergraph partitioning heuristic using the Minimum Degree (MD) ordering which is a well-known heuristic for reducing the amount of fills in the factorization of symmetric sparse matrices. The proposed algorithm operates on the dual graph of the given hypergraph. The algorithm grows node-clusters on the dual graph which induce cell-clusters with locally minimum net-cut sizes. The quotient graph concept, widely used in MD ordering, is exploited for the sake of efficient implementation. The proposed algorithm outperforms well-known heuristics, such as Kernighan-Lin (KL) based algorithms and Simulated Annealing, in terms of solution quality on various VLSI benchmark circuits. A nice property of the proposed algorithm is that its execution time reduces with increasing k as opposed to the existing iterative heuristics. It is even faster than the fast KL-based algorithms on the partitioning of the benchmark circuits for k > 16.Item Open Access Decomposing linear programs for parallel solution(1996) Pınar, AliMany current research efforts are based on better exploitation of sparsity— common in most large scaled problems—for computational efEciency. This work proposes different methods for permuting sparse matrices to block angular form with specified number of equal sized blocks for efficient parallelism. The problem has applications in linear programming, where there is a lot of work on the solution of problems with existing block angular structure. However, these works depend on the existing block angular structure of the matrix, and hence suffer from unscalability. We propose two hypergraph models for decomposition, and these models reduce the problem to the well-known hypergraph partitioning problem. We also propose a graph model, which reduces the problem to the graph partitioning by node separator problem. We were able to decompose very large problems, the results are quite attractive both in terms solution quality and running times.Item Open Access Hypergraph partitioning based models and methods for exploiting cache locality in sparse matrix-vector multiplication(Society for Industrial and Applied Mathematics, 2013-02-27) Akbudak, K.; Kayaaslan, E.; Aykanat, CevdetSparse matrix-vector multiplication (SpMxV) is a kernel operation widely used in iterative linear solvers. The same sparse matrix is multiplied by a dense vector repeatedly in these solvers. Matrices with irregular sparsity patterns make it difficult to utilize cache locality effectively in SpMxV computations. In this work, we investigate single-and multiple-SpMxV frameworks for exploiting cache locality in SpMxV computations. For the single-SpMxV framework, we propose two cache-size-aware row/column reordering methods based on one-dimensional (1D) and two-dimensional (2D) top-down sparse matrix partitioning. We utilize the column-net hypergraph model for the 1D method and enhance the row-column-net hypergraph model for the 2D method. The primary aim in both of the proposed methods is to maximize the exploitation of temporal locality in accessing input vector entries. The multiple-SpMxV framework depends on splitting a given matrix into a sum of multiple nonzero-disjoint matrices. We propose a cache-size-aware splitting method based on 2D top-down sparse matrix partitioning by utilizing the row-column-net hypergraph model. The aim in this proposed method is to maximize the exploitation of temporal locality in accessing both input-and output-vector entries. We evaluate the validity of our models and methods on a wide range of sparse matrices using both cache-miss simulations and actual runs by using OSKI. Experimental results show that proposed methods and models outperform state-of-the-art schemes. (c)2013 Society for Industrial and Applied MathematicsItem Open Access Increasing data reuse in parallel sparse matrix-vector and matrix-transpose-vector multiply on shared-memory architectures(2014) Karsavuran, Mustafa OzanSparse matrix-vector and matrix-transpose-vector multiplications (Sparse AAT x) are the kernel operations used in iterative solvers. Sparsity pattern of the input matrix A, as well as its transpose, remains the same throughout the iterations. CPU cache could not be used properly during these Sparse AAT x operations due to irregular sparsity pattern of the matrix. We propose two parallelization strategies for Sparse AAT x. Our methods partition A matrix in order to exploit cache locality for matrix nonzeros and vector entries. We conduct experiments on the recently-released Intel R Xeon PhiTM coprocessor involving large variety of sparse matrices. Experimental results show that proposed methods achieve higher performance improvement than the state-of-the-art methods in the literature.Item Open Access Locality aware reordering for sparse triangular solve(2014) Torun, TuğbaSparse Triangular Solve (SpTS) is a commonly used kernel in a wide variety of scientific and engineering applications. Efficient implementation of this kernel on current architectures that involve deep cache hierarchy is crucial for attaining high performance. In this work, we propose an effective framework for cache-aware SpTS. Solution of sparse linear symmetric systems utilizing the direct methods require the triangular solve of the form LUz = b, where L is lower triangular factor and U is upper triangular factor. For cache utilization, we reorder the rows and columns of the L factor regarding the data dependencies of the triangular solve. We represent the data dependencies of the triangular solve as a directed hypergraph and construct an ordered partitioning model on this structure. For this purpose, we developed a variant of Fiduccia-Mattheyses (FM) algorithm which respects the dependency constraints. We also adopt the idea of splitting L factors into dense and sparse components and solving them seperately with different autotuned kernels for achieving more flexibility in this process. We investigate the performance variation of different storage schemes of L factors and the corresponding sparse and dense components. We utilize autotuning provided by Optimized Sparse Kernel Interface (OSKI) to reduce performance degradation that incurs due to the gap between processors and memory speeds. Experiments performed on real-world datasets verify the effectiveness of the proposed framework.Item Open Access Partitioning sparse rectangular matrices for parallel computing of AAtX(1999-09) Uçar, BoraMany scientific applications involve repeated sparse matrix-vector and matrixtranspose-vector product computations. Graph and hypergraph partitioning based approaches used in the literature aim at minimizing the total communication volume while maintaining computational load balance through one dimensional partitioning of sparse matrices. In this thesis, we consider two approaches which consider minimizing both the total message count and communication volume while maintaining balance on the communication loads of the processors. Two communication schemes are investigated for the fold and expand operations needed in the parallel algorithm. For the global communication scheme, we show that the problem of minimizing concurrent communication volume can be formulated as the problem of permuting the sparse matrix into a singly-bordered block-diagonal form, where the total and concurrent message count is determined by the interconnection topology. For the personalized communication scheme, a two stage approach is proposed. In the first stage, the total communication volume is minimized while maintaining balance on the computational loads of the processors. In the second stage, a novel communication hypergraph model is proposed which enables the minimization of the total message count while maintaining balance on the communication loads of the processors through hypergraph-partitioning-like methods. The solution methods are tested on various matrices and results, which are quite attractive in terms of solution quality and running times, are obtained.Item Open Access Simultaneous input and output matrix partitioning for outer-product-parallel sparse matrix-matrix multiplication(Society for Industrial and Applied Mathematics, 2014-10-23) Akbudak K.; Aykanat, CevdetFFor outer-product-parallel sparse matrix-matrix multiplication (SpGEMM) of the form C=A×B, we propose three hypergraph models that achieve simultaneous partitioning of input and output matrices without any replication of input data. All three hypergraph models perform conformable one-dimensional (1D) columnwise and 1D rowwise partitioning of the input matrices A and B, respectively. The first hypergraph model performs two-dimensional (2D) nonzero-based partitioning of the output matrix, whereas the second and third models perform 1D rowwise and 1D columnwise partitioning of the output matrix, respectively. This partitioning scheme induces a two-phase parallel SpGEMM algorithm, where communication-free local SpGEMM computations constitute the first phase and the multiple single-node-accumulation operations on the local SpGEMM results constitute the second phase. In these models, the two partitioning constraints defined on weights of vertices encode balancing computational loads of processors during the two separate phases of the parallel SpGEMM algorithm. The partitioning objective of minimizing the cutsize defined over the cut nets encodes minimizing the total volume of communication that will occur during the second phase of the parallel SpGEMM algorithm. An MPI-based parallel SpGEMM library is developed to verify the validity of our models in practice. Parallel runs of the library for a wide range of realistic SpGEMM instances on two large-scale parallel systems JUQUEEN (an IBM BlueGene/Q system) and SuperMUC (an Intel-based cluster) show that the proposed hypergraph models attain high speedup values. © 2014 Society for Industrial and Applied Mathematics.