Browsing by Subject "Combinatorial scientific computing"
Now showing 1 - 11 of 11
Item Open Access: 1.5D parallel sparse matrix-vector multiply (Society for Industrial and Applied Mathematics, 2018). Kayaaslan, E.; Aykanat, Cevdet; Uçar, B.
There are three common parallel sparse matrix-vector multiply algorithms: 1D row-parallel, 1D column-parallel, and 2D row-column-parallel. The 1D parallel algorithms offer the advantage of having only one communication phase. On the other hand, the 2D parallel algorithm is more scalable, but it suffers from two communication phases. Here, we introduce a novel concept of heterogeneous messages, where a heterogeneous message may contain both input-vector entries and partially computed output-vector entries. This concept not only leads to a decreased number of messages but also enables fusing the input- and output-communication phases into a single phase. These findings are exploited to propose a 1.5D parallel sparse matrix-vector multiply algorithm, which is called local row-column-parallel. The proposed algorithm requires a constrained fine-grain partitioning in which each fine-grain task is assigned to the processor that contains either its input-vector entry, its output-vector entry, or both. We propose two methods to carry out this constrained fine-grain partitioning. We conduct our experiments on a large set of test matrices to evaluate the partitioning qualities and partitioning times of the proposed 1.5D methods.

Item Open Access: Hypergraph partitioning-based fill-reducing ordering for symmetric matrices (Society for Industrial and Applied Mathematics, 2011). Çatalyürek, Ü. V.; Aykanat, Cevdet; Kayaaslan, E.
A typical first step of a direct solver for the linear system Mx = b is reordering of the symmetric matrix M to improve the execution time and space requirements of the solution process. In this work, we propose a novel nested-dissection-based ordering approach that utilizes hypergraph partitioning. Our approach is based on the formulation of the graph partitioning by vertex separator (GPVS) problem as a hypergraph partitioning problem. This new formulation is immune to the deficiencies of GPVS in a multilevel framework and hence enables better orderings. In matrix terms, our method relies on the existence of a structural factorization of the input matrix M in the form M = AA^T (or M = AD^2A^T). We show that the partitioning of the row-net hypergraph representation of the rectangular matrix A induces a GPVS of the standard graph representation of matrix M. In the absence of such a factorization, we also propose simple yet effective structural factorization techniques that are based on finding an edge clique cover of the standard graph representation of matrix M, and that are hence applicable to any arbitrary symmetric matrix M. Our experimental evaluation has shown that the proposed method achieves better orderings in comparison to state-of-the-art graph-based ordering tools, even for symmetric matrices where a structural M = AA^T factorization is not provided as an input. For matrices coming from linear programming problems, our method enables even faster and better orderings. © 2011 Society for Industrial and Applied Mathematics.
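
As an aside, the structural correspondence used in the ordering item above can be illustrated with a small sketch (not taken from the paper): given only the nonzero pattern of a rectangular matrix A, the row-net hypergraph of A has one net per row spanning that row's nonzero columns, and the standard graph of M = AA^T connects two rows whenever they share a nonzero column, i.e., it is exactly the intersection graph of those row nets. The toy pattern below is a hypothetical input.

from collections import defaultdict
from itertools import combinations

def row_net_hypergraph(nonzeros):
    """Row-net hypergraph of A: one net per row, connecting the columns
    (vertices) that hold a nonzero in that row."""
    nets = defaultdict(set)
    for i, j in nonzeros:
        nets[i].add(j)
    return nets

def graph_of_AAt(nonzeros):
    """Standard graph of M = A*A^T: rows i and k are adjacent iff they
    share at least one nonzero column (structural product only)."""
    rows_of_col = defaultdict(set)
    for i, j in nonzeros:
        rows_of_col[j].add(i)
    edges = set()
    for rows in rows_of_col.values():          # each column induces a clique of rows
        edges.update(combinations(sorted(rows), 2))
    return edges

# Hypothetical 4x5 nonzero pattern.
A = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 3), (3, 3), (3, 4)]
print(row_net_hypergraph(A))   # nets: row -> set of columns
print(graph_of_AAt(A))         # {(0, 1), (2, 3)}: the net-intersection graph
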
Item Open Access: Improving performance of sparse matrix dense matrix multiplication on large-scale parallel systems (Elsevier BV, 2016). Acer, S.; Selvitopi, O.; Aykanat, Cevdet
We propose a comprehensive and generic framework to minimize multiple and different volume-based communication cost metrics for sparse matrix dense matrix multiplication (SpMM). SpMM is an important kernel that finds application in computational linear algebra and big data analytics. On distributed memory systems, this kernel is usually characterized by its high communication volume requirements. Our approach targets irregularly sparse matrices and is based on both graph and hypergraph partitioning models that rely on the widely adopted recursive bipartitioning paradigm. The proposed models are lightweight, portable (they can be realized using any graph and hypergraph partitioning tool), and can simultaneously optimize different cost metrics besides total volume, such as maximum send/receive volume, maximum sum of send and receive volumes, etc., in a single partitioning phase. They allow one to define and optimize as many custom volume-based metrics as desired through a flexible formulation. The experiments on a wide range of about a thousand matrices show that the proposed models drastically reduce the maximum communication volume compared to the standard partitioning models that only address the minimization of total volume. The improvements obtained on volume-based partition quality metrics using our models are validated with parallel SpMM as well as parallel multi-source BFS experiments on two large-scale systems. For parallel SpMM, compared to the standard partitioning models, our graph and hypergraph partitioning models respectively achieve reductions of 14% and 22% in runtime, on average. Compared to the state-of-the-art partitioner UMPa, our graph model is overall 14.5x faster and achieves an average improvement of 19% in the partition quality on instances that are bounded by maximum volume. For parallel BFS, we show on graphs with more than a billion edges that the scalability can be significantly improved with our models compared to a recently proposed two-dimensional partitioning model.

Item Open Access: Latency-centric models and methods for scaling sparse operations (2016-07). Selvitopi, Oğuz
Parallelization of sparse kernels and operations on large-scale distributed memory systems remains a major challenge, due to the ever-increasing scale of modern high performance computing systems and the multiple conflicting factors that affect parallel performance. The low computational density and high memory footprint of sparse operations add to these challenges by implying more stressed communication bottlenecks, and make fast and efficient parallelization models and methods imperative for scalable performance. Sparse operations are usually performed with structures related to sparse matrices, and the matrices are partitioned prior to execution to distribute computations among processors. Although the literature is rich in this aspect, it still lacks techniques that embrace the multiple factors affecting communication performance in a complete and fair manner. In this thesis, we investigate models and methods for intelligent partitioning of sparse matrices that strive to achieve a more accurate approximation of communication performance. To improve the communication performance of parallel sparse operations, we mainly focus on reducing the latency bottlenecks, which stand as a major component of the overall communication cost. Besides these, our approaches also consider the communication cost metrics already adopted in the literature and aim to address as many cost metrics as possible. We propose one-phase and two-phase partitioning models to reduce the latency cost in one-dimensional (1D) and two-dimensional (2D) sparse matrix partitioning, respectively. The model for 1D partitioning relies on the commonly adopted recursive bipartitioning framework and uses novel structures to capture the relations that incur latency. The models for 2D partitioning aim to improve the performance of solvers for nonsymmetric linear systems by using different partitions for the vectors in the solver, and they exploit this flexibility to reduce the latency cost. Our findings indicate that latency costs should definitely be considered in order to achieve scalable performance on distributed memory systems.
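
To make the bandwidth and latency metrics mentioned in the thesis abstract above concrete, the following sketch (an illustration under simplifying assumptions, not code from the thesis) counts per-processor send volumes and message counts for a row-parallel SpMV y = Ax with a conformal 1D partition, where part[i] owns row i of A together with x[i] and y[i]. The matrix pattern and the partition are hypothetical toy data.

from collections import defaultdict

def spmv_comm_metrics(nonzeros, part, nparts):
    """Bandwidth/latency metrics of row-parallel y = A x under a conformal
    1D row partition: part[i] owns row i of A and entries x[i], y[i]."""
    sends = defaultdict(set)               # (p, q) -> x indices that p sends to q
    for i, j in nonzeros:
        p, q = part[j], part[i]            # owner of x[j] sends it to owner of row i
        if p != q:
            sends[(p, q)].add(j)
    send_vol = [0] * nparts
    msg_count = [0] * nparts
    for (p, q), idx in sends.items():
        send_vol[p] += len(idx)
        msg_count[p] += 1
    return {"total volume": sum(send_vol),
            "max send volume": max(send_vol),
            "total messages": sum(msg_count),
            "max messages per processor": max(msg_count)}

# Hypothetical 6x6 pattern and a 3-way row partition.
A = [(0, 0), (0, 3), (1, 1), (2, 2), (2, 5), (3, 3), (3, 0), (4, 4), (5, 5), (5, 1)]
part = [0, 0, 1, 1, 2, 2]
print(spmv_comm_metrics(A, part, 3))       # total volume 4, total messages 4
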
Item Open Access: On two-dimensional sparse matrix partitioning: models, methods, and a recipe (Society for Industrial and Applied Mathematics, 2010). Çatalyürek, U. V.; Aykanat, Cevdet; Uçar, B.
We consider two-dimensional partitioning of general sparse matrices for the parallel sparse matrix-vector multiply operation. We present three hypergraph-partitioning-based methods, each having unique advantages. The first one treats the nonzeros of the matrix individually and hence produces fine-grain partitions. The other two produce coarser partitions: one of them imposes a limit on the number of messages sent and received by a single processor, and the other trades that limit for a lower communication volume. We also present a thorough experimental evaluation of the proposed two-dimensional partitioning methods together with the hypergraph-based one-dimensional partitioning methods, using an extensive set of public domain matrices. Furthermore, for the users of these partitioning methods, we present a partitioning recipe that chooses one of the partitioning methods according to some matrix characteristics. © 2010 Society for Industrial and Applied Mathematics.

Item Open Access: Parallelization of Sparse Matrix Kernels for big data applications (Springer, 2016). Selvitopi, Oğuz; Akbudak, Kadir; Aykanat, Cevdet; Pop, F.; Kołodziej, J.; Di Martino, B.
Analysis of big data on large-scale distributed systems often necessitates efficient parallel graph algorithms that are used to explore the relationships between individual components. Graph algorithms use the basic adjacency list representation for graphs, which can also be viewed as a sparse matrix. This correspondence between the representation of graphs and sparse matrices makes it possible to express many important graph algorithms in terms of basic sparse matrix operations, where the literature for optimization is more mature. For example, graph analytic libraries such as Pegasus and Combinatorial BLAS use sparse matrix kernels for a wide variety of operations on graphs. In this work, we focus on two such important sparse matrix kernels: sparse matrix–sparse matrix multiplication (SpGEMM) and sparse matrix–dense matrix multiplication (SpMM). We propose partitioning models for efficient parallelization of these kernels on large-scale distributed systems. Our models aim at reducing communication volume while balancing computational load, which are two vital performance metrics on distributed systems. We show that by exploiting the sparsity patterns of the matrices through our models, the parallel performance of SpGEMM and SpMM operations can be significantly improved.
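
For reference, the row-by-row formulation underlying the SpGEMM kernel discussed in the chapter above can be sketched serially as follows (a simplified dict-of-rows illustration, not the chapter's distributed implementation): row i of C = A*B is accumulated from the rows of B selected by the nonzeros of row i of A.

def spgemm_row_by_row(A_rows, B_rows):
    """Row-by-row SpGEMM: A_rows[i] and B_rows[k] map column -> value.
    Returns C_rows in the same dict-of-dicts layout."""
    C_rows = {}
    for i, a_row in A_rows.items():
        acc = {}                                  # sparse accumulator for row i of C
        for k, a_ik in a_row.items():             # nonzeros of row i of A
            for j, b_kj in B_rows.get(k, {}).items():
                acc[j] = acc.get(j, 0.0) + a_ik * b_kj
        C_rows[i] = acc
    return C_rows

# Hypothetical 3x3 example.
A = {0: {0: 1.0, 2: 2.0}, 1: {1: 3.0}, 2: {0: 4.0}}
B = {0: {1: 5.0}, 1: {2: 6.0}, 2: {0: 7.0}}
print(spgemm_row_by_row(A, B))
# {0: {1: 5.0, 0: 14.0}, 1: {2: 18.0}, 2: {1: 20.0}}
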
Item Open Access: Partitioning hypergraphs in scientific computing applications through vertex separators on graphs (Society for Industrial and Applied Mathematics, 2012). Kayaaslan, E.; Pinar, A.; Çatalyürek, U.; Aykanat, Cevdet
The modeling flexibility provided by hypergraphs has drawn a lot of interest from the combinatorial scientific computing community, leading to novel models and algorithms, their applications, and the development of associated tools. Hypergraphs are now a standard tool in combinatorial scientific computing. The modeling flexibility of hypergraphs, however, comes at a cost: algorithms on hypergraphs are inherently more complicated than those on graphs, which sometimes translates to nontrivial increases in processing times. Neither the modeling flexibility of hypergraphs nor the runtime efficiency of graph algorithms can be overlooked; therefore, the new research thrust should be how to cleverly trade off between the two. This work addresses one method for this trade-off by solving the hypergraph partitioning problem through finding vertex separators on graphs. Specifically, we investigate how to solve the hypergraph partitioning problem by seeking a vertex separator on its net intersection graph (NIG), where each net of the hypergraph is represented by a vertex, and two vertices share an edge if their nets have a common vertex. We propose a vertex-weighting scheme to attain node-balanced partitions of the hypergraph, since the NIG model cannot preserve node-balancing information. Vertex-removal and vertex-splitting techniques are described to optimize the cut-net and connectivity metrics, respectively, under the recursive bipartitioning paradigm. We also developed implementations of our proposed hypergraph partitioning formulations by adopting and modifying a state-of-the-art graph partitioning by vertex separator tool, onmetis. Experiments conducted on a large collection of sparse matrices demonstrate the effectiveness of our proposed techniques. © 2012 Society for Industrial and Applied Mathematics.

Item Open Access: A recursive bipartitioning algorithm for permuting sparse square matrices into block diagonal form with overlap (Society for Industrial and Applied Mathematics, 2013). Acer, S.; Kayaaslan, E.; Aykanat, Cevdet
We investigate the problem of symmetrically permuting a square sparse matrix into a block diagonal form with overlap. This permutation problem arises in the parallelization of an explicit formulation of the multiplicative Schwarz preconditioner and of a more recent block overlapping banded linear solver, as well as its application to general sparse linear systems. In order to formulate this permutation problem as a graph-theoretical problem, we define a constrained version of the multiway graph partitioning by vertex separator (GPVS) problem, which is referred to as the ordered GPVS (oGPVS) problem. However, existing graph partitioning tools are unable to solve the oGPVS problem, so we also show how the recursive bipartitioning framework can be utilized for solving it. For this purpose, we propose a left-to-right bipartitioning approach together with a novel vertex fixation scheme so that existing 2-way GPVS tools that support fixed vertices can be effectively and efficiently utilized in the recursive bipartitioning framework. Experimental results on a wide range of matrices confirm the validity of the proposed approach. © 2013 Society for Industrial and Applied Mathematics.
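
To pin down the target form of the two items above and below: a symmetrically permuted matrix is in block diagonal form with overlap when an ordered sequence of diagonal blocks, in which consecutive blocks may share indices, covers all nonzeros. The sketch below only checks whether a given (already permuted) pattern conforms to given block ranges; the ranges and pattern are hypothetical, and the oGPVS-based permutation algorithm itself is not reproduced here.

def conforms_to_bdo(nonzeros, blocks):
    """True if every nonzero (i, j) of the (already permuted) symmetric
    pattern falls inside at least one diagonal block.  `blocks` is an
    ordered list of half-open index ranges (start, end); consecutive
    ranges may overlap, which gives the 'overlap' in the form."""
    for s, e in zip(blocks, blocks[1:]):
        assert s[0] <= e[0] and s[1] <= e[1], "blocks must be ordered"
    return all(any(s <= i < e and s <= j < e for s, e in blocks)
               for i, j in nonzeros)

# Hypothetical 6x6 pattern with blocks [0,4) and [3,6) overlapping on index 3.
pattern = [(0, 1), (1, 3), (3, 3), (3, 5), (4, 5), (5, 4), (3, 1)]
print(conforms_to_bdo(pattern, [(0, 4), (3, 6)]))             # True
print(conforms_to_bdo(pattern + [(0, 5)], [(0, 4), (3, 6)]))  # False: (0, 5) lies outside every block
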
Item Open Access: A recursive graph bipartitioning algorithm by vertex separators with fixed vertices for permuting sparse matrices into block diagonal form with overlap (2011). Acer, Seher
Solving a sparse system of linear equations Ax = b using preconditioners can be efficiently parallelized using graph partitioning tools. In this thesis, we investigate the problem of permuting a sparse matrix into a block diagonal form with overlap, which is to be used in the parallelization of the multiplicative Schwarz preconditioner. A matrix is said to be in block diagonal form with overlap if its diagonal blocks are allowed to overlap. In order to formulate this permutation problem as a graph-theoretical problem, we introduce a restricted version of the graph partitioning by vertex separator (GPVS) problem in which the objective is to find a vertex partition whose parts are connected only by a vertex separator. The modified problem, which we refer to as the ordered GPVS (oGPVS) problem, is restricted such that the parts should exhibit an ordered form in which only consecutive parts can be connected by a separator. Existing graph partitioning tools are unable to solve the oGPVS problem. Thus, we present a recursive graph bipartitioning algorithm by vertex separators, together with a novel vertex fixation scheme, so that a GPVS tool supporting fixed vertices can be utilized effectively and efficiently. We also theoretically verify the correctness of the proposed approach by devising a necessary and sufficient condition for the feasibility of an oGPVS solution. Experimental results on a wide range of matrices confirm the validity of the proposed approach.

Item Open Access: A Recursive Hypergraph Bipartitioning Framework for Reducing Bandwidth and Latency Costs Simultaneously (IEEE Computer Society, 2017). Selvitopi, O.; Acer, S.; Aykanat, Cevdet
Intelligent partitioning models are commonly used for efficient parallelization of irregular applications on distributed systems. These models usually aim to minimize a single communication cost metric, which is related either to communication volume or to message count. However, both volume- and message-related metrics should be taken into account during partitioning for a more efficient parallelization. There are only a few works that consider both of them, and they usually address each metric in a separate phase of a two-phase approach. In this work, we propose a recursive hypergraph bipartitioning framework that reduces the total volume and the total message count in a single phase. In this framework, the standard hypergraph models, whose nets already capture the bandwidth cost, are augmented with message nets. The message nets encode the message count, so that minimizing the conventional cutsize captures the minimization of the bandwidth and latency costs together. Our model provides a more accurate representation of the overall communication cost by incorporating both the bandwidth and the latency components into the partitioning objective. The use of the widely adopted and successful recursive bipartitioning framework provides the flexibility of using any existing hypergraph partitioner. The experiments on instances from different domains show that our model achieves, on average, up to a 52 percent reduction in total message count and hence a 29 percent reduction in parallel running time compared to the model that considers only the total volume. © 2016 IEEE.
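
The remark above that the nets of the standard hypergraph models "already capture the bandwidth cost" refers to the well-known connectivity metric: in the column-net model for 1D row partitioning, each column of A is a net spanning the rows with a nonzero in that column, and the sum of (connectivity - 1) over all nets gives the total communication volume of row-parallel SpMV when each x[j] is owned by one of the parts appearing in column j. Below is a minimal sketch of that cutsize computation on toy data; the message nets that additionally encode latency in the paper are not constructed here.

from collections import defaultdict

def connectivity_cutsize(nonzeros, part):
    """Connectivity-1 cutsize of the column-net hypergraph of A under a row
    partition `part`: sum over columns of (number of parts touching the
    column) - 1.  Matches the total volume of row-parallel SpMV when x[j]
    is owned by one of the parts appearing in column j."""
    parts_of_net = defaultdict(set)        # column (net) -> parts of its rows (vertices)
    for i, j in nonzeros:
        parts_of_net[j].add(part[i])
    return sum(len(p) - 1 for p in parts_of_net.values())

# Same hypothetical pattern and 3-way partition as in the SpMV metric sketch above.
A = [(0, 0), (0, 3), (1, 1), (2, 2), (2, 5), (3, 3), (3, 0), (4, 4), (5, 5), (5, 1)]
part = [0, 0, 1, 1, 2, 2]
print(connectivity_cutsize(A, part))       # 4, matching the total volume computed earlier
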
Item Open Access: Reducing communication volume overhead in large-scale parallel SpGEMM (2016-12). Ünsal, Başak
Sparse matrix-matrix multiplication of the form C = A x B, C = A x A, or C = A x A^T is a key operation in various domains and is characterized by high complexity and runtime overhead. There exist models for parallelizing this operation on distributed memory architectures, such as outer-product (OP), inner-product (IP), row-by-row-product (RRP), and column-by-column-product (CCP). We focus on the row-by-row-product model due to its convincing performance, low row preprocessing overhead, and lack of a symbolic multiplication requirement. Parallelization via the row-by-row-product model can be achieved using bipartite graph or hypergraph models. For an efficient parallelization, multiple volume-based metrics can be considered for reduction, such as total volume, maximum volume, etc. Existing approaches for the RRP model do not encapsulate multiple volume-based metrics. In this thesis, we propose a two-phase approach to reduce multiple volume-based cost metrics. In the first phase, total volume is reduced with a bipartite graph model. In the second phase, we reduce the maximum volume while trying to keep the increase in total volume as small as possible. Our experiments show that the proposed approach is effective at reducing multiple volume-based metrics for different forms of SpGEMM operations.
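
Finally, as a rough illustration of the volume-based metrics targeted in this last item, the sketch below estimates send and receive volumes for a row-by-row-product SpGEMM C = A x B in which rows of A, B, and C are partitioned identically: the owner of row k of B ships that row once (counted as its number of nonzeros) to every other processor holding a nonzero in column k of A. The pattern, nonzero counts, partition, and word-counting convention are hypothetical simplifications, not the thesis's model.

from collections import defaultdict

def rrp_spgemm_send_volume(A_nonzeros, B_row_nnz, part, nparts):
    """Send/receive volumes of row-by-row-product SpGEMM C = A*B with rows of
    A, B, and C partitioned identically by `part`.  The owner of row k of B
    ships that row (B_row_nnz[k] words) once to each other processor that
    holds a nonzero in column k of A."""
    needs = defaultdict(set)                     # B row k -> remote parts needing it
    for i, k in A_nonzeros:
        if part[i] != part[k]:
            needs[k].add(part[i])
    send = [0] * nparts
    recv = [0] * nparts
    for k, parts in needs.items():
        for q in parts:
            send[part[k]] += B_row_nnz[k]
            recv[q] += B_row_nnz[k]
    return {"total volume": sum(send),
            "max send": max(send),
            "max recv": max(recv)}

# Hypothetical 4x4 pattern of A, nonzero counts of B's rows, 2-way partition.
A = [(0, 0), (0, 2), (1, 1), (2, 2), (2, 0), (3, 3), (3, 1)]
B_row_nnz = [2, 1, 3, 2]
part = [0, 0, 1, 1]
print(rrp_spgemm_send_volume(A, B_row_nnz, part, 2))   # total volume 6
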