Browsing by Subject "Factorization"
Now showing 1 - 5 of 5
Item Open Access
Algorithms for efficient vectorization of repeated sparse power system network computations (IEEE, 1995) Aykanat, Cevdet; Özgü, Ö.; Güven, N.
Standard sparsity-based algorithms used in power system applications need to be restructured for efficient vectorization due to the extremely short vectors processed. Further, intrinsic architectural features of vector computers, such as chaining and sectioning, should also be exploited for utmost performance. This paper presents novel data storage schemes and vectorization algorithms that resolve the recurrence problem, exploit chaining, and minimize the number of indirect element selections in the repeated solution of sparse linear systems of equations widely encountered in various power system problems. The proposed schemes are also applied to the vectorization of the power mismatch calculations arising in the solution phase of fast decoupled load flow (FDLF), which involves typical repeated sparse power network computations. The relative performances of the proposed and existing vectorization schemes are evaluated both theoretically and experimentally on an IBM 3090/VF.

Item Open Access
Lumpability of linear evolution equations in Banach spaces (American Institute of Mathematical Sciences, 2017) Atay, F. M.; Roncoroni, L.
We analyze the lumpability of linear systems on Banach spaces, namely, the possibility of projecting the dynamics by a linear reduction operator onto a smaller state space in which a self-contained dynamical description exists. We obtain conditions for lumpability of dynamics defined by unbounded operators using the theory of strongly continuous semigroups. We also derive results from the dual-space point of view using sun dual theory. Furthermore, we connect the theory of lumping to several results from operator factorization. We indicate several applications to particular systems, including delay differential equations. © 2017, American Institute of Mathematical Sciences. All rights reserved.
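In the finite-dimensional analogue of the lumpability question above, lumpability by a linear reduction operator M reduces to the algebraic closure condition MA = ÂM with Â = MAM⁺. A minimal numpy sketch of this check, with an illustrative 4-state system and group-summing M that are hypothetical, not taken from the paper:

```python
import numpy as np

# Linear system x' = A x on R^4; try to lump states {0,1} and {2,3} into R^2.
A = np.array([[-2.0,  1.0,  0.5,  0.5],
              [ 1.0, -2.0,  0.5,  0.5],
              [ 0.3,  0.3, -1.0,  0.4],
              [ 0.3,  0.3,  0.4, -1.0]])

# Reduction operator M: R^4 -> R^2 (sums the states in each group).
M = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])

# Candidate lumped generator: A_hat = M A M^+ (Moore-Penrose pseudoinverse).
A_hat = M @ A @ np.linalg.pinv(M)

# The system is lumpable under M iff M A = A_hat M, i.e. the projected
# dynamics close on the smaller state space.
lumpable = np.allclose(M @ A, A_hat @ M)
print("lumpable:", lumpable, "\nA_hat =\n", A_hat)
```

The paper's contribution is extending this kind of condition to unbounded operators on Banach spaces via semigroup theory; the finite-dimensional check above is only the motivating special case.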
Item Open Access
PARAFAC-SPARK: parallel tensor decompositions on Spark (2019-08) Bekçe, Selim Eren
Tensors are higher-order generalizations of matrices, widely used in many data science applications and scientific disciplines. The Canonical Polyadic Decomposition (also known as CPD or PARAFAC) is a widely adopted tensor factorization for discovering and extracting latent features of tensors, usually computed via the alternating least squares (ALS) method. Developing efficient parallelizations of PARAFAC on commodity clusters is important because, as common tensor sizes reach billions of nonzeros, a naive implementation would require infeasibly large intermediate memory. Implementations of PARAFAC-ALS on shared- and distributed-memory systems are available, but these systems require expensive cluster setups, are low level, incompatible with modern tooling, and not fault tolerant by design. Many companies and data science communities prefer Apache Spark, a modern distributed computing framework with in-memory caching, and the Hadoop ecosystem of tools for their ease of use, compatibility, ability to run on commodity hardware, and fault tolerance. We developed PARAFAC-SPARK, an efficient, parallel, open-source implementation of PARAFAC on Spark, written in Scala. It can decompose 3D tensors stored in the common coordinate format in parallel with a low memory footprint by partitioning them into grids and utilizing the compressed sparse rows (CSR) format for efficient traversals. We followed and combined many of the algorithmic and methodological improvements of its predecessor implementations on Hadoop and distributed memory, and adapted them for Spark. During the kernel MTTKRP operation, by applying a multi-way dynamic partitioning scheme, we were also able to increase the number of reducers to be on par with the number of cores, achieving better utilization and a reduced memory footprint. We ran PARAFAC-SPARK on several real-world tensors and evaluated the effectiveness of each improvement as a series of variants compared with each other, as well as on synthetically generated tensors of up to billions of rows to measure its scalability. Our fastest variant (PS-CSRSX) is up to 67% faster than our baseline Spark implementation (PS-COO) and up to 10 times faster than the state-of-the-art Hadoop implementations.

Item Open Access
Parallel minimum norm solution of sparse block diagonal column overlapped underdetermined systems (Association for Computing Machinery, 2017) Torun, F. S.; Manguoglu, M.; Aykanat, Cevdet
Underdetermined systems of equations in which the minimum norm solution needs to be computed arise in many applications, such as geophysics, signal processing, and biomedical engineering. In this article, we introduce a new parallel algorithm for obtaining the minimum 2-norm solution of an underdetermined system of equations. The proposed algorithm is based on the Balance scheme, which was originally developed for the parallel solution of banded linear systems. The proposed scheme assumes a generalized banded form in which the coefficient matrix has a column-overlapped block structure whose blocks can be dense or sparse. In this article, we implement the more general sparse case. The blocks can be handled independently by any existing sequential or parallel QR factorization library. A smaller reduced system is formed and solved before obtaining the minimum norm solution of the original system in parallel. We experimentally compare and confirm the error bound of the proposed method against QR-factorization-based techniques using single-precision arithmetic. We implement the proposed algorithm using the message passing paradigm. We demonstrate its numerical effectiveness as well as parallel scalability on both shared- and distributed-memory architectures for various types of problems. © 2017 ACM.
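As a serial reference point for the minimum 2-norm computation above: for a full-row-rank underdetermined system Ax = b with m < n, the minimum 2-norm solution is x = Aᵀ(AAᵀ)⁻¹b, computable stably from a QR factorization of Aᵀ. A numpy sketch of this building block (the random test problem is illustrative; the paper's contribution is the parallel Balance-scheme algorithm, not this serial kernel):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 10                      # underdetermined: fewer equations than unknowns
A = rng.standard_normal((m, n))   # assumed full row rank
b = rng.standard_normal(m)

# QR of A^T: A^T = Q R with Q (n x m) orthonormal, R (m x m) upper triangular.
Q, R = np.linalg.qr(A.T)

# Then A A^T = R^T R, so x = A^T (A A^T)^{-1} b simplifies to x = Q R^{-T} b:
# solve the triangular system R^T y = b, then map back with Q.
y = np.linalg.solve(R.T, b)
x = Q @ y

assert np.allclose(A @ x, b)                    # x solves the system
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)   # numpy's min-norm solution
assert np.allclose(x, x_ref)
print("||x||_2 =", np.linalg.norm(x))
```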
Item Open Access
Recursive bipartitioning models for performance improvement in sparse matrix computations (2017-08) Acer, Seher
Sparse matrix computations are among the most important building blocks of linear algebra and arise in many scientific and engineering problems. Depending on the problem type, these computations may be in the form of sparse-matrix dense-matrix multiplication (SpMM), sparse matrix-vector multiplication (SpMV), or factorization of a sparse symmetric matrix. For both SpMM and SpMV performed on distributed-memory architectures, the associated data and task partitions among processors affect the parallel performance to a great extent, especially for sparse matrices with an irregular sparsity pattern. Parallel SpMM is characterized by high volumes of data communicated among processors, whereas both the volume and the number of messages are important for parallel SpMV. For the factorization performed in envelope methods, the envelope size (i.e., profile) is an important factor that determines the performance. For improving the performance of each of these sparse matrix computations, we propose graph/hypergraph partitioning models that exploit the advantages provided by the recursive bipartitioning (RB) paradigm in order to meet the specific needs of the respective computation. In the models proposed for SpMM and SpMV, we utilize the RB process to enable targeting multiple volume-based communication cost metrics and the combination of volume- and number-based communication cost metrics in their partitioning objectives, respectively. In the model proposed for the factorization in envelope methods, the input matrix is reordered by utilizing the RB process, in which two new quality metrics relating to profile minimization are defined and maintained. The experimental results show that the proposed RB-based approach outperforms the state-of-the-art for each of these computations.
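The RB paradigm underlying these models is straightforward to state: a K-way partition is obtained by recursively bisecting the current part until K parts remain, and the thesis attaches its new cost metrics to each bisection step. A toy Python skeleton of the control flow (the bisect step below is a naive weight-balancing placeholder, not the thesis's graph/hypergraph bisection):

```python
def bisect(items, weights):
    """Placeholder bisection: greedily split items into two weight-balanced
    halves. Real RB models would instead use a graph/hypergraph bisection
    that minimizes a communication-cost or profile-related objective."""
    order = sorted(items, key=lambda i: weights[i], reverse=True)
    left, right, wl, wr = [], [], 0.0, 0.0
    for i in order:
        if wl <= wr:
            left.append(i); wl += weights[i]
        else:
            right.append(i); wr += weights[i]
    return left, right

def recursive_bipartition(items, weights, k):
    """Obtain a k-way partition by recursive bisection; each recursion level
    halves the target part count until single parts remain."""
    if k == 1 or len(items) <= 1:
        return [items]
    left, right = bisect(items, weights)
    return (recursive_bipartition(left, weights, k // 2) +
            recursive_bipartition(right, weights, k - k // 2))

# Example: partition 8 "rows" with given weights into 4 parts.
weights = {i: w for i, w in enumerate([5, 1, 4, 2, 8, 3, 7, 6])}
print(recursive_bipartition(list(weights), weights, 4))
```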