Bilkent Repository :: Browsing by Subject "Parallel algorithms"

Open Access
Architecture framework for mapping parallel algorithms to parallel computing platforms
(CEUR-WS, 2013) Tekinerdogan, Bedir; Arkin, E.
Mapping parallel algorithms to parallel computing platforms requires several activities, such as the analysis of the parallel algorithm, the definition of the logical configuration of the platform, and the mapping of the algorithm to the logical configuration of the platform. Unfortunately, current parallel computing approaches do not seem to offer precise modeling approaches for supporting the mapping process. The lack of a clear and precise modeling approach for parallel computing impedes the communication and analysis of the decisions for supporting the mapping of parallel algorithms to parallel computing platforms. In this paper we present an architecture framework for modeling the various views that are related to the mapping process. An architectural framework organizes and structures the proposed architectural viewpoints. We propose a coherent set of five viewpoints for supporting the mapping of parallel algorithms to parallel computing platforms. We illustrate the architecture framework for the mapping of the array increment algorithm to a parallel computing platform. Copyright © 2013 for the individual papers by the papers' authors.
Open Access
Computational analysis of complicated metamaterial structures using MLFMA and nested preconditioners
(IEEE, 2007-11) Ergül, Özgür; Malas, Tahir; Yavuz, Ç; Ünal, Alper; Gürel, Levent
We consider accurate solution of scattering problems involving complicated metamaterial (MM) structures consisting of thin wires and split-ring resonators. The scattering problems are formulated by the electric-field integral equation (EFIE) discretized with the Rao-Wilton-Glisson basis functions defined on planar triangles. The resulting dense matrix equations are solved iteratively, where the matrix-vector multiplications that are required by the iterative solvers are accelerated with the multilevel fast multipole algorithm (MLFMA). Since EFIE usually produces matrix equations that are ill-conditioned and difficult to solve iteratively, we employ nested preconditioners to achieve rapid convergence of the iterative solutions. To further accelerate the simulations, we parallelize our algorithm and perform the solutions on a cluster of personal computers. This way, we are able to solve problems of MMs involving thousands of unit cells.
Open Access
A data-level parallel linear-quadratic penalty algorithm for multicommodity network flows
(Association for Computing Machinery, 1994) Pinar, M. C.; Zenios, S. A.
We describe the development of a data-level, massively parallel software system for the solution of multicommodity network flow problems. Using a smooth linear-quadratic penalty (LQP) algorithm we transform the multicommodity network flow problem into a sequence of independent min-cost network flow subproblems. The solution of these problems is coordinated via a simple, dense, nonlinear master program to obtain a solution that is feasible within some user-specified tolerance to the original multicommodity network flow problem. Particular emphasis is placed on the mapping of both the subproblem and master problem data to the processing elements of a massively parallel computer, the Connection Machine CM-2. As a result of this design we can solve large and sparse optimization problems on current SIMD massively parallel architectures. Details of the implementation are reported, together with summary computational results with a set of test problems drawn from a Military Airlift Command application.
Open Access
Distributed evaluation of an iterative function for all object pairs on an SIMD hypercube
(Elsevier BV, 1991) Erçal, F.
An efficient distributed algorithm for evaluating an iterative function on all pairwise combinations of C objects on an SIMD hypercube is presented. The algorithm achieves uniform load distribution and minimal, completely local interprocessor communication. © 1991.
Open Access
Distributed joint flow-radio and channel assignment using partially overlapping channels in multi-radio wireless mesh networks
(Springer, 2016) Ulucinar, A. R.; Korpeoglu, I.
Equipping mesh nodes with multiple radios that support multiple wireless channels is considered a promising solution to overcome the capacity limitation of single-radio wireless mesh networks. However, careful and intelligent radio resource management is needed to take full advantage of the extra radios on the mesh nodes. Flow-radio assignment and channel assignment procedures should obey the physical constraints imposed by the radios as well as the topological constraints imposed by routing. Varying numbers of wireless channels are available for the channel assignment procedure for different wireless communication standards. To further complicate the problem, the wireless communication standard implemented by the radios of the wireless mesh network may define overlapping as well as orthogonal channels, as in the case of the IEEE 802.11b/g family of standards. This paper presents Distributed Flow-Radio Channel Assignment, a distributed joint flow-radio and channel assignment scheme and the accompanying distributed protocol in the context of multi-channel multi-radio wireless mesh networks. The scheme’s performance is evaluated on small networks for which the optimal flow-radio and channel configuration can be computed, as well as on large random topologies.
Open Access
Efficient parallelization of the multilevel fast multipole algorithm for the solution of large-scale scattering problems
(Institute of Electrical and Electronics Engineers, 2008-08) Ergül, Özgür; Gürel, Levent
We present fast and accurate solutions of large-scale scattering problems involving three-dimensional closed conductors with arbitrary shapes using the multilevel fast multipole algorithm (MLFMA). With an efficient parallelization of MLFMA, scattering problems that are discretized with tens of millions of unknowns are easily solved on a cluster of computers. We extensively investigate the parallelization of MLFMA, identify the bottlenecks, and provide remedial procedures to improve the efficiency of the implementations. The accuracy of the solutions is demonstrated on a scattering problem involving a sphere of radius 110λ discretized with 41,883,638 unknowns, the largest integral-equation problem solved to date. In addition to canonical problems, we also present the solution of real-life problems involving complicated targets with large dimensions.
Open Access
Efficient solution of the combined-field integral equation with the parallel multilevel fast multipole algorithm
(IEEE, 2007-08) Gürel, Levent; Ergül, Özgür
We present fast and accurate solutions of large-scale scattering problems formulated with the combined-field integral equation. Using the multilevel fast multipole algorithm (MLFMA) parallelized on a cluster of computers, we easily solve scattering problems that are discretized with tens of millions of unknowns. For the efficient parallelization of MLFMA, we propose a hierarchical partitioning scheme based on distributing the multilevel tree among the processors with an improved load-balancing. The accuracy of the solutions is demonstrated on scattering problems involving spheres of various radii from 80λ to 110λ. In addition to canonical problems, we also present the solution of real-life problems involving complicated targets with large dimensions. © 2007 IEEE.
Open Access
Improving efficiency of parallel vertex-centric algorithms for irregular graphs
(IEEE Computer Society, 2019) Özdal, Muhammet Mustafa
Memory access is known to be the main bottleneck for shared-memory parallel graph applications, especially for large and irregular graphs. The propagation blocking (PB) idea was proposed recently to improve the parallel performance of PageRank and sparse matrix and vector multiplication operations. The idea is based on separating parallel computation into two phases, binning and accumulation, such that random memory accesses are replaced with contiguous accesses. In this paper, we propose an algorithm that allows execution of these two phases concurrently. We propose several improvements to increase parallel throughput, reduce memory overhead, and improve work efficiency. Our experimental results show that our proposed algorithms improve shared-memory parallel throughput by a factor of up to 2× compared to the original PB algorithms. We also show that the memory overhead can be reduced significantly (from 170 percent down to less than 5 percent) without significant degradation of performance. Finally, we demonstrate that our concurrent execution model allows asynchronous parallel execution, leading to significant work efficiency in addition to throughput improvements.
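The two-phase idea named in this abstract (binning, then accumulation) can be illustrated with a minimal sequential sketch. This is not the paper's algorithm: the function names, the `bin_width` parameter, and the edge-list input format are assumptions made for illustration, and plain Python does not reproduce the cache behavior that motivates the technique. The sketch only shows that routing each contribution through a destination-range bin yields the same sums as scattering directly.

```python
from typing import List, Tuple

Edge = Tuple[int, int]  # (source vertex, destination vertex)


def scatter_direct(edges: List[Edge], contrib: List[float], n: int) -> List[float]:
    """Baseline: add each source contribution straight into its destination."""
    sums = [0.0] * n
    for src, dst in edges:
        sums[dst] += contrib[src]  # destination index jumps around arbitrarily
    return sums


def scatter_propagation_blocking(
    edges: List[Edge], contrib: List[float], n: int, bin_width: int
) -> List[float]:
    """Two-phase scatter: bin by destination range, then accumulate per bin."""
    num_bins = (n + bin_width - 1) // bin_width
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(num_bins)]

    # Binning phase: append (destination, value) pairs to the bin that owns
    # the destination's index range. Each bin is written sequentially.
    for src, dst in edges:
        bins[dst // bin_width].append((dst, contrib[src]))

    # Accumulation phase: process one bin at a time, so all updates touch
    # only a narrow, contiguous slice of `sums`.
    sums = [0.0] * n
    for bucket in bins:
        for dst, value in bucket:
            sums[dst] += value
    return sums


if __name__ == "__main__":
    demo_edges = [(0, 3), (1, 0), (2, 3), (3, 1), (0, 1)]
    demo_contrib = [1.0, 2.0, 4.0, 8.0]
    print(scatter_propagation_blocking(demo_edges, demo_contrib, 4, bin_width=2))
```

In a compiled shared-memory implementation, `bin_width` would typically be chosen so that one bin's slice of the output fits in cache; that tuning is outside the scope of this sketch.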
Open Access
Iterative algorithms for solution of large sparse systems of linear equations on hypercubes
(IEEE, 1988) Aykanat, Cevdet; Özgüner, F.; Ercal, F.; Sadayappan, P.
Finite-element discretization produces linear equations in the form Ax=b, where A is large, sparse, and banded with proper ordering of the variables x. The solution of such equations on distributed-memory message-passing multiprocessors implementing the hypercube topology is addressed. Iterative algorithms based on the conjugate gradient method are developed for hypercubes designed for coarse-grained parallelism. The communication requirements of different schemes for mapping finite-element meshes onto the processors of a hypercube are analyzed with respect to the effect of communication parameters of the architecture. Experimental results for a 16-node Intel 80386-based iPSC/2 hypercube are presented and discussed.
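For readers unfamiliar with the conjugate gradient method mentioned in this abstract, the following is a minimal sequential sketch for a system Ax=b with a symmetric positive definite, banded A. It is not the hypercube implementation described in the paper: the tridiagonal test matrix, the function names, and the tolerance are assumptions chosen for illustration. In a distributed-memory version, the matrix-vector product and the dot products are the steps that would require interprocessor communication.

```python
from typing import Callable, List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def tridiagonal_matvec(x: Vector) -> Vector:
    """Multiply by the banded matrix with 2 on the diagonal and -1 beside it.

    Only the three nonzero bands are used, so A is never stored densely.
    """
    n = len(x)
    y = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        y.append(2.0 * x[i] - left - right)
    return y


def conjugate_gradient(
    matvec: Callable[[Vector], Vector],
    b: Vector,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Vector:
    """Solve A x = b for symmetric positive definite A, given x -> A x."""
    x = [0.0] * len(b)
    r = b[:]          # residual b - A x, with the initial guess x = 0
    p = r[:]          # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 <= tol:  # stop once the residual norm is small
            break
        ap = matvec(p)
        alpha = rs_old / dot(p, ap)               # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old                    # makes the next p A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


if __name__ == "__main__":
    # Right-hand side chosen so that the exact solution is all ones.
    print(conjugate_gradient(tridiagonal_matvec, [1.0, 0.0, 0.0, 0.0, 1.0]))
```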
Open Access
Mars: A tool-based modeling, animation, and parallel rendering system
(Springer, 1994) Aktıhanoğlu, M.; Özgüç, B.; Aykanat, Cevdet
This paper describes a system for modeling, animating, previewing and rendering articulated objects. The system has a modeler of objects that consist of joints and segments. The animator interactively positions the articulated object in its stick, control vertex, or rectangular prism representation and previews the motion in real time. Then the data representing the motion and the models is sent to a multicomputer [iPSC/2 Hypercube (Intel)]. The frames are rendered in parallel, exploiting the coherence between successive frames, thus cutting down the rendering time significantly. Our main aim is to make a detailed study on rendering of a sequence of 3D scenes. The results show that due to an inherent correlation between the 3D scenes, an efficient rendering can be achieved. © 1994 Springer-Verlag.
Open Access
MLFMA solutions of transmission problems involving realistic metamaterial walls
(IEEE, 2007-08) Ergül, Özgür; Ünal, Alper; Gürel, Levent
We present the solution of multilayer metamaterial (MM) structures containing large numbers of unit cells, such as split-ring resonators. Integral-equation formulations of scattering problems are solved iteratively by employing a parallel implementation of the multilevel fast multipole algorithm. Due to the ill-conditioned nature of the problems, advanced preconditioning techniques are used to obtain rapid convergence in the iterative solutions. By constructing a sophisticated simulation environment, we accurately and efficiently investigate large and complicated MM structures. © 2007 IEEE.
Open Access
Model-driven approach for supporting the mapping of parallel algorithms to parallel computing platforms
(Springer, Berlin, Heidelberg, 2013) Arkin, E.; Tekinerdogan, Bedir; Imre, K.M.
The trend from single processor to parallel computer architectures has increased the importance of parallel computing. To support parallel computing it is important to map parallel algorithms to a computing platform that consists of multiple parallel processing nodes. In general different alternative mappings can be defined that perform differently with respect to the quality requirements for power consumption, efficiency and memory usage. The mapping process can be carried out manually for platforms with a limited number of processing nodes. However, for exascale computing in which hundreds of thousands of processing nodes are applied, the mapping process soon becomes intractable. To assist the parallel computing engineer we provide a model-driven approach to analyze, model, and select feasible mappings. We describe the developed toolset that implements the corresponding approach together with the required metamodels and model transformations. We illustrate our approach for the well-known complete exchange algorithm in parallel computing. © 2013 Springer-Verlag.
Open Access
Model-driven transformations for mapping parallel algorithms on parallel computing platforms
(MDHPCL, 2013) Arkin, E.; Tekinerdoğan, Bedir
One of the important problems in parallel computing is the mapping of the parallel algorithm to the parallel computing platform. For this mapping, the corresponding code must be implemented for each parallel node. For platforms with a limited number of processing nodes this can be done manually. However, when the parallel computing platform consists of hundreds of thousands of processing nodes, the manual coding of the parallel algorithms becomes intractable and error-prone. Moreover, a change of the parallel computing platform requires considerable coding effort and time. In this paper we present a model-driven approach for generating the code of selected parallel algorithms to be mapped on parallel computing platforms. We describe the required platform independent metamodel, and the model-to-model and the model-to-text transformation patterns. We illustrate our approach for the parallel matrix multiplication algorithm. Copyright © 2013 for the individual papers by the papers' authors.
Open Access
Novel algorithms and models for scaling parallel sparse tensor and matrix factorizations
(Bilkent University, 2022-07) Abubaker, Nabil F. T.
Two important and widely-used factorization algorithms, namely CPD-ALS for sparse tensor decomposition and distributed stratified SGD for low-rank matrix factorization, suffer from limited scalability. In CPD-ALS, the computational load associated with a tensor/subtensor assigned to a processor is a function of the nonzero counts as well as the fiber counts of the tensor when the CSF storage is utilized. The tensor fibers fragment as a result of nonzero distributions, which makes balancing the computational loads a hard problem. Two strategies are proposed to tackle the balancing problem on an existing fine-grain hypergraph model: a novel weighting scheme to cover the cost of fibers in the true load as well as an augmentation to the hypergraph with fiber nets to encode reducing the increase in computational load. CPD-ALS also suffers from high latency overhead due to the high number of point-to-point messages incurred as the processor count increases. A framework is proposed to limit the number of messages to O(log_2 K), for a K-processor system, exchanged in log_2 K stages. A hypergraph-based method is proposed to encapsulate the communication of the new log_2 K-stage algorithm. In the existing stratified SGD implementations, the volume of communication is proportional to one of the dimensions of the input matrix and prohibits the scalability. Exchanging the essential data necessary for the correctness of the SSGD algorithm as point-to-point messages is proposed to reduce the volume. This, although invaluable for reducing the bandwidth overhead, would increase the upper bound on the number of exchanged messages from O(K) to O(K^2), rendering the algorithm latency-bound. A novel Hold-and-Combine algorithm is proposed to exchange the essential communication volume with up to O(K log K) messages. Extensive experiments on HPC systems demonstrate the importance of the proposed algorithms and models in scaling CPD-ALS and stratified SGD.
Open Access
Online balancing two independent criteria
(Springer, 2008-10) Tse, Savio S.H.
We study the online bicriteria load balancing problem in this paper. We choose a system of distributed homogeneous file servers located in a cluster as the scenario and propose two online approximate algorithms for balancing their loads and required storage spaces. We first revisit the best existing solution for document placement, and rewrite it in our first algorithm by adding some flexibility. The second algorithm bounds the load and storage space of each server by less than three times their trivial lower bounds, respectively; and more importantly, for each server, the value of at least one parameter is far from its worst case. The time complexity of both algorithms is O(log M). © 2008 Springer Berlin Heidelberg.
Open Access
Parallel algorithms for the solution of large sparse inequality systems on distributed memory architectures
(Bilkent University, 1998) Turna, Esma
In this thesis, several parallel algorithms are proposed and utilized for the solution of large sparse linear inequality systems. The parallelization schemes are developed from the coarse-grain parallel formulation of the surrogate constraint method, based on the partitioning strategy: 1D partitioning and 2D partitioning. Furthermore, a third parallelization scheme is developed for the explicit minimization of the communication overhead in 1D partitioning, by using hypergraph partitioning. Utilizing the hypergraph model, the communication overhead is maintained via a global communication scheme and a local communication scheme. In addition, new algorithms that use the bin packing heuristic are investigated for efficient load balancing in uniform rowwise striped and checkerboard partitioning. A general class of image recovery problems is formulated as a linear inequality system. The restoration of images blurred by so-called point spread functions, arising from effects such as misfocus of the photographic device and atmospheric turbulence, is successfully performed with the developed parallel algorithms.
Open Access
Parallel direct and hybrid methods based on row block partitioning for solving sparse linear systems
(Bilkent University, 2017-08) Torun, Fahreddin Şükrü
Solving system of linear equations is a kernel operation in many scientific and industrial applications. These applications usually give rise to linear systems in which the coefficient matrix is very large and sparse. The need for solving these large and sparse systems within a reasonable time necessitates efficient and effective parallel solution methods. In this thesis, three novel approaches are proposed for reducing the parallel solution time of linear systems. First, a new parallel algorithm, ParBaMiN, is proposed in order to find the minimum 2-norm solution of underdetermined linear systems, where the coefficient matrix is in the form of column overlapping block diagonal. The conducted experiments demonstrate the scalability of ParBaMiN on both shared and distributed memory architectures. Secondly, a new graph theoretical partitioning method is introduced in order to reduce the number of iterations in block Cimmino algorithm. Experimental results validate the effectiveness of the proposed partitioning method in terms of reducing the required number of iterations. Finally, we propose a new parallel hybrid method, BCDcols, which further reduces the number of iterations of block Cimmino algorithm for matrices with dense columns. BCDcols combines the block Cimmino iterative algorithm and a dense direct method for solving the system. Experimental results show that BCDcols significantly improves the convergence rate of block Cimmino method and hence reduces the parallel solution time.
Open Access
Parallel minimum norm solution of sparse block diagonal column overlapped underdetermined systems
(Association for Computing Machinery, 2017) Torun, F. S.; Manguoglu, M.; Aykanat, Cevdet
Underdetermined systems of equations in which the minimum norm solution needs to be computed arise in many applications, such as geophysics, signal processing, and biomedical engineering. In this article, we introduce a new parallel algorithm for obtaining the minimum 2-norm solution of an underdetermined system of equations. The proposed algorithm is based on the Balance scheme, which was originally developed for the parallel solution of banded linear systems. The proposed scheme assumes a generalized banded form where the coefficient matrix has column overlapped block structure in which the blocks could be dense or sparse. In this article, we implement the more general sparse case. The blocks can be handled independently by any existing sequential or parallel QR factorization library. A smaller reduced system is formed and solved before obtaining the minimum norm solution of the original system in parallel. We experimentally compare and confirm the error bound of the proposed method against the QR factorization based techniques by using true single-precision arithmetic. We implement the proposed algorithm by using the message passing paradigm. We demonstrate numerical effectiveness as well as parallel scalability of the proposed algorithm on both shared and distributed memory architectures for solving various types of problems. © 2017 ACM.
Open Access
The parallel surrogate constraint approach to the linear feasibility problem
(Springer, 1996) Özaktaş, Hakan; Akgül, Mustafa; Pınar, Mustafa Ç.
The linear feasibility problem arises in several areas of applied mathematics and medical science, in several forms of image reconstruction problems. The surrogate constraint algorithm of Yang and Murty for the linear feasibility problem is implemented and analyzed. The sequential approach considers projections one at a time. In the parallel approach, several projections are made simultaneously and their convex combination is taken to be used at the next iteration. The sequential method is compared with the parallel method for varied numbers of processors. Two improvement schemes for the parallel method are proposed and tested.
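The parallel step described in this abstract (several projections made simultaneously, then combined convexly) can be sketched as follows for a system Ax ≤ b. This is a simplified illustration, not the implementation analyzed in the paper: the violation-proportional surrogate weights, the equal-weight average over blocks, the function names, and the stopping tolerance are all assumptions. The per-block projections are independent, so in a parallel setting each block could be handled by a different processor; here they are computed in a plain loop.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def surrogate_projection(A: Matrix, b: Vector, x: Vector, rows: List[int]) -> Vector:
    """Project x onto one surrogate half-space built from a block's violated rows."""
    violations = {i: dot(A[i], x) - b[i] for i in rows}
    violations = {i: v for i, v in violations.items() if v > 0.0}
    if not violations:
        return x[:]  # the block is already satisfied at x
    total = sum(violations.values())
    # Assumed weighting: each violated row counts in proportion to its violation.
    weights = {i: v / total for i, v in violations.items()}
    normal = [sum(w * A[i][j] for i, w in weights.items()) for j in range(len(x))]
    rhs = sum(w * b[i] for i, w in weights.items())
    norm_sq = dot(normal, normal)
    if norm_sq == 0.0:
        return x[:]  # degenerate surrogate; leave x unchanged
    step = (dot(normal, x) - rhs) / norm_sq
    return [xj - step * nj for xj, nj in zip(x, normal)]


def max_violation(A: Matrix, b: Vector, x: Vector) -> float:
    return max(dot(row, x) - bi for row, bi in zip(A, b))


def parallel_surrogate(
    A: Matrix,
    b: Vector,
    x0: Vector,
    blocks: List[List[int]],
    tol: float = 1e-9,
    max_iter: int = 1000,
) -> Vector:
    """Iterate: project onto each block's surrogate, then average the results."""
    x = x0[:]
    for _ in range(max_iter):
        if max_violation(A, b, x) <= tol:
            break
        # Independent per-block projections (the parallelizable part).
        projections = [surrogate_projection(A, b, x, rows) for rows in blocks]
        # Convex combination with equal weights 1/len(blocks).
        x = [sum(p[j] for p in projections) / len(projections) for j in range(len(x))]
    return x


if __name__ == "__main__":
    # Unit square 0 <= x, y <= 1 written as A x <= b, split into two blocks.
    A = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
    b = [1.0, 0.0, 1.0, 0.0]
    print(parallel_surrogate(A, b, [5.0, -3.0], blocks=[[0, 1], [2, 3]]))
```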
Open Access
Parallel-MLFMA solution of CFIE discretized with tens of millions of unknowns
(Institution of Engineering and Technology, 2007) Ergül, Özgür; Gürel, Levent
We consider the solution of large scattering problems in electromagnetics involving three-dimensional arbitrary geometries with closed surfaces. The problems are formulated accurately with the combined-field integral equation and the resulting dense matrix equations are solved iteratively by employing the multilevel fast multipole algorithm (MLFMA). With an efficient parallelization of MLFMA on relatively inexpensive computing platforms using distributed-memory architectures, we easily solve large-scale problems that are discretized with tens of millions of unknowns. Accuracy of the solutions is demonstrated on scattering problems involving spheres of various sizes, including a sphere of radius 110 λ discretized with 41,883,638 unknowns, which is the largest integral-equation problem ever solved, to the best of our knowledge. In addition to canonical problems, we also present the solution of real-life problems involving complicated targets with large dimensions.