Vectorization and parallelization of the conjugate gradient algorithm on hypercube-connected vector processors

Abstract

Solution of large sparse linear systems of equations of the form Ax = b constitutes a significant share of the computation in the simulation of physical phenomena [1]. For example, the finite element discretization of a regular domain, with proper ordering of the variables x, yields a banded N × N coefficient matrix A. The Conjugate Gradient (CG) algorithm [2,3] is an iterative method for solving sparse matrix equations and is widely used because of its convergence properties. This paper describes an implementation of the CG algorithm that exploits both vectorization and parallelization on a 2-dimensional hypercube with a vector processor at each node (iPSC-VX/d2). The implementation achieves efficient parallelization in two ways: it uses a version of the CG algorithm suited to coarse-grain parallelism [4,5], which reduces the number of communication steps required, and it overlaps computation on the vector processor with internode communication. With parallelization and vectorization combined, a speedup of 58 over a μVax II is obtained for large problems on the iPSC-VX/d2.
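For orientation, the following is a minimal sketch of the textbook CG iteration on a banded system, not the coarse-grain variant of [4,5] that the paper implements. It assumes a symmetric positive definite tridiagonal matrix stored as two lists; the names `banded_matvec`, `dot`, and `conjugate_gradient` are illustrative and do not come from the paper. Each iteration performs one matrix-vector product and two inner products. On a distributed-memory machine each inner product requires a global reduction across nodes, which is the kind of communication step the abstract refers to.

```python
def banded_matvec(diag, off, x):
    """Multiply a symmetric tridiagonal matrix by x.

    diag holds the main diagonal (length n); off holds the first
    off-diagonal (length n - 1). This storage layout is an assumption
    made for the sketch.
    """
    n = len(x)
    y = [diag[i] * x[i] for i in range(n)]
    for i in range(n - 1):
        y[i] += off[i] * x[i + 1]      # superdiagonal contribution
        y[i + 1] += off[i] * x[i]      # subdiagonal contribution (symmetry)
    return y


def dot(u, v):
    """Inner product; on a hypercube this would need a global reduction."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(diag, off, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for SPD tridiagonal A; return (x, iterations used)."""
    x = [0.0] * len(b)
    r = list(b)                        # residual b - A x for the zero start
    p = list(r)                        # initial search direction
    rr = dot(r, r)
    for it in range(max_iter):
        if rr ** 0.5 < tol:            # stop once the residual norm is small
            return x, it
        q = banded_matvec(diag, off, p)
        alpha = rr / dot(p, q)         # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rr_new = dot(r, r)
        beta = rr_new / rr             # makes the next direction A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, max_iter
```

The vector updates for `x`, `r`, and `p` are element-wise operations of the kind a vector processor handles well, while the two `dot` calls are the synchronization points that a communication-reducing variant would target.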

Source Title

Microprocessing and Microprogramming

Publisher

Elsevier

Language

English