Browsing by Subject "Hypercube"


Open Access
Active pixel merging on hypercube multicomputers
(Springer, Berlin, Heidelberg, 1996) Kurç, Tahsin M.; Aykanat, Cevdet; Özgüç, Bülent
This paper presents algorithms developed for the pixel merging phase of object-space parallel polygon rendering on hypercube-connected multicomputers. These algorithms reduce the volume of communication in the pixel merging phase by exchanging only local foremost pixels. To avoid message fragmentation, local foremost pixels should be stored in consecutive memory locations. An algorithm called modified scanline z-buffer is proposed to store local foremost pixels efficiently. This algorithm also avoids initializing the scanline z-buffer for each scanline on the screen. Good processor utilization is achieved by subdividing the image space among the processors in the pixel merging phase. Efficient load-balancing algorithms for the pixel merging phase are also proposed. Experimental results obtained on a 16-processor Intel iPSC/2 hypercube multicomputer are presented. © Springer-Verlag Berlin Heidelberg 1996.
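As a rough illustration of exchanging only local foremost pixels, the sketch below assumes each processor has already packed its foremost pixels into a list of (pixel index, depth, color) tuples sorted by pixel index, with smaller depth meaning closer to the viewer. The tuple layout and the name merge_foremost are hypothetical and not taken from the paper; the function simply performs the per-pixel z-buffer comparison when two such packed lists meet during pixel merging.

```python
from typing import List, Tuple

# Hypothetical packed layout: (pixel index, depth, color); smaller depth = closer.
Pixel = Tuple[int, float, int]


def merge_foremost(a: List[Pixel], b: List[Pixel]) -> List[Pixel]:
    """Merge two packed lists sorted by pixel index, keeping the nearer
    fragment wherever both lists cover the same pixel (a z-buffer test)."""
    out: List[Pixel] = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i][0] < b[j][0]:
            out.append(a[i])
            i += 1
        elif b[j][0] < a[i][0]:
            out.append(b[j])
            j += 1
        else:
            # Same pixel from both processors: keep the foremost one.
            out.append(a[i] if a[i][1] <= b[j][1] else b[j])
            i += 1
            j += 1
    # Pixels covered by only one side pass through unchanged.
    out.extend(a[i:])
    out.extend(b[j:])
    return out


local0 = [(3, 0.5, 10), (7, 0.2, 11)]
local1 = [(3, 0.4, 20), (9, 0.9, 21)]
print(merge_foremost(local0, local1))
# [(3, 0.4, 20), (7, 0.2, 11), (9, 0.9, 21)]
```

Because each packed list occupies one contiguous array in this sketch, it can be sent as a single message, which is the motivation the abstract gives for avoiding fragmentation.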
Open Access
Balanced parallel sort on hypercube multiprocessors
(IEEE, 1993) Abali, B.; Özgüner, F.; Bataineh, A.
A parallel sorting algorithm for sorting n elements evenly distributed over 2^d = p nodes of a d-dimensional hypercube is presented. The average running time of the algorithm is O((n log n)/p + p log^2 n). The algorithm maintains a perfect load balance across the nodes by determining, in advance, the (kn/p)th elements (k = 1, ..., p - 1) of the final sorted list. These p - 1 keys are used to partition the sorted sublist in each node so that the data can be redistributed to the nodes and merged in parallel. The nodes finish the sort with an equal number of elements (n/p) regardless of the data distribution. A parallel selection algorithm for determining the balanced partition keys in O(p log^2 n) time is presented. The speed of the sorting algorithm is further enhanced by the distanced communication capability of the iPSC/2 hypercube computer and a novel conflict-free routing algorithm. Experimental results on a 16-node hypercube computer show that the new sorting algorithm is competitive with previous algorithms and faster for skewed data distributions.
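The balancing idea can be simulated sequentially. In this sketch (the name balanced_sort is hypothetical), the p - 1 partition keys are found by simply sorting all keys, standing in for the paper's O(p log^2 n) parallel selection algorithm. Ties are broken by (node, local position) so that every element has a unique global rank, which lets each node end with exactly n/p elements even when many values are equal. It assumes p divides n.

```python
import bisect
import heapq
from typing import List, Tuple

# (value, node id, position in the node's sorted sublist): unique, so every
# element has an exact global rank even when values repeat.
Key = Tuple[int, int, int]


def balanced_sort(sublists: List[List[int]]) -> List[List[int]]:
    """Sequential simulation of a load-balanced parallel sort over p nodes."""
    p = len(sublists)
    n = sum(len(s) for s in sublists)
    assert n % p == 0, "sketch assumes p divides n"
    # 1. Each node sorts its local data and tags every element with a unique key.
    local: List[List[Key]] = [
        [(v, node, i) for i, v in enumerate(sorted(s))] for node, s in enumerate(sublists)
    ]
    # 2. Partition keys: the elements of global rank k*n/p, k = 1..p-1.
    #    (Found here by a plain sort, not by a parallel selection algorithm.)
    all_keys = sorted(key for part in local for key in part)
    splitters = [all_keys[k * n // p] for k in range(1, p)]
    # 3. Each node cuts its sorted sublist at the partition keys.
    cuts = [[0] + [bisect.bisect_left(part, sp) for sp in splitters] + [len(part)]
            for part in local]
    # 4. Node k receives piece k from every node and merges the sorted pieces.
    result: List[List[int]] = []
    for k in range(p):
        pieces = [local[src][cuts[src][k]:cuts[src][k + 1]] for src in range(p)]
        result.append([v for v, _, _ in heapq.merge(*pieces)])
    return result


data = [[5, 5, 5, 1], [5, 5, 2, 5], [5, 3, 5, 5], [4, 5, 5, 5]]  # heavily skewed
print(balanced_sort(data))
# [[1, 2, 3, 4], [5, 5, 5, 5], [5, 5, 5, 5], [5, 5, 5, 5]]
```

Splitting on value alone would send all twelve 5s to one node here; the unique tie-breaking key is what keeps every node at n/p elements in this simulation.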
Open Access
Efficient fast hartley transform algorithms for hypercube-connected multicomputers
(IEEE, 1995) Aykanat, Cevdet; Derviş, A.
Although the fast Hartley transform (FHT) provides efficient spectral analysis of real discrete signals, the literature on parallelizing the FHT is extremely sparse. The FHT is a real transformation and does not require any complex arithmetic. On the other hand, the FHT algorithm has an irregular computational structure, which makes efficient parallelization harder. In this paper, we propose an efficient restructuring of the sequential FHT algorithm that brings regularity and symmetry to its computational structure. We then propose an efficient parallel FHT algorithm for medium-to-coarse-grain hypercube multicomputers by introducing a dynamic mapping scheme for the restructured FHT. The proposed parallel algorithm achieves perfect load balance, minimizes both the number and the volume of concurrent communications, allows only nearest-neighbor communications, and achieves in-place computation and communication. The proposed algorithm is implemented on a 32-node iPSC/2 hypercube multicomputer. High efficiency values are obtained even for small FHT problems. © 1995 IEEE
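For reference, the discrete Hartley transform of N real samples is H(k) = sum over t of x(t) cas(2πkt/N), with cas θ = cos θ + sin θ. The sketch below is a textbook recursive radix-2 decimation-in-time FHT, not the restructured algorithm from the paper, checked against the O(N^2) definition; it assumes N is a power of two and the function names are hypothetical. Each output combines the odd-half transform at index k with the same transform at the mirrored index (-k) mod N/2, and this pairing of k with N/2 - k is plausibly one source of the irregular structure the abstract mentions.

```python
import math
from typing import List


def dht_direct(x: List[float]) -> List[float]:
    """O(N^2) discrete Hartley transform straight from the definition."""
    n = len(x)
    return [sum(x[t] * (math.cos(2 * math.pi * k * t / n) + math.sin(2 * math.pi * k * t / n))
                for t in range(n))
            for k in range(n)]


def fht(x: List[float]) -> List[float]:
    """Recursive radix-2 decimation-in-time FHT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    assert n % 2 == 0, "length must be a power of two"
    m = n // 2
    even = fht(x[0::2])  # DHT of the even-indexed samples
    odd = fht(x[1::2])   # DHT of the odd-indexed samples
    out = [0.0] * n
    for k in range(n):
        theta = 2 * math.pi * k / n
        # cas(a + b) = cos(b) cas(a) + sin(b) cas(-a), hence the mirrored index (-k) mod m.
        out[k] = even[k % m] + math.cos(theta) * odd[k % m] + math.sin(theta) * odd[(-k) % m]
    return out


x = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 1.5]
print(max(abs(a - b) for a, b in zip(fht(x), dht_direct(x))))
```

The printed difference should be at the level of floating-point rounding. Since the DHT is its own inverse up to a factor 1/N, applying fht twice returns N times the input, so no separate inverse routine is needed and no complex arithmetic appears anywhere.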
Open Access
Efficient overlapped FFT algorithms for hypercube-connected multicomputers
(1994) Aykanat, Cevdet; Derviş, A.
In this work, we propose parallel FFT algorithms for medium-to-coarse-grain hypercube-connected multicomputers that are more elegant and efficient than the existing ones. The proposed algorithms achieve perfect load balance for the efficient simplified-butterfly scheme and minimize the communication overhead by decreasing both the number and the volume of concurrent communications. Communication and computation cannot be overlapped easily because of the strong data dependencies in the FFT algorithm. In this paper, we propose a restructuring of the FFT algorithm that enables overlapping each communication with one fifth of the local computations involved in a stage. Two of the proposed parallel FFT algorithms achieve overlapping by exploiting this restructuring while using the efficient table-lookup scheme for complex coefficients. The proposed algorithms are implemented on Intel's 32-node iPSC/2 hypercube multicomputer. High efficiency values are obtained even for small FFT problems. © 1994, Taylor & Francis Group, LLC. All rights reserved.
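To show why the FFT maps naturally onto a hypercube, the sketch below simulates a radix-2 decimation-in-frequency FFT on N points block-distributed over p = 2^d nodes (N/p consecutive points per node). Only the first d stages pair points held by different nodes, and those partners always differ in exactly one address bit, so they are hypercube neighbors. This is a plain binary-exchange scheme written for illustration; it does not implement the paper's restructuring for overlapping communication with computation, and the names owner and dif_fft_on_hypercube are hypothetical.

```python
import cmath
from typing import List, Tuple


def owner(index: int, n: int, p: int) -> int:
    """Node that holds global element `index` under a block mapping."""
    return index // (n // p)


def dif_fft_on_hypercube(x: List[complex], p: int) -> Tuple[List[complex], List[int]]:
    """Radix-2 DIF FFT over len(x) points (a power of two, divisible by p).

    Returns (spectrum, hypercube dimension used per stage); a stage that
    needs no communication is reported as -1."""
    n = len(x)
    block = n // p
    a = list(x)
    dims: List[int] = []
    half = n // 2
    while half >= 1:
        # Partners are `half` apart; if half >= block they sit on different nodes
        # whose numbers differ by half // block, a single power-of-two bit.
        dims.append((half // block).bit_length() - 1 if half >= block else -1)
        for start in range(0, n, 2 * half):
            for k in range(half):
                lo, hi = start + k, start + k + half
                src, dst = owner(lo, n, p), owner(hi, n, p)
                # Either a local butterfly or an exchange between hypercube neighbors.
                assert src == dst or bin(src ^ dst).count("1") == 1
                u, v = a[lo], a[hi]
                a[lo] = u + v
                a[hi] = (u - v) * cmath.exp(-2j * cmath.pi * k / (2 * half))
        half //= 2
    # DIF leaves the spectrum in bit-reversed order; permute it back.
    bits = n.bit_length() - 1
    spectrum = [a[int(format(i, f"0{bits}b")[::-1], 2)] for i in range(n)]
    return spectrum, dims


x = [complex(t, (t * t) % 5) for t in range(16)]
spectrum, dims = dif_fft_on_hypercube(x, 4)
print(dims)  # [1, 0, -1, -1]
```

With N = 16 and p = 4, the two leading stages exchange data across hypercube dimensions 1 and 0, and the remaining two stages are purely local, which is the nearest-neighbor-only pattern the abstract relies on.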
Open Access
Efficient parallel digital signal processing algorithms for hypercube-connected multicomputers
(1992) Derviş, Argun
In this thesis, the efficient parallelization of Digital Signal Processing (DSP) algorithms (FFT, FHT, and FCT) on multicomputers implementing the hypercube interconnection topology is investigated. The proposed algorithms maintain perfect load balance, minimize communication overhead, can overlap communications with computations, and achieve regular computational patterns. The proposed parallel algorithms are implemented on Intel's iPSC/2 hypercube multicomputer with 32 processors. High efficiency and almost linear speedup values are obtained even for small problems.
Open Access
Iterative algorithms for solution of large sparse systems of linear equations on hypercubes
(IEEE, 1988) Aykanat, Cevdet; Özgüner, F.; Ercal, F.; Sadayappan, P.
Finite-element discretization produces linear equations in the form Ax=b, where A is large, sparse, and banded with proper ordering of the variables x. The solution of such equations on distributed-memory message-passing multiprocessors implementing the hypercube topology is addressed. Iterative algorithms based on the conjugate gradient method are developed for hypercubes designed for coarse-grained parallelism. The communication requirements of different schemes for mapping finite-element meshes onto the processors of a hypercube are analyzed with respect to the effect of communication parameters of the architecture. Experimental results for a 16-node Intel 80386-based iPSC/2 hypercube are presented and discussed.
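A minimal sketch of the conjugate gradient iteration, assuming a symmetric positive definite banded matrix. The test matrix tridiag(-1, 2, -1), which is the stiffness matrix of 1D linear finite elements on a uniform mesh up to a scale factor, is a stand-in chosen for illustration rather than the paper's problem. On a hypercube, each dot product becomes local partial sums followed by a global sum; the sketch simulates that with recursive doubling over p = 2^d nodes, where in step k every node adds the value held by its neighbor across dimension k. A real matrix-vector product would also need boundary values from the nodes holding adjacent mesh blocks, which the sequential matvec hides. All function names are hypothetical.

```python
import math
from typing import Callable, List

Vector = List[float]


def hypercube_allreduce(partials: Vector) -> Vector:
    """Recursive-doubling sum over p = 2^d simulated nodes: after step k each
    node holds the sum over its 2^(k+1)-node subcube, after d steps the total."""
    p = len(partials)
    d = p.bit_length() - 1
    assert 1 << d == p, "p must be a power of two"
    vals = list(partials)
    for k in range(d):
        vals = [vals[r] + vals[r ^ (1 << k)] for r in range(p)]
    return vals


def distributed_dot(u: Vector, v: Vector, p: int) -> float:
    """Dot product with node r owning len(u) // p consecutive entries."""
    assert len(u) % p == 0
    block = len(u) // p
    partials = [sum(u[i] * v[i] for i in range(r * block, (r + 1) * block)) for r in range(p)]
    return hypercube_allreduce(partials)[0]


def matvec_tridiag(x: Vector) -> Vector:
    """y = A x for A = tridiag(-1, 2, -1)."""
    n = len(x)
    return [2.0 * x[i] - (x[i - 1] if i > 0 else 0.0) - (x[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


def conjugate_gradient(matvec: Callable[[Vector], Vector], b: Vector, p: int,
                       tol: float = 1e-10, max_iter: int = 1000) -> Vector:
    x = [0.0] * len(b)
    r = list(b)   # residual b - A x for the initial guess x = 0
    d = list(r)   # search direction
    rs = distributed_dot(r, r, p)
    for _ in range(max_iter):
        if math.sqrt(rs) < tol:
            break
        ad = matvec(d)
        alpha = rs / distributed_dot(d, ad, p)
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * adi for ri, adi in zip(r, ad)]
        rs_new = distributed_dot(r, r, p)
        d = [ri + (rs_new / rs) * di for ri, di in zip(r, d)]
        rs = rs_new
    return x


n, p = 64, 8
b = [1.0] * n
x = conjugate_gradient(matvec_tridiag, b, p)
print(max(abs(yi - bi) for yi, bi in zip(matvec_tridiag(x), b)))
```

Each CG iteration in this sketch performs two global reductions, each costing d = log2 p neighbor exchanges, plus one matrix-vector product; that split between reduction and boundary traffic is the kind of communication requirement the abstract analyzes for different mesh mappings.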
Open Access
A new mapping heuristic based on mean field annealing
(Academic Press, 1992) Bultan, T.; Aykanat, Cevdet
A new mapping heuristic is developed based on the recently proposed Mean Field Annealing (MFA) algorithm. An efficient implementation scheme, which decreases the complexity of the proposed algorithm by asymptotic factors, is also given. The performance of the proposed MFA algorithm is evaluated in comparison with two well-known heuristics: Simulated Annealing and Kernighan-Lin. Experimental results indicate that MFA can be used as an alternative heuristic for solving the mapping problem. The inherent parallelism of MFA is exploited by designing an efficient parallel algorithm for the proposed MFA heuristic.
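A compact sketch of mean field annealing for mapping a task graph onto P processors. The energy used here, an expected cost c_ij for every edge whose endpoints land on different processors plus a load-imbalance penalty r * w_i * w_j for every pair of tasks sharing a processor, is a common formulation assumed for illustration and not necessarily the paper's exact cost function; the cooling schedule and all parameter names are likewise hypothetical. Each spin s[i][p] is the probability that task i is placed on processor p, and at temperature T it is set to a softmax of the mean field phi[i][p] = -dE/ds[i][p].

```python
import math
import random
from typing import Dict, List, Tuple


def mfa_mapping(n_tasks: int, n_procs: int, edges: Dict[Tuple[int, int], float],
                weights: List[float], r: float = 1.0, t0: float = 10.0,
                cooling: float = 0.9, t_min: float = 1e-3,
                seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Mean field annealing for task-to-processor mapping (illustrative energy):

    E = sum_{(i,j)} c_ij * (1 - sum_p s_ip s_jp)          # expected cut cost
      + (r / 2) * sum_p sum_{i != j} w_i w_j s_ip s_jp   # load-imbalance penalty
    """
    rng = random.Random(seed)
    adj: List[List[Tuple[int, float]]] = [[] for _ in range(n_tasks)]
    for (i, j), c in edges.items():
        adj[i].append((j, c))
        adj[j].append((i, c))
    # Spins start near uniform 1/P; a small random perturbation breaks symmetry.
    s: List[List[float]] = []
    for _ in range(n_tasks):
        row = [1.0 + 0.1 * rng.random() for _ in range(n_procs)]
        total = sum(row)
        s.append([v / total for v in row])
    t = t0
    while t > t_min:
        for i in range(n_tasks):  # update one task's spins at a time
            phi = []
            for p in range(n_procs):
                comm = sum(c * s[j][p] for j, c in adj[i])
                load = sum(weights[j] * s[j][p] for j in range(n_tasks) if j != i)
                phi.append(comm - r * weights[i] * load)  # phi_ip = -dE/ds_ip
            m = max(phi)  # subtract the max for a numerically stable softmax
            ex = [math.exp((v - m) / t) for v in phi]
            z = sum(ex)
            s[i] = [v / z for v in ex]
        t *= cooling
    mapping = [max(range(n_procs), key=lambda p: s[i][p]) for i in range(n_tasks)]
    return mapping, s


# Two triangles of heavily communicating tasks joined by one light edge.
edges = {(0, 1): 10.0, (0, 2): 10.0, (1, 2): 10.0,
         (3, 4): 10.0, (3, 5): 10.0, (4, 5): 10.0, (2, 3): 1.0}
mapping, spins = mfa_mapping(6, 2, edges, [1.0] * 6)
print(mapping)
```

As written, the load term rescans every task for every spin, so one sweep costs O(n^2 P); keeping running per-processor totals of w_j s_jp and subtracting task i's own contribution is a straightforward refinement. The per-task softmax updates are also what make MFA amenable to parallel execution.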
Open Access
Object-space parallel polygon rendering on hypercubes
(Pergamon Press, 1998) Kurç, T. M.; Aykanat, Cevdet; Özgüç, B.
This paper presents algorithms for object-space parallel polygon rendering on hypercube-connected multicomputers. A modified scanline z-buffer algorithm is proposed for the local rendering phase. The proposed algorithm avoids message fragmentation by efficiently packing local foremost pixels into consecutive memory locations, and it eliminates the initialization of the scanline z-buffer for each scanline. Several algorithms, using different communication strategies and topological embeddings, are proposed for global z-buffering of local foremost pixels during the pixel merging phase. A performance comparison of these pixel merging algorithms is presented based on the communication overhead incurred by each scheme. Two adaptive screen subdivision heuristics are proposed for load balancing in the pixel merging phase. These heuristics use the distribution of foremost pixels on the screen for the subdivision. Experimental results obtained on an Intel iPSC/2 hypercube multicomputer and a Parsytec CC system are presented. Rendering rates of 300K-700K triangles per second are attained on 16 processors of the Parsytec CC system when rendering datasets from the publicly available SPD database. (C) 1998 Elsevier Science Ltd. All rights reserved.
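One simple way to use the foremost-pixel distribution for screen subdivision is to cut the screen into p horizontal bands of scanlines with roughly equal pixel counts, using prefix sums over per-scanline counts. This is an illustrative heuristic under that assumption, not necessarily either of the two heuristics proposed in the paper, and the function names split_scanlines and band_loads are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate
from typing import List


def split_scanlines(counts: List[int], p: int) -> List[int]:
    """Cut scanlines into p bands with roughly equal foremost-pixel counts.

    counts[y] is the number of foremost pixels on scanline y. Returns p + 1
    boundaries b so that band k covers scanlines b[k] .. b[k + 1] - 1."""
    prefix = list(accumulate(counts))  # prefix[y] = pixels on scanlines 0..y
    total = prefix[-1] if prefix else 0
    bounds = [0]
    for k in range(1, p):
        target = k * total / p
        # First scanline at which the running count reaches the target;
        # the band boundary goes just after it.
        y = bisect_left(prefix, target)
        bounds.append(max(bounds[-1], min(y + 1, len(counts))))
    bounds.append(len(counts))
    return bounds


def band_loads(counts: List[int], bounds: List[int]) -> List[int]:
    """Number of foremost pixels assigned to each band."""
    return [sum(counts[bounds[k]:bounds[k + 1]]) for k in range(len(bounds) - 1)]


counts = [0, 0, 10, 10, 10, 10, 0, 0]  # pixels concentrated mid-screen
b = split_scanlines(counts, 4)
print(b, band_loads(counts, b))  # [0, 3, 4, 5, 8] [10, 10, 10, 10]
```

A uniform split of these eight scanlines into four bands of two would give loads of 0, 20, 20, and 0; adapting the cuts to the pixel distribution evens them out. Balance in this sketch is limited to scanline granularity, since a single very busy scanline cannot be divided.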