Browsing by Subject "Parallel"
Now showing 1 - 3 of 3
Results Per Page
Sort Options
Item Open Access AutopaR: An Automatic Parallelization Tool for Recursive Calls(IEEE, 2014-09) Kalender, Mert Emin; Mergenci, Cem; Öztürk, ÖzcanManycore systems are becoming more and more powerful with the integration of hundreds of cores on a single chip. However, writing parallel programs on these manycore systems has become a problem since the amount of available parallel tools and applications are limited. Although exploiting parallelism in software is possible, it requires different design decisions, significant programmer effort and is error prone. Different libraries and tools try to make the transition to parallelism easier, however there is no concrete system to make it transparent to software developer. To this end, our proposed tool is a step forward to improve the current state. Our approach, Autopar, specifically aims at achieving automatic parallelization of recursive applications using static program analysis. It first decides on the recursive functions of a given program. Then, it performs analysis and collects information about these recursive functions. Our analysis module automatically collects program information without requiring any modification in the program design or developer involvement. Finally, it achieves automatic parallelization by introducing necessary OpenMP pragmas in appropriate places in the application. © 2014 IEEE.Item Open Access Minimizing communication through computational redundancy in parallel iterative solvers(Bilkent University, 2011) Torun, Fahreddin ŞükrüSparse matrix vector multiplication (SpMxV) of the form y = Ax is a kernel operation in iterative linear solvers used in scientific applications. In these solvers, the SpMxV operation is performed repeatedly with the same sparse matrix through iterations until convergence. Depending on the matrix and its decomposition, parallel SpMxV operation necessitates communication among processors in the parallel environment. The communication can be reduced by intelligent decomposition. However, we can further decrease the communication through data replication and redundant computation. The communication occurs due to the transfer of x-vector entries in row-parallel SpMxV computation. The input vector x of the next iteration is computed from the output vector of the current iteration through linear vector operations. Hence, a processor may compute a y-vector entry redundantly, which leads to a x-vector entry in the following iteration, instead of receiving that x-vector entry from another processor. Thus, redundant computation of that y-vector entry may lead to reduction in communication. In this thesis, we devise a directed-graph-based model that correctly captures the computation and communication pattern for above-mentioned iterative solvers. Moreover, we formulate the communication minimization by utilizing redundant computation of y-vector entries as a combinatorial problem on this directed graph model. We propose two heuristics to solve this combinatorial problem. Experimental results indicate that the communication reducing strategy by redundantly computing is promising.Item Open Access Object-space parallel polygon rendering on hypercubes(Pergamon Press, 1998) Kurç, T. M.; Aykanat, Cevdet; Özgüç, B.This paper presents algorithms for object-space parallel polygon rendering on hypercube-connected multicomputers. A modified scanline z-buffer algorithm is proposed for local rendering phase. The proposed algorithm avoids message fragmentation by packing local foremost pixels in consecutive memory locations efficiently, and it eliminates the initialization of scanline z-buffer for each scanline. Several algorithms, utilizing different communication strategies and topological embeddings, are proposed for global z-buffering of local foremost pixels during the pixel merging phase. The performance comparison of these pixel merging algorithms are presented based on the communication overhead incurred in each scheme. Two adaptive screen subdivision heuristics are proposed for load balancing in the pixel merging phase. These heuristics utilize the distribution of foremost pixels on the screen for the subdivision. Experimental results obtained on an Intel's iPSC/2 hypercube multicomputer and a Parsytec CC system are presented. Rendering rates of 300K-700K triangles per second are attained on 16 processors of Parsytec CC system in the rendering of datasets from publicly available SPD database. (C) 1998 Elsevier Science Ltd. All rights reserved.