Slicing based code parallelization for minimizing inter-processor communication
Muralidhara, S. P.
Narayanan, S. H. K.
CASES '09: Proceedings of the 2009 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems
87 - 95
One of the critical problems in distributed memory multi-core architectures is scalable parallelization that minimizes inter-processor communication. Using the concept of iteration space slicing, this paper presents a new code parallelization scheme for data-intensive applications. This scheme targets distributed memory multi-core architectures, and formulates the problem of data-computation distribution (partitioning) across parallel processors using slicing such that, starting with the partitioning of the output arrays, it iteratively determines the partitions of other arrays as well as iteration spaces of the loop nests in the application code. The goal is to minimize inter-processor data communications. Based on this iteration space slicing based formulation of the problem, we also propose a solution scheme. The proposed data-computation scheme is evaluated using six data-intensive benchmark programs. In our experimental evaluation, we also compare this scheme against three alternate data-computation distribution schemes. The results obtained are very encouraging, indicating around 10% better speedup, with 16 processors, over the next-best scheme when averaged over all benchmark codes we tested. Copyright 2009 ACM.
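The backward, output-driven idea described in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm or formulation. The two loop nests, the array names (`OUT`, `T`, `A`), the block partitioning, and all helper names are assumptions made up for illustration. Each iteration `i` of a nest is assumed to write element `i` of its output array and to read the listed arrays at fixed offsets. Starting from each processor's block of the output array, the sketch derives the iterations that processor must execute and the elements of the other arrays it needs. Elements needed by more than one processor indicate where data would have to be communicated or replicated.

```python
def block_partition(n, procs):
    """Split indices 0..n-1 into contiguous blocks, one per processor."""
    base, extra = divmod(n, procs)
    parts, start = [], 0
    for p in range(procs):
        size = base + (1 if p < extra else 0)
        parts.append(set(range(start, start + size)))
        start += size
    return parts


# Hypothetical loop nests, listed from last to first in program order.
# Iteration i of a nest writes writes[i] and reads array[i + offset]
# for every (array, offset) pair in "reads" (an illustrative assumption).
NESTS = [
    {"writes": "OUT", "reads": [("T", 0)]},          # OUT[i] = f(T[i])
    {"writes": "T", "reads": [("A", 0), ("A", 1)]},  # T[i] = g(A[i], A[i+1])
]


def slice_backward(out_part):
    """Derive, for one processor, the iterations of each nest and the
    elements of each array it needs, starting from its OUT elements."""
    needed = {"OUT": set(out_part)}
    iterations = {}
    for nest in NESTS:
        # Iterations that produce the elements this processor needs.
        iters = set(needed.get(nest["writes"], set()))
        iterations[nest["writes"]] = iters
        # Elements those iterations read from other arrays.
        for array, offset in nest["reads"]:
            needed.setdefault(array, set()).update(i + offset for i in iters)
    return iterations, needed


def shared_elements(all_needed, array):
    """Elements of `array` needed by more than one processor."""
    seen, shared = set(), set()
    for needed in all_needed:
        elems = needed.get(array, set())
        shared |= seen & elems
        seen |= elems
    return shared


parts = block_partition(8, 2)
results = [slice_backward(p) for p in parts]
iterations = [r[0] for r in results]
needed = [r[1] for r in results]
print(shared_elements(needed, "A"))
```

In this toy setup only the boundary element of `A` is needed by both processors, while `T` is needed by exactly one processor per element. A real scheme would have to handle general affine accesses and multi-dimensional iteration spaces, which this sketch does not attempt.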
Keywords
Automatic code parallelization
Code analysis and optimization
Iteration space slicing
Published Version (Please cite this version): http://dx.doi.org/10.1145/1629395.1629409
Showing items related by title, author, creator and subject.
Arkin, E.; Tekinerdoğan, Bedir (MDHPCL, 2013) One of the important problems in parallel computing is the mapping of the parallel algorithm to the parallel computing platform. Hereby, for each parallel node the corresponding code for the parallel nodes must be implemented. ...
Schneider, S.; Hirzel, M.; Gedik, Buğra; Wu, K. -L. (2012) Streaming applications transform possibly infinite streams of data and often have both high throughput and low latency requirements. They are comprised of operator graphs that produce and consume data tuples. The streaming ...
Model-driven approach for supporting the mapping of parallel algorithms to parallel computing platforms. Arkin, E.; Tekinerdogan, Bedir; Imre, K.M. (Springer, Berlin, Heidelberg, 2013) The trend from single processor to parallel computer architectures has increased the importance of parallel computing. To support parallel computing it is important to map parallel algorithms to a computing platform that ...