Safe data parallelism for general streaming
IEEE Transactions on Computers
Institute of Electrical and Electronics Engineers
504 - 517
Streaming applications process possibly infinite streams of data and often have both high-throughput and low-latency requirements. They consist of operator graphs that produce and consume data tuples. General streaming applications use stateful, selective, and user-defined operators. The stream programming model naturally exposes task and pipeline parallelism, enabling it to exploit parallel systems of all kinds, including large clusters. However, data parallelism must either be introduced manually by programmers or extracted as an optimization by compilers. Previous data-parallel optimizations did not apply to selective, stateful, and user-defined operators. This article presents a compiler and runtime system that automatically extract data parallelism for general stream processing. Data parallelization is safe if the transformed program has the same semantics as the original sequential version. The compiler forms parallel regions while considering operator selectivity, state, partitioning, and graph dependencies. The distributed runtime system ensures that tuples always exit parallel regions in the same order they would without data parallelism, using the most efficient strategy as identified by the compiler. Our experiments using 100 cores across 14 machines show linear scalability for parallel regions that are computation-bound, and near-linear scalability when tuples are shuffled across parallel regions.
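The ordering guarantee described in the abstract can be illustrated with a minimal sketch. It is not the article's implementation: it assumes a stateless operator that emits exactly one output per input, a round-robin splitter, and sequence numbers attached at the splitter. The names split_round_robin, ordered_merge, and run_parallel_region are hypothetical, and selective or stateful operators would need additional machinery that is omitted here.

```python
import heapq
import random
from typing import Any, Callable, Iterable, Iterator, List, Tuple

Tagged = Tuple[int, Any]  # (sequence number, payload); hypothetical representation


def split_round_robin(stream: Iterable[Any], width: int) -> List[List[Tagged]]:
    """Tag each tuple with a sequence number and deal it to one of `width` channels."""
    channels: List[List[Tagged]] = [[] for _ in range(width)]
    for seq, item in enumerate(stream):
        channels[seq % width].append((seq, item))
    return channels


def ordered_merge(arrivals: Iterable[Tagged]) -> Iterator[Any]:
    """Release results in sequence order, buffering any that arrive early."""
    pending: List[Tagged] = []  # min-heap keyed on sequence number
    next_seq = 0
    for tagged in arrivals:
        heapq.heappush(pending, tagged)
        # Drain every buffered result that is now next in line.
        while pending and pending[0][0] == next_seq:
            yield heapq.heappop(pending)[1]
            next_seq += 1


def run_parallel_region(
    stream: Iterable[Any], op: Callable[[Any], Any], width: int, seed: int = 0
) -> List[Any]:
    """Apply `op` on every channel, then merge the results back into input order."""
    channels = split_round_robin(stream, width)
    results = [(seq, op(item)) for channel in channels for seq, item in channel]
    # Assumption: shuffling stands in for an arbitrary arrival order at the merger.
    random.Random(seed).shuffle(results)
    return list(ordered_merge(results))
```

For example, run_parallel_region(range(10), lambda x: x * x, 3) returns the squares in the same order a sequential run would produce, whatever order the simulated channels deliver them in. Buffering by sequence number is only one possible strategy; the abstract indicates that the compiler chooses the most efficient one for each parallel region.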
Distributed computer systems
Published Version (Please cite this version): http://dx.doi.org/10.1109/TC.2013.221