Safe data parallelism for general streaming

Schneider, S.; Hirzel, M.; Gedik, B.; Wu, Kun-Lung

dc.citation.epage	517	en_US
dc.citation.issueNumber	2	en_US
dc.citation.spage	504	en_US
dc.citation.volumeNumber	64	en_US
dc.contributor.author	Schneider, S.	en_US
dc.contributor.author	Hirzel, M.	en_US
dc.contributor.author	Gedik, B.	en_US
dc.contributor.author	Wu, Kun-Lung	en_US
dc.date.accessioned	2016-02-08T10:02:43Z
dc.date.available	2016-02-08T10:02:43Z
dc.date.issued	2015	en_US
dc.department	Department of Computer Engineering	en_US
dc.description.abstract	Streaming applications process possibly infinite streams of data and often have both high-throughput and low-latency requirements. They are composed of operator graphs that produce and consume data tuples. General streaming applications use stateful, selective, and user-defined operators. The stream programming model naturally exposes task and pipeline parallelism, enabling it to exploit parallel systems of all kinds, including large clusters. However, data parallelism must either be introduced manually by programmers or extracted as an optimization by compilers. Previous data-parallel optimizations did not apply to selective, stateful, and user-defined operators. This article presents a compiler and runtime system that automatically extracts data parallelism for general stream processing. Data parallelization is safe if the transformed program has the same semantics as the original sequential version. The compiler forms parallel regions while considering operator selectivity, state, partitioning, and graph dependencies. The distributed runtime system ensures that tuples always exit parallel regions in the same order they would without data parallelism, using the most efficient strategy as identified by the compiler. Our experiments using 100 cores across 14 machines show linear scalability for parallel regions that are computation-bound, and near-linear scalability when tuples are shuffled across parallel regions.	en_US
dc.identifier.doi	10.1109/TC.2013.221	en_US
dc.identifier.eissn	1557-9956	en_US
dc.identifier.issn	0018-9340	en_US
dc.identifier.uri	http://hdl.handle.net/11693/22636	en_US
dc.language.iso	English	en_US
dc.publisher	Institute of Electrical and Electronics Engineers	en_US
dc.relation.isversionof	http://dx.doi.org/10.1109/TC.2013.221	en_US
dc.source.title	IEEE Transactions on Computers	en_US
dc.subject	Data processing	en_US
dc.subject	Distributed computing	en_US
dc.subject	Data handling	en_US
dc.subject	Distributed computer systems	en_US
dc.subject	Parallel programming	en_US
dc.subject	Scalability	en_US
dc.subject	Semantics	en_US
dc.subject	Data parallelism	en_US
dc.subject	Data parallelization	en_US
dc.subject	Distributed runtime	en_US
dc.subject	Efficient strategy	en_US
dc.subject	Pipeline parallelisms	en_US
dc.subject	Stream processing	en_US
dc.subject	Stream programming	en_US
dc.subject	Streaming applications	en_US
dc.subject	Program compilers	en_US
dc.title	Safe data parallelism for general streaming	en_US
dc.type	Article	en_US
