Auto-parallelizing stateful distributed streaming applications
Author
Schneider, S.
Hirzel, M.
Gedik, Buğra
Wu, K.-L.
Date
2012
Source Title
PACT '12 Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Print ISSN
1089-795X
Pages
53 - 63
Language
English
Type
Conference Paper
Item Usage Stats
147 views
131 downloads
Abstract
Streaming applications transform possibly infinite streams of data and often have both high throughput and low latency requirements. They consist of operator graphs that produce and consume data tuples. The streaming programming model naturally exposes task and pipeline parallelism, enabling it to exploit parallel systems of all kinds, including large clusters. However, it does not naturally expose data parallelism, which must instead be extracted from streaming applications. This paper presents a compiler and runtime system that automatically extract data parallelism for distributed stream processing. Our approach guarantees safety, even in the presence of stateful, selective, and user-defined operators. When constructing parallel regions, the compiler ensures safety by considering an operator's selectivity, state, partitioning, and dependencies on other operators in the graph. The distributed runtime system ensures that tuples always exit parallel regions in the same order they would without data parallelism, using the most efficient strategy as identified by the compiler. Our experiments using 100 cores across 14 machines show linear scalability for standard parallel regions, and near-linear scalability when tuples are shuffled across parallel regions. Copyright © 2012 by the Association for Computing Machinery, Inc. (ACM).
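The ordering guarantee described in the abstract can be illustrated with a small sketch. The code below is not the paper's compiler or runtime; it is a minimal, single-process illustration that assumes a keyed (partitioned) stateful operator producing exactly one output per input. The names `split`, `running_count_channel`, and `merge` are hypothetical, as is the use of sequence numbers as the ordering strategy; the record itself does not say which strategies the runtime uses.

```python
import heapq
import zlib
from collections import defaultdict


def split(tuples, num_channels, key_fn):
    """Tag each tuple with a sequence number and route it by a hash of its key.

    Routing by key keeps all tuples for one key on the same channel, so
    per-key state never has to be shared between channels.
    """
    channels = [[] for _ in range(num_channels)]
    for seq, tup in enumerate(tuples):
        digest = zlib.crc32(str(key_fn(tup)).encode("utf-8"))
        channels[digest % num_channels].append((seq, tup))
    return channels


def running_count_channel(channel):
    """Hypothetical stateful operator: count how often each key has been seen.

    Takes (seq, (key, value)) pairs and emits (seq, (key, value, count)).
    It emits exactly one output per input (selectivity of 1).
    """
    counts = defaultdict(int)
    out = []
    for seq, (key, value) in channel:
        counts[key] += 1
        out.append((seq, (key, value, counts[key])))
    return out


def merge(channel_outputs):
    """Restore the original stream order using the sequence numbers.

    Each channel's output is already sorted by sequence number, so a
    k-way merge is enough; the sequence numbers are dropped afterwards.
    """
    merged = heapq.merge(*channel_outputs, key=lambda pair: pair[0])
    return [tup for _, tup in merged]


if __name__ == "__main__":
    data = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
    channels = split(data, num_channels=3, key_fn=lambda t: t[0])
    print(merge([running_count_channel(ch) for ch in channels]))
```

In this sketch the parallel result matches what a single sequential copy of the operator would produce, because state is partitioned by key and the merge reorders outputs by sequence number. Operators that drop or multiply tuples would need a different ordering strategy, which this example does not attempt to show.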
Keywords
Automatic parallelization
Distributed stream processing
Auto-parallelizing
Automatic Parallelization
Data parallelism
Data tuples
Distributed streaming
Efficient strategy
High throughput
Large clusters
Low latency
Parallel system
Programming models
Runtime systems
Stream processing
Streaming applications
Distributed parameter control systems
Parallel architectures
Program compilers
Permalink
http://hdl.handle.net/11693/28156
Published Version (Please cite this version)
http://dx.doi.org/10.1145/2370816.2370826
Related items
Showing items related by title, author, creator and subject.
- Safe data parallelism for general streaming
  Schneider, S.; Hirzel, M.; Gedik, B.; Wu, Kun-Lung (Institute of Electrical and Electronics Engineers, 2015)
  Streaming applications process possibly infinite streams of data and often have both high throughput and low latency requirements. They are comprised of operator graphs that produce and consume data tuples. General streaming ...
- Tutorial: Stream processing optimizations
  Schneider, S.; Hirzel, M.; Gedik, Buğra (ACM, 2013)
  This tutorial starts with a survey of optimizations for streaming applications. The survey is organized as a catalog that introduces uniform terminology and a common categorization of optimizations across disciplines, such ...
- Elastic scaling for data stream processing
  Gedik, B.; Schneider, S.; Hirzel, M.; Wu, Kun-Lung (IEEE Computer Society, 2014)
  This article addresses the profitability problem associated with auto-parallelization of general-purpose distributed data stream processing applications. Auto-parallelization involves locating regions in the application's ...