Safe data parallelism for general streaming
dc.citation.epage | 517 | en_US |
dc.citation.issueNumber | 2 | en_US |
dc.citation.spage | 504 | en_US |
dc.citation.volumeNumber | 64 | en_US |
dc.contributor.author | Schneider, S. | en_US |
dc.contributor.author | Hirzel, M. | en_US |
dc.contributor.author | Gedik, B. | en_US |
dc.contributor.author | Wu, Kun-Lung | en_US |
dc.date.accessioned | 2016-02-08T10:02:43Z | |
dc.date.available | 2016-02-08T10:02:43Z | |
dc.date.issued | 2015 | en_US |
dc.department | Department of Computer Engineering | en_US |
dc.description.abstract | Streaming applications process possibly infinite streams of data and often have both high throughput and low latency requirements. They consist of operator graphs that produce and consume data tuples. General streaming applications use stateful, selective, and user-defined operators. The stream programming model naturally exposes task and pipeline parallelism, enabling it to exploit parallel systems of all kinds, including large clusters. However, data parallelism must either be introduced manually by programmers or extracted as an optimization by compilers. Previous data-parallel optimizations did not apply to selective, stateful, and user-defined operators. This article presents a compiler and runtime system that automatically extracts data parallelism for general stream processing. Data parallelization is safe if the transformed program has the same semantics as the original sequential version. The compiler forms parallel regions while considering operator selectivity, state, partitioning, and graph dependencies. The distributed runtime system ensures that tuples always exit parallel regions in the same order they would without data parallelism, using the most efficient strategy as identified by the compiler. Our experiments using 100 cores across 14 machines show linear scalability for parallel regions that are computation-bound, and near-linear scalability when tuples are shuffled across parallel regions. | en_US |
dc.description.provenance | Made available in DSpace on 2016-02-08T10:02:43Z (GMT). No. of bitstreams: 1 bilkent-research-paper.pdf: 70227 bytes, checksum: 26e812c6f5156f83f0e77b261a471b5a (MD5) Previous issue date: 2015 | en |
dc.identifier.doi | 10.1109/TC.2013.221 | en_US |
dc.identifier.eissn | 1557-9956 | en_US |
dc.identifier.issn | 0018-9340 | en_US |
dc.identifier.uri | http://hdl.handle.net/11693/22636 | en_US |
dc.language.iso | English | en_US |
dc.publisher | Institute of Electrical and Electronics Engineers | en_US |
dc.relation.isversionof | http://dx.doi.org/10.1109/TC.2013.221 | en_US |
dc.source.title | IEEE Transactions on Computers | en_US |
dc.subject | Data processing | en_US |
dc.subject | Distributed computing | en_US |
dc.subject | Data handling | en_US |
dc.subject | Distributed computer systems | en_US |
dc.subject | Parallel programming | en_US |
dc.subject | Scalability | en_US |
dc.subject | Semantics | en_US |
dc.subject | Data parallelism | en_US |
dc.subject | Data parallelization | en_US |
dc.subject | Distributed runtime | en_US |
dc.subject | Efficient strategy | en_US |
dc.subject | Pipeline parallelisms | en_US |
dc.subject | Stream processing | en_US |
dc.subject | Stream programming | en_US |
dc.subject | Streaming applications | en_US |
dc.subject | Program compilers | en_US |
dc.title | Safe data parallelism for general streaming | en_US |
dc.type | Article | en_US |
Files
Original bundle
- Name: Safe data parallelism for general streaming.pdf
- Size: 1.48 MB
- Format: Adobe Portable Document Format
- Description: Full printable version