dc.contributor.author: Schneider, S. [en_US]
dc.contributor.author: Hirzel, M. [en_US]
dc.contributor.author: Gedik, B. [en_US]
dc.contributor.author: Wu, Kun-Lung [en_US]
dc.date.accessioned: 2016-02-08T10:02:43Z
dc.date.available: 2016-02-08T10:02:43Z
dc.date.issued: 2015 [en_US]
dc.identifier.issn: 0018-9340
dc.identifier.uri: http://hdl.handle.net/11693/22636
dc.description.abstract: Streaming applications process possibly infinite streams of data and often have both high throughput and low latency requirements. They are comprised of operator graphs that produce and consume data tuples. General streaming applications use stateful, selective, and user-defined operators. The stream programming model naturally exposes task and pipeline parallelism, enabling it to exploit parallel systems of all kinds, including large clusters. However, data parallelism must either be manually introduced by programmers, or extracted as an optimization by compilers. Previous data parallel optimizations did not apply to selective, stateful and user-defined operators. This article presents a compiler and runtime system that automatically extracts data parallelism for general stream processing. Data-parallelization is safe if the transformed program has the same semantics as the original sequential version. The compiler forms parallel regions while considering operator selectivity, state, partitioning, and graph dependencies. The distributed runtime system ensures that tuples always exit parallel regions in the same order they would without data parallelism, using the most efficient strategy as identified by the compiler. Our experiments using 100 cores across 14 machines show linear scalability for parallel regions that are computation-bound, and near linear scalability when tuples are shuffled across parallel regions. [en_US]
dc.language.iso: English [en_US]
dc.source.title: IEEE Transactions on Computers [en_US]
dc.relation.isversionof: http://dx.doi.org/10.1109/TC.2013.221 [en_US]
dc.subject: Data processing [en_US]
dc.subject: Distributed computing [en_US]
dc.subject: Data handling [en_US]
dc.subject: Distributed computer systems [en_US]
dc.subject: Parallel programming [en_US]
dc.subject: Scalability [en_US]
dc.subject: Semantics [en_US]
dc.subject: Data parallelism [en_US]
dc.subject: Data parallelization [en_US]
dc.subject: Distributed runtime [en_US]
dc.subject: Efficient strategy [en_US]
dc.subject: Pipeline parallelisms [en_US]
dc.subject: Stream processing [en_US]
dc.subject: Stream programming [en_US]
dc.subject: Streaming applications [en_US]
dc.subject: Program compilers [en_US]
dc.title: Safe data parallelism for general streaming [en_US]
dc.type: Article [en_US]
dc.department: Department of Computer Engineering
dc.citation.spage: 504 [en_US]
dc.citation.epage: 517 [en_US]
dc.citation.volumeNumber: 64 [en_US]
dc.citation.issueNumber: 2 [en_US]
dc.identifier.doi: 10.1109/TC.2013.221 [en_US]
dc.publisher: Institute of Electrical and Electronics Engineers [en_US]
dc.identifier.eissn: 1557-9956