Browsing by Subject "Data flow graphs"

Now showing 1 - 2 of 2

Open Access
CAPSULE: Language and system support for efficient state sharing in distributed stream processing systems
(ACM, 2012) Losa, G.; Kumar, V.; Andrade, H.; Gedik, Buğra; Hirzel, M.; Soulé, R.; Wu, K. -L.
Data stream processing applications are often expressed as data flow graphs, composed of operators connected via streams. This structured representation provides a simple yet powerful paradigm for building large-scale, distributed, high-performance applications. However, there are many tasks that require sharing data across operators, and across operators and the runtime using a less structured mechanism than point-to-point data flows. Examples include updating control variables, sending notifications, collecting metrics, building collective models, etc. In this paper we describe CAPSULE, which fills this gap. CAPSULE is a code generation and runtime framework that offers an easy to use and highly flexible framework for developers to realize shared variables (CAPSULE term for shared state) by specifying a data structure (at the programming-language level), and a few associated configuration parameters that qualify the expected usage scenario. Besides the easy of use and flexibility, CAPSULE offers the following important benefits: (1) Custom Code Generation - CAPSULE makes use of user-specified configuration parameters and information from the runtime to generate shared variable servers that are tailored for the specific usage scenario, (2) Composability - CAPSULE supports deployment time composition of the shared variable servers to achieve desired levels of scalability, performance and fault-tolerance, and (3) Extensibility - CAPSULE provides simple interfaces for extending the CAPSULE framework with more protocols, transports, caching mechanisms, etc. We describe the motivation for CAPSULE and its design, report on its implementation status, and then present experimental results. Copyright © 2012 ACM.
Open Access
Elastic scaling for data stream processing
(IEEE Computer Society, 2014) Gedik, B.; Schneider S.; Hirzel M.; Wu, Kun-Lung
This article addresses the profitability problem associated with auto-parallelization of general-purpose distributed data stream processing applications. Auto-parallelization involves locating regions in the application's data flow graph that can be replicated at run-time to apply data partitioning, in order to achieve scale. In order to make auto-parallelization effective in practice, the profitability question needs to be answered: How many parallel channels provide the best throughput? The answer to this question changes depending on the workload dynamics and resource availability at run-time. In this article, we propose an elastic auto-parallelization solution that can dynamically adjust the number of channels used to achieve high throughput without unnecessarily wasting resources. Most importantly, our solution can handle partitioned stateful operators via run-time state migration, which is fully transparent to the application developers. We provide an implementation and evaluation of the system on an industrial-strength data stream processing platform to validate our solution. © 1990-2012 IEEE.