CAPSULE: Language and system support for efficient state sharing in distributed stream processing systems

Date
2012
Source Title
DEBS '12 Proceedings of the 6th ACM International Conference on Distributed Event-Based Systems
Publisher
ACM
Pages
268 - 277
Language
English
Type
Conference Paper
Abstract

Data stream processing applications are often expressed as data flow graphs, composed of operators connected via streams. This structured representation provides a simple yet powerful paradigm for building large-scale, distributed, high-performance applications. However, many tasks require sharing data among operators, and between operators and the runtime, through a less structured mechanism than point-to-point data flows. Examples include updating control variables, sending notifications, collecting metrics, and building collective models. In this paper we describe CAPSULE, which fills this gap. CAPSULE is a code generation and runtime framework that gives developers an easy-to-use and highly flexible way to realize shared variables (the CAPSULE term for shared state) by specifying a data structure (at the programming-language level) and a few associated configuration parameters that qualify the expected usage scenario. Besides ease of use and flexibility, CAPSULE offers the following important benefits: (1) Custom Code Generation - CAPSULE uses user-specified configuration parameters and information from the runtime to generate shared variable servers that are tailored to the specific usage scenario, (2) Composability - CAPSULE supports deployment-time composition of the shared variable servers to achieve desired levels of scalability, performance, and fault tolerance, and (3) Extensibility - CAPSULE provides simple interfaces for extending the framework with more protocols, transports, caching mechanisms, etc. We describe the motivation for CAPSULE and its design, report on its implementation status, and then present experimental results. Copyright © 2012 ACM.

Keywords
Consistency models, Distributed shared state, Stream processing, Caching mechanism, Code generation, Composability, Configuration parameters, Consistency model, Control variable, Data flow, Data stream processing, Deployment time, Flexible framework, High performance applications, Runtimes, Shared variables, Stream processing systems, System supports, Usage scenarios, Data flow analysis, Data flow graphs, Data structures, Fault tolerance, Network components, Software architecture, Distributed parameter control systems