C-Stream: A coroutine-based elastic stream processing engine
Stream processing is a computational paradigm for on-the-fly processing of live data. This paradigm lends itself to implementations that can provide high throughput and low latency, by taking advantage of various forms of parallelism that is naturally captured by the stream processing model of computation, such as pipeline, task, and data parallelism. In this thesis, we describe the design and implementation of C-Stream, which is an elastic stream processing engine. C-Stream encompasses three unique properties. First, in contrast to the widely adopted event-based interface for developing stream processing operators, C-Stream provides an interface wherein each operator has its own control loop and rely on data availability APIs to decide when to perform its computations. The self-control based model significantly simplifies development of operators that require multi-port synchronization. Second, C-Stream contains a multi-threaded dynamic scheduler that manages the execution of the operators. The scheduler, which is customizable via plug-ins, enables the execution of the operators as co-routines, using any number of threads. The base scheduler implements back-pressure, provides data availability APIs, and manages preemption and termination handling. Last, C-Stream provides elastic parallelization. It can dynamically adjust the number of threads used to execute an application, and can also adjust the number of replicas of data-parallel operators to resolve bottlenecks. We provide an experimental evaluation of C-Stream. The results show that C-Stream is scalable, highly customizable, and can resolve bottlenecks by dynamically adjusting the level of data parallelism used.