Browsing by Subject "Data communication systems"

Open Access
Discriminative fine-grained mixing for adaptive compression of data streams
(Institute of Electrical and Electronics Engineers, 2014) Gedik, B.
This paper introduces an adaptive compression algorithm for transfer of data streams across operators in stream processing systems. The algorithm is adaptive in the sense that it can adjust the amount of compression applied based on the bandwidth, CPU, and workload availability. It is discriminative in the sense that it can judiciously apply partial compression by selecting a subset of attributes that can provide good reduction in the used bandwidth at a low cost. The algorithm relies on the significant differences that exist among stream attributes with respect to their relative sizes, compression ratios, compression costs, and their amenability to application of custom compressors. As part of this study, we present a modeling of uniform and discriminative mixing, and provide various greedy algorithms and associated metrics to locate an effective setting when model parameters are available at run-time. Furthermore, we provide online and adaptive algorithms for real-world systems in which system parameters that can be measured at run-time are limited. We present a detailed experimental study that illustrates the superiority of discriminative mixing over uniform mixing. © 2013 IEEE.
Open Access
Diversity and novelty in web search, recommender systems and data streams
(Association for Computing Machinery, 2014-02) Santos, R. L. T.; Castells, P.; Altingovde, I. S.; Can, Fazlı
This tutorial aims to provide a unifying account of current research on diversity and novelty in the domains of web search, recommender systems, and data stream processing.
Open Access
Elastic scaling for data stream processing
(IEEE Computer Society, 2014) Gedik, B.; Schneider, S.; Hirzel, M.; Wu, Kun-Lung
This article addresses the profitability problem associated with auto-parallelization of general-purpose distributed data stream processing applications. Auto-parallelization involves locating regions in the application's data flow graph that can be replicated at run-time to apply data partitioning, in order to achieve scale. In order to make auto-parallelization effective in practice, the profitability question needs to be answered: How many parallel channels provide the best throughput? The answer to this question changes depending on the workload dynamics and resource availability at run-time. In this article, we propose an elastic auto-parallelization solution that can dynamically adjust the number of channels used to achieve high throughput without unnecessarily wasting resources. Most importantly, our solution can handle partitioned stateful operators via run-time state migration, which is fully transparent to the application developers. We provide an implementation and evaluation of the system on an industrial-strength data stream processing platform to validate our solution. © 1990-2012 IEEE.
Open Access
Generic windowing support for extensible stream processing systems
(John Wiley & Sons Ltd., 2014) Gedik, B.
Stream processing applications process high volume, continuous feeds from live data sources, employ data-in-motion analytics to analyze these feeds, and produce near real-time insights with low latency. One of the fundamental characteristics of such applications is the on-the-fly nature of the computation, which does not require access to disk resident data. Stream processing applications store the most recent history of streams in memory and use it to perform the necessary modeling and analysis tasks. This recent history is often managed using windows. All data stream management systems provide some form of windowing functionality. Windowing makes it possible to implement streaming versions of the traditionally blocking relational operators, such as streaming aggregations, joins, and sorts, as well as any other analytic operator that requires keeping the most recent tuples as state, such as time series analysis operators and signal processing operators. In this paper, we provide a categorization of different window types and policies employed in stream processing applications and give detailed operational semantics for various window configurations. We describe an extensibility mechanism that makes it possible to integrate windowing support into user-defined operators, enabling consistent syntax and semantics across system-provided and third-party toolkits of streaming operators. We describe the design and implementation of a runtime windowing library that significantly simplifies the construction of window-based operators by decoupling the handling of window policies and operator logic from each other. We present our experience using the windowing library to implement a relational operators toolkit and compare the efficacy of the solution to an earlier implementation that did not employ a common windowing library. Copyright © 2013 John Wiley & Sons, Ltd.
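As a rough illustration of the windowing idea described in this abstract, the sketch below keeps the most recent tuples in memory and turns a normally blocking aggregation (a mean) into a streaming one. It is not the paper's windowing library: the names `SlidingCountWindow` and `windowed_means` are hypothetical, and a count-based sliding window with one-tuple eviction is assumed as one simple policy among the many the paper categorizes.

```python
from collections import deque


class SlidingCountWindow:
    """Hypothetical count-based sliding window holding the most recent `size` values."""

    def __init__(self, size):
        if size <= 0:
            raise ValueError("size must be positive")
        self.size = size
        self.items = deque()   # recent history of the stream, oldest first
        self.total = 0.0       # running sum of the values currently in the window

    def insert(self, value):
        """Add a value, evict the oldest once the window exceeds its size,
        and return the mean of the current window contents."""
        self.items.append(value)
        self.total += value
        if len(self.items) > self.size:
            self.total -= self.items.popleft()
        return self.total / len(self.items)


def windowed_means(stream, size):
    """Return the streaming mean emitted after each input value."""
    window = SlidingCountWindow(size)
    return [window.insert(v) for v in stream]
```

Keeping the eviction logic inside the window class, separate from the aggregation that consumes it, loosely mirrors the decoupling of window policy from operator logic that the abstract mentions.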
Open Access
Implementing the Han-Kobayashi scheme using low density parity check codes over Gaussian interference channels
(Institute of Electrical and Electronics Engineers Inc., 2015) Sharifi, S.; Tanc, A. K.; Duman, T. M.
We focus on Gaussian interference channels (GICs) and study the Han-Kobayashi coding strategy for the two-user case with the objective of designing implementable (explicit) channel codes. Specifically, low-density parity-check codes are adopted for use over the channel, their benefits are studied, and suitable codes are designed. Iterative joint decoding is used at the receivers, where independent and identically distributed channel adapters are used to prove that log-likelihood-ratios exchanged among the nodes of the Tanner graph enjoy symmetry when BPSK or QPSK with Gray coding is employed. This property is exploited in the proposed code optimization algorithm adopting a random perturbation technique. Code optimization and convergence threshold computations are carried out for different GICs employing finite constellations by tracking the average mutual information. Furthermore, stability conditions for the admissible degree distributions under strong and weak interference levels are determined. Via examples, it is observed that the optimized codes using BPSK or QPSK with Gray coding operate close to the capacity boundary for strong interference. For the case of weak interference, it is shown that nontrivial rate pairs are achievable via the newly designed codes, which are not possible by single user codes with time sharing. Performance of the designed codes is also studied for finite block lengths through simulations of specific codes picked with the optimized degree distributions with random constructions, where, for one instance, the results are compared with those of some structured designs. © 1972-2012 IEEE.
Open Access
Joint source-channel coding and guessing with application to sequential decoding
(Institute of Electrical and Electronics Engineers, 1998-09) Arikan, E.; Merhav, N.
We extend our earlier work on guessing subject to distortion to the joint source-channel coding context. We consider a system in which there is a source connected to a destination via a channel and the goal is to reconstruct the source output at the destination within a prescribed distortion level with respect to (w.r.t.) some distortion measure. The decoder is a guessing decoder in the sense that it is allowed to generate successive estimates of the source output until the distortion criterion is met. The problem is to design the encoder and the decoder so as to minimize the average number of estimates until successful reconstruction. We derive estimates on nonnegative moments of the number of guesses, which are asymptotically tight as the length of the source block goes to infinity. Using the close relationship between guessing and sequential decoding, we give a tight lower bound to the complexity of sequential decoding in joint source-channel coding systems, complementing earlier works by Koshelev and Hellman. Another topic explored here is the probability of error for list decoders with exponential list sizes for joint source-channel coding systems, for which we obtain tight bounds as well. It is noteworthy that optimal performance w.r.t. the performance measures considered here can be achieved in a manner that separates source coding and channel coding.
Open Access
The optimal electromagnetic carrier frequency balancing structural and metrical information densities with respect to heat removal requirements
(1992) Özaktaş, Haldun M.; Goodman, J. W.
The use of higher electromagnetic carrier frequencies for communication in a computing system results in both increased spatial information density and larger available modulation bandwidth. However, assuming that the communication energies are dissipated, the heat that must be removed from unit volume per unit time increases quickly with higher frequencies, resulting in a maximum useful frequency based on our limited ability to remove heat. We show that this frequency is relatively insensitive to system specific parameters and estimate its order of magnitude to lie near the infrared and visible bands of the spectrum. © 1992.
Open Access
Pipelined fission for stream programs with dynamic selectivity and partitioned state
(Academic Press, 2016) Gedik, B.; Özsema, H. G.; Öztürk, Ö.
There is an ever increasing rate of digital information available in the form of online data streams. In many application domains, high throughput processing of such data is a critical requirement for keeping up with the soaring input rates. Data stream processing is a computational paradigm that aims at addressing this challenge by processing data streams in an on-the-fly manner, in contrast to the more traditional and less efficient store-and-then-process approach. In this paper, we study the problem of automatically parallelizing data stream processing applications in order to improve throughput. The parallelization is automatic in the sense that stream programs are written sequentially by the application developers and are parallelized by the system. We adopt the asynchronous data flow model for our work, which is typical in Data Stream Processing Systems (DSPS), where operators often have dynamic selectivity and are stateful. We solve the problem of pipelined fission, in which the original sequential program is parallelized by taking advantage of both pipeline parallelism and data parallelism at the same time. Our pipelined fission solution supports partitioned stateful data parallelism with dynamic selectivity and is designed for shared-memory multi-core machines. We first develop a cost-based formulation that enables us to express pipelined fission as an optimization problem. The brute-force solution of this problem takes a long time for moderately sized stream programs. Accordingly, we develop a heuristic algorithm that can quickly, but approximately, solve the pipelined fission problem. We provide an extensive evaluation studying the performance of our pipelined fission solution, including simulations as well as experiments with an industrial-strength DSPS. Our results show good scalability for applications that contain sufficient parallelism, as well as close to optimal performance for the heuristic pipelined fission algorithm.
Open Access
Reducing Router-Crossings in a Mobile Intranet
(Springer, 1998) Korpeoglu, I.; Dube, R.; Tripathi, S. K.
Current general-purpose mobility solutions like Mobile-IP involve multiple router-crossings even when the mobile host moves within an intranet from one subnet of a router to another. An environment consisting of a large number of mobile hosts would congest the router, causing hosts to experience high latency and jitter. This paper presents a mechanism to eliminate multiple router-crossings in a mobile intranet by making the routers aware of mobility, which reduces the load on the routers and the hand-off and data latency at the mobile hosts.
Open Access
Test case verification by model checking
(Kluwer Academic Publishers, 1993) Naik, K.; Sarikaya, B.
Verification of a test case for testing the conformance of protocol implementations against the formal description of the protocol involves verifying three aspects of the test case: expected input/output test behavior, test verdicts, and the test purpose. We model the safety and liveness properties of a test case using branching time temporal logic. There are four types of safety properties: transmission safety, reception safety, synchronization safety, and verdict safety. We model a test purpose as a liveness property and give a set of notations to formally specify a test purpose. All these properties expressed as temporal formulas are verified using model checking on an extended state machine graph representing the composed behavior of a test case and protocol specification. This methodology is shown to be effective in finding errors in manually developed conformance test suites. © 1993 Kluwer Academic Publishers.
Open Access
A theoretical framework on the ideal number of classifiers for online ensembles in data streams
(ACM, 2016-10) Bonab, Hamed R.; Can, Fazlı
A priori determining the ideal number of component classifiers of an ensemble is an important problem. The volume and velocity of big data streams make this even more crucial in terms of prediction accuracies and resource requirements. There is a limited number of studies addressing this problem for batch mode and none for online environments. Our theoretical framework shows that using the same number of independent component classifiers as class labels gives the highest accuracy. We prove the existence of an ideal number of classifiers for an ensemble, using the weighted majority voting aggregation rule. In our experiments, we use two state-of-the-art online ensemble classifiers with six synthetic and six real-world data streams. The violation of providing independent component classifiers for our theoretical framework makes determining the exact ideal number of classifiers nearly impossible. We suggest upper bounds for the number of classifiers that gives the highest accuracy. An important implication of our study is that comparing online ensemble classifiers should be done based on these ideal values, since comparing based on a fixed number of classifiers can be misleading. © 2016 ACM.
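For readers unfamiliar with the weighted majority voting aggregation rule named in this abstract, a minimal sketch follows. It is not the authors' framework or code: the function name `weighted_majority_vote` is hypothetical, the weights are assumed to be given (for example from past accuracy), and ties are assumed to go to the label seen first.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights):
    """Combine component classifier outputs for one instance.

    predictions: one predicted class label per component classifier.
    weights: one non-negative weight per component classifier (assumed given).
    Returns the label with the largest total weight; on a tie, the label
    that appeared first in `predictions` wins.
    """
    if len(predictions) != len(weights):
        raise ValueError("predictions and weights must have the same length")
    if not predictions:
        raise ValueError("at least one prediction is required")

    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight  # accumulate the weight behind each label

    # dicts preserve insertion order and max() returns the first maximum,
    # which gives the first-seen tie-breaking described above
    return max(scores, key=scores.get)
```

With equal weights this reduces to plain majority voting; unequal weights let a single strong classifier outvote several weak ones.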