Browsing by Subject "Data streaming"
Now showing 1 - 2 of 2
- Results Per Page
- Sort Options
Item Open Access Aggregate profile clustering for streaming analytics(Oxford University Press, 2015) Abbasoğlu, M. A.; Gedk, B.; Ferhatosmanoğu H.Many analytic applications require analyzing user interaction data. In particular, such data can be aggregated over a window to build user activity profiles. Clustering such aggregate profiles is useful for grouping together users with similar behaviors, so that common models could be built for them. In this paper, we present an approach for clustering profiles that are incrementally maintained over a stream of updates. Owing to the potentially large number of users and high rate of interactions, maintaining profile clusters can have high processing and memory resource requirements. To tackle this problem, we apply distributed stream processing. However, in the presence of distributed state, it is a major challenge to partition the profiles over nodes such that memory and computation balance is maintained, while keeping the clustering accuracy high. Furthermore, in order to adapt to potentially changing user interaction patterns, the partitioning of profiles to nodes should be continuously revised, yet one should minimize the migration of profiles so as not to disturb the online processing of updates. We develop a re-partitioning technique that achieves all these goals. To achieve this, we keep micro-cluster summaries at each node and periodically collect these summaries at a central node to perform re-partitioning. We use a greedy algorithm with novel affinity heuristics to revise the partitioning and update the routing tables without introducing a lengthy pause. We showcase the effectiveness of our approach using an application that clusters customers of a telecommunications company based on their aggregate calling profiles.Item Open Access A disk-based graph database system with incremental storage layout optimization(2016-01) Çağatay, DoğukanThe world has become ever more connected, where the data generated by people, software systems, and the physical world is more accessible than before and is much larger in volume, variety, and velocity. In many application domains, such as telecommunications and social media, live data recording the relationships between people, systems, and the environment is available. This data often takes the form of a temporally evolving graph, where entities are the vertices and the relationships between them are the edges. For this reason, managing dynamic relationships represented by a graph structure has been a common requirement for modern data processing applications. Graph databases gained importance with the proliferation of such data processing applications. In this work, we developed a disk-based graph database system, which is able to manage incremental updates on the graph structure. The updates arrive in a streaming manner and the system creates and maintains an optimized storage layout for the graph in an incremental way. This optimized storage layout enables the database to support traversal based graph algorithms more e ciently, by minimizing the disk I/O required to execute them. The storage layout optimizations we develop aim at taking advantage of spatial locality of edges to minimize the traversal I/O cost, but achieves this in an incremental way as new edges/vertices get added and removed.