Aggregate profile clustering for streaming analytics

dc.citation.epage: 2108 [en_US]
dc.citation.issueNumber: 9 [en_US]
dc.citation.spage: 2092 [en_US]
dc.citation.volumeNumber: 58 [en_US]
dc.contributor.author: Abbasoğlu, M. A. [en_US]
dc.contributor.author: Gedik, B. [en_US]
dc.contributor.author: Ferhatosmanoğlu, H. [en_US]
dc.date.accessioned: 2016-02-08T11:02:36Z
dc.date.available: 2016-02-08T11:02:36Z
dc.date.issued: 2015 [en_US]
dc.department: Department of Computer Engineering [en_US]
dc.description.abstract: Many analytic applications require analyzing user interaction data. In particular, such data can be aggregated over a window to build user activity profiles. Clustering such aggregate profiles is useful for grouping together users with similar behaviors, so that common models could be built for them. In this paper, we present an approach for clustering profiles that are incrementally maintained over a stream of updates. Owing to the potentially large number of users and high rate of interactions, maintaining profile clusters can have high processing and memory resource requirements. To tackle this problem, we apply distributed stream processing. However, in the presence of distributed state, it is a major challenge to partition the profiles over nodes such that memory and computation balance is maintained, while keeping the clustering accuracy high. Furthermore, in order to adapt to potentially changing user interaction patterns, the partitioning of profiles to nodes should be continuously revised, yet one should minimize the migration of profiles so as not to disturb the online processing of updates. We develop a re-partitioning technique that achieves all these goals. To achieve this, we keep micro-cluster summaries at each node and periodically collect these summaries at a central node to perform re-partitioning. We use a greedy algorithm with novel affinity heuristics to revise the partitioning and update the routing tables without introducing a lengthy pause. We showcase the effectiveness of our approach using an application that clusters customers of a telecommunications company based on their aggregate calling profiles. [en_US]
dc.identifier.doi: 10.1093/comjnl/bxv023 [en_US]
dc.identifier.issn: 0010-4620
dc.identifier.uri: http://hdl.handle.net/11693/26631
dc.language.iso: English [en_US]
dc.publisher: Oxford University Press [en_US]
dc.relation.isversionof: http://dx.doi.org/10.1093/comjnl/bxv023 [en_US]
dc.source.title: The Computer Journal [en_US]
dc.subject: Aggregate profile clustering [en_US]
dc.subject: Data streaming [en_US]
dc.subject: Distributed clustering [en_US]
dc.subject: Distributed parameter control systems [en_US]
dc.subject: Clustering accuracy [en_US]
dc.subject: Distributed state [en_US]
dc.subject: Distributed stream processing [en_US]
dc.subject: Greedy algorithms [en_US]
dc.subject: Online processing [en_US]
dc.subject: Partitioning techniques [en_US]
dc.subject: Aggregates [en_US]
dc.title: Aggregate profile clustering for streaming analytics [en_US]
dc.type: Article [en_US]
Files
Original bundle
Name: Aggregate profile clustering for streaming analytics.pdf
Size: 1.08 MB
Format: Adobe Portable Document Format
Description: Full Printable Version