Abbasoğlu, M.A.Gedik, B.Ferhatosmanoğlu H.2016-02-082016-02-08201321508097http://hdl.handle.net/11693/20873Many telco analytics require maintaining call profiles based on recent customer call patterns. Such call profiles are typically organized as aggregations computed at different time scales over the recent customer interactions. Customer call profiles are key inputs for analytics targeted at improving operations, marketing, and sales of telco providers. Many of these analytics require clustering customer call profiles, so that customers with similar calling patterns can be modeled as a group. Example applications include optimizing tariffs, customer segmentation, and usage forecasting. In this demo, we present our system for scalable aggregate profile clustering in a streaming setting. We focus on managing anonymized segments of customers for tariff optimization. Due to the large number of customers, maintaining profile clusters have high processing and memory resource requirements. In order to tackle this problem, we apply distributed stream processing. However, in the presence of distributed state, it is a major challenge to partition the profiles over machines (nodes) such that memory and computation balance is maintained, while keeping the clustering accuracy high. Furthermore, to adapt to potentially changing customer calling patterns, the partitioning of profiles to machines should be continuously revised, yet one should minimize the migration of profiles so as not to disturb the online processing of updates. We provide a re-partitioning technique that achieves all these goals. We keep micro-cluster summaries at each node, collect these summaries at a centralize node, and use a greedy algorithm with novel affinity heuristics to revise the partitioning. We present a demo that showcases our Storm and Hbase based implementation of the proposed solution in the context of a customer segmentation application. © 2013 VLDB Endowment.EnglishClustering accuracyClustering customersCustomer interactionCustomer segmentationDifferent time scaleDistributed stream processingGreedy algorithmsOnline processingAggregatesDistributed parameter control systemsOptimizationSalesAggregate profile clustering for telco analyticsArticle