Scalable streaming profile clustering for telco analytics

Author
Abbasoğlu, Mehmet Ali
Advisor
Güdükbay, Uğur
Date
2013
Publisher
Bilkent University
Language
English
Type
Thesis
Please cite this item using this persistent URL
http://hdl.handle.net/11693/15857
Abstract
Many telco analytics require maintaining call profiles based on recent customer
call patterns. Such profiles are typically organized as aggregations computed
at different time scales over the recent customer interactions. Clustering these
profiles is needed to group customers with similar calling patterns and to build
aggregate models for them. Example applications include optimizing tariffs, segmentation,
and usage forecasting. In this thesis, we present an approach for
clustering profiles that are incrementally maintained over a stream of updates.
Due to the large number of customers, maintaining profile clusters has high
processing and memory resource requirements. In order to tackle this problem,
we apply distributed stream processing. However, in the presence of distributed
state, it is a major challenge to partition the profiles over machines (nodes) such
that memory and computation balance is maintained, while keeping the clustering
accuracy high. Furthermore, to adapt to potentially changing customer
calling patterns, the partitioning of profiles to machines should be continuously
revised, yet one should minimize the migration of profiles so as not to disturb
the online processing of updates. We provide a re-partitioning technique that
achieves all these goals. We keep micro-cluster summaries at each node, collect
these summaries at a centralized node, and use a greedy algorithm with novel
affinity heuristics to revise the partitioning. We present a demo application that
showcases our Storm- and HBase-based implementation in the context of a customer
segmentation application.
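
The re-partitioning step summarized in the abstract can be illustrated with a small sketch. The abstract does not specify the thesis's micro-cluster layout or its affinity heuristics, so everything below is an assumption made for illustration: the MicroCluster fields, the affinity score (a bonus for staying on the current node plus closeness to the centroid of profiles already placed on a node), the slack parameter, and the function names are all hypothetical, not the author's method. The sketch only shows how a central node might greedily reassign summaries while trading off load balance, clustering locality, and migration cost.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MicroCluster:
    """Additive summary of a group of profiles (hypothetical layout)."""
    home: int                 # node that currently holds these profiles
    n: int                    # number of profiles summarized
    linear_sum: List[float]   # per-dimension sum of the profile vectors

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.linear_sum]


def sq_dist(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def affinity(mc: MicroCluster, node: int, node_n: int,
             node_sum: Optional[List[float]]) -> float:
    """Assumed heuristic: reward staying put and being near the node's centroid."""
    stay = 1.0 if mc.home == node else 0.0
    if node_n == 0:
        return stay
    node_centroid = [s / node_n for s in node_sum]
    return stay + 1.0 / (1.0 + sq_dist(mc.centroid(), node_centroid))


def repartition(clusters: List[MicroCluster], num_nodes: int,
                slack: float = 1.1) -> List[int]:
    """Greedy pass run at the central node; returns a node id per micro-cluster."""
    capacity = slack * sum(mc.n for mc in clusters) / num_nodes
    load = [0] * num_nodes
    sums: List[Optional[List[float]]] = [None] * num_nodes
    assignment = [-1] * len(clusters)
    # Place the largest summaries first, since they are the hardest to fit.
    for i in sorted(range(len(clusters)), key=lambda i: -clusters[i].n):
        mc = clusters[i]
        fits = [k for k in range(num_nodes) if load[k] + mc.n <= capacity]
        if fits:
            # Highest affinity wins; ties go to the less loaded node.
            best = max(fits, key=lambda k: (affinity(mc, k, load[k], sums[k]), -load[k]))
        else:
            # Nothing fits under the cap: fall back to the least loaded node.
            best = min(range(num_nodes), key=lambda k: load[k])
        assignment[i] = best
        load[best] += mc.n
        if sums[best] is None:
            sums[best] = list(mc.linear_sum)
        else:
            sums[best] = [a + b for a, b in zip(sums[best], mc.linear_sum)]
    return assignment


# Node 0 starts overloaded (90 of 110 profiles); node 1 holds 20.
clusters = [
    MicroCluster(home=0, n=40, linear_sum=[0.0, 0.0]),      # centroid (0, 0)
    MicroCluster(home=0, n=30, linear_sum=[300.0, 300.0]),  # centroid (10, 10)
    MicroCluster(home=0, n=20, linear_sum=[0.0, 20.0]),     # centroid (0, 1)
    MicroCluster(home=1, n=20, linear_sum=[200.0, 180.0]),  # centroid (10, 9)
]
assignment = repartition(clusters, num_nodes=2)
migrated = sum(mc.n for mc, k in zip(clusters, assignment) if k != mc.home)
print(assignment, migrated)  # [0, 1, 0, 1] 30
```

In this toy run only the 30-profile summary near (10, 10) moves, joining the similar summary already on node 1, and the resulting loads are 60 and 50 profiles. A real deployment would also have to account for per-profile update rates and memory footprint, which this sketch ignores.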