Data analytics for alarm management systems
Solmaz, Selçuk Emre
Mobile network operators run Operations Support Systems (OSS) that produce vast numbers of alarm events. These events differ in significance level and domain, and one alarm can trigger others. Network operators face the challenge of identifying the significance and root causes of these system problems in real time while keeping the number of remedial actions at an optimal level, so that customer satisfaction rates can be guaranteed at a reasonable cost. This work describes a solution that combines alarm correlation, rule mining, and root cause analysis to support scalable streaming alarm management systems. The solution is applied to the Alarm Collector and Analyzer (ALACA), which is operated in the network operation center of a major mobile telecom provider. ALACA is used for alarm event analysis, in which alarms are correlated and processed in a streaming fashion to find root causes. The developed system includes a dynamic index for matching active alarms, an algorithm for generating candidate alarm rules, a sliding-window-based approach to save system resources, and a graph-based solution to identify root causes. ALACA helps operators enhance the design of their alarm management systems by allowing continuous analysis of data and event streams, and helps them predict network behavior with respect to potential failures by using the results of root cause analysis. Experimental results that provide insights into the performance of real-time alarm data analytics systems are presented.
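
To make the sliding-window and graph-based ideas concrete, the following is a minimal Python sketch and not ALACA's actual implementation, which the abstract does not detail. It assumes alarms arrive as time-ordered `(timestamp, alarm_type)` pairs. The function names `mine_candidate_rules` and `root_causes`, the confidence measure, and the "no incoming edge" root-cause criterion are illustrative assumptions.

```python
from collections import Counter, deque


def mine_candidate_rules(events, window, min_confidence):
    """Return {(a, b): confidence} for candidate rules "a is followed by b".

    events: iterable of (timestamp, alarm_type), sorted by timestamp.
    window: maximum time gap between a and b (same unit as the timestamps).
    Hypothetical helper for illustration only.
    """
    recent = deque()         # (timestamp, alarm_type, followers already counted)
    occurrences = Counter()  # number of times each alarm type was seen
    followed_by = Counter()  # (a, b): occurrences of a followed by b in the window

    for ts, alarm in events:
        # Evict alarms that fell out of the sliding window to bound memory.
        while recent and ts - recent[0][0] > window:
            recent.popleft()
        occurrences[alarm] += 1
        for _, earlier, followers in recent:
            # Count each earlier occurrence at most once per follower type,
            # so the confidence below can never exceed 1.
            if earlier != alarm and alarm not in followers:
                followers.add(alarm)
                followed_by[(earlier, alarm)] += 1
        recent.append((ts, alarm, set()))

    rules = {}
    for (a, b), count in followed_by.items():
        confidence = count / occurrences[a]
        if confidence >= min_confidence:
            rules[(a, b)] = confidence
    return rules


def root_causes(rules):
    """Alarm types that trigger others but are never triggered themselves.

    Treats the rules as edges of a directed graph; this simplification
    ignores cycles, which a production system would have to handle.
    """
    sources = {a for a, _ in rules}
    targets = {b for _, b in rules}
    return sources - targets
```

Because only alarms inside the window are retained, memory use depends on the window length and the alarm rate, not on the total length of the stream. A caller would pass the mined rules to `root_causes` to obtain the alarm types that appear only on the left-hand side of a rule.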