Prioritized binary transformation method for efficient multi-label classification of data streams with many labels

Abstract

Real-time data processing systems generate huge amounts of data that must be classified. The volume, variety, velocity, and veracity (uncertainty) of this data call for new approaches and for adapting existing classification methods. Moreover, an arriving data item can belong to more than one class at the same time. As the number of labels grows, many multi-label data stream classification methods become computationally inefficient. We propose a novel online approach, the Prioritized Binary Transformation (PBT) method, which classifies data with large numbers of labels by ordering the labels using Principal Component Analysis (PCA) within a fixed-size window. This order is then used to transform the label vectors for classification. We perform an empirical analysis on 12 datasets and compare PBT with four prominent baselines using four evaluation metrics. PBT achieves the best average ranking in three of the four metrics. We also evaluate efficiency in terms of average execution time per data item and memory consumption, where PBT achieves the second-best and the best average rankings, respectively. © 2024 Owner/Author.
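
The abstract says only that PBT orders labels with PCA over a fixed-size window and then uses that order to transform label vectors; it does not give the ranking rule or the transformation itself. The following is a minimal sketch under explicit assumptions, not the authors' method: each label is ranked by the absolute loading it receives on the first principal component of the windowed label matrix, and the "transformation" is reduced to reordering a label vector by that priority. `LabelWindow`, `label_order`, and `reorder` are hypothetical names, and the first component is found by plain power iteration so the example needs only the standard library.

```python
import math
from collections import deque
from typing import Deque, List, Sequence


class LabelWindow:
    """Fixed-size sliding window over the most recent binary label vectors."""

    def __init__(self, num_labels: int, size: int) -> None:
        self.num_labels = num_labels
        # deque with maxlen silently evicts the oldest vector when full
        self.rows: Deque[List[int]] = deque(maxlen=size)

    def add(self, labels: Sequence[int]) -> None:
        if len(labels) != self.num_labels:
            raise ValueError("label vector has the wrong length")
        self.rows.append(list(labels))


def covariance(rows: Sequence[Sequence[int]], d: int) -> List[List[float]]:
    """Population covariance matrix (d x d) of the label columns."""
    n = len(rows)
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for r in rows:
        c = [r[j] - means[j] for j in range(d)]
        for a in range(d):
            for b in range(d):
                cov[a][b] += c[a] * c[b] / n
    return cov


def first_component(cov: List[List[float]], iters: int = 200) -> List[float]:
    """Leading eigenvector of a symmetric PSD matrix via power iteration."""
    d = len(cov)
    v = [1.0] * d  # assumes the start vector is not orthogonal to the top eigenvector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance in the window: nothing to rank by
            return [0.0] * d
        v = [x / norm for x in w]
    return v


def label_order(window: LabelWindow) -> List[int]:
    """Hypothetical priority rule (not from the paper): sort by |loading| on PC1."""
    d = window.num_labels
    if not window.rows:
        return list(range(d))
    pc = first_component(covariance(window.rows, d))
    # Larger absolute loading first; ties keep the original label index order.
    return sorted(range(d), key=lambda j: (-abs(pc[j]), j))


def reorder(labels: Sequence[int], order: Sequence[int]) -> List[int]:
    """Permute a label vector so the highest-priority label comes first."""
    return [labels[j] for j in order]


# Usage: labels 1 and 2 always co-occur, so they dominate the first component.
window = LabelWindow(num_labels=3, size=4)
for y in ([0, 1, 1], [0, 0, 0], [0, 1, 1], [1, 0, 0]):
    window.add(y)
order = label_order(window)
print(order, reorder([0, 1, 1], order))  # [1, 2, 0] [1, 1, 0]
```

Bounding the window with `deque(maxlen=...)` keeps memory proportional to the window size regardless of stream length, which fits the fixed-window setting in the abstract. A practical version would likely update the decomposition incrementally rather than rebuilding the covariance matrix on every call.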

Source Title

International Conference on Information and Knowledge Management, Proceedings

Publisher

Association for Computing Machinery

Language

English