Ranking instances by maximizing the area under ROC curve
Author
Guvenir, H. A.
Kurtcephe, M.
Date
2013
Source Title
IEEE Transactions on Knowledge & Data Engineering
Print ISSN
1041-4347
Publisher
Institute of Electrical and Electronics Engineers
Volume
25
Issue
10
Pages
2356 - 2366
Language
English
Type
Article
Abstract
In recent years, the problem of learning a real-valued function that induces a ranking over an instance space has gained importance in the machine learning literature. Here, we propose a supervised algorithm that learns a ranking function, called ranking instances by maximizing the area under the ROC curve (RIMARC). Since the area under the ROC curve (AUC) is a widely accepted performance measure for evaluating the quality of a ranking, the algorithm aims to maximize the AUC value directly. For a single categorical feature, we show the necessary and sufficient condition that any ranking function must satisfy to achieve the maximum AUC. We also sketch a method to discretize a continuous feature so that it, too, reaches the maximum AUC. RIMARC uses a heuristic to extend this maximization to all features of a data set. The ranking function learned by the RIMARC algorithm is in a human-readable form; therefore, it provides valuable information to domain experts for decision making. The performance of RIMARC is evaluated on many real-life data sets against different state-of-the-art algorithms. Evaluations of the AUC metric show that RIMARC achieves significantly better performance compared to other similar methods. © 1989-2012 IEEE.
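The single-feature claim in the abstract can be illustrated with a small Python sketch. It is not the RIMARC algorithm, and the abstract does not spell out the condition it proves. The sketch assumes one standard scoring rule, giving each categorical value the fraction of its instances that are positive, and checks by brute force that no strict ordering of the categories yields a higher AUC on a toy data set. The function names `auc` and `positive_ratio_scores` and the data are hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import permutations


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tied pair counts as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def positive_ratio_scores(values, labels):
    """Assumed scoring rule: score of a category = share of positives in it."""
    total = Counter(values)
    positives = Counter(v for v, y in zip(values, labels) if y == 1)
    return {v: positives[v] / total[v] for v in total}


# Toy data, made up for illustration: one categorical feature, binary labels.
values = ["a", "a", "a", "b", "b", "b", "c", "c", "c", "c"]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 1]

score_of = positive_ratio_scores(values, labels)
ratio_auc = auc([score_of[v] for v in values], labels)

# Brute force over every strict ordering of the categories; a category's
# position in the ordering is used as its score.
best_auc = max(
    auc([order.index(v) for v in values], labels)
    for order in permutations(sorted(set(values)))
)

print(ratio_auc, best_auc)
```

On this toy data the two printed values agree, which is consistent with the idea that ordering categories by their positive ratio is one way to reach the maximum AUC for a single categorical feature.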
Keywords
Data mining
Decision support
Machine learning
Ranking
Area under ROC curve (AUC)
Categorical features
Machine learning literature
Real-valued functions
State-of-the-art algorithms
Decision support systems
Information retrieval
Learning systems
Algorithms