A discretization method based on maximizing the area under receiver operating characteristic curve
dc.citation.epage | 26 | en_US |
dc.citation.issueNumber | 1 | en_US |
dc.citation.spage | 1 | en_US |
dc.citation.volumeNumber | 27 | en_US |
dc.contributor.author | Kurtcephe, M. | en_US |
dc.contributor.author | Güvenir, H. A. | en_US |
dc.date.accessioned | 2016-02-08T09:41:01Z | |
dc.date.available | 2016-02-08T09:41:01Z | |
dc.date.issued | 2013 | en_US |
dc.department | Department of Computer Engineering | en_US |
dc.description.abstract | Many machine learning algorithms require the features to be categorical. Hence, they require all numeric-valued data to be discretized into intervals. In this paper, we present a new discretization method based on the area under the receiver operating characteristic (ROC) curve (AUC) measure. Maximum area under ROC curve-based discretization (MAD) is a global, static and supervised discretization method. MAD uses the sorted order of the continuous values of a feature and discretizes the feature in such a way that the AUC based on that feature is maximized. The proposed method is compared with alternative discretization methods such as ChiMerge, Entropy-Minimum Description Length Principle (MDLP), Fixed Frequency Discretization (FFD), and Proportional Discretization (PD). FFD and PD have been recently proposed and are designed for Naïve Bayes learning. Like MAD, ChiMerge is a merging discretization method. Evaluations are performed in terms of M-Measure, an AUC-based metric for multi-class classification, and accuracy values obtained from Naïve Bayes and Aggregating One-Dependence Estimators (AODE) algorithms by using real-world datasets. Empirical results show that MAD is a strong candidate as an alternative to other discretization methods. © 2013 World Scientific Publishing Company. | en_US |
dc.description.provenance | Made available in DSpace on 2016-02-08T09:41:01Z (GMT). No. of bitstreams: 1 bilkent-research-paper.pdf: 70227 bytes, checksum: 26e812c6f5156f83f0e77b261a471b5a (MD5) Previous issue date: 2013 | en |
dc.identifier.doi | 10.1142/S021800141350002X | en_US |
dc.identifier.issn | 0218-0014 | |
dc.identifier.uri | http://hdl.handle.net/11693/21091 | |
dc.language.iso | English | en_US |
dc.publisher | World Scientific Publishing Co. Pte. Ltd. | en_US |
dc.relation.isversionof | http://dx.doi.org/10.1142/S021800141350002X | en_US |
dc.source.title | International Journal of Pattern Recognition and Artificial Intelligence | en_US |
dc.subject | Area under ROC curve | en_US |
dc.subject | Data mining | en_US |
dc.subject | Discretization | en_US |
dc.subject | Area under ROC curve (AUC) | en_US |
dc.subject | Discretization method | en_US |
dc.title | A discretization method based on maximizing the area under receiver operating characteristic curve | en_US |
dc.type | Article | en_US |
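
The abstract's central idea, that the AUC obtained from a single feature depends on how that feature is cut into intervals, can be illustrated with a small sketch. This is not the authors' MAD algorithm, whose details are not given in this record. It assumes binary labels (1 = positive, 0 = negative), counts ties as half a win, and uses hypothetical names (`feature_auc`, `discretize`) and made-up sample data.

```python
from bisect import bisect_right


def feature_auc(values, labels):
    """AUC obtained by ranking instances on one numeric feature.

    Computed as the fraction of (positive, negative) pairs in which the
    positive instance has the larger value; ties count as 0.5.
    Assumes binary labels: 1 = positive, 0 = negative.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one instance of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def discretize(values, cut_points):
    """Map each value to the index of its interval.

    cut_points must be sorted in ascending order; a value equal to a
    cut point falls into the interval to its right.
    """
    return [bisect_right(cut_points, v) for v in values]


# Hypothetical toy data, for illustration only.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
labels = [0, 0, 1, 0, 1, 1]

auc_raw = feature_auc(values, labels)
auc_one_cut = feature_auc(discretize(values, [2.5]), labels)
auc_two_cuts = feature_auc(discretize(values, [2.5, 4.5]), labels)

print(auc_raw, auc_one_cut, auc_two_cuts)
```

On this toy data the single cut at 2.5 lowers the AUC relative to the raw values, while the two cuts at 2.5 and 4.5 raise it, because the one misordered pair (3.0 positive, 4.0 negative) becomes a tie. A method that searches for AUC-maximizing cut points is exploiting this kind of dependence.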
Files
Original bundle
- Name: A discretization method based on maximizing the area under receiver operating characteristic curve.pdf
- Size: 358.37 KB
- Format: Adobe Portable Document Format
- Description: Full printable version