Regression by selecting best feature(s)
Güvenir, Halil Altay
Two new machine learning methods, Regression by Selecting Best Feature Projections (RSBFP) and Regression by Selecting Best Features (RSBF), are presented for regression problems. These methods make heavy use of least squares regression to induce eager, parametric and context-sensitive models. Well-known regression approaches from the machine learning and statistics literature, such as DART, MARS, RULE and kNN, cannot construct models that are both predictive and reasonably fast to train and/or query. We developed RSBFP and RSBF to fill this gap in the literature with a regression method that offers higher predictive accuracy and shorter training and querying times. RSBFP constructs a decision list consisting of simple linear regression lines belonging to linear features and/or categorical feature segments. RSBF extends RSBFP: its decision list consists of simple linear regression lines, which belong to categorical feature segments, and/or multiple linear regression lines, which belong to linear features. A relevancy heuristic has been developed to determine the features involved in the multiple regression lines. It is shown that the proposed methods are robust to irrelevant features, missing feature values and target feature noise, which makes them suitable prediction tools for real-world databases. In terms of robustness, RSBFP and RSBF give better results than other well-known regression methods.
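To make the idea of a decision list of simple linear regression lines more concrete, the following is a minimal Python sketch. It is not the RSBFP algorithm itself. The abstract does not say how lines are ordered or how missing values are handled, so the sketch assumes that lines are ranked by training mean squared error and that a line is skipped when its feature value is missing (`None`). Categorical feature segments and the relevancy heuristic are omitted, and all function names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x; also returns training MSE."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0  # constant feature: fall back to the mean
    intercept = mean_y - slope * mean_x
    mse = sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys)) / n
    return intercept, slope, mse


def build_decision_list(rows, targets):
    """Fit one line per feature and order the lines by training error.

    Assumption of this sketch: rows with a missing value (None) for a
    feature are ignored when fitting that feature's line.
    """
    rules = []
    for j in range(len(rows[0])):
        pairs = [(r[j], t) for r, t in zip(rows, targets) if r[j] is not None]
        if len(pairs) < 2:
            continue
        xs, ys = zip(*pairs)
        intercept, slope, mse = fit_line(xs, ys)
        rules.append((mse, j, intercept, slope))
    rules.sort()  # lowest training error first
    default = sum(targets) / len(targets)  # used when no line applies
    return rules, default


def predict(rules, default, query):
    """Use the first line in the list whose feature is present in the query."""
    for _mse, j, intercept, slope in rules:
        if query[j] is not None:
            return intercept + slope * query[j]
    return default


# Toy data: the target depends only on feature 0; feature 1 is irrelevant.
rows = [[0.0, 1.0], [1.0, 0.0], [2.0, 3.0], [3.0, 1.0], [4.0, 2.0]]
targets = [1.0, 3.0, 5.0, 7.0, 9.0]
rules, default = build_decision_list(rows, targets)
print(predict(rules, default, [3.0, None]))
```

On this toy data the line for feature 0 fits exactly and is placed first, so a query that carries a value for feature 0 is answered by that line regardless of the irrelevant feature. When every feature value is missing, the sketch falls back to the mean of the training targets.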