Browsing by Subject "Decision trees"

Now showing 1 - 12 of 12

Open Access
Classifying human leg motions with uniaxial piezoelectric gyroscopes
(2009) Tunçel, O.; Altun, K.; Barshan, B.
This paper provides a comparative study of different techniques for classifying human leg motions performed while wearing two low-cost uniaxial piezoelectric gyroscopes on the leg. Several feature sets, extracted from the raw inertial sensor data in different ways, are used in the classification process. The classification techniques implemented and compared in this study are Bayesian decision making (BDM), a rule-based algorithm (RBA) or decision tree, the least-squares method (LSM), the k-nearest neighbor algorithm (k-NN), dynamic time warping (DTW), support vector machines (SVM), and artificial neural networks (ANN). These techniques are compared in terms of their correct differentiation rates, confusion matrices, computational cost, and training and storage requirements. Three different cross-validation techniques are employed to validate the classifiers. The results indicate that BDM generally achieves the highest correct classification rate at a relatively small computational cost. © 2009 by the authors.
Open Access
Energy minimizing vehicle routing problem
(Springer, 2007) Kara, İ.; Kara, Bahar Y.; Yetiş, M. K.
This paper proposes a new cost function, based on distance and the load of the vehicle, for the Capacitated Vehicle Routing Problem. The vehicle routing problem with this new load-based cost objective is called the Energy Minimizing Vehicle Routing Problem (EMVRP). Integer linear programming formulations with O(n²) binary variables and O(n²) constraints are developed separately for the collection and delivery cases. The proposed models are tested and illustrated on classical Capacitated Vehicle Routing Problem (CVRP) instances from the literature using CPLEX 8.0.
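As a rough illustration of what a load-based objective means, the sketch below scores one fixed route by multiplying each arc's distance by the load on board while that arc is traversed. This form of the cost is an assumption (the abstract only says the cost is based on distance and load); the function name `route_energy`, the optional `tare` term for the empty vehicle's weight, and the toy data are hypothetical, and the sketch only evaluates a given route rather than solving the paper's integer programs.

```python
from typing import Dict, List, Sequence


def route_energy(route: List[int],
                 demand: Dict[int, float],
                 dist: Sequence[Sequence[float]],
                 delivery: bool = True,
                 tare: float = 0.0) -> float:
    """Sum of distance * load over the arcs of one route (assumed cost form).

    `route` starts and ends at the depot, e.g. [0, 3, 1, 0]; `demand` is keyed
    by customer only. In the delivery case the vehicle leaves the depot carrying
    the total demand of the route and unloads at each customer; in the
    collection case it leaves empty and picks up at each customer.
    `tare` is a hypothetical constant added to the load for the empty vehicle.
    """
    customers = route[1:-1]
    load = sum(demand[c] for c in customers) if delivery else 0.0
    cost = 0.0
    for i, j in zip(route, route[1:]):
        cost += dist[i][j] * (load + tare)  # energy proxy for arc (i, j)
        if j in demand:  # arrived at a customer: unload or pick up
            load += -demand[j] if delivery else demand[j]
    return cost
```

Unlike a pure distance objective, this cost depends on the visiting order: reversing a route leaves its length unchanged but generally changes how much load is carried over each arc.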
Open Access
Estimating the chance of success in IVF treatment using a ranking algorithm
(Springer, 2015) Güvenir, H. A.; Misirli, G.; Dilbaz, S.; Ozdegirmenci, O.; Demir, B.; Dilbaz, B.
In medicine, estimating the chance of success of a treatment is important when deciding whether to begin it. This paper focuses on the domain of in vitro fertilization (IVF), where estimating the outcome of a treatment is crucial for both clinicians and infertile couples in deciding whether to proceed. IVF treatment is a costly process and very stressful for couples who want to have a baby. If an initial evaluation indicates a low pregnancy rate, the couple may decide not to start IVF treatment. The aim of this study is twofold: first, to develop a technique that can estimate the chance of success for a couple who wants to have a baby, and second, to determine the attributes, and their particular values, that affect the outcome of IVF treatment. We propose a new technique, called success estimation using a ranking algorithm (SERA), for estimating the success of a treatment with a ranking-based algorithm. The particular ranking algorithm used here is RIMARC. The performance of the new algorithm is compared with two well-known algorithms that assign class probabilities to query instances: the Naïve Bayes classifier and Random Forest. The comparison is made in terms of area under the ROC curve, accuracy, and execution time, using tenfold stratified cross-validation. The results indicate that the proposed SERA algorithm has the potential to be used successfully to estimate the probability of success in medical treatment.
Open Access
Expert advice ensemble for thyroid disease diagnosis
(IEEE, 2017) Qureshi, Muhammad Anjum; Ekşioğlu, Kubilay
The thyroid gland influences the metabolic processes of the human body because it produces hormones. Hyperthyroidism is caused by an increase in the production of thyroid hormones. In this paper, a methodology that uses an online ensemble of decision trees to detect thyroid-related diseases is proposed, with the aim of improving the diagnostic accuracy for thyroid disease. First, a feature rejection method is applied to discard 10 irrelevant and redundant features out of 29. It is then shown that an offline ensemble of decision trees provides higher performance than state-of-the-art methodologies. Next, an online ensemble method based on exponential weights is implemented, which reaches classification performance comparable to that of the offline methodology. The proposed system consists of three stages: feature rejection, training decision trees with different cost schemes, and online classification, in which each classifier is weighted using an exponential-weights-based algorithm. The performance of the online algorithm improves as the number of samples increases, because it continuously updates the weights to improve accuracy. The achieved classification accuracy demonstrates the robustness and effectiveness of the online version of the proposed system in thyroid disease diagnosis.
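The abstract does not give the exact weighting rule, so the following is a generic exponential-weights (Hedge-style) ensemble under assumed choices: a fixed learning rate `eta`, 0-1 loss, and a weighted majority vote over the labels output by already-trained classifiers (for example, decision trees trained with different cost schemes). The class and parameter names are hypothetical; this is a sketch of the general technique, not the authors' implementation.

```python
import math
from typing import Dict, Hashable, List, Sequence


class ExponentialWeightsEnsemble:
    """Weighted majority vote over a fixed pool of classifiers ("experts").

    Each expert's weight is proportional to exp(-eta * cumulative 0-1 loss),
    so experts that keep making mistakes are down-weighted over time.
    """

    def __init__(self, n_experts: int, eta: float = 0.5) -> None:
        self.eta = eta
        self.cum_loss = [0.0] * n_experts

    def weights(self) -> List[float]:
        # Shift by the smallest loss before exponentiating to avoid underflow.
        m = min(self.cum_loss)
        raw = [math.exp(-self.eta * (loss - m)) for loss in self.cum_loss]
        total = sum(raw)
        return [r / total for r in raw]

    def predict(self, expert_labels: Sequence[Hashable]) -> Hashable:
        # Add up the weight behind each label and return the heaviest one.
        votes: Dict[Hashable, float] = {}
        for w, label in zip(self.weights(), expert_labels):
            votes[label] = votes.get(label, 0.0) + w
        return max(votes, key=votes.get)

    def update(self, expert_labels: Sequence[Hashable], true_label: Hashable) -> None:
        # Once the true label is revealed, charge each wrong expert a loss of 1.
        for i, label in enumerate(expert_labels):
            if label != true_label:
                self.cum_loss[i] += 1.0
```

With `eta = 0.5`, a single expert that is always right overtakes two agreeing experts that are always wrong after two updates, because its weight relative to each of them is then exp(0.5 · 2) ≈ 2.72, which exceeds 2.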
Open Access
Land cover classification with multi-sensor fusion of partly missing data
(American Society for Photogrammetry and Remote Sensing, 2009-05) Aksoy, S.; Koperski, K.; Tusk, C.; Marchisio, G.
We describe a system that uses decision tree-based tools for seamless acquisition of knowledge for classification of remotely sensed imagery. We concentrate on three important problems in this process: information fusion, model understandability, and handling of missing data. The importance of multi-sensor information fusion and the use of decision tree classifiers for such problems have been well studied in the literature. However, these studies have been limited to cases where all data sources have full coverage of the scene under consideration. Our contribution in this paper is to show how decision tree classifiers can be learned with alternative (surrogate) decision nodes, resulting in models that can deal with missing data during both training and classification, i.e., in cases where one or more measurements do not exist for some locations. We present a detailed performance evaluation of the effectiveness of these classifiers for information fusion and feature selection, and study three different methods for handling missing data in comparative experiments. The results show that surrogate decisions incorporated into decision tree classifiers provide powerful models for fusing information from different data layers while being robust to missing data. © 2009 American Society for Photogrammetry and Remote Sensing.
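The surrogate idea can be sketched at a single tree node in the CART style: pick a threshold on a second feature that best reproduces the primary split's left/right assignment, and fall back to it when the primary measurement is missing. This is an assumed, simplified rendering of surrogate splits in general, not the system described in the paper; the function names and the convention that `None` marks a missing measurement are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Row = Sequence[Optional[float]]  # None marks a missing measurement


def best_surrogate(rows: List[Row], primary: int, threshold: float,
                   candidate: int) -> Tuple[float, bool, float]:
    """Find the threshold on `candidate` that best mimics the primary split.

    Returns (threshold, same_direction, agreement). same_direction=True means
    'candidate <= thr' sends a row left, just as 'primary <= threshold' does.
    Only rows where both features are observed are used.
    """
    pairs = [(r[candidate], r[primary] <= threshold) for r in rows
             if r[primary] is not None and r[candidate] is not None]
    best = (float("nan"), True, -1.0)
    for thr in sorted({v for v, _ in pairs}):
        agree = sum((v <= thr) == left for v, left in pairs) / len(pairs)
        # A split that mostly disagrees is useful with its direction flipped.
        for same, score in ((True, agree), (False, 1.0 - agree)):
            if score > best[2]:
                best = (thr, same, score)
    return best


def goes_left(row: Row, primary: int, threshold: float,
              surrogate: int, s_thr: float, same: bool) -> Optional[bool]:
    """Route a row at one node, falling back to the surrogate if needed."""
    if row[primary] is not None:
        return row[primary] <= threshold
    if row[surrogate] is not None:
        left = row[surrogate] <= s_thr
        return left if same else not left
    return None  # both missing: a full tree would need a further fallback
```

A real tree would keep a ranked list of surrogates per node; a single surrogate keeps the routing logic easy to follow.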
Open Access
Mixed-integer second-order cone programming for lower hedging of American contingent claims in incomplete markets
(2013) Pınar, M. Ç.
We describe a challenging class of large mixed-integer second-order cone programming models that arise in computing the maximum price a buyer is willing to disburse to acquire an American contingent claim in an incomplete financial market with no arbitrage opportunity. Taking the viewpoint of an investor who is willing to allow a controlled amount of risk by replacing the classical no-arbitrage assumption with a "no good-deal assumption", defined using an arbitrage-adjusted Sharpe ratio criterion, we formulate the problem of computing the pricing and hedging of an American option in a financial market described by a multi-period, discrete-time, finite-state scenario tree as a large-scale mixed-integer conic optimization problem. We report computational results with off-the-shelf mixed-integer conic optimization software.
Open Access
Modeling non-stationary dynamics of spatio-temporal sequences with self-organizing point process models
(2021-06) Karaahmetoğlu, Oğuzhan
We investigate the challenging problem of modeling the non-stationary dynamics of spatio-temporal sequences for prediction applications. Spatio-temporal sequence modeling has critical real-life applications such as natural disaster, social, and criminal event prediction. Even though this problem has been thoroughly studied, many approaches do not address the non-stationarity and sparsity of spatio-temporal sequences, which are frequently observed in real-life sequences. Here, we introduce a novel prediction algorithm that is capable of modeling non-stationarity in both time and space. Moreover, our algorithm can model both densely and sparsely populated sequences. We partition the spatial region with a decision tree, where each node of the tree corresponds to a subregion. We model the event occurrences in different subregions in space with individual but interacting point processes. Our algorithm can jointly optimize the partitioning tree and the interacting point processes through gradient-based optimization. We compare our approach with statistical models, probabilistic approaches, and deep learning based approaches, and show that our model achieves the best forecasting performance on real-life datasets such as earthquake and criminal event records.
Open Access
Novelty detection using soft partitioning and hierarchical models
(IEEE, 2017) Ergen, Tolga; Gökçesu, Kaan; Şimşek, Mustafa; Kozat, Süleyman Serdar
In this paper, we study the novelty detection problem and introduce an online algorithm. The algorithm sequentially receives an observation, generates a decision, and then updates its parameters. In the first step, the algorithm constructs a score function to model the underlying distribution. In the second step, this score is thresholded to make the final decision for the observed data. We obtain the score using a versatile and adaptive nested decision tree, employing nested soft decision trees to partition the observation space hierarchically. Based on the sequential performance, we optimize all the components of the tree structure adaptively. Although this adaptation over time provides powerful modeling abilities, it may suffer from overfitting. To circumvent this problem, we use the intermediate nodes of the tree to generate subtrees, which we then combine adaptively. The experiments illustrate that the introduced algorithm significantly outperforms the state-of-the-art methods.
Open Access
Online classification with contextual exponential weights for disease diagnostics
(IEEE, 2017) Ekşioğlu, Kubilay; Qureshi, Muhammad Anjum; Tekin, Cem
In this paper, a novel online classification scheme based on a contextual variant of the Weighted Average Forecaster algorithm is proposed. The proposed method adaptively partitions the data space based on contexts and trades off exploration and exploitation when fusing the predictions of the experts. The proposed algorithm is verified on disease data available in the UCI Machine Learning Repository. The results demonstrate the robustness, effectiveness, and versatility of the proposed system in the field of medical diagnostics, in terms of both performance and low computational cost.
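A minimal sketch of the contextual idea, under stated simplifications: each context cell keeps its own exponentially weighted average forecaster over the experts' predicted probabilities, with squared loss. Here the context is assumed to be a scalar in [0, 1) split into fixed equal-width cells, whereas the paper partitions the space adaptively and also handles exploration, which this sketch omits. The class name and the `n_cells` and `eta` parameters are hypothetical.

```python
import math
from typing import List


class ContextualWeightedAverageForecaster:
    """Exponentially weighted average forecaster, one weight vector per context cell.

    Simplification: the context is a scalar in [0, 1) split into `n_cells`
    fixed, equal-width cells (no adaptive partitioning, no exploration).
    """

    def __init__(self, n_experts: int, n_cells: int = 4, eta: float = 2.0) -> None:
        self.n_cells = n_cells
        self.eta = eta
        self.losses = [[0.0] * n_experts for _ in range(n_cells)]

    def _cell(self, context: float) -> int:
        # Map a context in [0, 1) to its cell index.
        return min(int(context * self.n_cells), self.n_cells - 1)

    def predict(self, context: float, expert_probs: List[float]) -> float:
        # Weights exp(-eta * cumulative loss), using only this cell's history.
        losses = self.losses[self._cell(context)]
        m = min(losses)
        w = [math.exp(-self.eta * (loss - m)) for loss in losses]
        return sum(wi * p for wi, p in zip(w, expert_probs)) / sum(w)

    def update(self, context: float, expert_probs: List[float], label: int) -> None:
        # Squared loss of each expert's probability against the 0/1 label.
        losses = self.losses[self._cell(context)]
        for i, p in enumerate(expert_probs):
            losses[i] += (p - label) ** 2
```

Keeping separate weights per cell lets different experts dominate in different regions of the context space, which a single global weight vector cannot do.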
Open Access
Recognizing daily and sports activities in two open source machine learning environments using body-worn sensor units
(Oxford University Press, 2014-11) Barshan, B.; Yüksek, M. C.
This study provides a comparative assessment of different techniques for classifying human activities performed while wearing inertial and magnetic sensor units on the chest, arms, and legs. The gyroscope, accelerometer, and magnetometer in each unit are tri-axial. The naive Bayesian classifier, artificial neural networks (ANNs), a dissimilarity-based classifier, three types of decision trees, Gaussian mixture models (GMMs), and support vector machines (SVMs) are considered. A feature set extracted from the raw sensor data using principal component analysis is used for classification. Three different cross-validation techniques are employed to validate the classifiers. A performance comparison of the classifiers is provided in terms of their correct differentiation rates, confusion matrices, and computational cost. The highest correct differentiation rates are achieved with ANNs (99.2%), SVMs (99.2%), and a GMM (99.1%). GMMs may be preferable because of their lower computational requirements. Regarding the position of sensor units on the body, those worn on the legs are the most informative. A comparison of the different sensor modalities indicates that if only a single sensor type is used, the highest classification rates are achieved with magnetometers, followed by accelerometers and gyroscopes. The study also compares two commonly used open source machine learning environments (WEKA and PRTools) in terms of their functionality, manageability, classifier performance, and execution times. © The British Computer Society 2013. All rights reserved.
Open Access
Scene classification with random forests and object and color distributions
(IEEE, 2013) İşcen, Ahmet; Gölge, Eren; Armağan, Anıl; Duygulu, Pınar
We propose a method to recognize the scene of an image by finding the objects and colors it contains. We approach this problem by creating a binary vector of detected objects and a histogram of the colors in the image. We then use these features to train a random forest classifier that determines the scene of each image. For class-based classifiers, our method gives results comparable to those of state-of-the-art methods, such as the Object Bank method, on the indoor scene dataset we used. Additionally, whereas well-known methods are computationally expensive, our method has a low computational cost. © 2013 IEEE.
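The two feature types described above might be assembled into one fixed-length vector as follows. This is a sketch under assumptions: object detection itself is out of scope (its output is taken as a list of class names), and the vocabulary, the joint RGB binning, and the function names are hypothetical choices rather than the paper's settings. The resulting vectors could then be passed to any random forest implementation, for example scikit-learn's `RandomForestClassifier`.

```python
from typing import Iterable, List, Sequence, Tuple


def object_vector(detected: Iterable[str], vocabulary: Sequence[str]) -> List[int]:
    """1 if the object class was detected in the image, else 0."""
    found = set(detected)
    return [1 if name in found else 0 for name in vocabulary]


def color_histogram(pixels: Iterable[Tuple[int, int, int]], bins: int = 4) -> List[float]:
    """Joint RGB histogram with `bins` levels per 8-bit channel, normalized to sum to 1."""
    hist = [0.0] * (bins ** 3)
    n = 0
    for r, g, b in pixels:
        # c * bins // 256 maps 0..255 onto 0..bins-1 for any bins >= 1.
        qr, qg, qb = (c * bins // 256 for c in (r, g, b))
        hist[(qr * bins + qg) * bins + qb] += 1
        n += 1
    return [h / n for h in hist] if n else hist


def scene_features(detected: Iterable[str], vocabulary: Sequence[str],
                   pixels: Iterable[Tuple[int, int, int]], bins: int = 4) -> List[float]:
    """Concatenate the binary object vector and the color histogram."""
    objects = [float(v) for v in object_vector(detected, vocabulary)]
    return objects + color_histogram(pixels, bins)
```

Concatenation keeps both cues in a single vector, which suits tree ensembles because each split looks at one feature at a time regardless of its scale.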
Open Access
Sequential churn prediction and analysis of cellular network users: a multi-class, multi-label perspective
(IEEE, 2017) Khan, Farhan; Kozat, Süleyman Serdar
We investigate the problem of churn detection and prediction using sequential cellular network data. We introduce a cleaning and preprocessing procedure that makes the dataset suitable for analysis. We compare the churn prediction results of state-of-the-art algorithms such as Gradient Boosting Trees, Random Forests, basic Long Short-Term Memory (LSTM) networks, and Support Vector Machines (SVM). We achieve a significant performance boost by incorporating the sequential nature of the data, imputing missing information, and analyzing the effects of various features. This, in turn, makes the classifier robust enough to give highly accurate results. We emphasize the sequential nature of the problem and seek algorithms that can track variations in the data. We test and compare the performance of the proposed algorithms on real-life cellular network data for churn detection.