Browsing by Subject "Statistical learning"


Open Access
Online anomaly detection with nested trees
(Institute of Electrical and Electronics Engineers Inc., 2016) Delibalta, I.; Gokcesu, K.; Simsek, M.; Baruh, L.; Kozat, S. S.
We introduce an online anomaly detection algorithm that processes data sequentially. At each time step, the algorithm receives a new observation, produces a decision, and then adaptively updates all of its parameters to improve its performance. The algorithm works mainly in an unsupervised manner, since labeling data is costly in most real-life applications. Even so, whenever feedback is available, the algorithm uses it for better adaptation. The algorithm has two stages. In the first stage, it constructs a score function, similar to a probability density function, that models the underlying nominal distribution (if there is one) or fits the observed data. In the second stage, this score function is used to evaluate newly observed data and produce the final decision, which is made with the well-known thresholding approach. We construct the score using a highly versatile and completely adaptive nested decision tree: nested soft decision trees partition the observation space hierarchically. We adaptively optimize every component of the tree, i.e., the decision regions and the probabilistic models at each node as well as the overall structure, based on sequential performance. This extensive in-time adaptation provides strong modeling capabilities; however, it may cause overfitting. To mitigate overfitting, we first use the intermediate nodes of the tree to produce several subtrees, which constitute all the models from the coarsest to the full extent, and then adaptively combine them. Using a real-life dataset, we show that our algorithm significantly outperforms the state of the art. © 1994-2012 IEEE.
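The two-stage loop described above (score the new observation, threshold it, then update, using feedback when it arrives) can be sketched as follows. This is a simplified stand-in, not the authors' algorithm: it assumes scalar data in [0, 1) and replaces the nested soft decision trees with a fixed, hard dyadic partition, treating each depth as one model in a coarse-to-fine family and combining the depths with exponential weights on the sequential log-loss. The class name `NestedHistogramDetector` and the parameters `max_depth`, `threshold`, and `lr` are hypothetical.

```python
import math


class NestedHistogramDetector:
    """Hypothetical sketch of a two-stage online anomaly detector for scalars in [0, 1).

    Depth d splits [0, 1) into 2**d equal bins; each depth acts as one model in a
    coarse-to-fine family, and the models are mixed with exponential weights.
    """

    def __init__(self, max_depth: int = 4, threshold: float = 0.2, lr: float = 0.05):
        self.max_depth = max_depth
        # counts[d][b]: number of past samples that fell into bin b at depth d
        self.counts = [[0] * (2 ** d) for d in range(max_depth + 1)]
        self.n = 0
        # Log-weights of the depth-d models, initially uniform
        self.log_w = [0.0] * (max_depth + 1)
        self.threshold = threshold  # score below this => anomaly
        self.lr = lr                # threshold step size used on feedback

    def _bin(self, d: int, x: float) -> int:
        bins = 2 ** d
        return min(int(x * bins), bins - 1)

    def _density(self, d: int, x: float) -> float:
        # Laplace-smoothed bin probability divided by the bin width (1 / 2**d)
        bins = 2 ** d
        return (self.counts[d][self._bin(d, x)] + 1) / (self.n + bins) * bins

    def score(self, x: float) -> float:
        # Stage 1: mixture density under the current model weights
        m = max(self.log_w)
        w = [math.exp(lw - m) for lw in self.log_w]
        return sum(wi * self._density(d, x) for d, wi in enumerate(w)) / sum(w)

    def step(self, x: float, label=None) -> bool:
        # Stage 2: decide by thresholding the score, before any update
        is_anomaly = self.score(x) < self.threshold
        # Reward each depth by its predictive log-density on x (sequential log-loss)
        for d in range(self.max_depth + 1):
            self.log_w[d] += math.log(self._density(d, x))
        # Then absorb x into every depth's bin counts
        for d in range(self.max_depth + 1):
            self.counts[d][self._bin(d, x)] += 1
        self.n += 1
        # Optional feedback (True = anomaly): nudge the threshold after a mistake
        if label is not None and label != is_anomaly:
            if label:
                self.threshold += self.lr
            else:
                self.threshold = max(0.0, self.threshold - self.lr)
        return is_anomaly


detector = NestedHistogramDetector()
for i in range(200):
    detector.step(0.45 + 0.01 * (i % 10))  # nominal data clustered around 0.5
print(detector.step(0.95))  # True: far from the nominal cluster
print(detector.step(0.5))   # False: inside the nominal cluster
```

Mixing all depths lets the sketch rely on coarse models while the fine bins hold few counts and shift weight toward finer partitions as evidence accumulates; this mirrors, in a much simpler form, the coarse-to-full subtree combination described in the abstract.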
Open Access
Two learning approaches for protein name extraction
(Academic Press, 2009) Tatar, S.; Cicekli, I.
Protein name extraction, one of the basic tasks in the automatic extraction of information from biological texts, remains challenging. In this paper, we explore two different machine learning techniques and present the results of our experiments. The first method uses a bigram language model to extract protein names. The second uses an automatic rule learning method that can identify protein names in biological texts. In both cases, we generalize protein names using hierarchically categorized syntactic token types. We conducted our experiments on two different datasets. Our first method, based on the bigram language model, achieved an F-score of 67.7% on the YAPEX dataset and 66.8% on the GENIA corpus. The rule learning method obtained an F-score of 61.8% on the YAPEX dataset and 61.0% on the GENIA corpus. The results of the comparative experiments demonstrate that both techniques are applicable to automatic protein name extraction, a prerequisite for the large-scale processing of biomedical literature. © 2009 Elsevier Inc. All rights reserved.
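As a rough illustration of how a bigram language model over generalized token types can separate name-like spans from ordinary text, the sketch below trains two add-one smoothed bigram models over token-type sequences, one on protein names and one on other text, and labels a candidate span by whichever model gives it the higher likelihood. This is an assumed simplification, not the authors' extraction procedure: the one-level type scheme in `token_type`, the class `BigramTypeLM`, the likelihood comparison in `is_protein`, and the toy training spans are all hypothetical, whereas the paper uses hierarchically categorized token types and real corpora (YAPEX, GENIA).

```python
import math
from collections import Counter

# Hypothetical one-level token-type scheme; the paper's hierarchy is richer.
TYPES = ["Digit", "UpperChar", "AllCaps", "AlphaNum", "InitCap", "Lower", "Other"]
END = "</s>"


def token_type(tok: str) -> str:
    """Map a token to a coarse syntactic type (hypothetical categories)."""
    if tok.isdigit():
        return "Digit"
    if tok.isalpha() and tok.isupper():
        return "UpperChar" if len(tok) == 1 else "AllCaps"
    if tok.isalnum() and any(c.isdigit() for c in tok):
        return "AlphaNum"
    if tok.isalpha() and tok[0].isupper():
        return "InitCap"
    if tok.isalpha() and tok.islower():
        return "Lower"
    return "Other"


class BigramTypeLM:
    """Add-one smoothed bigram language model over token-type sequences."""

    def __init__(self, spans):
        self.context = Counter()  # count of each symbol used as the left context
        self.pairs = Counter()    # count of each (left, right) bigram
        for span in spans:
            seq = ["<s>"] + [token_type(t) for t in span] + [END]
            for prev, cur in zip(seq, seq[1:]):
                self.context[prev] += 1
                self.pairs[(prev, cur)] += 1

    def prob(self, prev: str, cur: str) -> float:
        # Possible next symbols: every type plus the end marker.
        v = len(TYPES) + 1
        return (self.pairs[(prev, cur)] + 1) / (self.context[prev] + v)

    def log_prob(self, span) -> float:
        seq = ["<s>"] + [token_type(t) for t in span] + [END]
        return sum(math.log(self.prob(p, c)) for p, c in zip(seq, seq[1:]))


# Toy training spans (illustrative only, not taken from YAPEX or GENIA).
protein_spans = [["p53"], ["IL", "2"], ["NF", "kappa", "B"], ["TNF"], ["Bcl", "2"]]
other_spans = [["the", "cells"], ["was", "observed"], ["in", "vitro"], ["This", "study"]]

protein_lm = BigramTypeLM(protein_spans)
other_lm = BigramTypeLM(other_spans)


def is_protein(span) -> bool:
    """Label a candidate span by comparing its likelihood under the two models."""
    return protein_lm.log_prob(span) > other_lm.log_prob(span)


print(is_protein(["CD4"]))             # True: AlphaNum shape, seen in protein names
print(is_protein(["the", "protein"]))  # False: Lower Lower shape, typical of plain text
```

Generalizing tokens to types is what lets the model score `CD4` as name-like even though that exact token never appears in training; only its `AlphaNum` shape does.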