Browsing by Subject "data mining"
Now showing 1 - 6 of 6
Item Open Access
Big-data streaming applications scheduling based on staged multi-armed bandits (Institute of Electrical and Electronics Engineers, 2016)
Kanoun, K.; Tekin, C.; Atienza, D.; Van Der Schaar, M.
Several techniques have recently been proposed to adapt Big-Data streaming applications to existing many-core platforms. Among these techniques, online reinforcement learning methods have been proposed that learn how to adapt, at run-time, the throughput and resources allocated to the various streaming tasks depending on dynamically changing data stream characteristics and the desired application performance (e.g., accuracy). However, most state-of-the-art techniques consider only a single input stream in their application model and assume that the system knows the amount of resources to allocate to each task to achieve a desired performance. To address these limitations, in this paper we propose a new systematic and efficient methodology and associated algorithms for online learning and energy-efficient scheduling of Big-Data streaming applications with multiple streams on many-core systems with resource constraints. We formalize the problem of multi-stream scheduling as a staged decision problem in which the performance obtained for various resource allocations is unknown. The proposed scheduling methodology uses a novel class of online adaptive learning techniques which we refer to as staged multi-armed bandits (S-MAB). Our scheduler is able to learn online which processing method to assign to each stream and how to allocate its resources over time in order to maximize the performance on the fly, at run-time, without having access to any offline information.
The proposed scheduler, applied to a face detection streaming application and without using any offline information, achieves performance similar to that of an optimal semi-online solution that has full knowledge of the input stream: the differences in throughput, observed quality, resource usage, and energy efficiency are less than 1, 0.3, 0.2, and 4 percent, respectively.

Item Open Access
Characteristics of Web-based textual communications (2012)
Küçükyılmaz, Tayfun
In this thesis, we analyze different aspects of Web-based textual communications and argue that all such communications share some common properties. In order to provide practical evidence for the validity of this argument, we focus on two common properties by examining them on various types of Web-based textual communications data. These properties are: all Web-based communications contain features attributable to their author and receiver; and all Web-based communications exhibit similar heavy-tailed distributional properties. In order to provide practical proof for the validity of our claims, we present three practical, real-life research problems and exploit the proposed common properties of Web-based textual communications to find practical solutions to these problems. In this work, we first provide a feature-based result caching framework for real-life search engines. To this end, we mined attributes from user queries in order to classify queries and estimate a quality metric for making admission and eviction decisions for the query result cache. Second, we analyzed messages of an online chat server in order to predict user and message attributes. Our results show that several user- and message-based attributes can be predicted with significant accuracy using both chat message- and writing-style-based features of the chat users. Third, we provide a parallel framework for in-memory construction of term-partitioned inverted indexes.
In this work, in order to minimize the total communication time between processors, we provide a bucketing scheme that is based on term-based distributional properties of Web page contents.

Item Open Access
Development of a WEB application (2011)
Kaya, Koray Doğan
MicroRNAs, small non-coding RNA molecules with important roles in cellular machinery, target mRNAs for silencing by binding generally to their 3’ UTR sequences via partial base complementation. Thus, microRNAs with similar sequences might also exhibit expression and/or functional similarities. In this study, a modular tool, mESAdb (http://konulab.fen.bilkent.edu.tr/mirna/), was developed allowing for multivariate analysis of sequences and expression of microRNAs from multiple taxa. Its framework comprises PHP, JavaScript, packages in the R language, and a database storing mature microRNA sequences along with microRNA targets and selected expression data sets for human, mouse and zebrafish. mESAdb allows for: (i) mining of microRNA expression data sets for subsets of microRNAs selected manually or by a sequence motif; (ii) pair-wise multivariate analysis of expression data sets within and between taxa; and (iii) association of microRNA subsets with annotation databases, HuGE Navigator, KEGG and GO. mESAdb also permits user-specified dataset upload for these analyses. Herein, the utility of mESAdb was illustrated using different datasets and case studies. First, it was shown that microRNAs carrying the embryonic stem cell-specific seed sequence, ‘AAGTGC’, were able to discriminate between normal and tumor tissues from hepatocellular carcinoma patients using dataset GSE10694. Second, mRNA targets of a set of liver-specific microRNAs were annotated with human diseases based on HuGE Navigator. Third, the similarity between mouse and human tissue specificity of a given set of microRNAs was demonstrated.
Fourth, CHRNA5-targeting microRNAs were associated with estrogen receptor status in breast cancer using dataset GSE15885. Finally, a related tool under development for mRNA arrays, planned for integration with mESAdb, was presented.

Item Open Access
Modeling interestingness of streaming association rules as a benefit maximizing classification problem (2009)
Aydın, Tolga
In a typical application of association rule learning from market basket data, a set of transactions for a fixed period of time is used as input to rule learning algorithms. For example, the well-known Apriori algorithm can be applied to learn a set of association rules from such a transaction set. However, learning association rules from a set of transactions is not a one-time process. For example, a market manager may perform the association rule learning process once every month over the set of transactions collected during the previous month. For this reason, we consider the problem where transaction sets are input to the system as a stream of packages. The sets of transactions may come in varying sizes and at varying intervals. Once a set of transactions arrives, the association rule learning algorithm is run on the latest set of transactions, resulting in a new set of association rules. Therefore, the set of learned association rules accumulates and grows over time, making it impractical for human experts to pick out the interesting rules from this growing set. We refer to this sequence of rules as “association rule set stream” or “streaming association rules”, and the main motivation behind this research is to develop a technique to overcome the interesting rule selection problem. A successful association rule mining system should select and present only the interesting rules to the domain experts. However, the definition of interestingness of association rules in a given domain usually differs from one expert to another and also changes over time for a given expert.
In this thesis, we propose a post-processing method to learn a subjective model for the interestingness concept description of the streaming association rules. The uniqueness of the proposed method is its ability to formulate the interestingness issue of association rules as a benefit-maximizing classification problem and obtain a different interestingness model for each user. In this new classification scheme, the determining features are the selective objective interestingness factors, including the rule’s content itself, related to the interestingness of the association rules; and the target feature is the interestingness label of those rules. The proposed method works incrementally and employs user interactivity at a certain level. It is evaluated on a real supermarket dataset. The results show that the model can successfully select the interesting rules.

Item Open Access
Predicting risk of mortality in patients undergoing cardiovascular surgery (2008)
Tunca, Ayşen
It is very important to inform patients and their relatives about the risk of mortality before a cardiovascular operation. To this end, a model called EuroSCORE (The European System for Cardiac Operative Risk Evaluation) has been developed by European cardiovascular surgeons. This system gives the risk of mortality during or within 30 days after the operation, based on the values of some parameters measured before the operation. The model used by EuroSCORE has been developed from statistical data gathered from a large number of operations performed in Europe. Even though recently developed surgical techniques have reduced the risk of mortality to a large extent, predicting that risk as accurately as possible is still a primary concern for patients and their relatives in cardiovascular operations. The risk of operation also essentially tells the surgeon how a patient with similar comorbidity would be expected to fare under standard care.
The patient's risk is also important for health insurance companies, both public and private. In the context of this project, a model that can be used for mortality prediction is developed. In this research project, a database system for storing data about cardiovascular operations performed in Turkish hospitals, a web application for gathering data, and a machine learning system on this database to learn a risk model, similar to EuroSCORE, are developed. This thesis proposes a risk estimation system for predicting the risk of mortality in patients undergoing cardiovascular operations by maximizing the Area under the Receiver Operating Characteristic (ROC) Curve (AUC). When the genetic characteristics and lifestyles of Turkish patients are taken into consideration, it is highly probable that the mortality risks of Turkish patients differ from those of European patients. This thesis also intends to investigate this issue.

Item Open Access
Using a data mining approach for the prediction of user movements in mobile environments (2003)
Yavaş, Gökhan
Mobility prediction is one of the most essential issues that need to be explored for mobility management in mobile computing systems. In this thesis, we propose a new algorithm for predicting the next inter-cell movement of a mobile user in a Personal Communication Systems network. In the first phase of our three-phase algorithm, user mobility patterns are mined from the history of mobile user trajectories. In the second phase, mobility rules are extracted from these patterns, and in the last phase, mobility predictions are accomplished by using these rules. The performance of the proposed algorithm is evaluated through simulation and compared with two other prediction methods. The performance results obtained in terms of Precision and Recall indicate that our method can make more accurate predictions than the other methods.
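
The three phases named in the last abstract (mine patterns, extract rules, predict) can be illustrated with a small Python sketch. This is not the thesis's algorithm, whose details the abstract does not give. It assumes, purely for illustration, that a pattern is a contiguous run of cell IDs, that support is the number of trajectories containing the pattern, and that prediction uses the longest matching rule. The function names, thresholds, and sample trajectories are all hypothetical.

```python
from collections import Counter

def mine_patterns(trajectories, min_support, max_len=3):
    """Phase 1 (assumed form): keep contiguous cell sequences that occur
    in at least min_support trajectories."""
    counts = Counter()
    for traj in trajectories:
        seen = set()
        for n in range(1, max_len + 1):
            for i in range(len(traj) - n + 1):
                seen.add(tuple(traj[i:i + n]))
        counts.update(seen)  # each trajectory counts once per pattern
    return {p: c for p, c in counts.items() if c >= min_support}

def extract_rules(patterns, min_confidence):
    """Phase 2 (assumed form): turn each pattern into head -> next cell,
    with confidence = support(pattern) / support(head)."""
    rules = {}
    for pattern, support in patterns.items():
        if len(pattern) < 2:
            continue
        head, next_cell = pattern[:-1], pattern[-1]
        # The head of a frequent pattern is itself frequent, so it is in patterns.
        confidence = support / patterns[head]
        if confidence >= min_confidence:
            rules.setdefault(head, []).append((next_cell, confidence))
    return rules

def predict_next(rules, recent_cells):
    """Phase 3 (assumed form): match the longest suffix of the user's recent
    cells against rule heads and return the most confident next cell."""
    for n in range(len(recent_cells), 0, -1):
        head = tuple(recent_cells[-n:])
        if head in rules:
            return max(rules[head], key=lambda r: r[1])[0]
    return None  # no rule applies

# Hypothetical trajectory history: each list is one user's sequence of cells.
history = [["A", "B", "C"], ["A", "B", "C"], ["A", "B", "D"], ["B", "C", "E"]]
patterns = mine_patterns(history, min_support=2)
rules = extract_rules(patterns, min_confidence=0.5)
print(predict_next(rules, ["A", "B"]))
```

Returning `None` when no rule matches keeps the "no prediction" case explicit, which matters when predictions are scored with Precision and Recall as in the abstract.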