Browsing by Subject "information retrieval"
Now showing 1 - 2 of 2
- Results Per Page
- Sort Options
Item Open Access Div-blast: Diversification of sequence search results(Public Library of Science, 2014) Eser, E.; Can, T.; Ferhatosmanoglu H.Sequence similarity tools, such as BLAST, seek sequences most similar to a query from a database of sequences. They return results significantly similar to the query sequence and that are typically highly similar to each other. Most sequence analysis tasks in bioinformatics require an exploratory approach, where the initial results guide the user to new searches. However, diversity has not yet been considered an integral component of sequence search tools for this discipline. Some redundancy can be avoided by introducing non-redundancy during database construction, but it is not feasible to dynamically set a level of non-redundancy tailored to a query sequence. We introduce the problem of diverse search and browsing in sequence databases that produce non-redundant results optimized for any given query. We define diversity measures for sequences and propose methods to obtain diverse results extracted from current sequence similarity search tools. We also propose a new measure to evaluate the diversity of a set of sequences that is returned as a result of a sequence similarity query. We evaluate the effectiveness of the proposed methods in post-processing BLAST and PSIBLAST results. We also assess the functional diversity of the returned results based on available Gene Ontology annotations. Additionally, we include a comparison with a current redundancy elimination tool, CD-HIT. Our experiments show that the proposed methods are able to achieve more diverse yet significant result sets compared to static non-redundancy approaches. In both sequencebased and functional diversity evaluation, the proposed diversification methods significantly outperform original BLAST results and other baselines. A web based tool implementing the proposed methods, Div-BLAST, can be accessed at cedar.cs.bilkent.edu.tr/Div-BLAST © 2014 Eser et al.Item Open Access On-line new event detection and tracking in a multi-resource environment(2001) Kurt, HakanAs the amount of electronically available information resources increase, the need for information also increases. Today, it is almost impossible for a person to keep track all the information resources and find new events as soon as possible. In this thesis, we present an on-line new event detection and tracking system, which automatically detects new events from multiple news resources and immediately start tracking events as they evolve. Since we implemented the on-line version of event detection approach, the novelty decision about a news story is done before processing the next one. We also present a new threshold, called support threshold, used in detection process to decrease the number of new event alarms, that are caused by informative and one-time-only news. The support threshold can be used to tune the weights of news resources. We implemented the tracking phase as an unsupervised learning process, that is, detected events are automatically tracked by training the system using the first news story of an event. Since events evolve over time, an unsupervised adaptation is used to retrain the tracking system in order to increase the tracking system performance. Adaptation is achieved by adding predicted documents to the training process. From the corpus observations, we conclude that one news story can be associated to more than one event. For this reason, the tracking system can relate a news story to more than one event. The on-line new event detection and tracking system has been tested on the Reuters news feed, available on the Internet. The Reuters news feed, that we used, comprises four independent news resources. The news stories are in Turkish.