Diverse sequence search and alignment
Author
Eser, Elif
Advisor
Ferhatosmanoğlu, Hakan
Date
2013
Publisher
Bilkent University
Language
English
Type
Thesis
Abstract
Sequence similarity tools, such as BLAST, seek sequences from a database most
similar to a query. They return results significantly similar to the query sequence
that are typically also highly similar to each other. Most sequence analysis tasks
in bioinformatics require an exploratory approach where the initial results guide
the user to new searches. However, diversity has not yet been considered an
integral component of sequence search tools. Repetitions in the results can be
avoided by introducing non-redundancy during database construction; however,
it is not feasible to dynamically set a level of non-redundancy tailored to a query
sequence. We introduce the problem of diverse search and browsing in sequence
databases that produces non-redundant results optimized for any given query. We
define diversity measures for sequences, and propose methods to obtain diverse
results extracted from current sequence similarity search tools. We propose a new
measure to evaluate the diversity of a set of sequences that is returned as a result
of a similarity query. We evaluate the effectiveness of the proposed methods in
post-processing PSI-BLAST results. We also assess the functional diversity of the
returned results based on available Gene Ontology annotations. Our experiments
show that the proposed methods are able to achieve more diverse yet similar result
sets compared to static non-redundancy approaches. In both sequence based and
functional diversity evaluation, the proposed diversification methods outperform
original BLAST results significantly. We built an online diverse sequence search
tool, Div-BLAST, that supports queries using BLAST web services. It re-ranks
the results diversely according to given parameters.
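
The idea of re-ranking search hits for diversity can be illustrated with a small sketch. The abstract does not specify the thesis's actual measures or algorithms, so the code below is only an assumption: it uses a generic greedy, maximal-marginal-relevance style selection and a mean pairwise distance as a simple diversity score. The function names (`similarity`, `diverse_rerank`, `mean_pairwise_distance`), the parameter `lam`, and the use of `difflib.SequenceMatcher` as a stand-in for an alignment-based similarity are all hypothetical and are not taken from Div-BLAST.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Toy similarity in [0, 1]; a stand-in for an alignment-based score."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def diverse_rerank(hits, k, lam=0.5):
    """Greedily pick up to k hits, trading query relevance against redundancy.

    hits: list of (sequence, relevance) pairs, relevance scaled to [0, 1].
    lam:  1.0 ranks purely by relevance; lower values favor diversity.
    """
    remaining = list(hits)
    selected = []
    while remaining and len(selected) < k:
        def score(hit):
            seq, rel = hit
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max((similarity(seq, s) for s, _ in selected), default=0.0)
            return lam * rel - (1.0 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def mean_pairwise_distance(seqs):
    """Average (1 - similarity) over all pairs; higher means more diverse."""
    pairs = [(a, b) for i, a in enumerate(seqs) for b in seqs[i + 1:]]
    if not pairs:
        return 0.0
    return sum(1.0 - similarity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical hits: two near-duplicates and one dissimilar, lower-scoring hit.
hits = [("AAAAAAAA", 1.0), ("AAAAAAAT", 0.95), ("CCCCGGGG", 0.6)]
by_relevance = [s for s, _ in diverse_rerank(hits, k=2, lam=1.0)]
diversified = [s for s, _ in diverse_rerank(hits, k=2, lam=0.3)]
```

With `lam=1.0` the selection reduces to plain relevance ranking and keeps both near-duplicates; lowering `lam` lets the dissimilar hit displace the redundant one. In a real setting the relevance values would presumably come from the search tool's own scores, and the similarity function from a pairwise alignment.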