dc.contributor.author: Eser, E. [en_US]
dc.contributor.author: Can, T. [en_US]
dc.contributor.author: Ferhatosmanoglu, H. [en_US]
dc.date.accessioned: 2016-02-08T10:33:39Z
dc.date.available: 2016-02-08T10:33:39Z
dc.date.issued: 2014 [en_US]
dc.identifier.issn: 1932-6203
dc.identifier.uri: http://hdl.handle.net/11693/24736
dc.description.abstract: Sequence similarity tools, such as BLAST, seek sequences most similar to a query from a database of sequences. They return results that are significantly similar to the query sequence and typically highly similar to each other. Most sequence analysis tasks in bioinformatics require an exploratory approach, where the initial results guide the user to new searches. However, diversity has not yet been considered an integral component of sequence search tools for this discipline. Some redundancy can be avoided by introducing non-redundancy during database construction, but it is not feasible to dynamically set a level of non-redundancy tailored to a query sequence. We introduce the problem of diverse search and browsing in sequence databases that produce non-redundant results optimized for any given query. We define diversity measures for sequences and propose methods to obtain diverse results extracted from current sequence similarity search tools. We also propose a new measure to evaluate the diversity of a set of sequences that is returned as a result of a sequence similarity query. We evaluate the effectiveness of the proposed methods in post-processing BLAST and PSI-BLAST results. We also assess the functional diversity of the returned results based on available Gene Ontology annotations. Additionally, we include a comparison with a current redundancy elimination tool, CD-HIT. Our experiments show that the proposed methods are able to achieve more diverse yet significant result sets compared to static non-redundancy approaches. In both sequence-based and functional diversity evaluation, the proposed diversification methods significantly outperform original BLAST results and other baselines. A web-based tool implementing the proposed methods, Div-BLAST, can be accessed at cedar.cs.bilkent.edu.tr/Div-BLAST. © 2014 Eser et al. [en_US]
dc.language.iso: English [en_US]
dc.source.title: PLoS ONE [en_US]
dc.relation.isversionof: http://dx.doi.org/10.1371/journal.pone.0115445 [en_US]
dc.subject: algorithm [en_US]
dc.subject: amino acid sequence [en_US]
dc.subject: Article [en_US]
dc.subject: bioinformatics [en_US]
dc.subject: controlled study [en_US]
dc.subject: data analysis software [en_US]
dc.subject: data extraction [en_US]
dc.subject: gene ontology [en_US]
dc.subject: information processing [en_US]
dc.subject: information retrieval [en_US]
dc.subject: intermethod comparison [en_US]
dc.subject: sequence alignment [en_US]
dc.subject: sequence database [en_US]
dc.title: Div-BLAST: Diversification of sequence search results [en_US]
dc.type: Article [en_US]
dc.department: Department of Computer Engineering [en_US]
dc.citation.volumeNumber: 9 [en_US]
dc.citation.issueNumber: 12 [en_US]
dc.identifier.doi: 10.1371/journal.pone.0115445 [en_US]
dc.publisher: Public Library of Science [en_US]

