MAGNET: understanding and improving the accuracy of genome pre-alignment filtering

Date

2017

Authors

Alser, M.
Mutlu, O.
Alkan, C.

Source Title

Transactions on Internet Research

Print ISSN

1820-4503

Publisher

IPSI

Volume

13

Issue

2

Pages

1 - 10

Language

English

Abstract

In the era of high-throughput DNA sequencing (HTS) technologies, calculating the edit distance (i.e., the minimum number of substitutions, insertions, and deletions between a pair of sequences) for billions of genomic sequences is the computational bottleneck in today's read mappers. The shifted Hamming distance (SHD) algorithm proposes a fast filtering strategy that can rapidly filter out invalid mappings that have more edits than allowed. However, SHD's filtering is highly inaccurate: it admits invalid mappings, marking them as correct. This wastes execution time and imposes a large computational burden on the subsequent alignment step, which must still verify every falsely accepted mapping. In this work, we comprehensively investigate four sources that lead to the filtering inaccuracy. We propose MAGNET, a new filtering strategy that maintains high accuracy across different edit distance thresholds and data sets. It significantly improves the accuracy of pre-alignment filtering by one to two orders of magnitude. The MATLAB implementations of MAGNET and SHD are open source and available at: https://github.com/BilkentCompGen/MAGNET.
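
To make the abstract's terms concrete, the following is a minimal Python sketch. It is not the MAGNET or SHD code, which are the MATLAB implementations at the repository above. `edit_distance` is the standard dynamic-programming (Levenshtein) edit distance. `simplified_shd_count` and `passes_filter` are hypothetical names for a simplified shifted-Hamming filter. The filter builds one Hamming mask for each shift in [-E, E], ANDs the masks, and counts the mismatches that remain. It deliberately omits SHD's mask-amendment step and its bit-parallel implementation. Whenever the true edit distance is at most E, this count does not exceed it, so the sketch never rejects a valid mapping. It can, however, accept an invalid one, which is the kind of inaccuracy the abstract describes.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions, and deletions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete a[i-1]
                curr[j - 1] + 1,           # insert b[j-1]
                prev[j - 1] + (ca != cb),  # match (0) or substitution (1)
            )
        prev = curr
    return prev[len(b)]


def shifted_mask(read: str, ref: str, shift: int) -> list[int]:
    """1 where read[i] != ref[i + shift] or that reference index is out of range, else 0."""
    mask = []
    for i, base in enumerate(read):
        j = i + shift
        mask.append(0 if 0 <= j < len(ref) and ref[j] == base else 1)
    return mask


def simplified_shd_count(read: str, ref: str, e: int) -> int:
    """Hypothetical simplified filter: AND the masks for shifts -e..e, count the 1s.

    Unlike SHD, no amendment of short zero streaks is applied here.
    """
    combined = [1] * len(read)
    for shift in range(-e, e + 1):
        mask = shifted_mask(read, ref, shift)
        combined = [c & m for c, m in zip(combined, mask)]
    return sum(combined)


def passes_filter(read: str, ref: str, e: int) -> bool:
    """Accept the candidate mapping if the estimated edit count is within e."""
    return simplified_shd_count(read, ref, e) <= e


if __name__ == "__main__":
    read, ref, threshold = "ACAC", "CACA", 1
    print(passes_filter(read, ref, threshold))  # True: the filter accepts the mapping
    print(edit_distance(read, ref))             # 2: more edits than allowed (false accept)
```

In this example, the +1 and -1 shifts each explain a different part of the read, so ANDing the masks leaves no mismatches. Yet no single alignment within one edit exists. This is why a filter like this needs a verification step, or a more careful strategy, to avoid passing invalid mappings downstream.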
