Alser, M.; Mutlu, O.; Alkan, C.
2019-02-13
2019-02-13
2017
1820-4503
http://hdl.handle.net/11693/49418

In the era of high throughput DNA sequencing (HTS) technologies, calculating the edit distance (i.e., the minimum number of substitutions, insertions, and deletions between a pair of sequences) for billions of genomic sequences is the computational bottleneck in today's read mappers. The shifted Hamming distance (SHD) algorithm proposes a fast filtering strategy that can rapidly filter out invalid mappings that have more edits than allowed. However, SHD is highly inaccurate in its filtering: it lets invalid mappings pass as correct ones. This wastes execution time and imposes a large computational burden. In this work, we comprehensively investigate four sources of this filtering inaccuracy. We propose MAGNET, a new filtering strategy that maintains high accuracy across different edit distance thresholds and data sets. It significantly improves the accuracy of pre-alignment filtering by one to two orders of magnitude. The MATLAB implementations of MAGNET and SHD are open source and available at: https://github.com/BilkentCompGen/MAGNET.

English
High throughput DNA sequencing; Read mapping; Read alignment; False positives
MAGNET: understanding and improving the accuracy of genome pre-alignment filtering
Article
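
The abstract defines edit distance as the minimum number of substitutions, insertions, and deletions between a pair of sequences, and calls a mapping invalid when it has more edits than allowed. The following is a minimal sketch of that definition only, using the textbook dynamic-programming recurrence. It is not the MAGNET or SHD algorithm, and the names `edit_distance`, `within_threshold`, and `max_edits` are hypothetical, chosen for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions, and deletions
    needed to turn sequence a into sequence b (Levenshtein distance)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca from a
                curr[j - 1] + 1,     # insert cb into a
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def within_threshold(read: str, reference: str, max_edits: int) -> bool:
    """True when the pair has no more edits than allowed (a valid mapping)."""
    return edit_distance(read, reference) <= max_edits


if __name__ == "__main__":
    # One substitution (T -> A at the fourth base).
    print(edit_distance("ACGTACGT", "ACGAACGT"))
    print(within_threshold("ACGTACGT", "ACGAACGT", max_edits=1))
```

This exact computation takes time proportional to the product of the two sequence lengths. A pre-alignment filter of the kind the abstract describes aims to reject pairs that would exceed the threshold before this cost is paid.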