Hercules: a profile HMM-based hybrid error correction algorithm for long reads

Fırtına, Can; Bar-Joseph, Z.; Alkan, Can; Çicek, A. Ercüment

Files

Hercules_a_profile_HMM-based_hybrid_error_correction.pdf (1.08 MB)

Date

2018

Abstract

Choosing between second- and third-generation sequencing platforms involves a trade-off between accuracy and read length. Several types of studies require reads that are both long and accurate. In such cases, researchers often combine both technologies and correct the erroneous long reads using the short reads. Current approaches rely on various graph- or alignment-based techniques and do not take the error profile of the underlying technology into account. Efficient machine learning algorithms that address these shortcomings have the potential to achieve more accurate integration of these two technologies. We propose Hercules, the first machine learning-based long read error correction algorithm. Hercules models every long read as a profile Hidden Markov Model with respect to the underlying platform's error profile. The algorithm learns a posterior transition/emission probability distribution for each long read to correct errors in these reads. We show on two DNA-seq BAC clones (CH17-157L1 and CH17-227A2) that Hercules-corrected reads have the highest mapping rate among all competing algorithms and have the highest accuracy when the breadth of coverage is high. On a large human CHM1 cell line WGS data set, Hercules is one of the few scalable algorithms; and among those, it achieves the highest accuracy.
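The idea of treating a long read as a prior and the short reads as evidence can be illustrated with a deliberately simplified sketch. It is not the Hercules implementation: it keeps only one match state per long-read position, ignores insertion and deletion states and all transition probabilities, and replaces the model's training step with a plain pseudo-count update. The function names, the `(start, sequence)` alignment format, the `sub_error` value of 0.15, and the `prior_weight` parameter are all assumptions made for illustration.

```python
from collections import Counter

BASES = "ACGT"


def initial_emissions(long_read, sub_error=0.15):
    """Build one emission distribution per long-read position.

    The observed base gets probability 1 - sub_error; the other three
    bases share sub_error equally (an assumed, uniform error profile).
    """
    return [
        {b: (1.0 - sub_error) if b == base else sub_error / 3 for b in BASES}
        for base in long_read
    ]


def update_emissions(emissions, aligned_short_reads, prior_weight=1.0):
    """Blend prior emissions with base counts from aligned short reads.

    aligned_short_reads is a list of (start, sequence) pairs describing
    gapless alignments onto the long read (hypothetical input format).
    """
    counts = [Counter() for _ in emissions]
    for start, seq in aligned_short_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(emissions) and base in BASES:
                counts[pos][base] += 1

    updated = []
    for prior, count in zip(emissions, counts):
        total = prior_weight + sum(count.values())
        updated.append(
            {b: (prior_weight * prior[b] + count[b]) / total for b in BASES}
        )
    return updated


def decode(emissions):
    """Pick the most probable base at every position."""
    return "".join(max(BASES, key=lambda b: e[b]) for e in emissions)


# Toy data: the long read has a substitution at position 4 (T instead of A).
long_read = "ACGTTCGA"
short_reads = [(0, "ACGTA"), (2, "GTACG"), (3, "TACGA")]

posterior = update_emissions(initial_emissions(long_read), short_reads)
corrected = decode(posterior)
```

With these toy inputs, three short reads support `A` at position 4, which outweighs the prior placed on the long read's `T`. A full profile HMM would additionally model insertions and deletions and would decode a path through the states rather than choosing each position independently.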

Source Title

Nucleic Acids Research

Publisher

Oxford University Press

Permalink

http://hdl.handle.net/11693/50559

Published Version (Please cite this version)

https://doi.org/10.1093/nar/gky724

Collections

Scholarly Publications - Computer Engineering

Language

English

Type

Article
