Hercules: a profile HMM-based hybrid error correction algorithm for long reads

buir.contributor.author: Fırtına, Can
buir.contributor.author: Alkan, Can
buir.contributor.author: Çicek, A. Ercüment
dc.citation.issueNumber: 21 [en_US]
dc.citation.volumeNumber: 46 [en_US]
dc.contributor.author: Fırtına, Can [en_US]
dc.contributor.author: Bar-Joseph, Z. [en_US]
dc.contributor.author: Alkan, Can [en_US]
dc.contributor.author: Çicek, A. Ercüment [en_US]
dc.date.accessioned: 2019-02-23T07:23:43Z
dc.date.available: 2019-02-23T07:23:43Z
dc.date.issued: 2018 [en_US]
dc.department: Department of Computer Engineering [en_US]
dc.description.abstract: Choosing whether to use second- or third-generation sequencing platforms can lead to trade-offs between accuracy and read length. Several types of studies require long and accurate reads. In such cases, researchers often combine both technologies and correct the erroneous long reads using the short reads. Current approaches rely on various graph- or alignment-based techniques and do not take the error profile of the underlying technology into account. Efficient machine learning algorithms that address these shortcomings have the potential to achieve more accurate integration of these two technologies. We propose Hercules, the first machine learning-based long read error correction algorithm. Hercules models every long read as a profile Hidden Markov Model with respect to the underlying platform’s error profile. The algorithm learns a posterior transition/emission probability distribution for each long read to correct errors in these reads. We show on two DNA-seq BAC clones (CH17-157L1 and CH17-227A2) that Hercules-corrected reads have the highest mapping rate among all competing algorithms and have the highest accuracy when the breadth of coverage is high. On a large human CHM1 cell line WGS data set, Hercules is one of the few scalable algorithms, and among those, it achieves the highest accuracy. [en_US]
dc.identifier.doi: 10.1093/nar/gky724 [en_US]
dc.identifier.eissn: 1362-4962 [en_US]
dc.identifier.issn: 0305-1048 [en_US]
dc.identifier.uri: http://hdl.handle.net/11693/50559 [en_US]
dc.language.iso: English [en_US]
dc.publisher: Oxford University Press [en_US]
dc.relation.isversionof: https://doi.org/10.1093/nar/gky724 [en_US]
dc.source.title: Nucleic Acids Research [en_US]
dc.title: Hercules: a profile HMM-based hybrid error correction algorithm for long reads [en_US]
dc.type: Article [en_US]
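
The abstract describes the method only at a high level: each long read becomes a profile HMM shaped by the platform's error profile, and short reads are used to learn posterior emission and transition probabilities from which the corrected read is derived. The Python sketch below illustrates that idea under heavy simplifying assumptions and is not the Hercules implementation. Its states are long-read positions only (no separate insertion or deletion states), a hypothetical `skip_prob` lets a short read jump over one long-read base, transitions stay fixed, short-read start positions are assumed to come from a prior alignment step, and a single Baum-Welch (EM) step re-estimates only the emissions before each position is decoded to its most probable base. The names `initial_emissions`, `transitions`, `posteriors`, `correct`, `error_rate`, and `skip_prob` are illustrative, not taken from the paper.

```python
# Toy sketch, NOT the Hercules implementation: a left-to-right HMM whose
# states are long-read positions, with one Baum-Welch emission update.
BASES = "ACGT"  # reads are assumed to contain only A, C, G, T


def initial_emissions(long_read, error_rate):
    """Prior emissions from an assumed error profile: the long-read base gets
    1 - error_rate and the other three bases share error_rate equally."""
    table = []
    for base in long_read:
        row = {b: error_rate / 3 for b in BASES}
        row[base] = 1.0 - error_rate
        table.append(row)
    return table


def transitions(n, skip_prob):
    """Fixed transitions: i -> i+1 with 1 - skip_prob, i -> i+2 with skip_prob
    (jumping over one long-read base). Targets past the end are dropped."""
    trans = []
    for i in range(n):
        out = {}
        if i + 1 < n:
            out[i + 1] = 1.0 - skip_prob
        if i + 2 < n:
            out[i + 2] = skip_prob
        trans.append(out)
    return trans


def posteriors(short_read, start, emis, trans):
    """Forward-backward. gamma[t][i] is the posterior probability that
    short_read[t] was emitted by long-read position i, given that the short
    read begins at position `start` (assumed known from an alignment).
    No scaling is applied, so this only suits short toy inputs."""
    n, m = len(emis), len(short_read)
    alpha = [[0.0] * n for _ in range(m)]
    alpha[0][start] = emis[start][short_read[0]]
    for t in range(1, m):
        for i in range(n):
            for j, p in trans[i].items():
                alpha[t][j] += alpha[t - 1][i] * p * emis[j][short_read[t]]
    beta = [[0.0] * n for _ in range(m)]
    beta[m - 1] = [1.0] * n
    for t in range(m - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(p * emis[j][short_read[t + 1]] * beta[t + 1][j]
                             for j, p in trans[i].items())
    total = sum(alpha[m - 1])  # likelihood of the short read
    if total == 0.0:
        raise ValueError("short read cannot be placed at this start position")
    return [[alpha[t][i] * beta[t][i] / total for i in range(n)]
            for t in range(m)]


def correct(long_read, aligned_short_reads, error_rate=0.1, skip_prob=0.05):
    """One EM step on the emissions, then per-position argmax decoding.
    aligned_short_reads is a list of (sequence, start_position) pairs."""
    emis = initial_emissions(long_read, error_rate)
    trans = transitions(len(long_read), skip_prob)
    counts = [dict(row) for row in emis]  # prior acts as pseudocounts
    for read, start in aligned_short_reads:
        gamma = posteriors(read, start, emis, trans)  # E-step
        for t, base in enumerate(read):
            for i, g in enumerate(gamma[t]):
                counts[i][base] += g
    new_emis = []  # M-step: normalize expected counts per position
    for row in counts:
        total = sum(row.values())
        new_emis.append({b: row[b] / total for b in BASES})
    corrected = "".join(max(row, key=row.get) for row in new_emis)
    return corrected, new_emis
```

For example, `correct("ACGTTCGTAC", [("ACGTACGTAC", 0)] * 3)[0]` returns `"ACGTACGTAC"`: the substituted T at position 4 is outvoted by the aligned short reads. Because decoding here only picks a base per long-read position, this toy can fix substitutions but never removes or inserts bases; a fuller profile HMM would add insertion and deletion states, re-estimate transitions, iterate with scaling or log-space arithmetic to avoid underflow, and decode with Viterbi.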

Files

Original bundle

Name: Hercules_a_profile_HMM-based_hybrid_error_correction.pdf
Size: 1.08 MB
Format: Adobe Portable Document Format
Description: Full printable version

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission