Apollo: A sequencing-technology-independent, scalable and accurate assembly polishing algorithm

Fırtına, C.; Kim, J. S.; Alser, M.; Şenol Cali, D.; Çiçek, A. Ercüment; Alkan, Can; Mutlu, Onur

buir.contributor.author	Çiçek, A. Ercüment
buir.contributor.author	Alkan, Can
buir.contributor.author	Mutlu, Onur
dc.citation.epage	3679	en_US
dc.citation.issueNumber	12	en_US
dc.citation.spage	3669	en_US
dc.citation.volumeNumber	36	en_US
dc.contributor.author	Fırtına, C.	en_US
dc.contributor.author	Kim, J. S.	en_US
dc.contributor.author	Alser, M.	en_US
dc.contributor.author	Şenol Cali, D.	en_US
dc.contributor.author	Çiçek, A. Ercüment	en_US
dc.contributor.author	Alkan, Can	en_US
dc.contributor.author	Mutlu, Onur	en_US
dc.date.accessioned	2021-02-25T13:07:43Z
dc.date.available	2021-02-25T13:07:43Z
dc.date.issued	2020-03
dc.department	Department of Computer Engineering	en_US
dc.description.abstract	Motivation: Third-generation sequencing technologies can sequence long reads that contain as many as 2 million base pairs. These long reads are used to construct an assembly (i.e. the subject’s genome), which is further used in downstream genome analysis. Unfortunately, third-generation sequencing technologies have high sequencing error rates and a large proportion of base pairs in these long reads is incorrectly identified. These errors propagate to the assembly and affect the accuracy of genome analysis. Assembly polishing algorithms minimize such error propagation by polishing or fixing errors in the assembly by using information from alignments between reads and the assembly (i.e. read-to-assembly alignment information). However, current assembly polishing algorithms can either polish an assembly only by using reads from a certain sequencing technology, or polish only a small assembly. Such technology dependency and assembly-size dependency require researchers to (i) run multiple polishing algorithms to use all available read sets and (ii) split a large genome into small chunks to polish it. Results: We introduce Apollo, a universal assembly polishing algorithm that scales well to polish an assembly of any size (i.e. both large and small genomes) using reads from all sequencing technologies (i.e. second- and third-generation). Our goal is to provide a single algorithm that uses read sets from all available sequencing technologies to improve the accuracy of assembly polishing and that can polish large genomes. Apollo (i) models an assembly as a profile hidden Markov model (pHMM), (ii) uses read-to-assembly alignment to train the pHMM with the Forward–Backward algorithm and (iii) decodes the trained model with the Viterbi algorithm to produce a polished assembly. 
Our experiments with real read sets demonstrate that Apollo is the only algorithm that (i) uses reads from any sequencing technology within a single run and (ii) scales well to polish large assemblies without splitting the assembly into multiple parts.	en_US
dc.identifier.doi	10.1093/bioinformatics/btaa179	en_US
dc.identifier.issn	1367-4803	en_US
dc.identifier.uri	http://hdl.handle.net/11693/75595	en_US
dc.language.iso	English	en_US
dc.publisher	Oxford University Press	en_US
dc.relation.isversionof	https://dx.doi.org/10.1093/bioinformatics/btaa179	en_US
dc.source.title	Bioinformatics	en_US
dc.title	Apollo: A sequencing-technology-independent, scalable and accurate assembly polishing algorithm	en_US
dc.type	Article	en_US
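The abstract names two textbook hidden Markov model algorithms: Forward–Backward for training and Viterbi for decoding. The following is a minimal, hypothetical Python sketch of both on a toy two-state HMM, offered only to illustrate those algorithms; it is not Apollo's implementation. Apollo's pHMM is built from the assembly (with match, insertion and deletion states) and trained on read-to-assembly alignments, none of which is modeled here. The state names (`ok`, `noisy`), the observation alphabet (`=` for a read base that agrees with the assembly, `x` for one that disagrees) and all probabilities are invented assumptions. The `posteriors` function computes only the expectation step of Forward–Backward; a full Baum–Welch loop would re-estimate the transition and emission probabilities from these posteriors, which is omitted.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy model (NOT Apollo's pHMM): hidden states say whether a region
# of the assembly is reliable; observations say whether a read base agrees ("=")
# or disagrees ("x") with the assembly. All numbers are illustrative assumptions.
STATES: Tuple[str, ...] = ("ok", "noisy")
START: Dict[str, float] = {"ok": 0.8, "noisy": 0.2}
TRANS: Dict[str, Dict[str, float]] = {
    "ok": {"ok": 0.9, "noisy": 0.1},
    "noisy": {"ok": 0.2, "noisy": 0.8},
}
EMIT: Dict[str, Dict[str, float]] = {
    "ok": {"=": 0.95, "x": 0.05},
    "noisy": {"=": 0.6, "x": 0.4},
}

Table = List[Dict[str, float]]


def forward(obs, states, start, trans, emit) -> Table:
    """alpha[t][s] = P(obs[0..t], state_t = s). Unscaled, so only safe for short inputs."""
    alpha = [{s: start[s] * emit[s][obs[0]] for s in states}]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append({s: emit[s][o] * sum(prev[r] * trans[r][s] for r in states)
                      for s in states})
    return alpha


def backward(obs, states, trans, emit) -> Table:
    """beta[t][s] = P(obs[t+1..] | state_t = s); beta at the last position is 1."""
    beta = [{s: 1.0 for s in states}]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, {s: sum(trans[s][r] * emit[r][o] * nxt[r] for r in states)
                        for s in states})
    return beta


def posteriors(obs, states, start, trans, emit) -> Table:
    """Expectation step of Forward-Backward: gamma[t][s] = P(state_t = s | obs)."""
    alpha = forward(obs, states, start, trans, emit)
    beta = backward(obs, states, trans, emit)
    total = sum(alpha[-1].values())  # likelihood P(obs)
    return [{s: a[s] * b[s] / total for s in states} for a, b in zip(alpha, beta)]


def viterbi(obs, states, start, trans, emit) -> Tuple[List[str], float]:
    """Most likely hidden-state path and its log-probability (log space avoids underflow)."""
    log = math.log
    score = [{s: log(start[s]) + log(emit[s][obs[0]]) for s in states}]
    back: List[Dict[str, str]] = [{}]
    for o in obs[1:]:
        prev = score[-1]
        cur, ptr = {}, {}
        for s in states:
            # Best predecessor r for state s at this position.
            best = max(states, key=lambda r: prev[r] + log(trans[r][s]))
            cur[s] = prev[best] + log(trans[best][s]) + log(emit[s][o])
            ptr[s] = best
        score.append(cur)
        back.append(ptr)
    # Trace back pointers from the best final state.
    last = max(states, key=lambda s: score[-1][s])
    path = [last]
    for ptr in reversed(back[1:]):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[-1][last]


if __name__ == "__main__":
    observed = "==xx=="  # hypothetical agree/disagree pattern of read bases
    gamma = posteriors(observed, STATES, START, TRANS, EMIT)
    path, logp = viterbi(observed, STATES, START, TRANS, EMIT)
    print([round(g["noisy"], 3) for g in gamma])
    print(path, round(logp, 3))
```

Viterbi is run in log space because it only needs sums and maximums; the unscaled Forward–Backward recursions here would underflow on assembly-length inputs, so a practical implementation would rescale each position or work in log space as well.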

Files

Original bundle

Name: Apollo_A_sequencing-technology-independent_scalable_and_accurate_assembly_polishing_algorithm.pdf
Size: 453.81 KB
Format: Adobe Portable Document Format

Collections

Scholarly Publications - Computer Engineering