TargetCall: eliminating the wasted computation in basecalling via pre-basecalling filtering

buir.contributor.author: Alkan, Can
buir.contributor.orcid: Alkan, Can|0000-0002-5443-0706
dc.citation.epage: 1429306-13
dc.citation.spage: 1429306-1
dc.citation.volumeNumber: 15
dc.contributor.author: Cavlak, Meryem Banu
dc.contributor.author: Singh, Gagandeep
dc.contributor.author: Alser, Mohammed
dc.contributor.author: Firtina, Can
dc.contributor.author: Lindegger, Joel
dc.contributor.author: Sadrosadati, Mohammad
dc.contributor.author: Mansouri Ghiasi, Nika
dc.contributor.author: Alkan, Can
dc.contributor.author: Mutlu, Onur
dc.date.accessioned: 2025-02-14T06:22:21Z
dc.date.available: 2025-02-14T06:22:21Z
dc.date.issued: 2024-10-28
dc.department: Department of Computer Engineering
dc.description.abstract: Basecalling is an essential step in nanopore sequencing analysis where the raw signals of nanopore sequencers are converted into nucleotide sequences, that is, reads. State-of-the-art basecallers use complex deep learning models to achieve high basecalling accuracy. This makes basecalling computationally inefficient and memory-hungry, bottlenecking the entire genome analysis pipeline. However, for many applications, most reads do not match the reference genome of interest (i.e., target reference) and thus are discarded in later steps in the genomics pipeline, wasting the basecalling computation. To overcome this issue, we propose TargetCall, the first pre-basecalling filter to eliminate the wasted computation in basecalling. TargetCall’s key idea is to discard reads that will not match the target reference (i.e., off-target reads) prior to basecalling. TargetCall consists of two main components: (1) LightCall, a lightweight neural network basecaller that produces noisy reads, and (2) Similarity Check, which labels each of these noisy reads as on-target or off-target by matching them to the target reference. Our thorough experimental evaluations show that TargetCall (1) improves the end-to-end basecalling runtime performance of the state-of-the-art basecaller by 3.31× while maintaining high (98.88%) recall in keeping on-target reads, (2) maintains high accuracy in downstream analysis, and (3) achieves better runtime performance, throughput, recall, precision, and generality than prior works.
dc.identifier.doi: 10.3389/fgene.2024.1429306
dc.identifier.eissn: 1664-8021
dc.identifier.uri: https://hdl.handle.net/11693/116253
dc.language.iso: English
dc.publisher: Frontiers Research Foundation
dc.relation.isversionof: https://doi.org/10.3389/fgene.2024.1429306
dc.rights: CC BY 4.0 DEED (Attribution 4.0 International)
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.source.title: Frontiers in Genetics
dc.subject: Nanopore sequencing
dc.subject: Basecalling
dc.subject: Deep learning
dc.subject: Filtering
dc.subject: Efficiency
dc.title: TargetCall: eliminating the wasted computation in basecalling via pre-basecalling filtering
dc.type: Article
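
The two-stage filter described in the abstract (a cheap, noisy basecall followed by a similarity check against the target reference) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the function names `kmers`, `is_on_target`, and `prefilter`, the parameters `k` and `min_fraction`, and the k-mer overlap test, which is a simple stand-in and not the Similarity Check method used by TargetCall. The `light_call` argument is a caller-supplied placeholder for a lightweight basecaller such as LightCall.

```python
from typing import Callable, Iterable, List, Set, TypeVar

Signal = TypeVar("Signal")  # raw signal type is left open in this sketch


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def is_on_target(noisy_read: str, target_kmers: Set[str],
                 k: int, min_fraction: float) -> bool:
    """Toy similarity check (assumption): a read is on-target when at
    least min_fraction of its k-mers also occur in the target reference."""
    read_kmers = kmers(noisy_read, k)
    if not read_kmers:  # read shorter than k: nothing to compare
        return False
    shared = len(read_kmers & target_kmers)
    return shared / len(read_kmers) >= min_fraction


def prefilter(signals: Iterable[Signal],
              light_call: Callable[[Signal], str],
              target_reference: str,
              k: int = 5,
              min_fraction: float = 0.5) -> List[Signal]:
    """Keep only the raw signals whose noisy basecall looks on-target.

    Only the kept signals would then be passed to the expensive,
    high-accuracy basecaller.
    """
    target_kmers = kmers(target_reference, k)
    kept = []
    for signal in signals:
        noisy_read = light_call(signal)  # cheap, low-accuracy basecall
        if is_on_target(noisy_read, target_kmers, k, min_fraction):
            kept.append(signal)
    return kept
```

In this sketch the savings come from running the expensive basecaller only on the list returned by `prefilter`; how much is saved depends on the fraction of off-target reads and on the cost of the lightweight stage.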

Files

Original bundle

Name: TargetCall_eliminating_the_wasted_computation_in_basecalling_via_pre-basecalling_filtering.pdf
Size: 1.77 MB
Format: Adobe Portable Document Format
