Browsing by Author "Cavlak, Meryem Banu"

Open Access
AirLift: a fast and comprehensive technique for remapping alignments between reference genomes
(IEEE, 2024-08-19) Kim, Jeremie S.; Firtina, Can; Cavlak, Meryem Banu; Çalı, Damla Şenol; Hajinazar, Nastaran; Alser, Mohammed; Alkan, Can; Mutlu, Onur
AirLift is the first read remapping tool that enables users to quickly and comprehensively map a read set that had previously been mapped to one reference genome to another, similar reference. Users can then quickly run downstream analysis of read sets for each new reference release. Compared to the state-of-the-art method for remapping reads (i.e., full mapping), AirLift reduces the overall execution time to remap read sets between two reference genome versions by up to 27.4×. We validate our remapping results with GATK and find that AirLift provides high accuracy in identifying ground truth SNP/INDEL variants.
Restricted
Mimar Sinan Dergisi
(Bilkent University, 2018) Biner, Burak Can; Erkan, Cihan; Cavlak, Meryem Banu; Öztürk, Mustafa Selçuk; Aktürk, Sait
Open Access
TargetCall: eliminating the wasted computation in basecalling via pre-basecalling filtering
(Frontiers Research Foundation, 2024-10-28) Cavlak, Meryem Banu; Singh, Gagandeep; Alser, Mohammed; Firtina, Can; Lindegger, Joel; Sadrosadati, Mohammad; Mansouri Ghiasi, Nika; Alkan, Can; Mutlu, Onur
Basecalling is an essential step in nanopore sequencing analysis where the raw signals of nanopore sequencers are converted into nucleotide sequences, that is, reads. State-of-the-art basecallers use complex deep learning models to achieve high basecalling accuracy. This makes basecalling computationally inefficient and memory-hungry, bottlenecking the entire genome analysis pipeline. However, for many applications, most reads do not match the reference genome of interest (i.e., target reference) and thus are discarded in later steps in the genomics pipeline, wasting the basecalling computation. To overcome this issue, we propose TargetCall, the first pre-basecalling filter to eliminate the wasted computation in basecalling. TargetCall’s key idea is to discard reads that will not match the target reference (i.e., off-target reads) prior to basecalling. TargetCall consists of two main components: (1) LightCall, a lightweight neural network basecaller that produces noisy reads, and (2) Similarity Check, which labels each of these noisy reads as on-target or off-target by matching them to the target reference. Our thorough experimental evaluations show that TargetCall (1) improves the end-to-end basecalling runtime performance of the state-of-the-art basecaller by 3.31× while maintaining high (98.88%) recall in keeping on-target reads, (2) maintains high accuracy in downstream analysis, and (3) achieves better runtime performance, throughput, recall, precision, and generality than prior works.