Large structural variation discovery using long reads with several degrees of error

Limited Access
This item is unavailable until:
2021-07-28
Date
2020-12
Advisor
Alkan, Can
Publisher
Bilkent University
Language
English
Abstract

Genomic structural variations (SVs) are briefly defined as large-scale alterations of DNA content, copy, and organization. Although significant progress has been made in characterizing SVs since the introduction of high-throughput sequencing (HTS), accurate detection of complex SVs and balanced rearrangements remains elusive due to the sequence complexity at the breakpoints. Until very recently, the difficulty of mapping short reads in such regions and the high error rates of long-read platforms kept the problem challenging. However, with the introduction of Pacific Biosciences’ High Fidelity (HiFi) sequencing methodology, powerful SV detection and breakpoint resolution became possible as a result of its capability to produce highly accurate (> 99%) long reads (10–20 kbp). Here, we introduce DALEK, a novel algorithm that aims to use long-read technologies to discover large structural variations with high breakpoint resolution. DALEK uses split-read and read-depth signatures from long-read data to discover large (≥ 10 kbp) deletions, inversions, and segmental duplications. We also develop methods to detect large SVs in existing high-error Oxford Nanopore Technologies data.
