Browsing by Subject "Structural variation"

Breakpoint refinement of genomic structural variation using split read analysis (2019-09)
İçen, Balanur
Open Access

Genomic variations, ranging from single nucleotide polymorphisms (SNPs) and small INDELs to structural variations (SVs), have been found to have significant phenotypic effects on individuals. Among these genomic variations, SVs are changes that affect more than 50 nucleotides of DNA. SVs have been linked to many genetic diseases such as autism, schizophrenia and chronic myelogenous leukemia. Accurate and precise characterization of these structural variants not only enables the diagnosis of genetic diseases previously correlated with them but also provides more reliable information for later stages of genomic research pipelines. Many SV detection tools aim to find the approximate locations of SVs in the genome; a further step in the pipeline is to refine the breakpoints of these variants through a closer, more focused examination. With refined breakpoints, the genotyping of structural variations would be faster when using k-mer based alignment-free methods, and more accurate, since SV locations would be known at 1-5 base pair resolution instead of within 300-500 base pair confidence intervals. Moreover, later steps of genomic pipelines that build on the results of SV detection algorithms would have more definite data to work with. In this thesis, we propose BROSV (Breakpoint Refinement of Structural Variation), a breakpoint refinement algorithm that obtains better resolution on SV breakpoints through split read analysis and local assembly, using Illumina short reads and the BWA alignment tool. The implementation is available at https://github.com/BilkentCompGen/brosv.

Characterization of structural variation through assembly-to-assembly comparison (2024-09)
Çoktalaş, Muhammet Rafi
Open Access

Structural variations (SVs) are genomic variations affecting more than 50 nucleotides of DNA.
SVs play a crucial role in evolution and have critical phenotypic effects on organisms, including genetic diseases in humans such as autism, schizophrenia, epilepsy, and cancer. SV characterization is therefore of great significance. In the past, read-based methods were used because constructing genome assemblies was infeasible. With technological advances, however, assembling genomes has become far more feasible, and complete assemblies of human and other primate genomes have been constructed. Despite these high-quality assemblies, SV discovery in human genomes remains challenging because of the genome's repetitive nature and the complex rearrangements caused by combinations of SVs. Most existing SV discovery tools that operate on genome assemblies require whole genome alignments, which lead to long preprocessing times and high memory usage. New algorithms are therefore still needed to discover SVs efficiently. Here, we propose Strive, a linear time algorithm that operates on genome assembly sketches instead of whole genome alignments to characterize insertions, deletions, and inversions. We evaluated the performance of Strive in two experiments: simulated data from the human reference genome (GRCh38.p14 / hg38) and real data using a full genome assembly from the Telomere-to-Telomere Consortium (CHM13). Strive accurately detects insertions, deletions, and inversions in 11 to 12 seconds, in addition to preprocessing times of 50 to 55 seconds. Strive achieved over 95% precision and recall in the simulations without duplications.
In the simulations that included segmental duplications and SNPs, and in the experiment with the CHM13 assembly, Strive still maintained over 95% recall for inversions, but precision and recall for insertions and deletions were lower, suggesting a need for greater robustness to duplications.

Discovery of large genomic inversions using long range information (BioMed Central Ltd., 2017)
Rasekh, M. E.; Chiatante, G.; Miroballo, M.; Tang, J.; Ventura, M.; Amemiya, C. T.; Eichler, E. E.; Antonacci, F.; Alkan, C.
Open Access

Although many algorithms are now available to characterize different classes of structural variation, discovery of balanced rearrangements such as inversions remains an open problem. This is mainly because the breakpoints of such events typically lie within segmental duplications or common repeats, which reduces the mappability of short reads. The algorithms developed within the 1000 Genomes Project to identify inversions are limited to relatively short inversions, and there are currently no available algorithms to discover large inversions using high throughput sequencing technologies.

Results: Here we propose a novel algorithm, Valor, to discover large inversions using new sequencing methods that provide long range information, such as 10X Genomics linked-read sequencing, pooled clone sequencing, or similar technologies, which we collectively refer to as long range sequencing. We demonstrate the utility of Valor using both pooled clone sequencing and 10X Genomics linked-read sequencing data generated from the genome of an individual from the HapMap project (NA12878). We also provide a comprehensive comparison of Valor against several state-of-the-art structural variation discovery algorithms that use whole genome shotgun sequencing data.
Conclusions: In this paper, we show that Valor accurately discovers all previously identified and experimentally validated large inversions in the same genome with a low false discovery rate. Using Valor, we also predicted a novel inversion, which we validated using fluorescent in situ hybridization. Valor is available at https://github.com/BilkentCompGen/Valor. © 2017 The Author(s).

Large structural variation discovery using long reads with several degrees of error (2020-12)
Ebren, Ezgi
Open Access

Genomic structural variations (SVs) can be briefly defined as large-scale alterations of DNA content, copy number, and organization. Although significant progress in characterizing SVs has been made since the introduction of high throughput sequencing (HTS), accurate detection of complex SVs and balanced rearrangements still remains elusive because of the sequence complexity at their breakpoints. Until very recently, the problem stayed challenging because of the difficulty of mapping short reads in such regions and the high error rates of long read platforms. However, with the introduction of Pacific Biosciences' High Fidelity (HiFi) sequencing methodology, powerful SV detection and breakpoint resolution became possible, owing to its capability to produce highly accurate (> 99%) long reads (10-20 kbp). Here, we introduce DALEK, a novel algorithm that uses long-read technologies to discover large structural variations with high breakpoint resolution. DALEK uses split read and read depth signatures from long read data to discover large (≥ 10 kbp) deletions, inversions and segmental duplications.
We also develop methods to detect large SVs in existing high-error Oxford Nanopore Technologies data.

Paralog-specific gene copy number discovery within segmental duplications (2019-09)
Doğru, Emre
Open Access

With advancing technology in genome sequencing and analysis, it has become evident that structural variations are the main source of alteration in the human genome. Despite their significance for understanding disease susceptibility, there is not yet an algorithm that finds all types and sizes of structural variations at once. Structural variation discovery has remained problematic because SVs often overlap with segmental duplications, nearly identical segments of DNA that appear more than once in the genome. Researchers have often excluded these regions, which make up 5% of the genome, because of the complexity they add to their studies. The few methods that do work in these regions require a special sequence alignment file in which reads are mapped to multiple locations. Here, we present ParaCoND, which discovers paralog-specific gene copy numbers within segmental duplications using a sequence alignment file with unique mappings. We use the singly unique nucleotides (SUNs) that distinguish paralogs from each other in the sequence alignment of the duplicated regions. Our method is based on read depth and is limited to detecting duplications and deletions. We computed the absolute copy numbers of genes using only the read depth at SUNs. Furthermore, we also computed paralog-specific absolute copy numbers for genes residing in the same segmental duplication.

Rates and patterns of great ape retrotransposition (National Academy of Sciences, 2013)
Hormozdiari, F.; Konkel, M. K.; Prado-Martinez, J.; Chiatante, G.; Herraez, I. H.; Walker, J. A.; Nelson, B.; Alkan, C.; Sudmant, P. H.; Huddleston, J.; Catacchio, C. R.; Ko, A.; Malig, M.; Baker, C.; Marques-Bonet, T.; Ventura, M.; Batzer, M. A.; Eichler, E. E.
Open Access

We analyzed 83 fully sequenced great ape genomes for mobile element insertions, predicting a total of 49,452 fixed and polymorphic Alu and long interspersed element 1 (L1) insertions not present in the human reference assembly and assigning each retrotransposition event to a different time point during great ape evolution. We used these homoplasy-free markers to construct a mobile element insertion-based phylogeny of humans and great apes and demonstrate their differential power to discern ape subspecies and populations. Within this context, we find a good correlation between L1 diversity and single-nucleotide polymorphism heterozygosity (r² = 0.65), in contrast to Alu repeats, which show little correlation (r² = 0.07). We estimate that the rate of Alu retrotransposition has differed by a factor of 15 across these lineages. Humans, chimpanzees, and bonobos show the highest rates of Alu accumulation, the latter two since their divergence 1.5 Mya. The L1 insertion rate, in contrast, has remained relatively constant, with rates differing by less than a factor of three. We conclude that Alu retrotransposition has been the most variable form of genetic variation during recent human-great ape evolution, with increases and decreases occurring over very short periods of evolutionary time.

Toolkit for automated and rapid discovery of structural variants (Academic Press, 2017)
Soylev, A.; Kockan, C.; Hormozdiari, F.; Alkan, C.
Open Access

Structural variations (SVs) are broadly defined as genomic alterations that affect >50 bp of DNA, and they have been shown to have significant effects on evolution and disease. The advent of high throughput sequencing (HTS) technologies and the ability to perform whole genome sequencing (WGS) make it feasible to study these variants in depth.
However, discovering all forms of SV using WGS has proven challenging: the short reads produced by the predominant HTS platforms (<200 bp for current technologies), together with the large amounts of repeats that most genomes contain, make it very difficult to map reads unambiguously and to characterize such variants accurately. Furthermore, existing SV discovery tools are primarily developed for only a few SV types, whose sequence signatures (read pairs, read depth, split reads) may conflict with those of other, untargeted SV classes. Here we introduce a new framework, TARDIS, which combines multiple read signatures into a single package to characterize most SV types simultaneously while preventing such conflicts. TARDIS also has a modular structure that makes it easy to extend to the discovery of additional forms of SV. © 2017 Elsevier Inc.

VALOR2: characterization of large-scale structural variants using linked-reads (BioMed Central Ltd., 2020-03)
Karaoğlanoğlu, Fatih; Ricketts, C.; Ebren, Ezgi; Rasekh, M. E.; Hajirasouliha, I.; Alkan, Can
Open Access

Most existing methods for structural variant detection focus on the discovery and genotyping of deletions, insertions, and mobile elements. Detecting balanced structural variants, which involve no gain or loss of genomic segments (for example, inversions and translocations), is a particularly challenging task. Furthermore, there are very few algorithms that predict the insertion locus of large interspersed segmental duplications or characterize translocations. Here, we propose novel algorithms to characterize large interspersed segmental duplications, inversions, deletions, and translocations using linked-read sequencing data. We redesign our earlier algorithm, VALOR, and implement our new algorithms in a new software package, called VALOR2.
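The TARDIS entry above names read pairs, read depth, and split reads as the sequence signatures that SV discovery relies on. As a rough illustration of just one of them, the Python sketch below classifies a single paired-end read pair using the commonly described read-pair rules for a forward-reverse (FR) Illumina library. An unusually long span suggests a deletion and an unusually short span an insertion. Same-strand mates suggest an inversion, and outward-facing mates suggest a tandem duplication. This is not the TARDIS (or VALOR2) implementation. The `ReadPair` type, the `classify_pair` function, and the 4-standard-deviation cutoff are hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Hypothetical minimal model of a read pair mapped to one chromosome.

    The left mate is the one mapped upstream; positions are 0-based.
    """
    left_start: int    # first reference base covered by the left mate
    left_strand: str   # '+' or '-'
    right_end: int     # last reference base covered by the right mate
    right_strand: str  # '+' or '-'


def classify_pair(pair: ReadPair, mean_insert: float, sd_insert: float,
                  k: float = 4.0) -> str:
    """Return the SV signature suggested by one read pair (FR library assumed).

    k is an illustrative cutoff: spans more than k standard deviations from
    the mean fragment length are treated as discordant.
    """
    valid = {"+", "-"}
    if pair.left_strand not in valid or pair.right_strand not in valid:
        raise ValueError("strands must be '+' or '-'")

    orientation = pair.left_strand + pair.right_strand
    if orientation in ("++", "--"):
        # Both mates on the same strand: one mate likely lies in an inverted segment.
        return "inversion"
    if orientation == "-+":
        # Mates point away from each other (everted pair): tandem duplication signature.
        return "tandem_duplication"

    # Expected "+-" orientation: judge the pair by its fragment span.
    span = pair.right_end - pair.left_start + 1
    if span > mean_insert + k * sd_insert:
        return "deletion"    # reference has extra sequence between the mates
    if span < mean_insert - k * sd_insert:
        return "insertion"   # sample has extra sequence between the mates
    return "concordant"


if __name__ == "__main__":
    # Assumed library: mean fragment length 450 bp, standard deviation 50 bp.
    examples = [
        ReadPair(1_000, "+", 1_449, "-"),  # span 450 bp
        ReadPair(1_000, "+", 5_999, "-"),  # span 5,000 bp
        ReadPair(1_000, "+", 1_099, "-"),  # span 100 bp
        ReadPair(1_000, "+", 1_449, "+"),  # same-strand mates
        ReadPair(1_000, "-", 1_449, "+"),  # everted mates
    ]
    for p in examples:
        print(classify_pair(p, mean_insert=450, sd_insert=50))
    # prints: concordant, deletion, insertion, inversion, tandem_duplication (one per line)
```

A single discordant pair is weak and possibly ambiguous evidence. Read-pair methods generally cluster many supporting pairs before calling a variant. As the TARDIS abstract notes, the signatures of different SV classes can also conflict with one another. That is the motivation it gives for combining read-pair, read-depth, and split-read evidence in one framework.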