Browsing by Subject "Sequencing"
Item Open Access
Assessment and correction of errors in DNA sequencing technologies (2017-12)
Fırtına, Can
Next Generation Sequencing technologies differ in several parameters, and the choice between short and long read sequencing platforms often leads to trade-offs between accuracy and read length. In this thesis, I first demonstrate the problems in reproducibility in analyses using short reads. Our comprehensive analysis of the reproducibility of computational characterization of genomic variants using high throughput sequencing data shows that repeats might be prone to ambiguous mapping. Short reads are more vulnerable to repeats and, thus, may cause reproducibility problems. Next, I introduce a novel algorithm, Hercules, the first machine learning-based long read error correction algorithm. Several studies require long and accurate reads, including de novo assembly, fusion and structural variation detection. In such cases researchers often combine both technologies, and the more erroneous long reads are corrected using the short reads. Current approaches rely on various graph-based alignment techniques and do not take the error profile of the underlying technology into account. Memory- and time-efficient machine learning algorithms that address these shortcomings have the potential to achieve better and more accurate integration of these two technologies. Our algorithm models every long read as a profile Hidden Markov Model with respect to the underlying platform's error profile. The algorithm learns a posterior transition/emission probability distribution for each long read and uses this to correct errors in these reads.
Using datasets from two DNA-seq BAC clones (CH17-157L1 and CH17-227A2), and human brain cerebellum polyA RNA-seq, we show that Hercules-corrected reads have the highest mapping rate among all competing algorithms and the highest accuracy when most of the basepairs of a long read are covered with short reads.

Item Open Access
Bias correction in finding copy number variation with using read depth-based methods in exome sequencing data (2014)
Balcı, Fatma
Since its early years, medical research has striven to identify the causes of disorders, with the ultimate goal of establishing therapeutic treatments and finding cures. This aim is now becoming a reality thanks to recent developments in whole genome (WGS) and whole exome sequencing (WES). Despite the decrease in the cost of sequencing, WGS is still a very costly approach because of the need to evaluate a large number of populations for more concise results. Therefore, sequencing only the protein coding regions (WES) is a more cost-effective alternative. With the WES approach, most of the functionally important variants can be detected. Additionally, single nucleotide polymorphisms (SNPs) that are located within coding regions are the most common causes of Mendelian diseases (i.e. diseases caused by a single mutation). Moreover, WES approaches require less analysis effort than whole genome sequencing approaches, since only 1% of the whole genome is sequenced. Besides these advantages, there are also some shortcomings that need to be addressed, such as biases in GC-content and probe efficiency. Although there are some previous studies on correcting GC-content related issues, there are no studies on correcting the probe efficiency effect. In this thesis, we provide a formal study on the effects of both GC-content and probe efficiency on the distribution of read depth in exome sequencing data.
The correction of probe efficiency will make it possible to develop new CNV discovery methods using exome sequencing data.

Item Open Access
The genetic structure of the Turkish population reveals high levels of variation and admixture (National Academy of Sciences, 2020-12-18)
Kars, Meltem Ece; Başak, A. N.; Onat, Onur Emre; Bilguvar, K.; Choi, J.; Itan, Y.; Çağlar, C.; Palvadeau, R.; Casanova, J.-L.; Cooper, D. N.; Stenson, P. D.; Yavuz, A.; Buluş, H.; Günel, M.; Friedman, J. M.; Özçelik, Tayfun
The construction of population-based variomes has contributed substantially to our understanding of the genetic basis of human inherited disease. Here, we investigated the genetic structure of Turkey from 3,362 unrelated subjects whose whole exomes (n = 2,589) or whole genomes (n = 773) were sequenced to generate a Turkish (TR) Variome that should serve to facilitate disease gene discovery in Turkey. Consistent with the history of present-day Turkey as a crossroads between Europe and Asia, we found extensive admixture between Balkan, Caucasus, Middle Eastern, and European populations with a closer genetic relationship of the TR population to Europeans than hitherto appreciated. We determined that 30% of TR individuals had high inbreeding coefficients (≥0.0156) with runs of homozygosity longer than 4 Mb being found exclusively in the TR population when compared to 1000 Genomes Project populations. We also found that 28% of exome and 49% of genome variants in the very rare range (allele frequency < 0.005) are unique to the modern TR population. We annotated these variants based on their functional consequences to establish a TR Variome containing alleles of potential medical relevance, a repository of homozygous loss-of-function variants and a TR reference panel for genotype imputation using high-quality haplotypes, to facilitate genome-wide association studies.
In addition to providing information on the genetic structure of the modern TR population, these data provide an invaluable resource for future studies to identify variants that are associated with specific phenotypes as well as establishing the phenotypic consequences of mutations in specific genes.