Browsing by Subject "Whole genome sequencing"


Open Access
Automatic characterization of copy number polymorphism using high throughput sequencing
(TÜBİTAK, 2020) Alkan, Can
Genome structural variation, broadly defined as alterations longer than 50 bp, is an important source of genetic variation among humans, including variants that cause complex diseases such as autism, developmental delay, and schizophrenia. Although there has been considerable progress in characterizing structural variation since the start of the 1000 Genomes Project, one form of structural variation, segmental duplications (SDs), has remained largely understudied in large cohorts. This is mostly because SDs cannot be accurately discovered from the alignment files generated by standard read mapping tools, which typically report only one best location for each read. Instead, SDs can only be found when all possible map locations of each read are considered. To date, only a single algorithm is available for SD discovery, and it consists of various tools and scripts that are not portable and are difficult to use. Additionally, this algorithm relies on a priori information about regions in which no structural variation has been discovered across a large number of genomes. Therefore, there is a need for fully automated, portable, and user-friendly tools that make SD characterization a routine part of genome analysis. Here we introduce such an algorithm and an efficient implementation, called mrCaNaVaR, that aims to fill this gap in the genome analysis toolbox.
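The core idea behind read-depth-based duplication discovery can be sketched briefly: when every possible map location of a read is counted, reads originating from a duplicated segment pile up in each copy, so the depth in those windows rises roughly in proportion to copy number. The sketch below is a simplified illustration, not mrCaNaVaR's actual implementation. The function names, the fixed non-overlapping windows, the diploid scaling, and the user-supplied control windows are all assumptions; note that control windows are exactly the kind of a priori information the abstract says the earlier approach depended on. Real read-depth pipelines typically also correct for biases such as GC content.

```python
from statistics import mean


def window_depths(map_locations, window_size, genome_length):
    """Count read placements in fixed-size, non-overlapping windows.

    map_locations: one list of mapping start positions per read. A read that
    maps to several places adds one count to every window it maps to
    (an assumption of this sketch; no fractional weighting is applied).
    """
    n_windows = (genome_length + window_size - 1) // window_size
    depth = [0] * n_windows
    for locations in map_locations:
        for pos in locations:
            if not 0 <= pos < genome_length:
                raise ValueError(f"position {pos} is outside the genome")
            depth[pos // window_size] += 1
    return depth


def copy_numbers(depth, control_windows, ploidy=2):
    """Scale window depths so that control windows, assumed to be
    copy-number neutral (hypothetical input), have copy number `ploidy`."""
    baseline = mean(depth[i] for i in control_windows)
    return [ploidy * d / baseline for d in depth]


# Toy example: windows 2 and 3 are paralogous copies, so the last four
# reads map equally well to both of them.
reads = [[1], [5], [11], [15], [21, 31], [23, 33], [25, 35], [27, 37]]
depth = window_depths(reads, window_size=10, genome_length=40)
print(depth)                        # [2, 2, 4, 4]
print(copy_numbers(depth, [0, 1]))  # [2.0, 2.0, 4.0, 4.0]
```

Counting each multi-mapping read in every copy is what lets the duplicated windows reach an estimated copy number of 4; a mapper that kept only one location per read would split those reads between the copies and report both windows as copy-number neutral.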
Open Access
Characterization of the fine-scale genetic structure of the Turkish population
(2022-01) Kars, Meltem Ece
The construction of population-based genetic resources plays a pivotal role in the study of human biology and disease. In this study, the fine-scale genetic structure of the Turkish (TR) population was characterized using the whole-exome (WES, n = 2,589) and whole-genome (WGS, n = 773) sequences of 3,362 unrelated individuals from Turkey. Significant levels of admixture from the Balkans, the Caucasus, the Middle East, and Europe were detected in the TR subregions, consistent with the history of Anatolia. Population structure analyses showed that the TR and European populations are more closely related genetically than previously appreciated. Inbreeding coefficient calculations and runs of homozygosity analysis reflected the unique effects of the high rate of consanguineous marriage on the TR genome. A TR Variome comprising over 40 million variants was constructed from the data generated in this study. Derived allele frequency (DAF) calculations revealed that 28% of TR-WES and 49% of TR-WGS variants in the very rare frequency bins (DAF < 0.005) were not listed in the Genome Aggregation Database. Clinically relevant variants and human gene knockouts in the TR Variome were also catalogued, highlighting its potential as an invaluable resource for future disease gene identification studies. Additionally, a reference panel for genotype imputation was generated from the TR-WGS data. Because this panel significantly increased imputation accuracy in both TR and neighboring populations, it is likely to facilitate genome-wide association studies in these populations. In the second part of the study, the sequencing data of a total of 3,599 unrelated TR individuals were assessed for previously reported pathogenic (RP) variants and predicted pathogenic (PP) variants in Online Mendelian Inheritance in Man (OMIM) genes associated with a phenotype.
Analyses revealed that at least 70% of TR individuals carry at least one RP variant, and that every individual carries at least one RP and/or PP variant in their genome. Moreover, 25% of individuals carried at least one RP variant in newborn screening genes. Each individual in the study also had at least a 1 in 17 chance of carrying an RP variant in one of the 73 actionable genes recommended by the American College of Medical Genetics. MEFV, ABCA4, CYP21A2, PAH, and CFTR displayed the highest cumulative carrier frequencies (CF), consistent with the high prevalence of the phenotypes they cause. By estimating the CF and genetic prevalence of 3,251 OMIM genes using RP and PP variants, this study presents the most comprehensive data so far on the landscape of genetic disease in the TR population.
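Several quantities in this abstract reduce to simple counting. The sketch below shows one plausible way to compute them from toy inputs: the derived allele frequency in a diploid sample, the fraction of very rare variants (DAF < 0.005) absent from a reference catalog such as the Genome Aggregation Database, a gene's cumulative carrier frequency, a Hardy-Weinberg estimate of recessive disease prevalence, and the chance of carrying at least one variant across several genes. The input formats, the function names, the approximation q ≈ CF / 2, and the independence assumption across genes are illustrative assumptions, not the study's documented methods.

```python
from math import prod


def derived_allele_frequency(derived_count, n_individuals):
    """DAF for a diploid sample: derived alleles / (2 * individuals)."""
    return derived_count / (2 * n_individuals)


def fraction_absent(variants, reference_ids, n_individuals, max_daf=0.005):
    """Among variants with DAF < max_daf, the fraction not in reference_ids.

    variants: hypothetical mapping of variant ID -> derived allele count.
    """
    rare = [vid for vid, count in variants.items()
            if derived_allele_frequency(count, n_individuals) < max_daf]
    if not rare:
        return 0.0
    return sum(vid not in reference_ids for vid in rare) / len(rare)


def cumulative_carrier_frequency(genotypes):
    """Fraction of individuals carrying at least one flagged allele in a gene.

    genotypes: one list per individual with alternate-allele counts (0, 1, 2)
    at the gene's flagged (e.g. RP or PP) variant sites.
    """
    carriers = sum(any(g > 0 for g in person) for person in genotypes)
    return carriers / len(genotypes)


def recessive_prevalence(carrier_frequency):
    """Hardy-Weinberg estimate for an autosomal recessive gene.

    Assumes homozygotes are rare, so the combined allele frequency
    q ~= CF / 2 and the expected fraction of affected individuals is q ** 2.
    """
    q = carrier_frequency / 2
    return q ** 2


def prob_at_least_one(carrier_freqs):
    """Chance of carrying at least one variant across genes, assuming the
    genes are independent."""
    return 1 - prod(1 - f for f in carrier_freqs)


variants = {"v1": 1, "v2": 1, "v3": 50, "v4": 2}  # toy counts, n = 1,000
print(fraction_absent(variants, {"v1", "v3"}, 1000))  # 2 of 3 rare variants absent
print(cumulative_carrier_frequency([[0, 0], [1, 0], [0, 2], [0, 0]]))  # 0.5
print(prob_at_least_one([0.5, 0.5]))  # 0.75
```

The last function shows why per-gene frequencies that are individually small can still add up to a substantial per-person probability, as with the 1 in 17 figure for the actionable genes, although real genes need not be independent.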
Open Access
Characterizing microsatellite polymorphisms using assembly-based and mapping-based tools
(Scientific and Technical Research Council of Turkey, 2019) Demir, Gülfem; Alkan, Can
Microsatellite polymorphism has always been a challenge for genome assembly and sequence alignment due to sequencing errors, short read lengths, and the high incidence of polymerase slippage in microsatellite regions. Although the information microsatellites carry is very valuable, microsatellite variation has not received enough attention to become a routine step in genome sequence analysis pipelines. After the completion of the 1000 Genomes Project, which aimed to establish the most detailed catalog of human genetic variation, the consortium released only two microsatellite prediction sets, generated by two tools. Many other large research efforts have failed to shed light on microsatellite variation. We evaluated the performance of three local assembly methods in three experimental settings, focusing on genotype-based performance, the impact of coverage, and preprocessing that includes flanking regions. All of these experiments supported our initial expectations about assembly. We also demonstrate that overlap-layout-consensus (OLC)-based assembly methods show higher sensitivity in microsatellite variant calling than a de Bruijn graph-based approach. We conclude that OLC-based assembly is the better method for genotyping microsatellites. Our pipeline is available at https://github.com/gulfemd/STRAssembly.
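One commonly cited intuition for why de Bruijn graph assemblers can struggle with microsatellites is that, when k is shorter than the repeat tract, two alleles that differ only in repeat count produce exactly the same set of distinct k-mers. A graph built from that set alone cannot tell them apart, whereas overlap-based methods compare whole reads that may span the tract. The toy alleles below are made up for illustration; this is not the STRAssembly pipeline, and practical de Bruijn assemblers also use k-mer coverage, which this sketch ignores.

```python
def kmer_set(sequence, k):
    """Distinct k-mers of a sequence, from which a simple de Bruijn graph is built."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


# Two hypothetical alleles of a (CA)n microsatellite with identical flanks.
left, right = "ACGT", "TTGA"
allele_short = left + "CA" * 5 + right   # (CA)5
allele_long = left + "CA" * 7 + right    # (CA)7

# k shorter than the repeat tract: the k-mer sets are identical.
print(kmer_set(allele_short, 4) == kmer_set(allele_long, 4))    # True
# k long enough for one k-mer to span a whole tract plus flanking bases:
# the alleles become distinguishable.
print(kmer_set(allele_short, 16) == kmer_set(allele_long, 16))  # False
```

Increasing k restores the distinction only when reads are long enough to supply such k-mers, which is one reason short reads make microsatellite genotyping difficult.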