
      SPADIS: selecting predictive and diverse SNPs in GWAS

      Author
      Yılmaz, Serhan
      Advisor
      Çiçek, A. Ercüment.
      Date
      2018-08
      Publisher
      Bilkent University
      Language
      English
      Type
      Thesis
      Abstract
      Phenotypic heritability of complex traits and diseases is seldom explained by individual genetic variants identified in genome-wide association studies (GWAS). Many methods have been developed to select a subset of variant loci that are associated with or predictive of the phenotype. Selecting connected Single Nucleotide Polymorphisms (SNPs) on SNP-SNP networks has proven successful in finding biologically interpretable and predictive SNPs. However, we argue that the connectedness constraint favors selecting redundant features that affect similar biological processes and therefore does not necessarily yield better predictive performance. To this end, we propose a novel method called SPADIS that favors the selection of remotely located SNPs in order to account for their complementary effects in explaining a phenotype. SPADIS selects a diverse set of loci on a SNP-SNP network. This is achieved by maximizing a submodular set function with a greedy algorithm that ensures a constant-factor (1 − 1/e) approximation to the optimal solution. We compare SPADIS to the state-of-the-art method SConES on a dataset of Arabidopsis thaliana with continuous flowering time phenotypes. SPADIS has better average phenotype prediction performance in 15 out of 17 phenotypes when the same number of SNPs is selected, and it provides consistent improvements across multiple networks and settings on average. Moreover, it identifies more candidate genes and runs faster. We also investigate, for the first time, the use of Hi-C data to construct SNP-SNP networks in the context of the SNP selection problem, which yields improvements in regression performance across all methods.
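
      The greedy step mentioned in the abstract can be illustrated with a minimal sketch. This is not the SPADIS objective, which the abstract does not spell out. It assumes a toy coverage function (the number of distinct network regions covered by the selected SNPs), which is a standard example of a monotone submodular function. For such functions under a cardinality constraint, the greedy rule of repeatedly adding the item with the largest marginal gain is known to give the (1 − 1/e) guarantee. The names TOY_COVERAGE, covered_regions and greedy_maximize, and the SNP labels, are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Set

# Hypothetical toy data (not from the thesis): each SNP "covers" some regions.
TOY_COVERAGE: Dict[str, Set[int]] = {
    "snp_a": {1, 2, 3},
    "snp_b": {3, 4},
    "snp_c": {1, 2},
    "snp_d": {5, 6, 7, 8},
}


def covered_regions(selection: FrozenSet[str]) -> int:
    """Toy monotone submodular objective: count of distinct regions covered."""
    covered: Set[int] = set()
    for snp in selection:
        covered |= TOY_COVERAGE[snp]
    return len(covered)


def greedy_maximize(
    candidates: List[str],
    objective: Callable[[FrozenSet[str]], float],
    k: int,
) -> List[str]:
    """Pick up to k items, each time adding the one with the largest marginal gain."""
    selected: List[str] = []
    current_value = objective(frozenset())
    for _ in range(min(k, len(candidates))):
        best_item, best_gain = None, 0.0
        for item in candidates:
            if item in selected:
                continue
            # Marginal gain of adding this item to the current selection.
            gain = objective(frozenset(selected + [item])) - current_value
            # Strict comparison: ties go to the earliest candidate in the list.
            if best_item is None or gain > best_gain:
                best_item, best_gain = item, gain
        if best_item is None:
            break
        selected.append(best_item)
        current_value += best_gain
    return selected


chosen = greedy_maximize(list(TOY_COVERAGE), covered_regions, k=2)
print(chosen, covered_regions(frozenset(chosen)))
```

      In this toy run the second pick is snp_a rather than snp_b or snp_c, because its regions overlap least with what snp_d already covers. That diminishing-returns behaviour is what makes a submodular objective a natural fit for favoring diverse, non-redundant selections.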
      Keywords
      GWAS
      SNP Selection
      SNP-SNP Networks
      Hi-C
      Submodularity
      Permalink
      http://hdl.handle.net/11693/47721
      Collections
      • Dept. of Computer Engineering - Master's degree