SPADIS: selecting predictive and diverse SNPs in GWAS

buir.advisorÇiçek, A. Ercüment.
dc.contributor.authorYılmaz, Serhan
dc.date.accessioned2018-08-02T11:51:37Z
dc.date.available2018-08-02T11:51:37Z
dc.date.copyright2018-05
dc.date.issued2018-05
dc.date.submitted2018-08-02
dc.descriptionCataloged from PDF version of article.en_US
dc.descriptionThesis (M.S.): Bilkent University, Department of Computer Engineering, İhsan Doğramacı Bilkent University, 2018.en_US
dc.descriptionIncludes bibliographical references (leaves 31-37).en_US
dc.description.abstractPhenotypic heritability of complex traits and diseases is seldom explained by individual genetic variants identified in genome-wide association studies (GWAS). Many methods have been developed to select a subset of variant loci that are associated with or predictive of the phenotype. Selecting connected Single Nucleotide Polymorphisms (SNPs) on SNP-SNP networks has proven successful in finding biologically interpretable and predictive SNPs. However, we argue that the connectedness constraint favors selecting redundant features that affect similar biological processes and therefore does not necessarily yield better predictive performance. To this end, we propose a novel method called SPADIS that favors the selection of remotely located SNPs in order to account for their complementary effects in explaining a phenotype. SPADIS selects a diverse set of loci on a SNP-SNP network. This is achieved by maximizing a submodular set function with a greedy algorithm that ensures a constant factor (1 − 1/e) approximation to the optimal solution. We compare SPADIS to the state-of-the-art method SConES on a dataset of Arabidopsis thaliana with continuous flowering time phenotypes. SPADIS has better average phenotype prediction performance in 15 out of 17 phenotypes when the same number of SNPs are selected and provides consistent improvements across multiple networks and settings on average. Moreover, it identifies more candidate genes and runs faster. We also investigate the use of Hi-C data to construct SNP-SNP networks in the context of the SNP selection problem for the first time, which yields improvements in regression performance across all methods.en_US
dc.description.provenanceSubmitted by Betül Özen (ozen@bilkent.edu.tr) on 2018-08-02T11:51:37Z No. of bitstreams: 1 serhan_yilmaz_thesis.pdf: 9747540 bytes, checksum: f1c7fe42c6bc1bcf69297f078d4aaab8 (MD5)en
dc.description.provenanceMade available in DSpace on 2018-08-02T11:51:37Z (GMT). No. of bitstreams: 1 serhan_yilmaz_thesis.pdf: 9747540 bytes, checksum: f1c7fe42c6bc1bcf69297f078d4aaab8 (MD5) Previous issue date: 2018-08en
dc.description.statementofresponsibilityby Serhan Yılmaz.en_US
dc.format.extentxii, 53 leaves : charts, tables (some color) ; 30 cm.en_US
dc.identifier.itemidB158749
dc.identifier.urihttp://hdl.handle.net/11693/47721
dc.language.isoEnglishen_US
dc.rightsinfo:eu-repo/semantics/openAccessen_US
dc.subjectGWASen_US
dc.subjectSNP Selectionen_US
dc.subjectSNP-SNP Networksen_US
dc.subjectHi-Cen_US
dc.subjectSubmodularityen_US
dc.titleSPADIS: selecting predictive and diverse SNPs in GWASen_US
dc.title.alternativeSpadis: GWAS çalışmalarında açıklayıcı ve çeşitli SNP seçimien_US
dc.typeThesisen_US
thesis.degree.disciplineComputer Engineering
thesis.degree.grantorBilkent University
thesis.degree.levelMaster's
thesis.degree.nameMS (Master of Science)
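
The abstract states that SPADIS maximizes a submodular set function with a greedy algorithm that carries a (1 − 1/e) approximation guarantee. The sketch below illustrates only that general technique: greedy selection under a cardinality limit for a monotone submodular function. The objective used here is a toy set-coverage function, not the SPADIS objective, and the names greedy_max, coverage, and COVERS are hypothetical and introduced purely for illustration.

```python
# Greedy maximization of a monotone submodular set function under a
# cardinality constraint. This is a generic illustration; the toy coverage
# objective below is an assumption and is NOT the objective used by SPADIS.

# Hypothetical toy data: each candidate "covers" a set of elements.
COVERS = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {4, 5, 6, 7},
    "d": {1, 2},
}


def coverage(selected):
    """Number of distinct elements covered by the selected candidates."""
    covered = set()
    for name in selected:
        covered |= COVERS[name]
    return len(covered)


def greedy_max(candidates, objective, k):
    """Pick up to k candidates, each time adding the largest marginal gain."""
    selected = []
    for _ in range(min(k, len(candidates))):
        current = objective(selected)
        best, best_gain = None, 0.0
        for c in candidates:
            if c in selected:
                continue
            gain = objective(selected + [c]) - current
            # Strict comparison: ties go to the earlier candidate.
            if best is None or gain > best_gain:
                best, best_gain = c, gain
        if best is None:
            break
        selected.append(best)
    return selected


picked = greedy_max(list(COVERS), coverage, 2)
```

On this toy data the first pick is "c" (gain 4) and the second is "a" (gain 3), covering 7 elements in total. For monotone submodular objectives with a cardinality limit, this greedy scheme is the classical setting in which the (1 − 1/e) factor holds.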

Files

Original bundle
Name:
serhan_yilmaz_thesis.pdf
Size:
9.3 MB
Format:
Adobe Portable Document Format
Description:
Full printable version
License bundle
Name:
license.txt
Size:
1.71 KB
Format:
Item-specific license agreed upon to submission