Diverse SNP selection for epistasis test prioritization
Author(s)
Advisor
Çiçek, A. Ercüment
Date
2019-08
Publisher
Bilkent University
Language
English
Type
Thesis
Abstract
Genome-wide association studies explain only a fraction of the underlying heritability
of genetic diseases. Epistatic interactions between two or more loci help close
the gap, and identifying those complex interactions provides a promising road
to a better understanding of complex traits. Unfortunately, the sheer number of
loci combinations to consider and hypotheses to test makes the process prohibitive both
computationally and statistically. This is true even if only pairs of loci are considered.
Epistasis prioritization algorithms have proven useful for reducing the
computational burden and limiting the number of tests to perform. While current
methods aim at avoiding linkage disequilibrium and covering the case cohort,
none aims at diversifying the topological layout of the selected SNPs, which can
help detect complementary variants. In this thesis, a two-stage pipeline to prioritize
epistasis tests is proposed. In the first step, a submodular set function is
optimized to select a diverse set of SNPs that span the underlying genome to
(i) avoid linkage disequilibrium and (ii) pair SNPs that relate to complementary
function. In the second step, selected SNPs are used as seeds to a fast epistasis
detection algorithm. The algorithm is compared with the state-of-the-art method
LinDen on three datasets retrieved from the Wellcome Trust Case Control Consortium:
type 2 diabetes, hypertension, and bipolar disorder. The results show
that the pipeline drastically reduces the number of tests to perform while the
number of statistically significant epistatic pairs discovered increases.
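
The first step described in the abstract, selecting a diverse set by optimizing a submodular set function, is commonly approached with a greedy algorithm. The sketch below is only an illustration of that general technique and is not the thesis's actual objective or code: it assumes a hypothetical graph whose nodes are SNPs, and it uses a simple coverage function (the number of nodes that are selected or adjacent to a selected node) as a stand-in monotone submodular objective. The identifiers `coverage`, `greedy_select`, and the toy SNP labels are made up for this example.

```python
def coverage(selected, neighbors):
    """Count nodes that are selected or adjacent to a selected node.

    Coverage is a standard monotone submodular function; it is used here
    as a stand-in, NOT as the objective defined in the thesis.
    """
    covered = set()
    for node in selected:
        covered.add(node)
        covered.update(neighbors[node])
    return len(covered)


def greedy_select(neighbors, k):
    """Greedily pick up to k nodes, each time taking the largest marginal gain."""
    selected = []
    covered = set()
    for _ in range(k):
        best, best_gain = None, 0
        # Iterate in sorted order so ties are broken deterministically.
        for node in sorted(neighbors):
            if node in selected:
                continue
            gain = len(({node} | set(neighbors[node])) - covered)
            if gain > best_gain:
                best, best_gain = node, gain
        if best is None:  # nothing left that adds coverage
            break
        selected.append(best)
        covered |= {best} | set(neighbors[best])
    return selected


# Hypothetical toy graph: two clusters and one isolated node.
toy = {
    "s1": ["s2", "s3"],
    "s2": ["s1", "s3"],
    "s3": ["s1", "s2"],
    "s4": ["s5"],
    "s5": ["s4"],
    "s6": [],
}
seeds = greedy_select(toy, 2)
print(seeds, coverage(seeds, toy))
```

On this toy input the greedy pass takes one node from each cluster instead of two nodes from the same cluster, which is the kind of spread a diversity objective is meant to encourage. For monotone submodular objectives under a cardinality constraint, the greedy choice is known to achieve at least a (1 - 1/e) fraction of the optimal value.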