Potpourri: an epistasis test prioritization algorithm via diverse SNP selection
Date
Authors
Editor(s)
Advisor
Supervisor
Co-Advisor
Co-Supervisor
Instructor
Source Title
Print ISSN
Electronic ISSN
Publisher
Volume
Issue
Pages
Language
Type
Journal Title
Journal ISSN
Volume Title
Series
Abstract
Genome-wide association studies explain a fraction of the underlying heritability of genetic diseases. Investigating epistatic interactions between two or more loci help closing this gap. Unfortunately, sheer number of loci combinations to process and hypotheses to test prohibit the process both computationally and statistically. Epistasis test prioritization algorithms rank likely-epistatic SNP pairs to limit the number of tests. Yet, they still su_er from very low precision. It was shown in the literature that selecting SNPs that are individually correlated with the phenotype and also diverse with respect to genomic location, leads to better phenotype prediction due to genetic complementation. Here, we hypothesize that an algorithm that pairs SNPs from such diverse regions and carefully ranks the pairs can detect statistically more meaningful pairs and can improve prediction power. We propose an epistasis test prioritization algorithm which optimizes a submodular set function to select a diverse and complementary set of genomic regions that span the underlying genome. SNP pairs from these regions are then further ranked w.r.t. their co-coverage of the case cohort. We compare our algorithm with the state- of-the-art on three GWAS and show that (i) we substantially improve precision (from 0.003 to 0.652) while maintaining the signi_cance of selected pairs, (ii) decrease the number of tests by 25 folds, and (iii) decrease the runtime by 4 folds. We also show that promoting SNPs from regulatory/coding regions improves the precision (up to 0.8).