Quantifying and protecting genomic privacy
Author(s)
Advisor
Date: 2018-08
Publisher: Bilkent University
Language: English
Type: Thesis
Abstract
Today, genome sequencing is more accessible and affordable than ever. It is also
possible for individuals to share their genomic data with service providers or on
public websites. Although genomic data has a significant impact on and widespread
usage in medical research, it puts individuals' privacy in danger, even if they
share it anonymously or only partially. In this work, we first improve
existing work on inference attacks on genomic privacy using an observable
Markov model, a recombination model between haplotypes, kinship relations,
and phenotypic traits. Then, to address this privacy concern, we present a differential
privacy-based framework for sharing individuals' genomic data while
preserving their privacy. Different from existing differential privacy-based solutions
for genomic data (which consider privacy-preserving release of summary
statistics), we focus on privacy-preserving sharing of actual genomic data. We assume
an individual with a sensitive portion of his genome (e.g., mutations or
single nucleotide polymorphisms (SNPs) that reveal sensitive information about
the individual). The individual's goals are to (i) preserve the privacy of
his sensitive data, (ii) preserve the privacy of interdependent data (data that belongs
to other individuals that is correlated with his data), and (iii) share as much
data as possible to maximize utility of data sharing. As opposed to traditional
differential privacy-based data sharing schemes, the proposed scheme does not
intentionally add noise to data; it is based on selective sharing of data points.
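To make the idea of selective sharing concrete, here is a minimal, hypothetical sketch: rather than perturbing values with noise, a sharing decision is made per SNP so that the estimated leakage about sensitive SNPs stays under a budget. The function names, the pairwise correlation model, and the greedy selection rule are illustrative assumptions, not the thesis's actual algorithm.

```python
# Hypothetical sketch of noise-free selective sharing. Instead of adding
# noise to values, we decide WHICH non-sensitive SNPs to publish, keeping
# an (illustrative) leakage estimate about sensitive SNPs below a budget.

def select_snps_to_share(snps, correlation, sensitive, budget):
    """snps: dict id -> genotype value; correlation: dict (i, s) -> strength
    in [0, 1] between SNP i and sensitive SNP s; sensitive: set of sensitive
    SNP ids; budget: maximum total leakage allowed."""
    shared, leakage = {}, 0.0
    # Consider non-sensitive SNPs, least-leaky first.
    candidates = sorted(
        (i for i in snps if i not in sensitive),
        key=lambda i: max((correlation.get((i, s), 0.0) for s in sensitive),
                          default=0.0),
    )
    for i in candidates:
        cost = max((correlation.get((i, s), 0.0) for s in sensitive),
                   default=0.0)
        if leakage + cost <= budget:
            shared[i] = snps[i]
            leakage += cost
    return shared

snps = {"rs1": 0, "rs2": 1, "rs3": 2, "rs4": 1}
corr = {("rs2", "rs4"): 0.9, ("rs3", "rs4"): 0.2}
# Share as much as possible without leaking too much about sensitive rs4.
print(select_snps_to_share(snps, corr, {"rs4"}, budget=0.5))
```

The sensitive SNP itself is never shared, and strongly correlated SNPs (here "rs2") are withheld once the leakage budget is exhausted, which mirrors the abstract's point that hiding only the sensitive SNPs is not enough.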
Previous studies show that hiding the sensitive SNPs while sharing the others
does not preserve the individual's (or other interdependent people's) privacy: by exploiting
auxiliary information, an attacker can run efficient inference attacks and
infer the sensitive SNPs of individuals. In this work, we utilize such inference
attacks, which we first discuss in detail, in our differential privacy-based
data sharing framework and propose a SNP sharing platform for individuals that
provides differential privacy guarantees. We show that the proposed framework
does not leak sensitive information to the attacker while providing high
data sharing utility. Through experiments on real data, we extensively study the
relationship between utility and several parameters that affect privacy. We also
compare the proposed technique with previous ones and show its advantage
in terms of both privacy and data sharing utility.
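The kind of inference attack the abstract refers to can be illustrated with a toy first-order Markov model over neighboring SNPs: linkage between adjacent positions lets an attacker predict a hidden SNP from an observed neighbor. The transition probabilities below are made up for illustration; the thesis's attack additionally uses recombination, kinship, and phenotype information.

```python
# Toy Markov-model inference attack: predict a hidden SNP from its
# observed left neighbor using (illustrative) transition probabilities.
# Genotypes are coded 0/1/2 (count of the minor allele).

# P(hidden_snp = g2 | neighbor_snp = g1); values are illustrative only.
TRANSITION = {
    0: {0: 0.80, 1: 0.15, 2: 0.05},
    1: {0: 0.25, 1: 0.50, 2: 0.25},
    2: {0: 0.05, 1: 0.15, 2: 0.80},
}

def infer_hidden_snp(observed_neighbor):
    """Return the most likely value of the hidden SNP given its observed
    neighbor, under the toy first-order Markov model above."""
    dist = TRANSITION[observed_neighbor]
    return max(dist, key=dist.get)

# A strongly linked neighbor makes the hidden value easy to guess,
# which is why hiding sensitive SNPs alone does not protect privacy.
print(infer_hidden_snp(0), infer_hidden_snp(2))
```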