Re-identification of individuals in genomic data-sharing beacons via allele inference

buir.advisor: Cicek, A. Ercument
dc.contributor.author: Thenen, Nora von
dc.date.accessioned: 2017-11-01T12:18:46Z
dc.date.available: 2017-11-01T12:18:46Z
dc.date.copyright: 2017-10
dc.date.issued: 2017-10
dc.date.submitted: 2017-11-01
dc.description: Cataloged from PDF version of article. [en_US]
dc.description: Includes bibliographical references (leaves 31-33). [en_US]
dc.description.abstract: Genomic datasets are often associated with sensitive phenotypes. Therefore, the leak of membership information is a major privacy risk. Genomic beacons aim to provide a secure, easy to implement, and standardized interface for data sharing by only allowing yes/no queries on the presence of specific alleles in the dataset. Previously deemed secure against re-identification attacks, beacons were shown to be vulnerable despite their stringent policy. Recent studies have demonstrated that it is possible to determine whether the victim is in the dataset, by repeatedly querying the beacon for his/her single nucleotide polymorphisms (SNPs). In this thesis, we propose a novel re-identification attack and show that the privacy risk is more serious than previously thought. Using the proposed attack, even if the victim systematically hides informative SNPs (i.e., SNPs with very low minor allele frequency -MAF-), it is possible to infer the alleles at positions of interest as well as the beacon query results with very high confidence. Our method is based on the fact that alleles at different loci are not necessarily independent. We use the linkage disequilibrium and a high-order Markov chain-based algorithm for the inference. We show that in a simulated beacon with 65 individuals from the CEU population, we can infer membership of individuals with 95% confidence with only 5 queries, even when SNPs with MAF less than 0.05 are hidden. This means, we need less than 0.5% of the number of queries that existing works require, to determine beacon membership under the same conditions. We further show that countermeasures such as hiding certain parts of the genome or setting a query budget for the user would fail to protect the privacy of the participants under our adversary model. [en_US]
dc.description.statementofresponsibility: by Nora von Thenen. [en_US]
dc.embargo.release: 2018-04-30
dc.format.extent: xii, 35 leaves : charts (some color) ; 30 cm [en_US]
dc.identifier.itemid: B156807
dc.identifier.uri: http://hdl.handle.net/11693/33865
dc.language.iso: English [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.subject: Beacon [en_US]
dc.subject: Genome privacy [en_US]
dc.subject: Re-identification attack [en_US]
dc.title: Re-identification of individuals in genomic data-sharing beacons via allele inference [en_US]
dc.title.alternative: Genom verisi paylaşan beacon sistemlerine karşı alel çıkarımı yapan kimlik tespiti ataklarıen [en_US]
dc.type: Thesis [en_US]
thesis.degree.discipline: Computer Engineering
thesis.degree.grantor: Bilkent University
thesis.degree.level: Master's
thesis.degree.name: MS (Master of Science)
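
Illustrative note (not part of the thesis): the abstract describes a beacon that answers only yes/no queries about alleles, and an attack that infers hidden alleles from correlated neighbouring positions. The sketch below is a toy illustration of those two ideas, not the thesis's method. It assumes haploid genotypes, and it swaps in a first-order Markov chain for the high-order, linkage-disequilibrium-based model named in the abstract. All names (`ToyBeacon`, `fit_first_order`, `infer_allele`) and all data are hypothetical.

```python
from collections import Counter, defaultdict


class ToyBeacon:
    """Hypothetical beacon: answers only whether any stored genome carries an allele."""

    def __init__(self, genomes):
        # genomes: list of dicts mapping SNP position -> allele (haploid, an assumption)
        self._alleles = {(pos, allele) for g in genomes for pos, allele in g.items()}

    def query(self, position, allele):
        # The only information released is a yes/no answer.
        return (position, allele) in self._alleles


def fit_first_order(reference, order):
    """Count allele transitions between consecutive positions in a reference panel."""
    counts = defaultdict(Counter)
    for genome in reference:
        for prev_pos, cur_pos in zip(order, order[1:]):
            counts[(prev_pos, genome[prev_pos], cur_pos)][genome[cur_pos]] += 1
    return counts


def infer_allele(counts, prev_pos, prev_allele, cur_pos):
    """Return the most likely allele at cur_pos given its neighbour, with its estimated probability."""
    seen = counts.get((prev_pos, prev_allele, cur_pos))
    if not seen:
        return None, 0.0
    allele, n = seen.most_common(1)[0]
    return allele, n / sum(seen.values())


# Made-up reference panel over two SNP positions (1 and 2).
reference = [
    {1: "A", 2: "G"},
    {1: "A", 2: "G"},
    {1: "C", 2: "T"},
    {1: "A", 2: "T"},
]
counts = fit_first_order(reference, [1, 2])

# The victim reveals position 1 but hides position 2; the attacker infers it.
guess, confidence = infer_allele(counts, 1, "C", 2)

# The inferred allele can then be used as a beacon query.
beacon = ToyBeacon([{1: "C", 2: "T"}])
answer = beacon.query(2, guess)
```

In this toy setting the hidden allele at position 2 is recovered from the revealed allele at position 1 and then queried against the beacon. A realistic attack would additionally need population allele frequencies and a statistical test on the query responses, which this sketch omits.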

Files

Original bundle

Name: 10169029.pdf
Size: 2.67 MB
Format: Adobe Portable Document Format
Description: Full printable version

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission