On the tradeoff between privacy and utility in genomic studies: differential privacy under dependent tuples

buir.advisor: Ulusoy, Özgür
dc.contributor.author: Alserr, Nour M. N.
dc.date.accessioned: 2020-11-26T13:48:11Z
dc.date.available: 2020-11-26T13:48:11Z
dc.date.copyright: 2020-08
dc.date.issued: 2020-08
dc.date.submitted: 2020-11-24
dc.description: Cataloged from PDF version of article. [en_US]
dc.description: Thesis (Ph.D.): Bilkent University, Department of Computer Engineering, İhsan Doğramacı Bilkent University, 2020. [en_US]
dc.description: Includes bibliographical references (leaves 107-117). [en_US]
dc.description.abstract: The rapid progress in genome sequencing and the decrease in sequencing costs have led to the wide availability of genomic data. Studying these data can greatly help answer key questions about disease associations and our evolution. However, due to growing privacy concerns about the sensitive information of participants, access to the key results and data of genomic studies (such as genome-wide association studies, or GWAS) is restricted to trusted individuals. On the other hand, paving the way to biomedical breakthroughs and discoveries requires granting open access to genomic datasets. Privacy-preserving mechanisms can be a solution for granting wider access to such data while protecting their owners. In particular, there has been growing interest in applying the concept of differential privacy (DP) when sharing summary statistics about genomic data. DP provides a mathematically rigorous approach to preventing the risk of membership inference while sharing statistical information about a dataset. However, DP has a known drawback: it does not take into account the correlation between dataset tuples, which is common in genomic datasets due to the inherent correlations between the genomes of family members. This may degrade the privacy guarantees offered by DP. In this thesis, focusing on static and dynamic genomic datasets, we show this drawback of DP and propose techniques to mitigate it. First, using a real-world genomic dataset, we demonstrate the feasibility of an attribute inference attack on differentially private query results by utilizing the correlations between the entries in the dataset. We show the privacy loss in count, minor allele frequency (MAF), and chi-square queries. The results explain the scale of vulnerability when the dataset contains dependent tuples. Our results demonstrate that the adversary can infer sensitive genomic data about a user from the differentially private results of a sum query by exploiting the correlations between the genomes of family members. Our results also show that, using the results of differentially private MAF queries on static and dynamic genomic datasets and utilizing the dependency between tuples, an adversary can reveal up to 50% more sensitive information about the genome of a target (compared to the original privacy guarantees of standard DP-based mechanisms), while differentially private chi-square queries can reveal up to 40% more sensitive information. Furthermore, we show that the adversary can use the genomic data inferred through the attribute inference attack to infer the membership of a target in another genomic dataset (e.g., one associated with a sensitive trait). Using a log-likelihood-ratio (LLR) test, our results also show that the inference power of the adversary can be significantly high in such an attack even when using inferred (and hence partially incorrect) genomes. Finally, we propose a mechanism for privacy-preserving sharing of statistics from genomic datasets that attains privacy guarantees while taking into consideration the dependence between tuples. By evaluating our mechanism on different genomic datasets, we empirically demonstrate that it can achieve up to 50% better privacy than traditional DP-based solutions. [en_US]
dc.description.provenance: Submitted by Betül Özen (ozen@bilkent.edu.tr) on 2020-11-26T13:48:11Z No. of bitstreams: 1 Thesis_Nour_Alser.pdf: 24012584 bytes, checksum: da7b95f1dcc2bc51395c4f16ecde40f4 (MD5) [en]
dc.description.provenance: Made available in DSpace on 2020-11-26T13:48:11Z (GMT). No. of bitstreams: 1 Thesis_Nour_Alser.pdf: 24012584 bytes, checksum: da7b95f1dcc2bc51395c4f16ecde40f4 (MD5) Previous issue date: 2020-11 [en]
dc.description.statementofresponsibility: by Nour M. N. Alserr [en_US]
dc.format.extent: xx, 118 leaves : charts ; 30 cm. [en_US]
dc.identifier.itemid: B157645
dc.identifier.uri: http://hdl.handle.net/11693/54645
dc.language.iso: English [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.subject: Genomic datasets [en_US]
dc.subject: Differential privacy [en_US]
dc.subject: Inference attacks [en_US]
dc.title: On the tradeoff between privacy and utility in genomic studies: differential privacy under dependent tuples [en_US]
dc.title.alternative: Genomik çalışmalarda gizlilik ve verinin işe yararlılığı üzerine analiz: bağımlı elemanlar altında diferansiyel gizlilik [en_US]
dc.type: Thesis [en_US]
thesis.degree.discipline: Computer Engineering
thesis.degree.grantor: Bilkent University
thesis.degree.level: Doctoral
thesis.degree.name: Ph.D. (Doctor of Philosophy)

Files

Original bundle

Name: Thesis_Nour_Alser.pdf
Size: 22.9 MB
Format: Adobe Portable Document Format
Description: Full printable version

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission
Description: