A utility maximizing and privacy preserving approach for protecting kinship in genomic databases

Limited Access
This item is unavailable until:
2020-03-01
Date
2017-03
Advisor
Okan, Öznur Taştan
Publisher
Bilkent University
Language
English
Abstract

Rapid, low-cost sequencing of genomic data enables widespread use of genomic information in research studies and personalized customer applications, where people share their genomic data in public databases. Although the identities of the participants are anonymized in these databases, sensitive information about individuals can still be inferred if the stored data is not shared in a privacy-preserving manner. Proper handling of kinship information is one such concern that needs to be addressed to avoid exposure of privacy-sensitive information. In this work, we show that kinship relationships can be inferred using only the publicly available single nucleotide polymorphism (SNP) data of anonymized individuals. We present two scenarios that result in privacy leakage: one based on the genomic similarity of the individuals, the other on the outlier allele pair counts of the family members. In the proposed models, we assume that the family members join the database sequentially, and we systematically identify minimal portions of data to withhold as new participants are added to the database. Choosing the proper positions to hide is cast as an optimization problem, in which the number of positions to mask is minimized subject to several privacy constraints that ensure that the kinship information between any pair of family members is not leaked. We evaluate the proposed technique on real genomic data of two different families of size five by considering different sequential arrival orders for the family members. Results indicate that concurrent sharing of data pertaining to a parent and an offspring results in a high risk of privacy leakage, whereas sharing data from more distant relatives together is often safer. We also show that different arrival orders of the members can lead to different levels of privacy risk and that the utility of the shared data can vary. Adoption of the proposed method will allow safe sharing of genomic data in terms of kinship privacy in future research studies and public genomic services.
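
The idea of masking as few positions as possible while keeping a kinship indicator below a limit can be illustrated with a toy sketch. This is not the thesis's formulation: it assumes genotypes encoded as minor-allele counts (0, 1, 2), uses the mean identity-by-state (IBS) score as a stand-in similarity measure, considers only a single pair of individuals, and replaces the optimization model with a simple greedy rule. The function names and the threshold value are hypothetical.

```python
def ibs_scores(g_new, g_existing):
    """Per-SNP identity-by-state score (assumed metric, not from the thesis).

    Genotypes are minor-allele counts in {0, 1, 2}; the score is 2 when the
    two genotypes are identical and 0 for opposite homozygotes.
    """
    return [2 - abs(a - b) for a, b in zip(g_new, g_existing)]


def positions_to_mask(g_new, g_existing, threshold):
    """Greedily pick positions of the new participant to withhold.

    Positions with the highest IBS score are masked first, until the mean
    IBS over the remaining shared positions is at most `threshold`
    (a hypothetical privacy limit). Returns the masked indices, sorted.
    """
    scores = ibs_scores(g_new, g_existing)
    # Stable sort: ties keep their original index order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total = sum(scores)
    remaining = len(scores)
    masked = []
    for i in order:
        if remaining == 0 or total / remaining <= threshold:
            break
        total -= scores[i]
        remaining -= 1
        masked.append(i)
    return sorted(masked)


# Toy genotypes for two individuals over six SNPs (made-up values).
new_member = [0, 1, 2, 2, 0, 1]
existing_member = [0, 1, 2, 0, 1, 1]
hidden = positions_to_mask(new_member, existing_member, threshold=1.2)
```

For a single pair and a mean-based score, removing the highest-scoring positions first needs the fewest removals. With several family members and several constraints at once, as in the setting described in the abstract, a greedy rule like this gives no such guarantee, which is why a proper optimization formulation is needed there.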
