Privacy preserving and robust watermarking on sequential genome data using belief propagation and local differential privacy
Abstract
Genome data has been a subject of study in both biology and computer science since the start of the Human Genome Project in 1990. Since then, genome sequencing for medical and social purposes has become increasingly available and affordable. For research, genome data can be shared on public websites or with service providers. However, this sharing compromises the privacy of donors, even under partial sharing conditions. In this work, we focus mainly on the liability issues that arise from unauthorized sharing of genome data. One technique for addressing liability issues in data sharing is watermarking. To detect malicious correspondents and service providers (SPs), whose aim is to share genome data without individuals' consent and without being detected, we propose a novel watermarking method for sequential genome data using the belief propagation algorithm. Our method has three criteria to satisfy: (i) embedding robust watermarks, so that malicious adversaries cannot tamper with the watermark by modification and are identified with high probability; (ii) achieving ε-local differential privacy in all data sharing with SPs; and (iii) preserving utility by keeping the watermark length short and the watermarks non-conflicting. To preserve system robustness against single-SP and collusion attacks, we consider publicly available genomic information such as minor allele frequency, linkage disequilibrium, phenotype information, and familial information. Also, considering that attackers may know our optimality strategy in watermarking, we incorporate local differential privacy as a plausible deniability factor that induces malicious inference strength. In contrast to traditional differential privacy-based data sharing schemes, in which noise is added based on summary statistics of the population data, here noise is added in a local setting based on local probabilities.
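To make the ε-local differential privacy criterion concrete, the following is a minimal sketch of generalized randomized response, a standard generic ε-LDP mechanism. It is not the method proposed in this work, which adds noise based on local probabilities rather than uniformly. The three-state encoding (0, 1, 2 for the number of minor alleles at a position) and all function names are assumptions made for illustration.

```python
import math
import random

# Assumed encoding: each position holds 0, 1, or 2 (number of minor alleles).
STATES = (0, 1, 2)


def keep_probability(epsilon, k=len(STATES)):
    """Probability of reporting the true value under k-ary randomized response."""
    return math.exp(epsilon) / (math.exp(epsilon) + k - 1)


def randomized_response(value, epsilon, rng=random):
    """Report `value` with probability p, otherwise one of the other states uniformly.

    For any two inputs, the probabilities of producing the same output differ
    by a factor of at most e^epsilon, which is the epsilon-LDP guarantee.
    """
    if value not in STATES:
        raise ValueError(f"unexpected state: {value!r}")
    if rng.random() < keep_probability(epsilon):
        return value
    others = [s for s in STATES if s != value]
    return rng.choice(others)


def perturb_sequence(sequence, epsilon, seed=None):
    """Apply randomized response independently to every position (hypothetical helper)."""
    rng = random.Random(seed)
    return [randomized_response(v, epsilon, rng) for v in sequence]


if __name__ == "__main__":
    original = [0, 1, 2, 0, 0, 1, 2, 2, 1, 0]
    shared = perturb_sequence(original, epsilon=1.0, seed=7)
    print("original:", original)
    print("shared:  ", shared)
```

A smaller ε flips more positions, which gives the data owner stronger plausible deniability at the cost of utility. Note that applying the mechanism independently at every position spends ε per position; how the privacy budget is accounted for across a whole sequence is a design decision this sketch does not address.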