Privacy preserving and robust watermarking on sequential genome data using belief propagation and local differential privacy
Embargo Lift Date: 2021-02-28
Öksüz, Abdullah Çağlar
Item Usage Stats
Genome data is a subject of study for both biology and computer science since the start of Human Genome Project in 1990. Since then, genome sequencing for medical and social purposes becomes more and more available and aﬀordable. For research, these genome data can be shared on public websites or with service providers. However, this sharing process compromises the privacy of donors even under partial sharing conditions. In this work, we mainly focus on the liability aspect ensued by unauthorized sharing of these genome data. One of the techniques to address the liability issues in data sharing is watermarking mechanism. In order to detect malicious correspondents and service providers (SPs) -whose aim is to share genome data without individuals’ consent and undetected-, we propose a novel watermarking method on sequential genome data using belief propagation algorithm. In our method, we have three criteria to satisfy. (i) Embedding robust watermarks so that the malicious adversaries can not temper the watermark by modiﬁcation and are identiﬁed with high probability (ii) Achieving -local diﬀerential privacy in all data sharings with SPs and (iii) Preserving the utility by keeping the watermark length short and the watermarks non-conﬂicting. For the preservation of system robustness against single SP and collusion attacks, we consider publicly available genomic information like Minor Allele Frequency, Linkage Disequilibrium, Phenotype Information and Familial Information. Also, considering the fact that the attackers may know our optimality strategy in watermarking, we incorporate local diﬀerential privacy as plausible deniability factor that induces malicious inference strength. As opposed to traditional diﬀerential privacy-based data sharing schemes in which the noise is added based on summary statistic of the population data, noise is added in local setting based on local probabilities.