Browsing by Author "Ayday, Erman"
Now showing 1 - 20 of 27
Item Open Access
Can you really anonymize the donors of genomic data in today’s digital world? (Springer, 2016-09)
Alser, Mohammed; Almadhoun, Nour; Nouri, Azita; Alkan, Can; Ayday, Erman
The rapid progress in genome sequencing technologies has led to the availability of vast amounts of genomic data. Accelerating the pace of biomedical breakthroughs and discoveries necessitates not only collecting millions of genetic samples but also granting open access to genetic databases. However, one growing concern is the ability to protect the privacy of sensitive information and its owner. In this work, we survey a wide spectrum of cross-layer privacy-breaching strategies against human genomic data (using both public genomic databases and other public non-genomic data). We outline the principles and outcomes of each technique, and assess its technological complexity and maturity. We then review potential privacy-preserving countermeasure mechanisms for each threat. © Springer International Publishing Switzerland 2016.

Item Open Access
Cryptographic solutions for genomic privacy (Springer, 2016-02)
Ayday, Erman
With the help of rapidly developing technology, DNA sequencing is becoming less expensive. As a consequence, research in genomics has gained speed in paving the way to personalized (genomic) medicine, and geneticists need large collections of human genomes to further increase this speed. Furthermore, individuals are using their genomes to learn about their (genetic) predispositions to diseases, their ancestries, and even their (genetic) compatibilities with potential partners. This trend has also led to the launch of health-related websites and online social networks (OSNs) in which individuals share their genomic data (e.g., OpenSNP or 23andMe). On the other hand, genomic data carries much sensitive information about its owner.
By analyzing the DNA of an individual, it is now possible to learn about his disease predispositions (e.g., for Alzheimer’s or Parkinson’s), ancestries, and physical attributes. The threat to genomic privacy is magnified by the fact that a person’s genome is correlated to his family members’ genomes, thus leading to interdependent privacy risks. In this work, focusing on our existing and ongoing work on genomic privacy, we will first highlight one serious threat for genomic privacy. Then, we will present high-level descriptions of our cryptographic solutions to protect the privacy of genomic data. © International Financial Cryptography Association 2016.

Item Open Access
A demonstration of privacy-preserving aggregate queries for optimal location selection (IEEE, 2018)
Eryonucu, Cihan; Ayday, Erman; Zeydan, E.
In recent years, service providers, such as mobile operators providing wireless services, have collected location data on an enormous scale with the increasing use of mobile phones. Vertical businesses, such as banks, may want to use this location information for their own scenarios. However, service providers cannot directly provide these private data to vertical businesses because of privacy and legal issues. In this demo, we show how privacy-preserving solutions can support such location-based queries without revealing each organization's sensitive data. In our demonstration, we use a partially homomorphic cryptosystem in our protocols and show the practicality and feasibility of our proposed solution.

Item Open Access
Differential privacy under dependent tuples—the case of genomic privacy (Oxford University Press, 2020-03)
Almadhoun, Nour; Ayday, Erman; Ulusoy, Özgür
Motivation: The rapid progress in genome sequencing has led to high availability of genomic data. Studying these data can greatly help answer key questions about disease associations and our evolution.
However, due to growing privacy concerns about the sensitive information of participants, accessing key results and data of genomic studies (such as genome-wide association studies) is restricted to only trusted individuals. On the other hand, paving the way to biomedical breakthroughs and discoveries requires granting open access to genomic datasets. Privacy-preserving mechanisms can be a solution for granting wider access to such data while protecting their owners. In particular, there has been growing interest in applying the concept of differential privacy (DP) while sharing summary statistics about genomic data. DP provides a mathematically rigorous approach to prevent the risk of membership inference while sharing statistical information about a dataset. However, DP does not consider the dependence between tuples in the dataset, which may degrade the privacy guarantees offered by DP.
Results: In this work, focusing on genomic datasets, we show this drawback of DP and we propose techniques to mitigate it. First, using a real-world genomic dataset, we demonstrate the feasibility of an inference attack on differentially private query results by utilizing the correlations between the entries in the dataset. The results show the scale of vulnerability when we have dependent tuples in the dataset. We show that the adversary can infer sensitive genomic data about a user from the differentially private results of a query by exploiting the correlations between the genomes of family members. Second, we propose a mechanism for privacy-preserving sharing of statistics from genomic datasets to attain privacy guarantees while taking into consideration the dependence between tuples. By evaluating our mechanism on different genomic datasets, we empirically demonstrate that our proposed mechanism can achieve up to 50% better privacy than traditional DP-based solutions.
Availability and implementation: https://github.com/nourmadhoun/Differential-privacy-genomic-inference-attack.

Item Open Access
Differential privacy with bounded priors: Reconciling utility and privacy in genome-wide association studies (ACM, 2015-10)
Tramèr, F.; Huang, Z.; Hubaux, J.-P.; Ayday, Erman
Differential privacy (DP) has become widely accepted as a rigorous definition of data privacy, with stronger privacy guarantees than traditional statistical methods. However, recent studies have shown that for reasonable privacy budgets, differential privacy significantly affects the expected utility. Many alternative privacy notions which aim at relaxing DP have since been proposed, with the hope of providing a better tradeoff between privacy and utility. At CCS'13, Li et al. introduced the membership privacy framework, wherein they aim at protecting against set membership disclosure by adversaries whose prior knowledge is captured by a family of probability distributions. In the context of this framework, we investigate a relaxation of DP by considering prior distributions that capture more reasonable amounts of background knowledge. We show that for different privacy budgets, DP can be used to achieve membership privacy for various adversarial settings, thus leading to an interesting tradeoff between privacy guarantees and utility. We re-evaluate methods for releasing differentially private χ2-statistics in genome-wide association studies and show that we can achieve a higher utility than in previous works, while still guaranteeing membership privacy in a relevant adversarial setting. © 2015 ACM.

Item Open Access
Differentially private binary- and matrix-valued data query: an XOR mechanism (Association for Computing Machinery, 2021-01)
Ji, T.; Li, P.; Yilmaz, E.; Ayday, Erman; Ye, Y. F.; Sun, J.
Differential privacy has been widely adopted to release continuous- and scalar-valued information on a database without compromising the privacy of individual data records in it.
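The continuous- and scalar-valued releases mentioned above are typically produced with the Laplace mechanism, the standard scalar DP building block these abstracts rely on. A minimal illustrative sketch for a minor-allele-count query; the function names and the sensitivity-2 genotype encoding are our assumptions, not code from the papers:

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Draw Laplace(0, scale) noise via inverse-CDF sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

def dp_minor_allele_count(genotypes, epsilon: float) -> float:
    """Release a cohort's minor-allele count under epsilon-DP.

    Each genotype is 0, 1, or 2 minor alleles, so adding or removing
    one person changes the count by at most 2: sensitivity = 2,
    hence noise scale = 2 / epsilon.
    """
    true_count = sum(genotypes)
    return true_count + laplace_noise(2.0 / epsilon)
```

Note that this independent-noise calibration is exactly what the dependent-tuples entries critique: when tuples are correlated (e.g., relatives in the cohort), the effective privacy guarantee is weaker than epsilon suggests.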
The problem of querying binary- and matrix-valued information on a database in a differentially private manner has rarely been studied. However, binary- and matrix-valued data are ubiquitous in real-world applications, whose privacy concerns may arise under a variety of circumstances. In this paper, we devise an exclusive or (XOR) mechanism that perturbs binary- and matrix-valued query results by conducting an XOR operation on the query result with calibrated noise drawn from a matrix-valued Bernoulli distribution. We first rigorously analyze the privacy and utility guarantees of the proposed XOR mechanism. Then, to generate the parameters of the matrix-valued Bernoulli distribution, we develop a heuristic approach to minimize the expected square query error rate under the ϵ-differential privacy constraint. Additionally, to address the intractability of calculating the probability density function (PDF) of this distribution and efficiently generate samples from it, we adapt an Exact Hamiltonian Monte Carlo based sampling scheme. Finally, we experimentally demonstrate the efficacy of the XOR mechanism by considering binary data classification and social network analysis, all in a differentially private manner. Experiment results show that the XOR mechanism notably outperforms other state-of-the-art differentially private methods in terms of utility (such as classification accuracy and F1 score), and even achieves comparable utility to the non-private mechanisms.

Item Open Access
Dynamic attribute-based privacy-preserving genomic susceptibility testing (Association for Computing Machinery, 2019)
Namazi, M.; Ayday, Erman; Eryonucu, Cihan; Perez-Gonzalez, F.
Developments in the field of genomic studies have resulted in the current high availability of genomic data, which, in turn, raises significant privacy concerns. As DNA information is unique and correlated among family members, it cannot be regarded just as a matter of individual privacy concern.
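The XOR mechanism entry above perturbs binary query results by XOR-ing them with Bernoulli noise; its simplest scalar instance is classic randomized response. A minimal sketch, where the flip probability p = 1 / (1 + e^ε) is the standard binary randomized-response calibration, not code from the paper:

```python
import math
import random

def xor_perturb_bits(bits, epsilon: float):
    """Perturb a list of 0/1 query results by XOR-ing each bit with
    independent Bernoulli noise.

    Flipping each bit with probability p = 1 / (1 + e^epsilon) makes
    every released bit epsilon-DP, since for each output value the
    likelihood ratio between the two possible inputs is
    (1 - p) / p = e^epsilon.
    """
    p_flip = 1.0 / (1.0 + math.exp(epsilon))
    return [b ^ (1 if random.random() < p_flip else 0) for b in bits]
```

The paper's matrix-valued Bernoulli noise generalizes this by correlating the flips across entries to lower the expected square error for a given ε.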
Due to the need for privacy-enhancing methods to protect these sensitive pieces of information, cryptographic solutions have been deployed, enabling scientists to work on encrypted genomic data. In this paper, we develop an attribute-based privacy-preserving susceptibility testing method in which patients' genomic data are outsourced to an untrusted platform. We identify the challenges of simultaneously performing the required computations on the outsourced data and enforcing access control within patient-doctor interactions. We obtain a scheme that is non-interactive with respect to the patient's contribution, which improves the safety of user data. Moreover, we improve the computational performance of susceptibility testing over encrypted genomic data while managing attributes and embedded access policies. We also guarantee the privacy of individuals in our proposed scheme.

Item Open Access
The effect of kinship in re-identification attacks against genomic data sharing beacons (NLM (Medline), 2020-12)
Ayoz, Kerem; Ayşen, Miray; Ayday, Erman; Çiçek, A. Ercüment
Motivation: The big-data era in genomics promises a breakthrough in medicine, but the need to share data in a private manner limits the pace of the field. The widely accepted ‘genomic data sharing beacon’ protocol provides a standardized and secure interface for querying genomic datasets. The data are only shared if the desired information (e.g. a certain variant) exists in the dataset. Various studies have shown that beacons are vulnerable to re-identification (or membership inference) attacks. As beacons are generally associated with sensitive phenotype information, re-identification creates a significant risk for the participants. Unfortunately, proposed countermeasures against such attacks have failed to be effective, as they do not consider the utility of the beacon protocol.
Results: In this study, for the first time, we analyze the mitigating effect of kinship relationships among beacon participants against re-identification attacks. We argue that having multiple family members in a beacon can garble the information for attacks, since a substantial number of variants are shared among kin-related people. Using family genomes from HapMap and synthetically generated datasets, we show that having one of the parents of a victim in the beacon causes (i) a significant decrease in the power of attacks and (ii) a substantial increase in the number of queries needed to confirm an individual’s beacon membership. We also show how the protection effect attenuates when more distant relatives, such as grandparents, are included alongside the victim. Furthermore, we quantify the utility loss due to adding relatives and show that it is smaller than that of flipping-based techniques.

Item Open Access
Efficient quantification of profile matching risk in social networks using belief propagation (Springer Science and Business Media Deutschland GmbH, 2020)
Halimi, A.; Ayday, Erman; Chen, L.; Li, N.; Liang, K.; Schneider, S.
Many individuals share their opinions (e.g., on political issues) or sensitive information about them (e.g., health status) on the internet in an anonymous way to protect their privacy. However, anonymous data sharing has become more challenging in today’s interconnected digital world, especially for individuals that have both anonymous and identified online activities. The most prominent examples of such data-sharing platforms today are online social networks (OSNs). Many individuals have multiple profiles in different OSNs, including anonymous and identified ones (depending on the nature of the OSN).
Here, the privacy threat is profile matching: if an attacker links anonymous profiles of individuals to their real identities, the attacker can obtain privacy-sensitive information which may have serious consequences, such as discrimination or blackmailing. Therefore, it is very important to quantify and show to OSN users the extent of this privacy risk. Existing attempts to model profile matching in OSNs are inadequate and computationally inefficient for real-time risk quantification. Thus, in this work, we develop algorithms to efficiently model and quantify profile matching attacks in OSNs as a step towards real-time privacy risk quantification. For this, we model the profile matching problem using a graph and develop a belief propagation (BP)-based algorithm to solve this problem in a significantly more efficient and accurate way compared to the state-of-the-art. We evaluate the proposed framework on three real-life datasets (including data from four different social networks) and show how users’ profiles in different OSNs can be matched efficiently and with high probability. We show that the proposed model generation has linear complexity in terms of the number of user pairs, which is significantly more efficient than the state-of-the-art (which has cubic complexity). Furthermore, it provides comparable accuracy, precision, and recall to the state-of-the-art. Thanks to the algorithms developed in this work, individuals will be more conscious when sharing data on online platforms. We anticipate that this work will also drive the technology so that new privacy-centered products can be offered by the OSNs.

Item Open Access
Entering watch dogs*: evaluating privacy risks against large-scale facial search and data collection (IEEE, 2021-07-19)
Durmaz, Bahadır; Ayday, Erman
Discovering friends on online platforms has become relatively easy with the introduction of contact discovery and the ability to search using phone numbers.
Such features conveniently connect users by acting as unique tokens across platforms, as opposed to other attributes, such as user names. Using this feature, in this work, one of our contributions is to explore how an attacker can easily create a massive dataset of individuals residing in a given region (e.g., country) that includes a large amount of personal information about such individuals. To identify the active social network accounts of individuals in a given region, we show that brute-force phone number verification is possible in popular online services, such as WhatsApp, Facebook Messenger, and Twitter. We go further and show the feasibility of collecting several data points on the discovered accounts, including multiple facial images belonging to each account owner along with 23 other attributes. Then, as our main contribution, we quantify the privacy risk for an attacker linking a total stranger (e.g., someone they randomly come across in public) to one of the collected records via facial features. Our results show that accurate facial search is possible in the constructed dataset and that an attacker can link a randomly taken photo (i.e., a single facial photo) of an individual to their profile with 67% accuracy. This means that an attacker can, on a large scale, create a search engine that is capable of identifying individuals' records efficiently and accurately from just a single facial photo.

Item Open Access
GenoGuard: protecting genomic data against brute-force attacks (IEEE, 2015-05)
Huang, Z.; Ayday, Erman; Fellay, Jacques; Hubaux, J.-P.; Juels, A.
Secure storage of genomic data is of great and increasing importance. The scientific community's improving ability to interpret individuals' genetic materials and the growing size of genetic database populations have been aggravating the potential consequences of data breaches. The prevalent use of passwords to generate encryption keys thus poses an especially serious problem when applied to genetic data.
Weak passwords can jeopardize genetic data in the short term, but given the multi-decade lifespan of genetic data, even the use of strong passwords with conventional encryption can lead to compromise. We present a tool, called GenoGuard, for providing strong protection for genomic data both today and in the long term. GenoGuard incorporates a new theoretical framework for encryption called honey encryption (HE): it can provide information-theoretic confidentiality guarantees for encrypted data. Unfortunately, previously proposed HE schemes can be applied only to messages from a very restricted set of probability distributions. Therefore, GenoGuard addresses the open problem of applying HE techniques to the highly non-uniform probability distributions that characterize sequences of genetic data. In GenoGuard, a potential adversary can attempt exhaustively to guess keys or passwords and decrypt via a brute-force attack. We prove that decryption under any key will yield a plausible genome sequence, and that GenoGuard offers an information-theoretic security guarantee against message-recovery attacks. We also explore attacks that use side information. Finally, we present an efficient and parallelized software implementation of GenoGuard. © 2015 IEEE.

Item Open Access
An inference attack on genomic data using kinship, complex correlations, and phenotype information (IEEE, 2018)
Deznabi, Iman; Mobayen, Mohammad; Jafari, Nazanin; Taştan, Öznur; Ayday, Erman
Individuals (and their family members) share (partial) genomic data on public platforms. However, using special characteristics of genomic data, background knowledge that can be obtained from the Web, and family relationships between individuals, it is possible to infer the hidden parts of shared (and unshared) genomes. Existing work in this field considers simple correlations in the genome (as well as Mendel’s law and partial genomes of a victim and his family members).
In this paper, we improve the existing work on inference attacks on genomic privacy. We mainly consider complex correlations in the genome by using an observable Markov model and a recombination model between the haplotypes. We also utilize phenotype information about the victims. We propose an efficient message passing algorithm to consider all the aforementioned background information for the inference. We show that the proposed framework improves inference with significantly less information compared to existing work.

Item Open Access
Inference attacks against differentially private query results from genomic datasets including dependent tuples (NLM (Medline), 2020)
Almadhoun, Nour; Ayday, Erman; Ulusoy, Özgür
Motivation: The rapid decrease in sequencing technology costs is leading to a revolution in medical research and clinical care. Today, researchers have access to large genomic datasets to study associations between variants and complex traits. However, availability of such genomic datasets also results in new privacy concerns about personal information of the participants in genomic studies. Differential privacy (DP) is one of the rigorous privacy concepts, which received widespread interest for sharing summary statistics from genomic datasets while protecting the privacy of participants against inference attacks. However, DP has a known drawback as it does not consider the correlation between dataset tuples. Therefore, privacy guarantees of DP-based mechanisms may degrade if the dataset includes dependent tuples, which is a common situation for genomic datasets due to the inherent correlations between genomes of family members.
Results: In this article, using two real-life genomic datasets, we show that exploiting the correlation between the dataset participants results in significant information leakage from differentially private results of complex queries.
We formulate this as an attribute inference attack and show the privacy loss in minor allele frequency (MAF) and chi-square queries. Our results show that using the results of differentially private MAF queries and utilizing the dependency between tuples, an adversary can reveal up to 50% more sensitive information about the genome of a target (compared to original privacy guarantees of standard DP-based mechanisms), while differentially private chi-square queries can reveal up to 40% more sensitive information. Furthermore, we show that the adversary can use the inferred genomic data obtained from the attribute inference attack to infer the membership of a target in another genomic dataset (e.g. associated with a sensitive trait). Using a log-likelihood-ratio test, our results also show that the inference power of the adversary can be significantly high in such an attack even using inferred (and hence partially incorrect) genomes.

Item Open Access
Key protected classification for collaborative learning (Elsevier, 2020)
Sarıyıldız, Mert Bülent; Cinbiş, R. G.; Ayday, Erman
Large-scale datasets play a fundamental role in training deep learning models. However, dataset collection is difficult in domains that involve sensitive information. Collaborative learning techniques provide a privacy-preserving solution, by enabling training over a number of private datasets that are not shared by their owners. However, recently, it has been shown that existing collaborative learning frameworks are vulnerable to an active adversary that runs a generative adversarial network (GAN) attack. In this work, we propose a novel classification model that is resilient against such attacks by design. More specifically, we introduce a key-based classification model and a principled training scheme that protects class scores by using class-specific private keys, which effectively hide the information necessary for a GAN attack.
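The log-likelihood-ratio membership test mentioned in the inference-attack entry above can be sketched as follows. The Binomial(2, f) genotype model and all names here are our simplifying assumptions, in the spirit of standard allele-frequency membership tests, not the paper's actual statistic:

```python
import math

def _binom2(g: int, f: float) -> float:
    """Likelihood of minor-allele dose g (0, 1, or 2) under Binomial(2, f)."""
    return math.comb(2, g) * (f ** g) * ((1.0 - f) ** (2 - g))

def membership_llr(victim_genotype, pool_freqs, population_freqs) -> float:
    """Log-likelihood ratio that a victim was in a study pool.

    For each variant, compare the likelihood of the victim's
    minor-allele dose under the pool's allele frequencies versus the
    general population's.  A large positive sum suggests membership;
    the test is thresholded against its null distribution in practice.
    """
    return sum(
        math.log(_binom2(g, f_pool)) - math.log(_binom2(g, f_pop))
        for g, f_pool, f_pop in zip(victim_genotype, pool_freqs, population_freqs)
    )
```

The entry's point is that this test remains powerful even when the victim's genotypes are only partially correct, e.g., inferred from differentially private query results.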
We additionally show how to utilize high-dimensional keys to improve robustness against attacks without increasing the model complexity. Our detailed experiments demonstrate the effectiveness of the proposed technique. Source code will be made available at https://github.com/mbsariyildiz/key-protected-classification.

Item Open Access
On non-cooperative genomic privacy (Springer, Berlin, Heidelberg, 2015)
Humbert, M.; Ayday, Erman; Hubaux, J.-P.; Telenti, A.
Over the last few years, the vast progress in genome sequencing has greatly increased the availability of genomic data. Today, individuals can obtain their digital genomic sequences at reasonable prices from many online service providers. Individuals can store their data on personal devices, reveal it on public online databases, or share it with third parties. Yet, it has been shown that genomic data is very privacy-sensitive and highly correlated between relatives. Therefore, individuals’ decisions about how to manage and secure their genomic data are crucial. People of the same family might have very different opinions about (i) how to protect and (ii) whether or not to reveal their genome. We study this tension by using a game-theoretic approach. First, we model the interplay between two purely selfish family members. We also analyze how the game evolves when relatives behave altruistically. We derive closed-form Nash equilibria in different settings. We then extend the game to N players by means of multi-agent influence diagrams that enable us to efficiently compute Nash equilibria. Our results notably demonstrate that altruism does not always lead to a more efficient outcome in genomic-privacy games. They also show that, if the discrepancy between the genome-sharing benefits that players perceive is too high, they will follow opposite sharing strategies, which has a negative impact on the familial utility.
© International Financial Cryptography Association 2015.

Item Open Access
Privacy and security in the genomic era (ACM, 2016-10)
Ayday, Erman; Hubaux, Jean-Pierre
With the help of rapidly developing technology, DNA sequencing is becoming less expensive. As a consequence, research in genomics has gained speed in paving the way to personalized (genomic) medicine, and geneticists need large collections of human genomes to further increase this speed. Furthermore, individuals are using their genomes to learn about their (genetic) predispositions to diseases, their ancestries, and even their (genetic) compatibilities with potential partners. This trend has also led to the launch of health-related websites and online social networks (OSNs), in which individuals share their genomic data (e.g., OpenSNP or 23andMe). On the other hand, genomic data carries much sensitive information about its owner. By analyzing the DNA of an individual, it is now possible to learn about his disease predispositions (e.g., for Alzheimer's or Parkinson's), ancestries, and physical attributes. The threat to genomic privacy is magnified by the fact that a person's genome is correlated to his family members' genomes, thus leading to interdependent privacy risks. This short tutorial will help computer scientists better understand the privacy and security challenges in today's genomic era. We will first highlight the significance of genomic data and the threats for genomic privacy. Then, we will present high-level descriptions of the proposed solutions to protect the privacy of genomic data and we will discuss future research directions. No prerequisite knowledge of biology or genomics is required for the attendees. We only require the attendees to have a slight background in cryptography and statistics.

Item Open Access
Privacy threats and practical solutions for genetic risk tests (IEEE, 2015)
Barman, L.; Elgraini, M.-T.; Raisaro, J. L.; Hubaux, J.-P.; Ayday, Erman
Recently, several solutions have been proposed to address the complex challenge of protecting individuals' genetic data during personalized medicine tests. In this short paper, we analyze different privacy threats and propose simple countermeasures for the generic architecture mainly used in the literature. In particular, we present and evaluate a new practical solution against a critical attack by a malicious medical center trying to actively infer raw genetic information of patients. © 2015 IEEE.

Item Open Access
Privacy-preserving aggregate queries for optimal location selection (IEEE, 2019)
Yılmaz, Emre; Ferhatosmanoğlu, H.; Ayday, Erman; Aksoy, Remzi Can
Today, vast amounts of location data are collected by various service providers. These location data owners have a good idea of where their users are most of the time. Other businesses also want to use this information for location analytics, such as finding the optimal location for a new branch. However, location data owners cannot share their data with other businesses, mainly due to privacy and legal concerns. In this paper, we propose privacy-preserving solutions in which location-based queries can be answered by data owners without sharing their data with other businesses and without accessing sensitive information such as the customer list of the businesses that send the query. We utilize a partially homomorphic cryptosystem as the building block of the proposed protocols. We prove the security of the protocols in the semi-honest threat model. We also explain how to achieve differential privacy in the proposed protocols and discuss its impact on utility. We evaluate the performance of the protocols with real and synthetic datasets and show that the proposed solutions are highly practical.
The proposed solutions will facilitate effective sharing of sensitive data between entities and joint analytics in a wide range of applications without violating their customers' privacy.

Item Open Access
A privacy-preserving framework for outsourcing location-based services to the cloud (IEEE, 2021)
Zhu, X.; Ayday, Erman; Vitenberg, R.
Thanks to the popularity of mobile devices, numerous location-based services (LBS) have emerged. While several privacy-preserving solutions for LBS have been proposed, most of these solutions do not consider the fact that LBS are typically cloud-based nowadays. Outsourcing data and computation to the cloud raises a number of significant challenges related to data confidentiality, user identity and query privacy, fine-grained access control, and query expressiveness. In this work, we propose a privacy-preserving framework for outsourcing LBS to the cloud. The framework supports multi-location queries with fine-grained access control, and search by location attributes, while providing semantic security. In particular, the framework implements a new model that allows the user to govern the trade-off between precision and privacy on a dynamic per-query basis. We provide a security analysis to show that the proposed scheme preserves privacy in the presence of different threats. We further show the viability of our proposed solution, and its scalability with the number of locations, through an experimental evaluation using a real-life OpenStreetMap dataset.

Item Open Access
Privacy-preserving search for a similar genomic makeup in the cloud (Institute of Electrical and Electronics Engineers Inc., 2021-04-20)
Zhu, X.; Vitenberg, R.; Veeraragavan, N. R.; Ayday, Erman
The increasing affordability of genome sequencing and, as a consequence, the widespread availability of genomic data open up new opportunities for the field of medicine, as is also evident from the emergence of popular cloud-based offerings in this area, such as Google Genomics [1].
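Several entries above build their protocols on a partially homomorphic cryptosystem; Paillier is the canonical additively homomorphic choice for such aggregate queries, though the abstracts do not name the exact scheme. A toy sketch with demo-sized primes (insecure, purely illustrative; all parameter choices are ours):

```python
import math
import random

def paillier_keygen(p: int = 293, q: int = 433):
    """Toy Paillier key pair from two small (insecure, demo-only) primes."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    g = n + 1                         # standard simple generator choice
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu)

def paillier_encrypt(pk, m: int) -> int:
    n, g = pk
    while True:                       # pick randomness coprime with n
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def paillier_decrypt(pk, sk, c: int) -> int:
    n, _ = pk
    lam, mu = sk
    return (((pow(c, lam, n * n) - 1) // n) * mu) % n
```

The additive property is what makes aggregate location queries possible without decrypting individual records: multiplying two ciphertexts yields an encryption of the sum of their plaintexts.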
To utilize this data more efficiently, it is crucial that different entities share their data with each other. However, such data sharing is risky mainly due to privacy concerns. In this article, we attempt to provide a privacy-preserving and efficient solution for the “similar patient search” problem among several parties (e.g., hospitals) by addressing the shortcomings of previous attempts. We consider a scenario in which each hospital has its own genomic dataset and the goal of a physician (or researcher) is to search for a patient similar to a given one (based on genomic makeup) among all the hospitals in the system. To enable this search, we propose a hierarchical index structure to index each hospital’s dataset with low memory requirements. Furthermore, we develop a novel privacy-preserving index merging mechanism that generates a common search index from the individual indices of each hospital to significantly improve the search efficiency. We also consider the storage of medical information associated with the genomic data of a patient (e.g., diagnosis and treatment). We allow access to this information via a fine-grained access control policy that we develop through the combination of standard symmetric encryption and ciphertext-policy attribute-based encryption. Using this mechanism, a physician can search for similar patients and obtain medical information about the matching records if the access policy holds. We conduct experiments on large-scale genomic data and show the high efficiency of the proposed scheme.
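At its core, the "similar patient search" problem in the last entry is nearest-neighbor search over genotype vectors. A plaintext baseline sketch; the paper's actual scheme replaces this linear scan with a privacy-preserving hierarchical index over encrypted data, and all names here are illustrative:

```python
def hamming_distance(a, b) -> int:
    """Number of positions where two genotype vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def most_similar_patient(query, hospitals):
    """Linear scan over per-hospital genotype records.

    `hospitals` maps a hospital name to {patient_id: genotype_vector}.
    Returns ((hospital, patient_id), distance) for the closest record.
    This is only the plaintext baseline that a privacy-preserving
    merged index would accelerate without revealing the records.
    """
    best, best_dist = None, float("inf")
    for hospital, records in hospitals.items():
        for patient_id, genotype in records.items():
            d = hamming_distance(query, genotype)
            if d < best_dist:
                best, best_dist = (hospital, patient_id), d
    return best, best_dist
```

A usage example: with two hospitals holding a few genotype vectors each, the scan returns the record with the smallest Hamming distance to the query, regardless of which hospital holds it.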