Browsing by Subject "Data privacy"
Now showing 1 - 13 of 13
Item Open Access
Can you really anonymize the donors of genomic data in today’s digital world? (Springer, 2016-09)
Alser, Mohammed; Almadhoun, Nour; Nouri, Azita; Alkan, Can; Ayday, Erman
The rapid progress in genome sequencing technologies has led to the availability of large amounts of genomic data. Accelerating the pace of biomedical breakthroughs and discoveries necessitates not only collecting millions of genetic samples but also granting open access to genetic databases. However, one growing concern is the ability to protect the privacy of sensitive information and its owner. In this work, we survey a wide spectrum of cross-layer privacy-breaching strategies against human genomic data (using both public genomic databases and other public non-genomic data). We outline the principles and outcomes of each technique, and assess its technological complexity and maturity. We then review potential privacy-preserving countermeasures for each threat. © Springer International Publishing Switzerland 2016.

Item Open Access
Differential privacy with bounded priors: Reconciling utility and privacy in genome-wide association studies (ACM, 2015-10)
Tramèr, F.; Huang, Z.; Hubaux, J.-P.; Ayday, Erman
Differential privacy (DP) has become widely accepted as a rigorous definition of data privacy, with stronger privacy guarantees than traditional statistical methods. However, recent studies have shown that for reasonable privacy budgets, differential privacy significantly affects the expected utility. Many alternative privacy notions that aim at relaxing DP have since been proposed, with the hope of providing a better tradeoff between privacy and utility. At CCS'13, Li et al. introduced the membership privacy framework, which aims at protecting against set-membership disclosure by adversaries whose prior knowledge is captured by a family of probability distributions. In the context of this framework, we investigate a relaxation of DP by considering prior distributions that capture more reasonable amounts of background knowledge. We show that for different privacy budgets, DP can be used to achieve membership privacy for various adversarial settings, thus leading to an interesting tradeoff between privacy guarantees and utility. We re-evaluate methods for releasing differentially private χ²-statistics in genome-wide association studies and show that we can achieve a higher utility than in previous works, while still guaranteeing membership privacy in a relevant adversarial setting. © 2015 ACM.

Item Open Access
Inference attacks against kin genomic privacy (Institute of Electrical and Electronics Engineers Inc., 2017)
Ayday, E.; Humbert, M.
Genomic data poses serious interdependent privacy risks: your data might also leak information about your family members. Methods attackers use to infer genomic information, as well as recent proposals for enhancing genomic privacy, are discussed. © 2003-2012 IEEE.

Item Open Access
Multirelational k-anonymity (IEEE, 2007-04)
Nergiz, M. Ercan; Clifton, C.; Nergiz, A. Erhan
k-Anonymity protects privacy by ensuring that data cannot be linked to a single individual. In a k-anonymous dataset, any identifying information occurs in at least k tuples. Much research has been done on modifying a single-table dataset to satisfy anonymity constraints. This paper extends the definitions of k-anonymity to multiple relations and shows that previously proposed methodologies either fail to protect privacy or overly reduce the utility of the data in a multiple-relation setting. A new clustering algorithm is proposed to achieve multirelational anonymity. © 2007 IEEE.
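To make the k-anonymity property above concrete, the following minimal sketch checks the classic single-table notion that the paper generalizes to multiple relations; the quasi-identifier columns and records are hypothetical, not taken from the paper.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Check that every combination of quasi-identifier values
    appears in at least k records (single-table k-anonymity)."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())

# Hypothetical example: ZIP code and birth year as quasi-identifiers.
records = [
    {"zip": "06800", "birth_year": 1980, "diagnosis": "A"},
    {"zip": "06800", "birth_year": 1980, "diagnosis": "B"},
    {"zip": "06810", "birth_year": 1975, "diagnosis": "C"},
]
print(is_k_anonymous(records, ["zip", "birth_year"], k=2))  # False: one group has only 1 record
```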
Item Open Access
Preventing unauthorized data flows (Springer, Cham, 2017)
Uzun, Emre; Parlato, G.; Atluri, V.; Ferrara, A. L.; Vaidya, J.; Sural, S.; Lorenzi, D.
Trojan Horse attacks can lead to unauthorized data flows and can cause either a confidentiality violation or an integrity violation. Existing solutions to this problem employ analysis techniques that keep track of all subject accesses to objects, and hence can be expensive. In this paper, we show that for an unauthorized flow to exist in an access control matrix, a flow of length one must exist. Thus, to eliminate unauthorized flows, it is sufficient to remove all such one-step flows, thereby avoiding the need for expensive transitive-closure computations. This new insight allows us to develop an efficient methodology to identify and prevent all unauthorized flows leading to confidentiality and integrity violations. We develop separate solutions for two different environments that occur in real life, and experimentally validate the efficiency and restrictiveness of the proposed approaches using real data sets. © IFIP International Federation for Information Processing 2017.
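As a rough illustration of the one-step-flow idea, the sketch below enumerates length-one flows in a toy read/write access matrix and flags those not on an allow-list; the matrix encoding and policy are hypothetical simplifications, not the paper's actual model.

```python
# Toy access matrix: subject -> (objects it can read, objects it can write).
# A one-step flow o1 -> o2 exists when some subject reads o1 and writes o2.
def one_step_flows(matrix):
    flows = set()
    for reads, writes in matrix.values():
        flows.update((o1, o2) for o1 in reads for o2 in writes if o1 != o2)
    return flows

matrix = {
    "alice": ({"secret"}, {"public"}),  # alice alone creates secret -> public
    "bob":   ({"public"}, {"log"}),
}
authorized = {("public", "log")}        # hypothetical allow-list of flows
print(sorted(one_step_flows(matrix) - authorized))  # [('secret', 'public')]
```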
Item Open Access
Privacy and security in the genomic era (ACM, 2016-10)
Ayday, Erman; Hubaux, Jean-Pierre
With the help of rapidly developing technology, DNA sequencing is becoming less expensive. As a consequence, research in genomics has gained speed in paving the way to personalized (genomic) medicine, and geneticists need large collections of human genomes to further increase this speed. Furthermore, individuals are using their genomes to learn about their (genetic) predispositions to diseases, their ancestries, and even their (genetic) compatibilities with potential partners. This trend has also caused the launch of health-related websites and online social networks (OSNs) in which individuals share their genomic data (e.g., OpenSNP or 23andMe). On the other hand, genomic data carries much sensitive information about its owner. By analyzing the DNA of an individual, it is now possible to learn about his disease predispositions (e.g., for Alzheimer's or Parkinson's), ancestries, and physical attributes. The threat to genomic privacy is magnified by the fact that a person's genome is correlated with his family members' genomes, leading to interdependent privacy risks. This short tutorial will help computer scientists better understand the privacy and security challenges in today's genomic era. We will first highlight the significance of genomic data and the threats to genomic privacy. Then, we will present high-level descriptions of the proposed solutions to protect the privacy of genomic data, and we will discuss future research directions. No prerequisite knowledge of biology or genomics is required for attendees of this tutorial; we only require a slight background in cryptography and statistics.

Item Open Access
Privacy in the genomic era (Association for Computing Machinery, 2015)
Naveed, M.; Ayday, E.; Clayton, E. W.; Fellay, J.; Gunter, C. A.; Hubaux, J.-P.; Malin, B. A.; Wang, X.
Genome sequencing technology has advanced at a rapid pace, and it is now possible to generate highly detailed genotypes inexpensively. The collection and analysis of such data have the potential to support various applications, including personalized medical services. While the benefits of the genomics revolution are trumpeted by the biomedical community, the increased availability of such data has major implications for personal privacy, notably because the genome has certain essential features, which include (but are not limited to) (i) an association with traits and certain diseases, (ii) identification capability (e.g., forensics), and (iii) revelation of family relationships. Moreover, direct-to-consumer DNA testing increases the likelihood that genome data will be made available in less regulated environments, such as the Internet and for-profit companies. The problem of genome data privacy thus resides at the crossroads of computer science, medicine, and public policy. While computer scientists have addressed data privacy for various data types, there has been less attention dedicated to genomic data. Thus, the goal of this paper is to provide a systematization of knowledge for the computer science community. In doing so, we address some of the (sometimes erroneous) beliefs of this field and report on a survey we conducted about genome data privacy with biomedical specialists. Then, after characterizing the genome privacy problem, we review the state of the art regarding privacy attacks on genomic data and strategies for mitigating such attacks, as well as contextualizing these attacks from the perspective of medicine and public policy. This paper concludes with an enumeration of the challenges for genome data privacy and presents a framework to systematize the analysis of threats and the design of countermeasures as the field moves forward. © 2015 ACM.

Item Open Access
Privacy-preserving data sharing and utilization between entities (2017-07)
Demirağ, Didem
In this thesis, we aim to enable privacy-preserving data sharing between entities and propose two systems for this purpose: (i) a verifiable computation scheme that enables privacy-preserving similarity computation in the malicious setting and (ii) a privacy-preserving link prediction scheme in the semi-honest setting. Both schemes preserve the privacy of the involved parties while performing tasks that improve service quality. For verifiable computation, we propose a centralized system involving a client and multiple servers. We specifically focus on the case in which we want to compute the similarity of a patient's data across several hospitals. The client, which is the hospital that owns the patient data, sends the query to multiple servers, which are different hospitals. The client wants to find similar patients in these hospitals in order to learn about the treatment techniques applied to those patients. In our link prediction scheme, we have two social networks with common users. We choose two nodes and perform link prediction between them in a privacy-preserving way, so that neither network learns the structure of the other. We apply different metrics to define the similarity of the nodes, utilizing privacy-preserving integer comparison.
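In the clear, the kind of node-similarity metric the link prediction scheme above evaluates obliviously can be as simple as counting common neighbors. The sketch below shows the plaintext version only; the graph and node names are hypothetical, and the thesis computes such scores under privacy-preserving integer comparison rather than on raw data.

```python
# Plaintext version of a classic link prediction metric: two nodes that
# share many neighbors are likely to form a link.
def common_neighbors(adj, u, v):
    return len(adj[u] & adj[v])

adj = {  # hypothetical social graph, stored as adjacency sets
    "u": {"a", "b", "c"},
    "v": {"b", "c", "d"},
    "a": {"u"}, "b": {"u", "v"}, "c": {"u", "v"}, "d": {"v"},
}
print(common_neighbors(adj, "u", "v"))  # 2 (shared neighbors b and c)
```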
Item Open Access
Privacy-preserving protocols for aggregate location queries via homomorphic encryption and multiparty computation (2019-07)
Eryonucu, Cihan
Two main goals of businesses are to serve their customers better and, in the meantime, to increase their profit. One way businesses can improve their services is by using the location information of their customers (e.g., positioning their facilities with the objective of minimizing the average distance of customers to their closest facilities). However, without customer location data, it is impossible for businesses to achieve such goals. Luckily, in today's world, large amounts of location data are collected by service providers such as telecommunication operators or mobile apps such as Swarm. Service providers may be willing to share their data with businesses, but doing so would violate the privacy of their customers. Here, we propose two new privacy-preserving schemes for businesses to utilize the location data of their customers collected by location-based service providers (LBSPs). We utilize lattice-based homomorphic encryption and multiparty computation for the new schemes and then compare them with our existing scheme, which is based on partially homomorphic encryption. In our protocols, we hide the customer lists of businesses from the LBSPs, the locations of the customers from the businesses, and the query results from the LBSPs. In such a setting, we let businesses send location-based queries to the LBSPs, and we make the query results available only to the businesses. We evaluate our proposed schemes to show that they are practical. We then compare our three protocols, discussing each one's advantages and disadvantages, and give use cases for all of them. Our proposed schemes allow data sharing in a private manner and lay the foundation for future, more complex queries.

Item Open Access
A privacy-preserving solution for the bipartite ranking problem (IEEE, 2016-12)
Faramarzi, Noushin Salek; Ayday, Erman; Güvenir, H. Altay
In this paper, we propose an efficient privacy-preserving solution for the bipartite ranking problem. The bipartite ranking problem can be considered as finding a function that ranks positive instances (in a dataset) higher than negative ones. However, one common concern for all existing schemes is the privacy of the individuals in the dataset: one (e.g., a researcher) needs to access the records of all individuals in order to run the algorithm. This privacy concern puts limitations on the use of sensitive personal data for such analyses. The RIMARC (Ranking Instances by Maximizing Area under the ROC Curve) algorithm solves the bipartite ranking problem by learning a model to rank instances. As part of the model, it learns a weight for each feature by analyzing the area under the receiver operating characteristic (ROC) curve. The RIMARC algorithm has been shown to be more accurate and efficient than its counterparts. Thus, we use it as a building block and provide a privacy-preserving version of the RIMARC algorithm using homomorphic encryption and secure multi-party computation. Our proposed algorithm lets a data owner outsource the storage and processing of its encrypted dataset to a semi-trusted cloud. A researcher can then obtain the results of his/her queries (to learn the ranking function) by interacting with the cloud. During this process, neither the researcher nor the cloud learns any information about the raw dataset. We prove the security of the proposed algorithm and show its efficiency via experiments on real data.
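The per-feature quantity RIMARC analyzes, the area under the ROC curve, has a simple pairwise interpretation that the sketch below computes on plaintext toy data; the feature values and labels are hypothetical, and the paper's contribution is evaluating such statistics under encryption, not this calculation itself.

```python
def auc_of_feature(values, labels):
    """AUC of a single numeric feature: the probability that a positive
    instance has a higher value than a negative one (ties count half),
    i.e. the Mann-Whitney formulation of the area under the ROC curve."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical feature: higher values should indicate the positive class.
print(auc_of_feature([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 1]))  # 0.666...
```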
Item Open Access
Quantifying genomic privacy via inference attack with high-order SNV correlations (IEEE, 2015)
Samani, S. S.; Huang, Z.; Ayday, Erman; Elliot, M.; Fellay, J.; Hubaux, J.-P.; Kutalik, Z.
As genomic data becomes widely used, genomic data privacy has become a hot interdisciplinary research topic among geneticists, bioinformaticians, and security and privacy experts. Practical attacks on genomic data have been identified that break the privacy expectations of individuals who contribute their genomic data to medical research or simply share their data online. Frustrating as it is, the problem could become even worse. Existing genomic privacy breaches rely on low-order SNV (single nucleotide variant) correlations. Our work shows that far more powerful attacks can be designed if high-order correlations are utilized. We corroborate this concern by making use of different SNV correlations based on various genomic data models and applying them to an inference attack on individuals' genotype data with hidden SNVs. We also show that low-order models behave very differently from real genomic data and therefore should not be relied upon for privacy-preserving solutions.

Item Open Access
SplitGuard: Detecting and mitigating training-hijacking attacks in split learning (Association for Computing Machinery, New York, NY, United States, 2022-11-07)
Erdogan, Ege; Küpçü, Alptekin; Çiçek, A. Ercüment
Distributed deep learning frameworks such as split learning provide great benefits with regard to the computational cost of training deep neural networks and the privacy-aware utilization of the collective data of a group of data holders. Split learning, in particular, achieves this goal by dividing a neural network between a client and a server so that the client computes the initial set of layers and the server computes the rest. However, this method introduces a unique attack vector for a malicious server attempting to steal the client's private data: the server can direct the client model towards learning any task of its choice, e.g., towards outputting easily invertible values. With a concrete example already proposed (Pasquini et al., CCS '21), such training-hijacking attacks present a significant risk to the data privacy of split learning clients. In this paper, we propose SplitGuard, a method by which a split learning client can detect whether it is being targeted by a training-hijacking attack. We experimentally evaluate our method's effectiveness, compare it with potential alternatives, and discuss in detail various points related to its use. We conclude that SplitGuard can effectively detect training-hijacking attacks while minimizing the amount of information recovered by the adversaries. © 2022 Owner/Author.
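For readers unfamiliar with the split learning setup that both this record and the next one study, here is a toy forward pass showing what crosses the client/server boundary; the shapes, weights, and layer choices are hypothetical, and real deployments also exchange gradients during training, which is exactly the channel training-hijacking attacks abuse.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy split network: the client owns the first layer, the server the rest.
W_client = rng.normal(size=(8, 4))   # client-side weights (stay local)
W_server = rng.normal(size=(4, 2))   # server-side weights

def client_forward(x):
    # Only this "smashed data" (intermediate activation) leaves the client.
    return np.tanh(x @ W_client)

def server_forward(h):
    return h @ W_server              # server completes the forward pass

x = rng.normal(size=(1, 8))          # private client input, never shared
h = client_forward(x)
logits = server_forward(h)
print(h.shape, logits.shape)         # (1, 4) (1, 2)
```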
Item Open Access
UnSplit: Data-oblivious model inversion, model stealing, and label inference attacks against split learning (Association for Computing Machinery, 2022-11-07)
Erdoǧan, Ege; Küpçü, Alptekin; Çiçek, A. Ercüment
Training deep neural networks often forces users to work in a distributed or outsourced setting, accompanied by privacy concerns. Split learning aims to address this concern by distributing the model between a client and a server. The scheme supposedly provides privacy, since the server cannot see the client's model and inputs. We show that this is not true via two novel attacks. (1) We show that an honest-but-curious split learning server, equipped only with knowledge of the client's neural network architecture, can recover the input samples and obtain a model functionally similar to the client model, without being detected. (2) We show that if the client keeps hidden only the output layer of the model to ''protect'' the private labels, the honest-but-curious server can infer the labels with perfect accuracy. We test our attacks using various benchmark datasets and against proposed privacy-enhancing extensions to split learning. Our results show that plaintext split learning can pose serious risks, ranging from data (input) privacy to intellectual property (model parameters), and provides no more than a false sense of security. © 2022 Owner/Author.
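To give a feel for the honest-but-curious inversion idea in attack (1), the following toy sketch reuses the split-model shapes from the previous example: knowing only the client's architecture and the smashed data, the server fits surrogate weights and a surrogate input until its activations match what it observed. Everything here (shapes, learning rate, loss) is a hypothetical simplification, not the paper's actual attack.

```python
import numpy as np

rng = np.random.default_rng(1)

# The "real" client layer and private input (unknown to the server).
W_true = rng.normal(size=(8, 4))
x_true = rng.normal(size=(1, 8))
h_obs = np.tanh(x_true @ W_true)            # smashed data the server observes

# The server optimizes a surrogate input and surrogate weights so that
# the surrogate's activations reproduce the observed smashed data.
x_hat = rng.normal(size=(1, 8))
W_hat = rng.normal(size=(8, 4))
lr = 0.05
for _ in range(5000):
    h_hat = np.tanh(x_hat @ W_hat)
    g = (h_hat - h_obs) * (1 - h_hat ** 2)  # gradient of MSE w.r.t. pre-activation
    gx, gW = g @ W_hat.T, x_hat.T @ g       # chain rule back to input and weights
    x_hat -= lr * gx
    W_hat -= lr * gW

print(np.abs(np.tanh(x_hat @ W_hat) - h_obs).max())  # near zero: a functional match
```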