Browsing by Subject "Data Privacy"

Now showing 1 - 2 of 2

Open Access
Privacy-preserving collaborative analytics of location data
(2017-09) Yılmaz, Emre
Deriving meaningful insights from location data helps businesses make better decisions. While businesses must know the locations of their customers to perform location analytics, most businesses do not have this valuable data. Location data is typically collected by other services such as mobile telecommunication operators and location-based service providers. We develop scalable privacy-preserving solutions for collaborative analytics of location data. We propose two classes of approaches for location analytics when businesses do not have the location data of the customers. We illustrate both of our approaches in the context of optimal location selection for the new branches of businesses. The rst type of approach is retrieving the aggregate information about the customer locations from location data owners via privacy-preserving queries. We de ne aggregate queries that can be used in optimal location selection and we propose secure two-party protocols for processing these queries. The proposed protocols utilize partially homomorphic encryption as a building block and satisfy differential privacy. Our second approach is to generate synthetic location data in order to perform analytics without violating privacy of individuals. We propose a neighborhood-based data generation method which can be used by businesses for predicting the optimal location when they have partial information about customer locations. We also propose grid-based and clustering-based data generation methods which can be used by location data owners for publishing privacy-preserving synthetic location data. Proposed approaches facilitate running optimal location queries by businesses without knowing their customers' locations.
Open Access
A privacy-preserving solution for the bipartite ranking problem on spark framework
(2017-07) Faramarzi, Noushin Salek
The bipartite ranking problem is defined as finding a function that ranks positive instances in a dataset higher than the negative ones. Financial and medical domains are some of the common application areas of the ranking algorithms. However, a common concern for such domains is the privacy of individuals or companies in the dataset. That is, a researcher who wants to discover knowledge from a dataset extracted from such a domain, needs to access the records of all individuals in the dataset in order to run a ranking algorithm. This privacy concern puts limitations on the use of sensitive personal data for such analysis. We propose an efficient solution for the privacy-preserving bipartite ranking problem, where the researcher does not need the raw data of the instances in order to learn a ranking model from the data. The RIMARC (Ranking Instances by Maximizing Area under the ROC Curve) algorithm solves the bipartite ranking problem by learning a model to rank instances. As part of the model, it learns a weight for each feature by analyzing the area under receiver operating characteristic (ROC) curve. RIMARC algorithm is shown to be more accurate and efficient than its counterparts. Thus, we use this algorithm as a building-block and provide a privacy-preserving version of the RIMARC algorithm using homomorphic encryption and secure multi-party computation. In order to increase the time efficiency for big datasets, we have implemented privacy-preserving RIMARC algorithm on Apache Spark, which is a popular parallelization framework with its revolutionary programming paradigm called Resilient Distributed Datasets. Our proposed algorithm lets a data owner outsource the storage and processing of its encrypted dataset to a semi-trusted cloud. Then, a researcher can get the results of his/her queries (to learn the ranking function) on the dataset by interacting with the cloud. During this process, neither the researcher nor the cloud can access any information about the raw dataset. We prove the security of the proposed algorithm and show its efficiency via experiments on real data.