A privacy-preserving solution for the bipartite ranking problem on spark framework

buir.advisor: Güvenir, Halil Altay
dc.contributor.author: Faramarzi, Noushin Salek
dc.date.accessioned: 2017-08-07T08:06:48Z
dc.date.available: 2017-08-07T08:06:48Z
dc.date.copyright: 2017-07
dc.date.issued: 2017-07
dc.date.submitted: 2017-08-03
dc.description: Cataloged from PDF version of article. [en_US]
dc.description: Thesis (M.S.): Bilkent University, Department of Computer Engineering, İhsan Doğramacı Bilkent University, 2017. [en_US]
dc.description: Includes bibliographical references (leaves 50-54). [en_US]
dc.description.abstract: The bipartite ranking problem is defined as finding a function that ranks the positive instances in a dataset higher than the negative ones. Finance and medicine are common application areas for ranking algorithms. However, a common concern in such domains is the privacy of the individuals or companies in the dataset. That is, a researcher who wants to discover knowledge from a dataset extracted from such a domain needs to access the records of all individuals in the dataset in order to run a ranking algorithm. This privacy concern limits the use of sensitive personal data for such analysis. We propose an efficient solution for the privacy-preserving bipartite ranking problem, in which the researcher does not need the raw data of the instances in order to learn a ranking model from the data. The RIMARC (Ranking Instances by Maximizing Area under the ROC Curve) algorithm solves the bipartite ranking problem by learning a model to rank instances. As part of the model, it learns a weight for each feature by analyzing the area under the receiver operating characteristic (ROC) curve. The RIMARC algorithm has been shown to be more accurate and efficient than its counterparts. Thus, we use this algorithm as a building block and provide a privacy-preserving version of the RIMARC algorithm using homomorphic encryption and secure multi-party computation. To improve running time on big datasets, we have implemented the privacy-preserving RIMARC algorithm on Apache Spark, a popular parallelization framework built around the Resilient Distributed Dataset (RDD) programming abstraction. Our proposed algorithm lets a data owner outsource the storage and processing of its encrypted dataset to a semi-trusted cloud. A researcher can then get the results of their queries (to learn the ranking function) on the dataset by interacting with the cloud. During this process, neither the researcher nor the cloud can access any information about the raw dataset. We prove the security of the proposed algorithm and show its efficiency via experiments on real data. [en_US]
dc.description.provenance: Submitted by Betül Özen (ozen@bilkent.edu.tr) on 2017-08-07T08:06:48Z No. of bitstreams: 1 thesis.pdf: 2065376 bytes, checksum: db80bc1b4458df5351576b4cc78a16e9 (MD5) [en]
dc.description.provenance: Made available in DSpace on 2017-08-07T08:06:48Z (GMT). No. of bitstreams: 1 thesis.pdf: 2065376 bytes, checksum: db80bc1b4458df5351576b4cc78a16e9 (MD5) Previous issue date: 2017-08 [en]
dc.description.statementofresponsibility: by Noushin Salek Faramarzi. [en_US]
dc.embargo.release: 2019-08-03
dc.format.extent: xii, 54 leaves : charts ; 29 cm [en_US]
dc.identifier.itemid: B156077
dc.identifier.uri: http://hdl.handle.net/11693/33533
dc.language.iso: English [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.subject: Bipartite [en_US]
dc.subject: Ranking Problem [en_US]
dc.subject: Data Mining [en_US]
dc.subject: Data Privacy [en_US]
dc.subject: Spark [en_US]
dc.title: A privacy-preserving solution for the bipartite ranking problem on spark framework [en_US]
dc.title.alternative: İki taraflı sıralama problemine spark çerçevesinde gizliliği koruyan bir çözüm [en_US]
dc.type: Thesis [en_US]
thesis.degree.discipline: Computer Engineering
thesis.degree.grantor: Bilkent University
thesis.degree.level: Master's
thesis.degree.name: MS (Master of Science)
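
Illustrative note on the abstract: the quantity a bipartite ranker tries to maximize, the area under the ROC curve (AUC), equals the fraction of (positive, negative) instance pairs in which the positive instance receives the higher score, with ties counted as half. The sketch below computes that quantity on plaintext scores. It is not the RIMARC algorithm and not the thesis's privacy-preserving protocol; the function name `pairwise_auc` and the sample scores and labels are hypothetical, chosen only for illustration.

```python
from itertools import product


def pairwise_auc(scores, labels):
    """Return the fraction of (positive, negative) pairs ranked correctly.

    A pair counts as 1 when the positive instance has the strictly higher
    score and as 0.5 when the two scores tie. This pairwise fraction is
    equal to the area under the ROC curve.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative instance")

    correct = 0.0
    for p, n in product(pos, neg):
        if p > n:
            correct += 1.0
        elif p == n:
            correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical toy data: label 1 marks a positive instance, 0 a negative one.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
auc = pairwise_auc(scores, labels)
print(auc)
```

On this toy data three of the four positive-negative pairs are ordered correctly, so the value is 0.75. The quadratic pair enumeration is used here only for readability; a sort-based computation would be the usual choice for large datasets.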

Files

Original bundle

Name: thesis.pdf
Size: 1.97 MB
Format: Adobe Portable Document Format
Description: Full printable version

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission