Large-Scale Privacy-Preserving Matching and Search
Riazi, M. Sadegh
Koushanfar , Farinaz
Master of Science
The past few decades have witnessed considerable efforts for achieving a Privacy- Preserving Computing (PPC) scheme where the input data of each engaging party is not revealed to any other party. PPC, in fact, prevents any data leakage or invasion of privacy. While there exist provably secure solutions, they suffer from high communication overheads and therefore they cannot scale for large-scale datasets. This thesis proposes several novel techniques to overcome this limitation and opens a new door for real-world applications. In this thesis, we focus on two of the most important branches in this area which are matching and search. In the process of matching, two groups of individuals are interested to be matched with each other given certain rules and based on their preference list while keeping all preference lists private. In the process of search, a user holding a query wants to find the most similar profile (with a specific similarity metric) in a bank of profiles while keeping both query and bank private. Our approach is based on a well-known protocol called Garbled Circuit (GC) protocol which can securely evaluate any function of choice. Previous works suffer from immense overheads and lack of scalability. In this thesis, we introduce a new set of techniques that makes the GC-based protocols more efficient and make them scalable. We also study a new paradigm which is based on Randomized Embeddings to develop a new framework for a faster privacy-preserving search that can deliver unprecedented speed up. We have developed a framework that introduces a privacy/accuracy trade-off and can process search query for Millions of users in real-time, an infeasible task prior to this work. Proof-of-concept evaluations on large-scale datasets prove the practicality and scalability of our approach. For the matching case, we evaluate our proposed solution on a bank of one million different genomes. Privacy-preserving stable matching is performed on a set size as big as National Residency Matching Program (NRMP). The last experiment is performed on the largest MovieLens dataset which contains hundreds of thousands different user profiles that is used to recommend a movie to a user based on the most similar profile without compromising the privacy of user and profiles in the dataset.
Privacy-preserving computing; secure function evaluation; circuit design