An asymptotically more efficient solution allows sensitive information, like hospital records, to stay secure.
by Lauren Delwiche
Suppose two governments want to determine which names appear on both of their no-fly lists. However, they are unwilling to share their actual lists with each other or anyone else for security reasons. How can they efficiently accomplish the sharing of data?
PhD student Phi Hung Le of George Mason University, under the supervision of S. Dov Gordon, has developed a solution to this private set intersection problem. His algorithm meets all of the security criteria for the situation; its results are accurate and nothing extra is revealed to any party other than what can be inferred from the output.
However, there are already algorithms that can find and compute functions across the intersection of private sets like Le’s does. His solution is distinguished by its major efficiency advantage, achieved using a three-party oblivious sort to order the data. Previous solutions use just two parties in their oblivious sorts.
Background
Le’s efficiency improvement is valuable in the computer science community, which puts a large emphasis on optimizing the time it takes an algorithm to run and the amount of data that it sends across the networks in relation to the length of the inputs. A notation called Big-O is often used represent the time and data used. For example, if an algorithm has a Big-O of n, denoted as O(n), the time it takes to run increases linearly with the size of the input being processed. When the input size is doubled, so is the runtime. If the algorithm is O(n2), the time it takes to run quadruples when the input size is doubled.
Generic oblivious sorts done with only two-parties run in O(n*logn) time, which is very slow when large amounts of data need to be processed. Le's sort algorithm accomplishes the same task, but in O(n) time, which is faster for all but very small lists.
Security
This quicker oblivious sort algorithm relies heavily on a third-party computational helper, which comes with risks, as introducing a potentially dishonest party could jeopardize the security or accuracy of the information exchange. To mitigate the risks, Le uses many cryptographic techniques in his sort, including data authentication and validation. He also utilizes hashing, the encoding of data with a function to make it look random. These are important to ensure that, in the presence of at most one dishonest party, the calculations remain secure and accurate.
Algorithm
Le's algorithm begins when the two parties with inputs, the clients, agree on a common hash function and hash all of their data points. Then, all three parties agree on a long string of random characters called a message authentication code and sample distinct parts of it. This code is used to guarantee that no party can alter the inputs, intermediate results or the final results during the execution of the algorithm.
Now the oblivious sort begins, which is where Le is able to greatly increase the efficiency of the entire process using his third party. The server connects to each client individually and shuffles the data using different permutations that appear to be random to the clients, but actually combine to sort the data. These “random” permutations prevent both clients from determining what data belongs to whom. After performing the sort, the three parties verify that the data and the message authentication code agree.
The data is then ordered locally on the server by its hash code, so that the first part of the array contains all of the data common to both sets. An equality check is run to ensure that this step was performed correctly and then a comparison circuit is run to ensure that the leftover data is in sorted order at the end of the array and does not have duplicates. Finally, the three parties execute a circuit to compute the desired function over the data in the intersection. The output is then revealed to both clients.
Implications
There are many applications of this algorithm. Besides being used for communicating sensitive information between governments, it can be used to share hospital records for patients or by companies and ad vendors to judge the effectiveness of ads.
However, there are still many open questions relating to private set intersection, such as how to achieve the O(n) efficiency offered by Le’s algorithm without the overhead of a third party. Solving problems like this will allow algorithms to become more feasible for everyday use.