High Dimensional Data Anonymous Publication and Updates to Confidential Databases

Existing research on privacy-preserving data publishing focuses on relational data. In this context, the objective is to enforce privacy-preserving paradigms, such as k-anonymity and l-diversity, while minimizing the information loss incurred in the anonymization process (i.e., maximizing data utility). Existing techniques work well for fixed-schema data with low dimensionality. The authors propose two categories of novel anonymization methods for sparse high-dimensional data. The first category is based on approximate Nearest-Neighbor (NN) search in high-dimensional spaces, which is performed efficiently through Locality-Sensitive Hashing (LSH).

A related problem concerns updates to a confidential database: suppose Alice owns a k-anonymous database and needs to determine whether it remains k-anonymous after a tuple owned by Bob is inserted.
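To make the update question concrete, the following is a minimal Python sketch of the plain k-anonymity check for an insertion. It is an illustration under stated assumptions, not the authors' protocol: the function names, the attribute names (age, zip, disease), and the sample records are all hypothetical, and the sketch assumes a single party can see both the database and the new tuple, so it does not address keeping either side confidential.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    that occurs in `rows` occurs at least k times."""
    counts = Counter(tuple(row[attr] for attr in quasi_identifiers) for row in rows)
    return all(count >= k for count in counts.values())


def stays_k_anonymous_after_insert(rows, new_row, quasi_identifiers, k):
    """Check k-anonymity of the database with `new_row` appended.
    The original list is not modified."""
    return is_k_anonymous(rows + [new_row], quasi_identifiers, k)


# Hypothetical, already-generalized records (assumption: illustrative data only).
QUASI_IDENTIFIERS = ("age", "zip")
database = [
    {"age": "20-29", "zip": "481**", "disease": "flu"},
    {"age": "20-29", "zip": "481**", "disease": "asthma"},
    {"age": "30-39", "zip": "482**", "disease": "flu"},
    {"age": "30-39", "zip": "482**", "disease": "diabetes"},
]

# A tuple that falls into an existing group of quasi-identifier values.
tuple_matching = {"age": "20-29", "zip": "481**", "disease": "cold"}
# A tuple whose quasi-identifier values appear nowhere else in the database.
tuple_isolated = {"age": "40-49", "zip": "483**", "disease": "flu"}

matching_ok = stays_k_anonymous_after_insert(database, tuple_matching, QUASI_IDENTIFIERS, 2)
isolated_ok = stays_k_anonymous_after_insert(database, tuple_isolated, QUASI_IDENTIFIERS, 2)
```

In this sketch, an inserted tuple preserves k-anonymity (for k greater than 1) only when its quasi-identifier values match a group already present in the database; a tuple with previously unseen values would form a group of size one.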