Privacy Preserving Data Mining over Vertically Partitioned Data

Author

Jaideep Vaidya

Tech report number

CERIAS TR 2004-40

Entry type

phdthesis

Abstract

The goal of data mining is to extract or "mine" knowledge from large amounts of data. However, data is often collected by several different sites, and privacy, legal, and commercial concerns restrict centralized access to it. Theoretical results from secure multiparty computation in cryptography prove that, assuming the existence of trapdoor permutations, one may provide secure protocols for two-party computation as well as for multiparty computation with an honest majority.
However, these general methods are far too inefficient and impractical for computing complex functions on inputs consisting of large sets of data. What remains open is to develop a set of techniques that achieves this efficiently within a quantifiable security framework. The distributed data model considered is the heterogeneous database scenario, in which different features of the same set of data are collected by different sites. This thesis argues that it is indeed possible to have efficient and practical techniques for useful privacy-preserving mining of knowledge from large amounts of data. The dissertation presents several privacy-preserving data mining algorithms operating over vertically partitioned data. The set of underlying techniques solving independent sub-problems is also presented. Together, these enable the secure "mining" of knowledge.