Abstract

Privacy is becoming an increasingly important issue in many data mining applications. This has triggered the development of many privacy-preserving data mining techniques. A large fraction of them use randomized data distortion to mask sensitive data, randomly modifying the data values, often with additive noise. This paper questions the utility of the random value distortion technique in privacy preservation. The paper notes that random objects (particularly random matrices) have “predictable” structures in the spectral domain, and it develops a random matrix-based spectral filtering technique to retrieve original data from a dataset distorted by adding random values. The paper presents the theoretical foundation of this filtering method and extensive experimental results to demonstrate that in many cases random data distortion preserves very little data privacy.
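The filtering idea summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes NumPy, i.i.d. zero-mean additive noise with a known variance, and original data that is close to low rank. The function name `spectral_filter`, the variable names, and the synthetic data are hypothetical. Eigenvalues of the perturbed data's covariance that fall at or below the standard random-matrix upper bound, noise variance times (1 + sqrt(n/m))^2 for m records and n features, are treated as noise, and the data is projected onto the remaining eigenvectors.

```python
import numpy as np


def spectral_filter(perturbed, noise_var):
    """Estimate original records from additively perturbed ones (rows = records).

    Assumption: the noise is i.i.d. with zero mean and known variance noise_var.
    """
    m, n = perturbed.shape
    mean = perturbed.mean(axis=0)
    centered = perturbed - mean
    # Sample covariance of the perturbed data (n x n, symmetric).
    cov = centered.T @ centered / m
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Upper edge of the noise eigenvalue spectrum for an m x n noise matrix.
    lam_max = noise_var * (1.0 + np.sqrt(n / m)) ** 2
    # Keep only eigenvectors whose eigenvalues exceed the noise bound.
    signal = eigvecs[:, eigvals > lam_max]
    # Project onto the retained subspace and restore the mean.
    return centered @ signal @ signal.T + mean


# Synthetic demonstration: rank-3 data in 20 dimensions plus unit-variance noise.
rng = np.random.default_rng(0)
m, n, k = 2000, 20, 3
latent = rng.normal(size=(m, k))
mixing = rng.normal(size=(k, n))
original = 5.0 * latent @ mixing
perturbed = original + rng.normal(scale=1.0, size=(m, n))

estimate = spectral_filter(perturbed, noise_var=1.0)
noise_rmse = np.sqrt(np.mean((perturbed - original) ** 2))
filtered_rmse = np.sqrt(np.mean((estimate - original) ** 2))
print(f"RMSE before filtering: {noise_rmse:.3f}, after: {filtered_rmse:.3f}")
```

In this setup the filtered estimate should land noticeably closer to the original values than the perturbed data does, because most of the noise energy sits in the discarded directions. The benefit shrinks when the original data is not dominated by a few strong components.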

Citations

...signal processing literature [12] offers many filters to remove white noise from data and they often work reasonably well. Randomly generated structures like graphs demonstrate interesting properties =-=[7]-=-. In short, randomness does seem to have “structure” and this structure may be used to compromise privacy issues unless we pay careful attention. The rest of this paper illustrates this challenge in t...

...bit us from extracting the hidden information? This section presents a discussion on the properties of random matrices and presents some results that will be used later in this paper. Random matrices =-=[13]-=- exhibit many interesting properties that are often exploited in high energy physics [13], signal processing [16], and even data mining [10]. The random noise added to the data can be viewed as a rand...

...omains is facing growing concerns. Therefore, we need to develop data mining techniques that are sensitive to the privacy issue. This has fostered the development of a class of data mining algorithms =-=[2, 9]-=- that try to extract the data patterns without directly accessing the original data and guarantee that the mining process does not get sufficient information to reconstruct the original data. This pa...

...ut that in many cases the noise can be separated from the perturbed data by studying the spectral properties of the data and as a result its privacy can be seriously compromised. Agrawal and Aggarwal =-=[1]-=- have also considered the approach in [2] and have provided an expectation-maximization (EM) algorithm for reconstructing the distribution of the original data from perturbed observations. They also pr...

... the original data (which could be used to guess the data value to a higher level of accuracy). However, [1] provides no explicit procedure to reconstruct the original data values. Evfimievski et al. =-=[5, 4]-=- and Rizvi [15] have also considered the approach in [2] in the context of association rule mining and suggest techniques for limiting privacy breaches. Our primary contribution is to provide an expli...

...by exchanging only the minimal necessary information among the participating nodes without transmitting the raw data. Privacy preserving association rule mining from homogeneous [9] and heterogeneous =-=[19]-=- distributed data sets are a few examples. The second approach is based on data-swapping, which works by swapping data values within the same feature [3]. There is also an approach which works by adding rand...

...a (which could be used to guess the data value to a higher level of accuracy). However, [1] provides no explicit procedure to reconstruct the original data values. Evfimievski et al. [5, 4] and Rizvi =-=[15]-=- have also considered the approach in [2] in the context of association rule mining and suggest techniques for limiting privacy breaches. Our primary contribution is to provide an explicit filtering p...

...a variable $N$. We will consider asymptotics such that in the limit as $N \to \infty$, we have $m(N) \to \infty$, $n(N) \to \infty$, and $m(N)/n(N) \to Q$, where $Q \geq 1$. Under these assumptions, it can be shown that =-=[8]-=- the empirical c.d.f. $F_n(x)$ converges in probability to a continuous distribution function $F_Q(x)$ for every $x$, whose probability density function (p.d.f.) is given by ...
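The excerpt above breaks off before the density itself. As a hedged illustration, the sketch below uses the standard Marchenko-Pastur form of that limit, which may be parameterized differently in the cited paper. It assumes an m x n noise matrix with i.i.d. zero-mean entries of variance `sigma2`, covariance eigenvalues scaled by 1/m, and `q` standing for the ratio m/n (at least 1). The helper names `mp_bounds` and `mp_pdf` are hypothetical.

```python
import math


def mp_bounds(sigma2, q):
    """Lower and upper edges of the limiting noise eigenvalue spectrum."""
    if q < 1.0:
        raise ValueError("q = m/n is assumed to be at least 1")
    root = 1.0 / math.sqrt(q)
    return sigma2 * (1.0 - root) ** 2, sigma2 * (1.0 + root) ** 2


def mp_pdf(x, sigma2, q):
    """Standard Marchenko-Pastur density; zero outside the spectral edges."""
    lam_min, lam_max = mp_bounds(sigma2, q)
    if x <= lam_min or x >= lam_max:
        return 0.0
    return q * math.sqrt((x - lam_min) * (lam_max - x)) / (2.0 * math.pi * sigma2 * x)


# Example: unit noise variance and four times as many records as features.
lam_min, lam_max = mp_bounds(sigma2=1.0, q=4.0)  # edges are 0.25 and 2.25

# Midpoint-rule check that the density integrates to roughly 1 over its support.
steps = 20000
width = (lam_max - lam_min) / steps
total = sum(mp_pdf(lam_min + (i + 0.5) * width, 1.0, 4.0) for i in range(steps)) * width
print(lam_min, lam_max, round(total, 4))
```

Eigenvalues of a perturbed dataset's covariance that fall between these two edges are the ones a spectral filter would attribute to noise. Eigenvalues above the upper edge point to structure in the original data.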

...omized value distortion technique for learning decision trees [2] and association rule learning [6] are examples of this approach. Additional work on randomized masking of data can be found elsewhere =-=[18]-=-. This paper explores the third approach [2]. It points out that in many cases the noise can be separated from the perturbed data by studying the spectral properties of the data and as a result its pr...

...rices and presents some results that will be used later in this paper. Random matrices [13] exhibit many interesting properties that are often exploited in high energy physics [13], signal processing =-=[16]-=-, and even data mining [10]. The random noise added to the data can be viewed as a random matrix and therefore its properties can be understood by studying the properties of random matrices. In this p...

...ults that will be used later in this paper. Random matrices [13] exhibit many interesting properties that are often exploited in high energy physics [13], signal processing [16], and even data mining =-=[10]-=-. The random noise added to the data can be viewed as a random matrix and therefore its properties can be understood by studying the properties of random matrices. In this paper we shall develop a spe...

...ke independent component analysis. However, projection matrices that satisfy certain conditions may be more appealing for such applications. More details about this possibility can be found elsewhere =-=[11]-=-. Acknowledgments The authors acknowledge support from the United States National Science Foundation CAREER award IIS-0093353, NASA (NRA) NAS2-37143, and TEDCO, Maryland Technology Development Center...

...sing randomized techniques. The perturbed data is then used to extract the patterns and models. The randomized value distortion technique for learning decision trees [2] and association rule learning =-=[6]-=- are examples of this approach. Additional work on randomized masking of data can be found elsewhere [18]. This paper explores the third approach [2]. It points out that in many cases the noise can be...