Data Perturbation

Definition - What does Data Perturbation mean?

Data perturbation is a form of privacy-preserving data mining for electronic health records (EHR). There are two main types of data perturbation appropriate for EHR data protection. The first type is known as the probability distribution approach and the second type is called the value distortion approach. Data perturbation is considered a relatively easy and effective technique for protecting sensitive electronic data from unauthorized use.

Techopedia explains Data Perturbation

Data perturbation has been hailed as a more effective means of data protection in health care than de-identification/re-identification, because de-identified data is more exposed to attacks that link public data sets back to the original identifiers or subjects. For this reason, data perturbation is regarded as a more solid approach to EHR security.

The probability distribution approach replaces the data with another sample from the same distribution or with the distribution itself. The value distortion approach perturbs data with multiplicative or additive noise, or other randomized processes. It is considered to be more effective than the probability distribution approach. It has been used to build decision tree classifiers on data in which each element is assigned random noise, drawn from a Gaussian distribution, for instance. Through data mining, the original data distribution is then reconstructed from its perturbed version. However, critics point out that random additive noise can be filtered out, which can result in EHR privacy compromises.
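
The two approaches described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not a reference implementation: the function names, the noise levels, the choice of a normal distribution for the fitted model, and the sample readings are all hypothetical and do not come from any particular EHR system or library.

```python
import random
import statistics


def additive_noise(values, sigma, rng):
    """Value distortion: add zero-mean Gaussian noise to each value."""
    return [v + rng.gauss(0.0, sigma) for v in values]


def multiplicative_noise(values, sigma, rng):
    """Value distortion: scale each value by a random factor centered on 1."""
    return [v * rng.gauss(1.0, sigma) for v in values]


def resample_from_fitted_normal(values, rng):
    """Probability distribution approach: replace the data with fresh draws
    from a distribution fitted to it (a normal fit is assumed here)."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [rng.gauss(mu, sd) for _ in values]


# Made-up numeric readings standing in for a sensitive EHR column.
readings = [118.0, 125.0, 132.0, 140.0, 121.0, 135.0, 128.0, 119.0]

rng = random.Random(42)  # fixed seed only so the sketch is repeatable
noisy = additive_noise(readings, 5.0, rng)
scaled = multiplicative_noise(readings, 0.05, rng)
synthetic = resample_from_fitted_normal(readings, rng)
```

In this sketch the individual values in noisy, scaled and synthetic differ from the originals, while aggregate statistics such as the mean stay roughly comparable. The noise level sigma controls the trade-off: larger values hide individual records better but distort the aggregates more.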