Know your data 15: the false promise of data correction

It's a good thing that the FTC is making some noise about regulating the snooping done by online services. (link) It's not a good thing that the measures described in the article ("tools to view, suppress and fix the information") fail to solve the fundamental problem, and are likely counter-productive.

What's the fundamental problem?

Imagine a world in which you walk into your supermarket. When you check out, you are required to provide your home address and phone number; if you refuse, you are free to shop at the store down the block instead. Imagine a world in which they take a snapshot of your face at the checkout counter. Imagine a world in which every package you send through UPS is scanned and the contents recorded.

Not long ago, these actions would have been considered invasions of privacy. Today, websites large and small, as well as mobile and tablet apps, are doing all of the above and more.

The FTC thinks the problem is "transparency". Not quite: the fundamental issue is propriety.

Why are the proposed measures inadequate?

There is a big difference between providing tools to suppress information and not collecting the information in the first place. The word "suppress" is ambiguous. Many websites do not actually delete user information; "deleting" your data merely severs your future access to them. Archived copies of past data are pretty easy to find. Besides, your data are likely still present on the website's servers, and possibly still being used by its algorithms.

A simple blanket do-not-track option should be made available, but websites have moved in the opposite direction. Yahoo! recently informed users that it no longer honors do-not-track requests -- other major websites never bothered with do-not-track in the first place.

The blanket option is the only viable one. Otherwise, consumers would be forced to scroll through thousands of attributes to check off everything they want suppressed.
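The contrast is easy to see in code. A minimal sketch of what honoring the blanket option could look like on the server side, assuming the browser's "DNT: 1" request header as the opt-out signal (the function name and the tracking decision are hypothetical, not any site's actual implementation):

```python
# Hypothetical sketch: a single header check implements the blanket
# do-not-track option, versus consulting thousands of per-attribute
# suppression flags.

def should_track(headers):
    """Return False when the client sent the blanket opt-out (DNT: 1)."""
    return headers.get("DNT") != "1"

print(should_track({"DNT": "1"}))   # False: honor the blanket opt-out
print(should_track({}))             # True: no preference expressed
```

One check settles the question for every attribute at once; the per-attribute alternative pushes the bookkeeping onto the consumer.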

The other measure, allowing consumers to correct the data, is even worse. By correcting the data, the individual gives implicit consent to their collection and continued use.

Why are the measures counter-productive?

Being able to correct data sounds like a good idea, but the devil is in the details.

Let's say John's age is currently listed as 16. John corrects it to 25. Now what? The data vendor will continue to receive information about John's age from public records and other sources. The new data may contradict John's correction. If the vendor now overwrites John's entry, does that violate the law? Who's to arbitrate which version of the "truth" is the truth? If the vendor is not allowed to overwrite John's entry, then the entry becomes "dead" the moment a consumer "corrects" it.
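The second horn of the dilemma can be made concrete with a toy sketch. The merge rule and field names below are assumptions for illustration, not any vendor's actual system:

```python
# Hypothetical merge policy: once a consumer "corrects" a field, it is
# locked against overwrites -- and thereby stops tracking new evidence.

def merge(record, feed_value, field="age"):
    """Apply a new value from an external data feed to a consumer record."""
    if record.get(f"{field}_locked"):
        # The correction wins forever: the field is now "dead" to new
        # data, however reliable the feed may be.
        return record
    record[field] = feed_value
    return record

john = {"age": 16}
john["age"] = 25            # John "corrects" his age
john["age_locked"] = True   # the vendor must not overwrite the correction

merge(john, 31)             # a public record later reports 31
print(john["age"])          # still 25: the entry no longer updates
```

Whatever value the correction had on day one decays as the locked field drifts away from the incoming data.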

The biggest problem with correcting data is gaming. I discussed this years ago in my first book, Numbers Rule Your World (Chapter 2), in the context of credit bureau data. It is naive to think that consumers will provide truthful data about themselves. Since we know these data are used to set insurance rates, to decide who gets a marketing offer, and so on, our incentive is to create a better version of ourselves. The only errors that get corrected are those that paint the consumer in a worse light (think late payments misattributed to you). If the errors artificially inflate the consumer's positive qualities (think true late payments never reported to the bureau), why would anyone fix them?

In addition, there is an unavoidable bias in who corrects the data. In the case of credit bureau data, for example, people with bad credit scores are much more likely to review and correct their data than people with great credit scores.

The end result of such fixing is to push the data further from reality, add human mischief to the mix, and degrade the performance of algorithms. That's why open data correction is counterproductive.

***

The data industry wants to self-regulate. I agree with this stance, but the substance is lacking. What has the industry done to regulate itself?

The industry should be talking about the blanket do-not-track option. It should be defining the minimum set of data it needs. It should be quantifying the incremental value of collecting more data. That means not collecting data that are of no utility.

The industry is confident that the additional data bring tremendous benefits to consumers; we keep hearing about more relevant ads! more personalized marketing! better user experience! better customer service! and so on. For all that confidence, the industry is afraid of do-not-track. Why should that be?