Web content publishers spend sizeable amounts of time and money on content quality control: fact checking, spell checking, grammar checking, copy editing, proofreading and house style checks. Mistakes are to be eliminated at all costs.

Yet if you own a large data set, a few minor, deliberate errors can help to protect it from plagiarism:

Fictitious entries, also known as fake entries, Mountweazels, and Nihilartikels, are deliberately incorrect entries or articles in reference works such as dictionaries, encyclopedias, maps and directories. […] By including a trivial piece of false information in a larger work, it is far easier to demonstrate that someone has plagiarized that work: they will presumably copy the fictitious entry along with other articles. (Source: Fictitious Entry, Wikipedia)

Notable examples of fictitious entries include trap streets – fictitious streets or geographical tweaks on maps – and “esquivalience”, a fake dictionary entry (defined as “the wilful avoidance of one’s official responsibilities”) that was successfully used to prove ownership.
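To make the idea concrete for a data set, here is a minimal Python sketch of checking a suspect copy for planted entries. Everything in it is an assumption made for illustration: the trap list, the second trap's name, and the helper names `normalise` and `find_copied_traps` are hypothetical, and a real check would need to cope with far messier data.

```python
# Hypothetical planted entries; in practice this list would be kept secret.
FICTITIOUS_ENTRIES = {
    "esquivalience",        # the fake dictionary entry mentioned above
    "example trap street",  # invented purely for this sketch
}


def normalise(term: str) -> str:
    """Lower-case and collapse whitespace so trivial edits don't hide a match."""
    return " ".join(term.lower().split())


def find_copied_traps(suspect_terms, traps=FICTITIOUS_ENTRIES):
    """Return the planted entries that also appear in the suspect data set."""
    suspect = {normalise(term) for term in suspect_terms}
    return sorted(trap for trap in traps if normalise(trap) in suspect)


if __name__ == "__main__":
    suspect_copy = ["Apple", "  Esquivalience ", "Zebra"]
    print(find_copied_traps(suspect_copy))  # ['esquivalience']
```

A match is only evidence, not proof: the more trap entries that turn up in the copy, the harder it is to explain away as coincidence.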

If “data is the new oil”, this technique will become increasingly prevalent, whether we realise it or not.

Canary Traps and Honeytokens

A similar approach can be used to identify unreliable content partners or content delivery channels; this is especially useful given the contemporary attitude towards freely sharing digital content.

A canary trap is a method for exposing an information leak, which involves giving different versions of a sensitive document to each of several suspects and seeing which version gets leaked. (Source: Canary Trap, Wikipedia)
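As a rough sketch of how a canary trap might be automated, the Python below gives each recipient a differently worded copy of one sentence and then looks for an exact match in the leaked text. The template, the wording choices, the recipient names and the function names are all hypothetical, and exact matching is a simplifying assumption: a leaker who paraphrases would defeat it.

```python
from itertools import product

# Hypothetical sentence with three interchangeable wording slots.
TEMPLATE = "The {0} will {1} in the {2} quarter."
SLOTS = [("merger", "acquisition"), ("close", "complete"), ("third", "3rd")]


def make_variants(recipients):
    """Give each recipient a uniquely worded copy of the template."""
    combos = list(product(*SLOTS))
    if len(recipients) > len(combos):
        raise ValueError("not enough wording combinations for all recipients")
    return {name: TEMPLATE.format(*combo) for name, combo in zip(recipients, combos)}


def identify_leaker(leaked_text, variants):
    """Return the recipients whose exact variant appears in the leaked text."""
    return [name for name, text in variants.items() if text in leaked_text]


if __name__ == "__main__":
    variants = make_variants(["alice", "bob", "carol"])
    leak = "Leaked memo: " + variants["bob"]
    print(identify_leaker(leak, variants))  # ['bob']
```

Three two-way slots give eight distinct variants; adding slots doubles the number of recipients that can be told apart each time.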

and

Honeytokens are honeypots that are not computer systems. Their value lies not in their use, but in their abuse. […] An example of a honeytoken is a fake email address used to track if a mailing list has been stolen. (Source: Honeytoken, Wikipedia)

Gmail accounts offer a little-known feature that enables you to track the likely source of mailing list spam. A + symbol can be placed after your username in the Gmail email address, followed by arbitrary words. Emails to this address will still reach your inbox, but you’ll have a better idea of their origin.

Using my Gmail address of danzambonini@gmail.com as an example, I can sign up to a car insurance website with danzambonini+carinsurance@gmail.com. The emails will still reach me, but if I receive any spam sent to that address, it can be directly attributed to the car insurance company selling my details (and a Gmail filter can be added to handle it accordingly).
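If you wanted to generate and read these tagged addresses in bulk, a minimal Python sketch might look like the following. The helper names `tagged_address` and `extract_tag` are my own invention for this example, and the code only does string handling: it assumes a well-formed address and does not check that the mail provider actually supports + tags.

```python
def tagged_address(address: str, tag: str) -> str:
    """Insert '+tag' between the username and the domain of an address."""
    local, domain = address.rsplit("@", 1)
    return f"{local}+{tag}@{domain}"


def extract_tag(recipient: str):
    """Return the tag after '+' in the username, or None if there is none."""
    local, _, _ = recipient.rpartition("@")
    _, separator, tag = local.partition("+")
    return tag if separator else None


if __name__ == "__main__":
    signup = tagged_address("danzambonini@gmail.com", "carinsurance")
    print(signup)               # danzambonini+carinsurance@gmail.com
    print(extract_tag(signup))  # carinsurance
```

Running `extract_tag` over the recipient address of each spam message would then show which sign-up the address was harvested from.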