Data scientists: Question the integrity of your data

If there’s one lesson website traffic data can teach you, it’s that information is not always genuine. Yet, companies still base major decisions on this type of data without questioning its integrity.

"Here's the dirty secret: If you ever want to have a really, really good mobile campaign in terms of click-through rate, just show the ad on the Flashlight app. Independent of the brand and the product, the click-through rates are the highest on the Flashlight app and some other game [apps].

"Now with that data available we can optimise [a predictive model] towards these things, and you end up with a population that is completely uninterested in your product.

"A click is a click, whether or not it was an intentional request for more information about the product is a completely different question. It's the human interpretation that's typically wrong, and it often requires a geek to question the typical interpretation."
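
One way to act on this point is to compare click-through rate with a downstream signal of real interest for each ad placement. The sketch below is illustrative only: the event format, the `rates_by_placement` name, and the use of a "conversion" flag as the downstream signal are assumptions, not something described in the interview.

```python
def rates_by_placement(events):
    """Summarise click and conversion behaviour per ad placement.

    events: iterable of (placement, clicked, converted) tuples, one per
    ad impression. This record layout is hypothetical.
    """
    stats = {}
    for placement, clicked, converted in events:
        s = stats.setdefault(
            placement, {"impressions": 0, "clicks": 0, "conversions": 0}
        )
        s["impressions"] += 1
        s["clicks"] += int(clicked)
        s["conversions"] += int(converted)

    return {
        placement: {
            # Share of impressions that were clicked.
            "ctr": s["clicks"] / s["impressions"],
            # Share of clicks followed by a conversion; 0.0 if no clicks.
            "conversions_per_click": (
                s["conversions"] / s["clicks"] if s["clicks"] else 0.0
            ),
        }
        for placement, s in stats.items()
    }
```

A placement with a very high `ctr` but a `conversions_per_click` near zero is a candidate for accidental clicks, and is worth inspecting before a model is optimised towards it.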

Noisy geographic data is another problem that many people miss amid the current hype around location data, especially in marketing, Perlich said.

"It turns out location is actually nothing 80 percent of the time."

She gave the example of an analysis that showed an implausibly large population concentrated in a rural part of the US.

"It's the geographic midpoint of the US, which many of the ad requests default to when they have to send latitude and longitude but don't really know where you are. There is a very, very small percentage of geolocation data that is reliable," she said.
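
A simple sanity check for this kind of default value is to look for single coordinates that account for an implausibly large share of all records. The following is a minimal sketch, not a method described by Perlich: the function name, the rounding precision, and the `max_share` threshold are assumptions chosen for illustration.

```python
from collections import Counter


def flag_suspect_locations(points, precision=4, max_share=0.05):
    """Return coordinates that hold a suspiciously large share of records.

    points: iterable of (latitude, longitude) pairs.
    precision: decimal places used to group near-identical coordinates
        (assumed value, tune for your data).
    max_share: share of all records above which a single coordinate is
        flagged (assumed value, tune for your data).
    """
    rounded = [(round(lat, precision), round(lon, precision)) for lat, lon in points]
    total = len(rounded)
    if total == 0:
        return {}

    counts = Counter(rounded)
    # Keep only coordinates whose share of all records exceeds the threshold.
    return {
        point: count / total
        for point, count in counts.items()
        if count / total > max_share
    }
```

Flagged coordinates are not necessarily wrong, since a busy venue can legitimately produce many records, but a single point in a sparsely populated area that dominates the data set is a strong hint of a placeholder value rather than a measured location.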

"The same is true for probabilistic matching of devices. Is it really the same person who is sitting on that laptop or holding that mobile? You can tell with certainty for a small percentage of people but you can tell with much less certainty for a larger percentage of people," she added.