Data-themed articles, essays, and studies

The Pace of Veracity

Let me start with this, since it has been in the news: there is no such thing as an “alt-fact.” Facts are statements of agreed-upon consensus, and so by definition there is no “alt” when it comes to a “fact.” When someone uses that term they are deceiving us, intentionally or otherwise.

There can be “statements” and, if you like, “alt-statements.” Anyone can make any statement, at any time. It might be true, kind of true, not very true, or simple nonsense. Taken on its own, it can be hard to tell which level of truth applies to any particular statement.

And that’s the point. Facts are the product of comparison and reference to other things that are understood – they don’t stand alone. Even something as simple as “The sky is blue” requires consensus on what “sky” and “blue” mean – and trust me, if you look hard enough you can find an argument on those very points.

Because a fact references other facts, it takes time to establish – facts are made, not born. As a result, the pace of veracity – of fact generation – is much slower than that of statement generation. As a society, we’ve become rather good at generating and transmitting statements and answers, but far less good at generating facts and understanding. The dynamics of creating a genuine fact are a major reason why that’s the case.

In both professional and personal settings, our real task is to separate statements from facts, or more precisely to decide which information should be established as genuine fact. Assuming a fact incorrectly leads to wrong answers, while validating low-value (or unused) statements wastes time and effort.

I often talk with people who find this state of affairs frustrating and uncontrolled, particularly when sources we’ve traditionally relied on treat the idea of a fact as malleable. Where do you start?

I believe we start with this: if we think a statement comprises a relevant, important fact, we should ask ourselves why. I actually care less about which reason we select than that there is a reason. Facts don’t have to be “big” things. We might decide the weather forecast can be treated as a fact because the source we use is usually good enough. That’s fine. We might realize that we regard something as a fact without a good reason – that’s a great time to re-evaluate. Maybe there is a better reason, or maybe for this statement veracity just isn’t that big a deal. Non-fact statements aren’t necessarily wrong; they just don’t have the burden of veracity attached to them. I don’t need a great reason for eating oatmeal at breakfast each Monday – it’s just what I do. But if veracity matters, so does our “why” – whatever it is.

I believe the “why” associated with our facts is going to become a bigger issue in professional environments than it has been in the past. Many solid professionals work hard to validate databases and show that relied-on numbers are facts, not simple statements. But if we were to pick a particular number and ask “why is this a fact,” most of us would be hard-pressed to say much beyond pointing to a validation suite and noting that the database was validated. Validation is good, but not always good enough by itself to give confidence to a skeptical user. And increasingly, when definitions of facts are malleable, we will be confronted with savvy and skeptical users.

I believe part of the answer, perhaps ironically, is to validate less, by understanding which data are really being used to provide facts for users. This “impact analysis” usually isn’t difficult. Another part of the solution also isn’t difficult – it’s to provide summary analytics reporting that demonstrates where a particular fact lies in the universe of its fellows. A fact similar to many others has the weight of numbers on its side, while one far from most others may deserve additional scrutiny. None of this requires a data science Ph.D. “or equivalent.” (By my estimate, it requires two to three weeks of concentrated training.) Together, these steps can provide better support for the true bedrock of all information systems – that the facts we consume are reliable – ultimately meaning they are sensible with reference to other facts we understand.
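As a rough illustration of showing where a fact lies among its fellows, here is a minimal Python sketch. It is one possible approach, not a prescribed method: it scores each number by its distance from the median, scaled by the median absolute deviation (a "modified z-score"), and lists the values that sit far from the rest. The function names, the sample amounts, and the 3.5 cutoff are all assumptions chosen for the example.

```python
from statistics import median


def modified_z_scores(values):
    """Score each value by its distance from the median, scaled by the MAD.

    The 0.6745 factor makes the score roughly comparable to an ordinary
    z-score when the data are approximately normal.
    """
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # At least half the values equal the median; anything else is "far".
        return [0.0 if v == center else float("inf") for v in values]
    return [0.6745 * (v - center) / mad for v in values]


def needs_scrutiny(values, threshold=3.5):
    """Return the values whose score exceeds the (assumed) threshold."""
    scores = modified_z_scores(values)
    return [v for v, s in zip(values, scores) if abs(s) > threshold]


# Hypothetical figures: seven similar numbers and one that stands apart.
amounts = [102, 98, 101, 97, 100, 103, 99, 250]
print(needs_scrutiny(amounts))
```

A value that is flagged is not necessarily wrong; it is simply the one for which a skeptical user is most likely to ask "why is this a fact," so it is a sensible place to spend validation effort first.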