Monday, 14 December 2015

No, "Big Data" Can’t Predict the Future

We've been told that with enough data we can use sophisticated computing
methods to predict the future. That often works in the physical sciences,
acknowledges Per Bylund in this guest post, but predicting human action is
something else altogether...

With Google’s dominance in the online search engine market, we
entered the Age of Free. Indeed, services offered online are nowadays expected
to be offered at no cost. That does not, of course, mean that they cost nothing,
only that the consumer doesn’t pay. Early attempts financed the
services with ads, but we soon saw a move toward making the consumer the
product. Today, free and un-free services alike compete for “users” and then
make money off the data they collect.

Data has always been used, but what’s new for our time is the very low (or
even zero) marginal cost for collecting and analysing huge amounts of data. The
concept of “Big Data” is taking over and is predicted to be “the future” of
business.

There’s a problem here, and it is the over-reliance on the Law of Large
Numbers in social forecasting. The average outcome of many repeated trials may
mathematically converge to its expected value, but does this apply in the real
world? The answer is most definitely yes in the natural sciences. Repeated
controlled experiments will weed out erroneous explanations or causes of
phenomena, at least assuming we’re good enough at separating and controlling
those causes.
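To make the mathematical claim concrete: the Law of Large Numbers says that the average of many independent trials, each governed by the same fixed probability, converges to the expected value. A minimal Python sketch, assuming a simulated fair coin stands in for the "controlled experiment" (the function name `running_mean_of_flips` and the seed are illustrative choices, not anything from the original post):

```python
import random


def running_mean_of_flips(n_flips: int, seed: int = 42) -> float:
    """Flip a simulated fair coin n_flips times and return the share of heads."""
    rng = random.Random(seed)  # seeded so each run is reproducible
    # Each comparison yields True (heads) or False (tails); sum counts the Trues.
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips


for n in (10, 1_000, 100_000):
    # The share of heads drifts toward 0.5 as n grows, because every flip
    # obeys the same unchanging rule.
    print(n, running_mean_of_flips(n))
```

The convergence rests entirely on every trial following the same constant probability; that is precisely the assumption the essay questions when the "trials" are human actions.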

What about the social sciences? In this age of scientism, as Hayek called it,
we’re told “Big Data” will completely transform production, logistics, and
sales. The reason for this is that vendors can better target customers and even
foresee what they might want next. Amazon.com does this on their web site in
crude form, where they make suggestions based on your purchase history and what
others with similar purchase histories have searched for. Sometimes it works,
and sometimes it doesn’t.

There is some regularity to our interests and behaviour. All of us are, after
all, human beings — and we’re formed in certain cultures. So one American with
interests x, y, and z may have other interests similar to another American who
also has an interest in x, y, and z.

Human Behaviour Is Unpredictable

But similarity is not the same thing as prediction. Amazon.com’s suggestions
or the highly annoying ads following you around web sites are useful methods for
sellers because they can somewhat accurately identify what not to offer.
Excluding very low-probability interests increases the probability of
suggesting something that the person behind the eyeballs, focused on the
computer screen, may actually be interested in.
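The arithmetic behind exclusion can be sketched in a few lines. This is a toy model under loudly stated assumptions: the catalog, its size, the per-item interest probabilities and the 0.01 cut-off are all invented for illustration, and `hit_rate` is a hypothetical helper, not how Amazon or any ad network actually works.

```python
from typing import List


def hit_rate(interest_probs: List[float]) -> float:
    """Expected chance that one suggestion, picked uniformly at random
    from the given items, matches a real interest of the shopper."""
    return sum(interest_probs) / len(interest_probs)


# Hypothetical catalog of 1,000 items: the (invented) probability that
# this particular shopper is interested in each one.
catalog = [0.30] * 10 + [0.05] * 90 + [0.001] * 900

# Exclude the near-zero interests, as recommendation systems roughly do.
filtered = [p for p in catalog if p >= 0.01]

print(f"random pick from full catalog: {hit_rate(catalog):.4f}")
print(f"random pick after exclusion:   {hit_rate(filtered):.4f}")
```

With these made-up numbers the hit rate rises from under 1 percent to 7.5 percent: a big improvement for a seller, yet the suggestion is still wrong more than nine times in ten, which is the gap between narrowing the field and actually predicting.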

As a means of prediction, however, excluding almost-zero-probability events is
far from sufficient. Prediction requires that we be able to accurately
exclude all but one or a couple of highly probable outcomes, and that we be
able to rely on those predictions turning out to be true.
Otherwise we’re just playing games and making guesses. Sure, they’re educated
guesses (because we’ve excluded the impossible and almost-impossible), but
they’re still games and guesses.

Where Big Data Fails

Speaking of guesses, Microsoft’s Bing search engine, which among other things
powers the Windows digital assistant Cortana, has built a prediction engine
for forecasting sports and other results. It relies on very advanced
algorithms and huge amounts of collected data.
Amazingly, it did very well initially and predicted
the outcomes of the Soccer World Cup perfectly. So maybe we can use Big Data
to get a glimpse of the future?

No, not so. The Bing teams are learning a lesson that Austrian economists,
and more specifically Misesian praxeologists, seem to be alone in grasping:
that there are no constants in human action, and therefore that predictions of
social phenomena are impossible. Pattern predictions, as Hayek called them,
may not be impossible, but predictions of exact magnitudes are. For instance, we
can rely on economic law (such as “demand curves slope downward”) to estimate an
outcome such as “the price will be lower than it otherwise would have been,” but
we can’t say exactly what that price will be.

When it comes to sports, reality shows and other competitions between
individuals or teams, the story is exactly the same. The team with a better
track record doesn’t always win. Why? They have objectively performed better
than the other team, perhaps consistently so, but this says nothing about
the future. We are not referring here to philosophical doubt of the “will the
sun shine tomorrow?” kind (perhaps something during the night completely
changes the sun’s ability to shine).

The Social Sciences Are Different

In the social sciences we’re dealing with complex phenomena. Action and,
especially, its outcomes are the result of a complex system of social
interaction, psychology, and much more. Are the players on both teams as
motivated and focused as they were before? Did anything in their personal lives
affect their mindsets or psyches? How do players within a team, and players on
the opposing team, react to each other before and during the game? A team
with a poor track record can upset a team with an objectively better track
record; this happens all the time. Sometimes it is for the sole reason that the
better team underestimates the worse one, or because the underdog feels no
pressure to perform and therefore plays less defensively.

Bing’s prediction engine struggles with this, just as we would predict. As Windows
Central reported recently, the prediction engine had its “worst week yet”,
picking only four of fourteen winners in the NFL. Overall, its track record was
roughly two-thirds right and one-third wrong (95–53, or about 64 percent
correct). It’s definitely better than tossing a coin, but pretty far from
actually predicting the results. In other words, if you’re placing bets, you
may want to use the Bing prediction engine, unless you have the kind of tacit,
implicit understanding of what’s going on that the engine is missing. Maybe you
can beat it, or maybe not. In either case, you cannot count on coming out the
victor each and every time.
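The figures quoted above can be checked directly. The sketch below computes the share of correct picks and a one-sided exact binomial test against coin tossing; it assumes each game is an independent trial and that a coin picks the winner with probability 0.5, simplifications that are ours, not the original reporting's. The helper names are hypothetical.

```python
from math import comb


def win_share(wins: int, losses: int) -> float:
    """Fraction of correct picks."""
    return wins / (wins + losses)


def p_value_vs_coin(wins: int, losses: int) -> float:
    """One-sided exact binomial test: the chance that pure coin tossing
    (p = 0.5 per game, games assumed independent) gets at least `wins`
    picks right out of wins + losses."""
    n = wins + losses
    favourable = sum(comb(n, k) for k in range(wins, n + 1))
    return favourable / 2 ** n


print(f"season record: {win_share(95, 53):.1%} correct")
print(f"chance a coin does at least this well: {p_value_vs_coin(95, 53):.5f}")
print(f"worst week: {win_share(4, 10):.1%} correct")
```

Under these assumptions the 95–53 record is far better than chance (the p-value is well below 1 percent), yet roughly one pick in three is still wrong. Beating a coin and predicting are different things, which is the essay's point.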

The reason for this is that the outcome simply cannot be predicted perfectly
— or even close to it. Even the players themselves cannot predict who’ll win a
game, but they may have inside information about whether their own team seems
motivated and focused. It is not a perfect method, however, and it certainly
cannot be scientific.

Even with Big Data there’s no predicting of social events — there’s only
guessing.

Yes, guessing with access to huge amounts of data is easier, at least if the
data is reliable and relevant. But a good guess is not the same thing as a
prediction; it is still a guess, and it can be wrong.

Winning every time requires luck.

Per Bylund is Assistant Professor of
Entrepreneurship and Records-Johnston Professor of Free Enterprise in the School
of Entrepreneurship at Oklahoma State University.
Visit his website at PerBylund.com.
This post first appeared at the Mises Daily.
Image source: iStockphoto

7 comments:

A minor point: services such as Google make money mostly because our privacy laws are archaic.

A modern update of traditional understanding of privacy could quickly put Google out of business.

Another point: as a wise man has said, "social science" is neither social nor science. By its very nature it cannot be science in the modern meaning of the term. Here we move beyond mere scientism and into pseudoscience. This tendency is further exacerbated by the term "data science", which from a scientific perspective is literally an unintelligible utterance. It is not science; moreover, there can hardly be a "science" about "data" per se.

In actuality, marketers have been using statistics for years; all things considered, all "big data" does is broaden the sample set.

The whole craze shows a deep lack of seriousness and a profound misunderstanding of what science, or even mathematics, is about.

I think the Professor doesn't understand what prediction, estimation, and forecasting are. He needs to learn about those concepts first (in a mathematical context) before spewing out his word-smithing guesses.

The professor is ignorant about the analytics & big-data field, and that's my point. How can someone rail against big data when in fact he has zilch knowledge of what analytics is all about?

Quote: "It’s definitely better than tossing a coin, but pretty far from actually predicting the results."

The professor's quote above simply says it all. Does he understand what prediction means, or is he just babbling? One predicts an event 'A' with a probability (upper-bound/lower-bound). That's what prediction is in the field of analytics. It doesn't say that event 'A' will occur at date 'B' with a probability of 1. Prediction says that event 'A' will happen with a confidence of C percent. That's it. This is how the human mind works. The mind works in probabilistic reasoning, not certainty.

The mind works like this without the person being aware of it: it does rough probability calculations. This is built into our mechanism of thinking. Objectivists can go further and say, aha, this is one part of knowledge integration, without the mind (the person) being aware that it is computing probabilistic scenarios in his/her thinking to make decisions. Objectivists know what knowledge integration is about. But when you probe further and ask them to describe it quantitatively, you can't get an answer; instead you're bombarded with "oh, it's how the mind combines separate nuggets of knowledge to form new facts or nuggets." Such an explanation is too vague to understand. But how does it work? The answer is what I've just stated above. When the mind computes & weighs facts/scenarios, it is in fact doing knowledge integration. Such knowledge integration is often vague (fuzzy & imprecise) or uncertain (probabilistic). When a person is making a decision, prediction (either fuzzy or probabilistic) enters his/her thought. The mind does its job by weighing/computing its premises to come up with roughly accurate consequences or facts.

Quote: "It is concerned with how our minds are related to reality, and whether these relationships are valid or invalid."

"What is Epistemology?" http://www.importanceofphilosophy.com/Epistemology_Main.html

The description above is easy to understand, even for 5-year-olds. The question is: how does the mind do it? How can the mind determine the validity of a hypothesis in a way that's consistent with the qualitative description above? I just explained it in my previous posts. The mind is a computation engine, whether or not the person is aware of mathematics; that's how the mind works. In fact, all humans (their minds) are mathematicians in their thinking process, without any knowledge of math at all.
