Probability

We all make predictions every day, pundits and media bloggers more so than the rest of us. Business folks in startups and enterprises are no exception – we predict the market size, our share of the market, revenue growth, the effect of a marketing campaign, etc. These predictions drive our decisions to launch a product, enter a new market or acquire a new business.

Most if not all of these predictions are just made up, with no legs in data and no appetite for refinement, made solely to convey confidence and nothing else. Most predictions are black and white with no room for gray areas. This is because leaders are encouraged to show confidence, and what better way to show it than by assertion? And confidence, of course, is usually confused with competence.

Those who want a way out deliberately make vague predictions like, “it is not that far out when Amazon Kindle will be free”, and others line up escape clauses like, “if we execute well on the market opportunity it is all ours to take”. Worse is the ultimate copout, which is seemingly well balanced but wrong: “50-50 chance”.

We are all making predictions with no basis in reality, no understanding of the base rate, no intention of seeking data to refine our estimate, no desire to state our predictions as likelihoods, and definitely no patience to explain why we believe our prediction is correct.

There is a better way – Probabilistic Thinking. Jason Zweig of The Wall Street Journal points us to a book called Superforecasting and the associated website, the Good Judgment Project, which show us and teach us how to get better at predictions and overall decision making.

You can cultivate these same skills by visiting GJOpen.com and joining the next round of the tournament. Or you can try refining your own thinking.

I have written here on decision making under uncertainty and how scenario analysis and likelihood assignments can help us make better decisions. The Good Judgment Project kicks it up several notches and offers us a solid framework to improve our prediction and decision making skills.

Take a look at this question at GJOpen.com on predicting the Twitter CEO situation by the end of this year.

Unlike the media pundits who state with confidence, “it is going to be Jack,” this question asks you to consider more options and assign probabilities to each outcome. In addition it asks you to enter a rationale to explain your choice of probabilities. You can’t get away with, “50-50 chance Jack will be appointed CEO”.

When you take what they teach you in this project to predictions and decision making in your business, you are bound to improve the quality of your decisions and business output. I am 90% certain you will improve your decision making because this method teaches you a repeatable, defensible, data-driven three-step approach.

Start with the base rate – What is supported in the history? If 90% of startups before you took 7 years to get to $200 million annual run rate, start with the notion yours will take as long.

Ask what needs to get done to exceed base rate – If you are going to grow faster, what needs to line up, and what do you need to do? Look at those that grew faster than the base rate and see what drivers helped them. Are those endogenous drivers (actions the company took) or exogenous (external and random)? Write down your estimate of the likelihood of you beating the base rate and your rationale.

Take action. Collect Data. Refine the estimate and repeat – Decision making is not an event, it is an iterative process.
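The refine-and-repeat step can be sketched with Bayes’ rule in odds form. This is only a minimal illustration, not the method the Good Judgment Project prescribes: the function name `update_probability`, the 10% starting estimate (read off the 90% base rate example above) and the likelihood ratios are all assumptions made up for the example.

```python
def update_probability(prior: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Step 1: start with the base rate (hypothetical: 10% beat the 7-year mark).
estimate = 0.10

# Steps 2 and 3: each new piece of data is expressed as a likelihood ratio,
# i.e. how much more often it shows up in startups that beat the base rate
# than in those that did not. These three ratios are invented for illustration.
evidence = [2.0, 1.5, 0.8]
for ratio in evidence:
    estimate = update_probability(estimate, ratio)
    print(f"after evidence with ratio {ratio}: {estimate:.3f}")
```

A ratio above 1 pulls the estimate up, a ratio below 1 pulls it down, and a ratio of exactly 1 leaves it where it was, which is why uninformative data should not change your mind.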

Are you ready to check your need to speak in absolutes, make bold predictions that have no uncertainties and instead practice probabilistic thinking?

The post is titled, “10 horrifying stats about display advertising”. In an attempt to tell stories or relate arcane data to something common, the author goes on to make some likelihood comparisons:

You are more likely to complete NAVY SEAL training than click a banner ad.

You are more likely to get a full house while playing poker than click on a banner ad.

You are more likely to birth twins than click a banner ad.

You are more likely to get into MIT than click a banner ad.

You are more likely to survive a plane crash than click on a banner ad.

It is pointless and simply wrong to make such comparisons based on the respective frequentist probabilities. Let us say one tenth of one percent of people who see a display ad click on it. Let us say one percent of people who sign up for Navy SEAL training complete it.

That does not mean these two are comparable, nor can you say that the chances of ANYONE completing Navy SEAL training are far better than the chances of clicking on a display ad. What is missing here are the hidden hypotheses we take for granted (the context).

Think of those who sign up for Navy SEAL training. Consider their drive, motivation, physical fitness, mental strength and the initial screenings they survived to get to the training stage. You have already weeded out people like us. If indeed 1% of people who attempt SEAL training complete it, that is the conditional probability

P(Complete SEAL training | Passed all screenings and have wherewithal to complete it)

This is not the probability for any random person you pick from the street, which is very likely close to 0.
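To see how much the conditioning matters, here is a small sketch with made-up numbers. The population size and the number of people who reach training are assumptions chosen only for the arithmetic; the 1% completion rate and the 0.1% click rate are the illustrative figures used above.

```python
# All counts below are hypothetical, chosen only to illustrate the arithmetic.
population = 1_000_000           # random people picked from the street
reach_training = 1_000           # those who pass every screening and start training
p_complete_given_training = 0.01 # the 1% figure assumed in the text
click_rate = 0.001               # the 0.1% display ad click rate assumed in the text

# Law of total probability: nobody completes training without starting it, so
# P(complete) = P(complete | reach training) * P(reach training).
p_reach_training = reach_training / population
p_complete_random_person = p_complete_given_training * p_reach_training

print(f"P(complete | reached training) = {p_complete_given_training:.4f}")
print(f"P(complete) for a random person = {p_complete_random_person:.6f}")
print(f"Click rate                      = {click_rate:.4f}")
```

Under these assumptions the conditional probability is ten times the click rate, while the probability for a random person is a hundred times smaller than the click rate. The comparison flips depending on which population you quietly condition on.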

It is indeed horrifying that they would compare such unrelated events and their conditional probabilities to make their case about display advertising.

Now to the title of this article. Determining whether or not this is a relevant comparison of likelihoods is left as an exercise to the reader.

Suppose you asked me the probability of picking a red ball from an urn that has 10 red balls and 10 green balls. I would say the answer is 1/2. I cannot say with certainty what the next pick will be, but if you picked a ball enough times I can say about 50% of those picks will be red balls (yes, you return the ball after each pick).
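If you would rather check the known urn by experiment than take my word for it, a short simulation does the job. This is a sketch using Python’s standard `random` module; the function name `red_fraction` and the fixed seed are my own choices for the example.

```python
import random


def red_fraction(red: int, green: int, draws: int, seed: int = 0) -> float:
    """Draw with replacement from an urn and return the share of red picks."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    urn = ["red"] * red + ["green"] * green
    hits = sum(1 for _ in range(draws) if rng.choice(urn) == "red")
    return hits / draws


for draws in (10, 1_000, 100_000):
    print(f"{draws:>7} draws: {red_fraction(10, 10, draws):.3f} red")
```

With only a handful of draws the share of red balls can be well off 1/2; it settles close to 1/2 only as the number of draws grows, which is all the frequentist claim promises.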

But what if you brought a mystery urn (of unknown size) and didn’t reveal how many red and green balls are in the urn? Heck, what if there were a lot more colors than just red and green? I would have no idea. My best approach would be to guess a number (using my gut feel or intuition) and say something like 1/1000.

But as someone too analytically bent I would find it hard to play this game. You didn’t tell me how many total balls, the different colors and whether the urn had a red ball or not. I cannot even apply Bayesian reasoning to refine my answer.

Predicting entrepreneurial outcomes, says Saras Sarasvathy, is like estimating the chances of picking red balls from such a mystery urn. According to Sarasvathy, entrepreneurial thinking is:

Whatever the initial distribution of balls in the urn, I will continue to acquire red balls and put them in the urn. I will look for other people who own red balls and induce them to become partners and add to the red balls in the urn. As time goes by, there will be so many red balls in the urn that almost every draw will obtain one. On the other hand, if I and my acquaintances have only green balls, we will put them in the urn, and when there are enough, will create a new game where green balls win

It is hard not to imagine someone you know doing exactly this – hustling to find those with red balls to add to their urn (discover a few early adopters and build on them) or pivoting to redefine the game as picking green balls not red balls.

That is the clear distinction between the mind of an entrepreneur with irrational optimism and that of a rational person. I do not use rational and irrational to say one is better than the other; rather, I use irrationality as an equally positive trait as rationality (or rationality as an equally negative trait as irrationality), the way Dan Ariely did.

Rational people, there are many of us, refuse to play the guessing game of picking red balls from unknown mystery urns. But if picking red balls is important to significantly improve our lives, someone has to do it. However no single person can keep on trying with a single urn. And most run out of cash and time before they can add enough red balls to their urn.

That is why we need many, many entrepreneurs with irrational optimism to keep picking balls from their own mystery urns. Most end up picking all kinds of different balls, but a few will find urns and change them in such a way as to always draw red balls.

In a recent article in Inc magazine, Evernote CEO, Mr. Phil Libin, wrote

“there is a good chance that it will be worth $100 billion in a few years”

You likely want to ask what “good chance” means.

Mr. Libin wrote this in the context of Evernote’s current one billion dollar valuation, comparing it to the valuation of The New York Times. Mr. Libin makes a very valid point that such comparisons are pointless and that valuations are based on the future expected value from a business’ growth.

I agree.

Most public companies have relatively predictable levels of growth, so their valuations are heavily based on the current values of their businesses. In other words, few investors expect The New York Times’s profits to grow tenfold in the next few years.

Such valuations based on future growth are valid as long as they are computed by taking into account all possible future scenarios and not just the most optimistic outcomes. In many cases, and I don’t mean it is the case with Evernote, we not only overestimate the size of positive outcomes but also overestimate the chances of such outcomes. In such cases the valuations become detached from reality.

Back to the $100 billion valuation for Evernote. What would it look like?

Let us say it gets the same revenue multiple of 5.51 (say 5 for ease of math) as Google. That would mean $20 billion in yearly revenue. Where would that come from?

From its current sources I estimate that Evernote makes $63 to $84 million a year from 34 million users (1.4 million paying subscribers). If the current business model is the only option, that would mean one of the following (or a combination):

Every customer generates $45 a year, meaning 444 million paying customers (13 times current user numbers and about 317 times current paying subscribers)

50% paying customers, meaning 888 million users

100 million customers (not users), meaning $200 a year in revenue per customer – that means either their subscription price goes up or they find other ways to monetize customers. $200 a year just from subscription does not make sense (a NYTimes yearly subscription costs $195 and it did not find 100 million subscribers). As for other revenue sources, even Google and Facebook have not found a way to get $200 a year per user.

Even if Evernote does deals like the Moleskine tie-up that generate $4-$6 million a year each, it would take a very large number of such deals to get to $20 billion a year in sales.
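The back-of-the-envelope numbers above are easy to reproduce. The sketch below uses only figures quoted in this post (the $100 billion target, the rounded revenue multiple of 5, 34 million users, 1.4 million paying subscribers, $45 a year, $4-$6 million per deal); the variable names are mine, and none of the inputs are audited data.

```python
# Figures quoted in the post; treat them as the post's estimates, not audited data.
TARGET_VALUATION = 100e9   # dollars
REVENUE_MULTIPLE = 5       # rounded down from 5.51 for ease of math
CURRENT_USERS = 34e6
CURRENT_PAYING = 1.4e6

required_revenue = TARGET_VALUATION / REVENUE_MULTIPLE

# Scenario 1: every paying customer generates $45 a year.
paying_needed = required_revenue / 45
multiple_of_users = paying_needed / CURRENT_USERS
multiple_of_paying = paying_needed / CURRENT_PAYING

# Scenario 2: half of all users are paying customers.
users_needed = paying_needed / 0.5

# Scenario 3: 100 million paying customers.
revenue_per_customer = required_revenue / 100e6

# Partnership deals worth $4 to $6 million a year each.
deals_needed_max = required_revenue / 4e6
deals_needed_min = required_revenue / 6e6

print(f"Required yearly revenue: ${required_revenue / 1e9:.0f} billion")
print(f"Paying customers at $45: {paying_needed / 1e6:.0f} million "
      f"({multiple_of_users:.0f}x users, {multiple_of_paying:.0f}x paying)")
print(f"Users needed at 50% paying: {users_needed / 1e6:.0f} million")
print(f"Revenue per customer at 100 million customers: ${revenue_per_customer:.0f}")
print(f"Deals needed: {deals_needed_min:.0f} to {deals_needed_max:.0f}")
```

Changing the revenue multiple or the price per customer at the top shows quickly how sensitive each scenario is to those two assumptions.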

That leaves other sources of revenue that are not yet known from its current strategy. Which means one must consider higher uncertainty in such large outcomes given insufficient information.

Mr. Libin said, “there is a good chance”. Given what is known today and the uncertainties I am not sure what “good chance” means. But given the current valuation of $1 billion, investors seem to think the expected value of the valuation (considering all good and bad chances) is $1 billion. Or in other words, the numeric value of good chance is much less than 1%.
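Here is a rough sketch of why the implied chance comes out so small. It assumes a two-outcome world and ignores discounting and time value, which a real valuation would not; the function name `implied_probability` and the $500 million fallback value are assumptions for illustration only.

```python
def implied_probability(current_value: float, big_outcome: float,
                        fallback_value: float = 0.0) -> float:
    """Solve current_value = p * big_outcome + (1 - p) * fallback_value for p."""
    if not fallback_value <= current_value <= big_outcome:
        raise ValueError("expected fallback_value <= current_value <= big_outcome")
    return (current_value - fallback_value) / (big_outcome - fallback_value)


# If every outcome other than $100 billion were worth nothing, a $1 billion
# expected value would imply at most a 1% chance of the big outcome.
p_upper_bound = implied_probability(1e9, 100e9)

# If the other outcomes are worth something (hypothetically $500 million on
# average), the implied chance of $100 billion is smaller still.
p_with_fallback = implied_probability(1e9, 100e9, fallback_value=0.5e9)

print(f"Upper bound on 'good chance': {p_upper_bound:.2%}")
print(f"With a $500 million fallback: {p_with_fallback:.2%}")
```

Since the alternatives to the $100 billion outcome are unlikely to all be worth zero, 1% is a ceiling and the implied chance sits below it.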

A question you must ask is,

Is there also ‘good chance’ of $200 million valuation? (See: Zynga)

Finally, I am not going to run a complete scenario analysis here as I have done for other valuations before. That is left as homework for you.

The title refers to the famous anecdote about Marissa Mayer testing 40 shades of blue to determine the right color for the links. (Unfortunately I am colorblind, I know just one blue.)

Mayer is famous for many things at Google, but the one that always sticks out – and defines her in some ways – is the “Forty Shades of Blue” episode.

she ordered that 40 different shades of blue would be randomly shown to each 2.5% of visitors; Google would note which colour earned more clicks. And that was how the blue colour you see in Google Mail and on the Google page was chosen.

Thousands of such tests happen in the web world, with every website running multiple experiments in a day. Contrary to what most in webapp development may believe, A/B testing does not have its origins in the webapp world. It is simply an application of statistical testing, the Randomized Controlled Trial, to determine if a ‘treatment’ made a difference in the performance of the treatment group compared to the performance of the control group.

The simplest test checks whether the observed difference between the two sample means is statistically significant. That means computing the p-value: the probability of seeing a difference at least this large purely by chance if the treatment had no real effect. If the p-value is less than a preset level, we declare that the treatment made a difference.
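For click data, the sample mean is just the click-through rate, so one common way to run this test is a two-proportion z-test. The sketch below is one such version using only the standard library and a normal approximation; the function name and the visitor and click counts are invented for the example, and this is not a claim about how Google ran its tests.

```python
import math


def two_proportion_p_value(clicks_a: int, visitors_a: int,
                           clicks_b: int, visitors_b: int) -> float:
    """Two-sided p-value for the difference between two click-through rates."""
    rate_a = clicks_a / visitors_a
    rate_b = clicks_b / visitors_b
    # Pooled rate under the null hypothesis that both groups share one rate.
    pooled = (clicks_a + clicks_b) / (visitors_a + visitors_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided tail area of the standard normal: 2 * (1 - Phi(|z|)).
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical data: control gets 200 clicks from 10,000 visitors,
# treatment gets 260 clicks from 10,000 visitors.
p_value = two_proportion_p_value(200, 10_000, 260, 10_000)
ALPHA = 0.05  # the preset level
print(f"p-value = {p_value:.4f}, significant at {ALPHA}: {p_value < ALPHA}")
```

The normal approximation is reasonable only when both groups have plenty of clicks and non-clicks; with small samples an exact test is the safer choice.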

“I have published about 800 papers in peer-reviewed journals and every single one of them stands and falls with the p-value. And now here I find a p-value of 0.0001, and this is, to my way of thinking, a completely nonsensical relation.”

Should you test 40 shades of blue to find the one that produces most click-thrus or conversions? xkcd has the answer:
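Whatever the comic says, the arithmetic behind testing many variants at once is worth spelling out. The sketch below assumes the tests are independent and that none of the shades truly differs from the others; the function names and the 5% significance level are my assumptions.

```python
def chance_of_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of several independent tests on pure
    noise comes out 'significant' at level alpha."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_threshold(num_tests: int, alpha: float = 0.05) -> float:
    """Per-test p-value cutoff that keeps the overall false positive rate
    at or below alpha (the Bonferroni correction)."""
    return alpha / num_tests


for shades in (1, 10, 40):
    print(f"{shades:>2} tests: "
          f"P(at least one false positive) = {chance_of_false_positive(shades):.2f}, "
          f"Bonferroni cutoff = {bonferroni_threshold(shades):.5f}")
```

With 40 tests at the 5% level, some shade will look like a winner most of the time even when colour makes no difference at all, so the cutoff for each individual test has to be much stricter.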

Can Ms. Mayer test her way out of Yahoo’s current condition? Remember, all such split tests are about finding low-hanging fruit, not quantum leaps. And as Jim Manzi wrote in his book Uncontrolled,

Perhaps the single most important lesson I learned in commercial experimentation, and that I have since seen reinforced in one social science discipline after another, is that there is no magic. I mean this in a couple of senses. First, we are unlikely to discover some social intervention that is the moral equivalent of polio vaccine. There are probably very few such silver bullets out there to be found. And second, experimental science in these fields creates only marginal improvements. A failing company with a poor strategy cannot blindly experiment its way to success …

Last week NPR’s Morning Edition had a piece on President Obama’s poll numbers after the second debate. As their discussion turned to his chances of winning based on Intrade betting numbers,

MONTAGNE: OK, so got all those numbers. Finally, we checked the betting crowd. Intrade, which bills itself as the world’s leading prediction market, runs an online betting service whose participants put the odds of Barack Obama winning the election at 65 percent over Mitt Romney at 35 percent.

GREENE: But, Renee, odds only mean so much. That same service said there was a 75 percent chance the U.S. Supreme Court would strike down a national healthcare law. And the court beat those odds.

Notice Greene’s reply and you can see the fallacy in his argument, or his lack of understanding of probabilities.

First, these are very uncertain events to predict and the outcome can depend on many different variables. Modeling methods like Monte Carlo simulation and prediction markets like Intrade try to estimate the level of uncertainty (by placing a lower and upper bound on it). The net result is a probability distribution over the different outcomes, in this case whether or not President Obama will win this November.

When a model states one outcome is more likely than the other and in reality the other outcome happens one should not treat this as failure of the model. If there is no doubt who will win, there is no uncertainty, then one does not need any of these tools. By definition there is uncertainty in the outcome. The modeling indicates one outcome (say, Obama win) is more likely than the other.

The model is doing its job perfectly well. What it really states is, “if we were to imagine a million different ways of running the 2012 election, more of them show an Obama win than a Romney win”.
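A toy simulation makes the point concrete. It takes the 65 percent figure at face value and replays the election many times; the function name and the seed are assumptions for the example, and this is in no way a model of how Intrade arrives at its odds.

```python
import random


def share_of_upsets(p_favorite_wins: float, runs: int, seed: int = 0) -> float:
    """Replay an uncertain event many times and return how often the
    less likely outcome happens."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    upsets = sum(1 for _ in range(runs) if rng.random() >= p_favorite_wins)
    return upsets / runs


# A million imagined elections with the favorite at 65 percent.
upset_share = share_of_upsets(0.65, 1_000_000)
print(f"The underdog wins in {upset_share:.1%} of simulated elections")
```

If the odds are right, the underdog should win roughly one time in three. An upset is exactly what a well-calibrated 65 percent forecast expects to see fairly often, not evidence that the forecast was wrong.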

Another problem is confusing the result of a previous prediction on an unrelated event with the current one. There is no relation between the Supreme Court’s decision and the election results.

To say, “Odds only mean so much”, as a way to dismiss all predictive models is just plain wrong.