In my last lean startup blog on measurement, I talked about using a Minimum Viable Product (MVP) to test hypotheses derived from the leap-of-faith assumptions contained in the startup vision.

In the case of Joe's lemonade stand (see previous blog), the first leap of faith was that the customer would buy the lemonade. Customers purchased the lemonade but not in the amounts he was looking for (10 customers). Joe then tested the price people would pay for it starting off at a premium price of $1.50 a glass. He measured sales volume at that price and another price ($1.00 a cup) and concluded that it was better to sell at the lower price because the volume more than compensated for the lower price. Joe is making progress towards operating a successful lemonade stand.

In this blog I want to look at the process of hypothesis testing in more detail and see how it maps onto some of the terms we have been using.

Let H be a class of hypotheses and h be a specific hypothesis.

Let M be a class of measurement outcomes and m be a specific measurement outcome.

We can use Modus Ponens (Latin for "the affirming mode") to draw conclusions about whether our hypothesis is true. Here the conditional runs from the measurement to the hypothesis:

If M=m Then H=h
M=m
-------------
H=h

Note that concluding H=h from "If H=h Then M=m" together with M=m would not be Modus Ponens; that pattern is the fallacy of affirming the consequent.

We can also use Modus Tollens (Latin for "the denying mode") to draw conclusions about whether our hypothesis is false. Here the conditional runs from the hypothesis to the measurement it predicts:

If H=h Then M=m
Not M=m
--------------
Not H=h

So far we are in the realm of formal logic and these two forms of inference are foundational in guiding automated forms of inference.

We can cross over into the realm of informal logic by using the P( ) operator around all our assertions, where P stands for "the probability of".

So Modus Ponens now looks like this:

P(If M=m Then H=h)
P(M=m)
------------------
P(H=h)

And Modus Tollens now looks like this:

P(If H=h Then M=m)
P(Not M=m)
------------------
P(Not H=h)

It is this form of Modus Ponens and Modus Tollens that we are dealing with when we test our startup assumptions. Applying scientific methods to startup hypotheses does not necessarily yield clear-cut answers. More often, one hypothesis seems to be better supported by the evidence than another, without our being able to completely rule out the alternative hypothesis.

In the case of the learning platform company Grockit (see previous blog), they were adding new peer-learning features to their learning platform and not seeing any effects on their metrics. They concluded that the learner only wanted peer-learning up to a point, then the learner wanted to engage in solo mode learning. Another logical possibility was that Grockit had not yet homed in on the proper peer-learning approach. The alternative hypothesis was not completely ruled out by testing and measurement, but it was made sufficiently implausible that a pivot was deemed necessary.

We will be getting into the topic of pivoting in the next blog, but it is important to note here that deciding whether to pivot is made difficult by the fact that the original and alternative hypotheses may each have merit.

Recognizing that probabilities are involved can be helpful in deciding what decision making framework you want to use in your startup hypothesis testing. If P(H=h) is .6 perhaps that is enough certainty to go by in situations of irreducible uncertainty (you don't have the time or resources to achieve greater certainty).

You could examine formula-laden articles on sequential A/B testing and Bayesian A/B testing to try to figure out when to stop collecting data and what to conclude (and I recommend reading them), but I'm also interested in a more practical approach. It uses informal logic to evaluate the probability of the premises P and the probability of the inference (i.e., P(If P Then C)) to arrive at a probability of the conclusion C of an argument.

P(If P Then C)
P(P)
---------------
P(C)
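
One hedged way to make this schema computable is to read P(If P Then C) as the conditional probability P(C | P). That reading is an assumption, not something the schema dictates. Under it, the law of total probability says P(C) = P(C | P) * P(P) + P(C | Not P) * P(Not P), so the product of the two premise probabilities is a lower bound on P(C). The function name and the numbers below are hypothetical.

```python
from typing import Tuple


def conclusion_bounds(p_premise: float, p_inference: float) -> Tuple[float, float]:
    """Return (lower, upper) bounds on P(C).

    Assumption: p_inference is read as the conditional probability P(C | P).
    """
    for value in (p_premise, p_inference):
        if not 0.0 <= value <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    # Lower bound: C never holds when the premise fails.
    lower = p_inference * p_premise
    # Upper bound: C always holds when the premise fails.
    upper = lower + (1.0 - p_premise)
    return lower, upper


# Hypothetical judgments: premise 0.8 likely, inference 0.75 strong.
low, high = conclusion_bounds(0.8, 0.75)
print(f"P(C) lies between {low:.2f} and {high:.2f}")
```

The gap between the two bounds is exactly P(Not P), which is one way to see why shoring up the premises tightens the conclusion.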

The evaluation of the premises and the inferences is based upon informal logic techniques appropriate to criticizing scientific arguments, combined with common sense, to assign probabilities to each premise. The evaluation of the premises and the inferences of the argument determines the evaluation you assign to the conclusion. Bayesian forms of informal logic may also involve assigning a prior probability to the conclusion so that the posterior probability of the conclusion can be evaluated.

P(If P Then C)
P(P)
P(C) [prior]
---------------
P(C) [posterior]

Whether these probabilities are to be combined additively or multiplicatively to yield the posterior conclusion is worth thinking about, although multiplicative combination tends to be used more often and to work better.

Informal logic nowadays often involves creating a graphical representation of the argument. Below is how we might graphically express this Bayesian approach to evaluating arguments (where hypothesis testing is just one type of argument). The premises (e.g., the measurements and other assumptions) appear at the top with lines connecting them to the conclusion. The lines are your inferences (If P1 Then C, If P2 Then C). The prior probability of the conclusion C (based on previous knowledge) appears next to the premises as a separate contribution to the posterior probability of C. The posterior probability of the conclusion at the bottom is what you get when you combine your prior probability of C and a likelihood estimate (the left side of the argument below).

The purpose of this blog was to dig a bit deeper into what startup hypothesis testing might involve from a formal and informal logic perspective. I am not a practicing logician and this is not a peer-reviewed discussion, so you may or may not find this a useful framework to use when approaching the problem of testing the leaps of faith that your startup vision implies.

Inspiration for this blog and the argument evaluation diagramming comes from my undergraduate mentor Wayne Grennan and his book Informal Logic (1997).

David Fleming, in his excellent book Lean Logic (2016), has this to say about the relationship between informal and formal logic:

It sounds banal, but the syllogisms of formal logic are the building blocks of reasoning, which - in combination with a series of conditions, affirmed or denied in sequence and in parallel - can develop into a problem-solving capacity of great complexity, used as the logical structure on which artificial intelligence is based.

Informal logic is, of course, the junior partner in all this, since it depends on the reasoning of formal logic, and its mixing up of logic and content is exactly what you cannot do with formal logic. On the other hand, without content, logic has no purpose. Formal logic is the road, informal logic is the journey.
~ p. 165

In this blog, I want to get under the hood of what causes a profit distribution (which I have discussed in my last three blogs).

One cause of a Profit Distribution Function (PDF) is one or more Profit Generating Functions (PGF).

A profit generating function simulates expected profits based upon a set of parameters that are fed into it.

An example would be a line-of-business that involves shearing sheep for the wool fiber they produce. If you are at the beginning of the sheep shearing season, and are trying to estimate your profits for the end of the upcoming sheep shearing season, you would need to estimate how much money you might make per kg of wool fiber, how much wool fiber each sheep might produce (affected by heat, rain, nutrition, genetics), how many sheep you will have to shear at the future date, the fixed costs of raising your sheep, and the variable costs of raising each sheep. Each of these factors will have a range of uncertainty associated with it. The uncertainty associated with the price per kg and the amount of wool in kg per sheep is illustrated in the tree diagram below.

The full calculation of how much you will make at the end of a season is a function of the values that each of these parameters might reasonably attain over the forecast period. A profit generating function will sample from each pool of uncertainty according to the distributional characteristics of that parameter and then use some arithmetic to generate a single possible profit value. When the profit generating function is re-run many times, it will generate a large number of possible values that can be graphed, and this graph would look like your estimated profit distribution, or something that approximates it.
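
The sampling-and-rerunning idea can be sketched as a small Monte Carlo simulation. This is only an illustration under stated assumptions: every distribution, range, and cost figure below is hypothetical (none comes from real sheep farming data), and the function names are made up for this sketch.

```python
import math
import random
import statistics
from collections import Counter


def simulate_season_profit(rng: random.Random) -> float:
    """Draw one possible end-of-season profit from hypothetical parameter pools."""
    price_per_kg = rng.triangular(4.0, 8.0, 6.0)    # low, high, mode (money per kg)
    wool_per_sheep = max(0.0, rng.gauss(4.5, 0.8))  # kg per sheep, floored at zero
    flock_size = rng.randint(180, 220)              # sheep available to shear
    fixed_costs = 2000.0                            # hypothetical fixed costs
    variable_cost = rng.uniform(8.0, 12.0)          # hypothetical cost per sheep
    revenue = price_per_kg * wool_per_sheep * flock_size
    return revenue - (fixed_costs + variable_cost * flock_size)


def profit_distribution(n_runs: int = 10_000, seed: int = 42) -> list:
    """Re-run the profit generating function many times with a fixed seed."""
    rng = random.Random(seed)
    return [simulate_season_profit(rng) for _ in range(n_runs)]


def to_histogram(profits, bin_width: float = 500.0) -> dict:
    """Map each bin's lower edge to the share of runs that landed in that bin."""
    counts = Counter(math.floor(p / bin_width) for p in profits)
    return {b * bin_width: counts[b] / len(profits) for b in sorted(counts)}


profits = profit_distribution()
histogram = to_histogram(profits)
print(f"mean simulated profit: {statistics.mean(profits):.0f}")
for lower_edge, share in histogram.items():
    print(f"{lower_edge:>8.0f} to {lower_edge + 500:>8.0f}: {share:.3f}")
```

The shares in the histogram sum to 1, so the output can be read directly as an estimated profit distribution over 500-unit profit intervals.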

When estimating the probability to assign to each profit interval for Google (see Google 2013 Profit Distribution), we could constrain our estimates based upon the profit generating functions we believed were critical to generating the actual amount of profit they might attain. The profit generating function for AdWords might include the estimated average cost per click and the volume of clicks over a given period (among other factors). Or, we could ignore the profit generating function and base our estimates on something less concrete but still significant: the level of goodwill that will exist towards Google over the forecast period (e.g., big brother privacy concerns creating negative sentiment), social network rivals taking more of the advertising budget of companies, search engine rivals like Yahoo gaining more market share, and so on. As a Bayesian you are free to base your subjective estimates upon whatever factors you feel are the most critical to determining the actual profit of Google. In certain cases, you might want to rely more upon what your profit generating functions are telling you. It could be argued that it is always a good idea to construct a profit generating function for a company just so you understand in concrete terms how the company makes money. Then you can choose to ignore it in your profit forecasts, or not, or base your estimate on a blend of profit generating functions modified by subjective Bayesian factors.

What I am here calling a Profit Generating Function, is somewhat akin to what I have referred to as a Business Model in the past. If you want some ideas for how profit generating functions could be implemented, I would encourage you to examine my blog entitled A Complete and Profitable Business Model. Perhaps in a future blog I will try my hand at implementing a profit generating function that samples from several pools of uncertainty to deliver a forecast profit, and which will generate a profit distribution when re-run many times.

In my last 2 blogs, I discussed the idea of a profit distribution. I argued that it is better to estimate profit using a profit distribution rather than a single most-likely value (e.g., we should make 100k next year). A distribution is more epistemically informative than a single most-likely value. I'll illustrate what I mean by this in today's blog on the shapes of uncertainty.

In this blog, I want to focus on what to look for in a profit distribution. A profit distribution can have many shapes and these shapes are quite informative about the type and level of uncertainty involved in an estimate.

It is useful to acquire the skill of reading and interpreting a profit distribution. That skill involves attending to significant aspects of the distribution shape and understanding what the shapes mean.

Flat Profit Distribution

If our profit distribution for Google were flat, this would mean that our level of uncertainty was the same over all the profit intervals. In the graph below, the estimated profit could fall within any of the six intervals with the same probability (i.e., about 16.7%). Some Bayesian textbooks advise that you start with a flat distribution if you have no strong convictions about where an estimated parameter might lie.

Peaked Profit Distribution

In a peaked profit distribution one of the intervals has significantly more probability mass than the other profit intervals. This reflects an increased level of certainty that the estimated profit will actually be within that interval. As we acquire more information about the company and its lines of business (e.g., second quarter financials), we might expect that our profit distribution estimate would begin to change shape in this manner first.

Shrunk Profit Distribution

As we learn even more about a company and its lines of business, the range of possible profit outcomes should be reduced so that instead of a Google profit range running from 10.0b to 12.4b, perhaps it only covers the range from 10.8b to 12.0b (see below). We show our confidence in our prediction by how narrow our profit distribution is. This does not necessarily change the shape of the profit distribution; it changes the x axis of the profit distribution (both shapes might be peaked, but they would be on x axes with different ranges of possible values).

Conclusion

The shape of a profit distribution tells us a lot about the nature of the uncertainty surrounding our estimate of profit. We have seen that our confidence in an estimate is reflected in how peaked our profit distribution is and how shrunk the range of possible profits is. This suggests strategies one might adopt to increase confidence in an estimate - gather information that helps you establish a more peaked profit distribution and that helps you reduce the range of the profit distribution.

In this article we have examined three ways in which a profit distribution can appear on a graph - flat, peaked, or shrunk. There are other aspects of shape that we have not examined, namely, skewness and kurtosis (the third and fourth standardized moments of the distribution). Using these shape controls, we might be able to approximate the peaked distribution above with a bell-shaped curve whose skew and kurtosis settings are tuned to match the estimated profit distribution. A normal distribution is an example of a function that generates points on a probability curve (whose area sums to 1) based upon the values fed into it (i.e., mean, standard deviation, x-values); skewed generalizations of the normal add skew and kurtosis parameters. We might want to take this additional step of creating a profit distribution function if we thought it would simplify calculations (or thinking) or if we thought it was a better representation of the data than a discrete histogram of possible profit intervals. Step functions are potentially limited as a means of representing the actual shape of our uncertainty about a parameter.
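
To make the shape vocabulary concrete, here is a minimal sketch that computes the mean, standard deviation, skewness, and kurtosis of a discrete profit distribution from its interval midpoints. The six midpoints follow the 10.0b to 12.4b range in steps of 0.4b; the peaked probabilities are hypothetical values chosen for illustration and are not the estimates plotted in the graphs.

```python
import math


def shape_summary(midpoints, probabilities):
    """Return mean, sd, skewness and kurtosis of a discrete distribution."""
    if not math.isclose(sum(probabilities), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    mean = sum(x * p for x, p in zip(midpoints, probabilities))

    def central_moment(k):
        return sum(p * (x - mean) ** k for x, p in zip(midpoints, probabilities))

    sd = math.sqrt(central_moment(2))
    return {
        "mean": mean,
        "sd": sd,
        "skewness": central_moment(3) / sd ** 3,  # third standardized moment
        "kurtosis": central_moment(4) / sd ** 4,  # fourth standardized moment
    }


# Midpoints (in billions) of the intervals 10.0-10.4, 10.4-10.8, ..., 12.0-12.4.
midpoints = [10.2, 10.6, 11.0, 11.4, 11.8, 12.2]
flat = [1 / 6] * 6
peaked = [0.05, 0.20, 0.45, 0.15, 0.10, 0.05]  # hypothetical probabilities

flat_shape = shape_summary(midpoints, flat)
peaked_shape = shape_summary(midpoints, peaked)
for name, shape in (("flat", flat_shape), ("peaked", peaked_shape)):
    print(name, {key: round(value, 3) for key, value in shape.items()})
```

A smaller standard deviation for the peaked case is the numerical counterpart of the increased certainty described above.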

My last article on the concept of a profit distribution was a bit abstract and lacked a graphic. I wanted to correct this situation by constructing a profit distribution for a company we can all relate to - Google.

In order to make this example realistic, I wanted to know how profitable Google is on a year-to-year basis. To find this info, I consulted Google's investor relations area, specifically their 2013 Financial Tables. Here I learned that the net income for Google in 2011 was approx. $9.7 billion and in 2012 approx. $10.7 billion. I used these values to come up with some reasonable bounds for their expected profit in 2013 (e.g., between 10 billion and 12.4 billion). I divided up this range in units of .4 billion and estimated the probability that Google's net income (or profit) would fall in each interval. This is what I came up with.

The shape of the profit distribution function reflects my belief that Google will continue to grow and that my best guess is that they will grow by another billion in profit next year. I also believe that there is a greater chance they will earn less than this amount than that they will earn more than this amount.

Notice that if you convert the percentages to proportions (e.g., 45% to .45), they sum to 1, as all good probability distributions should. My uncertainty regarding the expected profit of Google in 2013 is better captured by a range of probability assignments to profit intervals than by a single point estimate of how much they might make next year. I don't know that much about Google's business lines and how they will perform this year, but I'm able to use my general knowledge and recently acquired financial statements to come up with a 2013 Profit Distribution for Google. This could be considered my "prior" distribution for Google, one that can be updated according to Bayesian logic as more information comes in.

I used the JpGraph library to generate this graph, modifying an example graph from the JpGraph site. FYI, here is the code I used to generate the graph.

One factor that an investor takes into account when deciding whether or not to invest in a company is the expected profit that company might make in the near and the longer term.

So how should we represent the expected profit of a company?

One approach that I think might be useful involves diagramming the expected profit distribution of the company. The profit distribution graph would consist of a subjective estimate of the probability that the company will make a given amount of profit over a specified time frame. The Y axis of the graph is labelled "Probability". The X axis of the graph is labelled "Profit". Constructing the graph involves estimating the probability that the company will make specific amounts of profit (e.g., 10k to 20k, 20k to 30k, 30k to 40k, 40k to 50k, 50k to 60k, 60k to 70k). So we assign a probability to the event that a company will make 10k to 20k in profit next year. Then we assign a probability to the event that a company will make between 20k and 30k and so on up to our 70k limit (the range and intervals chosen will vary by company). In this manner we can construct a profit distribution.

The profit distribution that is constructed should be constrained so that the mass of the probability distribution sums to 1. If you constrain it in this manner, then you can potentially do Bayesian inference upon the profit distribution. This could be in the form of conditionalizations that involve saying that given some factor A (e.g., money invested) the profit distribution function will shift - the mean of the profit distribution would ideally go up by an amount greater than the money invested.

So far in my discussions of Bayesian Angel Investing, I have used Bayesian techniques in an objective manner. The inputs into Bayes' formula were objectively measurable entities. In the case of generating the profit distribution function for a company, we are subjectively assigning probabilities to possible outcomes. There is no set of trials we can rerun to establish an objective probability function for the profit distribution of a company (i.e., the relative frequency of different profit levels for the same company repeated many times with profit levels measured). The probability that is assigned to a particular profit level should reflect your best estimate of how likely a given profit level is for the company within a particular timeframe. So, what is the probability that Google will make between X1 billion and X2 billion next year (e.g., .10)? What is the probability that Google will make between X2 and X3 (e.g., .40)? Assign mass to the intervals in such a way that the probability mass of all the intervals sums to 1. Then you will meet all the technical requirements for a distribution to be considered a probability distribution. All the probability axioms are satisfied.
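
As a minimal sketch of the bookkeeping involved, the snippet below builds a discrete profit distribution over the 10k to 70k intervals mentioned earlier, checks that the probability mass is non-negative and sums to 1, and then reads off two summary values. The probabilities are hypothetical subjective estimates, and the function names are invented for this illustration.

```python
import math


def make_profit_distribution(intervals, probabilities):
    """Pair each (low, high) profit interval with its probability after validation."""
    if len(intervals) != len(probabilities):
        raise ValueError("need exactly one probability per interval")
    if any(p < 0 for p in probabilities):
        raise ValueError("probabilities cannot be negative")
    if not math.isclose(sum(probabilities), 1.0, abs_tol=1e-9):
        raise ValueError("probability mass must sum to 1")
    return list(zip(intervals, probabilities))


def expected_profit(distribution):
    """Expected profit, using each interval's midpoint as its representative value."""
    return sum(p * (low + high) / 2 for (low, high), p in distribution)


def probability_at_least(distribution, threshold):
    """Total mass of the intervals that start at or above the threshold."""
    return sum(p for (low, _high), p in distribution if low >= threshold)


intervals = [(10_000, 20_000), (20_000, 30_000), (30_000, 40_000),
             (40_000, 50_000), (50_000, 60_000), (60_000, 70_000)]
subjective_probs = [0.05, 0.15, 0.30, 0.25, 0.15, 0.10]  # hypothetical estimates

distribution = make_profit_distribution(intervals, subjective_probs)
print(f"expected profit: {expected_profit(distribution):.0f}")
print(f"P(profit >= 40k): {probability_at_least(distribution, 40_000):.2f}")
```

Rejecting inputs that do not sum to 1 keeps every later calculation on the distribution consistent with the probability axioms.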

Why go through all this bother to estimate how profitable a company might be? Why not just ball-park a value that you think is most likely and leave it at that?

One reason is that one number does not adequately represent your state of uncertainty about the outcome.

Another reason has to do with modelling risk. Usually when you model risk you don't use one number to do so. Those modelling risk usually like to work with probability distributions, not simple point estimates of the most likely outcome. It provides a more informative model of the uncertainty associated with a forecast.

Also, if you are constructing a profit distribution function for a company there is no reason to hide that information from the company you want to invest in or from co-investors. The profit distribution function, because it is inspectable, can be updated with new information from the company and other investors who might offer strategic capabilities. So the transparency and inspectability of the uncertainty model are also useful features of this approach.

A bit of housekeeping first. To keep track of my discussion of topics related to Bayesian Inference, I have created a blog category called "Bayesian Inference". You can click on the category link Bayesian Inference to see how my earlier blogs prepare the groundwork for my later blogs on Bayesian inference. If you are new to this topic, I recommend reading my oldest Bayesian inference blog first and then reading each one up to my most recent Bayesian inference blog. Later blogs build on earlier blogs.

Bayesian inference techniques are not limited to helping angel investors optimize their investment decisions; they can also be used by entrepreneurs to optimize their startup decision making. For example, entrepreneurs must make decisions about how they should invest their startup capital in order to maximize their return on investment. Imagine that you are a new farmer and must make a decision about whether to invest in buying wheat seed for the upcoming growing season. To make an optimal decision here you might begin by estimating the joint probability of getting 28 cm or more of rain during the wheat growing season AND that your wheat yield will be 7800 kg/ha or more. You might estimate this value by tallying the number of instances of (rain >= 28 cm and wheat yield >= 7800 kg/ha) and dividing this by the total number of observations you have on rain amount and wheat yield. Let's assume that P(R>=28 cm & Y>=7800 kg/ha) = .18. From historical records you might also estimate the probability of getting a rain amount >= 28 cm to be .21. Now using our definition of conditional probability P(H|E)=P(H&E)/P(E), we calculate P(Y>=7800 kg/ha | R>=28 cm) as follows:

P(Y>=7800 kg/ha | R>=28 cm) = P(R>=28 cm & Y>=7800 kg/ha) / P(R>=28 cm) = .18 / .21 = .86

This tells us that the probability of getting a good yield from our wheat is fairly high if we get 28 cm or more of rain during the wheat growing season. The probability of getting 28 cm of rain or more is, however, only .21, so we might want to examine other rainfall amounts and yield amounts to see if there is a good yield value for a more probable rainfall amount. This is how a startup farmer might go about making an optimal decision regarding whether to purchase wheat seed for the upcoming growing season. It might be noted that there is a very high correlation between rainfall amounts and wheat yield (correlation coefficient of .95), so of all the variables that a farmer might take into account in making a seed purchase decision, the relationship between rainfall amounts and wheat yields is a particularly important one to examine when projecting a probable return on investment. Don't waste your time calculating probabilities based upon factors that don't really matter that much.
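
The tallying procedure and the search over other rainfall and yield thresholds can be sketched as follows. The records are invented toy observations used only to show the mechanics, not historical data, and the function names are hypothetical.

```python
def conditional_probability(p_joint: float, p_evidence: float) -> float:
    """P(H|E) = P(H & E) / P(E)."""
    if not 0.0 < p_evidence <= 1.0:
        raise ValueError("P(E) must lie in (0, 1]")
    if not 0.0 <= p_joint <= p_evidence:
        raise ValueError("P(H & E) must lie in [0, P(E)]")
    return p_joint / p_evidence


def yield_given_rain(records, rain_cm: float, yield_kg_ha: float) -> float:
    """Estimate P(yield >= yield_kg_ha | rain >= rain_cm) by tallying records."""
    total = len(records)
    n_rain = sum(1 for rain, _ in records if rain >= rain_cm)
    n_both = sum(1 for rain, y in records if rain >= rain_cm and y >= yield_kg_ha)
    return conditional_probability(n_both / total, n_rain / total)


# The probabilities assumed in the text.
print(round(conditional_probability(0.18, 0.21), 2))

# Invented toy records: (seasonal rain in cm, wheat yield in kg/ha).
records = [(31, 8100), (29, 7900), (25, 7000), (22, 6400), (28, 7800),
           (20, 6000), (30, 7500), (18, 5500), (26, 7200), (24, 6900)]

# Examine several rainfall and yield thresholds, as the text suggests.
for rain_cm, yield_kg_ha in [(28, 7800), (26, 7200), (24, 6900)]:
    p = yield_given_rain(records, rain_cm, yield_kg_ha)
    print(f"P(Y>={yield_kg_ha} | R>={rain_cm}) = {p:.2f}")
```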

There are two ways to make decisions - analytically or non-analytically. Making a decision analytically requires the quantification of the main elements in your decision problem so that you can compute answers. The main reason entrepreneurs might want to bother with analytic decision making is if they can make better decisions by adopting an analytical approach versus a non-analytical approach (perhaps "intuitive" would be a more favorable word to use). In some ways this dichotomy is false because most "analytic" decisions involve a combination of analytic and intuitive problem solving. However, it is worth emphasizing the distinction because the role of analysis in entrepreneurial decision making is not an aspect of entrepreneurship that is discussed much. It is worth examining whether Bayesian inference techniques are worth learning because they lead entrepreneurs to more success. It is difficult to say whether this is true or not because the idea of Bayesian entrepreneurship has not been studied or promoted to date. Maybe this blog will help change this state of affairs by offering some instruction on how Bayesian inference techniques might be applied in entrepreneurial decision making.

In my last blog I showed how to compute the likelihood term P(E|H) in Bayes' formula, which is shown below:

P(H|E) = P(E|H) * P(H) / P(E)

In today's blog we will be using the likelihood values we previously computed in order to predict startup success based upon the evidence of two diagnostic tests, P(H|E). Here is the data table we created in the last blog, with likelihoods appearing in parentheses.

                       Tests
Outcome   # Startups   ++          +-          -+          --
S         1200         650 (.54)   250 (.21)   250 (.21)   50 (.04)
U         8800         100 (.01)   450 (.05)   450 (.05)   7800 (.89)
Total     10,000

This data table provides us with all the information we need in order to use Bayes' theorem to predict the probability of startup success given evidence from two diagnostic tests. To compute the posterior probabilities for each hypothesis given different evidence patterns, we will use a simple bayes_wizard.php script. Let me show you how it works.

When we point our browser at the bayes_wizard.php script (in a php-enabled web folder), the first screen asks us to input the number of hypotheses and the number of test labels:

The next screen asks us to input the labels for the hypotheses and tests. We use S to mean successful startup and U to mean unsuccessful startup. We use ++ to indicate a positive outcome on two diagnostic tests, -- to indicate a negative outcome on two diagnostic tests, and so on.

Next we are asked to enter the prior probability of the different hypotheses (i.e., P(H=S) and P(H=U)). These are just the fractions of the 10,000 startups classified as successful (.12) or unsuccessful (.88).

The next screen asks us to input the likelihood for each combination of test and hypothesis. We enter the likelihoods we computed in our last blog in this screen (see values in parentheses in the table above):

The final screen displays the posterior probabilities for each hypothesis given each evidence pattern:

The way to interpret this table is to examine each row separately. In the first row, where we have two diagnostic tests with positive outcomes, we see that the posterior probability that the startup is successful is significantly higher (.88) than the probability that the startup is unsuccessful (.12). So, a startup exhibiting this pattern of diagnostic evidence is quite likely to be successful. Our posterior probability calculation allows us to move from an initial estimate of 12 percent probability of startup success to an 88 percent probability of startup success.
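
For readers who want to check the posterior values by hand, here is a minimal Python sketch of the same Bayes' theorem calculation. It is not the bayes_wizard.php code; the function name is hypothetical, and the inputs are the priors and the rounded likelihoods from the data table.

```python
priors = {"S": 0.12, "U": 0.88}

# Rounded likelihoods P(E|H) taken from the data table.
likelihoods = {
    "S": {"++": 0.54, "+-": 0.21, "-+": 0.21, "--": 0.04},
    "U": {"++": 0.01, "+-": 0.05, "-+": 0.05, "--": 0.89},
}


def posterior_table(priors, likelihoods):
    """Return P(H|E) for every evidence pattern E and hypothesis H."""
    patterns = next(iter(likelihoods.values())).keys()
    table = {}
    for evidence in patterns:
        # Numerator of Bayes' theorem for each hypothesis: P(E|H) * P(H).
        unnormalized = {h: likelihoods[h][evidence] * priors[h] for h in priors}
        # P(E) is the sum of the numerators over all hypotheses.
        p_evidence = sum(unnormalized.values())
        table[evidence] = {h: v / p_evidence for h, v in unnormalized.items()}
    return table


table = posterior_table(priors, likelihoods)
for evidence, row in table.items():
    print(evidence, {h: round(p, 2) for h, p in row.items()})
```

Each row is normalized by its own P(E), which is why the posteriors in a row sum to 1.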

The diagnostic tests that might be used could be anything that might be predictive of startup success. We could, for example, assess a startup's business plan with respect to a checklist of desirable attributes and score it as pass + or fail -. The Bayes Wizard allows you to specify as many tests and hypotheses as you want. It is up to you to come up with the hypotheses you want to examine and the number and kind of tests you want to use. You should look for empirical information about the covariation between your tests and outcomes so that you can compute the required likelihood terms.

If you have been following my last few blogs, you should now have a good sense of how you can begin to use Bayesian inference to arrive at better Angel Investment decisions. If you want to see how the wizard works under the hood and how the Bayes' theorem calculation is implemented, you can download the code from my GitHub account.

P(H|E) ~ P(E|H) * P(H)

where the symbol ~ means "is proportional to". The equation says that the probability of a hypothesis given evidence, P(H|E), is proportional to the likelihood of the evidence given the hypothesis, P(E|H), multiplied by a prior assessment of the probability of our hypothesis, P(H). The likelihood term plays a critical role in updating our prior beliefs. So how is it computed and what does it mean? That is what will be discussed today.

Below I have fabricated a data table consisting of 10,000 startups classified as successful S (1200 instances) or unsuccessful U (8800 instances). In a previous blog, I reported a finding that claimed the success rate of first time startups is 12% which equates to 1200 instances out of 10,000. The data table also includes the outcome of two diagnostic tests. A positive outcome on both tests is denoted ++, while a negative outcome is denoted --. Each cell displays a joint frequency value and a corresponding likelihood value for the relevant combination of diagnostic tests and startup outcomes.

                       Tests
Outcome   # Startups   ++          +-          -+          --
S         1200         650 (.54)   250 (.21)   250 (.21)   50 (.04)
U         8800         100 (.01)   450 (.05)   450 (.05)   7800 (.89)
Total     10,000

Computing a likelihood from this data table is actually a simple calculation involving the formula:

P(E|H) = P(H & E) / P(H)

To calculate the likelihood of two positive tests given that a startup is successful, P(E=++|H=S), we divide the joint frequency of the evidence E=++ when a startup is successful H=S (which is 650) by the frequency of startup success H=S (which is 1200). So 650/1200 is equal to .54, which is the value in parentheses beside 650 in the table above. To calculate the likelihood of two positive tests given that a startup is unsuccessful, P(E=++|H=U), we divide the joint frequency of the evidence E=++ when a startup is unsuccessful H=U (which is 100) by the frequency that a first time startup is unsuccessful H=U (which is 8800). So 100/8800 is equal to .01, which is the value in parentheses beside 100 in the table above.
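
The same division can be applied to every cell of the table at once. The sketch below uses the joint frequencies from the table; the function name is hypothetical.

```python
# Joint frequencies from the data table: hypothesis -> test pattern -> count.
joint_counts = {
    "S": {"++": 650, "+-": 250, "-+": 250, "--": 50},
    "U": {"++": 100, "+-": 450, "-+": 450, "--": 7800},
}


def likelihoods_from_counts(joint_counts):
    """Compute P(E|H) = count(H & E) / count(H) for every cell."""
    result = {}
    for hypothesis, row in joint_counts.items():
        hypothesis_total = sum(row.values())  # 1200 for S, 8800 for U
        result[hypothesis] = {
            evidence: count / hypothesis_total for evidence, count in row.items()
        }
    return result


likelihoods = likelihoods_from_counts(joint_counts)
for hypothesis, row in likelihoods.items():
    print(hypothesis, {evidence: round(p, 2) for evidence, p in row.items()})
```

Dividing counts gives the same answer as dividing probabilities, because the total of 10,000 startups cancels out of P(H & E) / P(H).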

The likelihood calculation tells us which hypothesis makes the evidence most likely. In this case, the hypothesis that the startup is successful makes the positive outcome of our two diagnostic tests (E=++) more likely (.54) than the hypothesis that the startup is unsuccessful (.01). We can examine the likelihood values in each column to determine which hypothesis makes the diagnostic evidence more likely. You can see why the likelihood values are important in updating our prior beliefs about the probability of startup success. We can also appreciate why some would argue that likelihood values are sufficient for making decisions - just compare the likelihood of the evidence under the different hypotheses.

In this blog, I'll be doing a bit of algebra to show you that our conditional probability formula P(H|E) = P(H & E) / P(E) is equivalent to P(H|E) = P(E|H) * P(H) / P(E). This latter form of the equation is the version that people most often refer to as Bayes' theorem. The two are mathematically equivalent; however, in different circumstances it is easier to work with one than the other. A Bayesian Angel Investor will need to master this Bayes' theorem version of the conditional probability equation. This version of the equation includes a term P(E|H), called the likelihood term, which is also critical for a Bayesian Angel Investor to understand and master. We will briefly discuss this term, leaving a more detailed discussion until next week when I will dedicate a blog to the likelihood concept.

The derivation of Bayes theorem follows naturally from the definition of conditional probability:

P(H|E) = P(H & E) / P(E)

Using some simple algebra (multiplying both sides by P(E)), this equation can be rewritten as:

P(H & E) = P(H | E) * P(E)

The same joint probability can also be computed using H as the conditioning variable in the right-hand part of the equation:

P(H & E) = P(E | H) * P(H)

Given this equivalence, you can write:

P(H|E) * P(E) = P(E|H) * P(H)

We can now substitute P(E|H) * P(H) for P(H & E) and arrive at Bayes theorem:

P(H|E) = P(E|H) * P(H) / P(E)

Notice that this formula for computing a conditional probability is similar to the original formula with the exception that the joint probability P(H & E) that used to appear in the numerator has been replaced with an equivalent expression P(E|H) * P(H).
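
A quick numerical check can make the equivalence tangible. The sketch below uses counts from the fabricated 10,000-startup table that appears elsewhere in this series (1200 successful startups, 650 of them with two positive tests, plus 100 unsuccessful startups with two positive tests); the variable names are hypothetical.

```python
import math

total = 10_000
n_success = 1_200            # startups with H = S
n_success_and_pp = 650       # startups with H = S and E = ++
n_pp = 650 + 100             # all startups with E = ++

p_h = n_success / total
p_e = n_pp / total
p_h_and_e = n_success_and_pp / total
p_e_given_h = n_success_and_pp / n_success

# Definition of conditional probability: P(H|E) = P(H & E) / P(E).
direct = p_h_and_e / p_e

# Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E).
via_bayes = p_e_given_h * p_h / p_e

print(f"direct: {direct:.4f}  via Bayes: {via_bayes:.4f}")
assert math.isclose(direct, via_bayes)
```

Both routes give the same value because P(E|H) * P(H) is just another way of writing the joint probability P(H & E).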

We can simplify this equation further by pointing out that P(E), the probability of the evidence, acts as a normalizing constant: it is the same for every hypothesis and simply ensures that when we compute all our conditional probabilities P(H|E), they collectively sum to 1. Conceptually, we can eliminate it from our equation by making the weaker claim that P(H|E) is proportional to P(E|H) * P(H):

P(H|E) ~ P(E|H) * P(H)

What this simplified equation is saying is that the probability of an hypothesis (e.g., startup success) given the evidence (e.g., tests diagnostic of startup success) is proportional to the likelihood of the evidence P(E|H) times the prior probability of the hypothesis P(H). When making decisions, we don't necessarily need to know the probability of success exactly, just that the success probability is quite a bit bigger than the failure probability. This is why this simpler version of Bayes theorem is still useful even though it only expresses a proportional relationship and not a full identity.
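The proportional form can be sketched in a few lines of Python. The 0.12 prior comes from the text; the two likelihood values are hypothetical numbers chosen only to illustrate the mechanics, and the function name is my own.

```python
def normalize(scores):
    """Rescale unnormalized scores so that they sum to 1."""
    total = sum(scores.values())
    return {h: s / total for h, s in scores.items()}

prior = {"success": 0.12, "failure": 0.88}
# Hypothetical likelihoods P(E|H) of seeing the evidence under each hypothesis.
likelihood = {"success": 0.80, "failure": 0.10}

# Unnormalized scores: P(E|H) * P(H) for each hypothesis.
scores = {h: likelihood[h] * prior[h] for h in prior}
posterior = normalize(scores)

print(scores)
print(posterior)
```

Comparing the raw scores already tells you which hypothesis is more probable; dividing by their sum (which plays the role of P(E)) only rescales them into probabilities.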

In order to update our prior probability of first-time startup success from .12 (or 12%) given the evidence of some diagnostic tests, we need to multiply our prior assessment of first-time startup success P(H) by a factor called the likelihood P(E|H). The likelihood term is obviously doing a lot of the heavy lifting in terms of updating our prior beliefs.

In my next blog, I will discuss how likelihoods can be computed from a data table using the conditional probability equation P(E|H) = P(E & H)/P(H) and other techniques. Some statisticians argue that likelihoods are good enough for decision making, that you don't have to incorporate prior probabilities P(H) into calculations to figure out the most probable outcome. These statisticians are afraid of introducing a subjective element (e.g., your prior assessment P(H) of the relative probability of different outcomes) into decision making. Bayesians argue that this subjective element makes the probability calculations more intelligent and contextually sensitive. An angel investor with lots of business experience should have at their disposal a mathematical tool that allows them to use their experience in making startup investment decisions. Bayesian inference techniques offer the promise of being that tool.

One of the pieces of data you should have in your mind as a Bayesian Angel Investor is the prior probability that a startup will be successful. According to Funders and Founders the success rate for first-time startups is 12%, going up to 20% if the founder failed in their first effort, and up to 30% if they are a veteran (3 or more kicks at the can).

One way to look at this data is that the success percentages for a startup go up from 12% if you conditionalize your estimate on knowledge about how many times the startup has attempted to start a company. So this could be viewed as one evidence factor to consider when evaluating whether a company will be successful or not (e.g., number of startup attempts).

Another aspect of this data to note is that while 12% may seem like a small percentage, it is not so small (say 1%) that new knowledge is going to keep the conditional probabilities so low that you cannot make a confident decision. Early screening for breast cancer (e.g., at age 40) is difficult, in part, because the base rate of breast cancer at age 40 is so low (1%) that even if you do have a fairly good test (80% true positive rate, with a false positive rate of roughly 10%), and that test is positive, it will only increase the probability of a cancer diagnosis to approx. 8%. With a 12% success rate for first-time startups, we can potentially increase our estimate of a company's success rate by quite a bit by taking into account other information about the company. Two good diagnostic tests applied in sequence could get us up to an 80% probability estimate of startup success and increase the likelihood that you will make a good angel investment decision.
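To see how much the base rate matters, here is a small Python sketch of the arithmetic. The 1% and 12% base rates and the 80% true positive rate come from the text. The 9.6% false positive rate is an assumption (it is roughly what the 8% figure implies), applying the same test figures to startups is purely hypothetical, and the second update assumes the two tests are independent given the outcome.

```python
def bayes_update(prior, true_positive_rate, false_positive_rate):
    """Posterior probability of the hypothesis after one positive test result."""
    true_pos = true_positive_rate * prior
    false_pos = false_positive_rate * (1.0 - prior)
    return true_pos / (true_pos + false_pos)

TPR = 0.80   # from the text
FPR = 0.096  # assumed; not stated in the text

cancer = bayes_update(0.01, TPR, FPR)
startup_after_one = bayes_update(0.12, TPR, FPR)
# Yesterday's posterior becomes today's prior for the second test.
startup_after_two = bayes_update(startup_after_one, TPR, FPR)

print(round(cancer, 3), round(startup_after_one, 3), round(startup_after_two, 3))
```

Under these assumptions the 1% base rate only climbs to about 8%, while the 12% base rate climbs to about 53% after one positive test and about 90% after two.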

Research shows that even doctors are not very good at taking base rates (i.e., priors) into account and put too much emphasis upon the test accuracy to arrive at conditional probability estimates for a diagnosis. Their estimates can be improved considerably if instead of being given information in a probability format (0.12 probability of first-time startup success), the information is presented in a frequency format (120 out of 1000 first-time startups are successful). Sticking with numbers as frequency counts allows us to mentally compute more accurate conditional probabilities.

In this blog post, I'll be going over the concept of Conditional Probability (i.e., P(H|E)). I'll be reusing some of my earlier writings on Bayesian inference using a medical example and substituting in an angel investing example. The concept of conditional probability is central to Bayesian inference. A Bayesian angel investor is always computing the probability of some hypothesis given some pattern of evidence P(H|E). There are many mathematical techniques you can use to compute a conditional probability P(H|E), but the simplest way involves set enumeration and it is what clergyman Thomas Bayes had in mind when he proposed his new method of inference. So hopefully you will learn one important method for computing a conditional probability from reading this blog post.

Imagine that H refers to "Company is Successful" and E refers to "Quality Business Plan". P(H | E) would then read as the "probability that a company is successful (H) given that they have a quality business plan (E)." If H tends to occur when E occurs, then knowing that E has occurred allows you to assign a higher probability to H's occurrence than in a situation in which you did not know that E occurred.

More generally, if H and E systematically co-vary in some way, then P(H | E) will not be equal to P(H). Conversely, if H and E are independent events, then P(H | E) would be expected to equal P(H).

The need to compute a conditional probability thus arises any time you think the occurrence of some event has a bearing on the probability of another event's occurring.

The most basic and intuitive method for computing P(H | E) is the set enumeration method. Using this method, P(H | E) can be computed by counting the number of times H and E occur together {H & E} and dividing by the number of times E occurs {E}:

P(H | E) = {H & E} / {E}

If you gave your ok to 12 business plans to date, and observed that 10 of those companies were successful, then P(H | E) would be estimated at 10/12 or 0.833. In other words, the probability of a company being successful given that they have a quality business plan can be estimated at 83 percent by using a method that involves enumerating the relative frequencies of H and E events from the data gathered to date.
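The counting involved can be sketched in Python as follows. This is an illustration only, not the PHP code from my earlier articles: the records are made up to match the counts above (12 approved plans, 10 successes), the four extra records with failing plans are hypothetical, and the function name is my own.

```python
# Each record is (has_quality_plan, is_successful).
records = (
    [(True, True)] * 10      # approved plan, company succeeded
    + [(True, False)] * 2    # approved plan, company failed
    + [(False, True)] * 1    # hypothetical: rejected plan, company succeeded
    + [(False, False)] * 3   # hypothetical: rejected plan, company failed
)

def conditional_probability(records):
    """Estimate P(H | E) = {H & E} / {E} by counting records."""
    e_count = sum(1 for e, h in records if e)
    h_and_e_count = sum(1 for e, h in records if e and h)
    if e_count == 0:
        raise ValueError("E never occurs, so P(H | E) is undefined")
    return h_and_e_count / e_count

print(round(conditional_probability(records), 3))
```

Records where E did not occur are simply ignored, which is what "given E" means in the set enumeration method.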

Computing a conditional probability becomes a form of inference when we take into account that the prior probability P(H) that a startup would be successful was probably lower than 83 percent. So conditionalizing our hypothesis (company will succeed) on other information (business plan quality) helped to increase our estimate of the probability that a startup would be successful. We can make decisions to proceed further based upon this improved knowledge.

You can compute a conditional probability using the set enumeration method with the PHP code below.

In my last blog introducing a classification framework for Bayesian Angel Investing, I discussed a php-based software class called ClassifierDiagnostics.php. I showed how you enter bivariate data points into it and the type of output it displays. I didn't go into much detail on what the output is telling us. Today I will go into some more detail on what the output is telling us and start to give some indication as to why it is important if you want to be a successful Bayesian Angel Investor.

One way to formulate the problem of Bayesian Angel Investing is as a classification problem where an Investor is trying to assign a probability to whether a startup belongs to the class of "Successful" (S) companies or "Unsuccessful" (U) companies. One way to do this would be to just rely upon the prior odds of a startup being successful or not. You would not conditionalize the probability assignments (e.g., P(S) = θ1, P(U) = θ2) on information about the startup (e.g., P(S|I) = θ3), just the fact that they are a startup and the historical probabilities that a startup will be unsuccessful or successful. This is more difficult than it sounds because the success of a startup is already conditionalized insofar as we have to delimit the scope of the concept "startup" in some way in order to measure the probabilities of success or not. So let us say we will look at startups confined to some region near the Investor's place of residence - the state or province level statistics on startup success.

Can you use the startup success statistics, your "priors", to make successful investment decisions? My guess is that the rate of success for startups in your region is below 50%, so if you rely on that figure alone it is unlikely you will ever invest. Under a "priors only" strategy (i.e., P(S) = θ1 AND P(U) = θ2) you would have to invest randomly, and that would produce losses.

To get more leverage on making good angel investments, you will need to incorporate information about the startup in your classification decision regarding the likely success or not of the startup. You will want to identify types of information that have good diagnostic value in classifying startups into bins labelled Successful (S) and Unsuccessful (U). In the example I provided yesterday I suggested that you could use your evaluation of their business plan as a good indicator of whether the startup might succeed or not. If the business plan addresses enough of your checklist of concerns, then you will assign the business plan a "Pass" value (1), otherwise you assign the business plan a "Fail" value (0). The question then becomes whether our pass/fail assignments can be used to successfully distinguish between successful and unsuccessful startups. In other words, how diagnostic is a good business plan of being a successful startup?

In my last blog, I entered observations of 4 startups into my classifier diagnostics program. Each observation consisted of two values, a value specifying whether the business plan passed (1) or failed (0), and a value specifying whether the startup eventually succeeded (1) or failed (0) in their enterprise. When I entered the data into my classifier diagnostics program it generated the output below. I have removed some of the statistics being reported because I want to focus on the foundational concepts in diagnostic problem solving.

                        Successful Company
                        Yes          No
Business Plan   Pass    2 (TP)       0 (FP)
                Fail    1 (FN)       1 (TN)

                        Successful Company
                        Yes          No
Business Plan   Pass    0.67 (TP)    0.00 (FP)
                Fail    0.33 (FN)    1.00 (TN)

Test Sensitivity (TP)     0.67
False Alarm Rate (FP)     0.00
Miss Rate (FN)            0.33
Test Specificity (TN)     1.00

One critical observation to make about this data is that business plan quality is not a perfect test for classifying startups as successful or unsuccessful. The one error in this sample is the case where a startup had a failing business plan but ended up being successful (an example of a "miss" or false negative). The test "missed" the correct classification. Because we have such a small sample size, 4 startups, this one error throws our percentages around quite a bit.

What we are looking for in a good test of startup success is one that has high Test Sensitivity and high Test Specificity. Test Sensitivity measures the proportion of actual positives which are correctly identified as such. Test Specificity measures the proportion of actual negatives which are correctly identified as such. In real life, test sensitivity and specificity are seldom 1, so we have to figure out how we will cope with false alarms (negative instances identified as positive instances) and misses (positive instances identified as negative instances). Risk-averse angel investors will likely be more worried about false alarms than misses because in the case of a false alarm you could invest in an unsuccessful company and lose money whereas in the case of a miss you will not have invested in a successful company but will have at least retained your money.
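The four rates can be recomputed from the raw observations with a short Python sketch. This is an illustration and not the ClassifierDiagnostics.php class itself: the four observations are reconstructed from the frequency table (2 TP, 0 FP, 1 FN, 1 TN), and the function names are hypothetical.

```python
# (business_plan_pass, company_success) for each of the four startups,
# reconstructed from the joint frequency table.
observations = [(1, 1), (1, 1), (0, 1), (0, 0)]

def confusion_counts(observations):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = sum(1 for test, cls in observations if test == 1 and cls == 1)
    fp = sum(1 for test, cls in observations if test == 1 and cls == 0)
    fn = sum(1 for test, cls in observations if test == 0 and cls == 1)
    tn = sum(1 for test, cls in observations if test == 0 and cls == 0)
    return tp, fp, fn, tn

def diagnostic_rates(tp, fp, fn, tn):
    """Rates conditioned on the actual class (needs at least one of each class)."""
    return {
        "sensitivity": tp / (tp + fn),       # actual positives that passed
        "miss_rate": fn / (tp + fn),         # actual positives that failed
        "false_alarm_rate": fp / (fp + tn),  # actual negatives that passed
        "specificity": tn / (fp + tn),       # actual negatives that failed
    }

counts = confusion_counts(observations)
rates = diagnostic_rates(*counts)
for name, value in rates.items():
    print(f"{name}: {value:.2f}")
```

Note that sensitivity and miss rate sum to 1, as do false alarm rate and specificity, because each pair is conditioned on the same actual class.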

One way to proceed towards becoming a Bayesian Angel Investor is to do some diagnostic work and figure out what types of tests are the best to use in order to classify startups into those who will succeed or not. When evaluating tests to use, you should examine the diagnostic accuracy of your tests using some of the metrics provided above (Test Sensitivity, False Alarm Rate, Miss Rate, Test Specificity). Bayesian Angel Investing likely taps into the same problem solving skills as a doctor who must diagnose whether a patient has cancer or not. They will order up a series of tests (often binary scored) and make a diagnosis, or, if matters are still unclear, order up more tests (e.g., scans, probes, incisions, etc...) so that they can achieve more confidence in their decision making.

In 2004 I wrote 3 articles for IBM developerWorks on Bayesian inference and developed php-based code to explore the topic with. I'd like to follow up on some of that work by exploring how Bayesian inference might be applied to Angel Investing.

It is hard to pick a starting point for this investigation. I thought the best way to begin would be to give a quick demo of how to use a ClassifierDiagnostics.php class I developed to analyze the relationship between two binary-valued variables (a "test" variable and a "classification" variable). Doing so will introduce you to many concepts, calculations, and stats you should be familiar with if you want to apply Bayesian inference to Angel Investing.

The two variables we will be analyzing in the demo code below are a "Business Plan Quality" test variable and a "Successful Company" classification variable. The data we will be inputting to our software for analysis will consist of a binary rating of Business Plan Quality (0=Fail, 1=Pass) and a binary rating for the Successful Company variable (0=Not Successful, 1=Successful). Each of the four $data records below corresponds to an observation conducted on one startup company: in this case, the observation of Business Plan Quality for a startup company and the eventual success or failure of that startup company. One question to investigate is whether the Business Plan Quality measurement should be used as a "test" for diagnosing whether a startup company will be successful or not.

Without further ado, here is the source code for the business_plan_and_success.php demo script which invokes input, analysis, and output functions supplied by the ClassifierDiagnostics.php class.

Below is the output generated by running the demo script. The first set of tables below are the joint frequency table and the corresponding probability table, in which each count is divided by its column total. Underneath these tables are displayed various diagnostic stats that can be used to assess the quality of your "test" variable (i.e., Business Plan Quality) in classifying a startup as being successful or not.

                        Successful Company
                        Yes          No
Business Plan   Pass    2 (TP)       0 (FP)
                Fail    1 (FN)       1 (TN)

                        Successful Company
                        Yes          No
Business Plan   Pass    0.67 (TP)    0.00 (FP)
                Fail    0.33 (FN)    1.00 (TN)

Test Sensitivity (TP)      0.67
False Alarm Rate (FP)      0.00
Miss Rate (FN)             0.33
Test Specificity (TN)      1.00
Base Rate                  0.75
P(+Test)                   0.50
P(-Test)                   0.50
P(+Class | +Test)          1.00
P(-Class | +Test)          0.00
P(+Class | -Test)          0.50
P(-Class | -Test)          0.50
Likelihood Ratio(+Test)    0.00
Likelihood Ratio(-Test)    0.33
Accuracy                   0.75
Gain                       1.33
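Most of the additional statistics in this output can be recomputed from the four counts with the Python sketch below. It is my own reconstruction, not the ClassifierDiagnostics.php source. Two points are assumptions: Gain is taken to be P(+Class | +Test) divided by the Base Rate, which matches the reported 1.33, and the likelihood ratios use the standard definitions. Under the standard definition the positive likelihood ratio is undefined when the false alarm rate is 0, so the sketch returns None there; how the class arrives at the reported 0.00 is not shown in the text.

```python
def summary_stats(tp, fp, fn, tn):
    """Summary statistics for a 2x2 test-by-class frequency table."""
    n = tp + fp + fn + tn
    base_rate = (tp + fn) / n
    sensitivity = tp / (tp + fn)
    false_alarm_rate = fp / (fp + tn)
    specificity = tn / (fp + tn)
    p_pos_class_given_pos_test = tp / (tp + fp)
    return {
        "base_rate": base_rate,
        "p_pos_test": (tp + fp) / n,
        "p_neg_test": (fn + tn) / n,
        "p_pos_class_given_pos_test": p_pos_class_given_pos_test,
        "p_neg_class_given_pos_test": fp / (tp + fp),
        "p_pos_class_given_neg_test": fn / (fn + tn),
        "p_neg_class_given_neg_test": tn / (fn + tn),
        # Standard definition: sensitivity / false alarm rate (undefined at 0).
        "lr_pos_test": (sensitivity / false_alarm_rate
                        if false_alarm_rate > 0 else None),
        # Standard definition: miss rate / specificity.
        "lr_neg_test": ((1 - sensitivity) / specificity
                        if specificity > 0 else None),
        "accuracy": (tp + tn) / n,
        # Assumption: gain = P(+Class | +Test) / base rate.
        "gain": p_pos_class_given_pos_test / base_rate,
    }

stats = summary_stats(tp=2, fp=0, fn=1, tn=1)
for name, value in stats.items():
    print(name, "undefined" if value is None else f"{value:.2f}")
```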

I'll return to discussing some of the stats being reported here in a later blog. For now, I'd like to complete the technical part of the demo by showing you the source code for the ClassifierDiagnostics.php object. If you put the ClassifierDiagnostics.php object in the same php-enabled folder as the business_plan_and_success.php demo script, then point your browser at the demo script, you will see the output above.

/*
 * If a two column data matrix is supplied to the class, it will
 * proceed to compute various accuracy metrics from this data.
 * Otherwise, use the loadJointFrequency method to bypass having
 * to feed in raw data.
 */
function ClassifierDiagnostics($data="empty") {
  if ($data != "empty") {

In Bayesian Angel Investing, you calculate the prior and posterior probability of an investment outcome to arrive at good decisions regarding those investments.

Let us see how it might work in the context of making a decision to invest in a startup company.

When an investor encounters an opportunity to invest in a startup company their goal is likely not to make an investment decision right away, but rather a decision on whether it is worth allocating time to pursue the opportunity further.

So, if a proposal meets the investor's checklist of positive attributes:

+ good management
+ good idea
+ good business plan
+ good deal

This might get the Bayesian Investor sufficiently motivated to start calculating the prior probability that the startup company might be worth investing in.

So if you assign a prior probability of 60% that the company might be worth investing in, you will need more information to move the probability upwards in order to finalize any deal.

You will want to meet via email, phone, and possibly in person to further discuss the proposal.

A Bayesian Investor can move towards a final decision by setting a decision making threshold of, say, 80% on the prior probability estimate (e.g., that the company will be successful, S, or not, ~S). If the prior probability estimate of the startup being successful reaches or exceeds 80%, then invest in the company. If further information causes the prior probability to go below 50%, then don't invest. Prior estimates beget posterior estimates which become the priors in the next round of due diligence.
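This threshold rule can be sketched as a small loop in Python. The 60% starting estimate and the 80% and 50% thresholds come from the text. Everything else is an assumption made for illustration: the update uses the odds form of Bayes theorem (prior odds times a likelihood ratio), which has not been introduced here, the likelihood ratios are invented numbers, and the function names are hypothetical.

```python
def update(prior, likelihood_ratio):
    """Posterior probability from prior odds times a likelihood ratio.

    Requires 0 <= prior < 1.
    """
    odds = prior / (1.0 - prior) * likelihood_ratio
    return odds / (1.0 + odds)

def due_diligence(prior, likelihood_ratios, invest_at=0.80, drop_below=0.50):
    """Apply each piece of evidence in turn until a threshold is crossed."""
    p = prior
    for lr in likelihood_ratios:
        p = update(p, lr)  # the posterior becomes the prior for the next round
        if p >= invest_at:
            return "invest", p
        if p < drop_below:
            return "pass", p
    return "keep investigating", p

# Hypothetical likelihood ratios for successive rounds of due diligence.
print(due_diligence(0.60, [1.5, 2.0]))  # two encouraging findings
print(due_diligence(0.60, [0.5]))       # one discouraging finding
print(due_diligence(0.60, [1.2]))       # mildly encouraging, not yet decisive
```

A likelihood ratio above 1 moves the estimate up and one below 1 moves it down, so the loop stops as soon as the evidence gathered so far is enough to cross either threshold.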

The way a Bayesian Investor moves towards making an investment decision is by gathering more information about the company. The information that is gathered should be diagnostic of whether the company is likely to succeed. This is similar to the way a medical doctor orders tests to either confirm or disconfirm a hypothesis among the diagnostic possibilities (e.g., has cancer, does not have cancer).

We will try to formalize Bayesian investing more in a later blog post using this formula, p(H|E) = p(H∩E) / p(E), as our starting point (where H stands for Hypothesis and E for Evidence).