In a fully specified decision tree diagram I would also have labels along each horizontal branch, probabilities along each event branch, and consequence values at the terminal end of each event branch (see this previous blog for an example). I won't be going into such details in this blog because I want to focus on the first step in diagramming a decision tree; namely, drawing the skeletal structure of a decision tree.

If you want to diagram decisions it is useful to have a tool that makes it easy to do so. A piece of paper is a good option, but I wanted to use a piece of software to create decision tree diagrams exactly as they appear in the textbook mentioned above. The draw.io tool fits these requirements. In a previous blog I demonstrated a more complex approach to creating decision trees. In this blog I wanted to find a more accessible approach that anyone could use without installing a bunch of software.

Decision trees have lots of symmetrical branches and drawing them was the main challenge I had in making decision tree skeletons. Duplicating and dragging horizontal or vertical lines were the main actions I ended up using to create the symmetrical decision tree diagram above. The left column in the figure above shows the "General" and the "Misc" symbol libraries that I used to create the square, circle, and line shapes. By default they are available to use when you access the draw.io online application and you just need to click the library name to open it and access the diagramming shapes you want to use.

It would be possible to print off a decision tree template like this so you can write labels and numbers onto it. It would be a ready-made template for a common form of decision problem, namely, a decision with 2 possible choices and 2 possible event outcomes that will determine the expected consequences for each choice.

In this blog my goal was to suggest some software you can use to create decision tree diagrams (draw.io) and to show how you can go about creating nice symmetrical tree diagrams with it. There are probably more efficient techniques than what I'm suggesting, as I just started playing around with draw.io for this purpose and the duplicate-and-drag approach was the first approach that worked OK. We can use decision tree diagrams to explore a wide range of issues in decision making, and in future blogs we'll explore some of these issues.

Here are some useful tips that will get you up to speed quickly using draw.io.

In my previous blog, I showed how to construct a nice decision tree for a decision
about how much nitrogen to apply to a crop. In this blog, I want to advance our thinking about decision trees in two ways:

Show how expected returns can be calculated using PHP.

Discuss the issue of how detailed we should get when constructing a decision tree.

Computing Expected Return

In my blog titled Computing Expected Values I referred you to a video tutorial
on how to calculate expected values. In this blog, I will implement that calculation in a PHP script. Implementing the calculation
programmatically allows us to see what types of data structures need to be defined and how they are looped over in order to compute
expected returns. We need a data structure to represent our actions (i.e., a $nitrogen array), our
events (i.e., a $weather array), our outcomes (i.e., a $payoffs matrix), and to store the expected returns
that are computed for each action option (i.e., an $EV array). With these basic elements in place we can
compute our expected values in a straightforward manner: for each action, multiply each payoff by the probability of its event and sum the products.
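A minimal sketch of that calculation, written in Python rather than PHP, might look like the following. It mirrors the $nitrogen, $weather, $payoffs, and $EV structures named above, and every rate, probability, and payoff in it is an invented placeholder rather than real data.

```python
# Hypothetical actions: nitrogen application rates in lb per acre.
nitrogen = [90, 110]

# Hypothetical events: weather states with assumed probabilities (they sum to 1).
weather = {"Poor": 0.25, "Average": 0.5, "Good": 0.25}

# Hypothetical payoffs in $ per acre: one row per action, one entry per event.
payoffs = {
    90: {"Poor": 100, "Average": 200, "Good": 300},
    110: {"Poor": 80, "Average": 220, "Good": 360},
}


def expected_values(actions, events, payoff_matrix):
    """Return {action: expected value}, looping over actions and then events."""
    ev = {}
    for action in actions:
        total = 0.0
        for state, probability in events.items():
            # Weight each payoff by the probability of the event that produces it.
            total += probability * payoff_matrix[action][state]
        ev[action] = total
    return ev


EV = expected_values(nitrogen, weather, payoffs)
best_action = max(EV, key=EV.get)  # action with the highest expected value

for action in nitrogen:
    print(f"{action} lb/acre: EV = {EV[action]:.2f}")
print(f"Highest expected value: {best_action} lb/acre")
```

The outer loop runs over actions and the inner loop over events, which is the same nesting an array-based PHP version would need.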

Levels of Detail

The decision tree we have constructed to represent a nitrogen application decision is vague in many of its details and, as such,
would be difficult to use for the purposes of making an actual decision about whether to apply nitrogen or not.

Our biggest omission is to just talk about an "expected return" without talking specifically about whether this is expected
revenue, expected profit, or expected utility. If our payoffs are expected revenue amounts then our decision tree is not going to
be that useful because it hides the costs involved. For this reason, the expected profit would be a better value to compute as
our "payoffs" rather than expected revenues. Theoretically, an even better value to compute would be the expected utility
associated with each action option but that is a tricky value to compute because it depends on subjective factors and more complex
formulas. For this reason, we can be satisfied if we can at least compute expected profits for each decision option.

Another omission in our decision tree is any discussion of the costs associated with each proposed action. In order to compute
such costs we must get detailed about the when, where, and how of applying nitrogen. We also need to estimate the price
of nitrogen at the time of application. If we have already purchased our nitrogen then this would simplify our calculations.
Other costs include the cost of fuel to apply our nitrogen. We also need to be specific about what crop we are applying our
nitrogen to. In order to compute expected profits we would need to compute some other costs associated with planting,
cultivating, and harvesting the crop per acre so that these can be subtracted from the overall revenue generated to compute
our expected profits.
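
To make the cost discussion concrete, here is a rough Python sketch of how revenue payoffs could be turned into profit payoffs before taking expectations. The crop price, yields, nitrogen price, fuel cost, other per-acre costs, and weather probabilities are all made-up placeholders; a real analysis would replace each of them with researched figures.

```python
# All figures are hypothetical placeholders, expressed per acre.
CROP_PRICE = 5.0       # $ per bushel (assumed)
NITROGEN_PRICE = 0.5   # $ per lb of nitrogen (assumed)
FUEL_COST = 4.0        # $ of fuel to apply the nitrogen (assumed)
OTHER_COSTS = 150.0    # planting, cultivating, and harvesting (assumed)

# Assumed weather probabilities.
weather = {"Poor": 0.25, "Average": 0.5, "Good": 0.25}

# Assumed yields in bushels per acre for each nitrogen rate and weather state.
yields = {
    90: {"Poor": 40, "Average": 50, "Good": 60},
    110: {"Poor": 40, "Average": 54, "Good": 68},
}


def profit(rate_lb, bushels):
    """Revenue minus all per-acre costs for one action/event combination."""
    revenue = bushels * CROP_PRICE
    costs = rate_lb * NITROGEN_PRICE + FUEL_COST + OTHER_COSTS
    return revenue - costs


def expected_profit(rate_lb):
    """Probability-weighted profit across the weather states."""
    return sum(p * profit(rate_lb, yields[rate_lb][state])
               for state, p in weather.items())


for rate in yields:
    print(f"{rate} lb/acre: expected profit = {expected_profit(rate):.2f}")
```

Keeping the costs in named constants makes it obvious which estimates still need to be pinned down before the tree can guide a real decision.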

Our nitrogen application decision is impacted by weather which we have characterized as poor, average, or good. This is also
not very precise and would need to be specified in more detail. Weather could specifically mean rainfall amounts in the spring
phase of the year.

Once we get very specific about costs and what our variables specifically refer to, then our decision tree would provide better
guidance on how to act. The visual depiction of a decision as a decision tree helps to organize our research efforts but it
omits much of the research work that will have gone into making the decision tree useful and realistic.

I experimented with two Graphviz features that I thought might improve the appearance of my decision trees:

I wanted the connecting lines to be rectilinear rather than curvilinear. Unfortunately, I was not able to achieve this effect with labelled edges; rectilinear connections only worked when labels were applied to nodes, perhaps because line placement is harder to calculate when the edges carry labels. Rectilinear connections might have looked better, but I'll have to live with curvilinear connections because I prefer labelled edges for the decision trees I'm exploring right now.

I wanted to separately highlight actions, events, and outcome sections of the decision tree. I found a way to do this for the outcome nodes but have not found a way to apply further labelling or highlighting to labelled edges.
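
For readers who want to experiment with rectilinear edges, the sketch below is one thing to try. It is a short Python script that prints a dot file using the graph attribute splines=ortho. Graphviz warns that orthogonal edges do not handle ordinary edge labels and suggests the xlabel attribute instead, so the sketch uses xlabel. The node names are borrowed from the rainfall examples used elsewhere in these blogs, and whether the resulting label placement looks acceptable is something you would have to judge for yourself.

```python
def ortho_tree_dot():
    """Return dot source that requests right-angled edges with xlabel text."""
    lines = [
        "digraph OrthoTree {",
        "  rankdir=LR;",
        "  splines=ortho;  // ask for rectilinear (right-angled) edges",
        "  node [shape=box];",
        # xlabel is placed outside the edge routing, unlike label.
        '  Action -> HighRainFall [xlabel="0.6"];',
        '  Action -> LowRainFall [xlabel="0.4"];',
        "}",
    ]
    return "\n".join(lines)


ortho_dot = ortho_tree_dot()
print(ortho_dot)
```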

Today's blog shows how to use the "subgraph" feature of the dot language to highlight nodes that are related in some way. I used the subgraph feature to better highlight what the payoffs were for each separate action. The possible payoffs associated with each action option are distinguished by having a different color and bounding box for each set of payoffs.

Notice that the payoffs associated with each action are separately highlighted and that I also add up each set of payoffs and report the sum as the expected value (EV) for that action. I calculated these expected values manually because the dot language is not a general purpose programming language. In my next blog I'll be using PHP to compute expected values and supply them to the dot file that is generated.

Here is the dot file that was used to generate a decision tree that uses subgraphs to highlight sections of it.
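
As a rough sketch of the subgraph technique (not the exact recipe behind the figure), the Python script below prints a dot file in which each action's payoffs sit inside their own cluster. In dot, a subgraph whose name begins with "cluster" is drawn with a bounding box. The payoff numbers and colors are invented, and the payoffs are assumed to be probability-weighted already, so their sum is reported as the EV.

```python
# Hypothetical probability-weighted payoffs for each action.
payoffs_by_action = {
    "90 lb": [25, 100, 75],
    "110 lb": [20, 110, 90],
}
colors = ["blue", "darkgreen"]  # one assumed box color per action


def cluster_dot(payoff_sets, box_colors):
    """Return dot source with one boxed, colored cluster per action."""
    lines = ["digraph Payoffs {", "  rankdir=LR;"]
    for i, (action, payoffs) in enumerate(payoff_sets.items()):
        ev = sum(payoffs)  # payoffs are already weighted, so the sum is the EV
        # The "cluster" name prefix is what makes dot draw a bounding box.
        lines.append(f"  subgraph cluster_{i} {{")
        lines.append(f'    label="EV = {ev}";')
        lines.append(f"    color={box_colors[i]};")
        for j, value in enumerate(payoffs):
            lines.append(f'    p{i}_{j} [label="{value}"];')
        lines.append("  }")
        for j in range(len(payoffs)):
            lines.append(f'  "{action}" -> p{i}_{j};')
    lines.append("}")
    return "\n".join(lines)


cluster_source = cluster_dot(payoffs_by_action, colors)
print(cluster_source)
```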

To date I have not been fully satisfied with how my decision trees have appeared. They did not appear to use space efficiently and they were not as easy to read as I would have liked. Today we will make some improvements to a Graphviz recipe for constructing decision trees. These improvements will make for a more space efficient decision tree that is also easier to read.

The main improvements I have made are:

All of the labelling for actions and events appears on the edges instead of the nodes. In previous examples, most labelling was done at the nodes.

There is no labelling at all between actions and events, just a small connector shape.

The combination of these two improvements means that 1) the graph is easier to read because all the edges are labelled and the flow runs left to right, and 2) the layout is more space efficient because the nodes connecting actions to events take up a lot of space when they include labelling. Now we have only small connector shapes with no labelling.

I'll illustrate the improved decision tree in the context of a decision about how much nitrogen to apply to a crop per acre that involves calculating the payoffs you might expect if you get Good, Average, or Poor weather during the growing season. This is what such a decision tree looks like with the improvements mentioned above:

There are a couple of other aspects of this decision tree that are also noteworthy. First, the labels for each possible action (e.g., Nitrogen application amounts) include the cost per acre of applying that amount of Nitrogen. One aspect of constructing a decision tree is computing the cost for each course of action. Second, the terminal nodes on our decision tree are often payoffs that involve multiplying an estimate of revenue by the probability of some event that significantly affects the payoff (e.g., the quality of the weather during the growing season).

I created the visualization for the nitrogen application decision using Graphviz. To do so I piped the recipe below into the Graphviz program "dot". The recipe illustrates how to add comments to your dot file to make it easy to follow your recipe for rendering a graph shape. The recipe is also organized into logical sections to make it easier to read.
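
A sketch along these lines, written as a Python script that prints the dot text, is shown below. It is an illustration of the two improvements rather than the exact recipe behind the figure: all action and event labels are on the edges, and the nodes between actions and events are small unlabelled points. The costs, probabilities, and revenue figures are invented placeholders.

```python
# Hypothetical inputs: (action label, assumed cost in $ per acre).
actions = [("90 lb N", 45), ("110 lb N", 55)]
# Hypothetical weather events with assumed probabilities.
events = [("Good", 0.25), ("Average", 0.5), ("Poor", 0.25)]
# Hypothetical revenue in $ per acre for each action and weather state.
revenue = {
    "90 lb N": {"Good": 300, "Average": 200, "Poor": 100},
    "110 lb N": {"Good": 360, "Average": 220, "Poor": 80},
}


def build_dot(action_list, event_list, revenue_table):
    """Return commented dot source with edge labels and point connectors."""
    nodes = ['  decision [shape=box, label=""];']
    edges = []
    for a, (name, cost) in enumerate(action_list):
        nodes.append(f"  c{a} [shape=point];")  # small unlabelled connector
        edges.append(f'  decision -> c{a} [label="{name} (${cost}/acre)"];')
        for e, (state, prob) in enumerate(event_list):
            payoff = prob * revenue_table[name][state]  # revenue x probability
            nodes.append(f'  o{a}_{e} [shape=plaintext, label="{payoff:g}"];')
            edges.append(f'  c{a} -> o{a}_{e} [label="{state} ({prob})"];')
    lines = [
        "digraph NitrogenDecision {",
        "  // Layout: read the tree from left to right",
        "  rankdir=LR;",
        "  // Nodes: decision square, point connectors, plain-text payoffs",
        *nodes,
        "  // Edges: actions with costs, then events with probabilities",
        *edges,
        "}",
    ]
    return "\n".join(lines)


dot_text = build_dot(actions, events, revenue)
print(dot_text)
```

The printed text can be saved to a file or piped straight into dot.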

Figuring out how to render decision trees with labelled intermediate edges instead of labelled intermediate nodes was a big step in creating a decision tree format that I find more workable. I'm not done yet, however, as I want to explore some other features of Graphviz to add some more tweaks to my decision tree.

A decision tree can become more complex in two basic ways. We can add more intermediate acts or we can add more intermediate events. In simple decision trees we have a binary set of Actions (apply 90 lb nitrogen, apply 110 lb nitrogen) leading to a binary set of Events (e.g., probability of low rainfall, probability of high rainfall), and each combination of Actions and Events leads to an Outcome. See my blog, Representing Decisions with Graphviz, for more details.

So one way we can add complexity to a decision tree, beyond just adding more than 2 branches for each action and event node, is to add intermediate actions and/or events to our decision tree. For example, our decision problem might involve the act of applying either 90 lb or 110 lb of Nitrogen per acre to our wheat crop. We might also have to choose between applying the Nitrogen at time X or at time Y. The combination of these actions can then lead into a season with either a low summer rainfall event or a high summer rainfall event. We can represent a fragment of this decision tree generically with the following diagram:

The diagram was constructed using Graphviz and the dot file I used to construct it looks like this:

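digraph MultiStep {
    // Read the tree from left to right
    rankdir=LR;
    // First action step
    Decision -> Action_Step_1A; Decision -> Action_Step_1B;
    // Second action step (only the 1A branch is expanded in this fragment)
    Action_Step_1A -> Action_Step_2A; Action_Step_1A -> Action_Step_2B;
    // Events that follow each second-step action
    Action_Step_2A -> Event_1; Action_Step_2A -> Event_2;
    Action_Step_2B -> Event_3; Action_Step_2B -> Event_4;
}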

This is just a fragment of a multistep decision problem. As you can see, the number of terminal branches in this decision problem explodes as we add more intermediate action or event nodes. This does not prevent us from using decision trees to help us make better decisions, but it does warn us to be very sure that intermediate actions or events are necessary before we introduce them, as they add considerable complexity to the decision tree. Decision trees are not meant to capture the minute details of a decision problem, just the high level actions and events that impact upon the decision. The choice of action and event nodes, just like the assignment of probabilities to event nodes, involves a lot of subjective judgement. The process of formalizing it all into a decision tree, however, brings the whole exercise out of subjective reality into consensus reality, where others can comment on, disagree with, or agree with the manner in which you have framed the decision problem.
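
To put a number on how quickly the branches multiply, here is a small Python sketch. It assumes a fully symmetrical tree, meaning every node at a given stage has the same number of branches, so the count of terminal branches is simply the product of the branch counts at each stage.

```python
from math import prod


def terminal_branches(options_per_stage):
    """Count terminal branches in a symmetrical tree.

    options_per_stage lists how many branches leave each node at every
    stage, e.g. [2, 2] for two actions followed by two events.
    """
    return prod(options_per_stage)


simple_tree = terminal_branches([2, 2])        # amount, then rainfall
with_timing = terminal_branches([2, 2, 2])     # amount, timing, then rainfall
three_events = terminal_branches([2, 2, 3])    # same, with three rainfall levels

print(simple_tree, with_timing, three_events)
```

Adding a single two-way intermediate step doubles the number of outcomes that need a payoff estimate, which is why each extra stage should earn its place in the tree.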

So you have a simple decision tree leading from actions, to events, to outcomes. You have labelled the probability of your events, the costs and payoffs associated with actions and outcomes, and you are wondering how you can use all this information to pick a course of action. One answer is that you can compute the expected value associated with each action and choose the course of action that yields the highest expected value (e.g., highest average profit).

Fortunately, I do not have to explain what an expected value is or how to compute it because there is an excellent tutorial available that explains
it all. So, sit back, and learn from MBA Bullshit how to use a decision tree to compute expected values and how you can use expected values to help you decide on a course of action.

I introduced the Graphviz program in my last blog. In today's blog I want to go a little deeper into the DOT language to show how you can achieve three useful effects using the DOT language. The three effects are:

Change the overall layout of the graph. Instead of starting our decision tree from the top, I would prefer to start it from the left side of the canvas and expand it towards the right side of the canvas (i.e., left-to-right reading order). I can do this by adding the command rankdir=LR; to my dot file.

Show probability values on the links going into event nodes, for example the probability of high rainfall this season. We do this by adding square brackets after a link command and specifying a value for the "label" attribute (e.g., Action -> HighRainFall[label="0.6"];).

Highlight a path through the decision tree. One way to do this in Graphviz is to thicken the line and add red coloration to each link in the path (e.g., Action -> LowRainFall[label="0.4",color=red,penwidth=3.0]; ).

If we put all these elements together in one dot program file, it would look like this:
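
A sketch of such a file, produced here by a short Python script that prints the dot text, combines the three effects using the node names and values from the examples above. The graph name Fragment and the choice of which path to highlight are assumptions made for illustration.

```python
def fragment_dot(highlight="LowRainFall"):
    """Return dot source with LR layout, probability labels, and one red path."""
    probabilities = {"HighRainFall": "0.6", "LowRainFall": "0.4"}
    lines = ["digraph Fragment {", "  rankdir=LR;"]  # effect 1: left-to-right
    for event, p in probabilities.items():
        attrs = [f'label="{p}"']  # effect 2: probability shown on the link
        if event == highlight:
            attrs += ["color=red", "penwidth=3.0"]  # effect 3: thick red link
        lines.append(f"  Action -> {event} [{', '.join(attrs)}];")
    lines.append("}")
    return "\n".join(lines)


fragment_source = fragment_dot()
print(fragment_source)
```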

If we load this dot file into the graphviz program "dot", it will generate this graph:

What we have here is a fragment of a graph. A fragment like this might appear in your decision tree leading from an action node to an event node. This is how we can get probabilities to appear on our graphical representations of a decision problem. Also, I like to orient the tree from left to right because a large branchy tree can then be printed off more easily, whereas top-to-bottom trees are hard to print off and involve a lot of horizontal scrolling to view. Finally, when you make a decision to pursue a particular course of action, you can highlight that course of action graphically with a thick red pen effect.

One aspect of "precisely specifying" a decision is representing the overall decision making problem in the form of a graph. We can use a pad of paper to draw lines representing possible actions, which lead to events, which lead to outcomes. Or, we can use a tool like Graphviz to construct much prettier graphs and make us feel more professional.

Graphviz was developed by AT&T Research and is considered a top-tier tool for creating and visualizing graph structures.

In today's blog, I want to give you a basic idea of how Graphviz works so you can judge for yourself whether you want to invest time into learning it.

To generate a graph using Graphviz you need to write some commands into a "dot" file that ends with the extension ".dot". The term "dot" is also used to denote one of the main Graphviz programs used to generate graphs from dot files. Also, the term "DOT" is used to refer to the command language you enter into your dot files.

Without further ado, here is a dot file called DecisionTree.dot that depicts a decision in terms of an action, event, outcome framework for managing risk.

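digraph DecisionTree {
    // The decision forks into two possible actions
    Decision -> Action1; Decision -> Action2;
    // Each action leads to an event with two possible outcomes
    Action1 -> Event1; Event1 -> Outcome1; Event1 -> Outcome2;
    Action2 -> Event2; Event2 -> Outcome3; Event2 -> Outcome4;
}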

If I run this command from my Linux command prompt:

dot -Tpng DecisionTree.dot > DecisionTree.png

The dot interpreter will read the file and generate the graph in png format. This is what that output looks like.

As you can see, it is not difficult to go from entering commands into a dot file to generating a decent-looking graph. The DOT language has much more powerful features for drawing graphs than what I am showing you here; however, in the initial stages of sketching out the actions, events, and outcomes involved in your decision problem, you may want to keep things simple and just focus on drawing out all the nodes and the lines between them.
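
If you would rather drive the whole process from a script, the Python sketch below writes the same DecisionTree.dot file and then calls dot on it. It uses dot's -o option to name the output file instead of shell redirection. The helper names dot_command and render are made up for this example, and the script assumes the dot program is on your PATH; if it is not, render returns None without rendering.

```python
import shutil
import subprocess
from pathlib import Path

DOT_SOURCE = """digraph DecisionTree {
  Decision -> Action1; Decision -> Action2;
  Action1 -> Event1; Event1 -> Outcome1; Event1 -> Outcome2;
  Action2 -> Event2; Event2 -> Outcome3; Event2 -> Outcome4;
}
"""


def dot_command(dot_path, fmt="png"):
    """Build the command line: dot -T<fmt> <input> -o <output>."""
    out_path = Path(dot_path).with_suffix("." + fmt)
    return ["dot", f"-T{fmt}", str(dot_path), "-o", str(out_path)]


def render(dot_path="DecisionTree.dot", fmt="png"):
    """Write the dot file and render it; return the image path or None."""
    Path(dot_path).write_text(DOT_SOURCE)
    if shutil.which("dot") is None:  # Graphviz is not installed or not on PATH
        return None
    subprocess.run(dot_command(dot_path, fmt), check=True)
    return Path(dot_path).with_suffix("." + fmt)
```

Calling render() from the directory where you want the files should leave DecisionTree.dot and, when Graphviz is available, DecisionTree.png side by side.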

The graph below visualizes a nitrogen application decision. Do I apply 90 lbs per acre of Nitrogen or 110 lbs? The effect of each action on the crop I want to grow also depends on the amount of rainfall I'm likely to receive over the growing season. The action forks (i.e., application amounts) lead to the event forks (i.e., rainfall amounts), which lead to the terminal outcomes (i.e., expected number of bushels).