I recently wrote about blood transfusions and the associated risk of postoperative infections. This post is a tutorial on the basics of drawing a directed acyclic graph (DAG). Blood transfusions and infections make a good example, as most readers are familiar with the risk factors for infections.

A DAG contains three elements:

Directed arrows – each arrow has a causal direction

Acyclic – following the arrows can never lead back to the node you started from, i.e. there are no cycles

Graph – a visualization helps us understand the complex relations
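The "acyclic" requirement can be checked mechanically once the arrows are written down as (cause, effect) pairs. Below is a minimal Python sketch, assuming Python 3.9+ for the standard-library graphlib module. The function name is_acyclic and the edges are hypothetical and only serve as an illustration; they are not taken from any published model.

```python
from graphlib import TopologicalSorter, CycleError


def is_acyclic(edges):
    """Return True if the directed (cause, effect) edges contain no cycle."""
    sorter = TopologicalSorter()
    for cause, effect in edges:
        # add(node, *predecessors): the effect comes after its cause
        sorter.add(effect, cause)
    try:
        sorter.prepare()  # raises CycleError if a cycle exists
    except CycleError:
        return False
    return True


# Hypothetical edges, for illustration only
valid_edges = [
    ("duration_of_surgery", "transfusion"),
    ("duration_of_surgery", "infection"),
    ("transfusion", "infection"),
]

# Adding an arrow back from the outcome closes a loop
cyclic_edges = valid_edges + [("infection", "duration_of_surgery")]

print(is_acyclic(valid_edges))
print(is_acyclic(cyclic_edges))
```

The first edge list is a valid DAG; the second is not, because an arrow from the outcome back to a cause would let you walk in a loop.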

This post will not go into the details of DAGs; the aim is to show how to quickly generate a reasonable DAG. If you are interested in the specifics, you can look at either Sander’s or Shrier and Platt’s excellent articles on the subject.

For this tutorial I will use Friedman et al.’s article on blood transfusions as a risk factor for infections after arthroplasty surgery. Their methods section suggests that they adjusted for the following confounders:

Geographic region (United States and Canada compared with other countries)

Duration of surgery

Throughout the post I will be using the DAGitty tool, which is available for free at DAGitty.net. All plots have associated “Model text data” that you can copy and paste into the tool to regenerate the images and edit them freely. After entering the above factors I ended up with something like this:

One can always argue that the relations are different, but I think the graph is reasonable. As you can see, a confounder node should be connected to both the exposure (transfusions) and the outcome (infections). If it isn’t, it is uncertain whether it actually belongs in the model. Note that in the current graph:

Region may be associated with different transfusion protocols, but it is harder to see why infection rates would differ much between regions. An argument could be made that there is a connection, but the hospitals included in this study should be top-notch and are therefore unlikely to differ much in infection rates for the included patient type.

Hypertension is certainly a risk factor for cardiac disease but the connection to infections is most likely weak.

Renal failure should not affect the number of transfusions unless the hematocrit is low, although there is a link to infections.
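The three observations above amount to a reachability check: does a variable have arrows leading to both the exposure and the outcome? Here is a rough Python sketch of that check. The edge list is my own hypothetical simplification of the notes above, not the actual DAGitty model, and the helper names are made up for this example. It is a crude screening step and not a substitute for the adjustment-set analysis that DAGitty performs.

```python
def descendants(edges, start):
    """Return every node reachable from `start` by following the arrows."""
    children = {}
    for cause, effect in edges:
        children.setdefault(cause, set()).add(effect)
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def is_candidate_confounder(edges, node, exposure, outcome):
    """True if `node` leads to the exposure and, separately, to the outcome."""
    reaches_exposure = exposure in descendants(edges, node)
    # Drop arrows leaving the exposure so that the path to the outcome
    # cannot simply run through the exposure itself.
    pruned = [(c, e) for c, e in edges if c != exposure]
    reaches_outcome = outcome in descendants(pruned, node)
    return reaches_exposure and reaches_outcome


# Hypothetical edges that mirror the notes above (assumption, not the real model)
edges = [
    ("region", "transfusion"),
    ("hypertension", "cardiac_disease"),
    ("cardiac_disease", "transfusion"),
    ("renal_failure", "infection"),
    ("duration_of_surgery", "transfusion"),
    ("duration_of_surgery", "infection"),
    ("transfusion", "infection"),
]

for name in ("region", "hypertension", "renal_failure", "duration_of_surgery"):
    print(name, is_candidate_confounder(edges, name, "transfusion", "infection"))
```

With these assumed edges only duration of surgery is flagged, since region and hypertension lack a path to infections that bypasses transfusions, and renal failure lacks a path to transfusions.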

Although the graph is simpler, it lacks an important detail – knowledge of why we transfuse patients. Looking at guidelines is always a good starting point. The UK blood and transfusion services have a neat handbook and some simple guidelines. Also, Joy and Bennet’s article on “The appropriateness of blood transfusion following primary total hip replacement” touches on the subject, giving us a fairly good idea of why we transfuse patients. Adding this knowledge to the graph generates the following:

It is now rather obvious that we should have adjusted for smoking, especially since respiratory disease is not adjusted for. Smoking is a very strong risk factor for both cardiovascular and respiratory disease and at the same time a very strong risk factor for infections, see Hans Nåsell’s excellent thesis.

We can also start theorizing about the relative importance of the different confounders. In my opinion, the surgical injury is vastly more important than a few patients with blood disease. It would be more interesting to adjust for CRP, myoglobin or some other marker of surgical injury than for blood disease. Frank Harrell puts it nicely:

Decide how many d.f. can be spent

Decide where to spend them

Spend them

Don’t reconsider, especially if inference needed

Here d.f. stands for degrees of freedom; every variable in the model costs at least one. Even in large studies we need to consider how many variables the model can handle. The planning should cover not only which variables could be included, but also how important each of them is.
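To make the budgeting idea concrete, here is a small Python sketch. It assumes a commonly quoted rule of thumb of roughly 15 subjects in the smaller outcome group per d.f.; that threshold is my assumption, not something stated in the quote above. The study size and the d.f. cost per variable are entirely hypothetical and are not taken from Friedman et al.

```python
def df_budget(n_events, n_non_events, subjects_per_df=15):
    """Rough number of d.f. a binary-outcome model can afford.

    Assumption: the limiting sample size is the smaller outcome group,
    and about `subjects_per_df` of those are needed per d.f.
    """
    return min(n_events, n_non_events) // subjects_per_df


# Hypothetical study: 300 infections among 12,000 patients
budget = df_budget(n_events=300, n_non_events=11_700)

# Hypothetical d.f. cost per variable (e.g. a spline or several categories
# costs more than one d.f.); decide this before looking at the results
planned = {
    "age": 3,
    "sex": 1,
    "bmi": 3,
    "smoking": 2,
    "duration_of_surgery": 3,
}
spent = sum(planned.values())
within_budget = spent <= budget

print(f"budget={budget}, spent={spent}, within budget: {within_budget}")
```

The point of writing the plan down like this is the order of operations: the budget and the spending are fixed from subject knowledge and the DAG, before any model is fitted.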

Furthermore, we can remove diabetes, as transfusions do not depend on it directly and the model already contains cardiovascular disease. While it may seem natural to include diabetes, there is little support for it in the DAG.

Conversely, I would leave BMI in the model. BMI has a known connection with infections, and given its impact on both cardiovascular status and the surgical trauma it makes sense to keep it.

Sex and age are in the category “compulsory confounders” – while I haven’t been able to fit them nicely into the DAG, they most likely have an impact that the DAG does not cover. For instance, the fear of undiagnosed cardiovascular disease in the elderly may drive transfusion rates, while age simultaneously affects the immune response. Women have a lower risk of implant infections and are known to have a different hematocrit tolerance, and could thus be treated differently.

In summary

A DAG gives an interesting overview of the relationship between variables. It helps us to think of and visualize relations. It can also be a valuable aid in finding variables that you haven’t originally thought of or identifying variables that you don’t need.

Furthermore, don’t expect the perfect DAG; view it as a work in progress. A DAG will also aid and clarify the discussions with your co-authors and reviewers. The ultimate goal is of course to limit the number of late changes, which often require a substantial amount of work.

Note: The above discussion is for transfusion yes/no. Friedman et al. also compared allogenic versus autologous transfusions and this can radically change the graph as the decision to provide allogenic vs. autologous transfusion is much more complicated.