Fast-track publishing using knitr: table mania (part IV)

Constructing tables is an art – maximizing readability and information can be challenging. The image is of the Turning Torso in Malmö and is CC by Alan Lam.

Fast-track publishing using knitr is a short series on how I use knitr to speed up publishing in my research. While illustrations (previous post) are optional, tables are not, and this fourth article is therefore devoted to tables. Tables through knitr are probably one of the most powerful fast-track publishing tools. In this article I will show (1) how to quickly generate a descriptive table, (2) how to convert your regression model into a table, and (3) what is worth knowing about table design and anatomy.

Data set preparation

To make this post more concrete I will use the melanoma data set in the boot package. Below I factor the variables:
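A minimal sketch of what that factoring step might look like. The numeric codes follow the documentation of the melanoma data set in boot; the label strings and the choice to put "Alive" first are assumptions made for illustration.

```r
# Load the data set from the boot package
data(melanoma, package = "boot")

# status: 1 = died from melanoma, 2 = alive, 3 = died from other causes
# (labels and level order are illustrative assumptions)
melanoma$status <- factor(melanoma$status,
                          levels = c(2, 1, 3),
                          labels = c("Alive", "Melanoma death", "Non-melanoma death"))

# sex: 1 = male, 0 = female
melanoma$sex <- factor(melanoma$sex,
                       levels = c(1, 0),
                       labels = c("Male", "Female"))

# ulcer: 1 = present, 0 = absent
melanoma$ulcer <- factor(melanoma$ulcer,
                         levels = c(1, 0),
                         labels = c("Present", "Absent"))

# Check that the three variables are now factors
str(melanoma[, c("status", "sex", "ulcer")])
```

Listing the levels explicitly controls the order in which the groups later appear as columns and rows.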

Descriptive tables

Generating descriptive tables containing simple means, medians, ranges, frequencies, etc. should be fast and efficient. Decide on what you want in your columns and then structure your data into sections; I try to use the following structure:

After deciding on the variables I often use the getDescriptionStatsBy function from my Gmisc-package to get the statistics into columns. I’ve found that you almost always have more than one column, thereby comparing different groups. In an RCT you want to compare the treatment groups, in a case-control study you want to compare the cases to the controls, and in an observational survival study you usually want to compare those that survived with those that died (as in this example). If you are uncertain what groups to compare in your Table 1, then just compare those with complete data to those with missing data.

The getDescriptionStatsBy function has several settings that you may want to use:

P-values: While some despise the use of p-values in tables, I believe they can be useful in some cases. My function can therefore fetch fisher.test or wilcox.test p-values, depending on the variable type, by simply specifying statistics=TRUE.

Total-column: Adding a total column is sometimes useful, e.g. if you split your data by alive/dead it is handy to quickly get the overall figures, while if you present your data by RCT group a total column makes little sense.

Percentages for categorical variables: depending on the setting you may want your percentages to sum up horizontally or vertically. In an alive/dead setting it makes sense to sum horizontally using hrzl_prop=TRUE, while in an RCT it is better to sum vertically, since you want to show how many cemented, uncemented, and mixed hip replacements there were in each treatment arm.
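The difference between horizontal and vertical percentages can be sketched with base R alone. This is only an illustration of the two directions using prop.table on the raw melanoma codes; that hrzl_prop computes its percentages in exactly this way is an assumption.

```r
data(melanoma, package = "boot")

# Cross-tabulate ulceration (rows) against status (columns)
counts <- table(ulcer = melanoma$ulcer, status = melanoma$status)

# Horizontal: percentages sum to 100 within each row
horizontal <- prop.table(counts, margin = 1) * 100

# Vertical: percentages sum to 100 within each column
vertical <- prop.table(counts, margin = 2) * 100

round(horizontal, 1)
round(vertical, 1)
```

The horizontal version answers "what share of patients with an ulcer ended up in each status group", while the vertical version answers "what share of each status group had an ulcer".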

As getDescriptionStatsBy has plenty of options, I usually wrap it in a function like this:

There are of course many alternatives for generating descriptive data. My function tries to resemble the format of Table 1 in major medical journals, such as NEJM and The Lancet. You can easily tailor it to your needs; for instance, if you want the median instead of the mean for continuous variables, pass a different function through continuous_fn:

library(Gmisc)

# A function that takes the variable name,
# applies it to the melanoma dataset
# and then runs the results by the status variable
getT1Stat <- function(varname, digits=0){
  getDescriptionStatsBy(melanoma[, varname],
                        melanoma$status,
                        add_total_col=TRUE,
                        show_all_values=TRUE,
                        hrzl_prop=TRUE,
                        statistics=FALSE,
                        html=TRUE,
                        digits=digits,
                        # Report medians instead of means for continuous variables
                        continuous_fn=describeMedian)
}

Apart from my function I’ve recently discovered the power of the plyr-package that can help you generate most table/plot data. I strongly recommend having a closer look at the ddply function – it will save you valuable time.
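As a small taste of ddply, here is a sketch that summarises the melanoma data per status group. The choice of summary columns is arbitrary and only meant as an illustration.

```r
library(plyr)
data(melanoma, package = "boot")

# Split the data by status, summarise each piece, and combine the
# results into one data frame with one row per status group
by_status <- ddply(melanoma, "status", summarise,
                   n = length(age),
                   median_age = median(age),
                   mean_thickness = mean(thickness))

by_status
```

The result is an ordinary data frame, so it can be passed straight on to a table or plot function.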

After calling the wrapper for each variable and storing the results in a list, I loop through the list to extract the variable matrix and the rgroup/n.rgroup variables that I then input to my htmlTable function:
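The extraction step can be sketched as follows. The list here is a stand-in built with base R's table function rather than with getDescriptionStatsBy, and the names table_data and output_data are hypothetical; the point is only how the combined matrix, rgroup, and n.rgroup relate to each other.

```r
data(melanoma, package = "boot")

# Stand-in list: one matrix per variable, named by its row-group label
vars <- c(Sex = "sex", Ulceration = "ulcer")
table_data <- lapply(vars, function(v) {
  unclass(table(melanoma[[v]], melanoma$status))
})

# Stack the matrices into one body for the table
output_data <- do.call(rbind, table_data)

# The group labels and the number of rows that belong to each group
rgroup <- names(table_data)
n.rgroup <- unname(vapply(table_data, nrow, integer(1)))

output_data
rgroup
n.rgroup
```

These three objects are what would then be handed to the table function as the body, the rgroup argument, and the n.rgroup argument.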

Generating this beauty (the table is an image as the CSS for the site messes up the layout):

Regression tables

I recently did a post on my printCrudeAndAdjustedModel function, where I showed how to output your model into a table. The function lets you get both unadjusted and adjusted estimates into a table, adds the references, and can automatically attach the descriptive statistics:
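To make the crude-versus-adjusted idea concrete, here is a base R sketch. It is not printCrudeAndAdjustedModel: it swaps in a plain logistic regression on melanoma death with Wald confidence intervals, and the helper fmt_or and the choice of covariates are assumptions made for illustration.

```r
data(melanoma, package = "boot")
melanoma$died <- as.integer(melanoma$status == 1)  # 1 = died from melanoma
melanoma$sex <- factor(melanoma$sex, levels = 0:1, labels = c("Female", "Male"))

# Hypothetical helper: format coefficient i as "OR (lower to upper)"
fmt_or <- function(fit, i) {
  est <- exp(coef(fit)[i])
  ci <- exp(confint.default(fit)[i, ])  # Wald interval
  sprintf("%.2f (%.2f to %.2f)", est, ci[1], ci[2])
}

# Each of these variables contributes exactly one coefficient
vars <- c("sex", "age", "thickness")

# Crude: one model per variable; coefficient 2 follows the intercept
crude <- vapply(vars, function(v) {
  fit <- glm(reformulate(v, response = "died"),
             data = melanoma, family = binomial)
  fmt_or(fit, 2)
}, character(1))

# Adjusted: all variables in the same model
adjusted_fit <- glm(reformulate(vars, response = "died"),
                    data = melanoma, family = binomial)
adjusted <- vapply(seq_along(vars) + 1,
                   function(i) fmt_or(adjusted_fit, i),
                   character(1))

reg_table <- cbind(Crude = unname(crude), Adjusted = adjusted)
rownames(reg_table) <- names(coef(adjusted_fit))[-1]
reg_table
```

The result is a character matrix with one row per term and one column per model type, which is the shape a table function needs.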

Now there are alternatives to my function. The texreg package is interesting and worth exploring, and hopefully stargazer will eventually have an html/markdown option. A minor note concerning these latter packages, whose output contains R² and more: I have never seen models presented that way in the medical literature, and if you need to adjust the output you lose the fast-track idea.

Table design and anatomy

Tables are generally good for comparing a few values, while plots are better when you want to show a trend consisting of multiple values. Although you should avoid using tables to show trends, you can still have large tables with lots of data. When presenting a lot of data, you need to think about the table navigation:

Order: always report variables in the same order, e.g. sex, age, ulceration… should be at a similar location in each table

Precision: avoid unnecessary decimals

Markup: use headers and spanners

The first one we have already touched upon. For the second one, I often rely on the sprintf function. While round may seem like the natural option, it drops trailing zeros, and you will often want to show every decimal that you find of interest. For instance:
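A minimal base R sketch of the difference, using made-up numbers:

```r
estimates <- c(1.10, 2.00, 3.456)

# round() returns numbers, so trailing zeros are lost
# once the values are converted to text for a table cell
rounded <- as.character(round(estimates, 2))
rounded
# → "1.1" "2" "3.46"

# sprintf() returns strings with exactly two decimals
formatted <- sprintf("%.2f", estimates)
formatted
# → "1.10" "2.00" "3.46"
```

With sprintf every cell in a column gets the same number of decimals, which keeps the column aligned and easy to scan.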

There are standard tools that you can use to help your readers navigate the tables. I use stubs and column spanners as much as I can. A stub is a row header at the same column level as the actual rows; the rows beneath it are set apart by a small indentation of two white-spaces. This is an efficient way of grouping variables without making the table wider, while at the same time adding some white space around the numbers, which helps navigation. Similarly to stubs, you can have column spanners that group columns. In my htmlTable these are the rgroup and cgroup arguments. They need n.rgroup/n.cgroup to let the function know how many rows/columns each group should contain; see the example below:
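A small sketch of how the four arguments fit together. The numbers in the matrix are made up, and the call assumes the htmlTable function from the separate htmlTable package, which is why rendering is skipped when that package is not installed.

```r
# Made-up numbers purely for illustration
mx <- matrix(1:12, ncol = 4, byrow = TRUE,
             dimnames = list(c("Male", "Female", "Present"),
                             c("No.", "%", "No.", "%")))

rgroup <- c("Sex", "Ulceration")  # stub labels
n.rgroup <- c(2, 1)               # number of rows under each stub
cgroup <- c("Alive", "Dead")      # column spanner labels
n.cgroup <- c(2, 2)               # number of columns under each spanner

# The counts must add up to the dimensions of the table body
stopifnot(sum(n.rgroup) == nrow(mx), sum(n.cgroup) == ncol(mx))

# Render only if the htmlTable package is available (assumption)
if (requireNamespace("htmlTable", quietly = TRUE)) {
  print(htmlTable::htmlTable(mx,
                             rgroup = rgroup, n.rgroup = n.rgroup,
                             cgroup = cgroup, n.cgroup = n.cgroup))
}
```

Checking that the group sizes add up before rendering catches the most common mistake, a group count that no longer matches the table after a variable was added or removed.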

An alternative to using stubs is using row headers. The difference is that headers are located in a separate column, thus making the table wider. A benefit is that you can have any number of row-group header levels. Below is an example with two header levels using the xtable function: