Friday, March 31, 2017

Generating APA style tables in R: Current challenges

This post reviews some aspects of using R to generate formatted tables suitable for inclusion in a manuscript conforming to APA style. I review my current workflow, which involves a large amount of manual formatting in Excel. I then discuss what it would take to automate more of these manual steps in R.

My current workflow for incorporating tables into a journal manuscript involves the following steps:

Create a data.frame in R with the core table data, where row names and column names carry the row and column headers. This usually includes rounding numbers to the desired precision (in order to avoid Excel rounding errors).

Ideally, the actual rows or columns of data have been specified correctly in R, but occasionally it is simpler to remove rows or columns at the Excel stage. For example, the R output might list fit statistics for six models, but it is later decided that only five are relevant. Rearranging the order of rows, on the other hand, should be done in R for increased reliability.
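As a minimal sketch of this first step, the following uses base R to turn numbers into fixed-precision strings and to set the row order before anything leaves R. The model names, statistics, and chosen precisions are hypothetical and invented purely for illustration.

```r
# Hypothetical fit statistics for three models (values invented for illustration)
fits <- data.frame(
  chisq = c(312.456, 201.2, 187.004),
  cfi   = c(0.9012, 0.95, 0.9567),
  rmsea = c(0.0812, 0.06, 0.0549),
  row.names = c("One factor", "Two factor", "Three factor")
)

# Desired number of decimal places for each column (an assumption)
digits <- c(chisq = 2, cfi = 3, rmsea = 3)

# Convert each column to character with a fixed number of decimals.
# formatC() keeps trailing zeros (e.g., "0.950"), which round() alone would drop.
tab <- fits
for (v in names(digits)) {
  tab[[v]] <- formatC(fits[[v]], format = "f", digits = digits[[v]])
}

# Reorder rows in R rather than in the spreadsheet
tab <- tab[c("Three factor", "Two factor", "One factor"), ]
tab
```

Storing the cells as character strings means the displayed precision is fixed in R, although a spreadsheet may still reinterpret such strings as numbers unless the columns are imported as text.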

Add lines

Lines are placed above and below the column header row and below the last row of the table

Decked column headings and table spanners require additional lines

Format numbers

Common tasks include adjusting the number of decimal places, removing leading zeros (e.g., correlations, multiple R, p-values), putting parentheses around certain numbers, and combining two numbers in some way (e.g., ranges and confidence intervals, which often have a separator like a comma or hyphen and may be surrounded by brackets).
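These number-formatting tasks can be sketched with a few small base R helpers. The function names (strip_zero, in_parens, format_ci) are hypothetical and not taken from any package; they only illustrate the idea.

```r
# Format to fixed decimals and drop the leading zero, for statistics that
# cannot exceed 1 in absolute value (e.g., correlations, p-values)
strip_zero <- function(x, digits = 2) {
  out <- formatC(x, format = "f", digits = digits)
  sub("^(-?)0\\.", "\\1.", out)
}

# Wrap a formatted number in parentheses (e.g., for standard errors)
in_parens <- function(x, digits = 2) {
  paste0("(", formatC(x, format = "f", digits = digits), ")")
}

# Combine lower and upper bounds into a bracketed interval
format_ci <- function(lower, upper, digits = 2, sep = ", ") {
  paste0("[", formatC(lower, format = "f", digits = digits), sep,
         formatC(upper, format = "f", digits = digits), "]")
}

strip_zero(c(0.456, -0.3, 0.05))  # ".46" "-.30" ".05"
in_parens(1.234)                  # "(1.23)"
format_ci(0.12, 0.5)              # "[0.12, 0.50]"
```

All three helpers return character vectors, so the results should be treated as final display text rather than as numbers for further computation.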

Add line breaks in cells

Some cells have two or more bits of information that should be presented on distinct lines. For example, a column name might include the sample size on a second line (e.g., "Treatment {line-break} (n = 132)"). Similarly, a value might be presented on the first line and its confidence interval on the second. In the latter case, it is also possible to insert an additional row into the table and place these values in separate cells.

Some text is too long and needs to be split across multiple rows. This is usually done automatically. However, often this should include an indent on the second or subsequent row.
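Both kinds of line break can be prepared in R before export. The sketch below uses base R only; the helper names header_with_n and wrap_hanging are hypothetical, and whether an embedded newline survives depends on how the table is transferred to Excel or Word.

```r
# Column header with the sample size on a second line
header_with_n <- function(label, n) {
  paste0(label, "\n(n = ", n, ")")
}

# Wrap long text to a given width, indenting the second and later lines.
# strwrap()'s exdent argument sets the indentation of continuation lines.
wrap_hanging <- function(x, width = 30, indent = 2) {
  vapply(x, function(s) {
    paste(strwrap(s, width = width, exdent = indent), collapse = "\n")
  }, character(1), USE.NAMES = FALSE)
}

cat(header_with_n("Treatment", 132), "\n")
cat(wrap_hanging("Satisfaction with supervisor feedback quality", width = 25), "\n")
```

The width is measured in characters, so it only approximates the column width that a proportional font will produce in the final document.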

Adjust column widths

This is often a manual process in order to get the table to fit on the page and avoid cell wrapping.

Decked headings: Special requirements

Decked headings occur where two or more column headings are grouped under a column spanner (e.g., M and SD are shown for two groups, where the group name is the spanner).

Merge cells of column spanner (i.e., the heading that groups the two columns)

Insert line below the cells of the column spanner

Insert a small empty column between the column spanner and other columns (this ensures that there is a gap between the lines underneath adjacent column spanners and makes it easier to see the intended grouping)
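One way to prepare a decked heading in R is to build the full cell layout as a character matrix, with one row for the spanners, one row for the column heads, and an empty spacer column between groups. This is only a sketch of the layout; the data are hypothetical, and the merging of spanner cells and drawing of lines would still happen at the formatting stage.

```r
# Hypothetical descriptives for two groups (values invented for illustration)
body <- data.frame(
  variable   = c("Anxiety", "Depression"),
  control_m  = c("2.31", "1.87"),
  control_sd = c("0.91", "0.76"),
  treat_m    = c("1.95", "1.52"),
  treat_sd   = c("0.88", "0.70")
)

# Insert an empty spacer column between the two groups
cells <- cbind(body[, 1:3], gap = rep("", nrow(body)), body[, 4:5])

# Two header rows: spanners on top, column heads beneath.
# The empty cell after each spanner label is the one to merge with it.
spanner_row <- c("", "Control", "", "", "Treatment", "")
heading_row <- c("Variable", "M", "SD", "", "M", "SD")

layout <- rbind(spanner_row, heading_row, as.matrix(cells))
dimnames(layout) <- NULL
layout
```

Keeping the spacer column in the layout itself means the gap between the lines under "Control" and "Treatment" does not have to be recreated by hand each time the table is regenerated.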

Table spanners: Special requirements

A table spanner is a centred heading that represents a major subdivision of a table.

It involves inserting a new row with merged cells and centred text and adding a line to the bottom of the table division.
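The row insertion for table spanners can be sketched in base R as follows. The function add_table_spanners is a hypothetical helper, and the example data are invented; merging and centring the spanner cells is left to the formatting stage.

```r
# Insert a spanner row above each subdivision of a table.
# `cells` is a character matrix; `group` gives the subdivision of each row.
add_table_spanners <- function(cells, group) {
  stopifnot(is.matrix(cells), length(group) == nrow(cells))
  blocks <- lapply(unique(group), function(g) {
    # Spanner label goes in the first cell; the remaining cells are empty
    # and would later be merged with it and centred
    spanner <- c(g, rep("", ncol(cells) - 1))
    rbind(spanner, cells[group == g, , drop = FALSE])
  })
  out <- do.call(rbind, blocks)
  dimnames(out) <- NULL
  out
}

cells <- matrix(c("Anxiety",    "2.31",
                  "Depression", "1.87",
                  "Anxiety",    "1.95",
                  "Depression", "1.52"),
                ncol = 2, byrow = TRUE)

add_table_spanners(cells, group = c("Time 1", "Time 1", "Time 2", "Time 2"))
```

Subdivisions appear in the order in which their labels first occur in group, so the row order should be settled before the spanners are added.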

Table caption, title, and notes: Special requirements

In general, I specify these things in the manuscript. Mostly this works well. There is just the occasional bit of information that might be data driven. E.g., correlations above a certain value might be flagged as significant and this information might be included in the table note.
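As an illustration of a data-driven table note, the smallest correlation that reaches significance can be computed from the sample size using the standard relationship between r and t with n - 2 degrees of freedom. This is a sketch only: the sample size, the two-tailed alpha of .05, and the wording of the note are all assumptions.

```r
# Smallest absolute correlation that is significant for a given sample size,
# using r = t / sqrt(t^2 + df) with df = n - 2
critical_r <- function(n, alpha = .05) {
  df <- n - 2
  t_crit <- qt(1 - alpha / 2, df)
  t_crit / sqrt(t_crit^2 + df)
}

n <- 132L                     # hypothetical sample size
r_crit <- critical_r(n)

# Format without a leading zero, as APA style does for correlations
r_txt <- sub("^0\\.", ".", formatC(r_crit, format = "f", digits = 2))

note <- sprintf(
  "Note. N = %d. Correlations with |r| >= %s are significant at p < .05 (two-tailed).",
  n, r_txt
)
note
```

Because the threshold is rounded to two decimals, a correlation displayed at exactly the threshold may sit marginally on either side of it, so the flagging of individual cells is best done on the unrounded values.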

Reflections on manual formatting

Table formatting is complex. There is a visual quality to formatting tables. While some tables are approximated by a matrix with row and column headers, there are a huge number of common and not so common additional requirements. I often identify refinements to table formatting in an iterative fashion until it looks right.

While I attempted to document all the tasks that I do, I would not be surprised if there were additional tasks that did not come to mind. And presumably the common requirements of APA style tables in psychology are not the same as those relevant to other style guides and other disciplines.

It is possible to automate all of the above steps using R and output a table in a suitable format such as rtf, docx, or possibly HTML. However, at this point, this would require a lot of coding for each table.

There are a few packages of relevance:

apaTables provides APA tables exported to RTF for a few very specific scenarios. The author also adopts specific preferences which, while well reasoned, are not always what you want.

apaStyle is similar to apaTables in that it exports to Word format, although it seems a little more flexible. It has a generic table function that can handle decked headings, but it still seems a long way from the flexibility required to produce most tables.

xtable is one of the best packages for table production but it exports principally to HTML and LaTeX. It also doesn't really seem designed for capturing all the complexities of APA style tables.

The challenge is to design a flexible and efficient system that is also reliable (in that it limits the introduction of errors). I think a nice challenge for anyone willing to take this on would be to develop a simple set of functions in R that generate tables in Word or RTF format and that could be used to produce the 16 tables in the APA 6th edition style manual (ideally from hypothetical data, to include the additional challenges of extracting and formatting the numbers, converting variable names, etc.). These tables include a range of the common requirements of APA style that are not well supported in existing packages.

**Update:**

After posting, I learnt about the papaja package. It seems specifically designed for writing APA style documents with R Markdown. The apa_table function seems like it's designed to capture many of the quirks of APA style, but at present its more advanced table-formatting features are limited to exporting LaTeX (i.e., R Markdown to LaTeX to PDF). A fully reproducible workflow has a lot to love, but at present I still find that collaboration and other features make Word my go-to option for manuscript preparation.

huxtable (mentioned in the comments) has quite a lot of formatting flexibility. It exports to HTML and LaTeX format. See this vignette. It also supports row and column spans, although row spans are handled as separate columns whereas APA style uses indenting. I'm also not clear on how you would go from HTML to Word. My general impression is that HTML is less prescriptive by design.

2 comments:

Some more modern packages you might want to look at: `formattable`, `pixiedust` and `huxtable` (my own). All of them are for formatting tables within R and can export LaTeX and HTML; `huxtable` can do Word via the `ReporteRs` package, which is itself worth looking at. There's a comparison chart at https://hughjonesd.github.io/huxtable/design-principles.html

I am a lecturer at Deakin University bridging I/O psychology and statistics. My blog contains 100+ posts focused on data analysis in the social sciences.
