17 R Markdown

When conducting research, your end product is usually a Word document or a PDF that reports on the research you've done, often including several graphs or tables. In many cases people do the data work in R, producing the graphs or the numbers for the tables, and then write up the results in Word or LaTeX. While this is a workable system, it has a significant drawback: if you change a graph or table, you need to change it both in R and in the report. If you only do this rarely it isn't much of a problem. Doing it many times, however, increases both the amount of work and the likelihood of an error, whether from forgetting to change something or from changing it incorrectly. We can avoid this issue by using R Markdown, R's way of writing a document with R code embedded in it.

This chapter only briefly introduces R Markdown; for a comprehensive guide please see this excellent book. For a cheatsheet on R Markdown, see here.

R Markdown lets you type much as you would in Microsoft Word and insert the code that makes each table or graph in the place you want it to appear. If you change the code, the document picks up the up-to-date result the next time you create the output, reducing your workload. There is some additional formatting you have to do when using R Markdown, but it is minimal and well worth the effort. This book, for example, was made entirely using R Markdown.

To open a new R Markdown file, click File in the top menu, then New File, and then R Markdown…

This opens a window where you select the title, author, and type of output. You can change all three of these later, directly in the R Markdown file. Selecting PDF may require you to install additional software (a LaTeX distribution) before the PDF can be created, though some operating systems already have it installed. For a nice guide to using PDF with R Markdown, see here.

When you click OK, it will open a new R Markdown file that is already populated with example text and code. You can delete this entirely or modify it as needed.

When you output that file as a PDF it will look like the image below.

R converted the file into a PDF, executing the code and applying the formatting specified. In an R Script, a # means that the line is a comment. In an R Markdown file, a # at the start of a line outside of a code chunk means that the line is a section header. There are six levels of headers, made by repeating the #: # is the largest header and ###### is the smallest. As with comments, the # must be at the beginning of the line, and it should be followed by a space.

The word “Knit” was surrounded by two asterisks (**) on each side in the R Markdown file and became bold in the PDF, because that is how R Markdown marks bold text. To italicize text, use a single asterisk on each side instead, *like this*. If you're interested in more advanced formatting, please see the book or cheatsheet linked earlier.
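To see these symbols turn into formatting without knitting a whole document, here is a small sketch. R Markdown itself hands this step to Pandoc; as a stand-in, the sketch assumes the commonmark package is installed and uses its markdown_html() function, which follows the same basic rules for headers and emphasis (Pandoc supports extra syntax that commonmark does not). The example text is made up.

```r
library(commonmark)

# Markdown text using the header and emphasis rules described above
md <- c(
  "# Largest header",
  "",
  "###### Smallest header",
  "",
  "The word **Knit** is bold and *this* is in italics."
)

# Convert to HTML: <h1>/<h6> are headers, <strong> is bold, <em> is italics
html <- markdown_html(paste(md, collapse = "\n"))
cat(html)
```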

Other than section headers and formatting marks like these, writing in R Markdown is much the same as writing in Word: you type text normally and it appears in the output as you wrote it.

17.1 Code

R Markdown is so useful because you can include code output in the file. In R Markdown we write code in what is called a “code chunk”. These are simply areas in the document that R knows it should evaluate as R code. You can see three of them in the example: at lines 8-9, setting a default option for the code chunks; at lines 18-20, running the summary() function on the cars data (a data set built into R); and at lines 26-28 (cut off in the screenshot), making a plot of the pressure data set (another data set built into R).
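A minimal sketch of what happens to a chunk, assuming the knitr package (which R Markdown uses to run chunks) is installed. knitr::knit() accepts the document as a character vector through its text argument and returns the knitted Markdown, so you can see a chunk replaced by its code and output without opening RStudio. The short document is hypothetical.

```r
library(knitr)

# A tiny, hypothetical R Markdown document: ordinary text plus one code chunk
rmd <- c(
  "The cars data set is built into R. Its summary is:",
  "",
  "```{r}",
  "summary(cars)",
  "```"
)

# knit() runs the chunk and returns the document with the code's output added
out <- knit(text = rmd, quiet = TRUE)
cat(out, sep = "\n")
```

In the result, the text is left as written, the code is copied into the document, and each line of output is prefixed with ##.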

To make a chunk, click Insert near the top right, then R. You can also use the keyboard shortcut Ctrl+Alt+I (Cmd+Option+I on a Mac).

It will then make an empty code chunk where your cursor is.

Notice the three backticks (```) at the top and bottom of the chunk. Don't touch these! They tell R that everything between them is a code chunk (i.e., that R should run that code). Inside the curly brackets {} are instructions about how the chunk's code and output are displayed. Here you can specify, among other things, whether the code itself is shown or just its output, captions for tables or graphs, and formatting for the output. Put all of these options after the r in the curly brackets. Multiple options must be separated by commas (just like arguments in normal R functions).
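As an illustration (the chunk label not-run and its code are made up), the chunk below sets two options separated by a comma: echo=TRUE shows the code and eval=FALSE tells R not to run it. Knitting it with knitr::knit(), as in this sketch, shows the code in the output but no result, so the deliberate error never happens. It assumes knitr is installed.

```r
library(knitr)

# Hypothetical chunk with two comma-separated options inside the {}
rmd <- c(
  "```{r not-run, echo=TRUE, eval=FALSE}",
  "stop(\"This line is displayed but never executed\")",
  "```"
)

out <- knit(text = rmd, quiet = TRUE)
cat(out, sep = "\n")
```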

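If the R Markdown file is not in the same folder as your data, you need to tell R where the data is before reading it in. Unlike in an R Script, calling setwd() inside a chunk only changes the working directory for that chunk; knitr resets it when the chunk finishes. To change the working directory for the chunks that follow, run knitr::opts_knit$set(root.dir = "path/to/your/folder") in the first chunk, or give the full file path when reading the data. Once the data is read in, however, it stays available in all following chunks. You also need to load any packages you use (using library()) inside the R Markdown file itself, since clicking Knit runs the document in a fresh R session rather than using the packages loaded in your console. It is good form to set the working directory, load any data, and load any packages you need in the first chunk, which makes it easier to keep track of what you're using.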
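The sketch below, assuming the knitr package is installed, knits a hypothetical two-chunk document. The first chunk follows the setup pattern described above: it loads a package and creates a data set, and include=FALSE runs it while hiding both its code and output. The second chunk can use the object the first one created. The root.dir line is commented out because its folder path is only a placeholder.

```r
library(knitr)

# Hypothetical document whose first chunk does the setup work.
# include=FALSE runs the chunk but hides both its code and its output.
rmd <- c(
  "```{r setup, include=FALSE}",
  "# knitr::opts_knit$set(root.dir = \"path/to/your/folder\")  # placeholder path",
  "library(stats)            # packages loaded here stay loaded for later chunks",
  "cars_small <- head(cars)  # objects created here are available later too",
  "```",
  "",
  "```{r}",
  "nrow(cars_small)",
  "```"
)

out <- knit(text = rmd, quiet = TRUE)
cat(out, sep = "\n")
```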

17.1.1 Hiding code in the output

When you're making a report for a general audience, you usually want to show only the output, not the code you used. At early stages of writing the report, or when you're collaborating with someone who wants to see your code, it is useful to include the code in the R Markdown output.

Look at the second code chunk in the screenshot (lines 18-20). Its code is summary(cars) and its options are {r cars}. The “cars” simply names the chunk so that you can refer to the chunk (or its output, if it is a table or graph) later; it does not change the chunk's behavior. The output shows both the code used and the result of running it, because by default a code chunk shows both. To show only the output, set the option echo to FALSE inside the {}.

In the third code chunk (lines 26-28), that option is set to FALSE: {r pressure, echo=FALSE}. The output therefore shows only the graph, not the code used to make it.
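To compare the two settings directly, this sketch (assuming knitr is installed) knits the same one-line chunk twice: once with the default echo=TRUE and once with echo=FALSE.

```r
library(knitr)

# The same chunk knit twice: default echo=TRUE, then echo=FALSE
with_code   <- knit(text = c("```{r}", "nrow(pressure)", "```"), quiet = TRUE)
output_only <- knit(text = c("```{r, echo=FALSE}", "nrow(pressure)", "```"), quiet = TRUE)

cat(with_code, sep = "\n")    # shows the code and its result
cat(output_only, sep = "\n")  # shows only the result
```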

17.2 Tables

There are a number of packages that make nice tables in R Markdown. We will use the knitr package for this example.

The easiest way to make a table in R Markdown is to make a data.frame with all the data (and column names) you want and then display that data.frame (there are also packages that can make tables from regression output, though they won't be covered in this chapter). For this example we will subset the mtcars data (which is included in R) to just the first 5 rows and columns. The kable() function from the knitr package will then make a nice-looking table, and you can add a caption directly in the kable() call. The echo option in our code chunk is not set to FALSE here, so you can see the code.
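A sketch of the steps just described, assuming the knitr package is installed; the caption text is a placeholder. Run in the console, kable() prints a plain-text table, while inside an R Markdown chunk the same call is rendered as a formatted table in the output document.

```r
library(knitr)

# Subset the built-in mtcars data to its first 5 rows and first 5 columns
cars_subset <- mtcars[1:5, 1:5]

# kable() turns the data.frame into a table; the caption is a placeholder
tbl <- kable(cars_subset, caption = "First five rows and columns of mtcars")
tbl
```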

For another package to make very nice looking tables, see this guide to the kableExtra package.

17.3 Making the output file

To create the Word or PDF output, click Knit; it will create the output in the format set at the very top of the file. To change the format, click the white down-arrow directly to the right of Knit to open a drop-down menu of output options. Choosing an option creates the output in that format and makes it the new default. Creating the output can take a while, so be patient.
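Clicking Knit essentially calls rmarkdown::render() on the file, so the same step can be scripted. This sketch writes a small hypothetical .Rmd file to a temporary folder and renders it to Word. It assumes the rmarkdown package is installed; rendering also needs Pandoc (bundled with RStudio), so the code checks for it with pandoc_available() first. Unlike the Knit drop-down, passing output_format to render() changes the format only for that run and leaves the file's header untouched.

```r
library(rmarkdown)

# Write a small, hypothetical .Rmd file to a temporary folder
rmd_file <- file.path(tempdir(), "example-report.Rmd")
writeLines(c(
  "---",
  "title: \"Example report\"",
  "output: html_document",
  "---",
  "",
  "```{r, echo=FALSE}",
  "summary(cars)",
  "```"
), rmd_file)

# render() needs Pandoc, which ships with RStudio; check for it first
if (pandoc_available()) {
  # output_format overrides the header's 'output' for this run only
  out_file <- render(rmd_file, output_format = "word_document", quiet = TRUE)
  print(out_file)
}
```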