Rmd first: When development starts with documentation

Documentation matters ! Think about future you and others. Whatever is the aim of your script and analyses, you should think about documentation. The way I see it, R package structure is made for that. Let me try to convince you.

At use’R 2019 in Toulouse, I did a presentation entitled: ‘The “Rmd first” method: when projects start with documentation’ (Video on Youtube: https://youtu.be/cB1BCxFbhtk). I ended up saying: Think Package ! If you are really afraid about building a package, you may want to have a look at these slides before. If you are not so afraid, you can start directly with this blog post. In any case, Paco the package should make this more enjoyable ! I hope so…

This article is quite long. You can do a quick read if you skip the bullet points. Use the complete version while you are developing a real project.

Prevent code sickness

You never took back a R script of yours, 6 months later? Did you perfectly remember how worked every parts of the code? Functions and their parameters?

You never had to improve/debug the code of somebody else?

You were never asked to add functionnalities in a Shiny application originally written by a master trainee and then ponctually modified by the supervisor to “just” add one or two functionnalities?

If not, you are lucky. For now…

In any case, think about future you and think about future users or developpers of your work. Never think this is a one shot. One day or another, this one shot script will again be useful. It will be more useful if it is correctly documented.
Multiple reasons would make documentation necessary:

You share your scripts with your young padawan and you do not really have time to spend with him/her to explain each step. You prefer spending your time for more constructive discussions around the project, than on the use of a specific function.

A colleague has to run your analysis when you are on a sick leave. Sick leave is generally not something you planned.

You want to share your work with the World ! One day or another, you will be asked to share your scripts because it can interest people out of your team. Or maybe you think that this can be valuable to share your code with a R-community that gives you so much every day…

Don’t provoke (to much) sickness to people who will have to understand and use your code… Give them the remedy, give them a documentation.

If you read this post, you are good enough to create a package

You are not here by chance. You are here because one day, you wrote a line of code in the R language. Thus, trust me, you know enough to create a package.

When we tell our students/trainees, after their two first days, that the next step is to build a package, the first reactions are:

I just started, I am not developer who can create a package.
I do not want to share my work on the CRAN with the World, not even on Github.

Standardized structure of a package and its check are supposed to conduct to a re-usable code

There are different places in a package where to write some code, some text, some information, … Remembering the complete procedure of package development may be painful for beginners, even for advanced developers. Hence, some think they need to build a hundred packages before being able to get all of it. That’s not true.

The ‘Rmd first’ method may reduce the fear of package building by developing as if you were not in a package

Moreover, some useful packages are here to help you in all package development steps, among them {devtools}, {usethis}, {attachment}, {golem}, {desc}, …

Develop as if you were not in a package

In my presentation at use’R 2019 in Toulouse, I started developing my code in a “classical” R project. If you looked at this presentation, you know we will end up with a package. Here, we will directly start by building a package skeleton. Just a few files to think about before developing. You will see that everything is about documentation.
In this blog post, we will start to create a small data analysis in a project named my.analysis.package.

These steps looks like package development…

Your project needs to be presented. What is the objective of this work? Who are you? Are others allowed to use your code? We also keep a trace of the history of the package construction steps.

Create new “R package using devtools” if you are using Rstudio

Or use usethis::create_package("my.analysis.package").

A constraint is that your project name can only consist of letters, numbers and dots. It must start with a letter; and it cannot end with a dot.

Create a file named dev_history.R, at the root of the project

Be careful, if you work with Rstudio, it proposes to save this file in the “R” folder. Make sure that the file dev_history.R is stored at the root of the project

This file stores the documentation of your workflow. This makes the creation and development of your package reproducible

Do not forget to hide it from package build using usethis::use_build_ignore("dev_history.R")

Use it to store all your calls to usethis::, devtools::, attachment:: and co…

Fill in the project DESCRIPTION file

Give a title to your analysis

Write a description sentence. This sentence should finish with a point .. If you need to write the description on multiple lines, you need to indent the text with two extra spaces at the beginning of each new line.

Define the license

You may think about licensing all of your work. You never know who is going to read and re-use it in the future.

If you do not know what to choose for now, you can choose a proprietary license. Use License: file LICENSE in the DESCRIPTION file and then create a file called LICENSE (without extension) at the root of the project, containing for example:

Proprietary
Do not distribute outside of ThinkR.

Add the following lines in your dev_history.R script

Execute these lines each time you see: in this blog post. Including now.

Work with your data in a Rmd document almost like classical project development

The Rmd file is your sandbox before being the userguide of your work. Use it to try, explore, test your data and analyses. Once you are ready, follow the guide to clean up the content and store everything at the right place.

Create a small dataset

A small dataset will help in the documentation process

If you are building this package for a specific analysis for your work or for a client, you may think you want to you use your entire dataset. Directly manipulating a big dataset can be painful during code writing. You may face long computation time while debugging. Do not neglect the power of a small reproducible example ! It may not cover all future possible bugs, but this will accelerate development and help documenting the code.

As a start, you can store the dataset my-dataset.csv, in a folder named inst/example-data/. This will later allow to find this data using system.file("example-data/my-dataset.csv", package = "my.analysis.package").

There are different ways to store data inside a package. The proposed one is an easy way to start. When you are mature enough, you may have a look at chapter on External Data in “R Packages” book to decide on your way of storing the data.

“Rmd first”: Here starts the “Rmd first” method.

Not really really first, but almost first as we haven’t started code development yet. So, let us create a vignette. This is a classical RMarkdown file stored in a specific place and that will be shown in a specifically formatted HTML output.

usethis::use_vignette("aa-exploration")

If you choose a filename of your vignettes starting with two letters, this will be used to show them in the correct order in the documentation. Use the same in the VignetteIndexEntry field of the YAML of your vignette, but you can remove them in the title. I recommend using letters instead of numbers because of later use of {bookdown} with package {chameleon}.

It may now be necessary to install your package.

This will make your dataset available for the vignette.

You can use devtools::install(). Add it in your "dev_history.R" to execute when needed.

The “Rmd first” method looks like classical analysis

The workflow of an analysis often starts with data reading, cleansing, filtering, exploring, … These steps may be presented in a Rmarkdown file, so that you can create a report of your work and make it reproducible to others.
The “Rmd first” assumes that any project can start like data analysis. Whatever project you are doing, you will need some data, some reproducible examples to build it. Below is a tiny example of a Rmd build.

Load necessary packages at the top of your vignette

If you need more packages along the development, always go on top to add them so that future users won’t be surprised in the middle of the analysis

“Rmd cleansing”: Make it a proper a package

Everything you need to make a proper package is already set. We will now clean the Rmd. This means we will move parts of the code of the Rmd to the appropriate directory of the package. In the vignette, we will only keep the explanations and the code to use the main functions.

To help you not forget anything, use attachment::att_to_description().

regularly

Publish your documentation

As you created a package, you can use {pkgdown} to present the entire documentation in the same HTML site. The ‘Rmd first’ forced you to create vignettes that can also be presented as a HTML (or PDF) book, delivered as shareable userguides or as your analysis report.

Use {pkgdown} to present your documentation.

This contains vignettes, the Readme (if created) as a presentation of your study, fonctions and executed examples of the functions with outputs.

You can present it to your clients as the vignette is the output of your analysis

You can use chameleon::build_pkgdown() to build and keep the pkgdown site inside your package, so that your clients can call it offline with my.analysis.package::open_pkgdown()

Transform your vignettes into a real userguide (or analysis report) using {bookdown} with chameleon::build_book(). You keep it inside the package so that your clients can call it offline with my.analysis.package::open_book(), or you can deliver the book itself.

This is the reason why I named my vignettes starting with two letters like aa-my-analysis.Rmd

Combine with a {testdown} report to show your clients that you worked properly with unit tests for each function.

Everything is a package

A package is not only for archives stored on the CRAN that must be shared with the entire world. A package is a directory containing specific files and folders, organised so that anyone can find documentation, code and tests in a precise place. Hence, every project can be a package, so that tools built around packages will allow/force you to work in a reproducible way with re-usable project. Think package:

Package of data analysis for a client using ‘Rmd first’ method allows documentation of function for future analyses and creates a userguide and/or analysis report with {pkgdown}/{bookdown} for the client

Shiny App with {golem} and Rmd first allows documentation of internal functions for future debugging and improvements, as well as presentation of intermediate steps through static reports.

Package is a package. While using ‘Rmd first’, you write documentation along with your code.

Ressources

If you already wrote your analysis in a Rmd file and if you are now convinced that it would be more reusable while inside a package, I recommend to read Emily Riederer blog post entitled “RMarkdown Driven Development (RmdDD)”. She presented the same kind of ‘Rmd first’ approach and details the steps to clean your Rmd to make it a package.