What are the specific utilities that can help R developers code and debug more efficiently?

I'm looking to set up an R development environment, and would like an overview of the tools that would be useful to me in crafting a unit testing infrastructure with code coverage, debugging, generation of package and help files, and perhaps UML modeling.

Note: Please justify your answers with reasons and examples based on your experience with the tools you recommend. Don't just link.


@Brandon: RStudio is great, but as yet not enough for developing packages. It will get there for sure; I love what they've done so far.
– Joris Meys, Aug 1 '11 at 11:28





3 Answers

I have written way too many packages, so to keep things manageable I've invested a lot of time in infrastructure packages: packages that help me make my code more robust and help make it easier for others to use. These include:

roxygen2 (with Manuel Eugster and Peter Danenberg), which allows you to keep documentation next to the function it documents, which makes it much more likely that I'll keep it up to date. roxygen2 also has a number of new features designed to minimise documentation duplication: templates (@template), parameter inheritance (@inheritParams), and function families (@family), to name a few.
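To illustrate, here is a minimal sketch of a roxygen2-documented function (the `add()` function itself is a made-up example):

```r
#' Add two numeric vectors.
#'
#' @param x A numeric vector.
#' @param y A numeric vector.
#' @return The elementwise sum of \code{x} and \code{y}.
#' @export
#' @examples
#' add(1, 2)
add <- function(x, y) x + y
```

Running `roxygen2::roxygenise()` (or `devtools::document()`) on a package containing this file turns the `#'` comments into the corresponding `man/add.Rd` help file and `NAMESPACE` entry.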

testthat automates the testing of my code. This is becoming more and more important as I have less and less time to code: automated tests remember how the function should work, even when I don't.
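A minimal testthat sketch, assuming the testthat package is installed (`add()` is again a hypothetical function under test):

```r
library(testthat)  # assumes the testthat package is installed

add <- function(x, y) x + y  # hypothetical function under test

test_that("add() sums its inputs", {
  expect_equal(add(1, 2), 3)
  expect_equal(add(c(1, 2), c(3, 4)), c(4, 6))
  expect_error(add("a", 1))  # non-numeric input should fail
})
```

In a package, tests like these live under `tests/testthat/` and run automatically as part of `R CMD check`.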

devtools automates many common development tasks (as Andrie mentioned). The eventual goal for devtools is to act like an R CMD check that runs continuously in the background and notifies you the instant something goes wrong.

apropos: I'm always forgetting the names of useful functions, and apropos() helps me find them, even if I only remember a fragment of the name.
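For example (the exact matches depend on which packages are attached):

```r
# apropos() searches the names of objects on the search path
apropos("read")    # matches read.csv, read.table, readline, ...
apropos("^file")   # it accepts regular expressions too
```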

Outside of R:

I use TextMate to edit R (and other) files, but I don't think the choice of editor is really that important. Pick one and learn all its nooks and crannies.

Spend some time learning the command line. Anything you can do to automate any part of your workflow will pay off in the long run. Running R from the command line leads to a natural process where each project has its own instance of R; I often have 2-5 instances of R running at a time.

Use version control. I like git and github. Again, it doesn't matter exactly which system you use, but master it!

Interesting - especially the metadata standard for describing data frames. This is not that hard - at least an initial cut that takes care of 80% of cases. What usage scenarios are you thinking of? One idea might be a metadata data frame (which has strict semantics) that can be attached to the data source. Thoughts?
– Krishna Sankar, Jul 28 '11 at 2:29


The command line bit can't be overstated. One can increase one's productivity by an order of magnitude with basic Unix CLI knowledge.
– geoffjentry, Jul 28 '11 at 3:28

For metadata on data frames, can we use comment(dataframe$column) to annotate each column with the proper format?
– Mark, Jul 28 '11 at 22:58



My answer is Emacs with ESS (Emacs Speaks Statistics), which:

is cross-platform just like R, so you have a similar user-interface experience on all relevant operating systems

is widely used, widely available, and under active development, for both the core and its extensions; see the emacswiki.org site for the latter

<tongueInCheek>is not Eclipse and does not require Java</tongueInCheek>

You can of course combine it with whichever CRAN packages you like: RUnit or testthat, the different profiling support packages, the debug package, ...

Additional tools that are useful:

R CMD check really is your friend as this is what CRAN uses to decide whether you are "in or out"; use it and trust it

the tests/ directory can offer a simplified version of unit tests by saving output to compare against (from a prior R CMD check run); this is useful, but proper unit tests are better

particularly for packages with object code, I prefer to launch fresh R sessions and littler makes that easy: r -lfoo -e'bar(1, "ab")' starts an R session, loads the foo package and evaluates the given expression (here a function bar() with two arguments). This, combined with R CMD INSTALL, provides a full test cycle.

Knowledge of, and ability to use, the basic R debugging tools is an essential first step in learning to quickly debug R code. If you know how to use the basic tools you can debug code anywhere without having to need all the extra tools provided in add-on packages.
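The first of these tools is traceback(), which prints the call stack after an error. A minimal sketch (foo() and bar() are hypothetical functions with a deliberate bug):

```r
foo <- function(x) bar(x)
bar <- function(x) x + "a"   # deliberate bug: non-numeric argument

# Calling foo(1) in an interactive session raises:
#   Error in x + "a" : non-numeric argument to binary operator
# and traceback() then shows the call stack, innermost call first,
# roughly like:
#   2: bar(x) at #1
#   1: foo(1)
```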

So we can clearly see that the error happened in function bar(); we've narrowed down the scope of the bug hunt. But what if the code generates warnings, not errors? Those can be handled by turning warnings into errors via the warn option:

options(warn = 2)

will turn warnings into errors. You can then use traceback() to track them down.
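A small demonstration (using log(-1), which normally only warns about NaNs being produced):

```r
options(warn = 2)                 # promote warnings to errors
res <- tryCatch(log(-1),          # log(-1) normally warns "NaNs produced"
                error = function(e) conditionMessage(e))
res                               # the warning now surfaces as an error message
options(warn = 0)                 # restore the default behaviour
```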

Linked to this is getting R to recover from an error in the code so you can debug what went wrong. options(error = recover) will drop us into a debugger frame whenever an error is raised:
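A sketch of what this looks like, again with hypothetical foo()/bar() functions (recover() itself is interactive, so the prompt below is shown as comments):

```r
options(error = recover)

foo <- function(x) bar(x)
bar <- function(x) x[, 1]   # fails if x is a plain vector, not a matrix

# Interactively, foo(1:3) now drops into the debugger, roughly:
#   Error in x[, 1] : incorrect number of dimensions
#   Enter a frame number, or 0 to exit
#   1: foo(1:3)
#   2: bar(x)

options(error = NULL)       # restore normal error handling
```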

You see we can drop into each frame on the call stack and see how the functions were called, what the arguments were, etc. In the above example, we see that bar() was passed a vector, not a matrix, hence the error. options(error = NULL) resets this behaviour to normal.

Another key function is trace(), which allows you to insert debugging calls into an existing function. The benefit of this is that you can tell R to debug from a particular line in the source:
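A sketch with a hypothetical f(); here the tracer is a non-interactive cat() call so the effect is visible in a script (use tracer = browser instead to get an interactive prompt at that point):

```r
f <- function(x) {
  y <- x * 2
  z <- log(y)
  z
}

# Insert a tracer before step 3 of f()'s body, i.e. just before `z` is
# evaluated, at which point `y` already has its value:
trace(f, tracer = quote(cat("y is now", y, "\n")), at = 3)
f(4)          # prints the trace message, then returns log(8) as usual
untrace(f)    # remove the instrumentation again
```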

This allows you to insert the debugging calls at the right point in the code without having to step through the preceding function calls.

If you want to step through a function as it is executing, then debug(foo) will turn on the debugger for function foo(), whilst undebug(foo) will turn off the debugger.
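For example (foo() is hypothetical; the browser session itself is interactive, so it is shown as comments):

```r
foo <- function(x) {
  y <- x^2
  y + 1
}

debug(foo)     # the next call to foo() will single-step in the browser:
# foo(2)
# debugging in: foo(2)
# Browse[2]>   n (next line), c (continue), Q (quit)
undebug(foo)   # switch the debugger off again
```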

A key point about these options is that I haven't needed to modify or edit any source code to insert debugging calls. I can try things out and see what the problem is directly from the session where the error occurred.

For a different take on debugging in R, see Mark Bravington's debug package on CRAN.