Research, teaching, and mentorship in the sciences

What are the best ways to learn R?

Over my year of sabbatical, I planned to become comfortably proficient with data manipulation and analysis in R. I'm getting there. (I was doing a lot more over sabbatical, of course, but this was one of my main objectives.) I figure it'll take at least a few more manuscripts before I'm comfortable. Since I really should be cranking out a dissertation's worth of stuff in the next year, I'll have plenty of opportunity to get better, and the rate-limiting step for me is sorting out the code.

I’m going to briefly describe the resources I’ve used, and the challenges and constraints I’ve faced — but the main reason I’m writing this post is that I’m hoping the comments will be a clearinghouse of suggestions and perspectives. A lot of y’all did things differently than me, and some of you are literally pros at teaching this stuff to others. Please, leave a comment with your thoughts about how it’s best for all kinds of folks to learn R, because I imagine that’s where this post will be most valuable to others.

I think my circumstance is rather common, but it's not one I hear folks discussing much. I went to grad school when R was not much of a thing, finishing up in 1999. In the early 2000s, I spent a bit of time trying to get ramped up, but didn't get far once I realized it involved delving into the documentation for S. As a PI at a primarily undergraduate institution, I've been running my lab steadily, running stats with a range of tools that worked just fine for me. Over time, more and more ecologists have started using R. In recent years, for many approaches, it has switched from a tool of choice to a tool of no-other-choice.

A lot of folks in my boat don't have a community of savvy colleagues around who can give tips and troubleshoot off the cuff. My department now has a couple of recently hired professors who are good to great with R, but there really isn't anybody else I can readily consult in person. So this is pretty much a solitary endeavor. I was really interested in attending a week-long short course, since I've heard how useful they are, but I just couldn't find that week in my schedule, even on sabbatical.

If I am going to keep working on what I've been working on, I've now got to use R. It's not a bad thing; it's just, well, cumbersome to pick up this tool when I've already mastered a toolset that did what I needed. A lot of other ecologists from my generation aren't exactly in this situation: they may or may not have learned R, but most of their analyses and figures are run by members of their labs. I've talked to a lot of mid-career and senior folks who concede they should learn R, but for them it's less of a pressing need because not knowing it doesn't grind the work in their lab to a halt. I relied on collaborators for this kind of stuff intermittently, but it became clear long ago that this was no longer sustainable. The upshot is that I spent my sabbatical learning things that people now typically learn in grad school, often after getting a taste in their undergrad work. And until my department gets on the same page and teaches these skills to all of our students as part of the curriculum (which won't happen overnight by any means), I'm going to be the guy who teaches my lab members to do this, that is, if it's something we make happen at all, because I won't be available enough to help my students troubleshoot very often, even if my help would be useful at this point.

I’m coming at this from the perspective of being very comfortable with statistics. A lot of the resources out there for teaching R are teaching statistics as much as they are teaching how to code usefully in R. This totally makes sense for junior scientists who are learning the theory and the math while they are learning to code in R, but this also means a lot of the materials out there aren’t built with me in mind.

How did I go about starting out?

I started with a book: the second edition of Beckerman et al.'s Getting Started With R. It did precisely what it said it would do: it got me started with R. I was stationed at a desk for a few straight days, mostly walking through the lessons bit by bit, occasionally dipping into my own datasets and looking other stuff up along the way. In my experience, if you know stats but don't know R, this book will get you to a point where you're comfortable with the basics. The book points out a lot of obvious things in a very obvious way and might seem slow in this regard, which I found to be a feature and not a bug. I breezed through it, and by the end I was good to go with the basics. The book also pointed me toward some standard resources. When googling how to do something in R, you often come upon useful information on Stack Exchange. Also, the RStudio cheatsheets are super helpful.

Having gotten a little familiar with base R, I'm much happier working with dplyr and the 'hadleyverse' (now usually called the tidyverse) as much as possible.

I'd also like to point out that I'm under no illusion that using R is anything more than an incremental step to keep from becoming outdated, and this is a good thing to keep in mind regardless of your career stage. I imagine the standard might soon be Python or something else that isn't even on the horizon yet. It might not be long before R is uniformly seen as the antiquated way of doing things. In a few decades, you might look back at your R code the way senior folks now chuckle about punch cards and Fortran.

This site had a discussion about good ways to learn R four years ago. Since then, the tools and support for using R have evolved substantially, but it still might be useful to see the comments from back then. I'm hoping new comments on this post can help steer folks in useful directions.

By the way, this post is coming out just as the annual meeting of the Ecological Society of America is ramping up, which means a lot of folks probably won't see it right away (and it's probably why I won't be commenting or responding much either). If you happen to be at ESA, please do say hi if you see me around! And if you wake up early Thursday morning, feel free to catch my Ignite talk.

Wow – everyone must be busy at the meeting. My response is not really any different from what it was four years ago. Just do it.

0) Start with the DataCamp Intro to R if you have never programmed.
1) Read Andy Hector's The New Statistics with R, because he does a good job explaining the output of lm. This book is very short and you can get through the whole thing in a couple of long days. But that is all you need before step 2.
2) Dive into a real project, not a long book. Resist the urge to analyze the data using your comfort-zone software, other than to check results.
3) Comment heavily.
4) Make your own FAQ of successful code snippets, organized your way.
5) Google is your best friend.
6) Follow https://www.r-bloggers.com daily.
7) Follow up and read links to blogs on best practices, especially for reproducible research.
8) Repeat #1–7, including scanning back through Hector's book.

Once you have some familiarity, I would start watching more advanced Codecademy or Coursera tutorials and scanning other books that emphasize good practice. It's a long, spiraling process.

The really strong argument for scripting is the ease and elegance of a reproducible research pipeline. I'd strongly encourage RStudio, knitr, and GitHub.

To follow up… someone who is very comfortable analyzing data using some other system will constantly be hearing in their mind, "This is highly inefficient. Why am I doing this? I could be done by now if I were using xxx. My productivity is sinking faster than the Titanic." There is no learning R in 10 days, especially if you are not coming in with programming skills. If you really want to make the switch, you just have to accept that you will be inefficient and your productivity will drop for many (many!) months.

I think the best way to learn R is to have a need for it. If you understand the basic premises (building up a function, a loop, etc.), it will come with time. I understand this is not a very satisfying answer, since you will always feel nervous when a new problem comes up. But I would say that you will rise to the occasion each time, and eventually you will master R.

Anyway, while doing my Master's I did part of this specialization [see url below], and a few years back it had a chunk of theory followed by a 30–40 minute practical lesson using R. Although I was also interested in the theoretical statistics, in the practical lessons they would provide a script and run it with you, teaching you about the arguments you could use to modify a given test and how to interpret everything that came up on the screen when you ran a GLM, an ANOVA, whatever.

In terms of strategy, I took a gradual approach. At first I kept running my stats in the software I was familiar with, but decided I would make all my graphs in R. This was good because it gave me practice coding while keeping things manageable and producing a satisfying end product (the nice online book 'R for Data Science' actually takes a similar approach). Then I moved to doing my simplest statistical analyses in R, things that were easy using base R and whose results were easily verifiable (basic ANOVA, etc.). Finally, I moved into the realm of more advanced analyses and have been in the black hole ever since!