Posted by timothy on Thursday February 11, 2016 @02:53PM from the Is-it-true-that-R-is-jealous-of-U? dept.

Andy Nicholls has been an R programmer and consultant for Mango Solutions since 2011 (where he currently manages the R consultancy team), after a long stint as a statistician in the pharmaceutical industry. He has a serious background in mathematics, too, with a master's degree in mathematics and another in Statistics with Applications in Medicine. Andy has taught more than 50 on-site R training courses and has been involved in the development of more than 30 R packages; he's also a regular contributor to events at LondonR, the largest R user group in the UK. But since not everyone can get to London for a user group meeting, you can get some of the insights he's gained as an R expert in Sams Teach Yourself R In 24 Hours (available in print or at Safari), of which he is the lead author. Today, though, you can ask Andy directly about the much-lauded, statistics-oriented free software (GPL) language: why to use it, how to get started, how to get things done, and where those intriguing release names come from. (The about page is helpful, too.) As usual, please ask as many questions as you'd like, but one question at a time, please.

Note: Slashdot is always looking for interesting interview guests. Who do you want to ask? Let us know!

How has the way you use R changed over time? For my part, I don't think I've gone through an entire R session in the past six months without loading dplyr. Combine that with the pipeline operator, and I think that if you had shown the me of two years ago the R code I wrote yesterday, I wouldn't have believed it was the same language.
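For readers who haven't seen the style described above, here is a minimal sketch comparing a nested base-R summary with the same computation written as a dplyr pipeline. It assumes the dplyr package is installed (dplyr re-exports magrittr's `%>%` pipe); `mtcars` is a data set that ships with R, and the variable names are illustrative.

```r
# Assumes dplyr is installed; it re-exports the %>% pipe from magrittr.
library(dplyr)

# Base R: the filtering step is nested inside the aggregation call
base_style <- aggregate(mpg ~ gear, data = subset(mtcars, cyl == 4), FUN = mean)

# dplyr + pipe: the same summary reads top to bottom, one verb per step
piped <- mtcars %>%
  filter(cyl == 4) %>%          # keep four-cylinder cars
  group_by(gear) %>%            # one group per gear count
  summarise(mean_mpg = mean(mpg))

print(base_style)
print(piped)
```

Both produce one row per gear value; the pipeline version is what makes code written this way look so different from older R.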

What's your take on the future of R? It used to be a tool for statisticians, and now it's been discovered by programmers. As a statistician who's not a programmer, but who hangs out on Slashdot and Stack Overflow now and then, I sometimes feel it's in danger of becoming just another language for programmers, instead of a tool for statisticians. Should I be worried? Can it be both? Is this mass influx of programmers going to change it somehow? Or am I just having a "get off my lawn" moment?

As a statistician who's not a programmer, but who hangs out on Slashdot and Stack Overflow now and then, I sometimes feel it's in danger of becoming just another language for programmers, instead of a tool for statisticians.

As a programmer who used to research programming languages, I can tell you there's no danger of that at all.

It's not much of a stretch to say that no programmer really uses R. At most, programmers use the high-quality statistical libraries which only work with R. R is basically the best statistical packages ever written, bound together by one of the worst programming languages ever developed.

I actually program exclusively in R and find it OK once you learn the quirks. Where it excels is in sort of "jotting" down thoughts about programs. For example, you can define an S3 class and then make an object of that class that has only a few of its properties, or claim your object is a class it is not. This would drive any Java programmer bananas, but it's super nice for going fast and loose.
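A minimal sketch of the loose S3 usage described above: an S3 class is just a `class` attribute on an ordinary object, so nothing stops an object from claiming a class while lacking fields that a method expects. The `point` class and `print.point` method are hypothetical names chosen for illustration.

```r
# An S3 "class" is only a class attribute attached to an ordinary list
point <- structure(list(x = 1, y = 2), class = "point")

# print() dispatches to print.point for anything whose class is "point"
print.point <- function(x, ...) {
  cat("point(", x$x, ",", x$y, ")\n")
  invisible(x)
}

# This object claims to be a point but has no y field at all;
# R does not complain, and x$y is simply NULL inside the method
partial <- structure(list(x = 5), class = "point")

print(point)    # point( 1 , 2 )
print(partial)  # cat() silently drops the NULL y
```

A Java compiler would reject `partial` outright; R only notices if a method actually needs the missing field in a way that errors.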

Similarly, the fact that it can recover your call in addition to the arguments you passed makes several functions work much better when you have

I actually program exclusively in R and find it OK once you learn the quirks.

I dunno -- there's an awful lot that's cumbersome about R and constantly does my head in. My pet bugbears:

- No native hash/dictionary construct (there is the third-party hash library, but that's not great for portability).
- It's not possible to define functions at the end of your code, making code difficult to read (or requiring you to source a separate script that contains your functions, but again, portability suffers).
- Variable scoping is... odd (many people have written previously about R quirks in this re
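One common workaround for the missing dictionary, offered here as a suggestion rather than anything the commenter proposed, is to use a base R environment as a string-keyed hash table, which needs no third-party package. The `counts` name and the keys are illustrative. Note that environments are mutable reference objects, unlike lists, and that `inherits = FALSE` keeps lookups from falling through to variables in enclosing environments.

```r
# Base R environments are hashed by default (new.env(hash = TRUE)),
# so they can act as a string-keyed dictionary without extra packages.
counts <- new.env(hash = TRUE)

# Insert or update values by key (three equivalent forms)
assign("apple", 3, envir = counts)
counts[["banana"]] <- 5
counts$cherry <- 7

# Look up a key, returning a default when it is absent;
# inherits = FALSE stops the search from reaching enclosing environments
print(get0("apple", envir = counts, inherits = FALSE, ifnotfound = 0))
print(get0("durian", envir = counts, inherits = FALSE, ifnotfound = 0))

# Membership test and key listing (ls() returns the names sorted)
print(exists("banana", envir = counts, inherits = FALSE))
print(ls(counts))

# Delete a key
rm(list = "banana", envir = counts)
```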

While I am really only dipping my toe into R, I decided to do some research on this question a while back.

I have used Python for a number of scientific applications and was trying to decide whether I should use Rpy2 (http://rpy2.bitbucket.org/). It initially made sense to keep all of the data retrieval, formatting, and analysis in a few Python scripts. However, it seems that the design of the R language intrinsically accounts for the problem-solving methodology: "R is designed to operate the way that proble

To add on: R is gaining massive traction in graduate programs, but so many professors teach it as if it were SPSS, almost as a cargo-cult coding language, and so much of the documentation is written for people who are already experienced coders. Is there any decent introduction to R for someone who doesn't already know it (or another programming language) fluently?

There's an entire book, The R Inferno, dedicated to R's many "quirks" and problems. Are there any plans to dedicate some time to cleaning up the language and making it less painful to use?

In my experience (from searching for R advice online; I've never mailed the R discussion list myself), the R community is incredibly harsh and unforgiving toward new users. Answers to beginners' questions are normally brusque, often extremely so. (I remember one exchange where a user basically asked "I've read the documentation for par, and I don't understand...", and the response was, in its entirety, "?par", which, for those unfamiliar with R, is the command to bring up the documentation for par.)

As a statistician: someone not trained in statistics using statistical methods when they don't understand the concepts in that mathematically dense paper from 1963 is a dangerous thing. If you want me to be your statistics consultant, pay me my consulting rate. I don't generally consult for free, on the r-help mailing list or elsewhere.

If you don't understand that 1963 paper, you need a statistics consultant. Don't expect someone to do your statistical work for free.

I encountered R via Johns Hopkins University's data science series of Coursera courses, which I highly recommend. The first one is at https://www.coursera.org/learn...

As mainly a Python programmer, but someone with an eclectic interest in programming languages (I enjoy Prolog, Lisp, ML...), I've found R very intriguing: it's a very "functional" programming language, but also object-oriented (using dollar signs instead of the customary dots). I've also found R to be incredibly quick -- provided you know and use

R has been around longer than Java and is based on S, which is older than C++. There's a huge body of existing code and libraries to leverage. But from what I gather, the real reason to use R is that the only other option you're being offered is SAS, and you don't want to deal with that mess! Or so I hear.

Bottom line, if you're not being threatened with SAS, there may be little reason to learn R. But if you are, or if you think there's any danger you might be, R is probably something you want to learn AS

I feel that one of the weakest points of R is its error handling, reporting, and debugging facilities. Do you have advice on tools or techniques for people coding in R (aside from using RStudio)? Are there plans for improvements in this area? The current facilities are reminiscent, at least to me, of using gdb back in the 1990s.

I have in mind cases like the following, in which confusion about list access with the [ operator (when [[ should have been used) produces a cryptic error message with no traceback available.
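The commenter's original example did not survive, so what follows is a hypothetical stand-in, not the poster's code, showing a typical case: `[` on a list returns a sub-list, so arithmetic on it fails with a message that never mentions lists, while `[[` returns the element itself. Calling `str()` on the suspect expression is one quick way to see which of the two you actually have.

```r
# A named list holding a numeric vector and a character vector
x <- list(a = 1:3, b = c("u", "v"))

str(x["a"])    # single brackets: a list of length 1 wrapping the vector
str(x[["a"]])  # double brackets: the integer vector itself

# Works: [[ extracts the vector, so arithmetic is elementwise
ok <- x[["a"]] + 1
print(ok)

# Fails: [ returns a list, and `+` is not defined for lists.
# tryCatch() captures the error message instead of stopping the script.
msg <- tryCatch(x["a"] + 1, error = conditionMessage)
print(msg)
```

The captured message complains about a "non-numeric argument to binary operator", which says nothing about the real mistake of using single brackets.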

I've complimented your work in the past, as a matter of fact. I'm sorry if you did not see it. It's not the application that's the problem, but the personality attached to it. If you read my posts on other topics, not related to you, you'll find that I'm quite often the reasonable one in the room. That leads me to wonder if that's really any different here.

Hopefully it wasn't me who drove you to that drink. I'm more of a gin man, myself, but I do enjoy a good rum; might I ask what you're pouring tonight?

And this is why I was trying to point out that my initial comment, months ago, was indeed a joke and not a directed attack. Ya gotta admit, ya jumped in pretty heavy at the onset.

We good?

Cruzan is good stuff, definitely one of my choice rums when I go that route. Getting any "spendier" than that is just for show. My gin of preference is Citadelle; I bought a bottle of Tanqueray #10 at 4x the cost per ounce one night when I wanted to indulge, and it's ended up being a showpiece, certainly not best of bre

Right, and I'd mod myself down, too... mhm, sure. I have one account, one single account, a fact which only Slashdot staff will be able to prove or disprove, much like your claim that I have multiple sockpuppet accounts. You're playing yourself for stupid.

When did I bring up statistics, other than pointing out that R is a statistical analysis language? Whichever AC said that, I can assure you it was not me, just as whoever modded both of us down was not me. I think your "disappointment" is misdirected; you're not family and clearly have no interest in being a friend, though, so your disappointment really doesn't mean much to me. Sorry about that.

Everything you are referring to was posted before we supposedly made amends and had already been replied to by you.

Your POST HISTORY SHOWS YOU CONSTANTLY COMING IN AFTER I HAVE BEEN IN POSTS TOO

I stood up for you in one post and directly replied to you in another, in this very topic. Aside from that, there was another thread a few days ago where we interacted, and I made one off-the-cuff remark about wishing you'd leave me alone (in a thread where that type of comment was actually quite relevant), which was also made during that little tiff.