Learn more about using open source R for big data analysis, predictive modeling, data science and more from the staff of Revolution Analytics.

January 15, 2014

In data scientist survey, R is the most-used tool (other than databases)

O'Reilly has just published the results of the Data Scientist Salary Survey, based on data collected from attendees of the O'Reilly Strata conferences in 2012 and 2013. There were some interesting results from the salary portion of the survey:

On that last point, the tool usage section of the survey also held interesting results. Each respondent listed multiple tools that they used both in data roles and non-data roles, and the results are summarized below:

That SQL tops the list is no surprise: most data scientists need to access a database at some point. But of non-database tools, R is the most-used tool, closely followed by Python. From the survey report:

The preponderance of R and Python usage is more surprising: operating systems aside, these were the two most commonly used individual tools, even above Excel, which for years has been the go-to option for spreadsheets and surface-level analysis. R and Python are likely popular because they are easily accessible and effective open source tools for analysis.

It's also interesting to note that the "traditional" proprietary data analysis tools, SAS and SPSS, fall at the bottom of the list. This isn't a random sample by any means — the attendees at Strata are heavily weighted towards US-based startups — but it's certainly indicative of where the market for data analysis products is going. R is also the top-ranked data analysis tool in recent surveys by KDNuggets and Rexer Analytics.

You can download the full report (free registration required) from the O'Reilly website at the link below.

Comments

You need to strip out the Revolution URL fragment from the O'Reilly link. And, so far as the bias goes, it's also a heavily OS slanted group, so the lack of SPSS/SAS/BMDP/etc. should come as no surprise.