R vs. Python?

Both R and Python are popular languages for data analysis. From what I understand, Python is a great general-purpose language, while R's functionality was developed specifically with statisticians in mind. I've heard people argue both sides, so I'm wondering: which is better for daily use?

It depends on what you mean by "daily use". Here are a couple of scenarios:

1. If you are building a general web platform with many user-engagement use cases beyond data and statistical dashboards, Python will be more useful: it has full-stack web frameworks that help with web development, and its web-dev ecosystem is more productive than R's.

2. If your daily work and product require a lot of data analysis and predictive modeling on large data sets, I'm biased toward R: I find it better suited, and it makes those goals easier to reach.

I would say it depends on what you are trying to do. I use both R and Python with scikit-learn. If I am just doing statistical modeling or data mining, I prefer R. If, however, I need the analysis to be part of a web app, I prefer Python.
But the bottom line is I can probably achieve the same results from the analysis perspective using either one.
Ana

I did a PhD in statistics. Everyone used R. I didn't know R (I was not a stats undergrad), and it seemed magical: everyone was using it to solve everything. So, I invested time learning it.

I was pretty disappointed. It really seemed like the result of a small community only knowing a single scripting language. You can do pretty much anything with pretty much any language. Why would you want to, though? This isn't a case of the best tool winning; it's just the only scripting tool that community knew (or was at the time; I think that's changing, mercifully).

If you already know R, can accomplish a task with R, and don't know python, I can't see a reason for you not to just use R to solve your problem.

If you already know python, then check out pandas and numpy/scipy. When I was in grad school, these tools didn't exist, and as a result, I would have told you then that it made more sense to use the packages already in R than code the specialized routines you needed in another language. Even so, R is just awful at manipulating data; I'd usually manipulate the data into the form I wanted outside R, then use read.table to read it in and pass it through the least amount of R code I needed to get the analysis done. I was hardly alone: in fact, many of my fellow grad students just wrote everything in C++ for their dissertation, using R just as a way to easily bang out graphs when needed.

Now that these python-based tools and libraries exist, however, I see no reason for a python programmer to not turn to them first, regardless of what you may hear about R.

If you do not know either R or python, please just learn python with pandas; this is the future. There is nothing inherent to the R language that makes it superior; it just has a lot of packages already written for it. However, that advantage decreases every day as more people contribute to pandas and numpy. I love stats, but the ideas behind statistical analysis aren't "owned" by a programming language. Python didn't really exist when S (the precursor to R) was created. S-PLUS and then R had real advantages over other script-based languages for a long time. That's just no longer the case.

Python can realistically be used for 20 other things, unlike R, and the reality of analysis is usually that more than 50% of the work is getting the data into a usable form. R just fails at this. As a result, I used a lot of awk and sed; but python will get things done too. I only turned to awk and sed because R was so terrible at manipulating real-world raw data. R does a fine job at analysis once you have things in table form, but it doesn't do a better job at it than python if the routine exists in both languages (and, unless you're doing something pretty obscure at this point, it likely does).
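As a rough illustration of the awk/sed-style cleanup described above, here is a minimal standard-library Python sketch. The raw input format (comment lines starting with `#`, blank lines, fields separated by an inconsistent mix of commas, semicolons and whitespace, and `NA` for missing values) and the function names `clean_lines` and `to_table_text` are hypothetical, chosen only for the example. The second function writes whitespace-separated text with `NA` for missing values, which is the kind of file R's `read.table` reads with its default settings.

```python
import re
from typing import Iterable, List, Optional

# Hypothetical raw format: fields separated by commas, semicolons, or runs of whitespace.
FIELD_SEP = re.compile(r"[,;\s]+")

Row = List[Optional[float]]

def parse_value(token: str) -> Optional[float]:
    """Convert one token to a float; treat 'NA' or an empty token as missing (None)."""
    if token.upper() in ("NA", ""):
        return None
    return float(token)

def clean_lines(lines: Iterable[str]) -> List[Row]:
    """Turn messy text lines into rows of numbers, as an awk/sed pipeline might."""
    rows = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines (similar to `sed '/^#/d'`).
        if not line or line.startswith("#"):
            continue
        rows.append([parse_value(t) for t in FIELD_SEP.split(line)])
    return rows

def to_table_text(rows: List[Row]) -> str:
    """Render rows as space-separated text, writing None as 'NA' for read.table."""
    return "".join(
        " ".join("NA" if v is None else repr(v) for v in row) + "\n" for row in rows
    )

raw = """# sensor export
1.5, 2.0;  3
4\tNA 6

7;8,9
"""
print(to_table_text(clean_lines(raw.splitlines())), end="")
```

The same few lines could, of course, hand the cleaned rows straight to pandas instead of writing a file for R.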

I really don't see a trade-off on this one. Unless you already know R for some reason, I believe the answer to your question is python, full stop.

Actually, I find Go perfect for working with and pushing around big data on the web (I think it has specific benefits for networking and parallel processing, among many others). But if you are choosing only between R and Python for big data, it honestly depends. Python is likely to have a much larger community and package ecosystem that you can leverage.

That really means a lot for a business. R is great, but if it's too obscure, everything must be done from scratch, or you have a hard time hiring programmers, is it really worth it? To be frank, it's more for math or academia and less for building a business.

That's another reason I reach for Go as well: it's gaining a lot of traction and there are a lot of packages, but most important of all, it's fast to build things with. It is wonderful for building business applications quickly.

I second Benjamin's opinion. Scripting in a general-purpose language that has libraries like pandas is nearly always a better experience than working in a special-purpose language that was extended after the fact to be general-purpose.

Just one example to illustrate the point: in R, certain operations on a data frame return lower-dimensional objects, and sometimes they don't. I think the rules originated when the operators were specialized statistical steps. Since then, R has been extended to handle all the things general-purpose languages do, but not in the simplest, cleanest way. In Python, the language itself was designed cleanly first and the pandas DataFrame was added on top, so it doesn't 'pollute' other operations (like textual manipulation of data in a file).
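To make the comparison concrete: in R, `df[, 1]` on a data frame returns a plain vector unless you pass `drop = FALSE`, while `df[, 1:2]` stays a data frame. pandas also reduces dimensions on selection, but the rule depends only on the kind of selector: a single label gives a `Series`, a list of labels gives a `DataFrame`. The minimal sketch below (the column names are made up for the example, and it assumes pandas is installed) shows that rule.

```python
import pandas as pd

df = pd.DataFrame({"height": [1.6, 1.8], "weight": [60, 75]})

# A single label selects one column and returns a 1-D Series.
col = df["height"]
# A list of labels always returns a 2-D DataFrame, even with one element.
sub = df[["height"]]

print(type(col).__name__, col.shape)  # Series (2,)
print(type(sub).__name__, sub.shape)  # DataFrame (2, 1)

# Row selection with .loc follows the same rule.
print(type(df.loc[0]).__name__)    # Series
print(type(df.loc[[0]]).__name__)  # DataFrame
```

Whether that is "cleaner" than R's `drop` behavior is a matter of taste; the point is that the result's shape is predictable from the selector alone.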

Hasan noted that Python's graphing is primitive compared to R's. I agree on this point.

I generally write a small python function that dumps the R statements into a file in /tmp and then invokes R on that file. (Once this is done, that graphing tool is available directly within python.)
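A minimal sketch of that workaround, assuming R is installed and the `Rscript` executable is on the `PATH`. The function names `write_r_script`, `run_r` and `r_vector` are hypothetical, and `tempfile` is used instead of a hard-coded /tmp path so the script lands in the system temp directory (which is /tmp on most Unix systems).

```python
import os
import shutil
import subprocess
import tempfile
from typing import List, Sequence

def write_r_script(statements: List[str]) -> str:
    """Write R statements to a temporary .R file and return its path."""
    # delete=False keeps the file after closing so Rscript can read it.
    with tempfile.NamedTemporaryFile("w", suffix=".R", delete=False) as f:
        f.write("\n".join(statements) + "\n")
        return f.name

def r_vector(values: Sequence[float]) -> str:
    """Format Python numbers as an R vector literal, e.g. c(1, 2.5)."""
    return "c(" + ", ".join(str(v) for v in values) + ")"

def run_r(statements: List[str], rscript: str = "Rscript") -> str:
    """Run the statements with Rscript and return R's standard output."""
    path = write_r_script(statements)
    try:
        result = subprocess.run(
            [rscript, path], capture_output=True, text=True, check=True
        )
    finally:
        os.remove(path)  # clean up the temporary script
    return result.stdout

# Usage: only runs if Rscript is available; writes plot.png in the working directory.
if __name__ == "__main__" and shutil.which("Rscript"):
    xs, ys = [1, 2, 3, 4], [1, 4, 9, 16]
    run_r([
        'png("plot.png")',
        f"plot({r_vector(xs)}, {r_vector(ys)}, type = 'b')",
        "invisible(dev.off())",
    ])
```

Because the intermediate file is an ordinary R script, it can be inspected or rerun by hand when a plot does not come out as expected.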

Hasan also noted other statistical functions that R has and python does not. Certainly true, but if you listed the algorithms in scipy and scikit-learn, I am positive many of them would not be found in R.

My only disclaimer: I am not a hard-core stats guy. I am doing ML and lots of data preprocessing.

So I cannot assess the completeness of the Python environment from the perspective of a stats guy.

I got degrees in Statistics as well as Computer Science. I love and use R for exploration, and once I have played with the data and figured out which model would generalize best, I use python to create a production version of the algorithm that scales.

If you do not want to learn python you may be able to go very far using Revolution Analytics support. However, I just prefer rewriting in python as it allows me to be more in control of the various optimizations at scale.
