Category: Data Science

For Data Science and Beyond!

R or Python for Data Science

So you want to be a data scientist, but are stuck on the first polarizing decision of learning R or Python… I’m going to try to help you!

I’m also going to try not to make this post follow the typical “R is maybe the best, or Python, or neither” pattern, as I find these kinds of articles informative, but not that helpful. If you want a breakdown of R and Python’s strengths, ease of learning, salaries, etc., you can find so many with Google; I’ll even save you the work of typing it in – R or Python. But while their infographics are detailed and interesting, reviewing them does not help you decide which to start learning. So that is what I’ll try to do here, in just two simple questions.

Disclaimer: I’m neither an R nor a Python expert, but I think my little bit of experience with both can set you on the right path.

1. Do you have a data set and a problem in mind?

Yes – R

No – Python

I feel this question may split developers and academics (generalizing a lot here). Academics typically have a thesis, which is a set problem they want to solve, and are looking to a data science language to beef up what they might have tried to do in Excel. In contrast, developers may want to find work with a company and be looking to a data science language to add valuable insight to business data. With these two needs in mind, I think the academic approaching a problem with mathematical rigor could find R a great place to start, and the developer looking to hack together business data could find Python great. Now obviously, I’m making some assumptions here, but if you can see yourself fitting into either camp, that would be my advice.

TL;DR: Choose R – If you have a dataset (sensorlogs.csv, or a database). You’re going to be able to get up and running very quickly and start answering questions with R. Choose Python – If you need to scrape a website or hack together program outputs, you may need the flexibility of Python to get things together.
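To make the “hack together program outputs” case a bit more concrete, here is a minimal Python sketch of the kind of glue work I have in mind: taking messy text output from some program and turning it into a tidy CSV. Everything specific here is an assumption made up for illustration: the log format, the sample lines, the column names, and the helper names `parse_output` and `to_csv`. It only uses the standard library.

```python
import csv
import io
import re

# Hypothetical raw output from some program. The format and values are
# invented for this example; real output will need its own pattern.
RAW_OUTPUT = """\
[2024-01-05 10:00:01] sensor=A7 temp=21.5C status=OK
[2024-01-05 10:00:02] sensor=B2 temp=19.8C status=OK
-- heartbeat --
[2024-01-05 10:00:03] sensor=A7 temp=22.1C status=WARN
"""

# One named group per column we want to keep.
LINE_PATTERN = re.compile(
    r"\[(?P<timestamp>[^\]]+)\]\s+"
    r"sensor=(?P<sensor>\S+)\s+"
    r"temp=(?P<temp>[\d.]+)C\s+"
    r"status=(?P<status>\S+)"
)


def parse_output(text):
    """Return one dict per line that matches the pattern; skip the rest."""
    rows = []
    for line in text.splitlines():
        match = LINE_PATTERN.match(line)
        if match is None:
            continue  # noise such as "-- heartbeat --"
        row = match.groupdict()
        row["temp"] = float(row["temp"])  # store the temperature as a number
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialize the parsed rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["timestamp", "sensor", "temp", "status"]
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = parse_output(RAW_OUTPUT)
csv_text = to_csv(rows)
print(csv_text)
```

Once the data is in a clean CSV like this, you are back in the “I have a dataset” situation, and either language can take it from there.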

2. Do you have programming experience?

Yes – R

No – Python

This may seem a little backwards, but hear me out. The simple approach is to say that if you have programming experience, you can learn Python quickly and be solving problems in no time, while learning R is an entirely new beast. However, I think that if you already know how to program, you can use the language you know to solve the parts of your problem where R is not as strong. For example, if you want to build an entire application that requires some data analysis, you can build the application in your known language, then introduce R to crunch data as required. In contrast, if you have no programming experience, I think you should learn Python. The journey of learning will always open new and unforeseen doors. If a year from now you need to build a web service, you probably won’t get far with R there. Most people will not get a job where they are doing pure data science all the time. If you end up needing to develop something, you will probably need other tools. If you don’t have other tools, learn Python.

TL;DR: Choose R – If you already have a working understanding of another programming language. Choose Python – If you have never programmed before, as you can do almost anything with it.

The Choice Is Yours

I tend to see R as a really fancy calculator, and I’m talking really fancy, fancier than even a TI-89 Titanium. This isn’t meant to be an insult! On my desk I always have my calculator next to me (yes, I know there is one on my computer), but my calculator is better because it is specialized. R could be the best choice for you if you have data, even dirty data. Even if you don’t have a clearly defined goal with that data, R has many great tools for exploring and visualizing it.

Now Python, on the other hand, I see as a good kitchen knife (where do I come up with these metaphors??). Sure, there are specialized knives: paring, steak, bread, etc. But for tackling the biggest variety of jobs, that kitch… err Python is going to get the job done.

So, to repeat myself like a broken record (I do similes too): if you have data, or get to work with just data, choose R, especially if you already know a different programming language to handle any other development needs. If you don’t know any other programming language, or need to create something to generate your data, choose Python.

I know not everyone will agree with me, and rightfully so. I also had to generalize a fair bit to try to draw a clear line between the two, but I hope I was able to help you. As always, if you have any further questions about my suggestions, please reach out to me via any of the links below.