When we built the data science team at LinkedIn a few years ago, we looked for raw talent, assuming that smart people could pick up the needed technical skills on the job. Now that the field has matured, it’s a good idea to learn some of those technical skills in school. Anyone planning to work with big data ought to learn Hadoop and R, the two open-source tools most used by data scientists. It’s also a good idea to take courses in statistics and machine learning. Beyond that, find every opportunity to work with real data sets. Struggling with the warts of real data is a key part of a data scientist’s job; in fact, some would say that the struggle is our “day job.”

(Emphasis mine.) Any student thinking about working with Hadoop and R should check out the RHadoop project, a collection of R packages that make it easy to write map-reduce jobs for Hadoop data stores in the R language.