I tend to have some level of disagreement with many of these authors. My disagreement can be summarized as follows:

Rather than defining data scientists by a bunch of skills that few employees possess (though many analytic executives possess all of them and more), it makes more sense to divide data scientists into multiple categories: data engineers, machine learning experts, modelers, business-oriented data scientists, researchers, domain experts, generalists, etc., each possessing a separate skillset. Google "six categories of data scientists" for details.

Also, you can train data scientists to have all the required skills. Colleges do a poor job at that, focusing instead on delivering siloed, outdated curricula, and being out of touch with the real world. Some modern 6-month training programs will teach the foundations to self-learners. That's the purpose of our free data science apprenticeship, which uses a project-based approach (real-life projects), though there are other alternatives.

The 22 skills in question

Would you add or remove anything from this great list created by Matt Reany? First, I'd categorize these skills. Then I would certainly add business acumen, domain expertise, hacking skills, presentation and listening skills, good judgment, a healthy distrust of models, the ability to work in a team or with clients, familiarity with all sorts of databases and file management systems, some data engineering, some data architecture and dashboard design, data detection, real-time analytics, data vendor expertise (vendor selection, benchmarking), and being the metric expert in your company (even deciding which metrics to track and how to collect the data).

Stone, I agree that you can learn many skills yourself. I learned Perl, SAS, C, C++, R, JavaScript, SQL, XML, time series, decision trees, naïve Bayes, Monte Carlo, and many more all by myself, focusing only on what was useful for my projects. I eventually developed methodologies far more robust than what I had learned on my own or during my college and graduate years.

The funny thing with C and C++: I was in charge of the computer labs for math students during my PhD years, and I was asked to decide which language to use for teaching. Since none of us in my math department were modern language users (we were Fortran and Pascal people), I decided to go ahead with C, later with C++, and learn it by myself... to eventually teach it to the students. This was more than 25 years ago. I even wrote image processing software in C during those years (for myself, but it was also used by my colleagues to process some of the digital satellite images we were working on). It loaded images in BMP format straight into memory, 20 times faster than Windows. I also used it to produce nice textures and to simulate colors that cannot be produced with a combination of red, green, and blue, such as metallic colors like gold.

One of the interns we hired 6 months ago (she now works full time) submitted her CV with only one of the skills listed above (Monte Carlo simulation). She did a Physics MSc at University College London and ended up doing a 3-month internship at CERN at the end of last year, after she completed her studies. She analyzed data generated from the Higgs boson experiment using Monte Carlo. We looked at her CV and I said to my CTO, "Bring her in for an interview." Her CV would probably have been put in the bin if she had applied for a job somewhere else, because it didn't show machine learning skills, statistics skills, or the majority of the 22 skills above. In the interview she wasn't convincing, but I read her thesis (which she sent along with her application) and then decided that she was worth hiring. I told my CTO that this 23-year-old physicist would add value to the company. My decision was based purely on what I read in her thesis. The mathematical models in it (from quantum electrodynamics) were so complex that I could barely finish reading it, which convinced me that anyone with a brain that can understand quantum mechanics at a very deep level will easily learn skills outside their specialty.

When she started, she had only done programming in Mathematica and Matlab. I quickly introduced her to Java programming and the Weka open source machine learning API. After 2 weeks she was able to write Java code. I hadn't shown her a single line of Java code. All I did was point her to some links on the net so she could start learning Java. I also pointed her to Colt (an open source Java linear algebra package from CERN) to experiment with matrix factorization (SVD), because I intended to have her experiment with text mining. I gave her a text mining task. She used Lucene and Colt (its SVD algorithm) to find semantic similarities between movie items. She came up with good results, which we're now extending further for our product development. She also used Weka for temporal pattern detection, and the code she wrote is now in our production system. Many of these topics were new to her, but she's now very proficient in them.
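For readers who haven't seen the SVD approach to semantic similarity, here is a minimal sketch of the general idea (often called latent semantic analysis). It is not the Lucene and Colt code described above: it uses Python with NumPy instead of Java, raw word counts instead of a proper text index, and the movie titles and descriptions in `MOVIES` are made up for illustration.

```python
import numpy as np

# Hypothetical toy data: short plot descriptions keyed by movie title.
MOVIES = {
    "Space Raiders": "space ship crew fights alien invasion",
    "Alien Dawn": "alien invasion threatens space colony crew",
    "Love in Paris": "romance blooms between two strangers in paris",
    "Paris Nights": "two strangers find romance on paris streets",
}


def term_document_matrix(docs):
    """Build a (terms x documents) matrix of raw word counts."""
    vocab = sorted({word for text in docs for word in text.split()})
    row_of = {word: i for i, word in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for col, text in enumerate(docs):
        for word in text.split():
            matrix[row_of[word], col] += 1.0
    return matrix, vocab


def latent_item_vectors(matrix, k):
    """Project each document onto the top-k singular directions."""
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Row j of the result is document j expressed in the k-dim latent space.
    return (np.diag(s[:k]) @ vt[:k, :]).T


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.where(norms == 0, 1.0, norms)  # avoid division by zero
    return unit @ unit.T


titles = list(MOVIES)
matrix, vocab = term_document_matrix(list(MOVIES.values()))
item_vectors = latent_item_vectors(matrix, k=2)
similarity = cosine_similarity_matrix(item_vectors)

for i, title in enumerate(titles):
    # Most similar other movie: highest cosine similarity excluding itself.
    others = [j for j in range(len(titles)) if j != i]
    best = max(others, key=lambda j: similarity[i, j])
    print(f"{title} -> {titles[best]}")
```

On real data you would likely weight the counts (TF-IDF, for instance) and pick `k` by experiment; the choice of `k=2` here only reflects the two obvious themes in the toy data.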

My role is to lead the data science team. When I (or other team members) find academic papers on Google Scholar that are relevant to us, I print copies and distribute one to each person for a quick discussion in the afternoon about prototyping the method, to see whether it really performs better than our current model, as the paper's authors claim. This physicist can read those academic papers (whether from machine learning, data mining, signal processing, etc.) and grasp the concepts without much difficulty. She can code the algorithm in the paper, and so on.

So, my whole point in mentioning this is that we've decided to hire based not primarily on the skillsets listed on someone's CV, but on whether the candidate has the ability to learn very difficult topics. This physicist in my data science team is a star performer. She keeps coming up with new ideas, either from a machine learning paper she has read or from her own original thinking.

I'm looking for my next data science intern, so hopefully I'll get one in two weeks' time. I've already sent out the ad to local universities here in New Zealand.

It's obvious that there is a wide spectrum of data science skills. The more, the better, so even front-end scripting can be handy, though it isn't mandatory. Pragmatically, data science skills can be viewed in terms of their nature, usage, complexity, and the factors that differentiate data scientists from the rest. More precisely, these skills can be categorized and ranked as 1. hard-core / essential, 2. required / important, and 3. peripheral / desired. For instance, machine learning, math, statistical modeling, R, and Python are hard-core; big data, Hadoop, and domain knowledge are important; and front-end programming is peripheral. Again, we should not ignore any skillset, but the emphasis should be on the core essentials.

Front-end programming is useful in some cases, e.g. if you work for a digital publisher. In my case, I manipulate HTML, JavaScript, RSS feeds, XML, etc., though the emphasis is on automating content creation as much as possible. I don't really write HTML; I write Perl scripts (back-end) that generate HTML content. So after all, maybe this is back-end.
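To make "back-end scripts that generate HTML content" concrete, here is a minimal sketch of that pattern. It is written in Python rather than Perl, and the `render_article_list` function, the `ARTICLES` data, and the example.com URLs are all hypothetical; it only illustrates the idea of turning structured data into markup automatically.

```python
from html import escape

# Hypothetical content records, e.g. pulled from a feed or a database.
ARTICLES = [
    {"title": "R & Python compared", "url": "https://example.com/r-vs-python"},
    {"title": "Six categories of data scientists", "url": "https://example.com/six"},
]


def render_article_list(articles):
    """Return an HTML <ul> fragment with one link per article."""
    items = []
    for article in articles:
        # Escape text and attribute values so stray &, <, > or quotes
        # in the data cannot break the generated markup.
        title = escape(article["title"])
        url = escape(article["url"], quote=True)
        items.append(f'  <li><a href="{url}">{title}</a></li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


page_fragment = render_article_list(ARTICLES)
print(page_fragment)
```

The script never runs in the browser: it produces static markup ahead of time, which is why this kind of work arguably counts as back-end.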