How To Build A Successful Data Science Team

Don't try to find one superhuman who does it all. You need three experts: business analyst, machine learning expert, and data engineer, says Lithium Technologies chief scientist.

Is there really a data scientist shortage, or are organizations simply trying too hard to recruit a unicorn, a jack-of-all-trades who possesses both advanced technical and business acumen?

If the unicorn hypothesis is true, it would explain why the scarcity of data scientists is expected to worsen in the coming years.

The solution isn't difficult, some industry insiders believe, but it may prove unpopular with cost-conscious organizations that are unable or unwilling to hire a data science team rather than a single data scientist.

Dr. Michael Wu is chief scientist of Lithium Technologies, a San Francisco-based company that sells social customer experience management software to businesses. Not surprisingly, Lithium captures a lot of data on consumer behavior, and part of Wu's job is to analyze that information and predict customer actions on an aggregate level.

Wu believes the term "data scientist" is tossed around so loosely these days that it's creating a bit of confusion in the tech industry.

"What the industry calls a 'data scientist' now is really several different roles," said Wu in a phone interview with InformationWeek. "When people say there's a shortage of data scientists, (they mean) there is a shortage of people with all of these different skills."

Wu subdivides the data scientist role into three distinct jobs, each requiring a different skill set: business analyst, machine learning expert, and data engineer.

"You need these three groups of people to work together in order to inform the business decision-makers," said Wu.

The role of business analyst existed long before the terms "big data" or "data scientist" were in vogue. This person works with front-end tools, meaning those closest to the organization's core business or function, such as Microsoft Excel, Tableau Software's visualization tools, or QlikTech's QlikView BI apps. A business analyst might also have sufficient programming skills to code up dashboards, and have some familiarity with SQL and NoSQL.

The recent hype surrounding big data, however, has led many business analysts to rebrand themselves as data scientists even though they are not, according to Wu's definition.

"It automatically gives them a little boost in their salary," Wu said, chuckling.

The second data science role is that of machine-learning expert, a statistics-minded person who builds data models and makes sure the information they provide is accurate, easy to understand, and unbiased.

"These are the people who develop algorithms and crunch numbers," said Wu. "They are interested in building models that predict something."

A machine-learning expert, for instance, might develop algorithms that predict consumer sentiment or estimate a person's influence in a particular industry.

"There are even machine-learning algorithms that look at images and tag them automatically, or look at videos and try to understand what the video is about," said Wu.

Like the business analyst, the machine-learning expert isn't a new profession. By Wu's estimate, the role has existed for roughly the last 30 years.

The third key job, data engineer, is "the bottom layer, the foundation," said Wu. "They are the ones who play with Hadoop, MapReduce, HBase, Cassandra. These are people interested in capturing, storing, and processing this data… so that the algorithm people can build models and derive insights from it."

However, it's nearly impossible to find one person -- that data scientist unicorn -- who excels in each of these three areas, Wu said. And that's why organizations must focus instead on building a data science team.

Jeff Bertolucci is a technology journalist in Los Angeles who writes mostly for Kiplinger's Personal Finance, The Saturday Evening Post, and InformationWeek.

When it's nearly impossible to find a unicorn, we need to learn how to create one. I agree that you can take three pros and rotate them from position to position so that each gains all the relevant knowledge in the domain and becomes self-sufficient. Still, the positions will lose what we call the "specialist" quality, and the value of specialists is still high. You can hire many really good web designers, but when you realize the job is tough, what do you do? You go to a specialist. So this is the point: sometimes evening out the forces is needed (especially in the modern world, at this pace and with all the changes), but it's not a "unicorn" answer anyway. The ability to tune specialists, to make them complement one another and work together, is a much more important challenge. Just like matching customers with services, matching pros is the core of success.

I like the post, but there are more roles than those stated in this article that one should look for in a team like this, especially if we are looking for roles and not people. Statistician, data analyst, software developer, and project manager are very important roles.

One of the items to address re: data science isn't so much the skills or team, but the understanding of what the data all means. This is not merely an academic exercise. This is business. Patterns need to be mapped and measured into a relevance scale. Without this added step, Big Data will be just that ... more and larger data. Context. Context. Context.

The three roles captured in the term "data scientist" perhaps should be handled by three people of different skill sets. But in an ideal world, these three people would rotate jobs within the trio every three months until each could take a stab at performing the role of the others. The person who consistently performs best in all three roles should be named the team leader. Sounds crazy, but the world will always need generalists on top of the specialists.

Thanks, Rob. I could go on and on about this. A "Data" Scientist? What would we expect an "Air" Scientist to know and do? How about a "Dirt" Scientist, "Atom" Scientist, "Wood" Scientist, "Word" Scientist, or "Fur" Scientist (not to be confused with a "Hair" Scientist)? Which came first, Data Science or the first Data Scientist? Heck, I doubt that there is an agreed-upon definition of Data, not among Data Scientists anyway.

IMHO, the very idea that a number cruncher is expected to develop consumer insights is so naive that it does little more than show the tendency of everyone in every discipline to assume that someone else does the work. The "user" need know nothing and do nothing other than be a "manager" who hires a bunch of other managers, all the way down to the single person who does everything, whose output is then fed up the food chain to The Manager.

So the best way to succeed is by knowing nothing, but by getting to manage the most managers you are capable of. Just be sure that the lowest level manager has a Data Scientist working for them. That low level manager can always replace the Data Scientist if the individual is not up to carrying the Company.

We as a society tend to throw around the words "scientist" and "science" too liberally. Colleges award bachelor of science degrees in such non-scientific fields as management and philosophy. Urban planners fancy themselves as social scientists. And don't get me started on political science--double-speak is far more of an art.