
Miko Matsumura is a Vice President at Hazelcast, an open source in-memory data grid company. He is a 20-year veteran of Silicon Valley. Previously he served as SVP of Platform Marketing and Developer Relations at Kii Corporation, a mobile data cloud company. He also served as a CTO for Software AG and a VP at webMethods, which acquired his startup company INFRAVO for $38 million. He holds an MBA from SFSU as well as a Master's degree in Neuroscience from Yale University. Miko also serves as a speaker, recently providing a keynote for the W-Jax developer conference in Munich, and as an advisor to startup companies.

Fun fact: nothing on this blackboard makes any sense.

Data Science is dead.

Science creates knowledge via controlled experiments, so a data query isn’t an experiment. An experiment suggests controlled conditions; data scientists stare at data that someone else collected, which includes any and all sample biases.

Now, before you drag out the pitchforks: I’m not a query hater. You won’t see me standing outside the Oracle Open World conference with a sign that says “NO SQL” on it. Queries are fine. Smart people don’t always have the right answer, but they need to ask the right questions. Yes, building a query is like “forming a hypothesis,” but at that point we enter the realm of observational or “soft” science. Yes, by this standard, Astronomy and Social Sciences are also not sciences. I have no idea what Computer Science is, but no, it’s not a science either.

Oh what’s that? Your kind of “Data Science” includes things such as A/B Testing, and your “experiments” actually involve executing designs that affect the world? Allow me to retort: that’s not Data Science, that’s actually doing a job. You might have a job title like Product Management or Marketing. But if your job title is “Data Scientist,” you are effectively removing yourself from the actual creation of data.

I do sympathize. I appreciate that it’s no longer sexy to be a Database Administrator, and I guess the term “Business Analyst” is a bit too 1980s. Slapping “Data Warehousing” on a resume is probably not going to land you a job, and it’s way down there with “Systems Analyst” on the cool-factor scale. If you’re going to make up a cool-sounding job title for yourself, “Data Scientist” seems to fit the bill. You can go buy a lab coat from a medical-supply surplus store and maybe some thick glasses from a costume shop. And it works! When you put “Data Scientist” on your LinkedIn profile, recruiters perk up, don’t they? Go to the Strata conference and look on the jobs board: every company wants to hire Data Scientists.

OK, so we want to be “Data Scientists” when we grow up, right? Wrong. Not only is Data Science not a science, it’s not even a good job prospect. In the immortal words of Admiral Ackbar: “It’s a trap.”

Well, that sounds like a reasonable—albeit buzzword-filled—job description, no? There is going to be a ton of data in the future, certainly. And interpreting that data will determine the fate of many a business empire. And those empires will need people who can formulate key questions, in order to help surface the insights needed to manage the daily chaos. Unfortunately, the winners who will be doing this kind of work will have job titles like CEO or CMO or Founder, not “Data Scientist.” Mark my words, after the “Big Data” buzz cools a bit it will be clear to everyone that “Data Science” is dead and the job function of “Data Scientist” will have jumped the shark.

Yes, more and more companies are hoarding every single piece of data that flows through their infrastructure. As Google Chairman Eric Schmidt pointed out, we now create as much data every two days as was created in all of human history up to 2003.

Unfortunately, unless this is structured data, you will be subjected to the data equivalent of dumpster diving, and surfacing insight from a rotting pile of enterprise data is a ghastly process at best. Sure, you might find the data equivalent of a flat-screen television, but you’ll need to clean off the rotting banana peels. If you’re lucky you can take it home, and oh man, it works! Despite that unappetizing prospect, companies continue to burn millions of dollars to collect and gamely pick through the data under their respective roofs. What’s the time-to-value of the average “Big Data” project? How about “Never”?

If the data does happen to be structured data, you will probably be given a job title like Database Administrator, or Data Warehouse Analyst.

When it comes to sorting data, true salvation may lie in automation and other next-generation processes, such as machine learning and evolutionary algorithms. Converging transactional and analytic systems also looks promising, because such systems deliver real-time analytic insight while it’s still actionable (the longer data sits in your store, the less interesting it becomes). These systems will require a lot of new architecture, but they will eventually produce actionable results; you can’t say the same of “data dumpster diving.” That doesn’t give “Data Scientists” a lot of job security: as in many industries, you will be replaced by a placid and friendly automaton.

So go ahead: put “Data Scientist” on your resume. It may get you additional calls from recruiters, and maybe even a spiffy new job, where you’ll be the King or Queen of a rotting whale-carcass of data. And when you talk to Master Data Management and Data Integration vendors about ways to, er, dispose of that corpse, you’ll realize that the “Big Data” vendors have filled your executives’ heads with sky-high expectations (and filled their inboxes with invoices worth significant amounts of money). Don’t be the data scientist tasked with the crime-scene cleanup of most companies’ “Big Data”—be the developer, programmer, or entrepreneur who can think, code, and create the future.



Comments

By ‘unstructured data’ the author probably means data not residing in relational databases. While it is true that most analysis (mining, advanced analytics, or whatever name you want to put here) is done on structured data, how does he explain the successes of web companies in analyzing unstructured data?
Also, data analysis has been around for decades in insurance, banks, telcos, and marketing. It is a very old and necessary profession, which is now becoming more common under new buzzword terminology. These analysts work on old, archived data and have some success in doing so.
I agree that “scientist” is a bad name; the same has been observed in many other fields trying to get themselves to the next level.

“the winners who will be doing this kind of work will have job titles like CEO or CMO or Founder, not “Data Scientist.” ”

Well, yes.

But the first two are arrival points rather than career paths. Maybe Data Scientist is a poor name, but as a career path, I would defend the opportunity, satisfaction and power to effect change versus any other.

The respected author argues that data science/scientist is just a sexy title for a range of already known job disciplines/definitions. Let's think in reverse order:

As part of the known definition, is a Data Warehouse Analyst concerned with using ontologies, or data representations other than relational (e.g. graph), to achieve new knowledge? Does a Database Administrator usually know how to correctly discretize numerical values? Can a classic Statistician deal with ambiguous measurements or fuzzy units of measurement based on probability theory? Does structure only exist in traditional relational or more recent RDF triple schemas? These are some of a data scientist’s tasks.

In contrast with idealized, clean, tabular, perfectly manageable domestic datasets, a new kind of wild, real-life, messy, large and less traditionally structured data is out there. The need for interdisciplinary data workers who are able to combine different fields of expertise to explore this wild data in new ways is what coined the title of data science/data scientist. It is extremely interdisciplinary and exploratory, and mostly dependent on specifically designed experiments on data, which is why it is called a science. This is a growing need and a real, new, emerging field, not a passing fad.

Based on my interactions with recruiters, I agree with the author. Your expertise is probably worth a “north of” $100K salary, whereas the data scientist label these days is associated with $100K on a 1099, which is nowhere near as lucrative.

Statistics, language theory, control and instrumentation, signal processing, software engineering are all tools of the scientist. Data Scientists ARE performing “controlled experiments” using data every day.

Based on my interactions with recruiters, I agree with the author. Your expertise is probably worth a “north of” $100K salary and indicative of REAL SCIENCE, whereas the data scientist label these days is associated with $100K on a 1099, which is nowhere near as lucrative. For now, I am keeping my scientist job and not falling for the data scientist lure.

Let's talk about who a data scientist is in the first place.
He is a programmer/developer and a mathematician/statistician rolled into one. He possesses commendable analytical skills and a strong business sense. Any one is enough to land a job, let alone all of them.

Who will make a CEO?
A certified jack of all trades with knowledge of all areas of the company, or a specialist programmer who does the same thing day in and day out?

DATA DRIVEN DECISION MAKING is here to stay. That’s because numbers never lie, even if you can, Mr. Author-without-basic-common-sense.