So… Are We Data Scientists or What?

On Tuesday I introduced the notion of a Data Scientist – it’s a hot new field, there’s a huge shortage of qualified people, and maybe PowerPivot gives us a shot at some of the action.

So… are we Data Scientists? Are we allowed in their hip new club?

Being Careful: Things We Are Not

In our quest for broader horizons and appreciation, Excel Pros need to be careful – if we overdo it, we may look silly. So first let’s cover a few things that we don’t do, and make sure we don’t advertise ourselves as doing them – because to many people, “Data Scientist” implies these things.

“Bad” News #2: We are not “real-time recommender” programmers

Related to above: The term “data scientist” sprang up at companies like Amazon, Google, LinkedIn, and Facebook – companies that have a strong interest in analyzing the behavior of individual people and using that analysis, in real time, to give each user a different experience.

“Pages most likely to answer your Google search,” “People you may know,” and “Other items that may interest you” – good features for sure. But it’s not like a PowerPivot Pro could build them for you.

But if you’re a data scientist at one of those kinds of companies, you are likely to be doing that kind of work. So don’t walk up to Google and say “hire me, I’m a data scientist because I know PowerPivot.”

“Bad” News #3: We are not sentiment analysts

Standard problem for brands: what is Social Media saying about my product?

Imagine scanning millions of tweets and Facebook posts trying to determine if people are saying positive or negative things about my product.

Are these people saying good things or bad things about these products? Can you write a formula for that? Me neither.

“Bad” News #4: We are not linear regression / statistics gods

Listen, my college statistics course was at the 8 am time slot in the dead of a particularly cold Nashville winter, and I was an underachieving “who needs the class when you have the book” smart aleck from Florida. How often do you think I attended that class? (Note that I also didn’t like learning from books, but was unaware of the irony in my position. I became more disciplined as the years wore on, but it was a slow process).

In all seriousness, even if I had been an eager attendee of that course, I still don’t think that would have changed much. For me, the following chart is enough to show that good weather probably has a negative impact on sales of cold medicine:

This would never, ever be acceptable to a Statistics PhD, nor would it constitute proof in a true scientific context. It falls under the headings of Good Enough and Probably Significant, which is the reality most of us deal with.

GOOD News #1: But there is always a spectrum! Not All Data Needs are Like That!

But enough about the things we are NOT. Let’s focus on why I think we do have a legitimate claim on the Data Scientist title, as long as we don’t oversell it.

Let’s return to that table from the prior post:

The term “Data Scientist” is still new enough that most people have never heard of it, but we’re ALREADY in the process of subdividing it!

I don’t think the table above shows that Data Science is a sham, nor do I think it’s silly of the Constellation folks to create it. Quite the opposite, actually – I think it reflects a lot more wisdom than most people possess.

Here’s a diagram I’ve been using for a long time now:

Traditional BI: It Just Isn’t Used as Broadly as Excel, Not by a Long Shot (Don’t think of this as a bell curve; think of it as a mountain, with BI representing Peak Sophistication)

In other words, the bright shining center of Data Science is the Silicon Valley, Google/Facebook/Twitter, Type I stuff. OK.

But don’t you FEEL it? The world is changing beneath our feet! Data is everywhere. Data is HOT. People are waking up to the need to be smart – the easy money is all made, we’re into the “hard” stuff now. Being smart is becoming increasingly critical.

The mere EXISTENCE of a term called Data Science, and the fact that it has gained so much traction so quickly, is HUGE news for us! The fact that all of the articles about it emphasize business acumen, curiosity, and liking data – also HUGE for us! And that we are simultaneously being handed a tool set with dramatically expanded capabilities couldn’t be a more fortunate case of timing.

A Series of Questions That Illustrate What I Mean

Does every company drowning in data have Facebook-style problems? Heck no! Even mid-size businesses are absolutely awash in data, and they are waking up to how much they are leaving on the table. They are waking up to how much value there is in being smart.

Are there enough PhD data scientists in the world to address all of the world’s data problems? Heck no! And there never will be, just like there were never enough BI pros.

Is the term “Data Science” capturing people’s imaginations because everyone has sentiment analysis or machine learning problems? Heck no! From a business owner’s perspective, I think the “hotness” of the term owes to the intersection of the following psychological factors:

1. “I am now realizing the potential value of data, of being smart.”

2. “Data Science is a term I can understand, that sounds like it can help me, and doesn’t scare me.” (“Business Intelligence” was always too cold and “standoffish” for its own good.)

3. “It doesn’t sound like yesterday’s stuff. It sounds new and magical. It does not sound like a stuffy and dusty office full of archaic spreadsheets. It sounds like alchemy. It sounds nimble and sharp.”

And PowerPivot very much lives up to #3.

If a company has a “real” Data Scientist, does that mean PowerPivot Pros have no role to play? Heck no! Can you imagine the beautiful things we can do cooperatively? Example:

Data Scientist preps the data in ways I cannot.

I build metrics, reports and analyses at the speed of thought, against that data, and identify theories, hunches and patterns.

Data Scientist then runs sophisticated, targeted analyses on those hypotheses, validates or rejects them, and then perhaps implements algorithms that allow the business to leverage those findings in real time, to optimize efficiency.

Is it better to be 100% correct and 100x slower? Heck no! Being 90% correct in 1% of the elapsed time will always be very, VERY valuable. And we are very good at that.

Is PowerPivot a static entity? Will we never get new capabilities? Heck no! My former colleagues in Redmond are not ones to sit idle. They know they are onto something big with the PowerPivot, Excel-Pro-focused approach. And they are going to constantly chip away at the ivory tower on our behalf.

(Don’t worry about the ivory tower though – it’s a renewable resource and always keeps getting taller, even as we chip away at it).

3 Responses to Can PowerPivot Pros Call Themselves Data Scientists? Part 2: Finding a Balance Between “Yes” and “No”

With only a month of PP under my belt, I am already finding out how I can use my knowledge of business combined with superior data crunching skills to come up with new ways to analyze large data sets.

I have not seen anything that PowerPivot does that I could not do in some roundabout way in Excel (i.e., with pivots, VLOOKUPs, and in some cases re-pivoting a hard-coded copy of the pivot results). But it is (sometimes) faster to put together, definitely more flexible, cleaner, and better controlled, and it processes way quicker than plain Excel.

I also like the concept of data analysis. The idea of a business-to-IT conduit has been talked about for a while, and this is a good example of it coming to fruition.

Most organisations have most of their data manglers banging away inefficiently in Excel, producing anything BUT a decision, let alone a good one. At ALL levels between your Type I and Type IV data workers. So there’s something missing from your “Amount of World’s decisions powered by Excel”: the thin nano-slice that says “Amount of World’s GOOD decisions powered by Excel”. (Or maybe it’s not missing. It’s there, but is narrower than 1 pixel.)

Put another way, what percentage of spreadsheets do you think are ultimately fit for some purpose? What percent of data workers across your average organisation contribute a net benefit after their salary and overheads are covered? Do organisations succeed because of all those spreadsheets, or in spite of them?

Sometimes the sum use of my well-honed powerpivot/pivot/sql/vba skills – and my econometrics degree – is to take someone’s large, slow, incomprehensible model and help it reach its inevitably wrong conclusion a heck of a lot faster, with a heck of a lot less manual manipulation of dodgy input along the way.

Often the only gratification I can take from my role is that if someone wants something pretty much irrelevant done, then with my skillset and toolset I can help them achieve it with the minimum efficiency drag on the organisation.

For the majority of organisations, science doesn’t drive most spreadsheets. Dogma does.

While we could do with more people with better tools to help solve the world’s data problems, we also need requisite growth in more/smarter managers capable of asking themselves: “Before I make this data monkey crunch all this data, how will I know if the result they come up with is credible? What makes me think we have the data to answer this question? Is this our most pressing question? Will knowing the answer change my behaviour? Do I already know the answer?

Or am I just looking for a safe, secure blanket that makes me feel comfortable?”

We are Data Scientists and Data Engineers.
In addition, I feel we are ALSO Data Plumbers. We have data that is leaking and keeping the location from being an airtight operation. The leak may be small or it may be catastrophic, but in effect we either plug small leaks or provide re-piping service. The decision-maker wants the water on when he turns on the faucet. We provide the filter that takes the water from the Cuyahoga (I remember when it caught on fire), removes the sludge, and makes it fit for human decision-making.

Part of the idea came from Nixon’s plumbers, the unit he formed to plug leaks. Today we still have that problem.

From the inception of data there have been “holes”. Those holes need to be plugged so we can go about the business of maximizing revenue and making intelligent decisions about expenses.