Migraine in My Stat Brain

DataSeer recently “certified” me in Predictive Analytics and modeling that could “maximize profit” and “drive decisions through data.”

TL;DR version: DataSeer’s smart and young team delivers kickass learning on data science and more. Five stars.

It's been years since I’ve had a headache from trying to learn something new, and I got a pleasant taste of it again last week. The culprit was DataSeer’s Predictive Analytics for Data Driven Decision Making and Profit Maximization, as taught by data scientist and competitive weightlifter (more on this later) Isaac Reyes.

(Obviously you got marketers and budget approvers to thank for the latter two-thirds of that course title.)

99 predictive analytics problems and an Excel glitch ain’t one.

It’s not that multivariate regression or association rules were head-splittingly difficult to understand (I majored in math, thank you very much). Mind you, it’s no walk in the park either.

It’s just that if you’re intellectually masochistic like me, you’d treat each of the course’s practical problems and challenges (yes, Practical Problems are different from Challenges) like real, uhm, problems and challenges. Sinkholes and quagmires. Conundrums and predicaments. Worries, cares, troubles, and doubts.

So here’s what we covered (warning: fancy terms ahead):

Multivariate Regression

Predictor Selection

Stepwise Regression

Backward Elimination

Logistic Regression

Market Basket Analysis / Association Rules

Decision Trees

Regression Trees

Classification Trees

But no worries, we had the help of some Excel tools and add-ins.

Solver

Analysis ToolPak

NumXL

Real Statistics

XLSTAT

Crude, yes. Cheap, maybe. But, as Isaac points out, doing it in Excel is not as much of a blackbox as just entering a piece of R or Python code and getting the results instantaneously. (So it’s a greybox?) With Excel, you still gain some appreciation of the inner workings of the predictive model.

Which brings me to my first meta learning from this course:

The meaning of ‘solving by hand’ has changed.

It’s not literally using one’s hand to put pen to paper (then again, when is it ever?), but it’s using the only partially blackbox Excel Solver to determine the coefficients that would yield the model with, say, minimized squared distances from the observed dataset (in a linear regression).
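For the curious, here is a rough Python sketch of what that Solver exercise amounts to. It is not what we did in class, and it is not Solver's actual algorithm: plain gradient descent stands in for it here, and the data points, function names, step count, and learning rate are all made up for illustration. The idea is the same though: define a sum-of-squared-errors "objective cell," then nudge the coefficients until it stops shrinking.

```python
# Hypothetical toy data: xs is the predictor, ys the observed response.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]


def sse(intercept, slope):
    """Sum of squared errors: the quantity you would ask Solver to minimize."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def fit_by_descent(steps=20000, rate=0.01):
    """Iteratively adjust the coefficients to shrink the squared error.

    Gradient descent is an assumption made for this sketch; it is a
    stand-in for whatever search Solver performs internally.
    """
    intercept, slope = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to each coefficient.
        grad_intercept = -2 * sum(residuals) / n
        grad_slope = -2 * sum(r * x for r, x in zip(residuals, xs)) / n
        intercept -= rate * grad_intercept
        slope -= rate * grad_slope
    return intercept, slope


def fit_closed_form():
    """Textbook least-squares formulas, used here as a cross-check."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


iterative = fit_by_descent()
exact = fit_closed_form()
print("iterative:", iterative, "SSE:", sse(*iterative))
print("closed form:", exact, "SSE:", sse(*exact))
```

The two answers should agree to several decimal places, which is the point: the "by hand" route and the formula route land on the same line.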

These days, you know how to ‘solve by hand’ if you know how to use Solver in Excel.

Imagine that: using Solver is now considered manual computation.

That says a lot about how much higher our standards are for what is considered automated. I need to enter one formula before clicking a button and getting an answer? What is this, the Dark Ages?

The data science nerds have won?

With the now-hackneyed Harvard Business Review quote that data scientist is the sexiest job of the 21st century, you would think the nerds have won, wouldn’t you?

Think further about the composition of my class: statisticians and engineers, either by education or profession.

Or the prerequisite background for DataSeer’s next course in the data science series (Machine Learning, or as marketers would have budget approvers believe, Predicting Customer Behavior and Generating Revenue with Machine Learning): statistics, math, or physics. Because Isaac says, “It gets pretty heavy.”

So have the nerds won? Is geeky the new sexy?

What if I told you data scientist Isaac won a bronze for weightlifting at the recent Philippine National Games? Would he be a nerd or a jock? And if it’s the latter, wouldn’t that mean the jocks—not the nerds—have won?

Add to that the group of brogrammers who kept silently to themselves but you knew were plotting revenge on the next dataset.

But what if my model performed better on the holdout dataset than everyone else’s—the nerds’, the jocks’, the bros’? Would that mean the weird emo kids and goths won?

It would mean that data scientists aren’t anything like high school kids.

Which is more than what Isaac can say about predictor variables: very cliquey and highly selective about who they let in.

Data science is good, but a relaxed culture is too.

I’m not sure if this is something Isaac brought back from Australia, but a positive, customer-focused energy seems to radiate from the DataSeer team. You could see they were really trying to make the experience—from the content to the venue to the refreshments—as good as possible for each student. And I’m normally a surly guy.

In one of the icebreakers, I won a bottle of sweet white wine from a vineyard in Mudgee by giving the closest guess of its weight. How I did it: I googled an informal survey of the weight of 750mL wine bottles sans contents (the study was about how heavy or thick the bottles were). Then I based my guess on the one sample total weight (bottle + contents) that most closely resembled the sweet white in question: a chardonnay. My guess: 1.158kg; the actual: 1.1kg on an analog scale. I bet my guess would have turned out much closer to the actual had the scale been digital. The whole lesson here is, when it comes to alcohol, I'm pretty motivated.

Isaac said he was surprised at how strict (personally, I’d call it uptight) Philippine companies and office buildings are about alcohol in the workplace (none). He said, in Australia, crates of beer being trucked into offices are the norm on Friday afternoons.

I heard that another Australian company in the Philippines, the wildly successful Atlassian, has free beer in the pantry all the time. I think Philippine companies—Philippine corporate life, in general—really just need to loosen up. It might do them some good.

Make your core great, and the peripherals will be…peripheral.

The training venue was good, the refreshments quite modest. But I didn’t care, because the core of the experience, the learning content, was topnotch. I was learning something new and practical within the first five minutes.

I cannot speak for DataSeer’s other services (offshore data scientist staffing and web analytics), but the takeaway here is: if you concentrate most of your energy making your core experience excellent, the peripherals—the bells and whistles—won’t matter as much to your customers.

And it would be good for your margins too.

Choose your analytics pitcher appropriate to the expected returns.

The Philippine data analytics market seems to be in a weird place right now.

On one hand, Isaac says the amount of time and effort he spends pitching data science projects to local companies (who require too much proof of benefits and don’t want to make the appropriate level of investment) would yield greater returns if he’d spent it delivering projects from Australia, where the analytics market is way more mature.

On the other hand, the analytics market in the Philippines does need to be educated before the floodgates open.

True, but that’s because it’s Isaac himself doing the pitching locally. His time is far better spent leading the delivery on high-margin projects.

It’d be a different picture if he had local business development with the right balance of seniority/experience/expertise (to influence and represent to prospects that DataSeer’s the real deal) and cost-effectiveness (not a principal data scientist like himself).

Fluffy intuition still matters in data science (kinda).

Call it what you will—domain knowledge, experience, inference, common sense, insight, street smarts—but you need it in data science, not just knowledge of code or statistical modelling.

Three times this was evidenced during the course. First, total elapsed time in a flight is a straightforward sum of time in the air and time taxiing in and out—no regression needed.

Finally, in a dataset of commercial flights sorted by carrier and date, don't just choose the first x,000 records to be your training dataset: it obviously won't be representative of the entire dataset and would likely predict poorly on the holdout data. (I believe smart selection of records for the training dataset allowed my team to win a box of Krispy Kremes.)
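To make that last point concrete, here is a small Python sketch. The carrier codes, record counts, seed, and function names are all hypothetical, not the course dataset. It compares grabbing the first chunk of a sorted file with drawing the same number of records at random.

```python
import random

# Hypothetical flight records, sorted by carrier (made-up codes and counts).
carriers = ["AA", "DL", "UA", "WN"]
flights = [{"carrier": c, "flight_no": i} for c in carriers for i in range(250)]


def first_n(records, n):
    """The lazy approach: take whatever sits at the top of the sorted file."""
    return records[:n]


def random_sample(records, n, seed=42):
    """Draw n records at random, without replacement, with a fixed seed."""
    rng = random.Random(seed)
    return rng.sample(records, n)


def carriers_seen(records):
    """Which carriers actually show up in a given training set."""
    return sorted({r["carrier"] for r in records})


naive_train = first_n(flights, 300)
shuffled_train = random_sample(flights, 300)

print("first 300 records cover:", carriers_seen(naive_train))
print("random 300 records cover:", carriers_seen(shuffled_train))
```

In this toy setup the first 300 rows only ever see the carriers at the top of the sort, so a model trained on them has never met the rest. A random draw is one simple way around that; stratifying by carrier would be another.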

My inclination towards simplicity and minimalism served me well in this Market Basket Analysis data visualization, despite skipping the Data Storytelling for Business visualization course.

The Philippine data science industry is still nascent.

Hint 1: In addition to DataSeer, Isaac knows of only four other Philippine players in the data science space: Thinking Machines, lloopp, Savvysherpa, and Z-Lift.

Google them, look at their founders’ and employees’ LinkedIn profiles, and you’ll soon find they’re all legit and all Filipino-founded (except for Savvysherpa, whose ownership I have yet to determine, but it is definitely Cebu-operating).

Now, there’s a number of other Philippine companies saying they do analytics. But compared to these five guys (PhDs, Stanford grads, ex-Googlers, rumored ex-NSA agents—did I mention this post is part-fiction?), everyone else looks like a hack. I don’t know, I guess I have to vet the competitive landscape further.

Meanwhile, one of the largest unibanks in the Philippines (with around 15,000 employees, fewer than 1,000 branches, and about 3,000 ATMs) has a dedicated analytics team of only four.

That’s some disproportion right there.

Data science is not your normal backend work.

When Isaac mentioned that DataSeer’s model is to acquire projects in Australia and get the backend work done in the Philippines, I asked him what exactly is ‘backend work’ in analytics. “Is it data cleansing and preparation?”

“No,” he said. Australian companies apparently have internal people to cleanse the data, and DataSeer receives data that’s pretty cleansed already.

The backend work he refers to that’s being done in the Philippines is modeling. Still pretty high-end stuff, compared to the typical ‘backend’ work that most Philippine outsourcing companies do. And what Isaac himself does is guide the modeling effort and ensure that the methods applied are sound.

To power him through all this, he takes Philippine-style curry chicken. Because like data science, curry may not have originated with us, but we sure make a badass adaptation of it.