Big Data is great, but so is intuition

Steve Lohr

Data brain... How do you draw the best insights from your data? Photo: iStock/lawrence.dutrieux

It was the bold title of a conference this month at the Massachusetts Institute of Technology, and of a widely read article in The Harvard Business Review last October: "Big Data: The Management Revolution."

Andrew McAfee, principal research scientist at the MIT Center for Digital Business, led off the conference by saying that Big Data would be "the next big chapter of our business history." Next on stage was Erik Brynjolfsson, a professor and director of the MIT center and a co-author of the article with McAfee. Big Data, Brynjolfsson said, will "replace ideas, paradigms, organisations and ways of thinking about the world."

These drumroll claims rest on the premise that data like web-browsing trails, sensor signals, GPS tracking, and social network messages will open the door to measuring and monitoring people and machines as never before. And by setting clever computer algorithms loose on the data troves, you can predict behavior of all kinds: shopping, dating and voting, for example.

The results, according to technologists and business executives, will be a smarter world, with more efficient companies, better-served consumers and superior decisions guided by data and analysis.

I've written about what is now being called Big Data a fair bit over the years, and I think it's a powerful tool and an unstoppable trend. But year-end, I thought, might be a time for reflection, questions and qualms about this technology.

Quest for insights

The quest to draw useful insights from business measurements is nothing new. Big Data is a descendant of Frederick Winslow Taylor's "scientific management" of more than a century ago. Taylor's instrument of measurement was the stopwatch, timing and monitoring a worker's every movement. Taylor and his acolytes used these time-and-motion studies to redesign work for maximum efficiency. The excesses of this approach would become satirical grist for Charlie Chaplin's Modern Times. The enthusiasm for quantitative methods has waxed and waned ever since.

Big Data proponents point to the internet for examples of triumphant data businesses, notably Google. But many of the Big Data techniques of math modeling, predictive algorithms and artificial intelligence software were first widely applied on Wall Street.

At the MIT conference, a panel was asked to cite examples of big failures in Big Data. No one could really think of any. Soon after, though, Roberto Rigobon could barely contain himself as he took to the stage. Rigobon, a professor at MIT's Sloan School of Management, said the financial crisis certainly humbled the data hounds. "Hedge funds failed all over the world," he said.

The problem with math

The problem is that a math model, like a metaphor, is a simplification. This type of modeling came out of the sciences, where the behavior of particles in a fluid, for example, is predictable according to the laws of physics.

In so many Big Data applications, a math model attaches a crisp number to human behavior, interests and preferences. The peril of that approach, as in finance, was the subject of a recent book by Emanuel Derman, a former quant at Goldman Sachs and now a professor at Columbia University. Its title is "Models. Behaving. Badly."

Claudia Perlich, chief scientist at Media6Degrees, an online ad-targeting start-up in New York, puts the problem this way: "You can fool yourself with data like you can't with anything else. I fear a Big Data bubble."

The bubble that concerns Perlich is not so much a surge of investment, with new companies forming and then failing in large numbers. That's capitalism, she says. She is worried about a rush of people calling themselves "data scientists," doing poor work and giving the field a bad name.

Indeed, Big Data does seem to be facing a workforce bottleneck.

"We can't grow the skills fast enough," says Perlich, who formerly worked for IBM Watson Labs and is an adjunct professor at the Stern School of Business at New York University.

A report last year by the McKinsey Global Institute, the research arm of the consulting firm, projected that the United States needed 140,000 to 190,000 more workers with "deep analytical" expertise and 1.5 million more data-literate managers, whether retrained or hired.

Step one: defining the problem

Thomas H. Davenport, a visiting professor at the Harvard Business School, is writing a book called Keeping Up With the Quants to help managers cope with the Big Data challenge. A major part of managing Big Data projects, he says, is asking the right questions: How do you define the problem? What data do you need? Where does it come from? What are the assumptions behind the model that the data is fed into? How is the model different from reality?

Society might be well served if the model makers pondered the ethical dimensions of their work as well as studying the math, according to Rachel Schutt, a senior statistician at Google Research.

"Models do not just predict, but they can make things happen," says Schutt, who taught a data science course this year at Columbia. "That's not discussed generally in our field."

Behavioural loop

Models can create what data scientists call a behavioural loop. A person feeds in data, which is collected by an algorithm that then presents the user with choices, thus steering behavior.

Consider Facebook. You put personal data on your Facebook page, and Facebook's software tracks your clicks and your searches on the site. Then, algorithms sift through that data to present you with "friend" suggestions.

Understandably, the increasing use of software that microscopically tracks and monitors online behavior has raised privacy worries. Will Big Data usher in a digital surveillance state, mainly serving corporate interests?

Personally, my bigger concern is that the algorithms that are shaping my digital world are too simple-minded, rather than too smart. That was a theme of a book by Eli Pariser, titled The Filter Bubble: What the Internet Is Hiding From You.

It's encouraging that thoughtful data scientists like Perlich and Schutt recognise the limits and shortcomings of the Big Data technology that they are building. Listening to the data is important, they say, but so are experience and intuition. After all, what is intuition at its best but large amounts of data of all kinds filtered through a human brain rather than a math model?

At the MIT conference, Schutt was asked what makes a good data scientist. Obviously, she replied, the requirements include computer science and math skills, but you also want someone who has a deep, wide-ranging curiosity, is innovative and is guided by experience as well as data.

4 comments

Big data - what a load of crap! There is a well-known aphorism - "garbage in, garbage out". Furthermore, the analysis results are only as good as the theory behind the analysis. Big data is the lazy way of thinking that if only you crunched larger sets of data you would get better results. The key to understanding what data means is to come up with a model (based on expected behaviour) and tailor the data analysis to refine the model. If the model is not inherently right to start with, crunching the data will never give you "accurate" results. As an example, take sports statistics. Only someone with a good understanding of the particular sport can analyse the data and make judgements based on it. Without this key understanding, the data is useless.

Commenter: Misha; Location: Tumbi Umbi; Date and time: December 31, 2012, 12:31PM

All this data analysis seems to be more about power, control and manipulation than business. It seems to be more about how to herd people as data points in order to monetise just about everything. But is it necessary? Surely if a business builds a useful product or service and people are willing to pay for it, that is the end of the story. It does seem to be a struggle now for businesses because all products and services these days are monopolised by a handful of corporations, so business needs to find more market data to find a new product or service to extract some business out of.

Commenter: Dave; Location: Canberra; Date and time: December 31, 2012, 12:39PM

Yep - as an IT person I was rather impressed to read about the US Presidential campaign where voter preferences were tracked down to the local level so they could target ads to each community - e.g. for black, single, 30-35yo with kids, in a small town - that specific. Apparently Obama's team again targeted this better despite a lower spend than the Romney team, so won the election.

With that as a case study, proponents and sales people will be licking their lips in anticipation.

But I agree that we can expect a Big Data Bubble when the demand for and shortage of skilled workers drive a push for quick profit over reliable quality, and the inevitable crash happens - maybe like the 2000 dot com boom/bust, we may have a 2015 big data boom/bust.

Commenter: frank; Location: sydney; Date and time: December 31, 2012, 4:13PM

Big Data is nothing new; it is merely more sophisticated models enabled by technology. The problem is we overemphasise technology in these areas and do not pick up on points like maybe the models are not that great. If you look at finance, we had program trading blamed for the stock crashes of the late eighties, hedge funds buying and selling on models, etc. What it shows is that sometimes it's not about the technology or the model but how logic and thought are applied to a process. This seems to be missing, and too often we still prove that "a fool with a tool is still a fool".
