Big Data for the Rest of Us

The hype around Big Data is growing to deafening proportions, fueled by the prospect that tools now exist that can let small businesses reap the benefits that companies like Google, Amazon, and Facebook so obviously enjoy from mining vast quantities of all sorts of data.

But is that so? The answer is, well, yes, kind of, though probably not as simply and easily as many vendors might like you to think. Small businesses are now testing the waters, and their early experience is already shedding light on the challenges the rest of us need to consider before taking the plunge.

One such is VinoEno, a San Francisco-based wine-recommendation start-up founded by Kevin Bersofsky. The wine industry, Bersofsky says, has been ruled by small data for the longest time. People have had very little to go on to work out whether a bottle of wine is good or not — really just the opinion of a handful of wine reviewers, whose descriptions may or may not be tacked up on the racks in the local liquor store. “You can’t get much smaller — or flawed — than one person giving a score, the Wine Spectator model, that drives an entire industry.” Such limited data, Bersofsky knew, made people very uncomfortable spending even $20 on a bottle.

Bersofsky wants to give consumers a better way to decide what wine to buy: a recommendation engine that can match people’s individual tastes to the myriad attributes of different wines. He envisions marketing such a personalized wine-recommendation engine to wine retailers, to be placed in an in-store kiosk or accessed from an iPad, a personal mobile device, or a plug-in on the retailer’s Web site.

So the VinoEno staffers set out to build a system that collects and combines wine sensory-attribute data with consumer-preference data to produce a consumer-specific recommendation. Ultimately, they could see, the project would potentially require collecting massive amounts of data, much iterative work to develop effective recommendation algorithms, some way to validate consumer preferences, and a lot of experimentation to develop a rewarding end-user experience. It was a big challenge, and they soon found that they needed help.
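To make the idea of matching tastes to attributes concrete, here is a minimal sketch of one common approach, content-based matching with cosine similarity. It is not VinoEno's actual algorithm, which the company has not described; the wine names, attribute names, and scores are all invented for illustration.

```python
import math

# Hypothetical sensory attributes scored from 0 to 1.
# Wine names, attribute names, and values are invented for illustration.
WINES = {
    "Wine A": {"sweetness": 0.2, "acidity": 0.7, "tannin": 0.8},
    "Wine B": {"sweetness": 0.8, "acidity": 0.4, "tannin": 0.1},
    "Wine C": {"sweetness": 0.3, "acidity": 0.6, "tannin": 0.6},
}


def cosine(a, b):
    """Cosine similarity between two attribute dictionaries."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
    norm_a = math.sqrt(sum(a.get(k, 0.0) ** 2 for k in keys))
    norm_b = math.sqrt(sum(b.get(k, 0.0) ** 2 for k in keys))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # no information, so no similarity
    return dot / (norm_a * norm_b)


def recommend(preferences, wines, top_n=2):
    """Return the names of the top_n wines closest to the preference profile."""
    scored = [(cosine(preferences, attrs), name) for name, attrs in wines.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:top_n]]


# A hypothetical consumer who likes sweet, low-tannin wines.
sweet_tooth = {"sweetness": 0.9, "acidity": 0.3, "tannin": 0.1}
picks = recommend(sweet_tooth, WINES)
print(picks)
```

A real engine would need far more data and a way to learn each consumer's profile, which is exactly where the challenges below come in.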

The first challenge was to find talented people who know Big Data and analytics. Google might be able to attract armies of top-flight data analytics people for large-scale number crunching, but chances are that a small business is not going to be able to build its own analytics solution, at least not without help from a data analytics vendor. After unsuccessfully trying to hire in a very talent-constrained market, VinoEno quickly turned to an outside provider, Fabless Labs. In selecting Fabless, VinoEno was looking for an experienced partner that was not too set in its ways — one that was willing to experiment with solutions more suitable for small operations and not be unduly influenced by approaches that had worked for its deeper-pocketed clients.

The second challenge was to decide what tools to use. Here, of course, VinoEno’s people were clueless; they depended on Fabless Labs to know which big-data collection and analysis tools would be powerful enough to handle the volume, velocity, and variety of data they’d be working with but also simple enough for them, as non-techy business users, to employ and maintain on their own. “We needed to be able to create and use data that didn’t exist, which was both exciting and scary,” says Bersofsky. VinoEno also depended on its vendor to teach its small, non-technical staff how to handle all that data: how to implement a cloud strategy, how to move data efficiently, how many data points it takes to make a mathematical model work, how clean the data needs to be, and how to test-market the concept.

The third challenge was to decide what types of information matter. What kind of information is worth the cost of collecting it? Should VinoEno be trying to match customers to various attributes of wine? Should it be trying to keep track of which groups of people buy what kinds of wine? Should it include Wine Spectator information? Even though Wine Spectator was the competition, could its data be dismissed? In such uncharted territory, the questions simply multiplied. “To be truthful,” Bersofsky says, “we’re still working out what will ultimately solve our problem, and only trial and error will tell us. With online content, you can watch 30 movies per month and build data points quickly. With wine, the work has never been done before.”

The fourth challenge was to remain open-minded. As VinoEno’s staffers developed their application, they had to learn to avoid thinking they already knew the answers. As an early example, they had always believed that people’s flavor preferences would correlate with a wine’s many attributes. But the analytical results suggested that only a far smaller set of attributes matters, along with the absence of negative attributes like astringency or burning. It was hard to accept that so many traditional attributes like oakiness or fruitiness really don’t figure into people’s buying decisions at all. But if they really were going to learn something new, they simply had to let go of old ideas. After all, Bersofsky points out, “How many times has someone found a radical conclusion on the way to looking for something else? When 3M invented the Post-It Note, they were looking for something entirely different. If we stay glued to our conviction, we’ll miss other sign posts along the road.”
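One simple way to check which attributes actually track consumer ratings is to compute a correlation for each one. The sketch below uses Pearson correlation on made-up tasting data; the article does not say which analysis VinoEno or Fabless Labs used, so treat the method, the field names, and every number here as assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / (spread_x * spread_y)


def rank_attributes(tastings, attributes):
    """Rank attributes by the strength (absolute value) of their correlation with rating."""
    ratings = [t["rating"] for t in tastings]
    scores = {a: pearson([t[a] for t in tastings], ratings) for a in attributes}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Invented tasting records: attribute intensity (0 to 1) and a 1-to-5 consumer rating.
TASTINGS = [
    {"astringency": 0.1, "oakiness": 0.5, "rating": 5},
    {"astringency": 0.3, "oakiness": 0.2, "rating": 4},
    {"astringency": 0.5, "oakiness": 0.8, "rating": 3},
    {"astringency": 0.7, "oakiness": 0.2, "rating": 2},
    {"astringency": 0.9, "oakiness": 0.5, "rating": 1},
]

ranked = rank_attributes(TASTINGS, ["astringency", "oakiness"])
for name, r in ranked:
    print(f"{name}: {r:+.2f}")
```

In this toy data, ratings fall steadily as astringency rises while oakiness shows no relationship at all, which mirrors the kind of surprise the VinoEno team describes. Real tasting data would be noisier and would call for more careful statistics than a single correlation.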

The fifth challenge was to spot the finish line. VinoEno’s founders needed to define what overall success for the recommendation app really meant. This turned out to be more of an art than a science, a combination of trial and error and gut feel for what a good recommendation would eventually be for an individual consumer. “We had no way to validate the result and no one to confirm the recommendation,” Bersofsky explains. “We decided the secret to making the answer valid was to promote the result as the best possible answer available based on the trial data, especially to the sensory scientists on the team who struggled to buy in.”

Wine consumers will be pleased to know that the first generation of the engine is now complete. With Fabless Labs’ help, VinoEno got it up and running in just three months and for far less than the estimated $250,000 it would have cost to hire a dedicated team. VinoEno is currently test-marketing VinSpin, the first iteration of the engine, and the proof will be in the pudding: in this case, consumers’ perception of the value of a recommendation.

That’s one company’s story. I’d be interested to hear from others. Have you had a different experience with big data? What would be your recommendations to those just starting down this path?