When you think of big data and its impact on ecommerce, words such as Hadoop, NoSQL, and predictive modeling might spring to mind. Protein research? Not so much. But Wayfair, an online retailer of home furnishings, is applying techniques from protein analysis to a more pragmatic problem: how to recommend relevant products to shoppers on its site.

In late 2011, the company was searching for a better customer recommendation system. "We found that the family of techniques most well known in that area just didn't work with our data," said Ben Clark, Wayfair's director of search and recommendations, in a phone interview with InformationWeek.

Clark and his team of data scientists scoured academic and industry research papers for innovative approaches, or new ways to look for patterns in data.
"One of my guys had a particularly inspired flight of intuition in connecting things that don't--on their face--look well connected at all," said Clark.

That insight came in the form of a 1997 research paper from Dutch bioinformatician Stijn van Dongen. Bioinformatics is a branch of biological science that explores ways to store, analyze, and retrieve biological data.
Clark's team began using the clustering techniques that van Dongen had used to analyze proteins, as well as a software toolkit that the Dutch researcher had written and made available under a free license.

"Sure enough, when we ran our data through it, we could tell immediately that the results looked intuitively good. Then we put it up on our site, and people seem to like it," Clark said. A February 2012 blog post by Clark summarizes how Wayfair used van Dongen's techniques to build its recommendation engine. The post includes four images, each showing lines and dots that represent clusters of proteins and the connections among them.

"I don't know what that represents in the protein world, but in my world, it represents a connection between two items," said Clark.

On an ecommerce site, a connection between two items can be defined in several ways: for instance, the same people using both items, or one person buying both items in the same shopping cart. The thicker the line, the stronger the connection between the two items.
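
To make the idea of a weighted connection concrete, here is a minimal Python sketch of one of the definitions Clark mentions: two items are connected when they land in the same shopping cart, and the line gets "thicker" (its weight grows by one) each time that happens. The cart contents and the function name `co_purchase_graph` are hypothetical; the article does not say exactly how Wayfair defines or weights its connections.

```python
from collections import Counter
from itertools import combinations


def co_purchase_graph(carts):
    """Map each unordered item pair to the number of carts containing both items."""
    weights = Counter()
    for cart in carts:
        # Deduplicate and sort so ("rug", "sofa") and ("sofa", "rug") share one key.
        for a, b in combinations(sorted(set(cart)), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Hypothetical carts, for illustration only.
carts = [
    ["sofa", "coffee_table", "rug"],
    ["sofa", "rug"],
    ["lamp", "desk"],
    ["sofa", "lamp"],
]
graph = co_purchase_graph(carts)
```

Here the sofa and rug share two carts, so their edge has weight 2, while every other pair that co-occurs has weight 1. Counting pairs per cart is the simplest choice; a production system would likely normalize for very popular items, which would otherwise connect to everything.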

Wayfair needed a way to weed out the less relevant connections. Customers "are surfing around our site, and we're trying to make useful lists of things they might want to buy," Clark said. "If we just say that everything is connected, that gives us too much data."

The Dutch researcher's mathematical process allowed Wayfair to remove the "wispy, tenuous connections that aren't as strong," and uncover clusters of things with strong enough connections to be useful to its customers, said Clark.
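
The article does not name the method, but van Dongen is the author of the Markov Cluster (MCL) algorithm and the `mcl` software that implements it. Assuming that is the technique in question, below is a small, dependency-free Python sketch of MCL. It is not van Dongen's toolkit, and the parameter values and toy graph are illustrative assumptions. The loop alternates expansion (squaring a column-stochastic matrix, which simulates random walks), inflation (raising entries to a power, which boosts strong flows and starves weak ones), and pruning of tiny values. The pruning step is where the "wispy, tenuous connections" drop out.

```python
def normalize_columns(m):
    """Scale every column so it sums to 1 (a column-stochastic matrix)."""
    n = len(m)
    out = [row[:] for row in m]
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                out[i][j] = m[i][j] / total
    return out


def matmul(a, b):
    """Plain square-matrix product, kept dependency-free for the sketch."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adj, inflation=2.0, prune=1e-4, max_iter=100, tol=1e-9):
    """Return clusters (frozensets of node indices) from a symmetric weight matrix."""
    n = len(adj)
    # Self-loops are conventionally added so each node keeps some flow to itself.
    m = normalize_columns(
        [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    )
    for _ in range(max_iter):
        prev = m
        m = matmul(m, m)                                  # expansion
        m = [[v ** inflation for v in row] for row in m]  # inflation
        m = normalize_columns(m)
        m = [[v if v >= prune else 0.0 for v in row] for row in m]  # drop wispy flows
        m = normalize_columns(m)
        if max(abs(m[i][j] - prev[i][j]) for i in range(n) for j in range(n)) < tol:
            break
    # Rows that still carry flow are "attractors"; their nonzero columns form a cluster.
    clusters = []
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 0)
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# Two tight triangles (weight 5) joined by one weak edge (weight 1) between nodes 2 and 3.
adj = [
    [0, 5, 5, 0, 0, 0],
    [5, 0, 5, 0, 0, 0],
    [5, 5, 0, 1, 0, 0],
    [0, 0, 1, 0, 5, 5],
    [0, 0, 0, 5, 0, 5],
    [0, 0, 0, 5, 5, 0],
]
clusters = mcl(adj)
```

On this toy graph, the weak bridge between nodes 2 and 3 is pruned away and the two triangles come back as separate clusters, {0, 1, 2} and {3, 4, 5}. Raising `inflation` generally yields smaller, tighter clusters, and lowering it yields coarser ones.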

It's difficult to estimate the economic impact of the new technique, Clark said. However, a similar approach that Wayfair used for another recommendation system has increased customer click-through rates by 18%. "From where I sit in this business, that's a huge increase," said Clark.

It's unclear whether van Dongen's clustering techniques and software toolkit would also work for other ecommerce sites. Clark points to a quote in his blog post from Data Analysis with Open Source Tools, a book by software project consultant Philipp Janert, who argues that only spam filtering, credit card fraud detection, and credit scoring applications have proven effective across a wide range of usage scenarios.

As for customer recommendation engines: "The approaches that work tend to be quite ad hoc. I think it's still a very difficult problem to solve these things in a general way," said Clark.
