The Velocity of eBay

SAN DIEGO - Of all the lessons and presentations offered at last week's well-attended conference held by The Data Warehousing Institute, none generated more buzz than TDWI's Executive Summit session with Oliver Ratzesberger, senior director, Architecture & Operations at eBay Inc.

What Ratzesberger offered the audience was a glimpse at the future, reflected through the internal requirements of eBay, a corporation that does not allow six-, 12- or 18-month data projects or dwell excessively on what has gone before. The policy is purely a reflection of velocity at the massive online reseller: an automobile sold every minute, a diamond ring sold every two minutes, more than three watches and five women's handbags sold every minute, on and on across 50,000 categories of goods.

As data architects know, there's scale, and then there's scale. At eBay, there is yet another dimension. No fewer than 5,000 business users and analysts turn over a terabyte of data every eight seconds. eBay takes in 40 terabytes of new incremental data every day and processes 25 petabytes in the same 24 hours. This happens all day, every day, the sum of millions of queries processed with more than 99.9 percent availability in near real time.

This necessarily precludes some traditional data warehouse practices (which we'll get to), yet eBay is most certainly an analytics-driven business, considering that it hosts approximately 113 million listings worldwide at any given time and adds 6.7 million per day.

"Analytics are in our DNA from the bottom up and from the top down," Ratzesberger told the audience. So are KPIs, which he described as the metrics used to measure teams and individuals and tie to compensation.

Fair enough, but at eBay, KPIs also roll up into trees and bigger trees full of subtrees that look at multiple organizational metrics of visitors, engagement, buyer and seller retention as well as the various formats available to auctioneers. They are meant to align individual and departmental performance objectives with corporate goals. The comprehensiveness of this would certainly be a topic all by itself.

Consider eBays technology operations, where a specific KPI measures the efficiency of distributing large workloads over pools of tens of thousands of servers. A simplified KPI of parallel efficiency states that, while 100 percent efficiency is good, less than 70 percent efficiency is bad. By raising parallel efficiency from 50 percent to 80 percent, eBay can realize millions of dollars saved in operational spending.

Beyond standard measures, Ratzesberger offered up a dozen kinds of analytics as just a sample of an attitude that is open to measuring pretty much everything possible. Eighty-five percent of eBay's analytical workload is new and unknown, meaning that exploration is at the core of its delivery philosophy.

"The metrics you know are cheap," he said. "The metrics you don't know are expensive but also high in potential ROI." As a result, design can't be static or dependent on specific questions or dimensions. It calls for a decentralized model that doesn't hinge on project TCO or on the multiple databases, inconsistencies, complexities and redundancies of data marts. At eBay, says Ratzesberger, "A data mart cannot be cheap enough to justify its existence."

Take that, uh, everybody. eBay thinks differently, if for no other reason than that it must, and it has summoned the means to do so.

The alternative lies in massive-scale analytical utility computing across thousands of boxes, where users and analysts bring their own data and perform their own analytics in a prototyping environment, or sandbox, accessed through a Web portal.

The goals include improved time to market (days vs. months) through quick and agile prototyping that allows users to fail fast and makes it easy to try new ideas without the burden of long timelines and dedicated programs. Even the mantra of data quality does not precede inquiry at eBay, since bad data is assumed and can be dealt with as the project is delivered. It is an attitude that seems to say "let's move on," which is the general impression I got of life at eBay.

In the absence of stray data marts, which are virtually gone, agile analytics as a service involves a rather simple Web upload combining custom data and code with fully private utility access to an endless backbone of infrastructure. More than 50 prototyping environments are active at a given time, each assigned a lifespan of 90 days, and in most cases they are small (less than 100 GB). Because all of the main data already resides in the enterprise data warehouse, prototyping environments are offered at no cost to the business units.

Across the audience I noticed more nodding than shaking of heads. eBay operates at a different scale and faces different challenges than a bank or a manufacturer does, but I spoke with TDWI attendees who saw the presentation as one example of a long-term trend.

Because weve been writing about massive grid computing, utility models and outsourcing, I was approached by a few people after the session who asked about the commercial implications of what eBay is doing internally. Frankly, Ive had little success so far translating the plans of what potential grid players including eBay might bring to market.

When I asked the same questions, Ratzesberger would have none of it, though he's heard them again and again. For everything Ratzesberger was showing us, he wasn't selling a thing. But in sharing a day in the life at eBay, he gave many of us a glimpse at large-scale data processing, and perhaps an alternate reality.