View from the 'Xu

dataxu's thoughts on data, analytics and the industry

DataXu’s Six Stage R & D Pipeline Process

It was the desire to work at the cutting edge of technology which originally brought me to the U.S. In my first decade here, I broke ground in a number of exciting fields while working in corporate research for large companies. Yet I still felt something was missing. I produced many papers and patents, but only on rare occasions did I contribute to something that I could demo (i.e. something interactive). I all but gave up on large companies, and henceforth set my sights on startups.

At DataXu, where I am the VP of Optimization, I lead the teams that create products from ideas. We’ve backed up our ideas with many successes, but the way we go about testing and creating the ideas is the key. When I first began at DataXu, the data science team was tiny and things were simpler. We could only work on a few high priority projects at one time. We wrote most of the code ourselves. The stakes were lower for testing; we had far fewer customers, and they were spending less. But the stakes were higher for live products; if we didn’t provide value for our customers, we wouldn’t have a company! There was no time to waste on things that didn’t work. We had to place our bets and run with them.

As DataXu matured, we made the decision that data scientists would no longer be responsible for writing production code. This freed them up to focus on what they do best: digging into our customers’ problems and solving them using algorithms. And we decided to create a set process for creating products from ideas.

In this process, each idea travels through a lifecycle of “stages” as it progresses from idea to product. Each stage increases our confidence that the idea will become a viable product. Ideas move through six stages: Idea, Lab Experiments, Small Live Tests, Large Live Tests, Product Requirements and Product Backlog. The diagram below is a typical weekly report that shows, at a glance, the status of each project and its order of priority.

Using this six stage process, our company becomes its own rich source of new ideas. Many ideas come from our product team members, who are always in close contact with customers. But twice a year, all DataXu employees are encouraged to pitch ideas to senior leadership during our official Innovation Day series, in which teams compete to have their ideas funded by the business.

Our new products pipeline starts with Stage 0: Ideas. We typically have more ideas than we can build. Many come directly from the team. As a team, we spend about 10% of our time advising our campaign managers on the best way to set up, run and troubleshoot live campaigns. When we receive the same question dozens of times, or realize that there is a need for a tool that we know how to build, we enter it into the Ideas phase. We also read about our industry and constantly talk to engineers, business development team members, product managers and executives to get additional ideas.

The first “gate” that an idea must pass through before it officially becomes a project is a presentation at our weekly team meeting. Everyone is encouraged to pitch solutions and explain the business value that the proposed solution would create. These meetings are an open environment, and there is usually consensus on whether a particular idea is “sound.” When there is no consensus on whether an idea makes the cut, the CTO and I are the official gatekeepers.

Once resources are allocated to the approved idea (although folks still spend spare moments on projects which did not end up “officially approved”), the next stage in the pipeline is to prove feasibility in the lab (preferably at scale) through Stage 1: Lab Experiments. As the idea progresses through each of the six stages of the pipeline, confidence increases that this is a viable product. Many projects die in the lab stage, but this is a great thing. It’s much better to kill something before too many resources are dedicated to live experiments.

Live experiments take place in the next stage. Ideally, live experiments occur by changing backend configuration, or through manual intervention by campaign managers, perhaps using reports from our internal dashboard. If not, we may need an engineer’s help to modify actual code. This is where having a dedicated data science engineering team is helpful. The engineers attend our team meetings and are aware of what’s coming down the pipeline, but we still try to make minimal changes in order to run a live experiment. Our aim is to fail fast (and inexpensively).

There are two stages in our pipeline bundled into running live experiments. The first stage looks at performance on a limited amount of data (Stage 2: Small Live Tests). For example, we may run a live experiment on only a few ad campaigns. The team has learned not to read too much into a negative result on one live experiment, because often the first version of an algorithm doesn’t work perfectly. However, with a few modifications, it usually does. The worst thing we can do is prematurely kill a project because it failed due to a small initial error. We make a habit of starting more live experiments than we need, because things often come up that derail at least some of the experiments.

The gate before we can leave the Small Live Tests stage of the pipeline is achieving success with three campaigns. We may have tried the algorithm in 10 campaigns, with three campaigns experiencing success and the other seven not seeing impact. What the three successful campaigns tell us, however, is that there are cases where this algorithm is useful. If a large number of team members feel the algorithm will indeed provide sufficient value at scale, we move to Stage 3: the Large Live Tests stage.
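The numeric part of this gate can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not DataXu's actual tooling: the post does not say how "success" is measured for a campaign, so it is reduced to a boolean here, and the names `CampaignResult` and `passes_small_live_gate` are hypothetical. The team's judgment about value at scale remains a human decision and is not modeled.

```python
from dataclasses import dataclass

# Assumption: the gate is "success with three campaigns"; how a single
# campaign is judged successful is not specified, so it is a plain flag.
REQUIRED_SUCCESSES = 3


@dataclass
class CampaignResult:
    campaign_id: str
    succeeded: bool


def passes_small_live_gate(results, required=REQUIRED_SUCCESSES):
    """Return True once at least `required` campaigns have succeeded."""
    successes = sum(1 for r in results if r.succeeded)
    return successes >= required


# Hypothetical data: ten campaigns tried, the first three succeeded.
results = [CampaignResult(f"c{i}", i < 3) for i in range(10)]
print(passes_small_live_gate(results))
```

Note that the seven campaigns without impact do not block the gate; only the count of successes matters in this sketch, which mirrors the reasoning above.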

In the Large Live Tests stage, the DataXu team gradually ramps up the new algorithm to prepare for running on many campaigns. Since this can get risky with prototype code, we ramp in stages, monitor vigilantly, limit the rollout to certain internal “low risk” campaigns and sometimes have engineers write more code.

Live experiments sometimes take weeks or months to produce results, since some customer metrics such as Cost Per Action take several days to measure. Whenever possible, we run randomized A/B tests in which some campaigns get the new algorithm and others do not. Because both groups run over the same period, this allows us to compare results without regard to seasonality. The larger the A/B test, the less time we need to run it.
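A randomized split of campaigns into two groups might look like the sketch below. It is a generic illustration, not a description of DataXu's system: the function name `assign_ab`, the even 50/50 split, and the fixed seed are all assumptions made for the example.

```python
import random


def assign_ab(campaign_ids, seed=0):
    """Randomly split campaigns into treatment and control groups."""
    rng = random.Random(seed)      # fixed seed makes the split reproducible
    shuffled = list(campaign_ids)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}


# Hypothetical campaign identifiers.
campaigns = [f"campaign-{i}" for i in range(10)]
groups = assign_ab(campaigns)
print(len(groups["treatment"]), len(groups["control"]))
```

Because assignment is random and both groups run at the same time, seasonal effects should fall on treatment and control alike.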

The exit criterion for the Large Live Tests stage is the ability to write product requirements (Stage 4). This means my team and I understand the specifications required for the algorithm to run unattended in our system. Sometimes, this requires no additional code. Other times, however, a particular algorithm requires a lot of additional work. Products with requirements enter Stage 5: Product Backlog, where they compete for resources with other new and existing projects.

The six stage process is highly effective for DataXu. We have also designed a system to measure its success and to answer questions such as what our typical project velocity is and whether our team is providing value or just spinning its wheels. For velocity, we invented a metric that tracks “stage changes per week” divided by “number of team members”. This rewards the progress of ideas as they advance forward through the six stages. It is important to note that killing a project in, say, Stage 3 counts as a stage change, since it saves company resources from continuing to work on a non-viable idea. Below is our velocity chart for the past several years. There are ups and downs, but overall I am proud to say that we have achieved quite steady velocity.
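As a rough illustration, the velocity metric could be computed along these lines. This is a minimal sketch, not DataXu's actual reporting code: the event tuples, project names and team size are hypothetical, and a killed project is represented by a destination stage of `None`.

```python
def velocity(stage_changes, team_size):
    """Stage changes in one week divided by the number of team members."""
    if team_size <= 0:
        raise ValueError("team_size must be positive")
    return stage_changes / team_size


# Hypothetical week: each event is (project, from_stage, to_stage).
# A to_stage of None means the project was killed, which also counts.
events = [
    ("bid-model", 1, 2),
    ("pacing", 2, 3),
    ("lookalike", 3, None),  # killed in Stage 3
]

print(velocity(len(events), team_size=6))  # 0.5
```

Dividing by team size keeps the number comparable as the team grows, so a larger team is not credited with more velocity simply for having more people.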

To close the loop on our six stage process, we track how much our products help advance DataXu’s corporate KPIs. Sometimes we can link products created by the Optimization team directly to increased revenue. Other times, our products simply assist with efficiency. Either way, we keep tracking value even after the product goes live. Since our industry is constantly evolving, some products may become less effective as time goes on. Other initially small contributors may grow in importance over time as the industry shifts and develops.

I love working at DataXu, which is still a relatively small company at around 360 employees (although we’re growing fast!). There is a lot of freedom here to create and change processes for the common good, and I love working in a constantly changing industry in which there are always new problems waiting to be solved through technology. Perhaps by sharing our way of moving ideas from thoughts to actual products, I might be able to indirectly help solve problems for you or your customers, too.