How Machine Learning Drives Better Ad Performance

Long-Ji Lin is chief scientist at Rocket Fuel, a programmatic media-buying company that uses artificial intelligence to improve marketing. He leads a data mining team and develops online ad targeting and optimization technologies.

Artificial Intelligence has built a substantial track record over the past few years: It is reliably good at doing things people once considered impossible.

It beats humans at Jeopardy. It creates music and art. It drives cars without human intervention. It helps diagnose diseases.

So we're surprised when some people publicly doubt that machine learning can drive better ad performance. It's like people who still think we never landed on the moon, despite the fact that it would have been much harder to fake the landing than it was to actually do it.

What probably confuses people is the blinding speed at which machine learning happens. From the outside, it looks instantaneous and impossible. It's like watching a movie showing someone at the foot of Mt. Everest, and in the very next frame of film the same person is at the summit. "The average length of a human leg is about 32 inches," a skeptic might argue, "and the height of Mt. Everest is 29,029 feet. Come on, sheeple, do the math! It is physically impossible for a human to climb Everest in a single step!"

It IS impossible to get there in a single step, and the math is undeniable. But the argument ignores the fact that humans get there by climbing, one step after another. In fact, on a single day in 2012, 234 climbers made it to the summit.

A Basic Primer on Machine Learning

The core of any Artificial Intelligence is a set of data-driven algorithms based on sound statistics. The first step is to select from different predictive models, each of which is custom-engineered for a different objective. If the goal of an advertising campaign is to drive conversions (as opposed to clicks), then we build a model to predict the conversion probability of any given impression. If the goal is to drive click-through conversions (as opposed to view-through conversions), then the model predicts click-through conversion probability instead. If the campaign objective is to reach people in a certain demographic audience, then we build a model to predict the likelihood that a given user belongs to that demographic segment.

Those models may use the same input features, such as the website where ads will be shown, the content category of the site, the device being used, behavioral interests, and so on. Once the inputs and outputs are defined, the algorithms to build predictive models are well understood in the machine learning community, although many different modeling algorithms are available for data scientists to choose from.
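
To make the idea of "inputs and outputs" concrete, here is a minimal sketch of one such model: a toy logistic regression that takes impression features (site, device) as input and outputs a conversion probability. It is an illustration only, not Rocket Fuel's actual algorithm. The class name ConversionModel, the feature values, and the four-row training log are all hypothetical.

```python
import math
from collections import defaultdict


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    z = max(-35.0, min(35.0, z))  # clamp to keep math.exp from overflowing
    return 1.0 / (1.0 + math.exp(-z))


class ConversionModel:
    """Toy logistic regression over categorical impression features."""

    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate
        self.bias = 0.0
        # One coefficient per (feature name, value) pair, created on demand.
        self.weights = defaultdict(float)

    def predict(self, impression):
        """Return the predicted probability that this impression converts."""
        score = self.bias + sum(self.weights[item] for item in impression.items())
        return sigmoid(score)

    def update(self, impression, converted):
        """Take one gradient step on a single labeled impression."""
        error = self.predict(impression) - (1.0 if converted else 0.0)
        self.bias -= self.learning_rate * error
        for item in impression.items():
            self.weights[item] -= self.learning_rate * error


# Hypothetical training log: (impression features, did it convert?)
log = [
    ({"site": "news.example", "device": "mobile"}, False),
    ({"site": "shop.example", "device": "desktop"}, True),
    ({"site": "shop.example", "device": "mobile"}, True),
    ({"site": "news.example", "device": "desktop"}, False),
]

model = ConversionModel()
for _ in range(200):  # replay the log; a live system would see fresh impressions
    for impression, converted in log:
        model.update(impression, converted)

print(model.predict({"site": "shop.example", "device": "mobile"}))
print(model.predict({"site": "news.example", "device": "mobile"}))
```

Changing the campaign objective in this sketch means changing only what the label records (a click, a click-through conversion, membership in a demographic segment); the input features can stay the same.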

What confuses people is the science-fiction notion that AI is some sort of mysterious all-knowing super-brain suspended in a black box that can instantaneously solve any problem you throw at it. It's not. You couldn't take Google's self-driving car algorithm and say "go solve this ad optimization problem" any more than you could tell a Rocket Fuel algorithm to go drive a car.

Each is a specialized kind of intelligence, engineered for a specialized kind of learning. Google's self-driving car algorithm "thinks" about how fast the cars around it are driving and where to turn next. Rocket Fuel's algorithms think about entirely different things:

What sites are better for driving conversions?

What behavioral interests are more predictive of conversions?

What times of day are people more likely to be influenced by ads?

How does this differ by device used?

Machine Learning Doesn't Need To Consider All Possible Data Points At Once

Rocket Fuel could deliver an advertiser's campaign on thousands of possible websites, 7 days of the week and 24 hours of the day, to prospects with thousands of different behavioral interests, and so on and so on - ad nauseam. "With billions of possible feature combinations", a skeptic might say, "a typical campaign will never get enough impressions for each combination, and therefore data-driven algorithms cannot work".

But this assumes that all data points or features are created equal, and that they all must be considered simultaneously. In reality, some features are more important than others, and our predictive models focus on those features first. As the campaign obtains more data points, the model can turn to the secondary features, and so on. Like human learners, our models refine what they know as they go.

An analogy is a baseball outfielder hearing the crack of the bat. He can't see the ball yet, but the sound is enough data for him to start running in the right direction. As he continues moving, his brain is calculating the wind speed, how much of a factor the sun will be in seeing the ball, whether he risks colliding with the center fielder, et cetera, et cetera, until he determines the optimal glove placement to catch the ball. Our models work in a similar way: they consider tens of millions of features before converging to a million model coefficients. They learn as they go.
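
The skeptic's arithmetic can be checked directly. The sketch below counts the feature combinations for a hypothetical campaign, then counts the coefficients a simple additive model would need if it learned one weight per feature value rather than one per combination. The dimension sizes are assumptions picked only to match the "thousands" mentioned above, and the additive structure is an assumption for illustration, not a description of Rocket Fuel's models.

```python
import math

# Hypothetical campaign dimensions (orders of magnitude only).
feature_sizes = {
    "site": 5_000,
    "day_of_week": 7,
    "hour_of_day": 24,
    "behavioral_interest": 5_000,
}

# Every distinct combination of feature values: the sizes multiply.
combinations = math.prod(feature_sizes.values())

# One weight per individual feature value, plus a bias: the sizes add.
coefficients = sum(feature_sizes.values()) + 1

print(f"{combinations:,} combinations")   # 4,200,000,000 combinations
print(f"{coefficients:,} coefficients")   # 10,032 coefficients
```

Every impression carries information about each of its feature values at once, so a model of this kind never needs to observe each combination separately.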

What's confusing to some observers who aren't yet experienced or trained in the science of machine learning is just how fast this machine learning occurs. To an outsider, it looks like impossible leaps are happening because the individual steps that make up the climb happen at such a blistering pace. Rocket Fuel's algorithms get a lot of opportunities to learn, and our algorithms learn more every millisecond.

Wonder + Experimentation = Wisdom

It's only natural to wonder whether something you haven't tried yet can work. Is it really possible? Can it really do everything it claims it can do?

This, in fact, is at the core of science. The difference between rhetoric and science is that science does not accept logical pro or con arguments, no matter how good they may sound. The list of cognitive biases that all humans share leaves that approach with far too much room for error.

Science works through experimentation and iteration: we work until we definitively prove - or disprove - that a given approach works.

"The only way of discovering the limits of the possible is to venture a little way past them into the impossible." - Arthur C. Clarke

We know that machine learning works. Our job - every day - is to learn new ways to make it work better.