AdWords Spam Fighting Methods with Google’s David Baker

Key Points

In 2011, billions of ads were submitted to Google. Of these, roughly 130 million ads were disapproved and 800,000 advertiser accounts were suspended for policy violations.

Some of the spam problem is handled purely algorithmically, while some is flagged by algorithms and then reviewed manually. The system is under constant revision.

One of the biggest areas of bad ads is counterfeit goods. In 2011, of the 800,000 advertiser accounts shut down, 150,000 were shut down for counterfeit violations.

Of the counterfeit removals, 95% were caught by Google's proactive systems and 5% came from outside complaints.

The economic downturn has given rise to new scams: people looking to save money are more easily drawn into deceptive billing practices or schemes.

Bad ads are attacked with a three-pronged approach:

Looking for bad ad text or landing pages.

Looking at advertised sites, agnostic of which advertiser is advertising the site, to see if there is a whole class of policy that applies to the entire site.

Looking at individual advertiser accounts (all of their ads and all of the sites they advertise).

The account review portion uses a risk model, in which Google predicts the risk that an account will violate policy based on account history and a variety of account signals.

Many complaints come from advertisers who, after being banned themselves, report their competitors.

Full Interview Transcript

Eric Enge: Hi, I'm Eric Enge. I'm the CEO and founder of Stone Temple Consulting, an internet marketing optimization firm that does SEO, pay per click, social media, and lots of other wonderful stuff.

And I'm here today with David Baker from Google. David, can you introduce yourself, please?

David Baker: Sure. My name's David Baker. I'm a director of engineering with Google and I've been working with Google going on eight years now.

Eric Enge: Awesome. Thank you. And we're going to talk about some fun stuff today. We're going to talk about the spam police, but it's not the organic spam police. It's the ad spam police.

That's probably not the term you use internally for it, but I'm going on the fly here. Sounds pretty good, doesn't it?

David Baker: Yes, indeed.

Eric Enge: OK. Great.

So, I think a great place to start is you had shared with me some great metrics about, you know, some of the things that sort of characterize how you guys are attacking the problem with bad advertisers.

You've talked about how many advertisers you've shut down; things like that. Can you talk about some of that?

David Baker: Absolutely. So, you know, the numbers bring to life just how much a challenge of scale it is that we have here at Google. And in 2011 we had billions of ads that were submitted to us, and of those, 130 million were disapproved.

And the number of advertisers that we shut down for policy reasons was 800,000 advertisers.

Eric Enge: Wow.

Yeah. And I think there were a lot of accounts that were shut down as well, isn't that right?

David Baker: Yeah. So, then when I say “suspended advertiser,” that is shutting down an account. So, the number of accounts was 800,000 last year that we shut down.

Eric Enge: That's amazing. And 130 million ads; that's a significant percentage of the total. I mean, it's not 20 percent but it's, you know, it's several percentage points, it sounds like.

David Baker: Yeah, you know, 130 million of billions is a large number. It's a lot of policy enforcement that requires a lot of hard work; a lot of good engineering and a lot of good people behind it.

Eric Enge: Yeah, and I saw that, I think you have hundreds of Googlers working in this area?

David Baker: That's correct. So, you know, we spend millions of dollars building systems and applications, and the staffing behind this is hundreds of Googlers. That's both engineers and specialists who are able to enforce policies by manually looking at the edge cases. When the systems that my team and I build don't have a high degree of confidence in a result, we send it to a real person to make a decision.

And that human feedback is essential to all the technology that my team and I build. We learn from the things that real people tell us, specialists tell us, about what's good and what's bad.
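The hand-off David describes, where the system acts on its own only when it is confident and sends everything else to a specialist, can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not Google's implementation: the two thresholds, the route function, and the record_review helper are all hypothetical, and the interview gives no actual values.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the interview does not state any real values.
AUTO_DISAPPROVE_AT = 0.95
AUTO_APPROVE_AT = 0.05


@dataclass
class Decision:
    ad_id: str
    action: str  # "approve", "disapprove", or "manual_review"


def route(ad_id: str, violation_score: float) -> Decision:
    """Act automatically only when the score is confidently high or low."""
    if violation_score >= AUTO_DISAPPROVE_AT:
        return Decision(ad_id, "disapprove")
    if violation_score <= AUTO_APPROVE_AT:
        return Decision(ad_id, "approve")
    # Anything in between is an edge case for a human specialist.
    return Decision(ad_id, "manual_review")


def record_review(training_labels: dict, ad_id: str, is_violation: bool) -> None:
    """Store the specialist's verdict so it can later serve as a training label."""
    training_labels[ad_id] = is_violation
```

In this sketch, the verdicts collected by record_review stand in for the human feedback that the automated systems learn from.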

Eric Enge: Right. So, it sounds like some percentage of the problem is addressed purely algorithmically and then some falls into the class of what I'll call algorithmically-assisted manual review. Or algorithmically-inspired manual review might be a better way to put it.

So, you kind of flag it, you say, “OK, we're not quite sure but this looks like it might be a problem. Let's have someone look at it.”

David Baker: That's correct. That's correct. And, you know, we believe that we've got an industry-leading solution here, but it is also something that is constantly under work and under revision.

The people that are trying to do bad things to our end users or to violate our policies are not giving up and I doubt will ever give up. And so it's a constant battle and arms race where we are constantly revising our systems, building new components, or rebuilding existing components, trying to make them better.

And one of the key things is, you know, making sure that we've got good metrics to know how we are doing and whether or not we are doing better.

Eric Enge: Right. And can you classify for me some of the kinds of spam that you end up seeing and dealing with?

David Baker: I think one of the biggest areas of the bad ads that we see is counterfeit. It's a very lucrative market where a well-known brand can be established and is known for having quality goods, but people can create very cheap knockoffs and trick unsuspecting users into thinking they're getting a really good deal, when, in fact, they're not.

And so, with counterfeit, we see various brands from clothing to electronic goods, being counterfeited and various attackers trying to get these ads in front of users. And that one is, you know, a big area that we see lots of people trying to make money and we work very hard to shut them down.

I believe that in 2011, of the 800,000 advertisers that we shut down, 150,000 of those were for counterfeit. Ninety-five percent of that was caught by our proactive systems; the other 5 percent from complaints from the outside.
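For readers who want the counterfeit figures as shares and absolute counts, the arithmetic can be worked through in a few lines of Python. The inputs are the numbers quoted in the interview; the variable names are only illustrative.

```python
# Figures quoted in the interview (2011).
suspended_accounts = 800_000
counterfeit_accounts = 150_000
proactive_percent = 95  # share of counterfeit cases caught proactively

# Counterfeit as a fraction of all suspended accounts.
counterfeit_fraction = counterfeit_accounts / suspended_accounts

# Integer arithmetic keeps the split exact.
caught_proactively = counterfeit_accounts * proactive_percent // 100
from_complaints = counterfeit_accounts - caught_proactively

print(f"{counterfeit_fraction:.2%} of suspensions were for counterfeit")
print(f"{caught_proactively:,} caught proactively, {from_complaints:,} from complaints")
```

That works out to 18.75 percent of the suspended accounts, with 142,500 caught by proactive systems and 7,500 coming from outside complaints.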

Eric Enge: So that's a case like someone makes a cheap knockoff of some Nike sneakers and they put an ad up which makes it sound like they're either advertising real Nike products or that they're Nike themselves or. . .

David Baker: Yeah, that's correct. And counterfeit is a particularly tricky space because the attacker is trying to fool not just Google but also the user. And they're working very hard to make it look like they are a legitimate business.

The same techniques that they would use to fool the user make it difficult for Google to distinguish between a real good and a counterfeit good. But, you know, we've done a pretty darn good job.

Eric Enge: And are there other classes of spam that happen in the ad world beyond the counterfeit ads?

David Baker: Yes. Illegal goods are some of them. Scams that try to trick users into deceptive billing practices. A variety of different things.

My observation is that the problem has gotten much worse over the past few years, correlating with the economic downturn, and that scams and the allure of getting something free or cheap are much more attractive to people who are hurting for money.

And so we've had to redouble our efforts over the past few years in order to progressively and dramatically improve the situation.

Eric Enge: And you referred to it as an arms race, I think, in a recent post.

David Baker: Yes. It never ceases to amaze me how this is a constant battle, where we're never going to be satisfied with how well we're doing, and how persistent the attackers are.

And, you know, we have reasonable data to infer that they are paying very close attention to what we're doing. When we adopt new protections, they adapt. And one of the things that you learn very quickly, being steeped in the work that we do, is that simple rules, simple solutions just don't work. Because simple solutions are brittle, the attackers adapt very quickly, and it requires a more sophisticated, nuanced approach and multiple approaches applied at once in order to solve this problem.

Eric Enge: Right. And can you describe to me the new risk model that you spoke about in your recent post?

David Baker: Sure. So, we attack the problem of bad ads using a three-pronged approach. The first dimension is looking at the ads themselves: the specific ad text and the specific landing page that the ad goes to.

The second dimension is looking at sites, agnostic of which advertiser is advertising the site. We aggregate all sites that are advertised and inspect the sites to see if there is a whole class of policy that applies to the entire site.

And the third is looking at individual advertiser accounts. Looking at all of their ads, all of the sites that they advertise. And the risk model that we talked about was addressing that third approach. The account review, the third prong, is oftentimes called a risk model because we're kind of trying to predict the risk of this account to violate our Google policies.

And so the risk model looks at a variety of different signals, constantly evaluating advertiser accounts, and when we get a new piece of information like a new keyword from an advertiser or an advertiser logs in from a new IP address, we reevaluate the riskiness of that account.

And we're able to calculate, you know, what is the percentage of bad ads on a particular keyword, or what is the percentage of bad ads from advertisers who came from a particular IP address.

And when you send all of these signals into a machine learning model, we're able to come up with very good predictions on whether or not this advertiser is somebody that we should take a very close look at or is somebody who is pretty good.
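The risk model described here (per-signal bad-ad percentages, a model that combines them, and a re-evaluation whenever a new keyword or IP address appears) can be sketched roughly as below. This is only an illustration under loud assumptions: the SignalStats and Account classes, the weights, the bias, and the logistic combination are all hypothetical stand-ins, since the interview says only that the signals feed a machine learning model and does not name the model or its features beyond keywords and IP addresses.

```python
import math
from collections import defaultdict


class SignalStats:
    """Tracks, per signal value, how many ads were seen and how many were bad."""

    def __init__(self):
        self.total = defaultdict(int)
        self.bad = defaultdict(int)

    def observe(self, value: str, is_bad: bool) -> None:
        self.total[value] += 1
        if is_bad:
            self.bad[value] += 1

    def bad_rate(self, value: str) -> float:
        # Unseen values get 0.0 here; a real system would likely use a prior.
        seen = self.total.get(value, 0)
        return self.bad.get(value, 0) / seen if seen else 0.0


# Hypothetical weights and bias standing in for a trained model.
WEIGHTS = {"keyword": 4.0, "ip": 5.0}
BIAS = -3.0


def risk_score(keywords, ips, keyword_stats, ip_stats) -> float:
    """Logistic combination of the worst bad-ad rate seen for each signal type."""
    kw = max((keyword_stats.bad_rate(k) for k in keywords), default=0.0)
    ip = max((ip_stats.bad_rate(i) for i in ips), default=0.0)
    z = BIAS + WEIGHTS["keyword"] * kw + WEIGHTS["ip"] * ip
    return 1.0 / (1.0 + math.exp(-z))


class Account:
    """Re-evaluates its risk every time a new piece of information arrives."""

    def __init__(self, keyword_stats: SignalStats, ip_stats: SignalStats):
        self.keywords, self.ips = set(), set()
        self._kw_stats, self._ip_stats = keyword_stats, ip_stats
        self.risk = 0.0
        self._rescore()

    def _rescore(self) -> None:
        self.risk = risk_score(self.keywords, self.ips,
                               self._kw_stats, self._ip_stats)

    def add_keyword(self, keyword: str) -> None:
        self.keywords.add(keyword)
        self._rescore()

    def login_from(self, ip: str) -> None:
        self.ips.add(ip)
        self._rescore()
```

In this sketch, an account's score rises when it adds a keyword or logs in from an IP address that has historically been associated with a high share of bad ads, which is the point at which it would be flagged for a closer look.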

Eric Enge: Right. That sounds great. One last thread. When you get a manual complaint from someone, I saw some metrics on how quickly you respond to that. Can you talk a little bit about how you respond?

David Baker: Sure. I'm not sure of the turnaround time guarantees that we have for complaints. But we look at every single complaint. I myself take a look at individual complaints (not all of them, but a lot of them) to see what's going on.

But the way that we handle it is that it goes into a queue. We have a staff of people who take a look at the complaint, try to identify which advertiser it is impacting, and see whether or not they are violating policy.

An interesting anecdote is that we get a lot of complaints from people who have been shut down for violating policies. It's interesting how they sometimes rat out themselves. And that can be helpful.

But we also get some very useful complaints from end users and other outside agencies.

Eric Enge: It's kind of interesting. So, if I get torched then I'm gonna get vengeance on everybody else? Is that kind of what's going on when that person who's been banned is complaining about others?

David Baker: Well, you know, we treat complaints from people who have been banned with no more credibility than we would any other. And a lot of times their complaints aren't legitimate.

But these counterfeiters and other attackers know their space and know their competitors; they're competing with each other. A lot of times they do rat out each other when it comes down to it.

Eric Enge: Well, thanks a lot, David. I appreciate your taking the time to speak with me today. Some great information. And thanks much.

David Baker: Thank you very much, Eric, and I hope we get to talk some time in the future.

Eric Enge: I imagine we will.

David Baker: Great.

About David

David Baker is a director of engineering at Google and has been working for Google for nearly eight years. In this interview, he discusses Google's constant efforts to protect its search users from deceptive advertising.