Google preparing to police web

May 08, 2007

Increasingly worried by the use of conventional web sites to distribute the viruses that turn innocent PCs into botnet "zombies," Google appears to be readying a plan to police the web. If the plan goes forward, Google will use new software to automatically identify compromised web pages in its database and label them as "potentially harmful" in its search results. Because being labeled as suspicious by Google could devastate a site's traffic, the move would raise the security stakes for site owners dramatically.

Google security specialist Niels Provos tells New Scientist, "The firewall is dead." He's referring to a shift in the way botnet infections are spread - and it's this shift that's making Google particularly nervous. Botnet viruses used to be distributed mainly through email attachments or computer worms, both of which could be blocked by firewalls or sniffed out by antivirus software. Over the past year, however, the operators of botnets have shifted to using regular web sites to distribute their malware. Reports New Scientist:

As users have grown wary of email attachments and installed firewalls and anti-virus software, however, the bad guys have shifted their attentions to websites in a bid to find more victims ... Even an ordinary website can be risky. At a meeting on botnets held last month in Cambridge, Massachusetts, Provos warned that many web users are becoming the victims of "drive-by" downloads of bots from innocent websites corrupted to exploit browser vulnerabilities. As firewalls allow free passage to code or programs downloaded through the browser, the bot is able to install itself on the PC. Anti-virus software kicks in at this point, but some bots avoid detection by immediately disabling it.

A recent Google study, led by Provos, discovered "around 450,000 web pages that launched drive-by downloads of malicious programs. Another 700,000 pages launched downloads of suspicious software. More than two-thirds of the malicious programs identified were those that infected computers with bot software or programs that collected data on banking transactions and emailed it to a temporary email account."

Anything that makes people wary of visiting web sites or clicking on links stands as a big threat to Google's business. It's not surprising, then, that the company has a unit investigating the dissemination of malware through the web. The paper that Provos and four of his Google colleagues have written on the subject, The Ghost in the Browser, explains how Google is preparing to respond to the threat by incorporating an automated security analysis into its routine spidering and indexing of sites:

To address this problem and to protect users from being infected while browsing the web, we have started an effort to identify all web pages on the Internet that could potentially be malicious. Google already crawls billions of web pages on the Internet. We apply simple heuristics to the crawled pages repository to determine which pages attempt to exploit web browsers. The heuristics reduce the number of URLs we subject to further processing significantly. The pages classified as potentially malicious are used as input to instrumented browser instances running under virtual machines. Our goal is to observe the malware behavior when visiting malicious URLs and discover if malware binaries are being downloaded as a result of visiting a URL. Web sites that have been identified as malicious, using our verification procedure, are labeled as potentially harmful when returned as a search result. Marking pages with a label allows users to avoid exposure to such sites and results in fewer users being infected.
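The two-stage flow the paper describes (a cheap filter over crawled pages, then a costly verification of the survivors, then a label in search results) can be sketched roughly as follows. This is only an illustration under stated assumptions: `Page`, `label_pages`, `annotate_result`, and the two stand-in checks are hypothetical names, not Google's code, and the `verify` callable merely stands in for the instrumented browser running in a virtual machine.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass(frozen=True)
class Page:
    """A crawled page (hypothetical structure for this sketch)."""
    url: str
    html: str


def label_pages(
    pages: Iterable[Page],
    cheap_filter: Callable[[Page], bool],
    verify: Callable[[Page], bool],
) -> Dict[str, bool]:
    """Return {url: flagged}. The expensive verify step runs only on
    pages that the cheap filter marks as candidates."""
    flagged = {}
    for page in pages:
        is_candidate = cheap_filter(page)
        # Short-circuit: verify() is never called for non-candidates.
        flagged[page.url] = is_candidate and verify(page)
    return flagged


def annotate_result(url: str, flagged: Dict[str, bool]) -> str:
    """Label a search result instead of removing it from the listing."""
    if flagged.get(url, False):
        return f"{url} [potentially harmful]"
    return url


if __name__ == "__main__":
    pages = [
        Page("http://a.example/", "<p>hello</p>"),
        Page("http://b.example/", '<iframe src="http://bad.example/"></iframe>'),
    ]
    # Stand-in checks, assumed for illustration only.
    flagged = label_pages(
        pages,
        cheap_filter=lambda p: "<iframe" in p.html,
        verify=lambda p: "bad.example" in p.html,
    )
    for p in pages:
        print(annotate_result(p.url, flagged))
```

The point of the split is cost: the filter has to be cheap enough to run over billions of pages, while the verification step can afford to be slow because it sees only a small fraction of them.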

The authors note that Web 2.0 trends, including the incorporation of user-generated content and third-party widgets into sites, raise the risk of innocent sites being exploited by botnet masters. For example, they write:

Many websites feature web applications that allow visitors to contribute their own content. This is often in the form of blogs, profiles, comments, or reviews. Web applications usually support only a limited subset of the hypertext markup language, but in some cases poor sanitization or checking allows users to post or insert arbitrary HTML into web pages. If the inserted HTML contains an exploit, all visitors of the posts or profile pages are exposed to the attack. Taking advantage of poor sanitization becomes even easier if the site permits anonymous posts, since all visitors are allowed to insert arbitrary HTML.
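To make the sanitization point concrete, here is a minimal sketch of an allowlist filter for user-contributed HTML, built on Python's standard `html.parser`. The tag allowlist and the names `AllowlistSanitizer` and `sanitize` are assumptions made for illustration; a production site would more likely rely on a maintained sanitization library than on a hand-rolled filter like this one.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist: the "limited subset" of HTML a site might permit.
ALLOWED_TAGS = {"b", "i", "em", "strong", "p"}


class AllowlistSanitizer(HTMLParser):
    """Keeps allowlisted tags (without attributes), escapes text,
    and drops everything else, including script and style bodies."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in ALLOWED_TAGS and not self._skip_depth:
            # Attributes are discarded, so onclick=, style=, etc. vanish.
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in ALLOWED_TAGS and not self._skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip_depth:
            # Re-escape text so stray < > & cannot form new markup.
            self.out.append(escape(data))


def sanitize(user_html: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(user_html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    post = '<b>hi</b><script>alert(1)</script><iframe src="http://x.example/"></iframe>'
    print(sanitize(post))
```

The design choice that matters is the direction of the check: naming what is allowed and dropping the rest, rather than trying to enumerate every dangerous tag or attribute.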

The paper goes into considerable detail about the system Google is building for identifying suspicious pages. Given the stakes involved, site owners and designers may want to give it a careful read.

UPDATE: As noted in a comment to this post by Google's Matt Cutts, the company's anti-malware program is actually already under way.

Posted by nick at May 8, 2007 12:06 PM

Comments

Sigh. Google is turning into a censoring, paternalistic, grand-master of the web. So Google gets to play god as to what a good site is versus a bad site. What if I don't want Google to police the web? Who elected them sheriff? I didn't.

Call it another step on google's path to getting sued for anti-competitive behaviors (a subject near and dear to my heart - http://www.sawickipedia.com/blog/2007/04/15/how-to-become-a-monopolist-the-google-story/).

I think this is a great idea. While Google certainly isn't obligated to do this sort of scanning and testing, it's in their best interest, because it's in our best interest.

What I'd really like to see is Google taking these sites out of their index completely until they're cleaned up (that is, if the site owner is a victim rather than an accomplice). That would take care of the intentionally malicious sites, because I'm guessing they get most of their traffic from Google search results. And, it would give a second chance to sites unintentionally/unknowingly hosting malware.

It doesn't sound like they're going this far though, which is a shame.

I've always said that google is just a user-friendly DNS. It's a text box, you type something, stuff comes up in the browser. It's probably inevitable that they start behaving a little ICANN-ish. Might not be a bad thing either...

Double sigh. I don't need someone else to protect me. I can do it fine by myself - that's my point. And what happens to sites that Google mislabels and banishes unnecessarily, as it has already done? Where's the due process?

I agree that I have a choice as a consumer today and I use google less and less for that reason. However, if/when google achieves 80-90% market share, competitors might leave the market, leaving me stuck with a sheriff I didn't elect. In most states sheriffs are elected.

It's not often I sign up to comment but this news was too much. There are so many issues with the idea that Google will pro-actively filter content. Unless it's exceptionally clever it is going to make mistakes. Sophisticated legal systems make mistakes and, as tas points out, there is due process and, if that fails, there's an appeals process. Have you tried appealing to Google? The words "Ears", "Fall", "Deaf" and "on" come to mind.

Google is a global phenomenon. In the US I can imagine that the pro-active legal system will occasionally kick in to give Google the occasional slap on the wrist. But what about in other countries with less energetic legal eagles and more compliant populations? I live in the UK and I can't imagine anyone being successful challenging Google. It takes the weight of a government to encourage Google to change its policies.

I have a company that produces software. We make a trial version available for download. Interested parties click on a link to download it. How does Google automatically distinguish between a link to genuine software and software there for the purposes of a drive-by-download?

Larger companies will be able to respond to this by sending a legal team around to Google head office to ensure downloads from their sites are not affected. But what about the millions of little guys? What would this kind of unintentional filtering do for competition if the only choices you have are Big Corp and Bigger Corp?

Google is not a Sheriff. It is not elected nor has it been appointed. I don't accept that by clicking on links you are effectively electing them. Where are the ballot notices? Where are election monitors?

Posted by: Bill Seddon at May 9, 2007 04:10 AM

I think some people here are getting a little ahead of themselves and reading too much into this. The article above clearly states:

"The pages classified as potentially malicious are used as input to instrumented browser instances running under virtual machines. Our goal is to observe the malware behavior when visiting malicious URLs and discover if malware binaries are being downloaded as a result of visiting a URL. Web sites that have been identified as malicious, using our verification procedure, are labeled as potentially harmful when returned as a search result."

Google are NOT removing malicious sites from their results. They are initially going to create a catalogue of suspected malicious sites. Then, from that catalogue, they will run tests to see if they are indeed malicious. It is these sites that will be marked as being potentially unsafe in the search results.

I personally think this is a good idea. If you don't agree with Google, there are plenty of other search engines out there that will do just as good a job.

Posted by: Insomniac at May 9, 2007 04:50 AM

I think there are two views you should take into account. The first is that of people who search through Google. The second is that of sites that want to be found.

For the users there is an easy option of choosing another search engine if you don't like Google's approach.

On the other hand sites don't have that luxury. They need to be on Google else they will miss out on visitors. They don't have a choice (besides the obvious of not being found at all).

Posted by: highstrung at May 9, 2007 07:24 AM

Everybody seems to be forgetting that despite Google's god-like presence it is still a private entity and can do pretty much whatever it wants to with its own products without needing a mandate from the masses. You don't like it - go use Yahoo.

Posted by: greyish at May 9, 2007 09:41 AM

Leaving the right or wrong of Google's newest initiative aside for now, it seems to me that they might want to reconsider this tactic just out of a sense of self-preservation.

Maybe it's just me, and I have to admit that I don't know that much about the spread or use of botnets, but it seems like they're taking a shot at a community that's been known to indulge in scorched earth tactics before when they thought their interests were threatened. Google may be able to impact the creation of new botnets in this manner but, based on articles I've read, it seems like there are already some pretty heavy hitters out there that may decide to fire back.

Posted by: cbauman at May 9, 2007 09:55 AM

If they have it report an error in the webmaster tools, I think this would be great. What's worse, Google warning the user beforehand, or your users getting malware?

I've seen some sites this has happened to. The owner generally has no idea it's even there. The people who would even care about being labeled on Google are the people who would know how to prevent the malware from getting there in the first place.

I'm pretty sure Google is talking about malware, and not just downloads like mentioned above (so I read comments after posting mine).

Malware is pretty easy to identify, because it isn't a link to download; it's something that tries to download without you knowing. It's usually hidden behind something that looks like a counter, has a bunch of hidden iframes, and has obfuscated javascript.
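As a rough illustration of the signals just mentioned (hidden iframes and obfuscated javascript), here is a small sketch using Python's standard library. The thresholds, the regular expression, and the names `HiddenIframeFinder` and `malware_signals` are assumptions for this example only; matches are hints that a page deserves a closer look, not proof of malware, and plenty of legitimate pages use `eval` or invisible frames.

```python
import re
from html.parser import HTMLParser


class HiddenIframeFinder(HTMLParser):
    """Collects the src of iframes that are sized or styled to be invisible."""

    def __init__(self):
        super().__init__()
        self.hidden_iframes = []

    def handle_starttag(self, tag, attrs):
        if tag != "iframe":
            return
        a = {name: (value or "") for name, value in attrs}
        style = a.get("style", "").replace(" ", "").lower()
        # Assumed thresholds: 0- or 1-pixel frames count as hidden.
        tiny = a.get("width") in ("0", "1") or a.get("height") in ("0", "1")
        invisible = "display:none" in style or "visibility:hidden" in style
        if tiny or invisible:
            self.hidden_iframes.append(a.get("src", ""))


# Calls that often appear in obfuscated scripts (a crude, assumed list).
OBFUSCATION_HINTS = re.compile(
    r"\b(eval|unescape)\s*\(|String\.fromCharCode", re.IGNORECASE
)


def malware_signals(html: str) -> dict:
    finder = HiddenIframeFinder()
    finder.feed(html)
    finder.close()
    return {
        "hidden_iframes": finder.hidden_iframes,
        "obfuscated_js": bool(OBFUSCATION_HINTS.search(html)),
    }


if __name__ == "__main__":
    page = (
        '<img src="counter.gif">'
        '<iframe src="http://bad.example/x" width="0" height="0"></iframe>'
        '<script>eval(unescape("%61"))</script>'
    )
    print(malware_signals(page))
```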

With all due respect, this is old news. Based on Google's historical behavior and general internet user trends over the last ten years, I'm willing to bet that malware is most of the reason that Google exists right now. If malware wasn't Google's focus from the beginning, cloaked content, weird redirects, and doorway pages would still be acceptable. Irrelevant results wouldn't be as much an issue if so many of them weren't especially dangerous. Google realized early on that in order to build brand loyalty they would have to make their search results the safest, and they accomplished this goal. A site with no malware which incurs a Google ban is likely to be collateral damage rather than "public enemy #1". Yeah it sucks but you have to take some responsibility for your own sites. Remove the malware or anything that looks like malware. If you specialize in legitimate downloads, you probably have nothing to worry about. Google's probably looking for something remotely executable on a vulnerable Windows box. If you're still freaked out, copy whatever SourceForge is doing.

This is no different from any other SEO concern. Either give Google exactly what it wants, or focus on improving your spam. Either option beats complaining about how mean Google is.

Just my 2 cents.

Posted by: STRZA at May 9, 2007 11:10 AM

I think overall it may play out to be a good move, as there are just SO many sites out there dropping code, sending out SPAM (who really reads all the crazy spam offers?).

If anyone can pull it off, it will be Google. Whether they're able to do it without collateral damage to legitimate sites, of course, only time will tell. I think based on what I've read though, it's a feasible scenario. Identifying sites and testing manually to see what code gets dropped will certainly be a painstaking process, but there are definitely people out there willing/needing to work - and it can only help to better the internet for everyone.

I don't think anyone can really argue that wanting to get rid of malware, spyware and otherwise harmful code that is dropped on your PC without your consent is a bad thing. I think people immediately want to jump the gun and extrapolate that out to "what's next": they're going to censor my e-mail, my news, etc. No one is suggesting that. Let's embrace the premise, allow them to flesh it out, and then form an opinion.

Nick, I normally love your posts, but your headline ("Google preparing to police web") isn't very accurate, because we've been tackling malware for quite a while. Here's some historical context.

Almost exactly a year ago, Google and other search engines were raked over the coals for exactly the opposite reason: allowing users to get infected with malware from search engine results. See
http://www.mattcutts.com/blog/siteadvisor-study/
for more background. At the time, we were already anticipating the issue and had added "Don’t create pages that install viruses, trojans, or other badware." to our webmaster guidelines.

Google's response when we believe malware is detected is to warn the user via an interstitial when they click on a search result that might infect their computer. See
http://www.mattcutts.com/blog/info-about-malware-warnings-and-how-to-appeal-them/
for an example post about this process and how to appeal it if you have removed the malware or believe there was an error.

Users liked the malware protection a lot, so we added some unobtrusive annotation to listings for sites that could potentially infect a machine. See
http://googlesystem.blogspot.com/2007/02/google-flags-pages-that-install.html
for more info.

Of course, it's important to help regular webmasters who might have been hacked and not even known that they were infecting their users. To that effect, we added sample urls with suspected malware to our webmaster console. See
http://www.mattcutts.com/blog/got-malware-google-will-help-you-find-it/
for more details.

I've highlighted Niels Provos' fantastic work on my blog before, but Provos also provides free tools at http://www.spybye.org/ to help webmasters scan their own sites for malware.

All in all, I think Google does a pretty good job of protecting users from getting infected, while at the same time providing tools that assist webmasters in detecting and correcting hacked urls that could spread malware. Certainly I think we provide more notice to users about potential malware urls, and we provide more info to webmasters about potentially hacked urls. So I think Google's response to this issue balances the needs of users and webmasters pretty well.

It takes the weight of a government to encourage Google to change its policies.

I have a different opinion on the matter. A government, as an agent of force, has to be bound by objective laws and limitations. Otherwise, any action deemed by a mob as appropriate can be carried out and rights become a secondary issue.

Google as we know it exists solely because its creators had a vision and carried it out. It is their property (and the property of their shareholders). We have no right to demand government crackdowns or anti-trust lawsuits against them for the "crime" of making their search engine safer.

Posted by: Jay at May 9, 2007 01:57 PM

"It takes the weight of a government to encourage Google to change its policies."

Bill Seddon, I've seen Google modify its policy on this particular issue in response to a single person's blog post. After someone complained that Google made it too hard to know which urls on a site were believed to be infected, Google worked hard to add example urls to our webmaster console. That helps a webmaster quickly drill down to exact pages that appear to be hosting malware.

So I have to say that I'd disagree. We have responded on this specific issue to help webmasters more easily identify potential malware pages, and we did so in response to feedback on the web.

What do I think about it ... well ... it's basically a good idea. However, it's not completely new, as McAfee (SiteAdvisor) and TrendMicro (TrendProtect), for instance, are already using this approach more or less.
I posted today some other reactions as well at my anti-malware blog which you can find at
http://www.anti-malware.info/weblog

The links were displayed as "sponsored links" after visitors entered specific queries into Google's search service. Clicking the links would ultimately go to a legitimate site but by way of another site that attempted a "drive-by installation" of password-stealing software. Miscreants placed the links using Google's AdWords service for advertisers."

I'm glad to see Matt Cutts step in here to try and explain Google's position on this, but I fear that many people either don't get it or are having a knee-jerk reaction. I'm certainly no fan of censorship, and I think Google has been a little short of the mark on more than one occasion with regard to its "don't be evil" motto. But I find it pretty hard to fault Google in this case. Google is not "policing" anything here. All they are doing is presenting search users with some additional information regarding websites that appear in the search results. A little blurb that informs search users that following a link will lead them to a site that has tested positive for malware hardly counts as censorship.

I've been offline for a couple of days, so missed a lot of the comments. Thanks to all, particularly Matt Cutts for reviewing Google's approaches to warning people away from malware.

My post referred in particular to the paper from the Google researchers that was presented last month, which laid out a troubling scenario of a widespread hijacking of innocent sites by botnets and Google's development of a system to identify such sites automatically and label them as suspicious in search results. Given that the paper says hundreds of thousands of pages may be affected, this seems like a fairly large-scale effort. I can't tell from Matt's comment whether the specific system described in the paper is now operational or whether it is still in development (as I thought the paper indicated).

Given the threat posed by botnets (and I'm assuming reports of their spread and damaging effects are accurate), I understand Google's desire to counter them, and I think that Google may well, at this point, be the only organization that has the scale, knowledge, and influence over web users (and site owners) to take on botnets in a meaningful way. At a practical level, then, I applaud the effort. At the same time, it does raise tricky issues about the extent to which a single for-profit company can exert unilateral control over the contents of the web - making the rules, policing sites (I don't think "policing" is too strong a word here), and doling out punishment. Even if we assume that Google is acting in everyone's best interest, the company's power over what is emerging as the world's central informational and commercial infrastructure should give us pause.

Posted by: Nick Carr at May 10, 2007 10:38 PM

Sigh. Google is turning into a censoring, paternalistic, grand-master of the web. So Google gets to play god as to what a good site is versus a bad site. What if I don't want Google to police the web? Who elected them sheriff? I didn't.

Don't like it? Don't use it. It's that simple. Go use a "superior" service like MSN or Yahoo. Don't complain about free.

Posted by: abandonedhero at May 11, 2007 01:22 AM

It's funny, I once wrote an article about online security, where I suggested we take a lesson from a great old Sci-Fi movie.
In "The Day the Earth Stood Still", robots patrolled the universe looking for bad guys and preventing them from doing harm.
Did Google steal my idea?