Posted
by
kdawson
on Tuesday October 13, 2009 @11:28PM
from the see-yourself-as-others-see-you dept.

alphadogg writes "In an effort to promote the 'general health of the Web,' Google will send Webmasters snippets of malicious code in the hopes of getting infected Web sites cleaned up faster. The new information will appear as part of Google's Webmaster Tools, a suite of tools that provides data about a Web site, such as site visits. 'We understand the frustration of Webmasters whose sites have been compromised without their knowledge and who discover that their site has been flagged,' wrote Lucas Ballard on Google's online security blog. Webmasters who are registered with Google will receive an email notifying them of suspicious content, along with a list of the affected pages. They'll also be able to see part of the malicious code." Another of the new Webmaster Tools is Fetch as Googlebot, which shows you a page as Google's crawler sees it. This should allow Webmasters to see malicious code that bad guys have hidden on their sites via "cloaking," among other benefits.
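You can approximate the "Fetch as Googlebot" idea by hand on a site you control: fetch the same page once with a Googlebot user-agent string and once with a browser one, then diff the two bodies. A rough sketch (the URLs and the `looks_cloaked` helper are illustrative, and the result is only a hint, since legitimately dynamic pages also differ between any two fetches):

```python
import urllib.request

# Googlebot's published user-agent string, and a generic browser one.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36"

def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build a request that presents the given user-agent string."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

def looks_cloaked(url: str) -> bool:
    """Fetch the page as Googlebot and as a browser and compare the bodies.
    A difference *suggests* user-agent cloaking; treat it only as a hint."""
    as_bot = urllib.request.urlopen(build_request(url, GOOGLEBOT_UA)).read()
    as_browser = urllib.request.urlopen(build_request(url, BROWSER_UA)).read()
    return as_bot != as_browser
```

Of course, sophisticated cloaking keys on Googlebot's IP ranges rather than the user-agent header, which this naive check can't see.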

Google Spreadsheets can be abused to create phony login pages. Here's one for "Free Habbo credits" [google.com], designed to collect Habbo logins.
It's been reported via the usual "Google abuse" mechanism, repeatedly, and it's still up. It's been up since October 28, 2008.

We track major domains being exploited by active phishing scams. [sitetruth.com] ("Major" here means only that it's in Open Directory, with about 1.5 million domains.) There are 39 exploited domains today. Only 7 have been on that list since 2008. The most abused site is Piczo.com, which is a hosting service/social network/shopping site for teenagers.

Just about everybody else has cleaned up their act. 18 months ago, that list had 174 entries, including Yahoo, eBay, Microsoft Live, and TinyURL. All those companies have become more aggressive about checking for phishing scams that were injected into their domain. Google's cluelessness in this area ought to be embarrassing to someone.

An ordinary scam (like the Habbo one listed above) is different from a phishing attack (which requires the attacker to impersonate another entity).

You have absolutely no hard evidence (other than your own experience and cynicism) that the site collecting Habbo logins isn't doing so for purely honest reasons and will only use them to deposit 500 credits in each account submitted.

This comes down to a matter of trust. If you trust random people on the Internet, you're going to get screwed over.


It is my opinion that Google is no more "secure" than any other website or corporation. Google is doing the same thing Sony does: slapping their name on a new product and letting a bunch of people assume it's good because their name is on it. The only interesting thing mentioned in the article synopsis is the "Fetch as Googlebot" feature, because now when you search for a picture and Google lists some 4000x3000 photo that matches what you want and it turns out that was

Google finally fixed this. The offending page now reads "We're sorry. You can't access this spreadsheet because it is in violation of our Terms of service. If you feel this is in error, please contact us."

Sometimes you just have to use a big clue stick to get their attention. It took some help from The Register to get Yahoo, Microsoft, and eBay to clean up their acts.

Five more long-term exploited sites remain. A bit more nagging, and we'll have this cleaned up.

If Google's determination of whether a site has malicious content is based solely on crawling it, wouldn't a hacker be able to manipulate robots.txt so that crawlers skip the file with the malware? These tools would allow a hacker to test that theory by trying different things on his own sites and seeing what generates an email, instead of waiting around for Google to re-crawl them and having to check each one to see if it is filtered...

I think you are correct, but it might be counterproductive. Googlebot obeys robots.txt, so if the hacker listed the infected page in robots.txt, Google shouldn't ever request it. However, a hacker who has infected a page presumably wants people to view it. Hiding the page from Google probably drives the number of visitors unacceptably low.
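For what it's worth, hiding a page from compliant crawlers takes only a couple of lines in robots.txt (the path here is made up):

```
User-agent: *
Disallow: /infected-page.html
```

Whether Google's malware scanning honors robots.txt in exactly the same way the indexing crawler does is a separate question.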

Also, I think a lot of infected pages are the result of SQL injection or of simply dropping some cross-site scripting code into form fields.
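Parameterized queries are the standard defense against that first vector. A minimal sketch using Python's sqlite3 module (the table and the hostile form-field input are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (body TEXT)")

# Hostile text submitted through a form field.
user_input = "'); DROP TABLE comments; --"

# Vulnerable pattern: string interpolation splices attacker text into the SQL.
#   conn.execute("INSERT INTO comments (body) VALUES ('%s')" % user_input)

# Safe pattern: a parameterized query treats the input purely as data.
conn.execute("INSERT INTO comments (body) VALUES (?)", (user_input,))
conn.commit()

# The hostile string is stored verbatim; no SQL in it was ever executed.
stored = conn.execute("SELECT body FROM comments").fetchone()[0]
```

The same principle (keep data out of the code channel) is what HTML escaping does for the cross-site-scripting case.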

Google would probably first try sending mail to the Google account that confirmed its control of the site.

If not, Google would just assume "localhost" is an error for whatever domain the site actually uses. For example, given webmaster@localhost at www.example.com, Google might look up the MX for www.example.com, not see it, look up the MX for example.com, and send mail.
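That guessed-at fallback walk could be sketched like so (the `has_mx` lookup is stubbed out here; a real implementation would issue an actual DNS MX query):

```python
def deliverable_domain(host, has_mx):
    """Walk up the DNS name until a label with an MX record is found.

    `has_mx` stands in for a real DNS MX lookup; the fallback order is
    the point. Returns the domain mail should be sent to, or None.
    """
    labels = host.split(".")
    while len(labels) >= 2:
        candidate = ".".join(labels)
        if has_mx(candidate):
            return candidate
        labels = labels[1:]  # strip the leftmost label and retry
    return None

# Example: MX exists only at example.com, not at www.example.com.
mx_table = {"example.com"}
target = deliverable_domain("www.example.com", mx_table.__contains__)
```

Under those assumptions, mail addressed to webmaster@localhost at www.example.com would end up routed to example.com.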

If the site doesn't list such an address at all, there's an RFC that strongly recommends webmaster@example.com as the WWW technical contact.

You obviously have no idea about the early days of the internet and HTTP. The whole point of HTTP was to publish documents; if you host something, you are implicitly allowing other people to fetch a copy of it.

robots.txt came about in the very early days of HTTP. An enterprising hacker wrote a crawler to index the whole internet (which wasn't that big at the time), but his crawler got stuck fetching pages from one machine with dynamically generated pages. This obviously tied up the bandwidth

Google is not playing police; they merely tell searchers it's a bad idea to go there. If you don't want others to link to you, don't go on the intarwebs. Also, access to Webmaster Tools is only possible if you sign up and verify your site.

Yes, it's terrible, you have to type in "User-agent: *\nDisallow: /". I can feel your pain.

This happened to my site, and the Google Webmaster Tools were helpful but frustrating: it took two weeks of my site being blocked in all major browsers before they officially declared it OK. It did give me a list of all the URLs where there were problems, so it wasn't too hard to debug.

Another of the new Webmaster Tools is Fetch as Googlebot, which shows you a page as Google's crawler sees it.

Heh, this could find some use outside of its designed purpose then... A number of pay-to-view web forums allow the Googlebot to navigate freely but require payment from users. Among other boards, those involving erotica. :p

Alas, I think you can only view your own sites with the Googlebot... So unless you can sneak in the "yes, this domain is mine" HTML file or DNS entry (in which case you probably don't need to worry about this anyway), probably not a chance... ;)

A number of pay-to-view web forums allow the Googlebot to navigate freely but require payment from users. Among other boards, those involving erotica.

This sort of cloaking is frustrating even for people who aren't porn fans. A lot of scholarly journals spam search engine result pages with their cloaked, noarchived pages <cough>elsevier and springerlink</cough>. Even more frustrating is that Google provides no way for users 1. to exclude noarchived pages from its results or 2. to report sites that violate Google's stated cloaking policy.

A friend of mine works at Bluecoat ( http://www.bluecoat.com/ [bluecoat.com] if you care...) (they do internet security and filtering services). He says they regularly send reports to Google when they find that Google is compromised with malicious code... so it's good to know Google's taking part in helping fix a problem they certainly deal with.

My site was once getting hit really hard by some other web site with a hole in their feedback page. I tried to email their webmaster, but my message got flagged as spam. I guess including IP addresses, multiple links, and phrases like "spam," "execute script," "spambot," and "exploit" isn't looked upon kindly by the internet powers that be. I just blocked any connections coming from their IP, but I wish I could have gotten through to shut down the security exploit.

Phishers are already preparing fake communications and fake sites carrying such warnings "from Google." There are certainly mechanisms in existence to help authenticate that a communication is actually from Google; hopefully they are used cleverly enough to avoid spreading more contamination.

All the diagnostic information and messages are presented through the Google Webmaster Tools UI, not through email. There is an option in Webmaster Tools to forward messages [blogspot.com] to email, but this is opt-in.

You have a point, though... there are lots of fake "from Google" emails floating around. As you know, it's a tough problem to solve :/