Archives for April 2006

Aha. Gord Hotchkiss has joined that rollicking scrum that we call the blogosphere. 🙂 Go Mattdot his new site, Out of my Gord. You might not have seen that Bruce Clay, Inc. has a blog either. Let’s see, what else? Performancing is on my RSS reading list too.

See, this is why everyone’s blog should have times down to the milli/micro/pico/femto-second, so you can tell who really noticed something first. I’ve been hanging around with SEOs too long, because my brain immediately goes to the “optimizing your clock skew so it looks like you posted something first” angle. Or maybe I’m just sleepy. 🙂

If you don’t want to read the full post, the executive summary is that Google’s Webspam team is working with our Sitemaps team to alert some (but not all) site owners of penalties for their site. In my world (webmasters), this is both a Big Deal and a Good Thing, even though it’s still an experiment. Sign up for Sitemaps to try it out. Oh, and the Sitemaps team introduced a bunch more new and helpful features too. Check out the Sitemaps Blog for more info. 🙂

The responsibility of picking “Don’t be evil” as an informal motto is that everybody compares Google to perfection, not to our competitors. That’s mostly a good thing, because it keeps us working hard and thinking about how we would tackle each issue in the best possible way. Lately, I’ve been thinking a lot about how the ideal search engine would communicate with webmasters.

There’s a Laurie Anderson song called “The Dream Before” based on a quote by Walter Benjamin. Part of it goes

History is an angel being blown backwards into the future.
History is a pile of debris,
and the angel wants to go back and fix things, to repair things that have been broken.

But there is a storm blowing from Paradise, and this storm keeps blowing the angel backwards into the future.
And this storm is called Progress.

In the early days when Google had 200-300 people there was no way we could do everything we wanted to do. But as Google grows, we get more of a chance to “go back and fix things,” to build the ideal search engine. And part of doing that is having more and better communication with webmasters.

I believe the ideal search engine would help site owners debug and diagnose crawl problems, and the Sitemaps team has made great strides with that in Google’s webmaster console. But I think the ideal search engine would also tell legitimate site owners when they risk not doing well in Google.

For example, I recently saw a small pub in England that had hidden text on its page. That could result in the site being removed from Google, because our users get angry when they click on a search result and discover hidden text–even if the hidden text wasn’t what caused the site to be returned in Google’s results. In this case it was a particular shame, because the hidden text was the menu that the pub offered. That’s exactly the sort of text that a user would like to see on the web site; making the text visible would have made the site more useful.
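As a rough illustration of how a site owner might check their own pages for this kind of problem, here is a minimal sketch in Python. It only looks for inline `display: none` and `visibility: hidden` styles and assumes reasonably well-formed markup; the pattern list and the `find_hidden_text` helper are assumptions made up for this example and say nothing about how Google actually detects hidden text.

```python
from html.parser import HTMLParser
import re

# Inline-style patterns that commonly hide content. This list is an
# illustrative assumption, not Google's actual detection logic.
HIDING_STYLE = re.compile(
    r"display\s*:\s*none|visibility\s*:\s*hidden", re.IGNORECASE
)

# Void elements never get a closing tag, so they are not tracked on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class HiddenTextFinder(HTMLParser):
    """Collects text that sits inside an element hidden by an inline style."""

    def __init__(self):
        super().__init__()
        self.stack = []        # one bool per open element: hidden by its own style?
        self.hidden_text = []  # text fragments found inside hidden elements

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = dict(attrs).get("style") or ""
        self.stack.append(bool(HIDING_STYLE.search(style)))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        # Text is hidden if any enclosing element is hidden.
        if text and any(self.stack):
            self.hidden_text.append(text)


def find_hidden_text(html):
    """Return the text fragments hidden by inline styles (hypothetical helper)."""
    finder = HiddenTextFinder()
    finder.feed(html)
    finder.close()
    return finder.hidden_text


page = (
    '<div><h1>The Pub</h1>'
    '<div style="display: none"><p>Steak pie</p><p>Fish and chips</p></div>'
    '<p>Visit us</p></div>'
)
print(find_hidden_text(page))
```

A check like this misses text hidden through external stylesheets, matching foreground and background colors, or off-screen positioning, so an empty result does not prove a page is clean.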

That’s an example of a legitimate site. On the other hand, if the webspam team detects a spammer that is creating dozens or hundreds of sites with doorway pages followed by a sneaky redirect, there’s no reason that we’d want the spammer to realize that we’d caught those pages. So Google clearly shouldn’t contact every site that is penalized–it would tip off spammers that they’d been caught, and then the spammers would start over and try to be sneakier next time.

The way that we’ve been tackling better communication over the last few months is by testing a program where we try to email some penalized sites that we believe are legitimate. The issue is that it can be hard to contact a site by email: some sites don’t give any way to contact them, and some sites don’t receive/read/respond to the emails that we send. Overall, the experiment has been very successful, but email has definite limitations.

The Webspam team and the Sitemaps team have been working together for several months on a new approach: we are now alerting some sites that they have penalties via the webmaster console in Sitemaps. For example, if you verify your site in Sitemaps and then are penalized by the webspam team for hidden text on your pages, we may explicitly confirm a penalty and offer you a reinclusion request specifically for that site.

I’m really happy about this new way to communicate with webmasters, even though it is a test for now. If the initial results are positive, I wouldn’t be surprised to see us gradually broaden this program.

Here are some questions from a webmaster perspective:

Q: Are you going to show every penalty for a site in the webmaster console?
A: No. Our program to alert webmasters by email has been successful, and this new program is a natural extension of that, but we’re still testing it. We are not confirming every site that is penalized for now, and I don’t expect us to in the future.

Q: I don’t understand why you wouldn’t show every single penalty to every single site owner that asks?
A: Let me give you a couple examples to illustrate why. First, let’s take an example of a site that we would like to confirm a penalty for. Check out this site:

This is a small hotel. They offer 18 bedrooms in Bath, England, for you to rest and relax. It’s a real site for a legitimate business. But notice the hidden text at the bottom of the page where I’ve highlighted in red. This is a perfect example of a site that should be able to find out that their page conflicts with our quality guidelines. Google wants this hotel to know about potential violations of Google’s webmaster quality guidelines on its site.

Now let’s look at an example site that we wouldn’t want to notify if they were penalized:

From this picture alone, you can see that the site is doing
– keyword-stuffing
– deliberately including misspellings
– nonsense or gibberish text, probably auto-generated by a program
– you might be able to guess from the left-hand side and all the variants of “tax deferred” that there are many other pages like this. You’d be right: the site has thousands of doorway pages.

What you can’t tell from the snapshot is that
– the site owner attempted to gather links by programmatically spamming other sites. Specifically, the site owner found a vulnerable software package on the web that doesn’t yet support the nofollow attribute for untrusted links, and then spammed several good sites trying to get links.
– this site is also cloaking. Search engines get the static page loaded with keywords that you see. Users get a completely different page.
– the pages returned to users employ sneaky redirects. Users get a small page with a JavaScript redirect and also a meta refresh; each page just does a redirect to the root page of this domain.
– Given all this, would it surprise you to find out that when a user finally arrives at the root page, every single link that they are offered is a link that the spammer makes money from?

Needless to say, I’d rather not tip off spammers like this when we find their pages.
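For readers who haven't seen the redirect pattern described in the list above, here is a minimal Python sketch that flags a meta refresh and a simple JavaScript `location` assignment in a page. The `RedirectFinder` class, the `find_redirects` helper, and both regular expressions are assumptions for illustration only; real spam pages are often obfuscated, and this is not a description of how the webspam team finds them.

```python
from html.parser import HTMLParser
import re

# Matches simple JavaScript redirects such as `window.location = "..."` or
# `location.href = '...'`. Obfuscated scripts will not match; this pattern
# is only an illustrative assumption.
JS_REDIRECT = re.compile(r"\blocation(?:\.href)?\s*=\s*[\"']([^\"']+)[\"']")

# Pulls the target out of a meta refresh value like "0; url=http://example.com/".
META_URL = re.compile(r"url\s*=\s*['\"]?([^'\";]+)", re.IGNORECASE)


class RedirectFinder(HTMLParser):
    """Records meta-refresh and simple JavaScript redirects found in a page."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.redirects = []  # (kind, target) tuples in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script":
            self.in_script = True
        elif tag == "meta" and (attrs.get("http-equiv") or "").lower() == "refresh":
            match = META_URL.search(attrs.get("content") or "")
            if match:
                self.redirects.append(("meta-refresh", match.group(1).strip()))

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            for target in JS_REDIRECT.findall(data):
                self.redirects.append(("javascript", target))


def find_redirects(html):
    """Return the redirects found in the given HTML (hypothetical helper)."""
    finder = RedirectFinder()
    finder.feed(html)
    finder.close()
    return finder.redirects


doorway = (
    '<html><head>'
    '<meta http-equiv="refresh" content="0; url=http://example.com/">'
    '<script>window.location = "http://example.com/";</script>'
    '</head><body>Loading</body></html>'
)
print(find_redirects(doorway))
```

A redirect on its own is not a problem; plenty of legitimate pages use them. What makes the case above sneaky is the combination with cloaking, where search engines and users are served different pages.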

I hope these two examples give you some idea of the sites that we’d like to alert (and not alert) to issues with their site. Just to repeat: not every site with a penalty will receive confirmation and the offer of a reinclusion request. But if this program works well, we’ll certainly look for ways to keep improving communication with legitimate site owners while not tipping off spammer sites.

Q: Okay, okay, I understand that not everyone will be notified of penalties, and that it’s a test. What will it look like if I do have a spam penalty?
A: In the webmaster console, once you verify a site, click on the tab labeled “Diagnostic” and one of the page sections is called “Indexing summary.” The specific text will say

No pages from your site are currently included in Google’s index due to violations of the webmaster guidelines. Please review our webmaster guidelines and modify your site so that it meets those guidelines. Once your site meets our guidelines, you can request reinclusion and we’ll evaluate your site. [?] Submit a reinclusion request

If you find the issue and clean it up, then just click on the “Submit a reinclusion request” link and fill out the form.

(Someone asked me this at a recent conference, so I’m throwing it in.)
Q: I’m the SEO for a client’s site; can I enroll my client’s site in Sitemaps on their behalf?
A: If you have the ability to upload files to the root directory for the client, then yes. Just log into Sitemaps, add the site, and you’ll get a file to upload to the root level of the domain. Multiple people can verify the same site in Sitemaps, so both client@gmail.com and seo@gmail.com could sign up and get Sitemaps stats for a domain, for example.

As long as we’re talking about the second Summer of Code, what other open-source projects would people like to see? Here’s one I’d like. Ubuntu is really good at handling wireless connections, even on a laptop. But I’d love a simple-to-use VPN wizard or client for Linux/Windows. That way, if you have VPN configuration instructions for Windows, the new VPN utility would offer options that look like the Windows ones. A similar-looking VPN set-up would make it much easier for VPN users to transition from Windows to Linux/Ubuntu. Also, Ubuntu is looking for someone to implement easy multiple monitor support.

How about it? If you had a few dozen students thinking about open-source projects, what would you ask them to work on?
– Are there improvements you’d like to see in Linux, Apache, MySQL, or Python/Perl/PHP?
– Do you wish Asterisk, the open-source PBX software, was easier to try out or configure?
– Are there features you’d like to see in Juice, the cross-platform podcast receiver? Personally, I’d like the ability to save all my podcasts in one directory, instead of forcing podcasts into different directories for each source.
– Or maybe you wish that Audacity could easily help you produce podcasts and handle the RSS wrappers for you?
– Wish you could convert .mod files from camcorders into an easier-to-process video format?
– There’s gotta be something you’d tweak in Firefox. Every Firefox user has 1-2 things that they’d like to see. There’s already a list of potential ideas for Firefox.