The Moz Blog

Mike Davidson: Invalid Code Can Get You Booted from Google

Mike Davidson, of Mike Industries (and CEO of NewsVine), has done some research into whether invalid code, nested tables and some other HTML factors can affect rankings at the search engines. My favorite bit:

So can invalid code get you penalized on search engines? The answer: Yes, to a draconian degree, in fact....

...What’s really interesting to me is that Google is doing one of two things (or both):

1. Somehow grading pages based on how they are rendered as well as how they are coded.

2. Simply counting the rest of the page as an attribute of the invalid table because the attribute is never officially closed off with an end quote.

There may also be other explanations to why this is happening, but this was the most interesting test in the bunch for me.

Conclusion: It’s not clear that validity helps search engine ranking, but it’s definitely true that certain errors in your code can get you completely removed from indexes.

Take a peek at the three test pages and his results - 3a, 3b and 3c. My problem with this is that he's speculating that Google will damage all nonvalid code, when in fact, they're simply only indexing the visible content on those pages. If they were to show the invalid pieces of code (that won't display in the browser), that would be akin to cloaking.
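To make the second explanation in Mike's list concrete, here is a minimal sketch of how a text extractor that honors quoted attribute values could lose the rest of a page after one missing end quote. The `visible_text` function and both sample strings are hypothetical illustrations; nothing here is known about how Google's parser actually works.

```python
def visible_text(html: str) -> str:
    """Return the text found outside tags, honoring quoted attribute values.

    Hypothetical, deliberately naive extractor for illustration only.
    """
    out = []
    in_tag = False
    quote = None  # the quote character currently open inside a tag, if any
    for ch in html:
        if in_tag:
            if quote:
                # Inside an attribute value: only the matching quote ends it.
                if ch == quote:
                    quote = None
            elif ch in "\"'":
                quote = ch
            elif ch == ">":
                in_tag = False
        elif ch == "<":
            in_tag = True
        else:
            out.append(ch)
    # Collapse runs of whitespace so the result is easy to compare.
    return " ".join("".join(out).split())


# Made-up markup: the second string is missing the end quote on summary.
valid = '<table summary="layout"><tr><td>blue widgets</td></tr></table>'
broken = '<table summary="layout><tr><td>blue widgets</td></tr></table>'

print(repr(visible_text(valid)))   # 'blue widgets'
print(repr(visible_text(broken)))  # ''
```

In the broken string the quote never closes, so this extractor treats everything after it as part of the attribute value and finds no text at all. A browser may recover differently, which is one way rendered and indexed content could diverge.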

The rest of his research is less interesting, IMO, but I'd love to hear your take on the code issue.

15 Comments

So what you're basically saying is that the search engine doesn't care whether you use nice, clean CSS/HTML, put your content div as high up in the code as possible (keeping in mind linearization and theming here), or use div id=header (or whatever) instead of H tags, etc.?

Sverre - I disagree. As I said, there are plenty of reasons to write valid code, use CSS, have clean HTML, etc. - search engines are not one of those reasons. It's like saying the Olympic bobsled judges will be taking off time for how well groomed the team looks after they finish the run - not applicable.

Precisely! Linearization means that code placement is virtually useless - as it should be. No one should be using code placement tricks to rank better - it's nonsensical. Search engines want to rank content based on what the user sees on the page, not how well the builder can make it show up in the code.

I wouldn't exactly call it tricks, but rather ways to make the search engine make sense of your site more easily.
Just like with proper site structure and theming and clustering products (read key words and phrases), I believe you can do the same with the document structure and hierarchy as well. This just makes sense to me.
With linearization, unwanted or inefficient theming may occur; I believe CSS can help achieve the most beneficial theming.
The best example I can come up with off the top of my head is this article, which explains my point pretty well: http://www.miislita.com/fractals/keyword-density-optimization.html
Another thing is some of the info found in the Hilltop "documentation", which I'm sure you're more than familiar with:
Every key phrase has a scope within the document text. URLs located within the scope of a phrase are said to be "qualified" by it. For example, the title, headings (e.g., text within a pair of <H1> </H1> tags) and anchor text within the expert page are considered key phrases.
as well as
LevelScore(p) is a score assigned to the phrase by virtue of the type of phrase it is. For example, in our implementation we use a LevelScore of 16 for title phrases, 6 for headings and 1 for anchor text.
Maybe I'm completely wrong here, but unless this document is utterly obsolete with regard to the current Hilltop algorithm, it seems quite obvious to me that Google, at the very least, looks at proper document structure and theming of keywords that way.
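To make the quoted weights concrete, here is a minimal sketch that applies the LevelScore values from the passage above (16 for title phrases, 6 for headings, 1 for anchor text). Summing the weights and matching the query by simple substring are simplifying assumptions for illustration, as are the function name and the sample phrases; the quoted excerpt does not describe the rest of the scoring.

```python
# Weights taken from the Hilltop passage quoted above.
LEVEL_SCORE = {"title": 16, "heading": 6, "anchor": 1}


def level_score_total(phrases, query):
    """Sum LevelScore over the key phrases that contain the query.

    The summation and the substring match are assumptions made for this
    sketch, not part of the quoted text.
    """
    q = query.lower()
    return sum(LEVEL_SCORE[kind] for kind, text in phrases if q in text.lower())


# Hypothetical key phrases from a made-up page, as (phrase type, text) pairs.
page = [
    ("title", "Blue widgets buying guide"),
    ("heading", "Small blue widgets"),
    ("anchor", "compare blue widgets"),
    ("anchor", "contact us"),
]

print(level_score_total(page, "blue widgets"))  # 16 + 6 + 1 = 23
```

Under these assumptions, a single title match outweighs a heading and an anchor match combined, which is the kind of structural weighting the comment is pointing at.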

"My problem with this is that he's speculating that Google will damage all nonvalid code, when in fact, they're simply only indexing the visible content on those pages."

Other people might speculate about that, but I certainly didn't in my article. I would say if anything, I insinuated the opposite: that general code quality has very little effect whatsoever on your search engine placement. The only interesting bit to me was that I was physically able to break a page so badly that it negatively affected placement.

Hi Randfish,
I designed a website (http://www.rajasthantourindia.com/) a few months ago, and it is completely driven by a content management system built on MySQL and PHP.
The problem I am facing is this: after launch, the site gained a Google PR of 4 on almost all of its pages, and about 350 pages were indexed by Google as of 30 days ago. Today, however, when I checked the indexed pages with the site:domainname.com command, only 1 page showed as indexed. I don't know what is happening with Google's indexing. I coded the site to be completely search engine friendly.
I need your expert suggestions on this problem.
I hope to get a reply from you.
With Regards,
Nick

Even though validating code in itself doesn't result in better SERPs, I believe using H tags properly, using CSS for layout (putting the content div all the way at the top) and so on will greatly improve your rankings. The cleaner, the better.
And when you do make an effort in this area, why not make it as valid and accessible as possible (yeah, I read your article at ALA, Andy, and I really liked it)?
The H tag is primarily to be used to structure a document, not to style text. The W3C clearly defines how to use it, both in the HTML/XHTML specs and in WCAG and other guidelines.
The search engines don't care about the pretty colours or the cool fonts, only the document structure, and will rank the page based on that. Follow the W3C specifications, use the H tags to construct a proper page hierarchy and the search engines will love you for it.
H1 is the main heading, and thus you should only have one occurrence of the tag per page. A page should always be focused on an overall theme, and you can use the lower-level headings to cluster subtopics around that theme.
Ultimately, I believe that if you *could* do A/B testing between two sites with identical content, a nested-table site with not-so-great HTML and one that's accessible, validates and uses CSS for layout, the CSS site would win hands down.
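A rough way to check a page against the heading advice above is sketched here with Python's standard `html.parser` module. The one-H1 rule comes from the comment; the check for skipped levels is one assumed reading of "proper page hierarchy", and the `HeadingCollector` and `heading_problems` names and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Record the level of every h1-h6 start tag in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_problems(html):
    """Return a list of human-readable problems with the heading outline."""
    parser = HeadingCollector()
    parser.feed(html)
    parser.close()
    levels = parser.levels
    problems = []
    h1_count = levels.count(1)
    if h1_count != 1:
        problems.append(f"expected exactly one h1, found {h1_count}")
    # Assumed rule: a heading should not jump more than one level deeper.
    for prev, cur in zip(levels, levels[1:]):
        if cur > prev + 1:
            problems.append(f"h{prev} is followed by h{cur}, skipping a level")
    return problems


good = "<h1>Widgets</h1><h2>Blue</h2><h3>Small</h3><h2>Red</h2>"
bad = "<h1>Widgets</h1><h1>More</h1><h4>Details</h4>"

print(heading_problems(good))  # []
print(heading_problems(bad))
```

This only inspects document structure; it says nothing about whether a search engine rewards that structure, which is the open question in this thread.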

...not again another dev vs. marketer fight...
...as for conclusion... lol ;)
when in fact, they're simply only indexing the visible content on those pages
If you check the page cache in code view, it's all there - both the valid and nonvalid parts. This is merely a display problem with visual browsers; the content is completely indexed...

Ugh, now people are going to freak out about having perfect code. Mike himself notes that the vast majority of pages do NOT have perfect code and rank well. But this one example of where a page got "completely" removed from the index for bad code? Not the case. Here it is:
http://www.google.com/search?q=...
And here it is ranking ahead of 2.3 million other pages:
http://www.google.com/search?hl...
It was listed with Google, just not ranking for the term he was seeking because the code error meant that word wasn't registered as being on the page.

> I normally respect you, but I hate any search marketer that helps self aggrandizing designers think they are good marketers and competent SEOs.
I know, but in my defense I was under the influence of an extremely powerful AListApart backlink :-(

>I HATE this meme although I did my part to spread it ;-)
I normally respect you, but I hate any search marketer that helps self aggrandizing designers think they are good marketers and competent SEOs.
I fixed some of this a little while back, but up until I did, I had 2 copies of almost the exact same invalid page with various hidden text misspelled anchor text linking back and forth between them and I was making affiliate commission for ranking for the misspelled terms...and that site is still in Google. It does not validate and never will.