Following on from my previous post about the new Irish Times website, I’ve been watching how Google gets on with the migration from ireland.com to irishtimes.com.

Could They Make It Any More Difficult?

The first thing I noticed when irishtimes.com went live was that www.irish-times.com was a mirror. It was even showing up in the SERP for [irish times], and you could clearly see the dupe content issues from the pages listed. Thankfully that domain has now been properly redirected.
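Here is what "properly redirected" can look like in practice: a minimal sketch of a 301 host redirect, written as Python WSGI middleware. It assumes www.irishtimes.com is the canonical host and that the app is mounted at the site root. The names `canonical_host_middleware` and `CANONICAL_HOST` are hypothetical and are not anything the Irish Times actually runs.

```python
from urllib.parse import quote

# Hypothetical canonical host for this sketch; every other host gets redirected here.
CANONICAL_HOST = "www.irishtimes.com"


def canonical_host_middleware(app, canonical_host=CANONICAL_HOST):
    """Wrap a WSGI app so requests on any non-canonical host get a 301 to the same URL on the canonical host."""
    def wrapped(environ, start_response):
        # Ignore any port suffix when comparing hosts.
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        if host == canonical_host:
            return app(environ, start_response)
        # Under PEP 3333, PATH_INFO holds raw bytes decoded as latin-1, so re-encode before quoting.
        path = quote(environ.get("PATH_INFO", "/").encode("latin-1"), safe="/") or "/"
        query = environ.get("QUERY_STRING", "")
        location = f"http://{canonical_host}{path}" + (f"?{query}" if query else "")
        start_response("301 Moved Permanently",
                       [("Location", location), ("Content-Type", "text/plain")])
        return [b"Moved permanently\n"]
    return wrapped
```

In most setups the same rule would live in the web server or load balancer configuration rather than in application code. Either way, the point is that every mirror host answers with a single permanent (301) redirect to the equivalent URL on the canonical host.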

But perhaps more worrying is the fact that Google is still returning ireland.com in the #1 position for [irish times]:

Image of Google results for [irish times]

Am I surprised? Well, not really, given the gratuitous use of META refreshes and 302 redirects on ireland.com. As I mentioned in my previous post, a request for http://www.ireland.com results in:

The only surprise is that Google has gotten as far as it has. The snippet displayed for ireland.com appears to be the META Description, but that title certainly isn’t from the page (very often a good sign that Google knows the page is there but cannot crawl it). So why might Google still believe that ireland.com is the Irish Times? Well, at a guess I’d say it’s anchor-text related (I don’t think the old ireland.com homepage had that page title). If you check the Google Directory you can see that the DMOZ anchor is in fact ‘Irish Times’, so I’m sure that’s a factor here. Again I’m guessing, but I imagine those refreshes and redirects are seen as temporary, so Google remains to be convinced that ireland.com is not the Irish Times. This is probably a good example of where requesting a change to DMOZ would be a wise step. Fixing the canonical URL issue would also be advisable, given the problems I’ll look at shortly.
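To see what a crawler actually gets at each hop, it helps to separate HTTP-level redirects from META refreshes. The sketch below assumes you have already fetched each response's status code and body with whatever HTTP client you prefer. `classify_hop` and `MetaRefreshFinder` are hypothetical names, and the labels are just shorthand, not a description of how Google weighs each type.

```python
from html.parser import HTMLParser

# HTTP status codes that signal a permanent vs a temporary move.
PERMANENT = {301, 308}
TEMPORARY = {302, 303, 307}


class MetaRefreshFinder(HTMLParser):
    """Collects the content attribute of any <meta http-equiv="refresh"> tags."""

    def __init__(self):
        super().__init__()
        self.refreshes = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names; values keep their case.
        if tag != "meta":
            return
        values = {name: (value or "") for name, value in attrs}
        if values.get("http-equiv", "").lower() == "refresh":
            self.refreshes.append(values.get("content", ""))


def classify_hop(status, body=""):
    """Label one response as a permanent redirect, temporary redirect, META refresh, or none."""
    if status in PERMANENT:
        return "permanent redirect"
    if status in TEMPORARY:
        return "temporary redirect"
    finder = MetaRefreshFinder()
    finder.feed(body)
    if finder.refreshes:
        return "meta refresh: " + finder.refreshes[0]
    return "no redirect"
```

Run over every hop in a chain like the one above, anything other than a single "permanent redirect" is a hint that the old URL has not been cleanly handed over to the new one.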

The Plot Thickens…

Whenever I think Google is confused about a page I always check the cached version. Here’s Google’s cache of ireland.com:

Images of Google’s cache of www.ireland.com

The initial links are hidden page anchors for accessibility. That’s fair game, and it’s perfectly acceptable to hide those links from visual browsers. But the next links, wrapped in h1 elements?

Now don’t get me wrong – I’m a big fan of using text alternatives when text content is image-based. I often add in a text node containing the text portrayed in the image and hide that node away. Google doesn’t mind this as long as the text content is representative of what’s in the image. But I think it’s rather risky for a site like ireland.com to add hidden text to their page that is not also rendered in one form or another on the page. Here’s the mark-up:

The ‘Homepage’ link might pass a manual review, but I doubt the “Information for Ireland or abroad, travel, entertainment listings, sports news, games, puzzles, recipes, TV listings and more” text would. It’s nowhere on the page, so it’s basically hidden text. No idea why they have it there TBH, but personally I think it’s a little risky.

More Horrible Redirects

I always wonder where some of the URI constructions come from these days. You can always tell when no one from the SEO side has been consulted when you see URLs like this one:

http://ireland.com/home/Looking_for_cheap_flights_Try_our_Find_it_fast/maxiview.ie?mx_ext_UNCLASSIFIED_uuid=/travelnow/landing.ie?afs=false

That URL comes from the Most Read list on the homepage:

Most Read items on Ireland.com

Perhaps worse still is the server response after clicking on one of those URLs:

Really, enough already – either NOFOLLOW those links, or use the correct URLs…
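If the Most Read links can't simply be fixed at template level, the destination can at least be recovered from the tracking URL. This sketch assumes the `mx_ext_UNCLASSIFIED_uuid` parameter carries the target path, which is a guess based only on the URL above (I haven't confirmed where the server actually sends you). `embedded_target` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urljoin, urlsplit


def embedded_target(url, param="mx_ext_UNCLASSIFIED_uuid"):
    """Return the destination carried in a redirect-style query parameter, or None if absent."""
    # urlsplit splits at the first '?', so a second '?' inside the value stays part of the query.
    values = parse_qs(urlsplit(url).query).get(param)
    if not values:
        return None
    # Resolve a relative value (e.g. "/travelnow/...") against the URL the link was found on.
    return urljoin(url, values[0])
```

Applied to the Most Read URL above, this gives http://ireland.com/travelnow/landing.ie?afs=false, which is the kind of direct link the homepage could use instead of routing every click through an intermediate URL.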

What Else Don’t I Like?

I could spend a long time going through what is undoubtedly a very large site. I did take a look at what Google has indexed, and at how Google appears to be dealing with new content. I always knew that dealing with Google would be a serious task for a site this large, but I’m not convinced by whatever redirect strategy they are using. I found some very odd migrations of old content to the new irishtimes.com site, complete with the old theme. I’m also seeing jsessionid variables indexed by Google – http://www.ireland.com/goingout/Bruce_Springsteen_[ ...]jsessionid=3FF79D689D230AA75EAD8956CE97DA9A?[ ...]affiliatewindow.com_uuid=23106443_irretailaffiliat.
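Session ids in URLs mean each crawl can turn up a fresh URL for the same page. Below is a minimal sketch for collapsing those back to a single URL. It assumes the common servlet form `;jsessionid=...`, because the indexed URL above is truncated and I can't confirm the exact separator. `strip_jsessionid` is a hypothetical helper, and the example URL in the test is made up.

```python
import re

# Matches a ";jsessionid=..." path parameter up to the start of the query string or fragment.
JSESSIONID = re.compile(r";jsessionid=[^?#]*", re.IGNORECASE)


def strip_jsessionid(url):
    """Remove a servlet session id path parameter so each page maps to one URL."""
    return JSESSIONID.sub("", url)
```

The cleaner fix is generally to keep session state in cookies and stop writing the id into links at all. A helper like this is more useful for deduplicating crawl data or analytics after the fact.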

Conclusion

While this post started out as another look at irishtimes.com, it quickly became apparent that its predecessor has quite a few SEO flaws. I’ve looked at a few above, and I could go on, but I think the pattern is clear – SEO seems to have taken a back seat when it came to ireland.com. Large-site SEO is more about crawlability and internal navigation, and very often good internal linking and architecture lets you push site authority around and pick off less competitive and long-tail keywords very effectively.

I think ireland.com is a great example of why baking in SEO at the design and development stage of any large site is essential.

Good analysis, Richard. I looked at their robots.txt, and it needs to be updated: half the pages listed in it don’t exist anymore. I couldn’t find an XML sitemap at any of the usual locations either. It’s also funny to see such a large site with the canonical URL problem still not fixed.
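To act on that robots.txt observation, one approach is to list every Disallow path and any Sitemap line, then request each path to see which ones still exist. The parser below is a deliberately minimal, hypothetical `robots_summary` helper. It only collects those lines and ignores user-agent grouping. If you need real allow/deny matching, the standard library's `urllib.robotparser` is the better choice.

```python
def robots_summary(text):
    """Return (disallowed paths, sitemap URLs) declared anywhere in a robots.txt body."""
    disallowed, sitemaps = [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments and whitespace
        if ":" not in line:
            continue
        field, value = line.split(":", 1)  # split once so URLs keep their own colons
        field, value = field.strip().lower(), value.strip()
        if field == "disallow" and value:  # an empty Disallow means "allow everything"
            disallowed.append(value)
        elif field == "sitemap" and value:
            sitemaps.append(value)
    return disallowed, sitemaps
```

An empty sitemap list is exactly the situation described above. It doesn't prove no sitemap exists, but it means crawlers aren't being pointed at one from robots.txt.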

Weird thing is, I don’t honestly know why I do these posts. I’m just so conditioned to look at websites from an SEO perspective now. Thankfully I don’t really have any need to pitch for work (fingers crossed that doesn’t put the mockers on me now). And I suppose in a way I am being somewhat patronising when I highlight failings, but I hope that some of what I write might actually be useful to the sites I write about (in the main, that is).

Hi Paul – the archive.org cache can be very useful for picking out issues from the past. I’ve seen it used to find sites that previously sold links and the like. Google will often point you to the archived copy of a page, although we all know that they have a much more powerful archive at their disposal.