
Robots.txt And 404 pages: sometimes funny, always important

My last few posts have been fairly serious, so I figured I would mix in a little fun with this one – and I’m soliciting input too (there’s a £50 prize for the best contribution; see the end of the post).

I’ve seen a spate of “Easter Egg” links doing the rounds on Twitter recently, some of which have been pretty inspired. They’re generally either the 404 page or the robots.txt page.

The topic for this post is to talk a little bit about what these pages are actually for and to share some of my favourites among the new ones I’ve seen recently.

404 page, no juice available

The 404 page is the page a site serves when the URL you request in your browser doesn’t exist (here is our 404).

Generally, if a user follows a link to a non-existent page, the site should indicate clearly to the visitor that they have followed a broken link and indicate to the search engines that the page doesn’t exist (via 404). So far, so straightforward.
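To make the mechanics concrete, here is a minimal sketch using only Python’s standard library: a toy web server that serves a friendly error page for unknown URLs but still sends the honest 404 status code. The handler, paths and page content are invented for illustration; the point is that the fun page and the correct status line are independent.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.error import HTTPError
from urllib.request import urlopen

class DemoHandler(BaseHTTPRequestHandler):
    PAGES = {"/": b"<h1>Home</h1>"}  # the only "real" page on this toy site

    def do_GET(self):
        if self.path in self.PAGES:
            body, status = self.PAGES[self.path], 200
        else:
            # Honest 404: the visible page can be as fun as you like,
            # but the status line still tells crawlers the URL is dead.
            body, status = b"<h1>Sorry, nothing here</h1>", 404
        self.send_response(status)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

# Run the server on an ephemeral local port in a background thread
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

home_status = urlopen(base + "/").status
try:
    urlopen(base + "/no-such-page")
    missing_status = 200  # a "soft 404" server would land here
except HTTPError as err:
    missing_status = err.code

print(home_status, missing_status)  # 200 404
server.shutdown()
```

A soft 404 would be the `status = 200` branch applied to missing pages too: same friendly HTML, but a status code that fools crawlers into indexing a dead URL.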

A temptation arises if one has a “fun” 404 page that attracts links and social media traffic from people who find the page amusing.

Any internet marketer knows that links help sites rank better in search engines (and social media signals may well help too), and rankings mean business.

The temptation is therefore to mask the fact that the page is non-existent by returning a 200 “OK” code, so that the search engine is fooled and the site still gains the ranking benefit.

Google calls such pages “crypto 404s” or “soft 404s” and, because they don’t want to be sending searchers to broken pages, they seek to identify such pages and remove them from the index nonetheless.

Whilst I haven’t read anything that says Google punishes such behaviour in terms of site rankings, it’s duplicitous, so it’s hardly sending Google the most trustworthy signals about your site and your behaviour.

404s are definitely something that should be monitored and addressed where possible. In Google Webmaster Central you can pull a list of all the links Googlebot has followed to “pages” on your site that returned a 404.

While having many links pointing to non-existent pages on your site is not, as far as I’m aware, penalised by Google, it can still have a negative impact: first, the ranking “juice” of those links is lost; second, broken links can adversely affect browsing stats, such as bounce rates, that Google is believed to collect via its toolbar and factor into its ranking algorithms.

To fix 404s you can either create a page at that location where relevant, set up a permanent (301) redirect from the non-existent page to a relevant one, or contact the site hosting the broken link and ask them to point it at an appropriate page. It’s boring but important.
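The redirect option in particular is usually a one-liner in the web server’s configuration. A hypothetical sketch in nginx syntax (both paths are invented for illustration):

```nginx
# Permanently redirect a retired URL to its closest live replacement.
# The 301 status passes most of the old link's ranking value along.
location = /old-bouquets-page {
    return 301 /flowers/bouquets;
}
```

Apache and most other servers have an equivalent directive; the important part is the 301 (“moved permanently”) status rather than a temporary 302.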

If you have taken care of the above, there’s no reason you shouldn’t try to make the page fun and memorable in order to garner some social media visits and drive general brand awareness.

Here are five 404 pages that I’ve liked recently, in ascending order of weirdness:

Robots.txt, juice available

The robots.txt file is a page of plain text on a website (here is our robots.txt) which search engine bots and other automated web crawlers read to learn which pages the site owner does not want them to crawl and index.

It is very common, for example, to “disallow” crawling of anything to do with the basket and any pages where a customer might be signed in to a “My Account” area or equivalent.

Whilst it is unlikely that a bot would get into an account area requiring sign in, there is no harm in disallowing it anyway, just in case; having Google crawl all your customers’ private pages and put them in its index would not make you very popular with your customers.

You can also use robots.txt to ask certain robots not to crawl your site at all (though they might ignore you and do it anyway) – we have blocked a few that kept crawling and wasting valuable server resources during peak periods (Valentine’s and Mother’s Day, in our case).
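Putting the pieces above together, a robots.txt along these lines would cover both cases; the paths and the bot name here are made up for illustration:

```text
# Keep crawlers out of the basket and signed-in account pages
User-agent: *
Disallow: /basket/
Disallow: /my-account/

# Ask one badly behaved crawler to stay away entirely
User-agent: SomeHungryBot
Disallow: /
```

Each `User-agent` block applies to the named bot (`*` meaning all), and each `Disallow` line is a URL path prefix that bot is asked not to crawl.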

NB: robots.txt is not to be confused with the robots meta tag, which you can use to give robots specific instructions about an individual page.

Rather than have Google index all of these similar-looking pages, we added the robots meta tag “noindex, follow” to them, meaning that any link juice coming into the pages flows through, but the pages themselves do not go into search engine indexes (indices?), make us look spammy and thus risk earning us a penalty.
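The tag itself is a one-liner in each page’s `<head>`:

```html
<!-- "noindex": keep this page out of the index;
     "follow": still pass link value through its outbound links -->
<meta name="robots" content="noindex, follow">
```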

Because the main robots.txt file is relatively simple, there is room for humour – albeit, it must be said, humour that will most likely only be read by geeks. Additionally, because these pages are crawled, generating links to them should contribute to the site’s overall ranking.

It is no coincidence that SEO companies often do something quirky with their robots page, presumably to capitalise on this opportunity.
