Why Are You Linking To 404s?

18 Jun 2012

I wish this could be a cool post about how to detect and correct external links from your site which 404. Unfortunately, it appears that I need to roll the clock back a full decade and talk about links on your website, pointing to your website, which 404.

This is actually something I've noticed more and more recently. My amazement at this degeneration of basic HTML skill culminated last week when, amidst the Azure fanfare, I noticed that its own search engine returned internal results which 404. A few days later, while talking to someone about the Play Framework, I came across links in its documentation pointing to non-existent pages.

There are a number of ways to deal with it, but the simplest is probably to monitor your log files and reactively correct 404s. A simple Ruby script that reads nginx's default combined log format (the same one Apache uses) and extracts the 404s is enough to get started.
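A minimal sketch of such a script (not the post's original, which isn't reproduced here), assuming the combined log format where each line looks like `ip - user [time] "METHOD path PROTOCOL" status bytes "referrer" "user-agent"`. The names `LOG_LINE` and `count_404s` are made up for illustration:

```ruby
# Matches the start of one combined-format log line and captures
# the request method, the requested path and the response status.
LOG_LINE = /\A\S+ \S+ \S+ \[[^\]]+\] "(?<method>\S+) (?<path>\S+)[^"]*" (?<status>\d{3}) /

# Takes any enumerable of log lines and returns a hash of
# path => number of 404 responses, most frequent first.
def count_404s(lines)
  counts = Hash.new(0)
  lines.each do |line|
    match = LOG_LINE.match(line)
    # Skip lines that don't parse or that aren't 404s.
    next unless match && match[:status] == "404"
    counts[match[:path]] += 1
  end
  counts.sort_by { |_path, count| -count }.to_h
end
```

To run it against a real log, something like `count_404s(File.foreach("access.log")).each { |path, n| puts "#{n}\t#{path}" }` should do, where `access.log` stands in for wherever your server writes its access log. Lines that don't match the pattern are silently skipped, which is fine for a quick report but worth knowing if your log format is customized.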

Of course, this can all get more advanced. Most notably, we might also log the referrer to make it easier to track the source of our 404. Regardless of what approach you take, however simple or complicated you make it, there's no excuse for not monitoring and correcting 404s on your site. This doesn't just apply to internal links (which make you look particularly incompetent), but also to external ones, since each one represents a lost opportunity to help and engage users.
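As a rough sketch of the referrer idea: in the combined log format the referrer is already recorded as the first quoted field after the byte count, so it can be pulled out of the same lines. The names `LOG_LINE_WITH_REFERRER` and `referrers_by_404` are hypothetical, and the pattern again assumes an uncustomized combined format:

```ruby
# Same line shape as before, but also captures the quoted referrer
# that follows the status code and the byte count.
LOG_LINE_WITH_REFERRER =
  /\A\S+ \S+ \S+ \[[^\]]+\] "\S+ (?<path>\S+)[^"]*" (?<status>\d{3}) \S+ "(?<referrer>[^"]*)"/

# Returns a nested hash: missing path => { referrer => hit count }.
# A referrer of "-" means the client didn't send one.
def referrers_by_404(lines)
  sources = Hash.new { |hash, path| hash[path] = Hash.new(0) }
  lines.each do |line|
    match = LOG_LINE_WITH_REFERRER.match(line)
    next unless match && match[:status] == "404"
    sources[match[:path]][match[:referrer]] += 1
  end
  sources
end
```

A referrer on your own domain points at an internal link you can fix directly; one on someone else's domain is a candidate for a redirect.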