May 26, 2008

Check your 404 traffic!

Recently, a problem was discovered on several web sites. I must say that the problem is not TYPO3-specific: it will affect any web site that uses the <base> tag. We will solve it for TYPO3, of course.

The problem happens as follows. Some user agents seem to ignore the <base> tag and request invalid URLs. For example, while viewing the page at domain.tld/hello/world/, such an agent may see a reference to the image at typo3temp/pics/12345.gif. If the agent ignores the <base> tag, it will request domain.tld/hello/world/typo3temp/pics/12345.gif. The result of such requests is obvious: the page is not found and a 404 error is returned.
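The whole difference lies in which URL the relative reference is resolved against. Here is a minimal sketch using Python's standard urllib.parse.urljoin, which follows the usual relative-URL resolution rules. It assumes the site is served over http and that the page carries <base href="http://domain.tld/">; the function name resolve and the constants are hypothetical, chosen only for illustration.

```python
from typing import Optional
from urllib.parse import urljoin

PAGE_URL = "http://domain.tld/hello/world/"  # page being viewed (http scheme assumed)
BASE_HREF = "http://domain.tld/"             # assumed value of <base href="...">
IMAGE_REF = "typo3temp/pics/12345.gif"       # relative reference found in the HTML


def resolve(page_url: str, ref: str, base_href: Optional[str]) -> str:
    """Resolve ref the way a browser would.

    base_href=None models a user agent that ignores the <base> tag and
    resolves against the page URL instead.
    """
    base = base_href if base_href is not None else page_url
    return urljoin(base, ref)


if __name__ == "__main__":
    print(resolve(PAGE_URL, IMAGE_REF, BASE_HREF))  # agent honours <base>
    print(resolve(PAGE_URL, IMAGE_REF, None))       # agent ignores <base>
```

With <base> honoured the image resolves to http://domain.tld/typo3temp/pics/12345.gif; without it the agent asks for http://domain.tld/hello/world/typo3temp/pics/12345.gif, which does not exist.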

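Now look closely at the URL. It is a TYPO3 URL, which means that TYPO3's 404 handler will be invoked. If the resulting 404 page contains a link to typo3temp/pics/12345.gif, the process becomes recursive: the agent resolves that link against the URL it has just requested, asks for an even longer invalid URL, and lands in the 404 handler again. This causes a huge amount of useless traffic for the web site.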
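The recursion can be simulated in a few lines. This is a sketch under one stated assumption: every 404 page served by TYPO3 contains the same relative reference typo3temp/pics/12345.gif, and the agent (ignoring <base>) resolves it against the URL it just requested. The function broken_requests is hypothetical.

```python
from typing import List
from urllib.parse import urljoin

IMAGE_REF = "typo3temp/pics/12345.gif"  # assumed to appear on every 404 page


def broken_requests(start_page: str, rounds: int) -> List[str]:
    """Return the URLs a <base>-ignoring agent would request, round by round."""
    urls = []
    current = start_page
    for _ in range(rounds):
        # Resolve against the last requested URL, as the agent does
        current = urljoin(current, IMAGE_REF)
        urls.append(current)
    return urls


if __name__ == "__main__":
    for url in broken_requests("http://domain.tld/hello/world/", 3):
        print(url)
```

Each round adds another typo3temp/pics/ segment, so every request is a brand-new URL that again reaches TYPO3's 404 handler.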

The problem is serious and already affects many web sites. How can it be solved? If a web site does not use linking across domains (a new feature of TYPO3 4.2), the solution is pretty easy:

config.absRefPrefix = /

This makes all links absolute (they start with /), so config.baseURL is no longer needed at all. However, it does not work with links across domains, so this solution is not universal. The TYPO3 documentation also does not recommend config.absRefPrefix, because it is not applied consistently across the system. So, is there a better solution?
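Why the prefix helps: a reference that starts with / resolves to the same URL no matter which page it appears on, so ignoring <base> no longer matters. A minimal sketch using urllib.parse.urljoin; ABS_REF_PREFIX is just a Python constant mirroring the TypoScript value, not TYPO3 code, and the page URLs are hypothetical examples. It covers only the single-domain case, not linking across domains.

```python
from typing import Iterable, Set
from urllib.parse import urljoin

ABS_REF_PREFIX = "/"  # mirrors config.absRefPrefix = /
IMAGE_REF = ABS_REF_PREFIX + "typo3temp/pics/12345.gif"

# Hypothetical pages, including a bogus deep URL produced by the recursion
PAGES = [
    "http://domain.tld/",
    "http://domain.tld/hello/world/",
    "http://domain.tld/hello/world/typo3temp/pics/12345.gif",
]


def resolved_targets(pages: Iterable[str], ref: str) -> Set[str]:
    """Resolve ref against each page URL, as an agent ignoring <base> would."""
    return {urljoin(page, ref) for page in pages}


if __name__ == "__main__":
    print(resolved_targets(PAGES, IMAGE_REF))
```

All three pages yield the single target http://domain.tld/typo3temp/pics/12345.gif.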

Yes, there is. The following piece of code in .htaccess will solve the problem:

This code checks whether anything precedes one of the top-level TYPO3 directories in the URL. If so, it issues a 301 (permanent) redirect to the correct location. As a result, no 404 happens at all. The solution works the same for single- and multi-domain setups, and it does not invoke TYPO3 or PHP at all, which makes it better than any TYPO3-based solution.
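The check described above can be illustrated outside Apache. The following is a Python sketch of the same idea, not the .htaccess code itself. Assumptions: the list of top-level directories (typo3temp, typo3conf, typo3, t3lib, fileadmin, uploads) is a guess at a typical TYPO3 4.x installation, the check works on the URL path only (no query string), and the name redirect_target is hypothetical. Paths that already start with a top-level directory are left alone, so a legitimate file such as /fileadmin/user_upload/typo3temp/x.gif is not redirected.

```python
import re
from typing import Optional

# Assumed top-level TYPO3 directories; adjust to the actual installation
TYPO3_DIRS = ("typo3temp", "typo3conf", "typo3", "t3lib", "fileadmin", "uploads")

_DIRS = "|".join(TYPO3_DIRS)
# Path already starts with a top-level TYPO3 directory
_TOP_LEVEL = re.compile(r"^/(?:%s)/" % _DIRS)
# One or more leading segments, then the first TYPO3 directory segment
_MISPLACED = re.compile(r"^/(?:[^/]+/)+?((?:%s)/.*)$" % _DIRS)


def redirect_target(path: str) -> Optional[str]:
    """Return the path to 301-redirect to, or None to let the request through."""
    if _TOP_LEVEL.match(path):
        return None  # already points at a real top-level directory
    m = _MISPLACED.match(path)
    if m is None:
        return None  # no TYPO3 directory inside the path
    return "/" + m.group(1)  # strip everything before the directory


if __name__ == "__main__":
    print(redirect_target("/hello/world/typo3temp/pics/12345.gif"))
```

The first bad request, /hello/world/typo3temp/pics/12345.gif, is redirected straight to /typo3temp/pics/12345.gif, so the recursive chain never starts.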

Important: for this to work, the code shown above must be placed before this line in .htaccess: