Posts tagged with status codes

During the development of Simple Site Checker I realised that it would be useful for test purposes if there is a website returning all possible HTTP status codes. Thanks to Google App Engine and webapp2 framework building such website was a piece of cake.

The home page provides a list of all HTTP status codes and their names and if you want to get an HTTP response with a specific status code just add the code after the slash, example:http://httpstatuscodes.appspot.com/200 - returns 200 OKhttp://httpstatuscodes.appspot.com/500 - returns 500 Internal Server Error
Also at the end of each page is located the URL of the HTTP protocol Status Codes Definitions with detailed explanation for each one of them.

... or how to avoid duplicate content by keeping the current language in the URL

Preface: Earlier this year I posted about Django CMS 2.2 features that I want to see and one of the things mentioned there was that once you have chosen the language of the site there is no matter whether you will open "/my_page/" or "/en/my_page/" - it just shows the same content. The problem is that this can be considered both duplicate and inconsistent content.
Duplicate because you see the same content with and without the language code in the URL and inconsistent because for the same URL you can get different language versions i.e. different content.

Solution: This can be easy fixed by using a custom middleware that will redirect the URL that does not contain language code. In my case the middleware is stored in "middleware/URLMiddlewares.py"(the path is relative to my project root directory) and contains the following code.

Now a little explanation on what happens in this middleware.
Note: If you are not familiar with how middlewares work go and check Django Middlewares.
Back to the code. First we split the URL by '/' and take the second element(this is where our language code should be) and store in lang_path(8).
URLS_WITHOUT_LANGUAGE_REDIRECT is just a list of URLs that should not be redirected, if lang_path matches any of the URLs we return None i.e. the request is not changed(9-10). This is used for sections of the site that are not language specific for example media stuff.
Then we get language based on the request(11-13).
If lang_path is empty then the user has requested the home page and we redirect him to the correct language version of it(14-15).
If lang_path does not match any of the declared languages this mean that the language code is missing from the URL and the user is redirected to the correct language version of this page(16-17).
To make the middleware above to work you have to update your settings.py.
First add the middleware to your MIDDLEWARE_CLASSES - in my case the path is 'middleware.URLMiddlewares.CustomMultilingualURLMiddleware'.
Second add URLS_WITHOUT_LANGUAGE_REDIRECT list and place there the URLs that should not be redirected, example:

URLS_WITHOUT_LANGUAGE_REDIRECT=['css','js',]

Specialties: If the language code is not in the URL and there is no language cookie set your browser settings will be used to determine your preferred language. Unfortunately most of the users do not know about this option and it often stays set to its default value. If you want this setting to be ignored just add the following code after line 10 in the middleware above:

It removed the HTTP_ACCEPT_LANGUAGE header sent from the browser and Django uses the language set in its settings ad default.

URLS_WITHOUT_LANGUAGE_REDIRECT is extremely useful if you are developing using the built in dev server and serve the media files trough it. But once you put your website on production I strongly encourage you to serve these files directly by the web server instead of using Django static serve.

Final words: In Django 1.4 there will be big changes about multilingual URLs but till then you can use this code will improve your website SEO. Any ideas of improvement will be appreciated.

... or how to make user editable 404 page that stays in the pages tree of the CMS

Basics: Yes you need it! You need 404 page cause you never know what may happen to a link: bad link paste, obsolete or deleted article, someone just playing with your URLs etc. It is better for both you and your website visitors to have a beauty page that follows the website design instead of the webserver default one that usually contains server information which is possible security issue. With Django this is easy, just make a HTML template with file name 404.html, place it in you root template directory and voilà - you are ready. You will also automatically have a request_path variable defined in the context which caries the URL that was not found.

Problem: sometimes clients require to be able to edit their 404 pages. Or other times you need to use some custom context or you want to integrate plug-ins and be able to modify them easy trough the CMS administration. For example: you want do display your brand new awesome "Sitemap Plug-in" on this 404 page.

Solution: Django allows you to specify custom 404 handler view so you just need to define one, set it in urls.py and make it to render the wanted page:

# in urls.pyhandler404='site_utils.handler404'# in site_utils.pyfromcms.viewsimportdetailsdefhandler404(request):returndetails(request,'404-page-url')

Where '404-page-url' is the URL of the page you want to show for 404 errors. So everything seems fine and here is the pitfall. If you use it this way your web page will return "200 OK" instead of "404 Not Found". This could kill your SEO(except if you want your 404 page as first result for your website). So you just need to add a 404 header to the response:

Final words: Why are these HTTP status codes so important. The reason is that they tells the search engines and other auto crawling services what is the page status. Is it normal page, redirect, not found, error or something else. Providing incorrect status codes may/will have a negative effect on your website SEO so try to keep them correct, especially when it is easy to achieve as in the example above.

Note: If the code above is not working for you, please check Allan's solution in the comments

...Why, Where, What and How of caching web sites

Basics of web page load: Every time when you open an webpage this results in a series of requests and responses between the client(your browser) and the web server that hosts the requested sites(most of the times this includes more the one servers). Each request tells the server that the client wants specific resource(CSS/JavaScript/image/etc.) and each response carries the resource content.

The main purposes of caching is to decrease bandwidth and to improve website performance.

Where are resources cached: According to the image above there are 2 places where content can be cached: on the client(browser cache) and on the server(server cache). The first one(browser cache) is more suitable when the client often requests single recourse again and again. As an example for this can be pointed CSS-files/JavaScripts/Images and other resources that form the page layout(these are requested on every page load). The second one(server cache) is hold on the server. It also can be split in two major pieces web server cache(when entire resource is cached) and application cache(when data chunks inside the application are cached).

How browser cache works: In order to tell the browser to cache such resources you have to supply each response with the following headers:

Last-Modified - datetime indication of when the content was last modified/updated on the server

Expires - the datetime after which the response will be considered "out of date"

Cache-Control - a directives about whether the resource must be cached and its maximal age of validity

Date - origin servers datetime

When the client request the resource for a first time the response contains both the resource data and the headers mentioned above. This tells the browser to store the response body(data) into its own cache. On the next request for the same resource the browser sends "If-Modified-Since" header with datetime value identical to the "Last-Modified" received from the server on the previous request. If the cache is still valid the web server returns HTTP Status code "304 Not Modified" which tell the browser to retrieve the content from its cache. Otherwise if the resource is modified between the two requests the response contains the new data along with the corresponding date headers.
This one is extremely useful to decrease websites bandwidth and speed up their loading because the common resources are read directly from the client and not pulled again and again from the server over the network.
For static resources this can be set in the webserver configuration. For dynamic ones(generated images or resources pulled from the database) these headers must be provided and checked from the application itself.

How server cache works: As I stated above the server cache can can be split into web server cache and application cache. Both are used when a single resource is requested by multiple of clients. For example the pages HTML is requested by every site visitor. If the content is the same for all of them then the HTML computation(for dynamically generated pages) can be done only for the first request and the latter ones can get the precomputed one from cache. For this you will need a caching proxy. The most common applications that are used for this purpose are Squid, Varnish and NginX
The process is similar to the one on the first figure but here the client communicates with the proxy instead of directly to the web server. If the proxy does not has a copy of the requested resource it gets if from the web server and then serves it to the client. Otherwise it pulls the content from its cache and return it directly.

Application cache: Sometimes caching full page is not an option. For example if you have a news website with two columns: one that is same for all users and one that is customizable by user preferences you can not serve every one with the same page. But you can pre-compute first column HTML, store it in the application cache and retrieve it when need instead of computing it for every user. The most common tool for this is Memcached. Its implementation is application specific but in pseudo code it works the following way:

key='cached_item_unique_key'result=cache.get(key)ifnotresult:# do some computation hereresult=computed_resultcache.set(key,result,is_valid_period)returnresult

Each cached item is represented in the cache with unique key. When requested if there is such data in the cache it is pulled directly from there and server. Otherwise the date is computed, stored in the cache and then served. Server cache will decrease server load and allow your pages to be computed faster.

Final words: Caching is a double-edge razor. Using if carefully and with comprehension will make your life easier. Playing with it without knowing what you do may ruin your application. Do not cache rarely requested content. Do not try to cache everything, cache also has its limits. Use it wise.
If there is something not well explained, or missed, or wrong feel free to ask/comment. Your participation will be appreciated. Also if you find this article helpful share it and help other too.