A weird race condition

Updated solution, please check below!

Some background:
We have a model that is edited only via the Django admin. The save method of the model fires a celery task to update several other records. The reason for using celery here is that the amount of related objects can be pretty big and we decided that it is best to spawn the update as a background job. We have unit and integration tests, the code was also manually tested, everything looked nice and we deployed it.

On the next day we found out that the code is acting weird. No errors, everything looked like it has worked but the updated records actually contained the old data before the change. The celery task accepts the object ID as an argument and the object is fetched from the database before doing anything else so the problem was not that we were passing some old state of the object. Then what was going on?

Trying to reproduce it:
Things are getting even weirder. The issues is happening every now and then.

Ok, so the save is actually wrapped in transaction. This explains what is going on. Before the transaction is committed the updated changes are not available for the other connections. This way when the celery task is called we end up in a race condition whether the task will start before or after the transaction is completed. If celery manages to pick the task before the transaction is committed it reads the old state of the object and here is the error.

Solution (updated): The best solution is to use transaction.on_commit. This way the call to the celery task will be executed only after the transaction is completed succesfully. Also, if you call the method outside of transaction the function will be executed immediately so it will also work if you are saving the model outside the admin. The only downside is that this functionality has been added to Django in version 1.9. So it wasn't an option for us. Still, special thanks to Jordan Jambazov for pointing this approach to me, I'll definitely use it in the future.

Unfortunately we are using Django 1.8 so we picked a quick and ugly fix. We added a 60 seconds countdown to the task call giving the transaction enough time to complete. As the call to the task depends on some logic and which properties of the models instance are changes moving it out of the save method was a problem. Another option could be to pass all the necessary data to the task itself but we decided that it will make it too complicated.

However I am always open to other ideas so if you have hit this issue before I would like to know how you solved it.

... and some answers

Well I haven't conducted any interviews recently but this one has been laying in my drafts for a quite while so it is time to take it out of the dust and finish it. As I have said in Python Interview Question and Answers these are basic questions to establish the basic level of the candidates.

Django's request response cycle
You should be aware of the way Django handles the incoming requests - the execution of the middlewares, the work of the URL dispatcher and what should the views return. It is not necessary to know everything in the tiniest detail but you should be generally aware of the whole picture. For reference you can check "the live of the request" slide from my Introduction to Django presentation.

Middlewares - what they are and how they work
Middlewares are one of the most important parts in Django. Not only because they are quite powerfull and useful but also because the lack of knowledge about their work can lead to hours of debugging. From my experience the process_request and process_response hooks are the most frequently used and those are the one I always ask for. You should also know their execution order and when their execution starts/stop. For reference check the official docs.

Do you write tests? How?
Having well tested code benefits you, the project and the rest of the team that is going to maintain/modify your code. Django's built-in test client, Django Webtest, Factory boy, Faker or whatever you use, I would like to hear about it and how you do it. There is not only one way/tool for it so I would like to know what are you using and how. You should show that you understand how important is to test your code and how to do it right.

select_related and prefetch_related
Django's ORM (as every other) can be both a blessing and a curse. Getting a foreign key property can easily lead you to executing million queries. Using select_related and prefetch_related can come as a saviour so it is important to know how to use them.

Forms validation
I expect from the candidates that they know how to build forms, to add custom validation for one field and how to validate fields that depend on each other.

Building a REST API with Django
The most popular choices out there are Django REST Framework and Django Tastypie. Have you used any of them, why an how. What issues have you faced? Here is the place to show that you understand the concept of REST APIs, the correct request/response types and the serialisation of the data.

Templates
Building website using the built-in template system is often part of your daily tasks. This means that you should understand the template inheritance, how to reuse the block and so on. Having experience with sekizai is a plus.

There are a lot of other things to ask about internationalisation(i18n), localisation(l10n), south/migrations etc. Take a look at the docs and you can see them explained pretty well.

In a few words PyCon Sweden 2015 was awesome. Honestly, this was my first Python conference ever but I really hope it won't be the last.

Outside the awesome talks and great organisation it was really nice to spend some time with similar minded people and talk about technology, the universe and everything else.
I have met some old friends and made some new ones but lets get back to the talk. Unfortunately I was not able to see all of them but here is a brief about
those I saw and found really interesting:

It all started with Ian Ozsvald and his awesome talk about "Data Science Deployed" (slides / video). The most important point here were:

log everything

think about data quality, don't use everything just what you need

think about turning data into business values

start using your data

Then Rebecca Meritz talked about "From Explicitness to Convention: A Journey from Django to Rails" (slides / video). Whether the title sounds a bit contradictive this was not the usual Django vs Rails talk. At least to me it was more like a comparison between the two frameworks, showing their differences, weak and strong sides. Whether I am a Django user, I am more and more keen to the position that none of the frameworks is better than the other one, they are just two different approaches for building great web apps.

Flavia Missi and "Test-Driven-Development with Python and Django" (slides / video). TDD will help you have cleaner, better structured and easy to maintian code. If you are not doing it the best moment to start is now. Whether it is hard at the beginning you will pretty soon realise how beneficial it is. Especially if someone pushed a bad commit and the tests saved your ass before the code goes to production.

Later Dain Nilsson talked about "U2F: Phishing-proof two-factor authentication for everyone" (video). Whether I don't use two-factor authentication at the moment I am familiar with the concept and I really like it. The U2F protocol looks like a big step towards making it more applicable over different tools and applications and the key holding devices are more and more accessible nowadays. Maybe it is time for me to get one )))

The second day started with Kate Heddleston who talked about ethics in computer programming (video). About how social networks can be used as a tool for ruining peoples lifes and that we as a developers should take a responsibility and work towards making the internet a safer place for everyone. A place where you can have your privacy and have protection if harassed. It is a big problem which won't be solved in a night, but talking about it is the first step towards solving it.

"Why Django Sucks" by Emil Stenström. Don't rush into raging cause this was one of best talks at the conference. Emil showed us the parts in Django where he sees flaws and that need improvement. The main point was the lack of common template language between Django and Javascript and what possible solutions are available or can be made.

Daniele Sluijters reminded us how easy is to work with "Puppet and Python" (video). No more fighting with Ruby, you can easily use your favorite language to build your own tools and helpers

Dennis Ljungmark and "Embedded Python in Practice". The last time I programmed embedded devices was 15 years ago as a part of short course in the high school. Dennis' work is much more complex than what I did then but his talk reminded me of things that are applicable to general programming. Whether using non-embedded systems we often have much more memory and processing power available that does not mean that we should waste it. So think when you code - RAM is not endless and processors are not that fast as we often wish. Also don't forget that Exceptions wil sooner or later occur so make your code ready to handle them.

"How to Build a Python Web Application with Flask and Neo4j" (video) by Nicole White. Well I have heard about Neo4J, but I have never used it or seen it in action so this one was really exciting. Neo4J offers you a whole new perspective about building databases and relations between objects but it is much far from panacea. Actually I can see it it is more like a special tool then as a general replacement of a relation database but it still worths to be tried. Oh, and the Neo4J browser - totally awesome.

In the lightning talks Tome Cvitan talked about Faker. If you are still not familiar with it now is a good time to try it. Especially if you are writing tests.

At the final Kenneth Reitz told us about "Python for Humans"(video). About the not that obvious things in Python and what solutions are out there. And also about The Hitchhiker’s Guide to Python! a great place for beginners and not only and shared the idea to make Python easier and more welcoming by introducing better tools for higher level of operations.

Finally I want to thank to everyone - the organisers, the speakers, the audience and practically everyone who was a part of the conference. Without you it would be the same (or be at all). Thanks, keep up the good work and hopefully we will sea each other again.

P.S. Have I mentioned that the whole conference was recorded on video so hopefully we will see be able to see all the talks pretty soon. I will try to keep this post updated with the links to the videos and/or slides when they become available. Of course if you know about any published slides from the conference that are not linked here please let me know.

Brief: In one of the project I work on we had to convert some old naive datetime objects to timezone aware ones. Converting naive datetime to timezone aware one is usually a straightforward job. In django you even have a nice utility function for this. For example:

Explanation: The reason for the first error is that in the real world this datetime does not exists. Due to the DST change on this date the clock jumps from 01:59 standard time to 03:00 DST. Fortunately (or not) pytz is aware of the fact that this time is invalid and will throw the exception above. The second exception is almost the same but it happens when switching from summer to standard time. From 01:59 DST the clock shifts to 01:00 standard time, so we end with a duplicate time.

Why has this happened(in our case)? Well we couldn't be sure how exactly this one got into our legacy data but the assumption is that at the moment when the record was saved the server has been in different timezone where this has been a valid time.

Solution 1: This fix is quite simple, just add an hour if the exception occurs.

The second solution probably needs some explanation. First lets check what make_aware does. The code bellow is take from Django's sourcecode as it is in version 1.7.7

defmake_aware(value,timezone):""" Makes a naive datetime.datetime in a given time zone aware. """ifhasattr(timezone,'localize'):# This method is available for pytz time zones.returntimezone.localize(value,is_dst=None)else:# Check that we won't overwrite the timezone of an aware datetime.ifis_aware(value):raiseValueError("make_aware expects a naive datetime, got %s"%value)# This may be wrong around DST changes!returnvalue.replace(tzinfo=timezone)

To simplify it, what Django does is to use the localize method of the timezone object(if it exists) to convert the datetime. When using pytz this localize method takes two arguments: the datetime value and is_dst. The last argument takes three possible values: None, False and True. When using None and the datetime matches the moment of the DST change pytz does not know how to handle the datetime and you get one of the exceptions shown above. False means that it should convert it to standard time and True that it should convert it to summer time.

Why isn't this fixed in Django? The simple answer is "because this is how it should work". For a bit longer check the respectful ticket.

Reminder: Do not forget that the "fix" above does not actually care whether the original datetime is during DST or not. In our case this was not criticla for our app, but in some other cases it might be, so use it carefully.

Thanks: Special thanks to Joshua who correctly pointed out in the comments that I have missed the AmbiguousTimeError in the original post which made me to look a bit more in the problem, research other solutions and update the article to its current content.

Or how to use the benefits of Django template system during the PSD to HTML phase

There are two main approaches to start designing a new project - Photoshop mock-up or an HTML prototype. The first one is more traditional and well established in the web industry. The second one is more alternative and (maybe)modern. I remember a video of Jason Fried from 37 Signals where he talks about design and creativity. You can see it at http://davegray.nextslide.com/jason-fried-on-design. There he explains how he stays away from the Photoshop in the initial phase to concetrate on the things that you can interact with instead of focusing on design details.

I am not planning to argue which is the better method, the important thing here is that sooner or later you get to the point where you have to start the HTML coding. Unfortunately frequently this happens in a pure HTML/CSS environment outside of the Django project and then we waste some extra amount of time to convert it to Django templates.

Wouldn't be awesome if you can give the front-end developers something that they can install/run with a simple command and still to allow them to work in the Django environment using all the benefits it provides - templates nesting and including, sekizai tags etc.

I have been planning to do this for a long time and finally it is ready and is available at Django for Prototyping. Currently the default template includes Modernizr, jQuery and jQuery UI but you can easily modify it according to your needs. I would be glad of any feedback and ideas of improvement so feel free to try it and comment.

As a follow up post of Automated deployment with Ubuntu, Fabric and Django here are the slides from my presentation on topic "Automation, Fabric and Django". Unfortunately there is no audio podcast but if there is interest I can add some comments about each slide as part of this post.

Explanation

At the top I have a directory named as the project and virtual environment inside of it. The benefit from it is complete isolation of the project from the surrounding projects and python packages installed at OS level and ability to install packages without administrator permissions. It also provides an easy way to transfer the project from one system to another using a requirements file.
The src folder is where I keep everything that is going to enter the version control project source, requirements files, web server configuration etc.
My default .gitignore is made to skip the pyc-files, the PyDev files and everything in the static and media directories.
The media directory is where the MEDIA_ROOT settings point to, respectively static is for the STATIC_ROOT.
All required packages with their version are placed in required_packages.txt so we can install/update them with a single command in the virtual environment.
In a directory with the name of the project is where the python code resides. Inside it the project structure partly follows the new project layout introduced in Django 1.4.
The big difference here is the settings part. It is moved as a separate module where all common/general settings are place in __init__.py

. We have one file for each possible environment(development, staging, production) with the environment specific settings like DEBUG and TEMPLATE_DEBUG. In local.py are placed the settings specific for the machine where the project is running. This file should not go into the repository as this is where the access database password and API keys should reside.
I having one local Nginx configuration file because I use the webserver to serve the static files when working locally this is the <project_name>.development.nginx.local.conf.
For each environment there is also a couple of configuration files - one for the web server(<project_name>.< environment_type>.nginx.uwsgi.conf) and one for the uwsgi(<project_name>.< environment_type>.uwsgi.conf). I make symbolic links pointing to these files so any changes made are automatically pushed/pulled via version control, I only have to reload the configuration. There is no option to change something in the configuration at one station and to forget to transfer it to the rest.

Benefits

Complete isolation from other projects using virtual environment

Easy to transfer and update packages on other machines thanks to the pip requirements file

Media/static files outside the source directory for higher security

Web server/uWSGI configuration as part of the repository for easier and error proof synchronization

Probably it have some downside(I can not think any of these now but you never know) so if you think that this can be improved feel free to share your thoughts. Also if there is anything not clear enough, just ask me, I will be happy to clear it.

A few months ago I started to play with Fabric and the result was a simple script that automates the creation of a new Django project. In the last months I continued my experiments and extended the script to a full stack for creation and deployment of Django projects.

As the details behind the script like the project structure that I use and the server setup are a bit long I will keep this post only on the script usage and I will write a follow up one describing the project structure and server.

So in a brief the setup that I use consist of Ubuntu as OS, Nginx as web server and uWSGI as application server. The last one is controlled by Upstart. The script is available for download at GitHub.
In a wait for more detailed documentation here is a short description of the main tasks and what they do:

startproject:<project_name>

Creates a new virtual environment

Installs the predefined packages(uWSGI, Django and South))

Creates a new Django project from the predefined template

Creates different configuration files for development and production environment

Initializes new git repository

Prompts the user to choose a database type. Then installs database required packages, creates a new database and user with the name of the project and generates the local settings file

Runs syncdb(if database is selected) and collectstatic

setup_server

installs the required packages such as Nginx, PIP, GCC etc

prompts for a database type and install it

reboots the server as some of the packages may require it

Once you have a ready server and a project just use this the following task to deploy it to the server. Please have in mind that it should be used only for initial deployment.

deploy_project:<project_name>,<env_type>,<project_repository>

Creates a new virtual environment

Clones the project from the repository

Installs the required packages

Creates symbolic links to the nginx/uwsgi configuration files

Asks for database engine, creates new DB/User with the project name and updates the settings file

Calls the update_project task

update_project:<project_name>,<env_type>

Updates the source code

Installs the required packages(if there are new)

Runs syncdb, migrate and collect static

Restart the Upstart job that runs the uWSGI and the Nginx server

The script is still in development so use it on your own risk. Also it reflects my own idea of server/application(I am planning to describe it deeper in a follow up post) setup but I would really like if you try it and give a feedback. Feel free to fork, comment and advice.

Preface: Have you noticed how on some websites when you click on a link that opens a lightbox or any overlay for first time it takes some time to display the border/background/button images. Not quite fancy, right?
This is because the load of this images starts at the moment the overlay is rendered on the screen. If this is your first load and these images are not in your browser cache it will take some time for the browser to retrieve them from the server.

Solution: The solution for this is to preload the images i.e. to force the browser to request them from the server before they are actually used. With a simple javascript function and a list of the images URLs this is a piece of cake:

Please have in mind that the code above uses the jQuery library.
Specialty: Pretty easy, but you have to hardcode the URLs of all images. Also if you are using Django compressor then probably you are aware that it adds extra hash to the URLs of the images in the compressed CSS files. The hash depends from the COMPRESS_CSS_HASHING_METHOD settings and can not be avoided. It is pretty useful cause it forces the client browser to reload the images every time when something has been changed. unfortunately our hardcoded list of URLs does not have this hash. So wouldn't it be much simpler if instead of hardcoding URLs we just read them from the CSS files?
Solution 2:

Now with the help of regular expressions we can read the image URLs directly from the CSS file together with the hash part. Please note the zero index in the css file selector, if your main CSS is not the first declared style-sheet then you will have to change the index according to its position.

I hope you will find this solutions simple and useful. As always feel free to comment, share and propose code improvements.