This document assumes that you are familiar with the changes between Python 2
and Python 3. If you aren’t, read Python’s official porting guide first. Refreshing your knowledge of unicode handling on
Python 2 and 3 will help; the Pragmatic Unicode presentation is a good
resource.

Django uses the Python 2/3 Compatible Source strategy. Of course, you’re
free to choose another strategy for your own code, especially if you don’t need
to stay compatible with Python 2. But authors of pluggable applications are
encouraged to use the same porting strategy as Django itself.

Writing compatible code is much easier if you target Python ≥ 2.6. Django 1.5
introduces compatibility tools such as django.utils.six, which is a
customized version of the six module. For convenience,
forwards-compatible aliases were introduced in Django 1.4.2. If your
application takes advantage of these tools, it will require Django ≥ 1.4.2.

Obviously, writing compatible source code adds some overhead, and that can
cause frustration. Django’s developers have found that attempting to write
Python 3 code that’s compatible with Python 2 is much more rewarding than the
opposite. Not only does that make your code more future-proof, but Python 3’s
advantages (like the saner string handling) start shining quickly. Dealing
with Python 2 becomes a backwards compatibility requirement, and we as
developers are used to dealing with such constraints.

Porting tools provided by Django are inspired by this philosophy, and it’s
reflected throughout this guide.

Adding from __future__ import unicode_literals at the top of your Python
modules – it’s best to put it in each and every module, otherwise you’ll
keep checking the top of your files to see which mode is in effect;

However, Django applications generally don’t need bytestrings, since Django
only exposes unicode interfaces to the programmer. Python 3 discourages using
bytestrings, except for binary data or byte-oriented interfaces. Python 2
makes bytestrings and unicode strings effectively interchangeable, as long as
they only contain ASCII data. Take advantage of this to use unicode strings
wherever possible and avoid the b prefixes.

Note

Python 2’s u prefix is a syntax error in Python 3.2 but it will be
allowed again in Python 3.3 thanks to PEP 414. Thus, this
transformation is optional if you target Python ≥ 3.3. It’s still
recommended, per the “write Python 3 code” philosophy.

Django also contains several string-related classes and functions in the
django.utils.encoding and django.utils.safestring modules. Their
names used the words str, which doesn’t mean the same thing in Python 2
and Python 3, and unicode, which doesn’t exist in Python 3. To avoid
ambiguity and confusion, these concepts were renamed bytes and
text.

In Python 2, the object model specifies __str__() and
__unicode__() methods. If these methods exist, they must return
str (bytes) and unicode (text) respectively.

The print statement and the str built-in call
__str__() to determine the human-readable representation of an
object. The unicode built-in calls __unicode__() if it
exists, and otherwise falls back to __str__() and decodes the
result with the system encoding. Conversely, the
Model base class automatically derives
__str__() from __unicode__() by encoding to UTF-8.
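The derivation described above can be sketched in plain Python. The Greeting class and the PY2 flag are hypothetical names used only for illustration; this is not Django's actual Model code, just a minimal sketch of the same idea, assuming the text is encoded as UTF-8 on Python 2.

```python
from __future__ import unicode_literals

import sys

# Hypothetical flag for this sketch: True when running on Python 2.
PY2 = sys.version_info[0] == 2


class Greeting(object):
    """Hypothetical class used only for illustration."""

    def __unicode__(self):
        # Always returns text (unicode on Python 2, str on Python 3).
        return "caf\u00e9"

    if PY2:
        def __str__(self):
            # Python 2: __str__() must return bytes, so encode the text.
            return self.__unicode__().encode("utf-8")
    else:
        def __str__(self):
            # Python 3: __str__() returns text directly.
            return self.__unicode__()


greeting = Greeting()
```

On Python 3, str(greeting) returns the text unchanged; on Python 2, it returns the UTF-8 encoded bytes.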

six provides compatibility functions to work around this change:
iterkeys(), iteritems(), and itervalues().
It also contains an undocumented iterlists function that works well for
django.utils.datastructures.MultiValueDict and its subclasses.
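A minimal sketch of iterating over a dictionary with these helpers. It imports from the standalone six package; in a Django project of this era the same names would be imported from django.utils.six instead. The fallback definitions in the except branch are an assumption added so the sketch also runs on Python 3 without six installed; they are not part of six.

```python
try:
    from six import iteritems, iterkeys, itervalues
except ImportError:
    # Illustrative Python 3 fallbacks, only so this sketch runs without six.
    def iterkeys(d):
        return iter(d.keys())

    def itervalues(d):
        return iter(d.values())

    def iteritems(d):
        return iter(d.items())

# Hypothetical sample data.
ages = {"alice": 30, "bob": 25}

# Each helper returns an iterator on both Python 2 and Python 3.
names = sorted(iterkeys(ages))
total = sum(itervalues(ages))
pairs = sorted(iteritems(ages))
```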

In Python 3, all strings are considered Unicode by default. The unicode
type from Python 2 is called str in Python 3, and str becomes
bytes.

You mustn’t use the u prefix before a unicode string literal because it’s
a syntax error in Python 3.2. You must prefix byte strings with b.

In order to enable the same behavior in Python 2, every module must import
unicode_literals from __future__:

from __future__ import unicode_literals

my_string = "This is an unicode literal"
my_bytestring = b"This is a bytestring"

If you need a byte string literal under Python 2 and a unicode string literal
under Python 3, use the str builtin:

str('my string')
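A short sketch of what this produces, assuming the module uses unicode_literals as recommended. The variable names are hypothetical.

```python
from __future__ import unicode_literals

# str() yields the interpreter's native string type:
# a byte string on Python 2, a unicode string on Python 3.
native = str('my string')

# A plain literal is always text once unicode_literals is in effect.
text = 'my string'

# True on both versions, because str names the native type in each.
is_native = type(native) is str
```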

In Python 3, there aren’t any automatic conversions between str and
bytes, and the codecs module became stricter. str.encode()
always returns bytes, and bytes.decode() always returns str. As a
consequence, the following pattern is sometimes necessary:
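For instance, here is a sketch that assumes the aim is to drop non-ASCII characters from a text value while still ending up with text. The variable names and sample value are hypothetical.

```python
from __future__ import unicode_literals

value = "caf\u00e9 au lait"

# encode() returns bytes; the "ignore" handler drops characters
# that cannot be represented in ASCII.
ascii_bytes = value.encode("ascii", "ignore")

# decode() turns the bytes back into text, so the result has the
# same type as the input on both Python 2 and Python 3.
ascii_text = ascii_bytes.decode("ascii")
```

The explicit round trip is needed because Python 3 will not implicitly convert the bytes returned by encode() back into a string.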

If you use xrange on Python 2, import six.moves.range and use that
instead. You can also import six.moves.xrange (it’s equivalent to
six.moves.range) but the first technique allows you to simply drop the
import when dropping support for Python 2.
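A minimal sketch of the first technique, using the standalone six package (django.utils.six in a Django project of this era). The try/except wrapper is an assumption added only so the sketch runs on Python 3 where six is not installed; code that must run on Python 2 needs the import unconditionally.

```python
try:
    # On Python 2 this rebinds range to the lazy xrange.
    from six.moves import range
except ImportError:
    # Python 3 without six: the built-in range is already lazy.
    pass

# Same loop on both versions, without building a full list on Python 2.
squares = [n * n for n in range(5)]
```

Once Python 2 support is dropped, deleting the import leaves the rest of the code unchanged.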