2007-04-21

Semantics of Python variable names from a C++ perspective

If you are going to start programming in Python and come from languages like C or C++, there are a couple of things you should know about variable names. In Python, variable names do not have the same behaviour as in C++. Part of this is clarified in the Python documentation, but I’ll try to give specific examples of things that work and don’t work as you may expect, and why.

All variables are references

Yes, that’s right. In Python, all variables are references. In C++, a reference is a type of data that behaves like a pointer but that has “normal” variable syntax. Other languages coined the name alias for this type of data. In other words, given the following C++ code:

int foo = 3;
int &bar = foo;

Once the reference to int named bar is created, foo and bar are both names for the same object. If you assign a value to bar, foo will see the change, because they’re names for the same integer. References in C++, however, can’t be reseated: once bound, a reference can’t be made to refer to a different object. Some people would argue that it’s because there’s no syntax to perform that operation. In Python, however, a reference can be rebound. In fact, that’s what assignments do, because all variables are references.
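To make the contrast concrete, here is a minimal Python sketch that reuses the names foo and bar from the C++ snippet (the helper name same_before is only there for illustration). Assigning to bar does not write through to foo; it makes bar refer to another object:

```python
foo = 3
bar = foo                  # bar now refers to the same integer object as foo
same_before = bar is foo   # True: two names, one object

bar = 4                    # rebinds bar to a different object; foo is untouched
print(foo, bar)            # 3 4
```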

For a Python beginner, it’s difficult to notice that all variables are references. That’s because they can be rebound and because Python has two kinds of objects: mutable objects and immutable objects. Boolean values, integers, floats and strings, among others, are immutable. A C or C++ programmer may write the following Python code:

a = 1
b = 2
c = 0
c = a + b

And they may not perceive that it’s all references. They have C++ in mind, and think those variables are all integers, and that when c = a + b is executed, the value of c (zero) is being overwritten with the value of a + b, and that’s not the case. The integer that c refers to can’t change, because integers are immutable; what changes is which object c refers to. This is what’s happening:

A new integer is created with value 1. Variable a is a reference to that object.

The same for integer value 2 and b.

The same for integer value 0 and c.

Due to the plus operator, a new integer is created with value a + b, and c is changed to refer to or point to that new object. The previous integer it was referencing (with value zero) can be discarded.
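The steps above can be checked with a small sketch. The extra name zero is an assumption added here only to keep a second reference to the original object, so it can be inspected after c is rebound:

```python
a = 1
b = 2
c = 0
zero = c          # keep a second name for the object c refers to right now

c = a + b         # builds a new integer and rebinds c to it

print(c)          # 3
print(zero)       # 0: the old object was never modified
print(c is zero)  # False: c now refers to a different object
```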

However, other data types such as lists or dictionaries are mutable, and they store references to other objects. Let’s see some examples of this variables-as-references interpretation.

Examples

Creating aliases for the same object:

>>> a = [1, 2, 3]
>>> b = a
>>> a.append(4)
>>> b
[1, 2, 3, 4]

Proof that lists store references too, and not the objects directly (this appears in the Python tutorial):
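A minimal sketch of this behaviour, written in the spirit of the tutorial's nested-list example; the names q and p and the exact values are assumptions made for illustration:

```python
q = [2, 3]
p = [1, q, 4]        # p[1] holds a reference to q, not a copy of it

p[1].append("xtra")  # mutate the inner list through p

print(p)             # [1, [2, 3, 'xtra'], 4]
print(q)             # [2, 3, 'xtra']: the change is visible through q too
print(p[1] is q)     # True: both refer to the same list object
```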

However, you can’t modify objects when they’re immutable. In this example, a function receives a copy of the reference. Inside the function, this local copy is changed to refer to another object, but the original reference doesn’t change.
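A minimal sketch of such a function. The names some_int and a are the ones used in the discussion; the function name increment, the return statement and the name result are assumptions added to make the local rebinding visible:

```python
def increment(some_int):
    # some_int is a local name bound to the object the caller passed in
    some_int += 1      # builds a new integer and rebinds only the local name
    return some_int


a = 1
result = increment(a)

print(a)       # 1: the caller's reference still points to the original integer
print(result)  # 2: only the function's local name was rebound
```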

The case above may mislead a C or C++ programmer. If some_int is a reference and you perform a += operation, you may have expected to see the integer changed after calling the function, because you may be used to the C interpretation that the += operator modifies the integer in place. However, this is similar to the c = a + b example we used previously. Inside the function, a new integer is created as the sum of some_int and 1. some_int is then changed to refer to this new integer object. However, some_int is the function’s local copy of reference a, and changing its reference value won’t change the reference value of a. The integer object IS NOT being modified in place. Integers are immutable.

One last case that gave me problems in the past is the following: you can create a list by replicating a reference. For example:

>>> a = [0] * 3
>>> a
[0, 0, 0]

What’s happening here? First, a new integer object is created to hold the value 0. A reference to it is stored inside a list of one element. Then, this reference is replicated 3 times (or 2, to be precise) and a new list is created, consisting of 3 references pointing to the same integer object. If we change a list element, it will work as you expect:

>>> a[0] += 1
>>> a
[1, 0, 0]

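This works because integers are immutable: a[0] += 1 can’t alter the shared integer object, so it creates a new integer with value 1 and makes the first slot of the list refer to it, leaving the other two slots pointing to the original 0. As soon as you try to repeat this with mutable objects, problems appear. For example, I expected this to work when trying to create a 2×2 matrix: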

>>> a = [[0] * 2] * 2
>>> a
[[0, 0], [0, 0]]

That’s apparently correct. The only problem is that it’s not, or at least not for what I expected:

>>> a[0][0] = 1
>>> a
[[1, 0], [1, 0]]

If you’ve understood up to here, you’ll know why. The “matrix” (outer list) contains 2 references to the same list object. When we change something in that list, it’s reflected in every row. How do you create a 2×2 matrix of zeros in Python? You may try to use the slice operator, which creates a new list. In particular, [:] creates a new list object that holds the same sequence of references as the original list: