Month: June 2015

It’s really tempting, when you first start to use Python, to use “is” rather than “==”. It’s a bit more readable, and it feels like it should just work, especially when you’re dealing with integers. In a language that uses “or” and “and” instead of “||” and “&&”, it seems logical to use “is” instead of “==”. And if you try “is” with small integers, or even with short strings, you might be lulled into thinking that you should use “is” in lots of places.

But you shouldn’t. Really, in almost no case should you use “is”; you should almost certainly use “==” instead. In fact, there’s only one case in which most Python programmers should be using “is”, and that’s to check whether something is None.

In this blog post, which is the result of many questions and discussions I’ve had with students in my Python classes, I’m going to try to describe the reasons for this — and along the way, describe some parts of how Python’s objects are allocated, and what we mean when we say that two objects are “the same.”

Let’s start with the basics: Everything in Python is an object. Every object in Python has an ID number, unique for as long as the object exists, which we can retrieve by using the built-in “id” function:
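Here’s a minimal sketch of that; the variable names are only illustrative, and the actual number will differ from run to run:

```python
# Any object will do; here, a list.
x = [1, 2, 3]

# id() returns an integer that identifies the object while it exists.
x_id = id(x)
print(x_id)

# Asking again about the same object gives the same number.
same_again = (id(x) == x_id)
print(same_again)
```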

Now, if two variables are pointing to the same object, then “id” will (not surprisingly) return the same ID for both:

>>> x = [1,2,3]
>>> y = x
>>> id(x)
4504494160
>>> id(y)
4504494160

Given that x and y point to the same list, changes to the list will be reflected in both variables:

>>> x[0] = '!'
>>> y[1] = '?'
>>> x
['!', '?', 3]
>>> y
['!', '?', 3]

In such a case, it’s pretty clear that x and y are both pointing to precisely the same object. They aren’t just equal in value; they are one and the same — aliases for one another.

We can ask Python if this is true by using the “is” operator, also known as the “identity operator.” “is” doesn’t compare the values of x and y. Rather, it checks to see if x and y have the same ID. If so, then they are the same object. If not, then they aren’t. It’s as simple as that. Perhaps it goes without saying, but two objects that “is” each other are also “==” to each other, since an object’s value should be equal to itself:

>>> x == y
True
>>> x is y
True
>>> id(x) == id(y)
True

The above code shows that x and y have the same ID. This means that they “is” each other; we’re dealing with two names for the same object. Their values are thus equal, which is what “==” checks.

Again: The “is” operator returns “True” if two names are referring to the same object. And the “==” operator returns “True” if two names point to objects that contain the same value.
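To make the distinction concrete, here’s a small sketch (the names a, b, and c are just for illustration) with two lists that have the same contents but are separate objects, plus a second name for the first list:

```python
a = [1, 2, 3]
b = [1, 2, 3]   # a second list with the same contents
c = a           # a second name for the first list

equal_values = (a == b)    # same value
same_object = (a is b)     # two separate list objects
alias_check = (a is c)     # two names, one object

print(equal_values, same_object, alias_check)
```

Each list display creates a new list object, so a and b are equal without being identical, while c is simply an alias for a.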

The most common use of “is”, by far, is when we want to know if something is None. True, we could use “==”. But in both readability and speed, “is None” trumps “== None”. So your code should generally say:

if x is None:
    print("x is None!")

It shouldn’t surprise us to find out that “is” is faster than “==”. After all, “is” is a simple comparison of the IDs of the two objects, which CPython does directly in C. No method call (such as “__eq__”) is needed, and we certainly don’t need to compare the values of the two objects, which can also take some time.
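One way to see that “is” skips the comparison machinery is to count calls to “__eq__”. This is only a sketch; the Noisy class is hypothetical and exists purely for the demonstration:

```python
class Noisy:
    """Hypothetical class that counts how often __eq__ is called."""
    calls = 0

    def __eq__(self, other):
        Noisy.calls += 1
        return self is other


n = Noisy()

identity_result = (n is n)       # identity check: __eq__ is not involved
calls_after_is = Noisy.calls

equality_result = (n == n)       # equality check: __eq__ runs once
calls_after_eq = Noisy.calls

print(calls_after_is, calls_after_eq)
```

The counter stays at zero after “is” and goes to one after “==”, which is the extra work that the identity check avoids.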

The use of “is None” works because the None object is a singleton in Python. No matter what you do, id(None) will always return the same value. (Note that this value won’t stay constant across different invocations of Python.) In other words:
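Here’s a minimal sketch of what that looks like; the names x, y, z, and returns_nothing are just for illustration:

```python
x = None
y = None


def returns_nothing():
    pass  # a function without a return statement returns None


z = returns_nothing()

# Every route to None leads to the very same object.
same_ids = (id(x) == id(y) == id(z) == id(None))
all_are_none = (x is None and y is None and z is None)

print(same_ids, all_are_none)
```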

So no matter how you slice it, None is a singleton. Which is why you can (and should) use “is None”, rather than “== None”, in your code.

But what happens if you decide that you want to use “is” in other places? The problem is that it will sometimes work. That “sometimes” is because “is” exposes some of Python’s internal optimizations in ways that can be a bit surprising.

Strings are how I was initially introduced to the difference between “==” and “is”, and the danger of using “is” over-zealously. Two equal strings should be “==”, but are they “is”?

>>> x = 'a' * 5
>>> y = 'a' * 5
>>> x == y
True
>>> x is y
True

Well, that’s interesting, and I got the same result in Python 2.7, 3.4, and also in PyPy. But why should this be the case? One possibility is that strings are immutable, so it would be efficient for Python to use a single object for each distinct string that we create. And indeed, this is true, so long as the string is short:

The above, which works the same in Python 2.7, 3.4, and in PyPy, demonstrates that Python won’t reuse just any string that we have created. There is a limit. I experimented with things a bit, and I found that 21 is the magic length at which strings are no longer “is” to one another. That is:

The above was true in Python 2.7 and 3.4, and also in PyPy. However, I also found some seemingly weird behavior, which is presumably because of the way in which Python byte-compiles and then executes for loops:

(Forgive the re-formatting that WordPress did to the above assignments; in Python, they were both on one line.)

I’m not sure what is going on here, but it just goes to show that you really shouldn’t use “is” unless you know what you’re doing. And even if you think that you know what you’re doing, you might still be surprised! Bottom line: Using “is” on strings is almost always a bad idea.

Now, this is generally something that we don’t need to think or care about very much. But let’s say that you’re working with large strings, and that these strings might repeat themselves on occasion. In such a case, you will end up with many copies of the same string. Python helps us to solve this problem by “interning” strings. Interning is a technique that has been around for many years in the programming world, which allows us to store only one copy of any given string. In Python 2, we use the built-in “intern” function. In Python 3, we must use sys.intern; intern is no longer a builtin.

“intern” takes a string (and only a string) as a parameter. It returns a reference — either to a new string that was created, or to a string that was already allocated. Thus, the length of the string doesn’t matter; even in the case of a long string, it will only be allocated a single time:
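Here’s a minimal sketch of that, assuming Python 3 (and thus sys.intern) running on CPython. The strings are built at run time with str.join so that the result doesn’t depend on how the compiler treats string literals; the variable names are only illustrative:

```python
import sys

# Build two equal 1,000-character strings at run time.
a = ''.join(['a'] * 1000)
b = ''.join(['a'] * 1000)

equal_before = (a == b)        # same value
identical_before = (a is b)    # on CPython, two separate objects

# Interning hands back one shared object for equal strings.
a = sys.intern(a)
b = sys.intern(b)
identical_after = (a is b)

print(equal_before, identical_before, identical_after)
```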

As you can see, using “intern” guarantees that every unique string is allocated only once. If you use “intern” on the same string a second time, Python returns a reference to the first string.

Python uses “intern” internally for a variety of purposes. If you’re working with long strings that repeat themselves, then it might be worth using intern. But for the most part, Python creates and allocates so many objects that a few strings here and there are probably not going to make a difference. Certainly, you should only use “intern” once you have identified bottlenecks.

You might think that even if strings are allocated multiple times, and are thus not “is” to one another, at least integers are going to be identical. After all, Python wouldn’t allocate new objects for numbers, would it?

We can test this pretty easily, of course:

>>> x = 200
>>> y = 200
>>> x is y
True

Well, that’s encouraging, right? Let’s try something bigger:

>>> x = 2000
>>> y = 2000
>>> x is y
False

So yes, it turns out that even integers that are equal aren’t necessarily pointing to the same object. As Amy Hanlon pointed out in her fantastic talk about Python “wats”, this is because Python pre-allocates a number of integers (in CPython, -5 through 256). If your integers are within that range, then they will use the same object, and be “is” to one another. But if they’re outside of that range, then you’ll have two separate objects. Unless, of course, you allocate them in the same line of code:

>>> x = 2000; y = 2000
>>> x is y
True
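The pre-allocated range can be seen without relying on how literals are compiled. The sketch below assumes CPython, where the cached integers run from -5 through 256; it creates the integers at run time with int(), and the variable names are only illustrative:

```python
# Inside the pre-allocated range: both names get the cached object.
small_a = int('200')
small_b = int('200')
small_identical = (small_a is small_b)

# Outside the range: each call builds a fresh integer object.
big_a = int('2000')
big_b = int('2000')
big_equal = (big_a == big_b)
big_identical = (big_a is big_b)

print(small_identical, big_equal, big_identical)
```

The values are always equal; whether the objects are identical depends on an implementation detail, which is exactly why “==” is the comparison to use.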

Have I mentioned that you really shouldn’t use “is” to compare objects except for None? I hope that you’re increasingly convinced.

I’ll close this post with a bit of mischief: In theory, if two objects are “is”, then they’re pointing to the same object, which means that they should be identical to one another, and thus also give us a True response to “==”. While Python doesn’t allow us to redefine “is”, we can redefine what an object says when we try to compare it using “==”:
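A minimal sketch of this kind of mischief follows; the class name NeverEqual is hypothetical, chosen only for this illustration:

```python
class NeverEqual:
    """Hypothetical class whose instances claim to equal nothing."""

    def __eq__(self, other):
        return False


obj = NeverEqual()
alias = obj

identical = (alias is obj)   # same object, so this is True
equal = (alias == obj)       # our __eq__ insists on False

print(identical, equal)
```

So an object can be “is” itself without being “==” to itself, which is one more reason not to treat the two operators as interchangeable.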

If you’re a programmer, then you have likely heard about regular expressions (“regexps”) before. However, it’s also likely that you have tried to learn them, and have found them to be completely confusing. That’s not unusual; while regular expressions provide us with a powerful tool for analyzing text, their terse, dense, and cryptic syntax can make the effort not seem worthwhile.

On June 23rd, I’m going to be offering a one-hour free Webinar introducing regular expressions, showing how they can make your code more powerful and expressive.

While I’ll mostly be using Python, I’ll also show some other languages and platforms (e.g., Ruby, JavaScript, and the Unix “grep” command).

My demo and discussion will be about an hour long, and will be followed by ample time for Q&A. My previous Webinars have been lots of fun; I hope that you’ll join in! You can get (free) tickets at EventBrite.

And hey, if you’re an independent consultant, you can get a double dose of me on that same day; we Freelancers Show panelists will be doing our monthly Q&A just beforehand. Come and get your questions about consulting answered by our panel of experts!

As many people know, I’ve visited China seven times over the last three years, traveling there to give courses in Python and Ruby. I just got back from my most recent trip, and found it to be as fun and exciting as ever. You could say that I’ve gotten a bit obsessed with the country; I read books about China, have been taking daily Chinese lessons since August, and publish a free weekly newsletter (Mandarin Weekly) with links to useful resources for people learning Chinese.

Given that I keep kosher and Shabbat, other religious Jews are increasingly asking me for advice on what, where, and how they can be Jewishly observant when visiting China on business or pleasure. No one in China is likely to know or care about such subjects, let alone know anything about Judaism, so it can be a bit daunting to visit there for the first time.

I’ve collected my advice into a 40-page ebook, the “Jewish guide to visiting China.” If you’re a religiously observant Jew who will be visiting China for short periods of time, then I believe this guide can significantly reduce the time (and stress) you’ll need to invest before your trip.

I’m just launching it now — and for the first week it’s online, I’m offering a discount coupon (“YouTaiRen” — aka 犹太人 — the word for “Jew” in Chinese) giving 20% off of the normal $6 price. This price includes PDF, Mobi, and ePub formats, which should suit any computer or ebook reader.