Raw strings to the rescue!

Whenever I teach Python courses, most of my students are using Windows. And thus, when it comes time to do an exercise, I inevitably end up with someone who does the following:

for one_line in open('c:\abc\def\ghi'):
    print(one_line)

The above code looks like it should work. But it almost certainly doesn’t. Why? Because backslashes (\) in Python strings are used to insert special characters. For example, \n inserts a newline, and \t inserts a tab. So when we create the above string, we think that we’re entering a simple path, but the \a right after the drive letter is itself an escape sequence, so we’re actually entering a string containing ASCII 7, the alarm bell.
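To see what Python actually stores, here's a minimal sketch. The path in it is one I made up for illustration (it isn't the one from the exercise above); I picked it because both \a and \t are recognized escape sequences:

```python
# Hypothetical path, chosen for illustration: \a and \t are both escape sequences.
path = 'c:\abc\table'

# repr() shows the characters Python actually stored in the string.
stored = repr(path)
print(stored)       # 'c:\x07bc\table'
print(len(path))    # 10, even though 12 characters were typed between the quotes
```

The \x07 in the output is the alarm bell, and the \t is a tab character. Neither backslash survived.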

Experienced programmers are used to looking for \n and \t in their code. But \a (alarm bell) and \v (vertical tab), for example, tend to surprise many of them. And if you aren’t an experienced programmer? Then you’re totally baffled as to why the pathname you’ve entered, and copied so precisely from Windows, results in a “file not found” error.

One way to solve this problem is by escaping the backslashes before the problematic characters. If you want a literal “\n” in your text, then put “\\n” in your string. By the same token, you can say “\\a” or “\\v”. But let’s be honest; remembering which characters require a doubled backslash is a matter of time, experience, and discipline.
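Here's a small sketch of the escaping approach. To sidestep the question of which characters need it, this version simply doubles every backslash in the path; the variable names are just for illustration:

```python
# "\\n" is a backslash followed by the letter n, not a newline.
text = "abc\\ndef"
print(text)         # abc\ndef

# Every backslash is doubled, so none of them starts an escape sequence.
path = 'c:\\abc\\def\\ghi'
print(path)         # c:\abc\def\ghi
print(len(path))    # 14: each doubled backslash is stored as one character
```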

(And yes, you can use regular, Unix-style forward slashes on Windows. But that is generally met by even more baffled looks than the notion of a “vertical tab.”)
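If you'd like to convince yourself that forward slashes describe the same Windows path, here's one way to check. It uses pathlib's PureWindowsPath, which is my choice for this sketch rather than anything the exercise requires; it applies Windows path rules on any operating system and never touches the disk:

```python
from pathlib import PureWindowsPath

# Forward slashes contain no backslashes, so there is nothing to escape.
path = PureWindowsPath('c:/abc/def/ghi')

# Converting to a string shows the path in native Windows form.
print(str(path))    # c:\abc\def\ghi
print(path.parts)   # ('c:\\', 'abc', 'def', 'ghi')
```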

You might as well double all of the backslashes — but doing that is really annoying. Which is where “raw strings” come into play in Python.

A “raw string” is basically a “don’t do anything special with the contents” string: a what-you-see-is-what-you-get string. It’s actually not a different type of data, but rather a way to write string literals in which every backslash is taken literally, as if you had doubled it yourself. Just preface the opening quote (single or double) with the “r” character, and your string will be defined with all backslashes escaped. For example, if you say:

print("abc\ndef\nghi")

then you’ll see

abc
def
ghi

But if you say:

print(r"abc\ndef\nghi")

then you’ll see

abc\ndef\nghi
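A quick sketch to back up the claim that a raw string is an ordinary string that was merely typed differently (the variable names are mine):

```python
raw = r'c:\abc\def\ghi'
doubled = 'c:\\abc\\def\\ghi'

# Both literals produce exactly the same string object contents.
print(raw == doubled)           # True

# A raw string is still just a str; there is no separate "raw" type.
print(type(raw))                # <class 'str'>

# In a raw string, \n is two characters: a backslash and the letter n.
print(len(r'\n'), len('\n'))    # 2 1
```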

I suggest using raw strings whenever working with pathnames on Windows; it allows you to avoid guessing which characters require escaping. I also use them whenever I’m writing regular expressions in Python, especially if I’m using \b (for word boundaries) or using backreferences to groups.
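Here's a short sketch of why raw strings matter for regular expressions. The sample text and patterns are made up for illustration:

```python
import re

text = 'cat concat cat'

# Raw string: the regex engine receives \b and treats it as a word boundary.
print(re.findall(r'\bcat\b', text))    # ['cat', 'cat']

# Regular string: Python turns \b into a backspace character (ASCII 8)
# before the regex engine ever sees it, so nothing matches.
print(re.findall('\bcat\b', text))     # []

# Backreference: \1 means "whatever group 1 matched", here a repeated word.
match = re.search(r'(\w+) \1', 'it was the the best')
print(match.group(0))                  # the the
```

Without the r prefix, the \1 in the last pattern would become the character with code 1, and the backreference would silently disappear.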

Raw strings are one of those super-simple ideas that can have big implications for the readability of your code, as well as your ability to avoid problems. So use raw strings for all of your Windows pathnames, and you’ll be able to devote your attention to fixing actual bugs in your code.