subprocess.check_output returns a bytestring with the filenames on my desktop. To deal with them in a more serious way, and to have the ASCII 10 characters actually function as newlines, I need to invoke the “decode” method, which results in a string:

output = subprocess.check_output('ls').decode('utf-8')
print(output)

This is great, until I want to pass one or more arguments to my “ls” command. My first attempt might look like this:

The most important part of this error message is the final line, in which the system complains that I cannot find the program “ls -l”. That’s right — it thought that the command + option was a single program name, and failed to find that program.

Now, before you go and complain that this doesn’t make any sense, remember that filenames may contain space characters. And that there’s no difference between a “command” and any other file, except for the way that it’s interpreted by the operating system. It might be a bit weird to have a command whose name contains a space, but that’s a matter of convention, not technology.

Remember, though, that when a Python program is invoked, we can look at sys.argv, a list of the user’s arguments. Always, sys.argv[0] is the program’s name itself. We can thus see an analog here, in that when we invoke another program, we also need to pass that program’s name as the first element of a list, and the arguments as subsequent list elements.

Oh, no! Python thought that I was trying to find the literal file named “*.py”, which clearly doesn’t exist.

It’s here that we discover that when Python connects to external programs, it does so on its own, without making use of the Unix shell’s expansion capabilities. Such expansion, which is often known as “globbing,” is available via the Python “glob” module in the standard library. We could use that to get a list of files, but it seems weird that when I invoke a command-line program, I can’t rely on it to expand the argument.

But wait: Maybe there is a way to do this! Many functions in the “subprocess” module, including check_output, have a “shell” parameter whose default value is “False”. But if I set it to “True”, then a Unix shell is invoked between Python and the command we’re running. The shell will surely expand our star, and let us list all of the Python programs in the current directory, right?

Hmm. We didn’t get an error. But we also didn’t get what we wanted. This is mighty strange.

The solution, it turns out, is to pass everything — command and arguments, including the *.py — as a single string, and not as a list. When you’re invoking commands with shell=True, you’re basically telling Python that the shell should break apart your arguments and expand them. If you pass a list to the shell, then the parsing is done the wrong number of times, and in the wrong places, and you get the sort of mess I showed above. And indeed, with shell=True and a string as the first argument, subprocess.check_output does the right thing:

The bottom line is that you can get globbing to work when invoking commands via subprocess.check_output. But you need to know what’s going on behind the scenes, and what shell=True does (and doesn’t) do, to make it work.

One of Python’s mantras is “batteries included.” which means that even with a bare-bones installation, you can do quite a bit. You can (and should) install packages from PyPI, but many day-to-day tasks can be accomplished with just the built-in data structures, functions, and methods.

What I’ve discovered over the years is that some of the functions and methods have useful parameters that can make our code shorter and more elegant. Here are some of the most elegant ones that I’ve found and use in my work:

1. str.split (part 1)

One of the methods I use most often in my work is str.split. This method always returns a list, breaking the string into different elements. For example:

Yuck! Of course, this is one of those times that the computer does what we tell it, not what we want: We said that every time it encounters a space character, it should give us a new element in the output list. Sure enough, by having multiple space characters between the letters, we end up having lots of empty strings in our resulting list.

What’s worse is that I often use str.split to take input from users, or from files, and break it into individual elements. I’d love to break on one or more whitespace characters, ideally without reverting to the “re” module’s re.split(‘\s’).

Solution: Don’t pass any argument. The first parameter, named “sep”, has a default value of None. And when it has a value of None, str.split does indeed use one or more whitespace characters. That’s right — str.split, when called with zero arguments, actually does more (and is often more useful) then when called with an argument:

2. str.split (part 2)

Let’s say you ask a user to enter their name, which you want to split into first and last names. For example:

In [13]: person = raw_input("Enter your name: ") # "input" in Python 3
Enter your name: Reuven Lerner
In [14]: first_name, last_name = person.split()
In [15]: print("First name is '{}', last name is '{}'".format(first_name,
last_name))
First name is 'Reuven', last name is 'Lerner'

Line 14 uses Python’s unpacking; since we know that the list produced by person.split() will contain two elements, I can safely assign those two elements into two variables (first_name and last_name).

Yikes! str.split() returned a list of three elements. And you cannot assign three elements into two variables. (Fine, Python 3 does allow for this, but we won’t discuss this here.)

We can, however, tell str.split how many times it should split. This is done by passing the second, optional argument. If you want to split on all whitespace, as we saw above, then you must explicitly pass None as the first argument:

In [18]: first_name, last_name = person.split(None, 1)
In [19]: print("First name is '{}', last name is '{}'".format(first_name,
...: last_name))
First name is 'Reuven', last name is 'Moshe Lerner'

Remember that the second parameter is called “maxsplits”, meaning the number of times str.split should do its thing. This means that the number you give will be the index of the final element in the returned list. In other words: I call person.split(None, 1), which means that I’ll get back a list with two elements — the latter of which has an index of 1.

What if I want to split things in the other direction, such that my first and middle names are in the first variable, and just my last name in the second variable? We can use a variant of str.split called str.rsplit (“right-side split”):

In [20]: first_name, last_name = person.rsplit(None, 1)
In [21]: print("First name is '{}', last name is '{}'".format(first_name,
...: last_name))
First name is 'Reuven Moshe', last name is 'Lerner'

3. enumerate

“enumerate” is a built-in function that exists to let us number things as we iterate over them. For example, let’s assume that I have a string, and want to print the letters of the string:

Because this is such a common thing that people want to do, we can instead use enumerate, which returns an iterator that produces tuples. Each tuple contains two elements, the first of which is the index and the second of which is the element from the enumerated sequence. Because we know that each tuple will contain two elements, we can grab them with unpacking:

But wait, what if you are presenting this information to a non-programmer, for whom it seems weird to start numbering with zero? One solution is to send them to a programming course, but if there’s no time, then you can pass “enumerate” a second argument, the number with which numbering should start:

4. int()

“int” is the integer type, widely used in Python to represent whole numbers. Many Python developers know that we can turn strings into integers by invoking the “int” function.

Of course, there’s not really an “int function.” Instead, we’re using “int” as a class to create a new instance of int. So when I say

int('5')

I get a new instance of “int” back, with the value 5. And when I say

int('12345')

I get back a different instance of “int”, with the value 12345.

But “int” takes a second argument, which lets us tell Python the base of the source data. For example, if I say

int('12345', 16)

then we get back 74565, because we asked Python to give us the value of 0x12345.

You can actually use any number base you want, from 1 through 36 — which is particularly useful for those of us with 36 fingers. But it’s not uncommon for my clients to be reading from files containing hexadecimal numbers. For example, let’s assume that we want to sum the hex numbers on each line of the following file:

10 20 30 4a 5b 6c
ff ef fa 00 20 3b
ab cd ef af be cd

We can do something like this:

In [35]: for one_line in open('hexnums.txt'):
...: print(sum([int(one_number, 16)
...: for one_number in one_line.split()]))

In other words:

We open the file, and read it line by line

We split each line on whitespace, resulting in a list

We interpret each number in hex

The resulting list of integers can then be passed to the “sum” function

We print the sum for each line

If you work with files that contain binary, octal, or hex numbers, this can really be handy. To be honest, I don’t do this very much — but I work with a number of companies that do, and for whom this is a real lifesaver.

5. dict.get

Dictionaries are everywhere in Python. They’re easy to define, and easy to work with. For example:

Oh, right — but if you request a key that doesn’t exist, you’re going to get a KeyError exception.

Let’s write a little program that lets the user repeatedly query a dictionary. If the user gives us an empty string, then we’ll exit from the loop, but otherwise we’ll either print the value associated with the key, or give an error:

This works fine, and is a pretty standard way that I’ve seen people check for keys in order to avoid exceptions. But often, the dict.get method will work even better. Basically, dict.get does the same thing as square brackets ([ ]), except that if the key doesn’t exist, it returns None. For example:

Notice that when the key does exist, there’s no difference between d[k] and d.get(k). The question is how you want to deal with a key that doesn’t exist, and dict.get often makes life much easier in these cases.

What optional parameters do you find most useful in Python?

Also: I’m teaching three live Python courses (on functional programming, advanced objects, and decorators) in the next few weeks; early-bird tickets are still available for a limited time. Grab them now, and improve your Python fluency from your own computer.

If you’re like many of the Python developers I know, the basics are easy for you: Strings, lists, tuples, dictionaries, functions, and even objects roll off of your fingers and onto your keyboard. Your day-to-day tasks have become significantly easier as a result of Python, and you’re comfortable using it for tasks at work and home.

But some parts of Python remain difficult, mysterious, and outside of your comfort zone:

When you want to use a list comprehension, you have to go to Stack Overflow to remember how they work — to say nothing of set and dict comprehensions.

You know that there is a difference between functions and methods, but you can’t quite put your foot on what that difference is, or how Python rewrites “self” to be the first argument to every method.

You keep hearing about “decorators,” and how they allow you to do all sorts of magical things to functions and classes — but every time you start reading about them, you get confused or distracted.

Sound familiar? If so, then I want to help.

As you probably know, I spend just about every day at one of the world’s best companies — Apple, Cisco, IBM, PayPal, VMWare, and Western Digital, among others — teaching their engineers how to use Python.

The engineers who learn these techniques benefit by having more “tools in their toolbox,” as I like to put it; when a problem presents itself, they have more options at their disposal. I help them to solve new types of problems, or to solve existing problems more quickly. These engineers become more valuable to their employers, and more valuable on the larger job market.

Each of these classes will run live, for five hours (with two 15-minute breaks):

New York: 7 a.m. – 2 p.m.

London: 12 noon – 5 p.m.

Israel: 2 p.m. – 7 p.m.

Mumbai: 5:30 p.m. – 10:30 p.m.

Each will be packed with lectures, accompanied by tons of live-coding examples, many exercises that you’ll be expected to solve (and which we’ll review together when you’re done), and plenty of time for interactions and questions. Indeed, please come with lots of questions, to make the class more interesting and relevant.

Whenever I teach Python courses, most of my students are using Windows. And thus, when it comes time to do an exercise, I inevitably end up with someone who does the following:

for one_line in open('c:\abc\def\ghi'):
print(one_line)

The above code looks like it should work. But it almost certainly doesn’t. Why? Because backslashes (\) in Python strings are used to insert special characters. For example, \n inserts a newline, and \t inserts a tab. So when we create the above string, we think that we’re entering a simple path — but we’re actually entering a string containing ASCII 7, the alarm bell.

Experienced programmers are used to looking for \n and \t in their code. But \a (alarm bell) and \v (vertical tab), for example, tend to surprise many of them. And if you aren’t an experienced programmer? Then you’re totally baffled why the pathname you’ve entered, and copied so precisely from Windows, results in a “file not found” error.

One way to solve this problem is by escaping the backslashes before the problematic characters. If you want a literal “\n” in your text, then put “\\n” in your string. By the same token, you can say “\\a” or “\\v”. But let’s be honest; remembering which characters require a doubled backslash is a matter of time, experience, and discipline.

(And yes, you can use regular, Unix-style forward slashes on Windows. But that is generally met by even more baffled looks than the notion of a “vertical tab.”)

You might as well double all of the backslashes — but doing that is really annoying. Which is where “raw strings” come into play in Python.

A “raw string” is basically a “don’t do anything special with the contents” string — a what-you-see-is-what-you-get string. It’s actually not a different type of data, but rather a way to enter strings in which you want to escape all backslashes. Just preface the opening quote (or double quotes) with the “r” character, and your string will be defined with all backslashes escaped. For example, if you say:

print("abc\ndef\nghi")

then you’ll see

abc
def
ghi

But if you say:

print(r"abc\ndef\nghi")

then you’ll see

abc\ndef\nghi

I suggest using raw strings whenever working with pathnames on Windows; it allows you to avoid guessing which characters require escaping. I also use them whenever I’m writing regular expressions in Python, especially if I’m using \b (for word boundaries) or using backreferences to groups.

Raw strings are one of those super-simple ideas that can have big implications on the readability of your code, as well as your ability to avoid problems. Avoid such problems; use raw strings for all of your Windows pathnames, and you’ll be able to devote your attention to fixing actual bugs in your code.