The Best Gets Better

Over the last few years, the Python programming language (named after Monty Python’s Flying Circus) has gained popularity and matured. More software is being developed with Python and more contributors are donating their time, talents, and energy to enrich the language (thanks in part to a move to SourceForge). There’s a non-profit foundation that owns and promotes Python (the Python Software Foundation) and a core group of full-time Python architects (thanks to Zope Corporation). There’s a formal process for changing the language (Python Enhancement Proposals, or PEPs) and an international society of companies that base their business on Python (the Python Business Forum, or PBF). All sorts of programs, from convenience scripts to large, complex systems (like Zope), are being developed in Python.

Guided by Python inventor and chief architect Guido van Rossum, Python has remained true to its basic tenets: be simple, clear, readable, powerful, and backwards compatible. Python syntax is clean and relies more on readable keywords than punctuation. Simple indentation indicates block structure. The language’s features are also general and regular: there are no context-dependent “conveniences,” and thus, no surprises. “Do the simplest thing that can possibly work,” one of the mottos of the Extreme Programming movement, is a Python mantra as well.

Python 2.2, the most recent revision, reinforces these basic principles. (For a good introduction to Python, see “Python: Yes, You SHOULD Be Using It” in the April 2002 issue, available online at http://www.linuxmagazine/2002-04/python_01.html.) Python 2.2 has fewer “special cases” than previous versions of the language. In fact, much of Python’s recent evolution comes from wider and more general application of existing ideas and constructs. As you will see, recent changes have made Python even more self-consistent and readable. Python is a great language for learning how to program, yet is suitable for production use in any field.

To take full advantage of everything that Python has to offer, you should install the latest release (at press time the latest version was 2.2.1), available on the Python Web site (http://www.python.org). Do not remove Python 1.5.2 if your Linux distribution relies on it; Red Hat, for example, does.

Python Dictionaries

To review what’s changed in Python since version 1.5.2, let’s look at one of the most pervasive data structures in Python, the dictionary (what other languages call hashes, maps, or associative arrays).

A Python dictionary maps arbitrary keys to arbitrary values and works surprisingly fast. (Because dictionaries are implemented so efficiently, Python uses dictionaries pervasively for many kinds of “namespaces” or “symbol tables,” such as object attributes.) Listing One uses a dictionary to create an index of all words found in a set of text files. The index is stored in a file named shelf.

Lines 1-4 loop over all of the words in the text files and build a dictionary named index that maps each word to a list of locations where the word was found. Line 4 calls the dictionary method setdefault(), new since Python 2.0. We could have coded this in Python 1.5.2 (and still can, since Python 2.2 is backwards compatible) with an explicit has_key() test before the append.

The method has_key() checks to see if the dictionary already has a certain key. If not, it is inserted with a new empty list as the value. Then, we append the location to the list corresponding to the word. setdefault() compacts this idiom, which is commonly used when building dictionaries whose values are lists (or other dictionaries), into one expression. The single look-up is simpler and enhances performance.
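Listing One is not reproduced in this text, so the following is only a rough sketch of the two idioms side by side. The names index, word, and location come from the discussion above; the sample data and the two helper function names are made up for illustration. The membership test is spelled with "not in" so that the sketch also runs on current Python; on 1.5.2 it would be written with has_key().

```python
# Hypothetical sample data: (word, location) pairs, where a location is a
# (filename, line number) tuple. This is not taken from Listing One.
occurrences = [
    ("python", ("a.txt", 1)),
    ("zope", ("a.txt", 2)),
    ("python", ("b.txt", 7)),
]


def build_index_old_style(pairs):
    """Explicit test-then-insert idiom, as one would write it for 1.5.2."""
    index = {}
    for word, location in pairs:
        # On Python 1.5.2 this test would read: if not index.has_key(word):
        if word not in index:
            index[word] = []
        index[word].append(location)
    return index


def build_index_setdefault(pairs):
    """The same loop compacted with setdefault() (Python 2.0 and later)."""
    index = {}
    for word, location in pairs:
        # setdefault() returns the existing list, or installs and returns
        # the new empty list when the word has not been seen before.
        index.setdefault(word, []).append(location)
    return index


old_index = build_index_old_style(occurrences)
new_index = build_index_setdefault(occurrences)
```

Both helpers should produce the same dictionary; the second simply folds the test and the insertion into one expression.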

Lines 6-9 loop over all distinct words — all keys in dictionary index. Line 7 in particular, for word in index:, is the obvious way to express this loop. Surprisingly enough, the ability to loop directly on a dictionary is a new feature of Python 2.2. In earlier versions, we would have coded this as:

for word in index.keys():

This older idiom uses the dictionary’s keys() method to obtain the list of keys, then loops on the list. Although it is not a major change, the Python 2.2 idiom is superior in all respects: it saves memory and processing power and is more concise and immediate. The new idiom relies on a concept called iterators, which we’ll cover later.

Also new in version 2.2 is the ability to determine if a word is in the index, by using if word in index:. In earlier versions, you’d have to use if index.has_key(word):. While you can still code it that way, the new code is much clearer, more readable, and faster. Moreover, it helps Python beginners avoid the trap of coding if word in index.keys(): which gives the right result, but is much slower than both the old and new idiom.
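As a small illustration of both 2.2 idioms, the sketch below loops over a dictionary directly and tests membership with "in". The contents of index are invented for the example.

```python
# Hypothetical index contents, for illustration only.
index = {"python": [1, 7], "zope": [2]}

# Looping directly on the dictionary yields its keys, one at a time.
words = []
for word in index:
    words.append(word)
words.sort()

# Membership tests work directly on the dictionary as well.
has_python = "python" in index
has_perl = "perl" in index

# The spelling the article warns against gives the same answer; on the
# Python versions discussed here, keys() first builds a complete list.
same_answer = ("python" in index.keys()) == has_python
```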

Specialized Dictionaries

Suppose you’ve created a dictionary like index above and want to know the number of times a particular word is used. The natural way of coding this, len(index[word]), fails if the word is never found since index[word] raises KeyError. The best approach is to use the get() method (also in 1.5.2) and specify the result when a key is not present, as in len(index.get(word,[])). However, it would be even simpler if we could make a custom dictionary that implicitly uses empty lists as the default values for missing keys, rather than having to call get() and setdefault() in code that uses the dictionary.
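The difference between plain indexing and get() can be sketched as follows. The function name count_uses and the sample index are assumptions made for this example.

```python
# Hypothetical index: each word maps to a list of locations.
index = {"python": [("a.txt", 1), ("b.txt", 7)]}


def count_uses(index, word):
    """Return how many times word was found, or 0 if it is absent."""
    # get() returns the supplied default instead of raising KeyError.
    return len(index.get(word, []))


# Plain indexing raises KeyError for a word that was never found.
try:
    len(index["perl"])
    raised_key_error = False
except KeyError:
    raised_key_error = True
```

Note that get() does not add the missing key to the dictionary; it only supplies a value for that one lookup.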

With Python 2.2, we can make a custom dictionary by subclassing the built-in type dict. Before Python 2.2, type and class were separate concepts. A type was coded in C or C++ in the Python runtime internals or as an extension module; a Python class was coded in Python. Now, in Python 2.2, the concepts of type and class are unified. Old-style “classic classes” still work (again, no sacrifice of backwards compatibility), but you can also have “new style” classes that inherit from a built-in type.

An example of the custom dictionary we want is shown in Listing Two. __getitem__() is the method Python calls when you index an object. Here, we override it to delegate to the superclass, dict, catch missing-key errors, and use setdefault() to install a new empty list. Now we can initialize our dictionary with index=dictOfLists(), then use simpler code, such as index[word].append(location), rather than explicitly calling setdefault().
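Listing Two is not reproduced in this text. A minimal sketch of such a class, written from the description above, might look like the following; the exact body of the original listing is an assumption.

```python
class dictOfLists(dict):
    """A dictionary whose missing keys default to a new empty list."""

    def __getitem__(self, key):
        try:
            # Delegate the normal lookup to the built-in dict type.
            return dict.__getitem__(self, key)
        except KeyError:
            # Missing key: install a fresh empty list and return it.
            return self.setdefault(key, [])


index = dictOfLists()
index["python"].append(("a.txt", 1))   # no explicit setdefault() needed
index["python"].append(("b.txt", 7))
missing_count = len(index["perl"])     # 0, and "perl" is now a key
```

One consequence of this design is that merely looking up a missing word inserts it into the dictionary with an empty list, which may or may not be what a given program wants.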

The type dict is new in version 2.2 and lets you build a dictionary out of any sequence of (key, value) pairs — a nice complement to the list comprehension construct introduced in version 2.0, which lets you build a list in a single expression. With list comprehensions, you can say x = [(i,i*i) for i in range(10)] to create a list of pairs, x, that contains single-digit integers and their squares. In 1.5.2, you would have to do more coding as shown here:

x = []
for i in range(10):
    x.append((i,i*i))

Similarly, in 2.2, the statement d = dict([(i,i*i) for i in range(10)]) creates a dictionary d that maps single-digit integers to their squares. In previous versions, you would have to code:

d = {}
for i in range(10):
    d[i] = i*i

Again, you can still code things the old-fashioned way. However, the new ways are faster and more concise.
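For comparison, the new and old spellings can be run side by side. The variable names below other than x and d are chosen for this sketch.

```python
# New style: a list comprehension (2.0) and dict() built from pairs (2.2).
x = [(i, i*i) for i in range(10)]
d = dict(x)

# Old style: explicit loops that build the same list and dictionary.
x_old = []
for i in range(10):
    x_old.append((i, i*i))

d_old = {}
for i in range(10):
    d_old[i] = i*i
```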

Functions (And More Dictionaries)

Functions have always been considered “first-class citizens” in Python — you can have functions as items in lists, as keys and values in dictionaries, and as arguments and/or results of other functions. With Python 2.2, it’s gotten even better.

Consider currying a function f() — wrapping f() together with predefined arguments to build a new function, callable without arguments, that will in turn call f() with the predefined arguments. Not only is this a fundamental mathematical concept, it’s also handy when you need argument-less functions to respond, for example, to a button-click in a GUI programming framework. With Python 2.2, currying is easy.

def curry(f, *args):
    def curried():
        return f(*args)
    return curried

The form *args means “an arbitrary sequence of arguments to follow,” with the sequence being named args. Prior to Python 2.0, you could use the *args notation only for formal arguments. Since 2.0, you can use *args both when defining a function (formal arguments) and when calling one (actual arguments).

So, the function curried(), which the function curry() defines and returns as its result, can easily call f() with “whatever arguments are held in the sequence args,” with the simple and natural f(*args).
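To see curry() in use, here is a short sketch. The function add() and the variable names are hypothetical; curry() itself is the function shown above, repeated so the example is self-contained. It relies on nested scopes, so it needs Python 2.2 or later.

```python
def curry(f, *args):
    def curried():
        # f and args are found in the enclosing function's scope.
        return f(*args)
    return curried


def add(a, b):
    """Hypothetical two-argument function used only for this example."""
    return a + b


# Wrap add() with its arguments; the result takes no arguments at all.
add_2_and_3 = curry(add, 2, 3)
result = add_2_and_3()
```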

For example, say you’re using a GUI framework (that happens to look a lot like Tkinter, the Python version of the popular Tk toolkit), and want to define a button that prints a message to standard output when clicked. You could use a lambda, as in Listing Three, or define and use a named local function, as in Listing Four.

Moreover, Python 2.2 supports lexically nested scopes. A function nested within another — as curried() is nested within curry() — “sees” all local variables of the outer function (and arguments are just local variables). Therefore, the function curried() can use the values of f() (a function or other callable object) and of args which are local variables of the outer function, curry().

Although nested scopes are enabled by default only from Python 2.2 on, you can use them with 2.1 by starting your module with: from __future__ import nested_scopes. This “importing from the future” lets Python implement and offer language enhancements while preserving backwards compatibility.

In a version of curry() that also runs on 1.5.2, however, you need to pass f and args to curried() as “named values with defaults,” a trick to get them into curried()’s namespace. You also need to use the built-in function apply() to call function f() with the arbitrary sequence of arguments args. Python 2.0 obsoletes apply() in favor of the *args form to pass arguments. Python 2.2 simplifies things further thanks to lexical scoping.
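The default-argument trick can be sketched as below. This is an illustration written from the description above, not the original listing. So that it also runs on current Python, the call is spelled f(*args); on 1.5.2 that line would use apply(f, args) instead. The function add() is hypothetical.

```python
def curry(f, *args):
    # Default values are evaluated when curried() is defined, which copies
    # f and args into curried()'s own namespace without nested scopes.
    def curried(f=f, args=args):
        # On Python 1.5.2 this call would be written: apply(f, args)
        return f(*args)
    return curried


def add(a, b):
    """Hypothetical two-argument function used only for this example."""
    return a + b


result = curry(add, 2, 3)()
```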

Python functions have always been able to receive named arguments as well as positional ones. Just as you can receive arbitrary positional arguments as a sequence, using *args, you can receive arbitrary named arguments as a dictionary, using **named. (And, since Python 2.0, you can also supply arbitrary positional and/or named arguments by using the same syntax when you call a function). The ability to receive arbitrary named arguments supports a dictionary-building idiom that was already popular in 1.5.2.

A dictionary d1 built by passing named arguments to such a function and a dictionary d2 written as the equivalent literal are equal, and the syntax of named arguments is more obvious than that of dictionary literals. This works in the latest versions of Python as well.
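The code for this idiom is not reproduced in this text. One plausible form is sketched below; the helper name make_dict and the sample keys are assumptions made for the example.

```python
def make_dict(**named):
    """Return the named arguments, which arrive as a dictionary."""
    return named


# Built from named arguments.
d1 = make_dict(name="Guido", language="Python")

# The equivalent dictionary literal.
d2 = {"name": "Guido", "language": "Python"}

dictionaries_are_equal = (d1 == d2)
```

The named-argument form only works when every key is a valid Python identifier; keys of any other kind still need a literal or explicit assignments.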

Another enhancement regarding functions and dictionaries helps associate arbitrary information with a function. You could do this in version 1.5.2 by indexing into another dictionary with the function object and an identifier.

somedict = {}
somedict[curry, 'author'] = 'me'

However, you’d have to keep track of somedict separately. Since version 2.1, you can do it more naturally: curry.author = 'me' (within the function or outside of it) and Python sets up the dictionary as metadata for the function object itself. Each function object can now carry its own attributes. Python 2.x is even more object-oriented than good old 1.5.2. This is particularly helpful for development environments and other frameworks that help you handle large quantities of Python code with ease.
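Both approaches can be compared in a few lines. The sketch below repeats curry() so that it is self-contained; the variable names attrs and old_style_author are chosen for illustration.

```python
def curry(f, *args):
    def curried():
        return f(*args)
    return curried


# Old approach: a separate dictionary keyed by (function, identifier).
somedict = {}
somedict[curry, 'author'] = 'me'
old_style_author = somedict[curry, 'author']

# Since 2.1: store the attribute on the function object itself.
curry.author = 'me'

# The function's own attribute dictionary now holds the entry.
attrs = vars(curry)
```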

Python 2.2’s New Object Model

As mentioned above, you can subclass built-in types in Python 2.2 and add or override methods. A class that inherits from built-in types is a new-style class. A class that doesn’t inherit from types is called a classic class and behaves as it did back in version 1.5.2.

New-style classes allow more features than classic ones. To make your class new-style without inheriting from a specific type, just subclass object. object is a new type added in 2.2 that’s the common ancestor of all types and new-style classes.

The new-style feature you’ll probably use most often is properties. Properties are attributes that trigger methods with each “get” or “set”: if you “get” a property, the value is computed on the fly; if you “set” a property, a method is called where you can account for the property changing value. (You may be familiar with this concept from languages such as Object Pascal, used in Borland’s Kylix and Delphi products, or even Microsoft’s Visual Basic. Properties are much handier than the Java style of cloaking each attribute with accessor methods setThis() and getThat(), reducing boilerplate and enhancing polymorphism.)

You could implement properties in 1.5.2 since Python calls your object’s method __getattr__() when looking for an attribute it cannot find (and similarly with __setattr__()). See Listing Five.

The constructor method __init__(), which Python automatically calls to initialize each new instance, only binds attributes width and height. So, when some code accesses, say, r.area on an instance r of this class, Python can’t find the attribute and calls r.__getattr__('area'). __getattr__() recognizes the attribute name and computes the value on the fly.
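Listing Five is not reproduced in this text. A minimal sketch of the technique, assuming a class named Rectangle (the name is hypothetical; width, height, and area come from the description above), might look like this:

```python
class Rectangle:
    def __init__(self, width, height):
        # Only width and height are stored on the instance.
        self.width = width
        self.height = height

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name == 'area':
            return self.width * self.height
        raise AttributeError(name)


r = Rectangle(3, 4)
area_before = r.area      # computed on the fly from width and height
r.width = 10
area_after = r.area       # reflects the new width
```

Raising AttributeError for every other name matters here: without it, any misspelled attribute would silently return None.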

This technique works, but it’s a bit clunky. Version 2.2 offers a simpler, clearer, and better-performing alternative, with the new built-in property. See Listing Six.

Now, accesses to r.area on an instance r of this class invoke r.getArea() directly and immediately.
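Listing Six is likewise missing from this text. Under the same assumptions (a hypothetical class named Rectangle, with getArea() as named above), a new-style version using the built-in property might be sketched as:

```python
class Rectangle(object):      # subclassing object makes the class new-style
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def getArea(self):
        return self.width * self.height

    # Reading r.area now calls getArea() directly; no __getattr__ needed.
    area = property(getArea)


r = Rectangle(3, 4)
area_before = r.area
r.height = 5
area_after = r.area
```

Because no "set" method is passed to property() in this sketch, area is read-only; assigning to r.area raises AttributeError.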

Other extras of the new-style class (static and class methods, slots, better support for diamond-shaped inheritance graphs, the new built-in super, and custom metaclasses, among others) are more advanced, but not as pervasive and won’t be covered in this article. Except for backward compatibility needs, you’ll probably want to make all of your classes new-style.

“And a Cast of Thousands…”

These are just a few highlights from Python 2.0, 2.1, and 2.2. We haven’t even mentioned the full support for Unicode, the great XML functionality now in Python’s standard library (how many languages come with both SAX and DOM parsing?), or minor language improvements such as “rich comparisons” and “augmented assignment” (you can now code x += 1).

After the flurry of releases that took Python from version 1.5.2 to version 2.2 in little more than a year, the language has now reached a plateau and the development team is settling down a bit.

Backwards compatibility has been intentionally and carefully preserved from release to release despite the substantial enhancements being introduced. No major language changes are expected until the next major release, version 3.0, which is at least a couple of years away. However, bug fixes, optimizations, enhanced tools, and library improvements continue.

There’s never been a better time to give Python a try, or upgrade to the latest version if you’re already on the Python bandwagon. If Python 1.5.2 was a locomotive, then Python 2.0 is the bullet train. Climb aboard.

Alex Martelli (martm@aleax.it) lives in Italy and is Senior System Developer with AB Strakt, Sweden. He co-edited the Python Cookbook, and is currently working on the forthcoming book Python in a Nutshell.