What's New in Python 2.3?

Editor's note: When Alex Martelli said he wanted to write an article about what he was "unable to include in Python in a Nutshell," my first thought was "why would we want to tell readers what they will not find when they purchase this book?" But our In a Nutshell books do not purport to include everything about a subject. As Tim O'Reilly says: "These books aren't tutorials. They take a topic and drill down, expand, and, we hope, delight the reader by providing useful information the reader didn't even expect to find." If you want to find out more about all of our In a Nutshell books, check out our recently launched nutshells.oreilly.com site. Meanwhile, read on to find out why Alex says his recent In a Nutshell book is eminently relevant as you upgrade to Python 2.3.

Introduction

Python in a Nutshell comes with a banner on the cover that says it
"Covers Python 2.2." With Python 2.3 coming soon (version 2.3
is currently in the "alpha" phase; "beta" will soon follow; and then, in
due course, there will be release candidates; then a final release),
you might justifiably worry that the forthcoming Python 2.3 is going to invalidate what you learned from Python in a Nutshell. Is it worth upgrading, or should you stick to 2.2 as
long as possible? This article answers those questions with a look at the changes and improvements to the new version, including reviews of the new modules 2.3 has to offer.

Upgrading to 2.3

Good news: Python is a stable language. New releases are always designed to
avoid breaking good Python code that worked with previous
releases. So you can keep programming to Python 2.2, upgrade
your installed Python to Python 2.3, and count on your code still
working correctly. Python in a Nutshell will be eminently applicable
regardless of which version you use.

Is it worth upgrading? You bet. With Python 2.3, you can expect
typical Python code to run about 15 percent to 20 percent faster than it did with 2.2,
since a lot of care has been devoted to optimization and fine-tuning.

Performance Improvements and New Modules

Even if you don't use the language and library improvements in Python
2.3, the speed gains alone make it worthwhile to upgrade. In some
cases, the gains are even more impressive, and the new timeit.py module
makes them easy to measure. For example, multiplication of long
integers uses a new, much faster algorithm ("Karatsuba multiplication"):
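As a rough sketch of how such a measurement can be taken with timeit (not the article's original listing): the operand is the one quoted in this article, while the loop count, the repeat count, and the variable names are arbitrary choices for illustration. Absolute timings depend entirely on the machine and the interpreter version, so none are shown.

```python
import timeit

# Operand taken from the article; loop and repeat counts are assumptions.
setup = "x = 112233445566778899"
loops = 100000

# Timer.repeat returns one total time per repetition; the minimum is
# usually the least noisy figure.
t_mul = min(timeit.Timer("x * x", setup).repeat(repeat=3, number=loops))
t_pow = min(timeit.Timer("x ** 2", setup).repeat(repeat=3, number=loops))

print("x * x : %.3f usec per loop" % (t_mul / loops * 1e6))
print("x ** 2: %.3f usec per loop" % (t_pow / loops * 1e6))
```

Running the same script under two interpreter versions gives a like-for-like comparison of this one operation.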

For this case, we see that the speedup is over 70 percent. Let us note, in
passing, that multiplying a long integer by itself is the fastest way of
squaring it: 112233445566778899 ** 2, on the same machine, takes 2.727
microseconds in Python 2.2 and 1.349 in Python 2.3. So, when you need to
square a long integer, remember that multiplying it by itself is over
twice as fast as raising it to the power of two.

Enhancements to the Python language itself, from 2.2 to 2.3, are
minor but helpful. As mentioned in the Nutshell, slicing of built-in
sequences now supports an optional third parameter, the stride of the
slice. For example, to get alternate characters from a string, in
either normal or reverse order, you can now just slice the string:
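A minimal sketch of the stride syntax, together with a substring test, might look like this. The sample string and variable names are arbitrary choices for illustration, not the article's original listing.

```python
text = "abcdefg"

# The third slice parameter is the stride.
alternate = text[::2]    # every second character, left to right
backwards = text[::-2]   # every second character, right to left
print(alternate)
print(backwards)

# The "in" operator accepts a whole substring, not just one character.
has_sub = "cde" in text
print(has_sub)
```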

As you see, we can now check if any substring is "in" a given string:
the check is not limited any more to being done on a single character,
as it was up to Python 2.2.

Built-in types have gained a few more small extras. You can now open a
text file with mode U, for "universal newlines", to read it with
transparent support for all common kinds of line terminators: '\r',
'\n', and '\r\n' all translate into '\n' in this mode. Dictionaries are a bit richer, with two more ways to build them:
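The two construction styles added in 2.3 are, as far as I can tell, keyword arguments to dict and the dict.fromkeys class method. A short sketch, with keys and values invented for illustration:

```python
# Build a dict from keyword arguments: each keyword becomes a string key.
by_keywords = dict(red=1, green=2)

# Build a dict from a sequence of keys, all mapped to the same value.
from_keys = dict.fromkeys(["red", "green"], 0)

print(by_keywords)
print(from_keys)
```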

The new pop method of dicts takes the key as its argument, returns the
corresponding value, and removes the item from the dict, quite similarly
to the pop method that lists have long had. A dict's pop method lets
you treat missing keys in either of two ways:
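A small sketch of both behaviors; the dictionary contents and variable names are hypothetical.

```python
stock = {"apples": 3, "pears": 5}

# pop returns the value and removes the item from the dict.
taken = stock.pop("apples")

# With a single argument, a missing key raises KeyError.
try:
    stock.pop("plums")
except KeyError:
    missing_raised = True
else:
    missing_raised = False

# With two arguments, the second is returned when the key is missing.
fallback = stock.pop("plums", 0)

print(taken, missing_raised, fallback, stock)
```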

When you call pop with a single argument, and that key is not in the
dict, KeyError gets raised; alternatively, you can call pop with two
arguments, and so provide a default value for the method to return if
the key isn't present.

A useful new built-in function is enumerate, which lets you loop in
parallel over a sequence and its indices:
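The sketch below shows the built-in in use, followed by one way to emulate it with a generator. The name my_enumerate is hypothetical and this is not the article's original listing. The code is written for a current interpreter; under Python 2.2 the generator would additionally need "from __future__ import generators" as the first statement of the module, shown here only as a comment.

```python
# Python 2.2 only: from __future__ import generators

letters = ["a", "b", "c"]

# The built-in yields (index, item) pairs.
for index, letter in enumerate(letters):
    print(index, letter)


def my_enumerate(sequence):
    """Generator-based emulation of the enumerate built-in."""
    index = 0
    for item in sequence:
        yield index, item
        index += 1


emulated = list(my_enumerate(letters))
print(emulated)
```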

In Python 2.3 you would not need the from __future__ import any longer for this purpose (generators are always enabled, and yield is always a
keyword), but, since this is Python 2.2 code intended to emulate the new
2.3 built-in, of course, we do need to "import generators from the
future".

As is typical of all Python upgrades, most enhancements in Python 2.3 do
not come as changes to the Python language itself, but rather can be
found in Python's vast standard library. In many cases, this means you
can take the Python sources of a new 2.3 library module... and sneak
it into a 2.2 installation that you cannot entirely upgrade for whatever
reason--this will not always work, as the new module may take
advantage of language innovations; but often it will, and if you find
yourself in such a situation it may be worth a try.

Library enhancements can be generally divided into improvements to
existing modules, and entirely new modules. However, in the specific
case of Python 2.3, one important enhancement is the removal of two
modules: rexec and Bastion, which are discussed in the
"Restricted Execution" section of the Nutshell's Chapter 13
("Controlling Execution"). It has been discovered that these modules
present unfixable and exploitable security flaws, and therefore they
have been officially declared "dead," with immediate effect and without
the usual backwards-compatibility precautions. Security weaknesses do
require such immediate and drastic action.

Research is ongoing on alternative ways to let your Python applications
execute "untrusted" Python code in safe ways; for example, I recommend
taking a look at the experimental Sandbox.py module that you can find at
www.procoders.net/download.php?fname=SandBox.py. However, until such alternatives have been thoroughly examined by security experts, and released as
approved and secure, I recommend you do not yet rely on them for
production work that does require high security.

Some of the enhancements to existing modules were known early enough
that I was able to mention them in Python in a Nutshell; for example, all
sockets from standard module socket can now optionally display timeout
behavior. Other enhancements are nearly "transparent" to your
application code. For example, the module random uses a new random number
generator (the "Mersenne Twister") with a longer period; the pickle module can use
a new and more efficient pickling protocol; and bsddb supports newer
versions of the underlying Sleepycat Berkeley DB library.
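To illustrate how little application code has to change, here is a brief sketch that asks pickle for the newer protocol and reseeds random. Protocol number 2 is my reading of "new and more efficient pickling protocol"; the data and the seed value are made up.

```python
import pickle
import random

record = {"title": "Python in a Nutshell", "covers": [2.2, 2.3]}

# The second argument selects the pickling protocol; 2 is assumed here
# to be the protocol introduced in Python 2.3.
payload = pickle.dumps(record, 2)
restored = pickle.loads(payload)

# The random API is unchanged: the same seed gives the same sequence.
random.seed(2003)
first_draw = random.random()
random.seed(2003)
second_draw = random.random()

print(restored == record, first_draw == second_draw)
```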

Python 2.3's standard library also comes with many new modules. Some
are analogous to existing ones, but are better: for example, bz2 lets your
application use the bzip2 compression library, which can compress data
better than gzip; optparse lets you parse command-line options, like
getopt but with more power; textwrap reformats text into paragraphs, as
you could previously do with some of the classes supplied by module
formatter, but in simpler and more flexible ways.
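A compact sketch that touches all three modules. The option names, file names, and sample text are invented for illustration, and the bytes literal assumes a current interpreter.

```python
import bz2
import textwrap
from optparse import OptionParser

# bz2: compress and restore a block of repetitive data.
raw = b"spam and eggs " * 200
packed = bz2.compress(raw)
unpacked = bz2.decompress(packed)

# optparse: declare the options, then parse an explicit argument list.
parser = OptionParser()
parser.add_option("-v", "--verbose", action="store_true",
                  dest="verbose", default=False)
parser.add_option("-o", "--output", dest="output", default="out.txt")
options, args = parser.parse_args(["-v", "--output", "report.txt", "input.txt"])

# textwrap: reflow a long string into lines of at most 30 characters.
wrapped = textwrap.fill(
    "The standard library gains many new modules in this release.", width=30)

print(len(raw), len(packed))
print(options.verbose, options.output, args)
print(wrapped)
```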

Other new modules offer completely new functionality. The datetime module offers quick date and time calculations; to compute, for example, the number of days between two dates, you can now use very simple code:
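A minimal sketch of such a calculation; the two dates are arbitrary examples. Subtracting one date from another gives a timedelta, whose days attribute holds the count.

```python
import datetime

start = datetime.date(2003, 1, 1)
end = datetime.date(2004, 1, 1)

# date - date yields a datetime.timedelta.
days_between = (end - start).days
print(days_between)  # 2003 is not a leap year, so this prints 365
```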

The heapq module implements functions that let you use a list as a heap-queue (also known as a priority queue). This new module doesn't
directly implement a priority queue class, but it does make it trivial
to build one yourself, depending on the exact details of your
application's needs. For example, you could code:
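One possible sketch of such a class follows; it is not the article's original listing. The class and method names match the ones this article uses, but the internals are assumptions: in particular, the insertion counter used to break ties between equal costs, the __len__ method, and the choice to return only the item from departure are all mine.

```python
import heapq
import itertools


class PriorityQueue(object):
    """Cheapest-first queue built on a plain list managed by heapq."""

    def __init__(self):
        self._heap = []
        # Assumption: a running counter breaks ties between equal costs,
        # so the items themselves never need to be comparable.
        self._order = itertools.count()

    def arrival(self, cost, item):
        heapq.heappush(self._heap, (cost, next(self._order), item))

    def departure(self):
        # Removes and returns the cheapest item still in the queue.
        cost, _, item = heapq.heappop(self._heap)
        return item

    def __len__(self):
        return len(self._heap)


pq = PriorityQueue()
pq.arrival(5, "expensive")
pq.arrival(1, "cheap")
pq.arrival(3, "middling")
first_out = pq.departure()
second_out = pq.departure()
print(first_out, second_out, len(pq))
```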

Here, each "arriving" item with a given cost is added to a PriorityQueue
instance pq by calling pq.arrival(cost, item); at any time, provided pq is non-empty, the "best" (cheapest) item that is still present in the
queue can be obtained (and removed) by calling pq.departure().

The itertools module implements simple and fast "building blocks" to
build, modify, and combine iterators, letting you construct flexible and
memory-efficient loops in very simple ways. For example, yet another
way to simulate the new enumerate built-in function would be:
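A sketch of one such simulation pairs an endless counter with the items. The function name is hypothetical. On a current interpreter the built-in zip is already lazy; in Python 2.3 the lazy pairing would have been itertools.izip instead.

```python
import itertools


def enumerate_with_itertools(iterable):
    # itertools.count() yields 0, 1, 2, ... without end; zip stops as
    # soon as the shorter input, the iterable, is exhausted.
    return zip(itertools.count(), iterable)


pairs = list(enumerate_with_itertools(["x", "y", "z"]))
print(pairs)
```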

The logging module implements a complete, powerful, and flexible system
for logging error and warning messages. Despite the logging system's
richness, you can use it quite simply, as in the following snippet:

import logging
...
if username not in known_users:
    logging.warning("User %s not known", username)

This code can just ask the system to "log a warning-level message", and
leave it up to the system's runtime configuration to determine where
(and if) a message of such a level will be stored and/or displayed.

The sets module offers a new datatype corresponding to the mathematical
concept of "set". For example, given two strings, a simple and
straightforward way to get a string that is made up of all characters
present in both (that is, a "set intersection" of the strings) is now:
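A sketch of the idea, with one caveat: Python 2.3 provides the type as sets.Set, whereas this example uses the built-in set type that later superseded it, so that it runs on a current interpreter. The sample strings are arbitrary.

```python
first = "nutshell"
second = "python"

# The & operator computes the intersection of the two character sets;
# join turns the surviving characters back into a string.
common = "".join(set(first) & set(second))
print(common)
```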

The resulting order of the characters is arbitrary: sets, like
dictionaries, do not even have a concept of "order" in their items.

The zipimport module lets you import modules from .zip files (without
having to unzip such files first); zipimport is now automatically used
by the import statement if you just place a .zip file on your
module search path (sys.path).
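The following sketch builds a throwaway archive and imports from it. The archive name, module name, and function are all invented for illustration, and the with statement assumes a current interpreter.

```python
import os
import sys
import tempfile
import zipfile

# Create a small .zip archive containing a single module.
archive = os.path.join(tempfile.mkdtemp(), "bundle.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("nutshell_zip_demo.py",
                "def greet():\n    return 'hello from a zip'\n")

# Putting the archive itself on sys.path is all the import system needs.
sys.path.insert(0, archive)
import nutshell_zip_demo

message = nutshell_zip_demo.greet()
print(message)
```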

Altogether, the rich crop of new modules lets you build Python programs
with more productivity and ease than before--and your Python
programs can be faster to run, and simpler and faster for you
to write. Thus, I recommend the upgrade, without reservations.

Alex Martelli currently works for AB Strakt, a Python-centered software
house in Göteborg, Sweden, mostly by telecommuting from his home in
Bologna, Italy.