****************************
What's New in Python 2.2
****************************
:Author: A.M. Kuchling
.. |release| replace:: 1.02
.. $Id: whatsnew22.tex 37315 2004-09-10 19:33:00Z akuchling $
Introduction
============
This article explains the new features in Python 2.2.2, released on October 14,
2002. Python 2.2.2 is a bugfix release of Python 2.2, originally released on
December 21, 2001.
Python 2.2 can be thought of as the "cleanup release". There are some features
such as generators and iterators that are completely new, but most of the
changes, significant and far-reaching though they may be, are aimed at cleaning
up irregularities and dark corners of the language design.
This article doesn't attempt to provide a complete specification of the new
features, but instead provides a convenient overview. For full details, you
should refer to the documentation for Python 2.2, such as the `Python Library
Reference `_ and the `Python
Reference Manual `_. If you want to
understand the complete implementation and design rationale for a change, refer
to the PEP for a particular new feature.
.. see also, now defunct
http://www.unixreview.com/documents/s=1356/urm0109h/0109h.htm
"What's So Special About Python 2.2?" is also about the new 2.2 features, and
was written by Cameron Laird and Kathryn Soraiz.
.. ======================================================================
PEPs 252 and 253: Type and Class Changes
========================================
The largest and most far-reaching changes in Python 2.2 are to Python's model of
objects and classes. The changes should be backward compatible, so it's likely
that your code will continue to run unchanged, but the changes provide some
amazing new capabilities. Before beginning this, the longest and most
complicated section of this article, I'll provide an overview of the changes and
offer some comments.
A long time ago I wrote a Web page listing flaws in Python's design. One of the
most significant flaws was that it's impossible to subclass Python types
implemented in C. In particular, it's not possible to subclass built-in types,
so you can't just subclass, say, lists in order to add a single useful method to
them. The :mod:`UserList` module provides a class that supports all of the
methods of lists and that can be subclassed further, but there's lots of C code
that expects a regular Python list and won't accept a :class:`UserList`
instance.
Python 2.2 fixes this, and in the process adds some exciting new capabilities.
A brief summary:
* You can subclass built-in types such as lists and even integers, and your
subclasses should work in every place that requires the original type.
* It's now possible to define static and class methods, in addition to the
instance methods available in previous versions of Python.
* It's also possible to automatically call methods on accessing or setting an
instance attribute by using a new mechanism called :dfn:`properties`. Many uses
of :meth:`__getattr__` can be rewritten to use properties instead, making the
resulting code simpler and faster. As a small side benefit, attributes can now
have docstrings, too.
* The list of legal attributes for an instance can be limited to a particular
set using :dfn:`slots`, making it possible to safeguard against typos and
perhaps make more optimizations possible in future versions of Python.
Some users have voiced concern about all these changes. Sure, they say, the new
features are neat and lend themselves to all sorts of tricks that weren't
possible in previous versions of Python, but they also make the language more
complicated. Some people have said that they've always recommended Python for
its simplicity, and feel that its simplicity is being lost.
Personally, I think there's no need to worry. Many of the new features are
quite esoteric, and you can write a lot of Python code without ever needed to be
aware of them. Writing a simple class is no more difficult than it ever was, so
you don't need to bother learning or teaching them unless they're actually
needed. Some very complicated tasks that were previously only possible from C
will now be possible in pure Python, and to my mind that's all for the better.
I'm not going to attempt to cover every single corner case and small change that
were required to make the new features work. Instead this section will paint
only the broad strokes. See section :ref:`sect-rellinks`, "Related Links", for
further sources of information about Python 2.2's new object model.
Old and New Classes
-------------------
First, you should know that Python 2.2 really has two kinds of classes: classic
or old-style classes, and new-style classes. The old-style class model is
exactly the same as the class model in earlier versions of Python. All the new
features described in this section apply only to new-style classes. This
divergence isn't intended to last forever; eventually old-style classes will be
dropped, possibly in Python 3.0.
So how do you define a new-style class? You do it by subclassing an existing
new-style class. Most of Python's built-in types, such as integers, lists,
dictionaries, and even files, are new-style classes now. A new-style class
named :class:`object`, the base class for all built-in types, has also been
added so if no built-in type is suitable, you can just subclass
:class:`object`::
class C(object):
def __init__ (self):
...
...
This means that :keyword:`class` statements that don't have any base classes are
always classic classes in Python 2.2. (Actually you can also change this by
setting a module-level variable named :attr:`__metaclass__` --- see :pep:`253`
for the details --- but it's easier to just subclass :keyword:`object`.)
The type objects for the built-in types are available as built-ins, named using
a clever trick. Python has always had built-in functions named :func:`int`,
:func:`float`, and :func:`str`. In 2.2, they aren't functions any more, but
type objects that behave as factories when called. ::
>>> int
>>> int('123')
123
To make the set of types complete, new type objects such as :func:`dict` and
:func:`file` have been added. Here's a more interesting example, adding a
:meth:`lock` method to file objects::
class LockableFile(file):
def lock (self, operation, length=0, start=0, whence=0):
import fcntl
return fcntl.lockf(self.fileno(), operation,
length, start, whence)
The now-obsolete :mod:`posixfile` module contained a class that emulated all of
a file object's methods and also added a :meth:`lock` method, but this class
couldn't be passed to internal functions that expected a built-in file,
something which is possible with our new :class:`LockableFile`.
Descriptors
-----------
In previous versions of Python, there was no consistent way to discover what
attributes and methods were supported by an object. There were some informal
conventions, such as defining :attr:`__members__` and :attr:`__methods__`
attributes that were lists of names, but often the author of an extension type
or a class wouldn't bother to define them. You could fall back on inspecting
the :attr:`__dict__` of an object, but when class inheritance or an arbitrary
:meth:`__getattr__` hook were in use this could still be inaccurate.
The one big idea underlying the new class model is that an API for describing
the attributes of an object using :dfn:`descriptors` has been formalized.
Descriptors specify the value of an attribute, stating whether it's a method or
a field. With the descriptor API, static methods and class methods become
possible, as well as more exotic constructs.
Attribute descriptors are objects that live inside class objects, and have a few
attributes of their own:
* :attr:`__name__` is the attribute's name.
* :attr:`__doc__` is the attribute's docstring.
* ``__get__(object)`` is a method that retrieves the attribute value from
*object*.
* ``__set__(object, value)`` sets the attribute on *object* to *value*.
* ``__delete__(object, value)`` deletes the *value* attribute of *object*.
For example, when you write ``obj.x``, the steps that Python actually performs
are::
descriptor = obj.__class__.x
descriptor.__get__(obj)
For methods, :meth:`descriptor.__get__` returns a temporary object that's
callable, and wraps up the instance and the method to be called on it. This is
also why static methods and class methods are now possible; they have
descriptors that wrap up just the method, or the method and the class. As a
brief explanation of these new kinds of methods, static methods aren't passed
the instance, and therefore resemble regular functions. Class methods are
passed the class of the object, but not the object itself. Static and class
methods are defined like this::
class C(object):
def f(arg1, arg2):
...
f = staticmethod(f)
def g(cls, arg1, arg2):
...
g = classmethod(g)
The :func:`staticmethod` function takes the function :func:`f`, and returns it
wrapped up in a descriptor so it can be stored in the class object. You might
expect there to be special syntax for creating such methods (``def static f``,
``defstatic f()``, or something like that) but no such syntax has been defined
yet; that's been left for future versions of Python.
More new features, such as slots and properties, are also implemented as new
kinds of descriptors, and it's not difficult to write a descriptor class that
does something novel. For example, it would be possible to write a descriptor
class that made it possible to write Eiffel-style preconditions and
postconditions for a method. A class that used this feature might be defined
like this::
from eiffel import eiffelmethod
class C(object):
def f(self, arg1, arg2):
# The actual function
...
def pre_f(self):
# Check preconditions
...
def post_f(self):
# Check postconditions
...
f = eiffelmethod(f, pre_f, post_f)
Note that a person using the new :func:`eiffelmethod` doesn't have to understand
anything about descriptors. This is why I think the new features don't increase
the basic complexity of the language. There will be a few wizards who need to
know about it in order to write :func:`eiffelmethod` or the ZODB or whatever,
but most users will just write code on top of the resulting libraries and ignore
the implementation details.
Multiple Inheritance: The Diamond Rule
--------------------------------------
Multiple inheritance has also been made more useful through changing the rules
under which names are resolved. Consider this set of classes (diagram taken
from :pep:`253` by Guido van Rossum)::
class A:
^ ^ def save(self): ...
/ \
/ \
/ \
/ \
class B class C:
^ ^ def save(self): ...
\ /
\ /
\ /
\ /
class D
The lookup rule for classic classes is simple but not very smart; the base
classes are searched depth-first, going from left to right. A reference to
:meth:`D.save` will search the classes :class:`D`, :class:`B`, and then
:class:`A`, where :meth:`save` would be found and returned. :meth:`C.save`
would never be found at all. This is bad, because if :class:`C`'s :meth:`save`
method is saving some internal state specific to :class:`C`, not calling it will
result in that state never getting saved.
New-style classes follow a different algorithm that's a bit more complicated to
explain, but does the right thing in this situation. (Note that Python 2.3
changes this algorithm to one that produces the same results in most cases, but
produces more useful results for really complicated inheritance graphs.)
#. List all the base classes, following the classic lookup rule and include a
class multiple times if it's visited repeatedly. In the above example, the list
of visited classes is [:class:`D`, :class:`B`, :class:`A`, :class:`C`,
:class:`A`].
#. Scan the list for duplicated classes. If any are found, remove all but one
occurrence, leaving the *last* one in the list. In the above example, the list
becomes [:class:`D`, :class:`B`, :class:`C`, :class:`A`] after dropping
duplicates.
Following this rule, referring to :meth:`D.save` will return :meth:`C.save`,
which is the behaviour we're after. This lookup rule is the same as the one
followed by Common Lisp. A new built-in function, :func:`super`, provides a way
to get at a class's superclasses without having to reimplement Python's
algorithm. The most commonly used form will be ``super(class, obj)``, which
returns a bound superclass object (not the actual class object). This form
will be used in methods to call a method in the superclass; for example,
:class:`D`'s :meth:`save` method would look like this::
class D (B,C):
def save (self):
# Call superclass .save()
super(D, self).save()
# Save D's private information here
...
:func:`super` can also return unbound superclass objects when called as
``super(class)`` or ``super(class1, class2)``, but this probably won't
often be useful.
Attribute Access
----------------
A fair number of sophisticated Python classes define hooks for attribute access
using :meth:`__getattr__`; most commonly this is done for convenience, to make
code more readable by automatically mapping an attribute access such as
``obj.parent`` into a method call such as ``obj.get_parent``. Python 2.2 adds
some new ways of controlling attribute access.
First, ``__getattr__(attr_name)`` is still supported by new-style classes,
and nothing about it has changed. As before, it will be called when an attempt
is made to access ``obj.foo`` and no attribute named ``foo`` is found in the
instance's dictionary.
New-style classes also support a new method,
``__getattribute__(attr_name)``. The difference between the two methods is
that :meth:`__getattribute__` is *always* called whenever any attribute is
accessed, while the old :meth:`__getattr__` is only called if ``foo`` isn't
found in the instance's dictionary.
However, Python 2.2's support for :dfn:`properties` will often be a simpler way
to trap attribute references. Writing a :meth:`__getattr__` method is
complicated because to avoid recursion you can't use regular attribute accesses
inside them, and instead have to mess around with the contents of
:attr:`__dict__`. :meth:`__getattr__` methods also end up being called by Python
when it checks for other methods such as :meth:`__repr__` or :meth:`__coerce__`,
and so have to be written with this in mind. Finally, calling a function on
every attribute access results in a sizable performance loss.
:class:`property` is a new built-in type that packages up three functions that
get, set, or delete an attribute, and a docstring. For example, if you want to
define a :attr:`size` attribute that's computed, but also settable, you could
write::
class C(object):
def get_size (self):
result = ... computation ...
return result
def set_size (self, size):
... compute something based on the size
and set internal state appropriately ...
# Define a property. The 'delete this attribute'
# method is defined as None, so the attribute
# can't be deleted.
size = property(get_size, set_size,
None,
"Storage size of this instance")
That is certainly clearer and easier to write than a pair of
:meth:`__getattr__`/:meth:`__setattr__` methods that check for the :attr:`size`
attribute and handle it specially while retrieving all other attributes from the
instance's :attr:`__dict__`. Accesses to :attr:`size` are also the only ones
which have to perform the work of calling a function, so references to other
attributes run at their usual speed.
Finally, it's possible to constrain the list of attributes that can be
referenced on an object using the new :attr:`__slots__` class attribute. Python
objects are usually very dynamic; at any time it's possible to define a new
attribute on an instance by just doing ``obj.new_attr=1``. A new-style class
can define a class attribute named :attr:`__slots__` to limit the legal
attributes to a particular set of names. An example will make this clear::
>>> class C(object):
... __slots__ = ('template', 'name')
...
>>> obj = C()
>>> print obj.template
None
>>> obj.template = 'Test'
>>> print obj.template
Test
>>> obj.newattr = None
Traceback (most recent call last):
File "", line 1, in ?
AttributeError: 'C' object has no attribute 'newattr'
Note how you get an :exc:`AttributeError` on the attempt to assign to an
attribute not listed in :attr:`__slots__`.
.. _sect-rellinks:
Related Links
-------------
This section has just been a quick overview of the new features, giving enough
of an explanation to start you programming, but many details have been
simplified or ignored. Where should you go to get a more complete picture?
https://www.python.org/2.2/descrintro.html is a lengthy tutorial introduction to
the descriptor features, written by Guido van Rossum. If my description has
whetted your appetite, go read this tutorial next, because it goes into much
more detail about the new features while still remaining quite easy to read.
Next, there are two relevant PEPs, :pep:`252` and :pep:`253`. :pep:`252` is
titled "Making Types Look More Like Classes", and covers the descriptor API.
:pep:`253` is titled "Subtyping Built-in Types", and describes the changes to
type objects that make it possible to subtype built-in objects. :pep:`253` is
the more complicated PEP of the two, and at a few points the necessary
explanations of types and meta-types may cause your head to explode. Both PEPs
were written and implemented by Guido van Rossum, with substantial assistance
from the rest of the Zope Corp. team.
Finally, there's the ultimate authority: the source code. Most of the machinery
for the type handling is in :file:`Objects/typeobject.c`, but you should only
resort to it after all other avenues have been exhausted, including posting a
question to python-list or python-dev.
.. ======================================================================
PEP 234: Iterators
==================
Another significant addition to 2.2 is an iteration interface at both the C and
Python levels. Objects can define how they can be looped over by callers.
In Python versions up to 2.1, the usual way to make ``for item in obj`` work is
to define a :meth:`__getitem__` method that looks something like this::
def __getitem__(self, index):
return
:meth:`__getitem__` is more properly used to define an indexing operation on an
object so that you can write ``obj[5]`` to retrieve the sixth element. It's a
bit misleading when you're using this only to support :keyword:`for` loops.
Consider some file-like object that wants to be looped over; the *index*
parameter is essentially meaningless, as the class probably assumes that a
series of :meth:`__getitem__` calls will be made with *index* incrementing by
one each time. In other words, the presence of the :meth:`__getitem__` method
doesn't mean that using ``file[5]`` to randomly access the sixth element will
work, though it really should.
In Python 2.2, iteration can be implemented separately, and :meth:`__getitem__`
methods can be limited to classes that really do support random access. The
basic idea of iterators is simple. A new built-in function, ``iter(obj)``
or ``iter(C, sentinel)``, is used to get an iterator. ``iter(obj)`` returns
an iterator for the object *obj*, while ``iter(C, sentinel)`` returns an
iterator that will invoke the callable object *C* until it returns *sentinel* to
signal that the iterator is done.
Python classes can define an :meth:`__iter__` method, which should create and
return a new iterator for the object; if the object is its own iterator, this
method can just return ``self``. In particular, iterators will usually be their
own iterators. Extension types implemented in C can implement a :c:member:`~PyTypeObject.tp_iter`
function in order to return an iterator, and extension types that want to behave
as iterators can define a :c:member:`~PyTypeObject.tp_iternext` function.
So, after all this, what do iterators actually do? They have one required
method, :meth:`next`, which takes no arguments and returns the next value. When
there are no more values to be returned, calling :meth:`next` should raise the
:exc:`StopIteration` exception. ::
>>> L = [1,2,3]
>>> i = iter(L)
>>> print i
>>> i.next()
1
>>> i.next()
2
>>> i.next()
3
>>> i.next()
Traceback (most recent call last):
File "", line 1, in ?
StopIteration
>>>
In 2.2, Python's :keyword:`for` statement no longer expects a sequence; it
expects something for which :func:`iter` will return an iterator. For backward
compatibility and convenience, an iterator is automatically constructed for
sequences that don't implement :meth:`__iter__` or a :c:member:`~PyTypeObject.tp_iter` slot, so
``for i in [1,2,3]`` will still work. Wherever the Python interpreter loops
over a sequence, it's been changed to use the iterator protocol. This means you
can do things like this::
>>> L = [1,2,3]
>>> i = iter(L)
>>> a,b,c = i
>>> a,b,c
(1, 2, 3)
Iterator support has been added to some of Python's basic types. Calling
:func:`iter` on a dictionary will return an iterator which loops over its keys::
>>> m = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6,
... 'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}
>>> for key in m: print key, m[key]
...
Mar 3
Feb 2
Aug 8
Sep 9
May 5
Jun 6
Jul 7
Jan 1
Apr 4
Nov 11
Dec 12
Oct 10
That's just the default behaviour. If you want to iterate over keys, values, or
key/value pairs, you can explicitly call the :meth:`iterkeys`,
:meth:`itervalues`, or :meth:`iteritems` methods to get an appropriate iterator.
In a minor related change, the :keyword:`in` operator now works on dictionaries,
so ``key in dict`` is now equivalent to ``dict.has_key(key)``.
Files also provide an iterator, which calls the :meth:`readline` method until
there are no more lines in the file. This means you can now read each line of a
file using code like this::
for line in file:
# do something for each line
...
Note that you can only go forward in an iterator; there's no way to get the
previous element, reset the iterator, or make a copy of it. An iterator object
could provide such additional capabilities, but the iterator protocol only
requires a :meth:`next` method.
.. seealso::
:pep:`234` - Iterators
Written by Ka-Ping Yee and GvR; implemented by the Python Labs crew, mostly by
GvR and Tim Peters.
.. ======================================================================
PEP 255: Simple Generators
==========================
Generators are another new feature, one that interacts with the introduction of
iterators.
You're doubtless familiar with how function calls work in Python or C. When you
call a function, it gets a private namespace where its local variables are
created. When the function reaches a :keyword:`return` statement, the local
variables are destroyed and the resulting value is returned to the caller. A
later call to the same function will get a fresh new set of local variables.
But, what if the local variables weren't thrown away on exiting a function?
What if you could later resume the function where it left off? This is what
generators provide; they can be thought of as resumable functions.
Here's the simplest example of a generator function::
def generate_ints(N):
for i in range(N):
yield i
A new keyword, :keyword:`yield`, was introduced for generators. Any function
containing a :keyword:`yield` statement is a generator function; this is
detected by Python's bytecode compiler which compiles the function specially as
a result. Because a new keyword was introduced, generators must be explicitly
enabled in a module by including a ``from __future__ import generators``
statement near the top of the module's source code. In Python 2.3 this
statement will become unnecessary.
When you call a generator function, it doesn't return a single value; instead it
returns a generator object that supports the iterator protocol. On executing
the :keyword:`yield` statement, the generator outputs the value of ``i``,
similar to a :keyword:`return` statement. The big difference between
:keyword:`yield` and a :keyword:`return` statement is that on reaching a
:keyword:`yield` the generator's state of execution is suspended and local
variables are preserved. On the next call to the generator's ``next()`` method,
the function will resume executing immediately after the :keyword:`yield`
statement. (For complicated reasons, the :keyword:`yield` statement isn't
allowed inside the :keyword:`try` block of a :keyword:`try`...\
:keyword:`finally` statement; read :pep:`255` for a full explanation of the
interaction between :keyword:`yield` and exceptions.)
Here's a sample usage of the :func:`generate_ints` generator::
>>> gen = generate_ints(3)
>>> gen
>>> gen.next()
0
>>> gen.next()
1
>>> gen.next()
2
>>> gen.next()
Traceback (most recent call last):
File "", line 1, in ?
File "", line 2, in generate_ints
StopIteration
You could equally write ``for i in generate_ints(5)``, or ``a,b,c =
generate_ints(3)``.
Inside a generator function, the :keyword:`return` statement can only be used
without a value, and signals the end of the procession of values; afterwards the
generator cannot return any further values. :keyword:`return` with a value, such
as ``return 5``, is a syntax error inside a generator function. The end of the
generator's results can also be indicated by raising :exc:`StopIteration`
manually, or by just letting the flow of execution fall off the bottom of the
function.
You could achieve the effect of generators manually by writing your own class
and storing all the local variables of the generator as instance variables. For
example, returning a list of integers could be done by setting ``self.count`` to
0, and having the :meth:`next` method increment ``self.count`` and return it.
However, for a moderately complicated generator, writing a corresponding class
would be much messier. :file:`Lib/test/test_generators.py` contains a number of
more interesting examples. The simplest one implements an in-order traversal of
a tree using generators recursively. ::
# A recursive generator that generates Tree leaves in in-order.
def inorder(t):
if t:
for x in inorder(t.left):
yield x
yield t.label
for x in inorder(t.right):
yield x
Two other examples in :file:`Lib/test/test_generators.py` produce solutions for
the N-Queens problem (placing $N$ queens on an $NxN$ chess board so that no
queen threatens another) and the Knight's Tour (a route that takes a knight to
every square of an $NxN$ chessboard without visiting any square twice).
The idea of generators comes from other programming languages, especially Icon
(http://www.cs.arizona.edu/icon/), where the idea of generators is central. In
Icon, every expression and function call behaves like a generator. One example
from "An Overview of the Icon Programming Language" at
http://www.cs.arizona.edu/icon/docs/ipd266.htm gives an idea of what this looks
like::
sentence := "Store it in the neighboring harbor"
if (i := find("or", sentence)) > 5 then write(i)
In Icon the :func:`find` function returns the indexes at which the substring
"or" is found: 3, 23, 33. In the :keyword:`if` statement, ``i`` is first
assigned a value of 3, but 3 is less than 5, so the comparison fails, and Icon
retries it with the second value of 23. 23 is greater than 5, so the comparison
now succeeds, and the code prints the value 23 to the screen.
Python doesn't go nearly as far as Icon in adopting generators as a central
concept. Generators are considered a new part of the core Python language, but
learning or using them isn't compulsory; if they don't solve any problems that
you have, feel free to ignore them. One novel feature of Python's interface as
compared to Icon's is that a generator's state is represented as a concrete
object (the iterator) that can be passed around to other functions or stored in
a data structure.
.. seealso::
:pep:`255` - Simple Generators
Written by Neil Schemenauer, Tim Peters, Magnus Lie Hetland. Implemented mostly
by Neil Schemenauer and Tim Peters, with other fixes from the Python Labs crew.
.. ======================================================================
PEP 237: Unifying Long Integers and Integers
============================================
In recent versions, the distinction between regular integers, which are 32-bit
values on most machines, and long integers, which can be of arbitrary size, was
becoming an annoyance. For example, on platforms that support files larger than
``2**32`` bytes, the :meth:`tell` method of file objects has to return a long
integer. However, there were various bits of Python that expected plain integers
and would raise an error if a long integer was provided instead. For example,
in Python 1.5, only regular integers could be used as a slice index, and
``'abc'[1L:]`` would raise a :exc:`TypeError` exception with the message 'slice
index must be int'.
Python 2.2 will shift values from short to long integers as required. The 'L'
suffix is no longer needed to indicate a long integer literal, as now the
compiler will choose the appropriate type. (Using the 'L' suffix will be
discouraged in future 2.x versions of Python, triggering a warning in Python
2.4, and probably dropped in Python 3.0.) Many operations that used to raise an
:exc:`OverflowError` will now return a long integer as their result. For
example::
>>> 1234567890123
1234567890123L
>>> 2 ** 64
18446744073709551616L
In most cases, integers and long integers will now be treated identically. You
can still distinguish them with the :func:`type` built-in function, but that's
rarely needed.
.. seealso::
:pep:`237` - Unifying Long Integers and Integers
Written by Moshe Zadka and Guido van Rossum. Implemented mostly by Guido van
Rossum.
.. ======================================================================
PEP 238: Changing the Division Operator
=======================================
The most controversial change in Python 2.2 heralds the start of an effort to
fix an old design flaw that's been in Python from the beginning. Currently
Python's division operator, ``/``, behaves like C's division operator when
presented with two integer arguments: it returns an integer result that's
truncated down when there would be a fractional part. For example, ``3/2`` is
1, not 1.5, and ``(-1)/2`` is -1, not -0.5. This means that the results of
division can vary unexpectedly depending on the type of the two operands and
because Python is dynamically typed, it can be difficult to determine the
possible types of the operands.
(The controversy is over whether this is *really* a design flaw, and whether
it's worth breaking existing code to fix this. It's caused endless discussions
on python-dev, and in July 2001 erupted into an storm of acidly sarcastic
postings on :newsgroup:`comp.lang.python`. I won't argue for either side here
and will stick to describing what's implemented in 2.2. Read :pep:`238` for a
summary of arguments and counter-arguments.)
Because this change might break code, it's being introduced very gradually.
Python 2.2 begins the transition, but the switch won't be complete until Python
3.0.
First, I'll borrow some terminology from :pep:`238`. "True division" is the
division that most non-programmers are familiar with: 3/2 is 1.5, 1/4 is 0.25,
and so forth. "Floor division" is what Python's ``/`` operator currently does
when given integer operands; the result is the floor of the value returned by
true division. "Classic division" is the current mixed behaviour of ``/``; it
returns the result of floor division when the operands are integers, and returns
the result of true division when one of the operands is a floating-point number.
Here are the changes 2.2 introduces:
* A new operator, ``//``, is the floor division operator. (Yes, we know it looks
like C++'s comment symbol.) ``//`` *always* performs floor division no matter
what the types of its operands are, so ``1 // 2`` is 0 and ``1.0 // 2.0`` is
also 0.0.
``//`` is always available in Python 2.2; you don't need to enable it using a
``__future__`` statement.
* By including a ``from __future__ import division`` in a module, the ``/``
operator will be changed to return the result of true division, so ``1/2`` is
0.5. Without the ``__future__`` statement, ``/`` still means classic division.
The default meaning of ``/`` will not change until Python 3.0.
* Classes can define methods called :meth:`__truediv__` and :meth:`__floordiv__`
to overload the two division operators. At the C level, there are also slots in
the :c:type:`PyNumberMethods` structure so extension types can define the two
operators.
* Python 2.2 supports some command-line arguments for testing whether code will
works with the changed division semantics. Running python with :option:`-Q
warn` will cause a warning to be issued whenever division is applied to two
integers. You can use this to find code that's affected by the change and fix
it. By default, Python 2.2 will simply perform classic division without a
warning; the warning will be turned on by default in Python 2.3.
.. seealso::
:pep:`238` - Changing the Division Operator
Written by Moshe Zadka and Guido van Rossum. Implemented by Guido van Rossum..
.. ======================================================================
Unicode Changes
===============
Python's Unicode support has been enhanced a bit in 2.2. Unicode strings are
usually stored as UCS-2, as 16-bit unsigned integers. Python 2.2 can also be
compiled to use UCS-4, 32-bit unsigned integers, as its internal encoding by
supplying :option:`--enable-unicode=ucs4` to the configure script. (It's also
possible to specify :option:`--disable-unicode` to completely disable Unicode
support.)
When built to use UCS-4 (a "wide Python"), the interpreter can natively handle
Unicode characters from U+000000 to U+110000, so the range of legal values for
the :func:`unichr` function is expanded accordingly. Using an interpreter
compiled to use UCS-2 (a "narrow Python"), values greater than 65535 will still
cause :func:`unichr` to raise a :exc:`ValueError` exception. This is all
described in :pep:`261`, "Support for 'wide' Unicode characters"; consult it for
further details.
Another change is simpler to explain. Since their introduction, Unicode strings
have supported an :meth:`encode` method to convert the string to a selected
encoding such as UTF-8 or Latin-1. A symmetric ``decode([*encoding*])``
method has been added to 8-bit strings (though not to Unicode strings) in 2.2.
:meth:`decode` assumes that the string is in the specified encoding and decodes
it, returning whatever is returned by the codec.
Using this new feature, codecs have been added for tasks not directly related to
Unicode. For example, codecs have been added for uu-encoding, MIME's base64
encoding, and compression with the :mod:`zlib` module::
>>> s = """Here is a lengthy piece of redundant, overly verbose,
... and repetitive text.
... """
>>> data = s.encode('zlib')
>>> data
'x\x9c\r\xc9\xc1\r\x80 \x10\x04\xc0?Ul...'
>>> data.decode('zlib')
'Here is a lengthy piece of redundant, overly verbose,\nand repetitive text.\n'
>>> print s.encode('uu')
begin 666
M2&5R92!I=F5R8F]S92P*86YD(')E>> "sheesh".encode('rot-13')
'furrfu'
To convert a class instance to Unicode, a :meth:`__unicode__` method can be
defined by a class, analogous to :meth:`__str__`.
:meth:`encode`, :meth:`decode`, and :meth:`__unicode__` were implemented by
Marc-André Lemburg. The changes to support using UCS-4 internally were
implemented by Fredrik Lundh and Martin von Löwis.
.. seealso::
:pep:`261` - Support for 'wide' Unicode characters
Written by Paul Prescod.
.. ======================================================================
PEP 227: Nested Scopes
======================
In Python 2.1, statically nested scopes were added as an optional feature, to be
enabled by a ``from __future__ import nested_scopes`` directive. In 2.2 nested
scopes no longer need to be specially enabled, and are now always present. The
rest of this section is a copy of the description of nested scopes from my
"What's New in Python 2.1" document; if you read it when 2.1 came out, you can
skip the rest of this section.
The largest change introduced in Python 2.1, and made complete in 2.2, is to
Python's scoping rules. In Python 2.0, at any given time there are at most
three namespaces used to look up variable names: local, module-level, and the
built-in namespace. This often surprised people because it didn't match their
intuitive expectations. For example, a nested recursive function definition
doesn't work::
def f():
...
def g(value):
...
return g(value-1) + 1
...
The function :func:`g` will always raise a :exc:`NameError` exception, because
the binding of the name ``g`` isn't in either its local namespace or in the
module-level namespace. This isn't much of a problem in practice (how often do
you recursively define interior functions like this?), but this also made using
the :keyword:`lambda` statement clumsier, and this was a problem in practice.
In code which uses :keyword:`lambda` you can often find local variables being
copied by passing them as the default values of arguments. ::
def find(self, name):
"Return list of any entries equal to 'name'"
L = filter(lambda x, name=name: x == name,
self.list_attribute)
return L
The readability of Python code written in a strongly functional style suffers
greatly as a result.
The most significant change to Python 2.2 is that static scoping has been added
to the language to fix this problem. As a first effect, the ``name=name``
default argument is now unnecessary in the above example. Put simply, when a
given variable name is not assigned a value within a function (by an assignment,
or the :keyword:`def`, :keyword:`class`, or :keyword:`import` statements),
references to the variable will be looked up in the local namespace of the
enclosing scope. A more detailed explanation of the rules, and a dissection of
the implementation, can be found in the PEP.
This change may cause some compatibility problems for code where the same
variable name is used both at the module level and as a local variable within a
function that contains further function definitions. This seems rather unlikely
though, since such code would have been pretty confusing to read in the first
place.
One side effect of the change is that the ``from module import *`` and
``exec`` statements have been made illegal inside a function scope under
certain conditions. The Python reference manual has said all along that ``from
module import *`` is only legal at the top level of a module, but the CPython
interpreter has never enforced this before. As part of the implementation of
nested scopes, the compiler which turns Python source into bytecodes has to
generate different code to access variables in a containing scope. ``from
module import *`` and ``exec`` make it impossible for the compiler to
figure this out, because they add names to the local namespace that are
unknowable at compile time. Therefore, if a function contains function
definitions or :keyword:`lambda` expressions with free variables, the compiler
will flag this by raising a :exc:`SyntaxError` exception.
To make the preceding explanation a bit clearer, here's an example::
x = 1
def f():
# The next line is a syntax error
exec 'x=2'
def g():
return x
Line 4 containing the ``exec`` statement is a syntax error, since
``exec`` would define a new local variable named ``x`` whose value should
be accessed by :func:`g`.
This shouldn't be much of a limitation, since ``exec`` is rarely used in
most Python code (and when it is used, it's often a sign of a poor design
anyway).
.. seealso::
:pep:`227` - Statically Nested Scopes
Written and implemented by Jeremy Hylton.
.. ======================================================================
New and Improved Modules
========================
* The :mod:`xmlrpclib` module was contributed to the standard library by Fredrik
Lundh, providing support for writing XML-RPC clients. XML-RPC is a simple
remote procedure call protocol built on top of HTTP and XML. For example, the
following snippet retrieves a list of RSS channels from the O'Reilly Network,
and then lists the recent headlines for one channel::
import xmlrpclib
s = xmlrpclib.Server(
'http://www.oreillynet.com/meerkat/xml-rpc/server.php')
channels = s.meerkat.getChannels()
# channels is a list of dictionaries, like this:
# [{'id': 4, 'title': 'Freshmeat Daily News'}
# {'id': 190, 'title': '32Bits Online'},
# {'id': 4549, 'title': '3DGamers'}, ... ]
# Get the items for one channel
items = s.meerkat.getItems( {'channel': 4} )
# 'items' is another list of dictionaries, like this:
# [{'link': 'http://freshmeat.net/releases/52719/',
# 'description': 'A utility which converts HTML to XSL FO.',
# 'title': 'html2fo 0.3 (Default)'}, ... ]
The :mod:`SimpleXMLRPCServer` module makes it easy to create straightforward
XML-RPC servers. See http://www.xmlrpc.com/ for more information about XML-RPC.
* The new :mod:`hmac` module implements the HMAC algorithm described by
:rfc:`2104`. (Contributed by Gerhard Häring.)
* Several functions that originally returned lengthy tuples now return pseudo-
sequences that still behave like tuples but also have mnemonic attributes such
as memberst_mtime or :attr:`tm_year`. The enhanced functions include
:func:`stat`, :func:`fstat`, :func:`statvfs`, and :func:`fstatvfs` in the
:mod:`os` module, and :func:`localtime`, :func:`gmtime`, and :func:`strptime` in
the :mod:`time` module.
For example, to obtain a file's size using the old tuples, you'd end up writing
something like ``file_size = os.stat(filename)[stat.ST_SIZE]``, but now this can
be written more clearly as ``file_size = os.stat(filename).st_size``.
The original patch for this feature was contributed by Nick Mathewson.
* The Python profiler has been extensively reworked and various errors in its
output have been corrected. (Contributed by Fred L. Drake, Jr. and Tim Peters.)
* The :mod:`socket` module can be compiled to support IPv6; specify the
:option:`--enable-ipv6` option to Python's configure script. (Contributed by
Jun-ichiro "itojun" Hagino.)
* Two new format characters were added to the :mod:`struct` module for 64-bit
integers on platforms that support the C :c:type:`long long` type. ``q`` is for
a signed 64-bit integer, and ``Q`` is for an unsigned one. The value is
returned in Python's long integer type. (Contributed by Tim Peters.)
* In the interpreter's interactive mode, there's a new built-in function
:func:`help` that uses the :mod:`pydoc` module introduced in Python 2.1 to
provide interactive help. ``help(object)`` displays any available help text
about *object*. :func:`help` with no argument puts you in an online help
utility, where you can enter the names of functions, classes, or modules to read
their help text. (Contributed by Guido van Rossum, using Ka-Ping Yee's
:mod:`pydoc` module.)
* Various bugfixes and performance improvements have been made to the SRE engine
underlying the :mod:`re` module. For example, the :func:`re.sub` and
:func:`re.split` functions have been rewritten in C. Another contributed patch
speeds up certain Unicode character ranges by a factor of two, and a new
:meth:`finditer` method that returns an iterator over all the non-overlapping
matches in a given string. (SRE is maintained by Fredrik Lundh. The
BIGCHARSET patch was contributed by Martin von Löwis.)
* The :mod:`smtplib` module now supports :rfc:`2487`, "Secure SMTP over TLS", so
it's now possible to encrypt the SMTP traffic between a Python program and the
mail transport agent being handed a message. :mod:`smtplib` also supports SMTP
authentication. (Contributed by Gerhard Häring.)
* The :mod:`imaplib` module, maintained by Piers Lauder, has support for several
new extensions: the NAMESPACE extension defined in :rfc:`2342`, SORT, GETACL and
SETACL. (Contributed by Anthony Baxter and Michel Pelletier.)
* The :mod:`rfc822` module's parsing of email addresses is now compliant with
:rfc:`2822`, an update to :rfc:`822`. (The module's name is *not* going to be
changed to ``rfc2822``.) A new package, :mod:`email`, has also been added for
parsing and generating e-mail messages. (Contributed by Barry Warsaw, and
arising out of his work on Mailman.)
* The :mod:`difflib` module now contains a new :class:`Differ` class for
producing human-readable lists of changes (a "delta") between two sequences of
lines of text. There are also two generator functions, :func:`ndiff` and
:func:`restore`, which respectively return a delta from two sequences, or one of
the original sequences from a delta. (Grunt work contributed by David Goodger,
from ndiff.py code by Tim Peters who then did the generatorization.)
* New constants :const:`ascii_letters`, :const:`ascii_lowercase`, and
:const:`ascii_uppercase` were added to the :mod:`string` module. There were
several modules in the standard library that used :const:`string.letters` to
mean the ranges A-Za-z, but that assumption is incorrect when locales are in
use, because :const:`string.letters` varies depending on the set of legal
characters defined by the current locale. The buggy modules have all been fixed
to use :const:`ascii_letters` instead. (Reported by an unknown person; fixed by
Fred L. Drake, Jr.)
* The :mod:`mimetypes` module now makes it easier to use alternative MIME-type
databases by the addition of a :class:`MimeTypes` class, which takes a list of
filenames to be parsed. (Contributed by Fred L. Drake, Jr.)
* A :class:`Timer` class was added to the :mod:`threading` module that allows
scheduling an activity to happen at some future time. (Contributed by Itamar
Shtull-Trauring.)
.. ======================================================================
Interpreter Changes and Fixes
=============================
Some of the changes only affect people who deal with the Python interpreter at
the C level because they're writing Python extension modules, embedding the
interpreter, or just hacking on the interpreter itself. If you only write Python
code, none of the changes described here will affect you very much.
* Profiling and tracing functions can now be implemented in C, which can operate
at much higher speeds than Python-based functions and should reduce the overhead
of profiling and tracing. This will be of interest to authors of development
environments for Python. Two new C functions were added to Python's API,
:c:func:`PyEval_SetProfile` and :c:func:`PyEval_SetTrace`. The existing
:func:`sys.setprofile` and :func:`sys.settrace` functions still exist, and have
simply been changed to use the new C-level interface. (Contributed by Fred L.
Drake, Jr.)
* Another low-level API, primarily of interest to implementors of Python
debuggers and development tools, was added. :c:func:`PyInterpreterState_Head` and
:c:func:`PyInterpreterState_Next` let a caller walk through all the existing
interpreter objects; :c:func:`PyInterpreterState_ThreadHead` and
:c:func:`PyThreadState_Next` allow looping over all the thread states for a given
interpreter. (Contributed by David Beazley.)
* The C-level interface to the garbage collector has been changed to make it
easier to write extension types that support garbage collection and to debug
misuses of the functions. Various functions have slightly different semantics,
so a bunch of functions had to be renamed. Extensions that use the old API will
still compile but will *not* participate in garbage collection, so updating them
for 2.2 should be considered fairly high priority.
To upgrade an extension module to the new API, perform the following steps:
* Rename :c:func:`Py_TPFLAGS_GC` to :c:func:`PyTPFLAGS_HAVE_GC`.
* Use :c:func:`PyObject_GC_New` or :c:func:`PyObject_GC_NewVar` to allocate
objects, and :c:func:`PyObject_GC_Del` to deallocate them.
* Rename :c:func:`PyObject_GC_Init` to :c:func:`PyObject_GC_Track` and
:c:func:`PyObject_GC_Fini` to :c:func:`PyObject_GC_UnTrack`.
* Remove :c:func:`PyGC_HEAD_SIZE` from object size calculations.
* Remove calls to :c:func:`PyObject_AS_GC` and :c:func:`PyObject_FROM_GC`.
* A new ``et`` format sequence was added to :c:func:`PyArg_ParseTuple`; ``et``
takes both a parameter and an encoding name, and converts the parameter to the
given encoding if the parameter turns out to be a Unicode string, or leaves it
alone if it's an 8-bit string, assuming it to already be in the desired
encoding. This differs from the ``es`` format character, which assumes that
8-bit strings are in Python's default ASCII encoding and converts them to the
specified new encoding. (Contributed by M.-A. Lemburg, and used for the MBCS
support on Windows described in the following section.)
* A different argument parsing function, :c:func:`PyArg_UnpackTuple`, has been
added that's simpler and presumably faster. Instead of specifying a format
string, the caller simply gives the minimum and maximum number of arguments
expected, and a set of pointers to :c:type:`PyObject\*` variables that will be
filled in with argument values.
* Two new flags :const:`METH_NOARGS` and :const:`METH_O` are available in method
definition tables to simplify implementation of methods with no arguments or a
single untyped argument. Calling such methods is more efficient than calling a
corresponding method that uses :const:`METH_VARARGS`. Also, the old
:const:`METH_OLDARGS` style of writing C methods is now officially deprecated.
* Two new wrapper functions, :c:func:`PyOS_snprintf` and :c:func:`PyOS_vsnprintf`
were added to provide cross-platform implementations for the relatively new
:c:func:`snprintf` and :c:func:`vsnprintf` C lib APIs. In contrast to the standard
:c:func:`sprintf` and :c:func:`vsprintf` functions, the Python versions check the
bounds of the buffer used to protect against buffer overruns. (Contributed by
M.-A. Lemburg.)
* The :c:func:`_PyTuple_Resize` function has lost an unused parameter, so now it
takes 2 parameters instead of 3. The third argument was never used, and can
simply be discarded when porting code from earlier versions to Python 2.2.
.. ======================================================================
Other Changes and Fixes
=======================
As usual there were a bunch of other improvements and bugfixes scattered
throughout the source tree. A search through the CVS change logs finds there
were 527 patches applied and 683 bugs fixed between Python 2.1 and 2.2; 2.2.1
applied 139 patches and fixed 143 bugs; 2.2.2 applied 106 patches and fixed 82
bugs. These figures are likely to be underestimates.
Some of the more notable changes are:
* The code for the MacOS port for Python, maintained by Jack Jansen, is now kept
in the main Python CVS tree, and many changes have been made to support MacOS X.
The most significant change is the ability to build Python as a framework,
enabled by supplying the :option:`--enable-framework` option to the configure
script when compiling Python. According to Jack Jansen, "This installs a self-
contained Python installation plus the OS X framework "glue" into
:file:`/Library/Frameworks/Python.framework` (or another location of choice).
For now there is little immediate added benefit to this (actually, there is the
disadvantage that you have to change your PATH to be able to find Python), but
it is the basis for creating a full-blown Python application, porting the
MacPython IDE, possibly using Python as a standard OSA scripting language and
much more."
Most of the MacPython toolbox modules, which interface to MacOS APIs such as
windowing, QuickTime, scripting, etc. have been ported to OS X, but they've been
left commented out in :file:`setup.py`. People who want to experiment with
these modules can uncomment them manually.
.. Jack's original comments:
The main change is the possibility to build Python as a
framework. This installs a self-contained Python installation plus the
OSX framework "glue" into /Library/Frameworks/Python.framework (or
another location of choice). For now there is little immedeate added
benefit to this (actually, there is the disadvantage that you have to
change your PATH to be able to find Python), but it is the basis for
creating a fullblown Python application, porting the MacPython IDE,
possibly using Python as a standard OSA scripting language and much
more. You enable this with "configure --enable-framework".
The other change is that most MacPython toolbox modules, which
interface to all the MacOS APIs such as windowing, quicktime,
scripting, etc. have been ported. Again, most of these are not of
immedeate use, as they need a full application to be really useful, so
they have been commented out in setup.py. People wanting to experiment
can uncomment them. Gestalt and Internet Config modules are enabled by
default.
* Keyword arguments passed to built-in functions that don't take them now cause a
:exc:`TypeError` exception to be raised, with the message "*function* takes no
keyword arguments".
* Weak references, added in Python 2.1 as an extension module, are now part of
the core because they're used in the implementation of new-style classes. The
:exc:`ReferenceError` exception has therefore moved from the :mod:`weakref`
module to become a built-in exception.
* A new script, :file:`Tools/scripts/cleanfuture.py` by Tim Peters,
automatically removes obsolete ``__future__`` statements from Python source
code.
* An additional *flags* argument has been added to the built-in function
:func:`compile`, so the behaviour of ``__future__`` statements can now be
correctly observed in simulated shells, such as those presented by IDLE and
other development environments. This is described in :pep:`264`. (Contributed
by Michael Hudson.)
* The new license introduced with Python 1.6 wasn't GPL-compatible. This is
fixed by some minor textual changes to the 2.2 license, so it's now legal to
embed Python inside a GPLed program again. Note that Python itself is not
GPLed, but instead is under a license that's essentially equivalent to the BSD
license, same as it always was. The license changes were also applied to the
Python 2.0.1 and 2.1.1 releases.
* When presented with a Unicode filename on Windows, Python will now convert it
to an MBCS encoded string, as used by the Microsoft file APIs. As MBCS is
explicitly used by the file APIs, Python's choice of ASCII as the default
encoding turns out to be an annoyance. On Unix, the locale's character set is
used if ``locale.nl_langinfo(CODESET)`` is available. (Windows support was
contributed by Mark Hammond with assistance from Marc-André Lemburg. Unix
support was added by Martin von Löwis.)
* Large file support is now enabled on Windows. (Contributed by Tim Peters.)
* The :file:`Tools/scripts/ftpmirror.py` script now parses a :file:`.netrc`
file, if you have one. (Contributed by Mike Romberg.)
* Some features of the object returned by the :func:`xrange` function are now
deprecated, and trigger warnings when they're accessed; they'll disappear in
Python 2.3. :class:`xrange` objects tried to pretend they were full sequence
types by supporting slicing, sequence multiplication, and the :keyword:`in`
operator, but these features were rarely used and therefore buggy. The
:meth:`tolist` method and the :attr:`start`, :attr:`stop`, and :attr:`step`
attributes are also being deprecated. At the C level, the fourth argument to
the :c:func:`PyRange_New` function, ``repeat``, has also been deprecated.
* There were a bunch of patches to the dictionary implementation, mostly to fix
potential core dumps if a dictionary contains objects that sneakily changed
their hash value, or mutated the dictionary they were contained in. For a while
python-dev fell into a gentle rhythm of Michael Hudson finding a case that
dumped core, Tim Peters fixing the bug, Michael finding another case, and round
and round it went.
* On Windows, Python can now be compiled with Borland C thanks to a number of
patches contributed by Stephen Hansen, though the result isn't fully functional
yet. (But this *is* progress...)
* Another Windows enhancement: Wise Solutions generously offered PythonLabs use
of their InstallerMaster 8.1 system. Earlier PythonLabs Windows installers used
Wise 5.0a, which was beginning to show its age. (Packaged up by Tim Peters.)
* Files ending in ``.pyw`` can now be imported on Windows. ``.pyw`` is a
Windows-only thing, used to indicate that a script needs to be run using
PYTHONW.EXE instead of PYTHON.EXE in order to prevent a DOS console from popping
up to display the output. This patch makes it possible to import such scripts,
in case they're also usable as modules. (Implemented by David Bolen.)
* On platforms where Python uses the C :c:func:`dlopen` function to load
extension modules, it's now possible to set the flags used by :c:func:`dlopen`
using the :func:`sys.getdlopenflags` and :func:`sys.setdlopenflags` functions.
(Contributed by Bram Stolk.)
* The :func:`pow` built-in function no longer supports 3 arguments when
floating-point numbers are supplied. ``pow(x, y, z)`` returns ``(x**y) % z``,
but this is never useful for floating point numbers, and the final result varies
unpredictably depending on the platform. A call such as ``pow(2.0, 8.0, 7.0)``
will now raise a :exc:`TypeError` exception.
.. ======================================================================
Acknowledgements
================
The author would like to thank the following people for offering suggestions,
corrections and assistance with various drafts of this article: Fred Bremmer,
Keith Briggs, Andrew Dalke, Fred L. Drake, Jr., Carel Fellinger, David Goodger,
Mark Hammond, Stephen Hansen, Michael Hudson, Jack Jansen, Marc-André Lemburg,
Martin von Löwis, Fredrik Lundh, Michael McLay, Nick Mathewson, Paul Moore,
Gustavo Niemeyer, Don O'Donnell, Joonas Paalasma, Tim Peters, Jens Quade, Tom
Reinhardt, Neil Schemenauer, Guido van Rossum, Greg Ward, Edward Welbourne.