Summary
I found this in my drafts, dated Feb 6 2005. I 'll just push it out now unedited. Original summary:
I thought it was clear that we should add interfaces to Python, but Phillip Eby reminded me that years ago I rejected them in favor of Abstract Base Classes (ABCs). Why? I don't remember! Which do you prefer?

I can't for the life of me remember why I would prefer ABCs over
interfaces! And even if I did remember, I believe I have changed my
mind since then.

The only argument that comes to mind is that ABCs don't require more
syntax. That's usually a strong argument in Python. But it seems
that at least two of the largest 3rd party projects in Python (Zope
and Twisted) have already decided that they can't live without
interfaces, and have created their own implementation. From this I
conclude that there's a real need.

But if they can do it themselves (albeit with some heavy-duty
metaclass magic), doesn't that prove we don't have to add them to the
language? Not at all. The mechanisms used are ideosyncratic,
fragile, and cumbersome. For example, you have to use __xxx__ words
to declare conformance to an interface. Plus, there's duplicate work
and interoperability. IMO this is language infrastructure that should
be provided by the language implementation.

Many followups to my blogs about adding optional type checking to
Python (given the latest design I'm leaving off the word "static" :-)
said that interfaces were more important than type checking.
Personally, I think they're interconnected: interfaces make much more
sense if you can also declare the argument types of the methods, and
argument type declarations in Python are unthinkable without a way to
spell duck types -- for which interfaces are an excellent approach.
Phillip Eby's Monkey Typing proposal is an interface-free alternative,
but I find it way too complex to be adopted as a standard Python
mechanism.

A few more arguments against ABCs: they seem the antithesis of duck
typing. Using ABCs for type declarations suggests that isinstance()
is used for type checking, and even if reality is not quite that
rigid, this suggestion would be leading people into the wrong
direction.

ABCs also allow, nay, encourage, "partially abstract" classes --
classes that have some abstract methods and some concrete ones. Of
course, such a class as a whole is still abstract, but the resulting
mixture of implementation and interface complexifies the semantics.

It has been suggested, especially by Ping, that there is a need to
specify some semantics in interfaces. A typical example involves a
file, where various operations such as readline() and readlines(), can
be implemented by default in terms of a more primitive operation -- in
this case read(). Unfortunately, I believe that this approach is not
very practical, since those default implementations are usually
inefficient, and the choice of the "most primitive" operation is often
dependent on the situation. I also suspect that outside some common
standard types there aren't all that many uses for this pattern. But
if you have to do this, a mix-in class for the "default" functionality
separate from the interface would work just as well as combining the
two.

Specifically because they are not classes, interfaces allow for clear,
distinct semantics. (That is, the semantics of interface objects; I
intend for interfaces to be neutral on the issue of the semantics of
the objects they describe.) For example, (not that I necessarily
propose this, but this is one way that we could decide to go), in a
class claiming to implement a particular method declared in an
interface, it could be flagged as an error if the actual
implementation required more arguments than declared in the interface,
or (assuming we can have type declarations in interfaces as well as in
classes) if the argument types didn't match.

Python has a strong tradition that subclasses may redefine methods
with a different signature, and making that an error goes against the
grain of the language. But the explicit use of an interface changes
things and there is seems appropriate that a class should not be
allowed to violate an interface it claims to implement.

So, while I haven't decided that Python 3.0 will have interfaces (or
type declarations), I'd like to go ahead and hypothesize about such a
future, and look at some of the standard interfaces the language would
provide for various common protocols like sequence, mapping and file.

Let's start with files, because they don't require genericity to fully
specify the interface. Here's my first attempt. Note that I'm
simplifying a few things; I'd like to drop the optional argument to
readline() and readlines(), and I'm dropping the obsolete API
xreadlines():

interface file:
def readline() -> str:
"Returns the next line; returns '' on EOF"
def next() -> str:
"""Returns the next line; raises StopIteration on EOF.
next() has one special property: due to internal buffering,
mixing it with other operations is not guaranteed to work
properly unless seek() is called first.
"""
def __iter__() -> file:
"Returns self"
def read(n: int = -1) -> str:
"""Reads n bytes; if n < 0, reads until EOF.
This blocks rather than returning a short read unless EOF is
reached; however not all implementations honor this
property.
(Did you know the default argument was -1?)
"""
def readlines() -> list[str]:
"Returns a list containing the remaining lines"
def seek(pos: int, how: int = 0) -> None:
"""Sets the position for the next read/write.
The 'how' argument should be 0 for positioning relative to
the start of the file, 1 for positioning relative to the
current position, and 2 for positioning relative to the end
of the file.
"""
def tell() -> int:
"Returns the current read/write position"
def write(s: str) -> None:
"Writes a string"
def writelines(s: list[str]) -> None:
"""Writes a list of strings.
Note: this does not add newlines; the strings in the list
are supposed to end in a newline.
"""
def truncate(size: int = ...) -> None:
"""Truncates the file to the given size.
Default is the current position.
"""
def flush() -> None:
"Flushes buffered data to the operating system."
def close():
"Closes the file, rendering subsequent use invalid"
def fileno() -> int:
"""Returns the underlying 'Unix file descriptor'.
Not all file implementations may support this, and the
semantics are not always the same.
"""
def isatty() -> bool:
"Returns whether this is an interactive device"
# Attributes
softspace: int # read-write attribute used by 'print'
# The following are read-only and not always supported
mode: str # mode given to open
name: str # file name given to open
encoding: str|None # file encoding
closed: bool # whether the file is closed
newlines: None|str|tuple(str) # observed newline convention

This brings up a number of interesting issues already:

The file interface does not include standard object methods and
attributes such as __repr__() and __class__. But it does
include __iter__() since this is not supported by all objects.

Moreover __iter__() is defined different than the "generic"
definition of __iter__() would be: since we know that a file's
__iter__ method returns the file itself, we know its type. This
is in general the case for standard APIs that are explicitly
part of a specific interface; we'll see this again for
__getitem__ later.

The argument to writelines() and the return value from
readlines() are lists of strings. I really want to be able to
express that in the interface definition. Even in Pascal,
which has such a nice simple type system, you can say this!
I'm using list[str], following the notation I used in an earlier
blog where I was brainstorming about generic types.

Rather than distinguishing between None and its type, for
conciseness I'm using the singleton value as its own type.
While type(None) is not None and never will be, in type
expressions, None stands for type(None).

In a few places, a value may be either a string or None; or
either an int or None. I'm using the notation str|None
respectively int|None for this, which also debuted in an earlier
blog.

The argument to truncate() has a dynamic default. I'm proposing
the notation ... for this; I don't want to say -1 because
passing int -1 doesn't have the same effect, unlike for read().
The semantics of this notation are that an implementation may
choose its own default but that it must provide one.

There's the thorny issue of some APIs that aren't always
defined. I'm not going to introduce a notation for this yet;
rather, I'll just say it in a comment. The default type
checking algorithm will accept partial implementations of an
interface.

The file interface has a few attributes, one of which
(softspace) is writable. This must be supported or else the
print statement won't work righty when directed to such a file.
For now I'm using the notation:

name: type

and indicating the read-write-ness in a comment. The notation
is less than ideal because it doesn't allow to attach a doc
string. I could use this:

name: type "docstring"

but I fear that Python's parser isn't smart enough to always
know where the type expression ends and the docstring begins
(since 'type' can be an expression, syntactically).

Note that softspace is conceptually a bool, but implemented as
an int, and that's how it's declared here.

The return type of close() is problematic. Usually it is None,
but for file objects returned by os.popen() is is an int. I've
chosen to leave out the '-> None' notation on the close()
method, leaving its return type unspecified. I could also have
written '-> int|None'. Or we could have a rule that allows
a method that is declared to return None to return a different
type, perhaps after subclassing.

It would be lovely to be able to declare exceptions, even if we
don't assign any semantics to this (Java checked exceptions have
turned out to be a horrible thing in practice). But I'm leaving
this to a future brainstorm.

What about argument names and keyword parameters? In the above
example, I don't intend to allow keyword parameters on any of
the interfaces. But what if an interface wants to define
keyword parameters? What if you want to require certain
parameters to be given as keyword parameters (and you still want
to declare their types)? Maybe we need a notation to explicitly
say that an argument can or must be a keyword parameter? Or
maybe it would be sufficient to allow leaving out the parameter
name if it is supposed to be always positional? Then the
declaration of read() would become:

Here's my attempt at defining a generic sequence interface. Note that
I'm declaring this as a generic type, with 'T' being the type
parameter. This despite my earlier promise not to bother with generic
types. I think they are both useful and easy to implement, even if
there are some thorny issues left: a dynamic check for list[int] is
very expensive (it has to check every item in the list for int-ness)
and any mutation of the list might change its type:

The syntax for declaring a generic interface (interface X[T])
requires a bit of a leap of faith. But without parameterization
we can say so much less about a sequence than what is common
knowledge (and what a type inferencer should know) that I find
it nearly useless to bother defining a sequence type without
this notation. Possibly an implementation that ignores the type
parameter T would be acceptable; use of T would be purely for
the benefit of the human reader.

I had to introduce two auxiliary interfaces:

iterator, something with primarily a next() method

iterable: something with primarily an __iter__() method,
although something implementing __getitem__() will also work.
That makes its declaration a bit awkward (with both methods
being optional but at least one being required).

I struggled a bit with the two possible signatures for
__getitem__ and friends: it is normally called with an int
argument, returning a single item, but the extended slice
notation (e.g. seq[1:2:3]) calls it with an argument that is a
slice object, and then it returns a sequence. Declaring the
argument and return types as unions feels unsatisfactory because
it throws away information. I decided to use the @overloaded
decorator, which can be implemented using a small amount of
namespace hacking.

Should we have separate interfaces for immutable and mutable
sequences? For now I'd rather only have one; the notion that an
implementation may leave out methods naturally allows for
immutable sequences.

Should the sequence interface mention virtually everything that
a list can do, or should it be minimal?

Even if the sequence interface is inclusive (containing most
list methods), I'd like to leave sort() out of it; sort() is
really unique to the list type, and even if some user type
defines a sort() method, it's unlikely to have the same
signature as list.sort() (especially after what we did to this
signature in Python 2.4). Feel free to prove me wrong.

In current Python, the + operator on standard sequence types
only accepts a right operand of the same type (list, tuple or
str). But the += operator on a list accepts any iterable! I
think + on any two sequences, or even a sequence and an
iterable, in either order, should be allowed, and should return
a new sequence. However, iterable + iterable should be left
undefined; this is because iterator + iterator is not defined,
and I think it should not be.

RSS Feed

About the Blogger

Guido van Rossum is the creator of Python, one of the major
programming languages on and off the web. The Python community refers to him as the BDFL (Benevolent Dictator For Life), a title straight
from a Monty Python skit. He moved from the Netherlands to the USA in
1995, where he met his wife. Until July 2003 they lived in the
northern Virginia suburbs of Washington, DC with their son Orlijn, who
was born in 2001. They then moved to Silicon Valley where Guido now works for Google
(spending 50% of his time on Python!).