Thursday, December 09, 2004

Python Interfaces are not Java Interfaces

My "Java is not Python, either" article seems to have raised a few hackles. In retrospect, I realize that the fault is my own. Although I myself said in the article that "frameworks like Zope, Twisted, and PEAK all have interfaces, but since they're not part of the language or standard library, for most Python developers it's as if they don't exist," I then proceeded to write about them as if everyone would know what I was talking about. Duh! So, let's take a look at the subject that I should have covered first, about what interfaces are used for in Python today, and how their use differs from interface usage in Java.

First of all, in Python, interfaces are much more dynamic than their counterparts in Java. For example, Java doesn't let you change your mind on the fly about what interface a class or an instance supports, but Python does. Java requires you to write your interface in the form of a class, while in Python you can use a class, an object, generate it on the fly, pretty much whatever you want.

In Java, if you don't get the interface or implementation just so, your code doesn't compile. In Python, as long as your code still "quacks like a duck", it's still a duck. (Unless you pass it to a function that needs to introspect or adapt the interface; more on this later.)

So, if you're accustomed to Java interfaces, you may at this point be wondering, if Python interfaces are so darn optional and don't do hardly any of the things Java interfaces do, then what the heck do we need them for?

I'm so glad you asked that question. (Okay, I'm so glad I put that question into your mouth. Whatever.) There are three things that interfaces are useful for in Python, in roughly descending order of importance:

Documentation

Adaptation

Introspection (A very bad idea, in my opinion, but hey, consenting adults and all that.)

Now, the next thought that I'm going to pretend you're having is, "Why do I need interfaces for documentation? Why not just write documentation?". Well, that depends a lot on who and what the documentation is for.

If you're writing a library or a script, all someone needs to know is how to use it. They're going to be calling your code; you don't call theirs. They aren't modifying or extending your code, and if they do, they're on their own. So if they break it, screw 'em. They really don't need the formality of interfaces.

But, if you're creating a framework, the situation is different. A framework is a library that calls the framework user's code. Now, somebody will be writing code that -- in effect -- is part of your library, and they need to know things about how their code will be called. With a library, it's okay if the author or maintainers are the only ones who know the internal dependencies of their code. But with a framework, the users are effectively co-authors and co-maintainers. They need more concrete information.

For example, consider PEP 234, the iterator protocol. In the context of iteration, we may view Python itself as the "framework", which will be calling the user's code to perform iteration. Let's look at what the iterator protocol looks like, as a pair of Python interfaces, using the currently popular idioms for Python interfaces:

class IIterable(Interface):

"""An object that can be looped over with 'for'"""

def __iter__(): """Return an IIterator for this iterable/container, or return self if this is an iterator"""

class IIterator(IIterable):

"""Iterator representing the current position of a 'for' loop"""

def next(): """Return the next item, or raise StopIteration if finished"""

If you compare with PEP 234, this is roughly the same amount of text as is used there to describe the methods expected of Python iterators. But, it has some advantages that the original text does not. For one thing, it's closer in shape to an actual implementation; you can cut and paste from it in order to begin implementing your methods, rather than having to translate "takes no arguments" into a method signature. Textual documentation can refer to "IIterable" or "IIterator" objects, instead of talking about "protocol 1" and "protocol 2" (as PEP 234 does), or coming up with other ways to specify what's being talked about. You can easily skim to the definition of a specific method. It's clear that one kind of thing is a special case of the other. And so on.

Now, the point of this is not to say that PEP 234 should have had interfaces in it. That would be a silly thing to say. However, there were times when I was tempted to use interfaces in PEP 333, and then backed off because of a simple issue: by the time I got done explaining the conventions for specifying interfaces, it would have taken more writing than just putting up with using less precise (and concise) ways of specifying things. So, right now, it doesn't make sense for PEPs to include interfaces, because they're foreign to most of the intended audience. For any given PEP author to attempt it would kill the PEP, because the author would spend more time answering questions about interface definition than about the contents of the PEP.

Which reminds me, I didn't explain those conventions, although most of them can be reverse-engineered from the example I gave. First off, common practice among the interface-using Python community is to name interfaces with a leading 'I'. This drives some people up the wall, but most swear by it, because it makes it unambiguosly clear whether something is an interface. And for documentation and discussion, unambiguous clarity is a very good thing to have.

Second, by convention, the methods in an interface do not include 'self'. They reflect the signature of the method from the caller's point of view, and you do not pass 'self' to methods. This is something of a pain when copying and pasting to implement them, but then it's also easier to cut and paste when you're writing code that calls the methods. So, either way somebody's got to either add or delete the 'self', and current convention is to make the implementer add 'self' when implementing.

Third, inheritance between interfaces is used to denote that the inheriting interface requires all of the same behavior that the base interface does, but may provide additional capabilities or loosen restrictions on its callers. For example, a method overridden in a subclass interface may take additional arguments, as long as they are optional. Thus, the call signature of the base interface is still usable with the derived interface. If an interface has any calling signatures that are incompatible with those of its base, the interface should not be derived from the base interface. Instead, the common methods should be factored out into a common base, or duplicated in a new interface.

Interfaces can also specify attributes that are expected to appear on an object. One easy way to do this is just to include something like:

anAttribute = property(doc="""Describe the attribute here""")

in the interface definition. Both PyProtocols and zope.interface also offer Attribute objects you can use instead.

Finally, functions or other callable objects are described using the __call__ method. For example, a quick sketch of a PEP 333 application object would look like this:

The server will call the application with an IEnviron (the 'environ' argument) and an IHeaderOutput (the 'start_response' parameter). The application should first invoke 'start_response' to set the status and headers, then return an iterable. This can also be implemented as a generator, in which case 'start_response' should be called before the first 'yield' statement is executed."""

This is a lot shorter than the detailed explanations in PEP 333 about callables that call callables, but it means the same thing, since this calling signature may be implemented by a class, function, method, or any callable object.

Now, while it's not a substitute for an example, the interface above provides a compact overview of the concept of a WSGI "application object". Also, notice how I can refer to other interfaces as I describe the parameters, which allows me to have absolute precision in the specification, but without having to immediately explain what those other concepts are. You get a general sense of them as some kind of environment and header output, but you also know you can skim ahead looking for 'class IEnviron' or 'class IHeaderOutput' if you want the details.

This is the sort of precision and flexibility in specifying that I missed dearly when writing PEP 333. But I really didn't want to cloud issues that were often contentious to begin with by bringing interfaces into the mix. I especially didn't want people to possibly turn their backs on the PEP just because "interfaces are bad and shouldn't be in Python."

But, when I talk about Python needing interfaces, this is what I'm mainly talking about: conventions for documentation. Especially documentation that supports specifications that require interoperability, like PEP 234 or PEP 333. Over time, I think we'll see an increasing number of projects in Python that will need to be able to talk about what their interfaces are. It would certainly be nice if we could all talk about them in the same way.

(P.S. I didn't forget about adaptation and introspection; I just don't have time to cover them in today's post. Look for a follow-up at a later date.)