Summary
Optional static typing has long been requested as a Python feature. It's been studied in depth before (e.g. on the type-sig) but has proven too hard for even a PEP to appear. In this post I'm putting together my latest thoughts on some issues, without necessarily hoping to solve all problems.

Advertisement

An email exchange with Neal Norwitz that started out as an inquiry
about the opening of a stock account for the PSF (talk about bizarre
conversation twists) ended up jogging my thoughts about optional
static typing for Python.

[As an experiment, I'm going to post this to Artima without mentioning
it anywhere else. If RSS works, it should show up on various other
blogs within days.]

The Python compiler doesn't know about the types of the objects you
pass around in a Python program; only the run-time (the Virtual
Machine) does. This makes Python expressive and flexible, but
sometimes means that bugs of a rather trivial kind (like typos in
method names) are found later than the developer would have
liked. Without losing the benefits of Python's dynamic typing, it
would be nice if you had the option to add type declarations for your
method arguments, variables and so on, and then the compiler would
give you a warning if you did something that wasn't possible given
what the compiler knows about those types. While there are third-party
program checkers that find a lot of problems without type
declarations, e.g. pychecker, it would be nice if (a) this
capability was built into the Python compiler and (b) you could give
it hints in cases where its type inference broke down.

Let's look at a simple function:

def gcd(a, b):
    while a:
        a, b = b%a, a
    return b

This pretty much only makes sense with integer arguments, but the
compiler won't stop you if you call it with string or floating point
arguments. Purely based on the type system, those types are fine: the
% operator on two strings does string formatting (e.g. "(%s)" %
"foobar" gives "(foobar)"), and Python happens to define % on floats
as well (3.7 % 0.5 gives roughly 0.2). But with string arguments the
function is likely to raise a TypeError (gcd("", "%s") notwithstanding)
and float arguments often cause bogus results due to rounding errors.
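Concretely (these particular calls are illustrative, not from the text above):

```python
def gcd(a, b):
    while a:
        a, b = b % a, a
    return b

print(gcd(12, 8))      # 4: works as intended with ints
print(gcd("", "%s"))   # "%s": the degenerate string case that happens to work
# gcd("x", "y") raises TypeError: "y" % "x" has no conversion specifier
# gcd(0.2, 3.7) runs, but rounding errors make the float result untrustworthy
```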

So let's consider a simple type annotation for this function:

def gcd(a: int, b: int) -> int:
    while a:
        a, b = b%a, a
    return b

I've considered various ways of adding argument types, and I've come
to like this notation best. I couldn't use a colon to indicate the
return type, because it would be too ambiguous to distinguish between
these two:

def foo(): int:
def foo(): print

(Yes, that's in part because Python's parser generator is so lame, but
that in turn is intentional -- it is so lame to prevent me from
inventing syntax that is either hard to write a parser for or hard to
disambiguate by human readers, who always come first in Python's
design.)

It would be a shame if declaring gcd() as taking int arguments would
mean that gcd(123456789012345, 23456789012355) would no longer work
(those are longs, not ints). This particular case can be solved by
using inheritance (let int derive from long, or vice versa), but there
are others that aren't solved so easily. For example:

What if a cStringIO instance were passed? There are technical reasons
why cStringIO can't or shouldn't inherit from StringIO, but more
importantly, requiring inheritance defeats Python's reliance on duck
typing.

Possible solution?: the compiler derives an inherited interface
(let's call this a duck type) for the argument from the fact that
the only two methods used are write() and getvalue(); the latter
returning a str. It makes sure that these methods are in fact defined
by the StringIO class, and then accepts other class instances that
implement the duck type.
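For comparison, the structural interface the compiler would derive can be written down explicitly with today's typing.Protocol (PEP 544); the class name here is illustrative:

```python
from io import StringIO
from typing import Protocol, runtime_checkable

@runtime_checkable
class WriteGetvalue(Protocol):
    """The 'duck type' inferred from the body: only write() and getvalue()."""
    def write(self, s: str) -> int: ...
    def getvalue(self) -> str: ...

# StringIO satisfies the protocol structurally, without inheriting from it.
buf = StringIO()
assert isinstance(buf, WriteGetvalue)
```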

But...: taking that to the extreme would not flag gcd("x", "y") as
a bug because the operations used (bool-testing, binary %) are
supported. Also, it doesn't help for the return type. What if I had
a UnicodeStringIO class whose getvalue() returned a Unicode string?

Container types offer lots of interesting problems. As an
introduction (not yet using containers), consider a function that
computes the smallest of two values:

def min(a, b):
    if a < b:
        return a
    else:
        return b

We would like to be able to add type annotations indicating that a and
b should have the same type and that the result also has that type.
Strawman syntax:

def min(a: T, b: T) -> T:
    if a < b:
        return a
    else:
        return b

As a (rather unpythonic) strawman, I'm using T and T0, T1, T2, etc. as
type variables. You can think of these as the free variables in an
equation of types; we're saying that the types of a, b and the
(unnamed) return value must all be the same. So min(1, 2) is valid
and returns an int; min("a", "b") is valid and returns a string.
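This strawman is close to what eventually became type variables in the typing module (PEP 484); a runnable modern rendering, under a different name to avoid shadowing the builtin:

```python
from typing import TypeVar

T = TypeVar("T")

def min2(a: T, b: T) -> T:
    # The annotations assert: a, b, and the result all share one type T.
    if a < b:
        return a
    return b

assert min2(1, 2) == 1          # valid, returns an int
assert min2("a", "b") == "a"    # valid, returns a string
```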

What about min(1, 2.5)? That ought to be valid and return a float. I
guess this means that there should be some kind of typing hierarchy
that explains how the various numeric types are embedded in each
other. The VM already knows about these coercion rules, but the trick
is to build them into the type system. I think I would like to do
this using a mechanism separate from inheritance, since I really don't
think that it is a good idea to require that int is a subclass of
float and float a subclass of complex. But the mechanism should also
be open to user-defined types; there shouldn't be mechanisms in Python
that the user cannot extend (not many, anyway).
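This is essentially what later landed as the numeric tower of PEP 3141: the numbers module defines abstract classes (Integral, Rational, Real, Complex) that express the embedding without making int a subclass of float, and the mechanism is open to user-defined types via registration:

```python
import numbers

# int is embedded in Real without subclassing float:
assert isinstance(3, numbers.Integral)
assert isinstance(3, numbers.Real)
assert not isinstance(3.0, numbers.Integral)

# User-defined types can join the hierarchy by registering:
class MyNum:
    pass

numbers.Real.register(MyNum)
assert isinstance(MyNum(), numbers.Real)
```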

Now on to containers. Let's look at a different function that
computes the smallest element of a sequence:
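A sketch of such a function; the strawman annotation, using the iterable notation discussed just below, isn't executable Python, so it is kept in a comment:

```python
# Strawman signature: def min(a: iterable(T)) -> T
def min_of(a):
    it = iter(a)
    smallest = next(it)      # raises StopIteration on an empty iterable
    for x in it:
        if x < smallest:
            smallest = x
    return smallest

assert min_of([3, 1, 2]) == 1
assert min_of("banana") == "a"
```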

I guess iterable would have to be a new built-in to represent the
concept "anything over which you can iterate". This includes lists
and tuples and strings, but also dictionaries, files, and in general
anything that defines __getitem__ or __iter__.

The Boost folks have a habit of Capitalizing these abstract types, so
it would be Iterable. That would perhaps also be a way out of the
dilemma of int vs. long: we could use Integer for idealized integers.
There are other numeric abstractions that we might want to define,
like Exact and Inexact, Rational, Real and Complex (and Quaternion?),
but I don't want to digress into number systems. Type systems are
hard enough without them.

Perhaps we could extend this convention to saying that whenever a
class name starts with a Capital letter it acts as a duck type, and
when it starts with a lower case letter it acts as a concrete type (so
only instances of the type and its subclasses are allowed).

I don't particularly like enforcing naming conventions based on case;
it's one thing to have an agreement amongst a group of Python
developers that Foo is an abstract class and foo is a concrete class,
but it's quite a different thing to have the compiler look at this too
for its type checking.

It would be good to have a set of notations for the common (abstract
and concrete) container types, just so we can write more examples.
Here's a strawman.

list(T): a list whose elements are all T's

set(T): a set whose elements are all T's (set is a builtin in Python
2.4)

tuple(T0, T1, T2): a tuple of length three whose elements have the
specified types (if you want to use tuples of arbitrary length as
sequences, use Sequence)

union(T1, T2): either T1 or T2 (union is not really a type but an
operator, but I think it could just be a built-in)

Do we need to distinguish between mutable and immutable sets and
sequences?

We need a better name for type(None) than types.NoneType; perhaps void
will do?

A common pattern for optional values is to have union(T, void).
Perhaps we could add the notation optional(T) to mean just this?
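For what it's worth, each of these strawman spellings has a close counterpart in the modern typing module (PEP 484):

```python
from typing import List, Set, Tuple, Union, Optional

# list(T)            -> List[int]
# set(T)             -> Set[int]
# tuple(T0, T1, T2)  -> Tuple[int, str, float]
# union(T1, T2)      -> Union[int, str]
# optional(T)        -> Optional[int]; "void" is spelled None
names: List[str] = ["a", "b"]
point: Tuple[int, str, float] = (1, "x", 2.5)
value: Union[int, str] = 3
maybe: Optional[int] = None

# optional(T) is literally union(T, void):
assert Optional[int] == Union[int, None]
```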

I'm handwaving a bit about how the compiler knows all these built-in
names. I think it should assume that names of built-ins that aren't
redefined as globals (or imported, etc.) stand for their built-in
meaning and there's no surreptitious run-time redefinition going on.
So then it can know about int, and about len(x: sequence(T))->int, and
so on.

This is not about operator overloading, which is simple enough, but
about method signature overloading a la Java. While most overloading
is easily expressed using either default argument values or a simple
union type, there are some cases that defeat this approach.

For example, the signatures of the built-in min() and max() functions
cause some grief: they overload two quite different functions:
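The two signatures in question are roughly min(sequence) and min(a, b, ...); in the modern typing module this can be expressed with @overload declarations backed by a single dynamic implementation (a sketch, with the builtin's name altered to avoid shadowing it):

```python
from typing import Iterable, TypeVar, overload

T = TypeVar("T")

@overload
def min_(arg: Iterable[T]) -> T: ...
@overload
def min_(arg1: T, arg2: T, *rest: T) -> T: ...

def min_(*args):
    # One runtime implementation behind the two declared signatures.
    items = args[0] if len(args) == 1 else args
    it = iter(items)
    smallest = next(it)
    for x in it:
        if x < smallest:
            smallest = x
    return smallest

assert min_([3, 1, 2]) == 1   # the one-iterable form
assert min_(3, 1, 2) == 1     # the two-or-more-arguments form
```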

Python traditionally does more stuff at run time than most languages.
This makes compile time type checking difficult. For example, the
Python compiler (unlike e.g. the Java compiler) doesn't resolve
imports at compile time -- it just notes that an imported module of a
certain name is supposed to exist and moves on. (from ... import *
leaves it in the dark even more completely -- just like the human
reader.)

Perhaps some of the type checking should be postponed to import time?
That solves the problem that even if the compiler were to follow the
import references, it can't be sure that the module that's present at
compile time defines the same classes as the module that's present at
run time. Java actually has a similar problem, and solves it by
putting lots of information in class files which is then checked by
the bytecode loader before the bytecode is allowed to run. We can do
a similar thing, or perhaps even more extreme. (Neal Norwitz prompted
me to think of this approach. Let's hope it isn't patented.)

[I could write more, but it would take all night.
Perhaps tomorrow, perhaps in the new year.]

In my opinion, the use of a : instead of -> for the return type could confuse some users as the colon would be used to represent too many concepts all within one line of code. Also, for some reason my gut feeling is that the -> syntax is cleaner even though there is an extra character to type.

While reading the section on container types, it appears that adding interfaces to the language would be more appropriate than using built-ins. You could avoid the issues of built-ins being redefined, and it would also give users another mechanism to extend static typing besides classes.

Some initial thoughts (apologies in advance if they're all over the place):

With this particular topic-cum-flamewar-in-waiting, I think there's a big danger that folk will want to race full-speed to a quick, familiar answer (add type declarations) before the fundamental problem it's supposed to solve has even been identified. If you're going to spend a whole load of time and effort on solving a problem, at least make sure you've found the right problem first. Especially when it's such a stupidly incendiary subject as type systems.

Folk need to back up from the issue as far as they can, ignoring the trees while they get the broadest possible view of the whole woods. Once they're ignoring the details they can take a much more measured, holistic approach towards defining the real problems and, ultimately, some real solutions to them. Start with the fundamental "What do users NEED?" question, rather than the (unfortunately all-too-common) "What do users WANT?" - i.e. only *think* they need (based on over-familiarity, preconception, myopia, ignorance, prejudice, and whatever) - and proceed from there.

...

OK. Two reasons I can see for folk wanting [optional] type declarations:

1. Formal interface declarations. However, this is completely missing the forest for the sake of an individual tree: just ask any Eiffel programmer. More on this below.

2. Supports compiler optimisations. This should really be handled via type inferencing, however, not explicit type declarations. Get the computer to do the grunt work, not the user; that's what it's there for.

A cynic would offer a third reason: as a comfort blanket for C/C++/Java users who feel uncomfortable or insecure without it. Providing false sense of security should be major grounds for NOT adding a feature, however: not only does it not solve the original problem, it creates an even bigger one as users are fooled into letting their guard down. It's the absolute worst solution of all: if the only choice is between this or doing nothing, always plug your ears and do nothing.

...

Reason #2 above is self-explanatory, so let's deal with #1: formal interface declarations. This is something I'd definitely like to see myself. While there are already a couple of ways to do informal declarations, these don't have the benefit of being machine-readable in any meaningful sense, which makes them useless as far as automated code analysis tools go. However, type declarations alone hardly cut the mustard: they might eliminate one class of input/output errors, but do absolutely nothing to address any of the myriad others, so hardly warrant any kind of preferential treatment. Therefore, optional type declarations are a non-solution to the wrong problem.

In fact, I'm convinced that adding a type declaration system would be counterproductive: not only is it a completely inadequate solution as far as interface declarations go, its presence would undercut subsequent motivation to create anything better. Example: 'f(x: float, y: float) -> float' isn't a damn bit of use if y=0 will cause a ZeroDivisionError. Declaring 'y must not be 0' should be just as big a priority as saying 'y must be a number'.

Therefore: identifying the [real, fundamental] problem and proposing a [real, comprehensive] solution. If you can do that, I think the type declaration issue will naturally solve itself within context of the larger answer. (And any C/C++/Java weenies start getting bad-tempered and impatient, wondering what you're doing wasting time when they've already determined the 'solution', tell 'em to go learn Eiffel for a few months and report back once they've expanded their horizons a bit.)

...

Another thought: a formal interface declaration system doesn't need to solve 100% of problems to be useful. It would be ideal if it did, of course, but that's probably too hard a problem to solve (especially all at once). So don't even try. Therefore, the declaration system itself requires flexibility; it's not much use to anyone if it's an all-or-nothing proposition. Leveraging the Python language as much as possible - both in its implementation and in its user interface - would be a sensible and economic move, providing plenty of flexibility (since Python is inherently flexible) at modest cost (more reuse = less reinvention of wheels).

For example, if declarations are written in ordinary Python code, you could allow users considerable (if not complete) freedom in how they express their interface rules without needing to perform AI-class code analysis at compile-time. Initially, just treat the entire declaration as runtime code, executing the pre- and post-conditions before and after the function's main body. You can then gradually develop your compile-time analyser to look for and extract common, simple patterns such as 'isinstance(ARG, TYPES)' and 'ARG != LITERAL', executing them at compile-time instead of run time, translating that information to more human-readable form and adding to the function's auto-generated documentation, or merely inserting more readable error messages into the runtime code that indicate an error was caused by a particular user-supplied argument being inappropriate, rather than just any old Tom, Dick or Harry value going wrong.
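A minimal sketch of this "declarations as ordinary Python" idea, with a hypothetical require() precondition decorator (nothing here is a real library API):

```python
import functools

def require(predicate, message):
    """Hypothetical precondition decorator: evaluate `predicate` on the
    arguments before the call, failing with a readable message."""
    def deco(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if not predicate(*args, **kwargs):
                raise ValueError("%s: precondition failed: %s"
                                 % (fn.__name__, message))
            return fn(*args, **kwargs)
        return wrapper
    return deco

# Both the isinstance(ARG, TYPES) and the ARG != LITERAL patterns
# mentioned above, expressed as plain runtime code:
@require(lambda x, y: isinstance(x, (int, float)) and isinstance(y, (int, float)),
         "x and y must be numbers")
@require(lambda x, y: y != 0, "y must not be 0")
def divide(x, y):
    return x / y

assert divide(1.0, 2.0) == 0.5
# divide(1.0, 0) fails with "divide: precondition failed: y must not be 0"
```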

Hello, I'm not a specialist, but I think one important person on this topic would be Phillip J. Eby, who's working on PEAK and PyProtocols. For me, interfaces are needed most of all because of security issues that can lie behind bugs: typically using strings where % is applied to an int to get at some hidden information, or causing a denial of service by raising unexpected exceptions. So an interface would express constraints, and not satisfying a constraint would raise an exception. Phillip Eby already proposes decorators to declare the types of arguments, something that could look like:

@args(IString, IInteger)
@return(IString)
def x(y, z): return ""

With such an approach, generic typing is more like another constraint:

@insureSameType(x, y)

Another mechanism Phillip J. Eby proposes is adapters, which can adapt one interface to another. The adapter can be provided by one of the interfaces or by a third-party function. That's far more flexible than an inheritance mechanism.

Interfaces keep the duck-typing philosophy.

As for performance issues, I really think that's the job of the computer. Maybe one can rely on interfaces to help Psyco-like optimisations, or maybe on some persistence of Psyco's analysis.

Thank you for typing that up, I really enjoyed reading that 'article'. I was wondering what was going on with that.

I would like to suggest that what works in other languages will not necessarily work here, due to Python's dynamic nature. Things like interfaces to 'force' semi-static typing are not something I imagine enjoying just so that I could send my custom OddClass through the built-in (or most other) helper functions. Also, being dynamic, most interface signatures would probably not be appropriate for the kind of error control wanted here, as opposed to the building-block helpers they serve as in the other Python projects that have them.

I would like to suggest instead something like a 'category' or 'keyword' that can be associated with a class. Take the gcd example: suppose I have an OddNum class I want to send to gcd; then I could define it as such:

This way classes can easily be categorised. Making it possible for a type to go into a 'statically typed' function is then just as easy as adding another category, as such:

class oddint(int) is Numeric, Iterable:
    ...  # Not sure if this one is a great example though

I don't even necessarily see a point in defining the category beforehand in any way. Categories could be dynamically created and managed without the user caring to define them. After all, it's not the presence of any functionality in them that is important, but rather the fact that the developer states what type of class it is, allowing for an easier way to check functions the 'duck way'. If it says it's numeric, let's treat it like one.
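Since `class oddint(int) is Numeric:` isn't real syntax, one way to approximate the idea in plain Python (all names here are invented for illustration) is a simple set of category tags on the class:

```python
class OddNum:
    # Invented convention: a class declares the categories it belongs to.
    categories = {"Numeric"}

    def __init__(self, value):
        self.value = value

def in_category(obj, name):
    """Check the declared category the 'duck way': trust the label."""
    return name in getattr(type(obj), "categories", set())

assert in_category(OddNum(3), "Numeric")
assert not in_category("hello", "Numeric")
```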

Then again, there are still the more complicated cases, which I am not sure if they should be dealt with, that are a little more difficult.

For example:

class oddobj(object) is Dict(Numeric, Numeric), Iterable:
    ...

What defines that Dict can have subcategories? Is this sort of thing even needed? If there is a need to declare categories, how should it look? Maybe something like:

category Numeric:
    '''This category defines anything that agrees with the general
    description of a number. This means that the basic mathematical
    operators (+, -, *, /, %) should behave in a mathematical way.'''

category Dict(type1, type2):
    '''This category defines anything that has a mapping between
    type1 and type2.'''

Still, it seems pretty empty. It's not that obvious what should come with the definition of a category, or what could go in it. There are things I can think of (validation code, for example), but nothing jumps out as obviously necessary, which is why I am not so sure.

> OK. Two reasons I can see for folk wanting [optional] type
> declarations:
>
> 1. Formal interface declarations. However, this is completely
> missing the forest for the sake of an individual tree: just ask any
> Eiffel programmer. More on this below.
>
> 2. Supports compiler optimisations. This should really be handled
> via type inferencing, however, not explicit type declarations. Get
> the computer to do the grunt work, not the user; that's what it's
> there for.

I'd add:

3. Allow IDEs to (more easily) support code completion, Find Usages, and other such niceties that are now expected by many programmers. For example, Boo (a scripting language with static type inferencing by default, and duck typing available on demand) is in its infancy but already ships with a code-completion-capable editor: http://boo.codehaus.org/Boo+Explorer

> Hi,
>
> Sorry if this is a stupid question, but what's wrong with something
> like Pyrex's syntax?
>
> def int sum( int a, int b ):
>     return a + b

Careful there, folks only just finished fighting the Great Decorator Syntax War of 2004. Might be best to take a break - let 'em get on with some real work for a bit - before plunging into the next one. ;)

Seriously tho', I suspect superficial syntax is the least of Guido's concerns at this point. There's big technical issues with implementing a clean, simple, rigid type declaration system in Python to be solved before syntax gets decided, not least the problem that Python's type/class system isn't particularly clean or simple, not having evolved with this need in mind.

...

Incidentally, another advantage of implementing general-purpose contracts instead of a special-purpose type declaration system is that it avoids the need to solve these problems before you can do anything useful with it. The best tool for dealing with the vagaries of Python's type system would be Python itself: it's much more powerful, flexible and adaptable than any hardwired system could be, so you're not limiting your options or prematurely setting them in stone.

Type purists may sniff at the idea, but Python is nothing if not pragmatic, and I'd bet contracts would provide far better bang for the buck - and ride over Python's real-world lumps and bumps much more smoothly - than an idealistic, over-specialised, static typing solution.

Guido, while I support the concept of compile-time type enforcing, I really can't get behind this syntax. Please indulge my rant:

One of the biggest reasons I switched my primary development environment from C++ to Python was the easy to read syntax. Arbitrary symbols ( like -> ) and symbols that have different meanings in different contexts ( like : ) make learning a new language ABSOLUTELY PAINFUL because their meaning cannot be easily discovered.

Compare the proposed syntax with this:

def gcd(a of int, b of int) of int:

A new keyword, (I propose of, or ofType) accomplishes the same goals, but someone encountering the syntax for the first time would be able to find its meaning very quickly by googling "python keyword of" (it also makes the line much more readable imo).

While I could certainly see benefits to certain types of static type checking in Python, like others I'm wary of the complexity it could add to the syntax and to the language in general. In your discussion of duck typing, you already opened the can of worms that is Parameterized Types, which I have spent a lot of time studying in both C++ and Java, and which (as you see in your musings about it) exponentially complicates a language.

Two observations:

1) I suspect more mileage will be found with type inference rather than explicit type declarations, and this would allow Python to stay Python. Even though the examples of type declaration that you give seem simple, I believe that type declarations would open the door to madness as all the use cases were covered. Again, you touched on this complexity in your coverage of duck typing.

2) I have come to expect delightful surprising insights from the design of Python. I don't know exactly what that would be in this case, but explicit declarations seem un-Pythonic.

As one person pointed out, carefully choosing your battles may be the solution, rather than trying to cover every base. For example, I recently implemented a Composite of XML nodes and was putting the wrong kind of object in my node -- an exception told me so. The answer was to use isinstance. I don't put the wrong type in a collection often enough (this was the first time that I can recall) to make me want the whole language redefined -- I can just use an 'assert isinstance' to find it. (Perhaps support for Design by Contract would be more productive than type declarations.)
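The pattern described, a one-line assertion at the single spot where the container mistake can happen, is just this (the node class here is illustrative):

```python
class XMLNode:
    def __init__(self):
        self.children = []

    def add(self, child):
        # A one-off check where it matters, instead of pervasive declarations:
        assert isinstance(child, XMLNode), "children must be XMLNode instances"
        self.children.append(child)

root = XMLNode()
root.add(XMLNode())       # fine
try:
    root.add("not a node")
except AssertionError:
    pass                  # the wrong-type bug is caught at the insertion point
```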

On the other hand, it could make a lot of sense to introduce interfaces, which I've recently realized is a static type checking mechanism. That's a big-picture thing that a lot of people have fabricated on their own. Python could either check interfaces as part of the compile cycle, or both at compile and runtime, depending on the design. My guess is that people would get much more mileage out of an interface mechanism along with some type inference than in adding a new -- and I'll wager endlessly complex -- declaration syntax.

Note: An even more valuable interface mechanism would be one that allows you to give constraints about the types that implement the interface -- to make it a genuine contract rather than just a set of methods. Again, this moves towards more of a Design by Contract model, which I think the "D" language has at least partially implemented.

I agree with Bruce. Interfaces would provide more mileage than static typing, and should be easier to implement and less controversial. By adding static typing, we run the risk that many users will use type declarations far more often than necessary, and never fully appreciate the gains that can be had without them. We would also be giving ammunition to the users who continue to worship static typing as a guardian angel that protects us from the evil bugs; when we add static typing, these users will be quick to say "I told you so", repeatedly and loudly.

In my opinion there are three areas in Python that need to be addressed if Python is to become a more attractive environment for the masses. These three areas would also reduce the amount of reinventing the wheel that goes on today and would make us all more productive. These areas are:

There already exist some standard interfaces in Python, like iterators, file I/O, printing, mapping, sequences, callable objects, etc. They exist by conventions that have been established over time. We all benefit from them, but what about all the other interfaces we could benefit from that will never make it into the Python standard? Many of them would never make it as a standard for good reason, as they may not be useful to the general Python user.

A problem with these so-called standard interfaces is how they are documented. They are documented in various locations in the standard docs, and in an inconsistent manner, making it more work for the user to figure out how to program their classes to support the interface. Once interfaces are added to the language, these documentation issues will be a thing of the past.

Also, powerful frameworks are likely to be developed, as interfaces will serve as contracts on how to interface with a framework. We will all benefit from these frameworks, as they will likely be easier to use than the ones that exist today. Also, the learning curve for these frameworks will be smaller, since they would share a common method of documentation (the interface itself), as with the standard Python interfaces.