Wednesday, January 28, 2009

My thanks go to Guido for allowing me to share my own history of Python!

I'll save my introduction to Python for another post, but the end result was its introduction into a startup that I co-founded in 1991 with several people. We were working on a large client/server system to handle Business-to-Consumer electronic shopping. Custom TCP protocols operating over the old X.25 network, and all that. Old school.

In 1995, we realized, contrary to our earlier beliefs, that more consumers actually were on the Internet, and that we needed a system for our customers (the vendors) to reach Internet-based consumers. I was tasked to figure out our approach, and selected Python as my prototyping tool.

Our first problem was moving to an entirely browser-based solution. Our custom client was no longer viable, so we needed a new shopping experience for the consumer, and server infrastructure to support that. At that time, talking to a web browser meant writing CGI scripts for the Apache and Netscape HTTP servers. Using CGI, I connected to our existing server backend to process orders, maintain the shopping basket, and to fetch product information. These CGI scripts produced plain, vanilla HTML (no AJAX in 1995!).

This approach was less-than-ideal since each request took time to spin up a new CGI process. The responsiveness was very poor. Then, in December 1995, while attending the Python Workshop in Washington, DC, I was introduced to some Apache and Netscape modules (from Digital Creations, who are best known for Zope) which ran persistently within the server process. These modules used an RPC system called ILU to communicate with backend, long-running processes. With this system in place, the CGI forking overhead disappeared and the shopping experience was now quite enjoyable! We started to turn the prototype into real code. The further we went with it, the better it looked and more people jumped onto the project. Development moved very fast over the next few months (thanks Python!).

In January 1996, Microsoft knocked on our door. Their internal effort at creating an electronic commerce system was floundering, and they needed people who knew the industry (we'd been doing electronic commerce for several years by that point) and who were nimble. We continued to develop the software during the spring while negotiations took place, and the acquisition was finalized in June 1996.

Once we arrived at Microsoft with our small pile of Python code, we had to figure out how to ship the product on Windows NT. The team we joined had lots of Windows experience and built an IIS plugin to communicate over named pipes to the backend servers, which were NT Services with our Python server code embedded. With a mad sprint starting in July, we shipped Microsoft Merchant Server 1.0 in October, 1996.

And yes... if you looked under the covers, somewhat hidden, was a Python interpreter, some extension DLLs, and a bunch of .pyc files. Microsoft certainly didn't advertise that fact, but it was there if you knew where to look.

Tuesday, January 27, 2009

The Python workshop (see previous posting) resulted in a job offer to come work on mobile agents at CNRI (the Corporation for National Research Initiatives). CNRI is a non-profit research lab in Reston, Virginia. I joined in April 1995. CNRI’s director, Bob Kahn, was the first to point out to me how much Python has in common with Lisp, despite being completely different at a superficial (syntactic) level. Python work at CNRI was funded indirectly by a DARPA grant for mobile agent research. Although there was DARPA support for projects that used Python, there was not much direct support for language development itself.

At CNRI, I led and helped hire a small team of developers to build a mobile agent system in pure Python. The initial team members were Roger Masse and Barry Warsaw who were bitten by the Python bug at the Python workshop at NIST. In addition, we hired Python community members Ken Manheimer and Fred Drake. Jeremy Hylton, an MIT graduate originally hired to work on text retrieval, also joined the team. The team was initially managed by Ted Strollo and later on by Al Vezza.

This team helped me create and maintain additional Python community infrastructure such as the python.org website, the CVS server, and the mailing lists for various Python Special Interest Groups. Python releases 1.3 through 1.6 came out of CNRI. For many years Python 1.5.2 was the most popular and most stable version.

GNU mailman was also born here: we originally used a Perl tool called Majordomo, but Ken Manheimer found it unmaintainable and looked for a Python solution. He found out about something written in Python by John Viega and took over maintenance. When Ken left CNRI for Digital Creations, Barry Warsaw took over, and convinced the Free Software Foundation to adopt it as its official mailing list tool. Hence Barry licensed it under the GPL (GNU General Public License).

The Python workshops continued, at first twice a year, but due to the growth and increased logistical efforts they soon evolved into yearly events. These were first run by whoever wanted to host them, like NIST (the first one), USGS (the second and third one) and LLNL (the fourth one, and the start of the yearly series). Eventually CNRI took over the organization, and later (together with the WWW and IETF conferences) this was spun off as a commercial effort, Fortec. Attendance quickly rose to several hundred. When Fortec faded away a while after I left CNRI, the International Python Conference was folded into O'Reilly's Open Source Conference (OSCON), but at the same time the Python Software Foundation (see below) started a new series of grassroots conferences named PyCon.

We also created the first (loose) organization around Python at CNRI. In response to efforts by Mike McLay and Paul Everitt to create a "Python Foundation", which ended up in the quicksand of bylaw drafting, Bob Kahn offered to create the "Python Software Activity", which would not be an independent legal entity but simply a group of people working under CNRI's legal (non-profit) umbrella. The PSA was successful in rallying the energy of a large group of committed Python users, but its lack of independence limited its effectiveness.

CNRI also used DARPA money to fund the development of JPython (later shortened to Jython), a Python implementation in and for Java. Jim Hugunin initially created JPython while doing graduate work at MIT. He then convinced CNRI to hire him to complete the work (or perhaps CNRI convinced Jim to join -- it happened while I was on vacation). When Jim left CNRI less than two years later to join the AspectJ project at Xerox PARC, Barry Warsaw continued the JPython development. (Much later, Jim would also author IronPython, the Python port to Microsoft's .NET. Jim also wrote the first version of Numeric Python.)

Other projects at CNRI also started to use Python. Several new core Python developers came out of this, in particular Andrew Kuchling, Neil Schemenauer, and Greg Ward, who worked for the MEMS Exchange project. (Andrew had contributed to Python even before joining CNRI; his first major project was the Python Cryptography Toolkit, a third party library that made many fundamental cryptological algorithms available to Python users.)

On the wings of Python's success, CNRI tried to come up with a model to fund Python development more directly than via DARPA research grants. We created the Python Consortium, modeled after the X Consortium, with a minimum entrance fee of $20,000. However, apart from one group at Hewlett-Packard, we didn't get much traction, and eventually the consortium died of anemia. Another attempt to find funding was Computer Programming for Everybody (CP4E), which received some DARPA funding. However, the funding wasn't enough for the whole team, and it turned out that there was a whole old-boys network involved in actually getting most of the money, spread over several years. That was not something I enjoyed, and I started looking for other options.

Eventually, in early 2000, the dot-com boom, which hadn’t quite collapsed yet, convinced me and three other members of the CNRI Python team (Barry Warsaw, Jeremy Hylton, and Fred Drake) to join BeOpen.com, a California startup that was recruiting open source developers. Tim Peters, a key Python community member, also joined us.

In anticipation of the transition to BeOpen.com, a difficult question was the future ownership of Python. CNRI insisted on changing the license and requested that we release Python 1.6 with this new license. The old license used while I was still at CWI had been a version of the MIT license. The releases previously made at CNRI used a slightly modified version of that license, with basically one sentence added where CNRI disclaimed most responsibilities. The 1.6 license however was a long wordy piece of lawyerese crafted by CNRI's lawyers.

We had several long wrestling discussions with Richard Stallman and Eben Moglen of the Free Software Foundation about some parts of this new license. They feared it would be incompatible with the GPL, and hence threaten the viability of GNU mailman, which had by now become an essential tool for the FSF. With the help of Eric Raymond, changes to the CNRI Python license were made that satisfied both the FSF and CNRI, but the resulting language is not easy to understand. The only good thing I can say about it is that (again thanks to Eric Raymond's help) it has the seal of approval of the Open Source Initiative as a genuine open source license. Only slight modifications were made to the text of the license to reflect the two successive changes of ownership, first BeOpen.com and then the Python Software Foundation, but in essence the handiwork of CNRI's lawyers still stands.

Like so many startups at the time, the BeOpen.com business plan failed rather spectacularly. It left behind a large debt, some serious doubts about the role played by some of the company's officers, and some very disillusioned developers besides my own team.

Luckily, my team, by now known as PythonLabs, was pretty hot, and we were hired as a unit by Digital Creations, one of the first companies to use Python. (Ken Manheimer had preceded us there a few years before.) Digital Creations soon renamed itself Zope Corporation after its main open source product, the web content management system Zope. Zope’s founders Paul Everitt and Rob Page had attended the very first Python workshop at NIST in 1994, as did its CTO, Jim Fulton.

History could easily have gone very differently: besides Digital Creations, we were also considering offers from VA Linux and ActiveState. VA Linux was then a rising star on the stock market, but eventually its stock price (which had made Eric Raymond a multi-millionaire on paper) collapsed rather dramatically. Looking back I think ActiveState would not have been a bad choice, despite the controversial personality of its founder Dick Hardt, if it hadn't been located in Canada.

In 2001 we created the Python Software Foundation, a non-profit organization, whose initial members were the main contributing Python developers at that time. Eric Raymond was one of the founding members. I'll have to write more about this period another time.

Tuesday, January 20, 2009

Python’s early development started at a research institute in Amsterdam called CWI, which is a Dutch acronym for a phrase that translates into English as Centre for Mathematics and Computer Science. CWI is an interesting place; funded by the Dutch government’s Department of Education and other research grants, it conducts academic-level research into computer science and mathematics. At any given time there are plenty of Ph.D. students wandering about and old-timers in the profession may still remember its original name, the Mathematical Centre. Under this name, it was perhaps most famous for the invention of Algol 68.

Python is a direct product of my experience at CWI. As I explain later, ABC gave me the key inspiration for Python, Amoeba the immediate motivation, and the multimedia group fostered its growth. However, so far as I know, no funds at CWI were ever officially earmarked for its development. Instead, it merely evolved as an important tool for use in both the Amoeba and multimedia groups.

My original motivation for creating Python was the perceived need for a higher level language in the Amoeba project. I realized that the development of system administration utilities in C was taking too long. Moreover, doing these in the Bourne shell wouldn’t work for a variety of reasons. The most important one was that as a distributed micro-kernel system with a radically new design, Amoeba’s primitive operations were very different (and finer-grain) than the traditional primitive operations available in the Bourne shell. So there was a need for a language that would “bridge the gap between C and the shell.” For a long time, this was Python’s main catchphrase.

At this point, you might ask "why not port an existing language?" In my view, there weren’t a lot of suitable languages around at that time. I was familiar with Perl 3, but it was even more tied to Unix than the Bourne shell. I also didn’t like Perl’s syntax--my tastes in programming language syntax were strongly influenced by languages like Algol 60, Pascal, Algol 68 (all of which I had learned early on), and last but not least, ABC, on which I’d spent four years of my life. So, I decided to design a language of my own which would borrow everything I liked from ABC while at the same time fixing all its problems (as I perceived them).

The first problem I decided to fix was the name! As it happened, the ABC team had some trouble picking a name for its language. The original name for the language, B, had to be abandoned because of confusion with another language named B that was older and better known. In any case, B was meant as a working title only (the joke was that B was the name of the variable containing the name of the language--hence the italics). The team had a public contest to come up with a new name, but none of the submissions made the cut, and in the end, the internal backup candidate prevailed. The name was meant to convey the idea that the language made programming “as simple as ABC”, but it never convinced me all that much.

So, rather than over-analyzing the naming problem, I decided to under-analyze it. I picked the first thing that came to mind, which happened to be Monty Python’s Flying Circus, one of my favorite comedy troupes. The reference felt suitably irreverent for what was essentially a “skunkworks project”. The word “Python” was also catchy, a bit edgy, and at the same time, it fit in the tradition of naming languages after famous people, like Pascal, Ada, and Eiffel. The Monty Python team may not be famous for their advancement of science or technology, but they are certainly a geek favorite. It also fit in with a tradition in the CWI Amoeba group to name programs after TV shows.

For many years I resisted attempts to associate the language with snakes. I finally gave up when O’Reilly wanted to put a snake on the front of their first Python book "Programming Python". It was an O’Reilly tradition to use animal pictures, and if it had to be an animal, it might as well be a snake.

With the naming issue settled, I started working on Python in late December 1989, and had a working version in the first months of 1990. I didn’t keep notes, but I remember vividly that the first piece of code I wrote for Python’s implementation was a simple LL(1) parser generator I called “pgen.” This parser generator is still part of the Python source distribution and probably the least changed of all the code. This early version of Python was used by a number of people at CWI, mostly, but not exclusively, in the Amoeba group during 1990. Key developers besides myself were my officemates, programmers Sjoerd Mullender (Sape’s younger brother) and Jack Jansen (who remained one of the lead developers of the Macintosh port for many years after I left CWI).

On February 20, 1991, I first released Python to the world in the alt.sources newsgroup (as 21 uuencoded parts that had to be joined together and uudecoded to form a compressed tar file). This version was labeled 0.9.0, and released under a license that was an almost verbatim copy of the MIT license used by the X11 project at the time, substituting “Stichting Mathematisch Centrum”, CWI’s parent organization, as the responsible legal entity. So, like almost everything I’ve written, Python was open source before the term was even invented by Eric Raymond and Bruce Perens in late 1997.

There was immediately a lot of feedback and with this encouragement I kept a steady stream of releases coming for the next few years. I started to use CVS to track changes and to allow easier sharing of coding responsibilities with Sjoerd and Jack. (Coincidentally, CVS was originally developed as a set of shell scripts by Dick Grune, who was an early member of the ABC group.) I wrote a FAQ, which was regularly posted to some newsgroup, as was customary for FAQs in those days before the web, started a mailing list, and in March 1993 the comp.lang.python newsgroup was created with my encouragement but without my direct involvement. The newsgroup and mailing list were joined via a bidirectional gateway that still exists, although it is now implemented as a feature of mailman – the dominant open source mailing list manager, itself written in Python.

In the summer of 1994, the newsgroup was buzzing with a thread titled “If Guido was hit by a bus?” about the dependency of the growing Python community on my personal contributions. This culminated in an invitation from Michael McLay for me to spend two months as a guest researcher at NIST, the US National Institute for Standards and Technology, formerly the National Bureau of Standards, in Gaithersburg, Maryland. Michael had a number of “customers” at NIST who were interested in using Python for a variety of standards-related projects and the budget for my stay there was motivated by the need to help them improve their Python skills, as well as possibly improving Python for their needs.

The first Python workshop was held while I was there in November 1994, with NIST programmer Ken Manheimer providing important assistance and encouragement. Of the approximately 20 attendees, about half are still active participants in the Python community and a few have become major open source project leaders themselves (Jim Fulton of Zope and Barry Warsaw of GNU mailman). With NIST’s support I also gave a keynote for about 400 people at the Usenix Little Languages conference in Santa Fe, organized by Tom Christiansen, an open-minded Perl advocate who introduced me to Perl creator Larry Wall and Tcl/Tk author John Ousterhout.

The development of Python occurred at a time when many other dynamic (and open-source) programming languages such as Tcl, Perl, and (much later) Ruby were also being actively developed and gaining popularity. To help put Python in its proper historical perspective, the following list shows the release history of Python. The earliest dates are approximate as I didn't consistently record all events:

I've added hyperlinks to the releases that are still being advertised on python.org at this time. Note that many releases were followed by several micro-releases, e.g. 2.0.1; I haven't bothered to include these in the table as otherwise it would become too long. Source tarballs of very old releases are also still accessible here: http://www.python.org/ftp/python/src/. Various ancient binary releases and other historical artefacts can still be found by going one level up from there.

Tuesday, January 13, 2009

Later blog entries will dive into the gory details of Python's history. However, before I do that, I would like to elaborate on the philosophical guidelines that helped me make decisions while designing and implementing Python.

First of all, Python was originally conceived as a one-person “skunkworks” project – there was no official budget, and I wanted results quickly, in part so that I could convince management to support the project (in which I was fairly successful). This led to a number of timesaving rules:

Borrow ideas from elsewhere whenever it makes sense.

“Things should be as simple as possible, but no simpler.” (Einstein)

Do one thing well (The "UNIX philosophy").

Don’t fret too much about performance--plan to optimize later when needed.

Don’t fight the environment and go with the flow.

Don’t try for perfection because “good enough” is often just that.

(Hence) it’s okay to cut corners sometimes, especially if you can do it right later.

Other principles weren’t intended as timesavers. Sometimes they were quite the opposite:

The Python implementation should not be tied to a particular platform. It’s okay if some functionality is not always available, but the core should work everywhere.

Don’t bother users with details that the machine can handle (I didn’t always follow this rule and some of the disastrous consequences are described in later sections).

Support and encourage platform-independent user code, but don’t cut off access to platform capabilities or properties (This is in sharp contrast to Java.)

A large complex system should have multiple levels of extensibility. This maximizes the opportunities for users, sophisticated or not, to help themselves.

Errors should not be fatal. That is, user code should be able to recover from error conditions as long as the virtual machine is still functional.

At the same time, errors should not pass silently (These last two items naturally led to the decision to use exceptions throughout the implementation.)

A bug in the user’s Python code should not be allowed to lead to undefined behavior of the Python interpreter; a core dump is never the user’s fault.
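The last few principles (errors are recoverable, but never silent) can be illustrated with a small sketch. It is written in Python 3 syntax, and the function names `parse_port` and `load_ports` are invented purely for this illustration; they are not part of Python or of anything described in this post.

```python
def parse_port(text):
    """Convert text to a TCP port number, raising ValueError on bad input."""
    port = int(text)  # int() itself raises ValueError for non-numeric text
    if not 0 < port < 65536:
        raise ValueError("port out of range: %d" % port)
    return port


def load_ports(candidates):
    """Parse every candidate, collecting failures instead of crashing."""
    good, problems = [], []
    for text in candidates:
        try:
            good.append(parse_port(text))
        except ValueError as exc:
            # The error is not fatal, but it is not swallowed either:
            # it is recorded so the caller can report it.
            problems.append(str(exc))
    return good, problems


good, problems = load_ports(["80", "http", "70000"])
print(good)      # [80]
print(problems)  # two messages, one per rejected input
```

The point of the sketch is only that user code decides what to do with an error; the interpreter neither aborts nor hides it.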

Finally, I had various ideas about good programming language design, which were largely imprinted on me by the ABC group where I had my first real experience with language implementation and design. These ideas are the hardest to put into words, as they mostly revolved around subjective concepts like elegance, simplicity and readability.

Although I will discuss more of ABC's influence on Python a little later, I’d like to mention one readability rule specifically: punctuation characters should be used conservatively, in line with their common use in written English or high-school algebra. Exceptions are made when a particular notation is a long-standing tradition in programming languages, such as “x*y” for multiplication, “a[i]” for array subscripting, or “x.foo” for attribute selection, but Python does not use “$” to indicate variables, nor “!” to indicate operations with side effects.

Tim Peters, a long time Python user who eventually became its most prolific and tenacious core developer, attempted to capture my unstated design principles in what he calls the “Zen of Python.” I quote it here in its entirety:

Beautiful is better than ugly.

Explicit is better than implicit.

Simple is better than complex.

Complex is better than complicated.

Flat is better than nested.

Sparse is better than dense.

Readability counts.

Special cases aren't special enough to break the rules.

Although practicality beats purity.

Errors should never pass silently.

Unless explicitly silenced.

In the face of ambiguity, refuse the temptation to guess.

There should be one-- and preferably only one --obvious way to do it.

Although that way may not be obvious at first unless you're Dutch.

Now is better than never.

Although never is often better than right now.

If the implementation is hard to explain, it's a bad idea.

If the implementation is easy to explain, it may be a good idea.

Namespaces are one honking great idea -- let's do more of those!

Although my experience with ABC greatly influenced Python, the ABC group had a few design principles that were radically different from Python’s. In many ways, Python is a conscious departure from these:

The ABC group strived for perfection. For example, they used tree-based data structure algorithms that were proven to be optimal for asymptotically large collections (but were not so great for small collections).

The ABC group wanted to isolate the user, as completely as possible, from the “big, bad world of computers” out there. Not only should there be no limit on the range of numbers, the length of strings, or the size of collections (other than the total memory available), but users should also not be required to deal with files, disks, “saving”, or other programs. ABC should be the only tool they ever needed. This desire also caused the ABC group to create a complete integrated editing environment, unique to ABC (There was an escape possible from ABC’s environment, for sure, but it was mostly an afterthought, and not accessible directly from the language.)

The ABC group assumed that the users had no prior computer experience (or were willing to forget it). Thus, alternative terminology was introduced that was considered more “newbie-friendly” than standard programming terms. For example, procedures were called “how-tos” and variables “locations”.

The ABC group designed ABC without an evolutionary path in mind, and without expecting user participation in the design of the language. ABC was created as a closed system, as flawless as its designers could make it. Users were not encouraged to “look under the hood”. Although there was talk of opening up parts of the implementation to advanced users in later stages of the project, this was never realized.

In many ways, the design philosophy I used when creating Python is probably one of the main reasons for its ultimate success. Rather than striving for perfection, early adopters found that Python worked "well enough" for their purposes. As the user base grew, suggestions for improvement were gradually incorporated into the language. As we will see in later sections, many of these improvements have involved substantial changes and reworking of core parts of the language. Even today, Python continues to evolve.

Python is currently one of the most popular dynamic programming languages, along with Perl, Tcl, PHP, and newcomer Ruby. Although it is often viewed as a "scripting" language, it is really a general purpose programming language along the lines of Lisp or Smalltalk (as are the others, by the way). Today, Python is used for everything from throw-away scripts to large scalable web servers that provide uninterrupted service 24x7. It is used for GUI and database programming, client- and server-side web programming, and application testing. It is used by scientists writing applications for the world's fastest supercomputers and by children first learning to program.

In this blog, I will shine the spotlight on Python's history. In particular, how Python was developed, major influences in its design, mistakes made, lessons learned, and future directions for the language.

Acknowledgment: I am indebted to Dave Beazley for many of the better sentences in this blog. (For more on the origins of this blog, see my other blog.)

A Bird's Eye View of Python

When one is first exposed to Python, they are often struck by the way that Python code looks, at least on the surface, similar to code written in other conventional programming languages such as C or Pascal. This is no accident: the syntax of Python borrows heavily from C. For instance, many of Python's keywords (if, else, while, for, etc.) are the same as in C, Python identifiers have the same naming rules as C, and most of the standard operators have the same meaning as in C. Of course, Python is obviously not C and one major area where it differs is that instead of using braces for statement grouping, it uses indentation. For example, instead of writing statements in C like this

if (a < b) {
    max = b;
} else {
    max = a;
}

Python just dispenses with the braces altogether (along with the trailing semicolons for good measure) and uses the following structure

if a < b:
    max = b
else:
    max = a

The other major area where Python differs from C-like languages is in its use of dynamic typing. In C, variables must always be explicitly declared and given a specific type such as int or double. This information is then used to perform static compile-time checks of the program as well as for allocating memory locations used for storing the variable’s value. In Python, variables are simply names that refer to objects. Variables do not need to be declared before they are assigned and they can even change type in the middle of a program. Like other dynamic languages, all type-checking is performed at run-time by an interpreter instead of during a separate compilation step.
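As a small illustration of names referring to objects rather than typed storage locations, here is a sketch in Python 3 syntax. The variable name `value` is made up for the example.

```python
# A name is simply bound to an object; no declaration is needed.
value = 42
print(type(value).__name__)   # int

# The same name can later be rebound to an object of a different type.
value = "forty-two"
print(type(value).__name__)   # str

# Type checking happens at run time: the mistake below is only
# detected when the line is actually executed.
try:
    value + 1
except TypeError as exc:
    print("run-time type error:", exc)
```

Nothing is checked when the file is compiled to bytecode; the `TypeError` appears only when the addition is attempted.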

Python’s primitive built-in data types include Booleans, numbers (machine integers, arbitrary-precision integers, and real and complex floating point numbers), and strings (8-bit and Unicode). These are all immutable types, meaning that values are represented by objects that cannot be modified after their creation. Compound built-in data types include tuples (immutable arrays), lists (resizable arrays) and dictionaries (hash tables).
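The following sketch touches each of these built-in types. It assumes Python 3, which merges the machine and arbitrary-precision integer types into a single `int` and makes strings Unicode by default; all the names are illustrative only.

```python
# Integers grow as needed instead of overflowing.
big = 2 ** 100
print(big)

# Complex numbers are built in.
z = (1 + 2j) * (1 - 2j)
print(z)   # (5+0j)

# Tuples (like strings and numbers) are immutable:
# item assignment raises TypeError.
point = (1, 2)
try:
    point[0] = 10
except TypeError as exc:
    print("immutable:", exc)

# Lists are resizable arrays.
squares = [1, 4, 9]
squares.append(16)

# Dictionaries are hash tables mapping keys to values.
lengths = {"tuple": len(point), "list": len(squares)}
print(lengths)   # {'tuple': 2, 'list': 4}
```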

In Python, all objects that can be named are said to be "first class." This means that functions, classes, methods, modules, and all other named objects can be freely passed around, inspected, and placed in various data structures (e.g., lists or dictionaries) at run-time. And speaking of objects, Python also has full support for object-oriented programming including user-defined classes, inheritance, and run-time binding of methods.
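A minimal sketch of what "first class" means in practice, in Python 3 syntax. The names `apply`, `Greeter`, and `registry` are hypothetical and exist only for this example.

```python
import math


def apply(func, value):
    """Call whatever callable object was passed in."""
    return func(value)


class Greeter:
    def greet(self, name):
        return "Hello, " + name


# A function, a bound method, a class and a module are all ordinary
# objects, so they can be stored in a dictionary and passed around.
registry = {
    "sqrt": math.sqrt,
    "greet": Greeter().greet,
    "make_list": list,
    "module": math,
}

print(apply(registry["sqrt"], 16.0))        # 4.0
print(apply(registry["greet"], "world"))    # Hello, world
print(apply(registry["make_list"], "abc"))  # ['a', 'b', 'c']
print(registry["module"].__name__)          # math
```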

Python has an extensive standard library, which is one of the main reasons for its popularity. The standard library has more than 100 modules and is always evolving. Some of these modules include regular expression matching, standard mathematical functions, threads, operating systems interfaces, network programming, standard internet protocols (HTTP, FTP, SMTP, etc.), email handling, XML processing, HTML parsing, and a GUI toolkit (Tcl/Tk).

In addition, there is a very large supply of third-party modules and packages, most of which are also open source. Here one finds web frameworks (too many to list!), more GUI toolkits, efficient numerical libraries (including wrappers for many popular Fortran packages), interfaces to relational databases (Oracle, MySQL, and others), SWIG (a tool for making arbitrary C++ libraries available as Python modules), and much more.

A major appeal of Python (and other dynamic programming languages for that matter) is that seemingly complicated tasks can often be expressed with very little code. As an example, here is a simple Python script that fetches a web page, scans it looking for URL references, and prints the first 10 of those.

# Scan the web looking for references

import re
import urllib

regex = re.compile(r'href="([^"]+)"')

def matcher(url, max=10):
    "Print the first several URL references in a given url."
    data = urllib.urlopen(url).read()
    hits = regex.findall(data)
    for hit in hits[:max]:
        print urllib.basejoin(url, hit)

matcher("http://python.org")

This program can easily be modified to make a web crawler, and indeed Scott Hassan has told me that he wrote Google’s first web crawler in Python. Today, Google employs millions of lines of Python code to manage many aspects of its operations, from build automation to ad management (Disclaimer: I am currently a Google employee.)
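The script above is written for Python 2, where `urllib.urlopen` and `urllib.basejoin` exist. As a rough sketch only, assuming Python 3 (where the equivalent calls are `urllib.request.urlopen` and `urllib.parse.urljoin`), the same idea might look like this. The helper `extract_links` is invented here so that the link extraction can be tried without network access.

```python
import re
from urllib.parse import urljoin
from urllib.request import urlopen

HREF = re.compile(r'href="([^"]+)"')


def extract_links(base_url, html, limit=10):
    """Return up to `limit` absolute URLs referenced in `html`."""
    return [urljoin(base_url, hit) for hit in HREF.findall(html)[:limit]]


def matcher(url, limit=10):
    """Fetch `url` and print the first several URL references in it."""
    with urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    for link in extract_links(url, html, limit):
        print(link)


# Requires network access, so it is left commented out here:
# matcher("http://python.org")
```

Like the original, this relies on a simple regular expression rather than a real HTML parser, so it only finds double-quoted `href` attributes.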

Underneath the covers, Python is typically implemented using a combination of a bytecode compiler and interpreter. Compilation is implicitly performed as modules are loaded, and several language primitives require the compiler to be available at run-time. Although Python’s de-facto standard implementation is written in C, and available for every imaginable hardware/software platform, several other implementations have become popular. Jython is a version that runs on the JVM and has seamless Java integration. IronPython is a version for the Microsoft .NET platform that has similar integration with other languages running on .NET. PyPy is an optimizing Python compiler/interpreter written in Python (still a research project, being undertaken with EU funding). There’s also Stackless Python, a variant of the C implementation that reduces reliance on the C stack for function/method calls, to allow co-routines, continuations, and microthreads.
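The compile-then-interpret split can be observed from within the language itself. The sketch below uses the built-in `compile` and `exec` functions and the standard `dis` module in Python 3 syntax; the source string and the names in it are made up for the example, and the exact bytecode listing printed by `dis` differs between Python versions, so no output is shown for it.

```python
import dis

source = "total = a + b"

# Compile source text into a code object (bytecode) at run time.
code = compile(source, "<example>", "exec")
print(type(code).__name__)   # code

# Execute the bytecode in a namespace supplied by the caller.
namespace = {"a": 2, "b": 3}
exec(code, namespace)
print(namespace["total"])    # 5

# Show the bytecode instructions the interpreter will run.
dis.dis(code)
```

Built-ins such as `compile`, `exec`, and `eval` are examples of the primitives that need the compiler to be present at run time.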