Peter Norvig has exquisite tastes in programming, is a Lisp guru and is also a great Python hacker. Put that together and what do you get? Lis.py, an interpreter for the core of the Lisp dialect Scheme in 96 lines of Python. Norvig mentions Alan Kay’s view of Lisp as “Maxwell’s Equations of Software” in a 2004 interview with Stu Feldman:

SF: If nothing else, Lisp was carefully defined in terms of Lisp.

AK: Yes, that was the big revelation to me when I was in graduate school—when I finally understood that the half page of code on the bottom of page 13 of the Lisp 1.5 manual was Lisp in itself. These were “Maxwell’s Equations of Software!” This is the whole world of programming in a few lines that I can put my hand over.

‘And here it ends. One other thing I learned is that some in the academic world scornfully refer to Python as “the Basic of the future”. Personally, I rather see that as a badge of honor, and it gives me an opportunity to plug a book of interviews with language designers to which I contributed, side by side with the creators of Basic, C++, Perl, Java, and other academically scorned languages — as well as those of ML and Haskell, I hasten to add. (Apparently the creators of Scheme were too busy arguing whether to say “tail call optimization” or “proper tail recursion.” :-)’

I’ve not yet been able to track down any sources calling Python the ‘Basic of the future’ — all I could find is one person who referred to Java this way and another referring to Javascript. But for a programming language, it is a great slur, or maybe, to take Guido’s stance, a great compliment.

People who’s native language is Perl might find the Perl/Python phrasebook handy. When talking to the Python interpreter, some try hand gestures, typing slowly or using ALL CAPS, but these seldom work and can often annoy or even alarm the interpreter. This phrasebook covers the most common things you need to say to a simple Python system. For example, if you wanted to tell it to read your file as a list of lines, there’s a phrasebook entry that that shows just how to say it.

filename = “cooktest1.1-1”
f = open(filename) # Python has exceptions with somewhat-easy to
# understand error messages. If the file could
# not be opened, it would say “No such file or
# directory: %filename” which is as
# understandable as “can’t open $filename:”
lines = f.readlines()

Many of the entries also contain helpful facts and advice about the customs and social norms of native Python speakers. Not only can this keep you out of trouble, it will deepen your understanding of the colorful and sometimes quaint Python speakers. I hope that the pocket travel version of the phrasebook, suitable for downloading onto an ipod, will be out soon.

The amount of free, interesting, and useful data is growing explosively. Luckily, computer are getting cheaper as we speak, they are all connected with a robust communication infrastructure, and software for analyzing data is better than ever. That’s why everyone is interested in easy to use frameworks like MapReduce for every-day programmers to run their data crunching in parallel.

“Octo.py doesn’t aim to meet all your distributed computing needs, but its simple approach is amendable to a large proportion of parallelizable tasks. If your code has a for-loop, there’s a good chance that you can make it distributed with just a few small changes. If you’re already using Python’s map() and reduce() functions, the changes needed are trivial!”

triangular.py is the simple example given in the documentation that is used with octo.py to compute the first 100 triangular numbers.

Put octo.py on all of the machines you want to use. On the machine you will use as a server (with ip address <ip>), also install triangular.py, and then execute:

python octo.py server triangular.py &

On each of your clients, run

python octo.py client <ip> &

You can try this out using the same machine to run the server process and one or more client processes, of course.

When the clients register with the server, they will get a copy of triangular.py and wait for tasks from the server. The server access the data from source and distributed tasks to the clients. These in turn use mapfn and reducefn to complete the tasks, returning the results. The server integrates these and, when all have completed, invokes final, which in this case just prints the answers, and halts. The clients continue to run, waiting for more tasks to do.

Octo.py is not a replacement for more sophisticated frameworks like Hadoop or Disco but if you are working in Python, its KISS approach is a good way to get started with the MapReduce paradigm and might be all you need for a small projects.

(Note: The package has not been updated since April 2008, so it’s status is not clear. But further development would run the risk of making it more complex and would be self-defeating.)

Disco is a Python-friendly, open-source Map-Reduce framework for distributed computing with the slogan “massive data – minimal code”. Disco’s core is written in Erlang, a functional language designed for concurrent programming, and users typically write Disco map and reduce jobs in Python. So what’s wrong with using Hadoop? Nothing, according to the Disco site, but…

“We see that platforms for distributed computing will be of such high importance in the future that it is crucial to have a wide variety of different approaches which produces healthy competition and co-evolution between the projects. In this respect, Hadoop and Disco can be seen as complementary projects, similar to Apache, Lighttpd and Nginx.

It is a matter of taste whether Erlang and Python are more suitable for the task than Java. We feel much more productive with Python than with Java. We also feel that Erlang is a perfect match for the Disco core that needs to handle tens of thousands of tasks in parallel.

Thanks to Erlang, the Disco core remarkably compact, currently less than 2000 lines of code. It is relatively easy to understand how the core works, and start experimenting with it or adapt it to new environments. Thanks to Python, it is easy to add new features around the core which ensures that Disco can respond quickly to real-world needs.”

The Disco tutorial uses the standard word counting task to show how to set up and use Disco on both a local cluster and Amazon EC2. There is also homedisco, which lets programmers develop, debug, profile and test Disco functions on one local machine before running on a cluster. The word counting example from the tutorial is certainly nicely compact:

A student in my programming languages class pointed me to the Python Challenge site. It looks like a great way for someone new to Python to test her skills and learn new ones.

It’s a riddle site in the style of notpron, but one where solving each riddle requires a little bit of Python programming. The solutions are entered by changing the URL of the current page to take you to the next riddle page. The problems are “designed to be solvable by Python newcomers and yet challenging even for Python experts.”