Tuesday, November 30, 2010

Zen of NumPy

While I was on-site working for a client, one of the developers I worked with would begin each day with a brief discussion of one of the tenets from the "Zen of Python." For those who are not familiar with this little pearl of Python goodness. You can find the "Zen of Python" as an Easter egg inside a Python distribution:

>>>importthis
The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

The Zen of Python is often quoted from one Python user to another in trying to communicate something of the essence of what makes programming in Python different. While we were discussing one of the points, one of my co-workers suggested that there should be a "Zen of NumPy". This isn't the first time I've heard that suggestion. Actually David Morrill (author of Traits) was the first person who suggested there should be a book about the "Zen of NumPy." I totally agree with him. The only problem is that everybody involved with NumPy has apparently been too busy to write one :-)

With this idea in my mind, when it came time to give a talk on NumPy at the New York Python Meetup group in Manhattan, I decided to create a first-draft of the Zen of NumPy. The phrases are included on one slide in the deck shared here.

I'm interested in feedback on these before proposing them for placement as

numpy.this

Here is my attempt at a "Zen of NumPy"

Strided is better than scattered
Contiguous is better than strided
Descriptive is better than imperative (use data-types)
Array-oriented is often better than object-oriented
Broadcasting is a great idea -- use where possible
Vectorized is better than an explicit loop
Unless it’s complicated --- then use numexpr, weave, or Cython
Think in higher dimensions

I think there are useful edits as well as more statements that could be added to this list. Your feedback is welcome.