Python sadness

Python was apparently named after Monty Python’s Flying Circus, but just about every logo associated with it uses a snake. You may consider this the 1st of several fundamental inconsistencies in the language.

This post is inspired by the very hilarious PHP Sadness. I recently completed Codecademy’s Python track, and am full of high praise for the language. Python’s primary strengths are:

Syntactical simplicity

Can be both compiled and interpreted, which makes it very flexible

Large open source ecosystem built around it

List comprehensions

First class support on all platforms

The biggest of these is #1, and it’s the reason I think Python should be the 1st language anyone learns. The simple syntax elucidates a lot of complicated concepts (e.g. objects, classes, functional programming, etc.) that other languages’ syntaxes obfuscate.

Bear in mind that most of my programming experience is in Mathematica, Fortran, and Matlab, all of which are very consistent languages. Also, I am by no means a Python expert, so if anything I say here is wrong, be sure to correct me.

That said, there are a few things about Python that make me sad:

Incompatibilities among different versions

Python 3.* is not backwards compatible with 2.*. Both version numbers are maintained simultaneously, so that there are 2 current Python versions (really?). Oh yeah, and code written in version x.n might not work in version x.n+y – where x, n, and y are integers. ~~Fascinating~~ Horrifying stuff, really.
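The canonical example of the 2-to-3 break is print, which went from statement to function – and integer division changed too. A quick sketch (the Python 2 forms are shown as comments, since they won’t even parse in Python 3):

```python
# Python 2:  print "hello"    -- statement syntax; a SyntaxError in Python 3
# Python 3:  print("hello")   -- function syntax

# Integer division also changed silently between versions:
# Python 2:  3 / 2  ->  1    (floor division)
# Python 3:  3 / 2  ->  1.5  (true division; use // for the old behavior)
print(3 / 2)   # 1.5 under Python 3
print(3 // 2)  # 1
```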

Inconsistent % operator semantics

When used between integers, % is the modulus function, meaning a % b returns the remainder of dividing a by b. When used with strings, however, % does something entirely different. "Hi %s name %s" % ("my", "is") produces "Hi my name is". There’s nothing wrong per se with that, but it’s completely different from the integer functionality. An operator shouldn’t mean wildly different things depending on the data type it’s used with.
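The two behaviors side by side, for concreteness:

```python
# Between integers, % is the remainder (modulus) operator:
print(7 % 3)  # 1

# Between a string and a tuple, % instead does printf-style formatting:
print("Hi %s name %s" % ("my", "is"))  # Hi my name is

# Same symbol, entirely different operation, decided by the left operand's type.
```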

Function notation varies based on the data type of the argument

Most Python functions use prefix notation, e.g. f(x). Some functions, such as sort() and upper(), use postfix notation, e.g. x.f(). While this is consistent with Python’s class methods approach – sort() is a list method and upper() is a string method – I find it unusual compared to the other languages I’m used to, in which all functions have the same notations available to them.*
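A quick illustration of the two notations living side by side:

```python
# Prefix (function) notation:
print(len("hello"))       # 5
print(sorted([3, 1, 2]))  # [1, 2, 3]

# Postfix (method) notation -- similar operations, different syntax:
print("hello".upper())    # HELLO
nums = [3, 1, 2]
nums.sort()               # a list method; sorts in place and returns None
print(nums)               # [1, 2, 3]
```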

range() and list/string indices have inconsistent/confusing semantics

In Python, range(x,y) – where x and y are integers – gives you a list that begins with x and ends with y-1. range(y) gives you a list from 0 to y-1. I’m guessing the semantics of range(x,y) is a list starting at x of length y-x, where x = 0 if it’s not an argument. This is incredibly convoluted compared to simply having range(x,y) return a list of integers from x to y inclusive and range(y) return a list from 1 to y.
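The half-open behavior described above, concretely (wrapped in list() so it prints the same under Python 3, where range() is lazy):

```python
# range(x, y) runs from x up to, but not including, y:
print(list(range(2, 6)))  # [2, 3, 4, 5] -- length y - x = 4

# range(y) is the same thing with x defaulting to 0:
print(list(range(4)))     # [0, 1, 2, 3]
```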

Starting indices at 0, while mathematically correct, leads to oddities when attempting to retrieve an item at the end of a list using negative indices. For example, for stringB = "abcdef", stringB[0] returns "a", but picking up the last item – "f" – counting from the end of the list requires stringB[-1]. If list indexing started at 1 as it does in the real world – as opposed to computer science theory – then indices of 1 and -1 would pick up the first and last items in lists, respectively.
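The asymmetry in one snippet: counting forward starts at 0, but counting backward starts at -1:

```python
stringB = "abcdef"
print(stringB[0])   # a -- first item is index 0
print(stringB[5])   # f -- last item is index len - 1
print(stringB[-1])  # f -- but counting from the end starts at -1, not -0
```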

The obvious solution to all of this is to make Python count like people in the real world do: indices should start at 1, and range() should take inclusive endpoints as arguments.

.sort() works in place

I subscribe to the belief that functions should NOT change their arguments directly, as you risk destroying original data held in memory. The argument that mutating in place prevents machines from crashing by running out of memory is silly.
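For contrast, Python does offer both behaviors – list.sort() mutates its receiver (and returns None), while the built-in sorted() leaves the original alone:

```python
original = [3, 1, 2]
result = original.sort()  # mutates the list in place, returns None
print(original)           # [1, 2, 3] -- the original ordering is gone
print(result)             # None

data = [3, 1, 2]
copy = sorted(data)       # returns a new sorted list instead
print(data)               # [3, 1, 2] -- untouched
print(copy)               # [1, 2, 3]
```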

Operating on dictionaries can produce unordered output

This means that any time you loop through a dictionary, you will go through every key, but you are not guaranteed to get the output in any particular order. I can’t imagine how difficult this makes troubleshooting operations on large dictionaries.
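A sketch of the usual workaround: sort the keys yourself so the traversal order is deterministic regardless of what the dictionary does internally (the key/value data here is made up for illustration):

```python
d = {"banana": 2, "apple": 1, "cherry": 3}

# A plain loop over d visits every key, but in the Python versions of
# this era the order was not guaranteed. Sorting the keys explicitly
# gives a deterministic traversal on any version:
for key in sorted(d):
    print(key, d[key])  # apple 1, banana 2, cherry 3
```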

Despite all of the above, I maintain that if you learn only 1 programming language, Python should be it. Unless you’re writing numerical solvers, in which case Fortran should be your only choice 😛

*Mathematica is an extreme example of this: f[x], f@x, and x//f all have the same effect

6 thoughts on “Python sadness”

The complaint regarding ‘%’ is absurd, since the string interpolation usage has been deprecated for years now. New code should _always_ do: “Hi my name is {}”.format(name)
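A minimal sketch of the replacement the commenter describes (the name value is made up for illustration):

```python
# str.format avoids overloading the % operator entirely:
name = "Guido"
print("Hi my name is {}".format(name))  # Hi my name is Guido
```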

Indexing starting at zero has existed since nearly the beginning of computer science. Get over it. range() output is designed such that you would have to go out of your way to produce an off-by-one error, that’s the whole point.

I agree somewhat about in-place vs returned values and have been bitten by that inconsistency in the past. In general, though, the vast majority of methods return changed values.

Complaining about dictionaries changing order is also absurd. They are unordered by definition. If you need to preserve order for some reason, use collections.OrderedDict.

Tell that to the people who write online training courses. The Codecademy track that I took appears to be based on Python 2.*, so maybe that’s why they included it?

Indexing starting at zero has existed since nearly the beginning of computer science.

1) In technology, precedent != correctness.
2) Quite a few languages index from 1.
3) Indexing from 0 makes indexing from the end of the list awkward and inconsistent.
4) How would you get an off-by-one error with range(x) returning a list of numbers from 1 to x? That list would be exactly x elements long, which is exactly what you asked for. range()’s behavior in Python serves only consistency with indexing from 0 in the first place, nothing else.

Half-closed intervals with zero-based indexing are a consistent system. The problem with closed ranges is that the range [x,y] contains y-x+1 integers, leading to ugly and error-prone -1s and +1s all over the code. If you adopt the half-open range [x,y) (and there are reasons to pick that over (x,y]), it is natural to start indexing at 0, so that you get a range with n elements by [0,n).

As a practical example, consider operating on chunks of a list, with each chunk having length n. In Python, this works like the following:
    for p in xrange(0, len(xs), n):
        f(xs[p:p+n])
If you had used closed intervals, you would necessarily have a -1 there. Having done some video processing in Matlab, I can say this quickly gets out of hand, requiring constant thought about whether you need -1, +1, or neither.

As for the negative indices, you can always use len(x)-p if you find them confusing. Then you can note that -p is just a shorthand for len(x)-p. Although to be honest the negative indices in Python are a little inconsistent, and I haven’t personally found much use for them.
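The shorthand equivalence the commenter mentions, spelled out:

```python
xs = "abcdef"
p = 1
# xs[-p] is just shorthand for xs[len(xs) - p]:
print(xs[len(xs) - p])  # f
print(xs[-p])           # f
```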

😛 I guess there’s really no free lunch in this situation. Either a language’s range function is intuitive – in the sense of being easier to grasp/understand – at first but requires deep thinking later (closed ranges), or it’s nonintuitive at first but simplifies later computation on a given range (half-closed ranges). I subscribe to the former philosophy: namely, that a simple operation should always be more intuitive than a more complicated one, just as 1 + 1 = 2 is more intuitive than 5 x 3 = 15. Ergo closed ranges work for me because creating the range is more intuitive than operating on the range later, whereas the reverse is true with half-closed ranges.

This dovetails nicely with my opinion that Fortran is the easiest language to pick up for people with limited formal programming training who find themselves needing to crunch numbers.