Tag: software design

Since my earlier post on this subject, someone has since brought to my attention this blog post by James Bennett. James writes well, cutting straight through to the real issues at hand, but in some places I think his facts are incorrect, and in other places I draw different conclusions to the ones he draws.

First up, Unicode strings vs. byte strings. In fact, these are handled in almost exactly the same fashion in Python 2 and in Python 3; both languages have a type for storing strings of characters, and a type for storing strings of arbitrary bytes (including things like data read from a network socket, and the encoded form of character strings). In Python 3, the str type is for storing character strings, and the bytes type is for storing byte strings. In Python 2, the unicode type is for storing character strings, and the str type is for storing byte strings. That’s really the only difference; the Python 2 str type has some methods that the Python 3 bytes type doesn’t, but that’s a relatively unimportant difference. The real problem in Python 2 is that many people have used the str type to store character strings, when they really should have been using the unicode type; this includes things built into the language (like attribute names or function names), various stdlib modules, and vast oceans of third-party code.

What does Python 3 do to solve this? Well, not all that much, except for completely breaking everyone’s existing string-handling code; I guess James assumes that in the process of fixing all of their string-handling code, they’ll get things right this time around, but I’m somewhat less optimistic. Still, I think it is important to point out that Python 3 does *not* give you any additional tools for dealing with character / byte strings, nor does it make it any easier to work with them; at best, it just fixes some of the broken character / byte string-handling code that was being distributed with Python.

With that out the way, I’ll move on to the “different conclusions” part. First up, the “Death by a thousand cuts”; I know many programmers feel similarly about the myriad minor issues he mentions, but I’m simply not one of them. Sure, there are all sorts of minor annoyances, and they do start to add up over time, but they’re simply irrelevant compared with the big issues. I might spend two weeks out of a whole year dealing with them, as opposed to months of time spent working around the lack of production-quality libraries for certain tasks, or the lack of higher-level programming constructs requiring me to write pages and pages of lower-level code to solve a certain problem. I’ll admit that I used to find these minor issues a great annoyance, but over time, they’ve just faded away to background noise, just like much of the supposedly major differentiating factors between different libraries and different programming languages. Once you see the forest, you stop caring so much about the trees.

Speaking of libraries, the new standard-library reorganisation is all very exciting; but I would really have liked it if they’d spent the time and energy on actually improving the code to a level suitable for production applications. It really doesn’t matter how most of the standard library is organised, if you’re not going to be using any of it anyway. In addition, projects reorganise APIs *all the time*, and there’s a perfectly straightforward way to do it in a backwards-compatible fashion. You introduce the new API or new location of the API, deprecate the old one, and then eventually remove it. No Python 3-style chasm-of-incompatibility required.

Of course, some of the standard library changes are actual functional improvements, not just rearranging the deck chairs; I haven’t looked at it yet myself, but I’ll take it on faith that the new I/O library is a vast improvement over the old Python 2 I/O facilities. Except… you don’t need to break backwards-compatibility to introduce a new I/O library; and I assume it’ll be ported to Python 2 sooner or later. Indeed, this is a common trend in Python 3 improvements; all the really interesting functional improvements are stuff that can and most likely will be ported to Python 2, if it has not already been ported.

If Python was a brand new language, being developed from scratch with a brand new community, I would be very happy about all of the changes made in Python 3; but since it’s not, I must repeat my claim that aside from things that can be backported to Python 2 in the first place, absolutely none of the Python 3 changes are worth making the jump to what is essentially a whole new programming language.

If you're a Python coder, and you haven't been living under a rock for the past few years, then you've probably heard of a thing called Python 3000, or Python 3.0. Python 3.0 is presented as a "new version" of Python, but with a twist: it represents a complete backwards compatibility break with the Python 2.x series, in order to make a whole slew of backwards-incompatible changes with no direct migration path. Thus, in some sense, this is actually a brand new language that just looks a lot like Python 2.x.

Is this a problem? Superficially, it doesn't "feel" like, say, trying to migrate from Java to C#; after all, it's just Python with a few cleanups and new features, right? Unfortunately, any non-trivial codebase is going to be riddled with code that is affected by the migration, and so the effort involved is still quite substantial. The problem runs deeper than that, however; code doesn't exist in a vacuum, it exists in a community of other code. All of your Python code has to run in the same runtime, so if you port your code to Python 3.x, you need all of the libraries / frameworks that you depend on to be available for Python 3.x. Of course, if you port your library to Python 3.x, people still using Python 2.x can't use any newer versions of your library anymore, so you either leave those users completely stranded, or you have to maintain two versions of your library simultaneously; that's really not much fun.

The Python developers have a solution to this problem: a tool called 2to3. 2to3 attempts to automatically convert Python 2.x code to Python 3.x code. Obviously, it can't do the right thing in every case, so the idea is that you modify your Python 2.x code so that it remains valid and correct Python 2.x code, but is also in a form that 2to3 can automatically convert to valid and correct Python 3.x code. Thus, you can simply maintain your Python 2.x codebase, generating the Python 3.x version, until one day you decide you no longer need Python 2.x code; at this point, you would run 2to3 over your codebase one last time and commit it, thus dropping Python 2.x forever.

Sounds great, right? Unfortunately, 2to3 is still in a fairly immature state, with lots of bugs and issues. In addition, there are many situations where 2to3 simply can't do the right thing. For example, in code that handles strings correctly, any use of the str type would simply be converted to use of the bytes type, and unicode would become str. However, a lot of code incorrectly uses the str type for strings of text, or because it uses APIs that are incorrectly written; in these cases, use of the str type might need to be converted to the Python 3.x str type, and 2to3 is never going to be able to figure this out on its own.

In the end, it's not yet clear that 2to3 will be a viable migration strategy, which opens the door to the possibility that we will see a schism in the Python ecosystem; with some people clinging to Python 2.x, while other people working in Python 3.x, isolated from their 2.x counterparts. Fortunately, the community has a lot of time to think about this: the 2.x series will probably continue until at least 2.7, if not beyond that. I'm personally still on Python 2.4 (my production servers run Debian stable), so I've got at least another 3 versions to go, which is quite a lot of time. If a viable migration strategy hasn't emerged by then, then I'm hoping that I'll either be able to switch from CPython to PyPy, or I'll already have made the jump to greener pastures, and thus not have to worry about Python at all.