Wednesday, June 13, 2007

Python, IronPython, Apples, and Oranges

While Fuzzyman is over at the voidspace, talking about how great it is that, in IronPython, str and unicode are the same thing, I'm over here getting more worried every day about the segmentation of Python and IronPython.

IronPython is a new implementation of the Python ... maintaining full compatibility with the Python language.

They should go ahead and drop that last qualifier. I want to make something very clear: I absolutely hate writing this post. The IronPython project is really great, and I've been impressed by what it has done and by Microsoft's embrace of the language. Admiration does not trump worry in this case. A number of issues make IronPython simply not Python. I've been raising this issue more and more recently, so it is about time I wrote about it at moderate length.

In IronPython, str is unicode

Now, it may be true that Python plans to drop the current behavior, make str unicode, and add a separate type specifically for dealing with byte strings (See PEP 358). However, that is not the case yet, and jumping the gun and making str and unicode the same type is an absolutely incorrect non-solution. This is not just a matter of taste, but a situation where IronPython is absolutely wrong. I can make two arguments against this.

IronPython does not encode or decode between str and unicode

One of the most important issues in dealing with unicode is the difference between unicode strings of text and encoded strings of text, that is, bytestreams containing encoded text, which may be decoded into understandable unicode (Joel has covered all this). IronPython inherently cannot do this. A str with a non-ASCII "byte" cannot be decoded by Python if you don't tell it the encoding being used. This is no flaw, it is the law. IronPython, effectively having no str type, just assumes the bytes of 128 and over are the corresponding codepoints. There is no encoding anywhere, in which this is the correct behavior. That's right. They just give you a known bad result, and let it go.
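The distinction can be seen in a few lines. This is only a sketch, and it is written in Python 3 spelling, where the byte string type is called bytes and the text type is called str (in the Python 2 of this post those are str and unicode). The sample data and variable names are made up for illustration. It compares decoding with a stated encoding against treating each byte as a codepoint, which is how the behavior is described above.

```python
# Python 3 spelling: bytes is the byte string, str is the unicode text type.
raw = b"caf\xc3\xa9"  # illustrative data: the UTF-8 encoding of "café"

# Decoding with the encoding stated explicitly recovers the text.
decoded = raw.decode("utf-8")

# Without knowing the encoding, the non-ASCII bytes cannot be interpreted;
# the ASCII codec refuses rather than guess.
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Treating every byte as the codepoint with the same number never fails,
# but for this UTF-8 data it silently produces two wrong characters.
naive = "".join(chr(b) for b in raw)

print(repr(decoded), ascii_failed, repr(naive))
```

The last line is the worrying case: no error is raised, and the result is one character longer than the text that was actually encoded.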

When There Is No Bytestring, You Have to Look Elsewhere

So what happens when you truly need to work with byte strings in IronPython, which pretends byte strings are unicode strings? Well, you have to look elsewhere. The entire .Net API is at your fingertips, so look no further than System.Byte and System.Array. Sounds easy, but the danger here should be obvious. Any Python code assuming, correctly, that str is a byte string type is subject to implosion within IronPython, and any IronPython code "properly" handling byte data simply can't be imported outside IronPython at all.

Language and Library

Does syntax alone make a language? Maybe one day it could, but those days died out. Python is far more than its clean, beautiful syntax. The packages that come in the standard library provide even more value. As a foundation for all the software built on top, these packages are fundamental to the success of Python. Yes, your code looks beautiful all on its own, but all on its own it does not have an embedded database, configuration parser, and mail and web servers. Right there you have a basis for a huge number of applications, without even leaving the language's vanilla installation.

IronPython does not include any of these, so if you write software using them, don't expect it to run on the .Net runtime just because IronPython claims compatibility. You can probably access all the same facilities, but you have to do so through the .Net APIs of similar facilities. I am not even sure that the same facilities are provided there. The sad fact about a lot of this is that many of the libraries not included in IronPython would actually work perfectly, without change, if they were included in the distribution.

Because of this, we have to resort to things I consider terrible, like two different Python scripts, both doing some basic HTTP downloads, and both being completely incompatible because they rely on entirely different APIs: IronPython through .Net APIs and the real Python through urllib2 or httplib.
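One way to picture the split is a small shim that hides the two APIs behind one function. This is only a sketch under assumptions: make_fetcher and fetch are hypothetical names, the standard library branch uses the Python 3 module name with urllib2 as the Python 2 fallback, and the .Net branch assumes System.Net.WebClient is importable, which is only true under IronPython.

```python
def make_fetcher():
    """Return (backend_name, fetch) using whichever HTTP API can be imported."""
    try:
        from urllib.request import urlopen  # Python 3 standard library
    except ImportError:
        try:
            from urllib2 import urlopen  # Python 2 standard library
        except ImportError:
            urlopen = None  # no standard library HTTP client shipped

    if urlopen is not None:
        def fetch(url):
            # Returns the response body as a byte string.
            response = urlopen(url)
            try:
                return response.read()
            finally:
                response.close()
        return "stdlib", fetch

    # Assumption: we are on IronPython without the standard library,
    # so the .Net class is the only HTTP client available.
    from System.Net import WebClient

    def fetch(url):
        # DownloadData returns a .Net byte array, not a Python byte string,
        # so callers still see a different type on this branch.
        return WebClient().DownloadData(url)
    return "dotnet", fetch


backend, fetch = make_fetcher()
print(backend)
```

Even with the shim, the two branches hand back different types, so the byte string problem described earlier comes along with it.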

Conclusion

IronPython takes the syntax, but stops short of the language. The problem is one for both Python and IronPython lovers. In Python land, we're seeing what appears to be an influx of interest from the IronPython (also, via Silverlight) world, but all those new developers are creating completely incompatible code. IronPython advocates, on the other hand, look silly to think they are promoting the Python language, and are completely missing out on hundreds of great libraries, years of built up community, and synergy that isn't just a buzzword.

10 comments:

So you have the same problems with pywin32, py2exe, jython or indeed any platform specific module or implementation? (except you don't seem to devote so much energy to railing against these ;-)

I also think that you underestimate the value of having Python on a new platform. It isn't "just syntax", but the whole semantics of the Python language, which really takes the pain out of programming on the .NET platform. This is very valuable - and the .NET framework is pretty rich, so being able to use Python there is a good thing!

Also, ConfigParser seems to work fine with IronPython - and possibly some of the other modules you mention. Did you *try* them at all?

To make it clear - at Resolver we have a 'large' IronPython application, which uses many modules from the Python standard library as well as third party Python libraries - and it works *great*.

Sure, there are some issues. Efforts to create cross-compatibility layers would be (MUCH) more effective than complaints!

By the way, urllib and urllib2 already work with IPCE and will soon work with IronPython. Part of the problem is that these modules (as well as other parts of the standard library) rely on undocumented features or even implementation details.

The inspect module decompiles bytecode - how is this ever going to work on another implementation? (An unpatched inspect.getargspec works on neither IronPython nor Jython - and that is the fault of the Python standard library, not these implementations.)

Some of the problems highlight things that need to change in Python...

It is good to see a large corporation get publicly behind Python but this is the same Microsoft that tried to wrest control of Java from Sun with its own slightly different flavour of "Java for Windows from Microsoft". It would be good to see Microsoft make all the right noises about IronPython and seeking compatibility - then back it with actions of course :-)

> I don't agree with Calvin. Having IronPython, Jython, PyPy attempt to perfectly duplicate the CPython would *harm* Python as a language.

> The community is learning, slowly, exactly what we mean by "str", "unicode", "bytes". These bugs are *good*, they are opportunities to learn.

> I believe Guido explicitly said that he was learning about unicode issues "in the wild" for Python3k from Jython and IronPython.

> "str", "unicode", "bytes" mean different things than they did three years ago. They just do.

Now, replying to your article:

First of all, Python, CPython, IronPython, Jython, PyPy are all changing entities. As changing entities, we can only legitimately criticize their trajectories. I see zero evidence that IronPython will "fork" the Python community. So much activity is being spent to make the Python library run in IronPython, in the FePy community and in the Microsoft-sanctioned community.

> IronPython takes the syntax, but stops short of the language.

> IronPython advocates, on the other hand, look silly to think they are promoting the Python language, and are completely missing out on hundreds of great libraries, years of built up community, and synergy that isn't just a buzzword.

Pure FUD. Nothing else to call it. Unless you possess evidence that forking the community is one of the goals of the IronPython project, how could you possibly defend these statements? Surely you aren't suggesting that they should have solicited your permission before they announced a "1.0" version?

> I really want this to all work out. IronPython, can we get along?

Let me answer your question with another question. What are you, personally, willing to do, to get along? If "getting along" is the goal, why does the full weight of the effort fall on those that choose to code in IronPython?

Is it "getting along", or is it "do what I want"?

In reality, Python is in flux. This is a feature, not a bug.

Python design features are being formed the correct way: they are informed by the different implementation decisions of different implementations of Python, and then collecting "in the wild", "real-world" best practices.

It sounds like the ongoing approach is for Python language implementations to be as different as the library implementations.

Jython's different destructor/garbage collection mechanism seemed to set this tone early on.

But when these differences trickle all the way up to fundamental data type/STRING implementation differences per platforms, you've got a powerful point.

Python will continue to be the cross-platform problem-domain solver, but like C, the cross-platform compiler, it will be riddled with underlying mechanism riddles. Python: Garbage collection, C: Struct alignment.

I never said Python was perfect, so obviously issues where implementation details are relied on or are the basis for a module (like bytecode decompiling) need to be fixed. The modules that rely on implementations need to be fixed, and the ones that center on implementations need to be delegated in usage and marked, maybe renamed with an underscore to denote their status as internal.

Furthermore, you either missed or ignored the part where I did say that these modules do work in IronPython most of the time, and that my issue is with their lack of inclusion, not their lack of working. The sad fact about a lot of this is that many of the libraries not included in IronPython would actually work perfectly, without change, if they were included in the distribution.

I don't want to argue with you more, so I'll stop attacking the issue and focus on promoting repair of the problems I see. But, they had to be mentioned first.

This rant is about ten years late: the same issue applies to Jython. I'd accept that this can be confusing, though, and that Jython has lacked a distinct "plain" or "byte" string type, disregarding byte arrays, of course, which is what you'd use in Java.