Wednesday, January 16, 2008

Shed Skin 0.0.26

After my 'rant' about not getting much help with Shed Skin, I received a lot of feedback on how to improve packaging and usability in general. Several people even started helping out with this :-) Most notably, Paul helped me to create a Debian package, and James wrote a tutorial, which has by now replaced the README. Thanks guys!!

Shortly after my rant, I somehow also got involved in the Google GHOP project, in which high school students are paid to help out with open source projects. Some of these students turned out to be a real help, and one of them even managed to add complete 're' support to Shed Skin, using PCRE (the Perl Compatible Regular Expressions library; thanks, Cyril, for the suggestion). This will be available in the next release!

All of this, in turn, inspired me to add (almost) complete support for os.path and collections.defaultdict (deque was already supported). The combined result of all this should be a much more 'usable' project, although many things can still be improved, of course. For the next release, I am planning on splitting up the compiler core (ss.py) and adding support for time, datetime, socket and, as mentioned, re.

When all this works well, I think the time has finally come to have another look at type inference scalability. Two ideas that stand out in my mind are iterative deepening (restart after an increasing number of iterations, each time combining everything that was learned) and selector-based filtering (basically, the idea is that for a method call x.bleh(), obviously x can only be of a type that has a bleh method, though it can be generalized further, e.g. for x=y).
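To make the selector-based filtering idea a bit more concrete, here is a minimal sketch in plain Python. It is not taken from Shed Skin's source: the classes, the function names (filter_by_selectors, propagate_through_assignments) and the data layout are all hypothetical, chosen only for illustration. The second function shows one possible reading of the "x=y" generalization, assuming variable types are tracked flow-insensitively: method names used on x also constrain whatever is assigned to x.

```python
# Hypothetical sketch of selector-based filtering; not Shed Skin's actual code.

class Duck:
    def quack(self):
        return "quack"

    def walk(self):
        return "waddle"


class Robot:
    def walk(self):
        return "clank"


class Stone:
    pass


def filter_by_selectors(candidates, selectors):
    """Keep only the classes that define every method name in `selectors`."""
    return {
        cls for cls in candidates
        if all(callable(getattr(cls, name, None)) for name in selectors)
    }


def propagate_through_assignments(selectors_by_var, assignments):
    """Push selectors backwards through assignments until nothing changes.

    `assignments` is a list of (target, source) pairs, one per `target = source`.
    Assumption: a method called on `target` must also exist on `source`.
    """
    merged = {var: set(names) for var, names in selectors_by_var.items()}
    changed = True
    while changed:
        changed = False
        for target, source in assignments:
            needed = merged.get(target, set())
            have = merged.setdefault(source, set())
            if not needed <= have:
                have |= needed
                changed = True
    return merged


if __name__ == "__main__":
    candidates = {Duck, Robot, Stone}
    # x.walk() and x.quack() both appear, so x is narrowed accordingly.
    narrowed = filter_by_selectors(candidates, {"walk", "quack"})
    print(sorted(cls.__name__ for cls in narrowed))
    # x = y, and x.bleh() is called somewhere: y inherits the constraint.
    print(propagate_through_assignments({"x": {"bleh"}}, [("x", "y")]))
```

The appeal of such a filter would be that it can shrink the set of candidate types before the more expensive inference iterations run; how it would interact with the rest of the analysis is left open here.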

Anyway, getting back to the subject, I have just released Shed Skin 0.0.26. It took me a while to get there, but this is mostly because it has many improvements:

My own reason why I've not followed Shed Skin can be summed up quite easily:

"Shed Skin will only ever support a subset of all Python features."

I don't want a python subset - that's effectively just C++ using python's syntax. What I want is a full implementation (even if the stdlib is incomplete), with an optimizer that only works on a subset.

That would let me use it on arbitrary python programs, and if the performance isn't good enough I could tweak them so that the optimizer likes them better. Or I could improve the optimizer.

I think having a good polyvariant type inferencer is the foundation of all that. I've not seen indication that yours is viable, which says to me it's not worth pursuing the rest of the project.

yes, that's exactly what I want with shed skin, to be able to use Python as a thin layer above C++. it is not meant to be useful for everybody and every program. if you don't like this approach, why are you wasting my and your own time here? use psyco, pyrex, cython, pypy or whatever.

I think I've clearly demonstrated my type inferencer is promising (see ss-progs.tgz for 27 non-trivial programs that work). and I still know many things that can be improved. for large programs, one can always add type profiling (making type inference dramatically easier) or only compile parts as extension modules.

"I don't want a python subset - that's effectively just C++ using python's syntax. What I want is a full implementation (even if the stdlib is incomplete), with an optimizer that only works on a subset."

I'm not so fussy. I think that a lot of Python features end up as the playthings of people who would rather write a load of dynamic machinery than get to grips with solving real problems (whilst accusing Java developers of writing monstrous framework code, I imagine).

"I think having a good polyvariant type inferencer is the foundation of all that. I've not seen indication that yours is viable, which says to me it's not worth pursuing the rest of the project."

How nice to rain on someone else's parade! What the Python community needs is people exploring different avenues and pointing out where improvements in both performance and effectiveness (merely in terms of being able to write better software in the language) can be made. If people decide that a solution isn't worth using, then at least they've had the opportunity to consider it and to learn from what it did and why it wasn't for them.

Telling people who (for all you know) could have something to contribute to collective wisdom on a topic that their work isn't worth pursuing just seems petty to me, unless there are some real insights involved. It's like the whole "lesser Python" business where someone deigns to tell everyone that a whole category of work isn't worthwhile: hardly a fine way of encouraging people to do anything outside the conservative mainstream.

I only meant to provide some perspective on why it hasn't gotten as much interest as it perhaps should have: the first impression of "a faster python" is undermined when people see the limitations. If the suitable uses were listed up front (e.g. extension modules), then it might be seen in a more positive light.

Reading through your blog, I really understand your frustration. I have a long history of creating things at least three years before the world is ready for them.[1] My current project is an antispam system[2] based on experiential reputation and the reputation enhancement properties of a proof-of-work puzzle. It would be interesting to see how much Shed Skin could speed up 2Pennyblue. After I release the version I'm working on, I may just give it a go.

As for some of the other complaints, I see Shed Skin development as an evolutionary process. As resources permit, you'll be able to tackle more and more cases where you can generate code versus just running the existing Python bytecode. Even if you get no further than C++ in Python, that's a win! Yes, it may change how you write Python to take advantage of the performance gain, but it's better to have the gain when you need it in any form than to not have it at all. Don't let the shortsighted get you down. My only concern is that there may be some subtle (or not so subtle) differences from the reference CPython.

You're doing good stuff. I'm looking forward to trying it out.

---eric

[1] And speaking of which, if you know of anyone who would like to volunteer to write some JavaScript help for an open-source writers' project, let me know.

[2] 2pennyblue.org: don't believe everything you read on the website. I haven't fully updated it (mid-February 2008). It originally used proof-of-work systems as an indicator that you were a "good person". Proof-of-work systems have the advantage that they are extremely deterministic, and you can use that property to determine where they will succeed and where they will fail based on the resources of the attacker. Between 1998 and June 2007 they were a successful defense against spam zombies. But recent developments in botnets have dramatically increased the amount of resources available to an attacker. I'm still skeptical about the claim that they could be used for spam, because one would have to ask why we don't have a lot more spam than we do. Playing it safe, I shifted focus to the reputation properties of proof-of-work systems and a variable configuration filter. The end result is that proof-of-work systems can now be used to guarantee delivery of messages for sites with a good reputation but become a very high barrier for those with a bad reputation. I'll stop here. I can be a horrible bore on the topic.