Can I suggest that something like os.fork() (except implemented directly in CPython) would be incredibly great? Shared memory, fast interpreter creation, sharing special kinds of objects over explicit channels (e.g., open files or sockets)... that would be wonderful.

It's all well and good to say something should be done, but it's unrealistic to think Guido or others are suddenly going to leap on the idea after years of discussion and at least one proof against the concept.

This is the open source world where I think "put up or shut up" is a good standard to live by. In other words, if you have a good idea, bring it up when you're ready to actually work on it, write and release actual code yourself, try things out, and get others to test and contribute ideas and code. Then you're on your way to proving the concept and refining things until eventually your code could be accepted as a patch.

But personally, I wouldn't bother with this. I think threading is a bad idea for most things and that the GIL's advantages outweigh its disadvantages most of the time.

> We do need some kind of solution, but it probably
> shouldn't be threads. I think a process-based approach is
> probably best. I'd like to see if it's possible to, from
> within one cpython instance, easily start up a second one
> in a different process and easily communicate between
> them. Then you could use an agent system and the
> programming would become very easy and safe, while
> effortlessly making use of multiple processors. And no GIL
> removal would be necessary.

I very much agree. Having each interpreter communicate is much better than having them share memory. Sharing means locking, and locking is expensive.
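For illustration, a share-nothing worker of the kind described above can be sketched with the multiprocessing module; the function and variable names here are my own invention, and the point is only that the two interpreters exchange messages rather than objects:

```python
from multiprocessing import Pipe, Process

def worker(conn):
    # Receives work items over the pipe and sends results back;
    # the two interpreters share no objects, only pickled messages.
    while True:
        item = conn.recv()
        if item is None:  # sentinel: shut down
            break
        conn.send(item * item)
    conn.close()

def square_in_subprocess(values):
    parent_end, child_end = Pipe()
    proc = Process(target=worker, args=(child_end,))
    proc.start()
    results = []
    for v in values:
        parent_end.send(v)           # serialized, not shared
        results.append(parent_end.recv())
    parent_end.send(None)            # tell the worker to exit
    proc.join()
    return results
```

No locking is needed around the data itself, because neither interpreter ever sees the other's objects; the pipe is the only point of contact.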

What if we make changes so that you could have multiple interpreters each in its own thread, sharing nothing? Each interpreter can keep its own GIL, and so there needs to be no fine-grained and expensive locking. There should be no performance change for the single-thread case.

This would work well for systems where process creation is expensive, or for embedding into programs that already start threads and would want to have a Python interpreter in many threads.

With a distributed object mechanism, calls could be made to objects in other threads, or serialized objects can be sent between threads.

A good remote-object-call mechanism that works between processes would also work between threads. Then end users could have the choice of using threads or processes to spawn off more interpreters.
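As a rough sketch of what such a remote-call mechanism might look like (the registry and helper names are invented for illustration), the caller sends a (name, args) message and blocks for the reply; the same protocol would work whether the other end lives in a thread or in a process:

```python
from multiprocessing import Pipe, Process

# Hypothetical table of callables one interpreter chooses to expose.
EXPORTS = {
    "add": lambda a, b: a + b,
    "upper": str.upper,
}

def serve(conn):
    # Dispatch loop: receive (name, args), invoke, reply with the result.
    while True:
        request = conn.recv()
        if request is None:  # sentinel: shut down
            break
        name, args = request
        conn.send(EXPORTS[name](*args))
    conn.close()

def remote_call(conn, name, *args):
    # Synchronous remote call: serialize the request, block for the answer.
    conn.send((name, args))
    return conn.recv()
```

Only names and serialized arguments cross the boundary, so the calling side never needs to lock the remote interpreter's state.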

I realize that this would place a new requirement on extension writers to lock or make extension globals thread-local. With Py3K coming out, this might be a good opportunity to suggest small changes in the way extensions are written for the new era.

It sounds like there just need to be a few things cleaned up in the thread state for the interpreter, in terms of shared small integers and single character strings.

> What if we make changes so that you could have multiple
> interpreters each in its own thread, sharing nothing? Each
> interpreter can keep its own GIL, and so there needs to be
> no fine-grained and expensive locking. There should be no
> performance change for the single-thread case.

Unfortunately, there are many data structures currently shared between interpreters, e.g. obmalloc (our custom super-fast small-block allocator), and immutable singleton objects like 0- and 1-char strings, the empty tuple, None, and all built-in exceptions, functions and classes. Having each interpreter have a separate None would require quite a bit of change in the VM.

> Unfortunately, there are many data structures currently
> shared between interpreters, e.g. obmalloc (our custom
> super-fast small-block allocator), and immutable singleton
> objects like 0- and 1-char strings, the empty tuple, None,
> and all built-in exceptions, functions and classes. Having
> each interpreter have a separate None would require quite
> a bit of change in the VM.

All of those objects would be safe to share anyway, wouldn't they? Being immutable, they don't need locking, do they?

> I insist, why not incorporate in the language distribution
> the "parallel python" module ? It is something that works,
> is stable and can distribute work load to n-processors,
> even on other computers.

I second that!

Using PP is extremely easy and it follows the ease of use Python is well known for. Only one thing is missing for parallel processing along the lines of this thread's discussion: thread/process communication. For something to go into Python there needs to be some mechanism to communicate between the processes (as in the case of PP). Maybe something along the lines of channels as done in Stackless, maybe something like what ProActive offers in the Java world, ...
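A minimal sketch of a Stackless-style channel between threads might look like the following. (A real Stackless channel is an unbuffered rendezvous where sender and receiver meet; this approximation, with names of my own choosing, uses a capacity-one queue instead.)

```python
import queue
import threading

class Channel:
    # Approximates a Stackless channel: send() blocks while the
    # slot is occupied, receive() blocks until a value arrives.
    def __init__(self):
        self._slot = queue.Queue(maxsize=1)

    def send(self, value):
        self._slot.put(value)

    def receive(self):
        return self._slot.get()

def produce(ch, items):
    # Feed each item through the channel, one at a time.
    for item in items:
        ch.send(item)
```

A producer thread calling `send` and a consumer calling `receive` then communicate without sharing any mutable state beyond the channel itself.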

So far PP is more along the lines of submit and retrieve (final) results. But I just love it.

> All of those objects would be safe to share anyway,
> wouldn't they? Being immutable, they don't need locking,
> do they?

Well, their reference counts still change, so you'd have to use a thread-safe reference-count update macro *everywhere* (since Py_INCREF(x) doesn't know whether x could be None or not). Also, some objects have other invisible state; e.g., PyUnicode objects keep an internal reference to their PyString rendition. You don't want to leak that.

All of this may not be insurmountable, but once it's all done I'm not sure I'd recognize the Python/C API, and extension writers would have to start from scratch (more so than with py3k I expect).

Including PP is a start. Channels, as in Stackless, are another (maybe complementary) step. Saying "use os.fork()" is not a good answer. What I mean is that we should have a good answer to threads in the core of Python, not a do-it-yourself approach (which defeats the batteries-included motto).

Maybe the only real answer is not using refcounting at all: have a really good garbage collector and a lesser dependency on C libs, which means PyPy... but that is going to take some time still.

All in all, what most people are saying is that almost all new computers have more than one core (or processor), and that Python should be prepared for that; the point is not that Python is slower than Java and that this change would make it faster.

> We do need some kind of solution, but it probably
> shouldn't be threads. I think a process-based approach is
> probably best. I'd like to see if it's possible to, from
> within one cpython instance, easily start up a second one
> in a different process and easily communicate between
> them. Then you could use an agent system and the
> programming would become very easy and safe, while
> effortlessly making use of multiple processors. And no GIL
> removal would be necessary.

For what it's worth (not much), I completely agree with Bruce (and the others who have voiced similar opinions here). A standard built-in module for doing inter-process communication and spawning such processes would be awesome. Sure, you can already do this stuff with any number of third-party modules, or hack up your own specialized solution, but the need for easy multi-core processing is such a fundamental thing with today's hardware that it would be a shame for Python 3.0 not to have such capabilities out of the box. The complaints are only going to get louder and more frequent on this topic going forward.

That being said, I love Python and have great respect for all the Python developers and the amazing things they have given us all for free. I am not in a position to champion such an addition to Python (I don't have the knowledge, talent, or ambition :P ), so I will just sit back and keep my fingers crossed that one day it will happen! Until then, mpi4py 4me.

> Thanks for the support. I have said numerous times that I
> don't want Python development to be driven by e.g.
> Ruby-envy. (Ruby BTW has a GIL too.)

I don't 'speak' Ruby, so no envy there. :-) The envy is more from the fact that in so many other languages, even those that I personally don't like as much as Python, I can take advantage of multiple threads 'natively'. Using the thread API there is all I need. It allows me to utilize modern hardware easily.

Yes, I understand it's CPython, not the language itself. That's what makes it even more startling: the language and libraries offer a beautifully simple threading API, but it cannot be used to take advantage of modern hardware.

> But it didn't seem appropriate to ignore Juergen's "open
> letter to Guido van Rossum". I hope that Juergen's
> response echoes yours, but I'm skeptical until I see it.

Echoing the "don't become distracted" call? Yes, I guess I can't really echo that, since I was writing about it in the first place.

I think the discussion here is beginning to point out possibilities, which is wonderful. Keeping the GIL, but allowing multiple threads to run simultaneously. That would work wonders already. As you said in another response: "...a project to add GIL-free threading to Python might work...".

You know, as I said in my article: I really like Python and it's my language of choice for most anything these days. I posted a couple of articles to that effect before. I don't need to reiterate the points I made about threading here, but I guess you could put it this way: if you really care about something (in this case Python), then having strong opinions one way or the other should be understandable. If I didn't care about Python and didn't want it to become applicable for a wide range of applications, I wouldn't have bothered writing that article.

I am not much concerned about the CPython GIL; there are other languages which maximize computing efficiency, and in many scenarios fork/RPC can help scale much better than (evil...) threads. Just a thought, though... perhaps naive.

Removing the GIL is difficult, and past results argue against the attempt, so what if instead the interpreter itself were parallelized and started to scale on dual cores, quads, and whatever comes next (even with a single-threaded Python program)?