Recently I've received a lot more ithreads-related questions, so I figured some background information might be in order. Some of these issues were already addressed about a year ago in Status and usefulness of ithreads in 5.8.0, but I think a recap is worthwhile.

This is not a tutorial about how to use threads. It's more a tutorial about how to use threads well, once you have figured out that they may hold a solution to your particular need.

First of all, if you want to do anything for production use with Perl ithreads, you should get Perl 5.8.1 (or until then, one of the recent maintenance snapshots). There were several bugs in 5.8.0, one of which was a serious memory-eating bug when using shift() on a shared array, that are now fixed in 5.8.1.

However, there are still a number of caveats that you should be aware of when you want to use Perl ithreads. It's better to realize these limitations before you put in a lot of work, only to find in the end that you don't have a machine big enough or fast enough to run your code in a production environment.

So what are these caveats? Basically it boils down to one statement.

Perl ithreads are not lightweight!

Unlike most other thread implementations in existence, including the older Perl 5.005 threads implementation, Perl ithreads do not share variables between threads by default. So what does that mean? It means that every time you start a thread, all data structures are copied to the new thread. And when I say all, I mean all. This includes, for example, package stashes, global variables, and lexicals in scope. Everything! An example:
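The original code sample did not survive in this copy; a minimal reconstruction of the kind of example meant here, comparing the address of the lexical inside and outside the thread, might look like this:

```perl
use strict;
use warnings;
use threads;

my $foo = 'bar';
my $addr_before = sprintf '%s', \$foo;    # address of $foo in the main thread

# The thread receives its own copy of $foo, living at a different address;
# changing it inside the thread does not affect the original.
my $addr_inside = threads->new( sub {
    $foo = 'changed inside the thread';
    sprintf '%s', \$foo;
} )->join;

print "before: $addr_before\n";
print "inside: $addr_inside\n";
print "after:  value is still '$foo'\n";
```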

This shows that the lexical scalar $foo was copied to the thread. Inside the thread the "same" lexical now lives at another address and can be changed at will inside the thread without affecting the lexical in the main program. But this copying takes place when a thread is started! Not, as you might expect, at the moment the value of the lexical inside the thread is changed (the approach usually referred to as COW, or Copy On Write). So even if you never use $foo inside the thread, it is copied, taking up both CPU and memory. But it gets worse: the same applies to all other forms of data, one of them being code references (as shown in this example):
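Again, the original sample is not preserved here; a sketch along the same lines, comparing the address of a code reference inside and outside the thread:

```perl
use strict;
use warnings;
use threads;

sub foo { 'never actually called' }

my $ref_outside = sprintf '%s', \&foo;
my $ref_inside  = threads->new( sub { sprintf '%s', \&foo } )->join;

print "outside: $ref_outside\n";
print "inside:  $ref_inside\n";   # a different address: the stash entry was copied
```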

The code references are different! So, did it copy the whole subroutine? I've been led to understand that the actual opcodes of subroutines are not copied (but I've been hesitant to check the Perl source code to actually confirm this, so I'll have to take the p5pers' word for it). But all the data around it, in this case the code reference in the package stash, is copied. Even if we never call foo() inside the thread!

Shared variables?
But wait, you might say, shared variables may be a lot better. So why not make all variables shared in my application, so I won't suffer from this? Well, that is wrong. Why? Because shared variables in fact aren't shared at all. Shared variables are in fact ordinary tied variables (with all the caveats and performance issues associated with tied variables) that have some "magic" applied to them. So not only do shared variables take up the same amount of memory as "normal" variables, they take up extra memory because of all the tied magic associated with them. This also means that you cannot have shared variables with your own tie magic associated with them (unless you want to use my Thread::Tie module).
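For completeness, this is roughly what using a shared variable looks like (a minimal illustration, not a sample from the original article): only the variable marked :shared is visible across threads.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $shared : shared = 'original';
my $plain            = 'original';

threads->new( sub {
    $shared = 'set inside the thread';   # visible to all threads
    $plain  = 'set inside the thread';   # changes only this thread's copy
} )->join;

print "shared: $shared\n";
print "plain:  $plain\n";
```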

Implications
So what does this mean if you want to use Perl ithreads in your application? Well, you want to prevent a lot of data from being copied when you start a thread. One way to achieve this would be to load modules only inside the threads, after the threads have started. But that's easier said than done. Observe the following code sample:
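(The original sample is not preserved here; this reconstruction, using the Benchmark module as in the rest of the article, shows the idea.)

```perl
use strict;
use warnings;
use threads;

# Try to load Benchmark only inside the thread ...
threads->new( sub { use Benchmark } )->join;

# ... and check afterwards whether the main thread has it loaded anyway:
print "Benchmark has been loaded!\n" if defined $Benchmark::VERSION;
```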

On casual observation, you might think that would do the trick. But alas, this prints:

Benchmark has been loaded!

even though the use is inside the subroutine with which the thread is started! That's because use is executed at compile time. And at compile time, Perl doesn't know anything about threads yet. Of course, there is a run-time equivalent to use: require (followed by a call to import, if needed). This example shows that, with require, the Benchmark module is indeed loaded inside the thread only:
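A reconstruction of such an example, using require inside the thread and checking in both places:

```perl
use strict;
use warnings;
use threads;

threads->new( sub {
    require Benchmark;    # executed at run time, inside the thread only
    print "inside the thread: loaded\n" if defined $Benchmark::VERSION;
} )->join;

# Back in the main thread, Benchmark was never loaded:
print "main thread: ",
    defined $Benchmark::VERSION ? "loaded" : "not loaded", "\n";
```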

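The reverse also trips people up: even a use that is written after the thread start runs at compile time. A reconstruction (assuming Benchmark as elsewhere in the article):

```perl
use strict;
use warnings;
use threads;

# Ask the thread whether Benchmark is loaded in *its* interpreter:
my $inside = threads->new( sub { defined $Benchmark::VERSION ? 1 : 0 } )->join;

use Benchmark;   # textually after the thread start, but executed at compile time

print "inside the thread: ", $inside ? "loaded" : "not loaded", "\n";
```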
Again, this is caused by use being executed at compile time, before the thread is started at execution time (even though the use is listed later in the code). So even putting the use statements after starting your threads is not going to help. More drastic measures are needed. If you do not want all of this copying of data, you need to start your threads before any modules are loaded. That is possible, thanks to BEGIN {}. Observe this example:
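A reconstruction of such a BEGIN-based example; on the 5.8.x perls discussed here, it could also emit the "Scalars leaked: 1" message mentioned next:

```perl
use strict;
use warnings;
use threads;

my $thread;
BEGIN {
    # Start the thread at compile time, before any later module is loaded,
    # so that module's data is never copied into the thread.
    $thread = threads->new( sub {
        defined $Benchmark::VERSION ? 'loaded' : 'not loaded';
    } );
}

use Benchmark;   # loaded only after the thread has already started

my $result = $thread->join;
print "inside the thread: $result\n";
```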

Yikes! What is that! "Scalars leaked: 1". Well, yes, that's one of the remaining problems/features/bugs of the Perl ithreads implementation. This particularly seems to happen when you start threads at compile time. From practical experience, I must say it seems to be pretty harmless. And compared to all of the other "leaking" of memory that happens because data structures are copied, a single leaked scalar is presumably not a lot. And the error message is probably in error in this case anyway.

Tools for ithreads
So, is programming with Perl ithreads that bad? Well, if you expect lightweight threads such as you would find in other programming languages: yes. If you expect everything Perl to still be everything Perl even when you're using threads, Perl ithreads will do the trick. Just pay a little attention to when you start the threads and to what gets loaded when and where, and you should in general be just fine. And there are some modules on CPAN to help you with the various approaches to threaded programming:

Start up a number of worker threads to which jobs can be assigned. Job results can be obtained individually if necessary, using the given job-ID. Parallel resolving of IP-numbers is a typical application for this approach.
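The module list itself did not survive in this copy, but the worker-pool approach described above can be sketched with the core Thread::Queue module (CPAN modules such as Thread::Pool add the job-ID bookkeeping for you):

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;

my $jobs = Thread::Queue->new;
my %result : shared;

# Start the workers early, before the program accumulates data to be copied.
my @workers = map {
    threads->new( sub {
        while ( defined( my $job = $jobs->dequeue ) ) {
            my ( $id, $input ) = split /:/, $job;
            my $output = $input * 2;              # stand-in for real work
            lock %result;
            $result{$id} = $output;
        }
    } );
} 1 .. 3;

# Hand out jobs, keyed by a job ID so results can be fetched individually.
$jobs->enqueue( "$_:" . $_ * 10 ) for 1 .. 10;
$jobs->enqueue(undef) for @workers;               # one stop marker per worker

$_->join for @workers;
print "result of job 7: $result{7}\n";
```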

fork?
Now you may wonder why Perl ithreads didn't use fork()? Wouldn't that have made a lot more sense? Well, I wasn't involved in the thread design process at the time, so I have no idea what the exact reasons were. I can think of one particular reason though, and that's the communication between threads, particularly for shared variables, and especially getting the blocking right when one thread is waiting for one or more other threads.

Not being hindered by the reasons for not using fork(), I developed a threads drop-in replacement called forks. Initially started as a pet project to see whether it would work at all, it became a bit more serious than that. The forks.pm module has the distinct advantage of being able to start a thread quickly. But that's just because it does a fork(), which in modern *nixes is very fast. The communication, blocking, and shared variables are handled by a TCP connection between the threads, in which the process holding the shared variable values is the server, and all the other threads (including the "main" thread) are clients. What you win in a quickly starting thread, you lose in communication delays. So if you're not passing around a lot of data between threads, forks.pm might be for you. Additionally, forks.pm has the advantage of not needing a thread-enabled Perl. In fact, it even runs on Perl 5.6.0!

The future?
So what can we expect in the future for Perl 5 ithreads? Well, a COWed approach to shared variables is being considered for Ponie, but that's still at least a year or so in the future. And that doesn't seem to fix the non-shared data copying problem when a thread is started. And Perl 6, you may ask? It's not clear how threads are going to be accessed from the Perl 6 language, but Parrot seems to consider everything a continuation. And a thread is a special case of a continuation. So I think in Perl 6, things will be good from the start.

Liz

Things you need to know before programming Perl ithreads
Replies are listed 'Best First'.

In the context of this discussion, it is worth repeating the thread usage discussion in "perldoc perlthrtut". I have emphasized the primary usage directive, which explains the "right" way to use threads with the current implementation (5.8).

Performance considerations
The main thing to bear in mind when comparing ithreads to other
threading models is the fact that for each new thread created, a
complete copy of all the variables and data of the parent thread has to
be taken. Thus thread creation can be quite expensive, both in terms of
memory usage and time spent in creation. The ideal way to reduce these
costs is to have a relatively short number of long-lived threads, all
created fairly early on - before the base thread has accumulated too
much data. Of course, this may not always be possible, so compromises
have to be made. However, after a thread has been created, its
performance and extra memory usage should be little different than
ordinary code.
Also note that under the current implementation, shared variables use a
little more memory and are a little slower than ordinary variables.

Nope. That's not really true. When you use threads::shared, what in fact happens is that a thread is started in the background. The dataspace of that hidden thread contains the authoritative version of each shared variable.

All the other "shared" variables are basically just variables tied to the "threads::shared::xxx" module. When you want to get the value of a shared scalar, internally the FETCH subroutine gets executed, which fetches the value from the hidden thread, stores it in your local thread's dataspace (mainly for consistency with XS modules), and returns that value. So the same value exists both in your thread and in the hidden thread.

The same thing happens if you want to store a value in a shared scalar: the STORE subroutine stores the new value both in your local thread's dataspace and in the hidden thread's dataspace. Some mutexing is involved, of course. But, for example, incrementing a shared variable without locking is not guaranteed to work. This is because incrementing a tied variable is implemented as a FETCH followed by a STORE, and between the FETCH and the STORE in one thread, another thread can already have done its own FETCH. The following will rarely show the expected total:
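A reconstruction of the example meant here (the exact original code is not preserved): ten threads each incrementing a shared scalar 100,000 times without locking.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $scalar : shared = 0;

# Ten threads race to increment the same shared scalar, without locking.
my @threads = map {
    threads->new( sub { $scalar++ for 1 .. 100_000 } );
} 1 .. 10;
$_->join for @threads;

print "$scalar\n";   # almost always well below 1000000
```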

You'd expect $scalar to have the value of 10 * 100000 = 1000000; however, unless you have a very fast machine, you will find it to be significantly less. To do this properly, you have to lock the variable before doing the increment. For example:
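The locked variant might look like this; holding the lock across the increment makes the FETCH and the STORE effectively atomic:

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $scalar : shared = 0;

my @threads = map {
    threads->new( sub {
        for ( 1 .. 100_000 ) {
            lock $scalar;    # released again at the end of this block
            $scalar++;
        }
    } );
} 1 .. 10;
$_->join for @threads;

print "$scalar\n";   # reliably 1000000
```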

As you say, Perl 6 uses continuations, and with continuations you can implement threading.

However, the threading model that you get is a cooperative multi-tasking model in one process. This means that you don't get the ability to keep running other threads when one thread does a blocking operation (like a long-running database call), and you also never get the ability to take advantage of multiple CPUs.

But I know that elian is fully aware of both points, and would be astonished if he did not make the OS threading model available to Parrot in some fairly sane fashion.

I think you can do these things well with continuations. It would work like this: instead of calling a blocking operation (like read or listen), you call the same operation in non-blocking mode and yield to the next thread repeatedly, until the non-blocking call succeeds. This can be implemented at any level: Parrot could do it, or a threads library written in Parrot and using continuations, or your program.
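The yield-until-ready idea can be sketched in plain Perl, with a non-blocking pipe and a toy round-robin scheduler standing in for the continuation-based threads (the names here are illustrative, not any real API):

```perl
use strict;
use warnings;
use Fcntl;

pipe( my $reader, my $writer ) or die "pipe: $!";

# Put the reading end of the pipe in non-blocking mode.
my $flags = fcntl( $reader, F_GETFL, 0 ) or die "fcntl: $!";
fcntl( $reader, F_SETFL, $flags | O_NONBLOCK ) or die "fcntl: $!";

my $got       = '';
my $countdown = 3;

# Each "thread" is a closure; returning 1 means "not done yet, yield".
my @tasks = (
    sub {   # reader task: retry the non-blocking read until it succeeds
        my $n = sysread $reader, my $buf, 1024;
        return 1 unless defined $n;    # EAGAIN: nothing yet, yield
        $got .= $buf;
        return 0;                      # done
    },
    sub {   # writer task: produce the data after a few timeslices
        return 1 if --$countdown > 0;
        syswrite $writer, 'hello';
        return 0;
    },
);

# A trivial round-robin scheduler: keep yielding between tasks until done.
while (@tasks) {
    @tasks = grep { $_->() } @tasks;
}

print "$got\n";
```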

For some operations it might be a problem that a non-blocking variant is unsupported by the OS. But there's another problem, unrelated to Perl: in some cases you cannot just use non-blocking syscalls; you need blocking calls (maybe in another OS thread) to simulate a background operation. An example of this is opening a Unix FIFO. Opening either end of a FIFO normally blocks until the other end gets opened. If both processes try to open a FIFO with O_NONBLOCK repeatedly, the operation will never succeed (or only by chance after a long time, I don't know). Is this clear?

IMO, to solve this problem we'd need some kind of background system calls in the OS that can be started and then checked, non-blockingly, for completion (or cancelled, or asked to send a signal when finished, but those are less important). We'd also need a new select-like syscall that can wait for multiple background syscalls.
I would like an OS where all syscalls (except the most trivial ones, like getpid and time) can be started in any of these four modes: blocking, non-blocking, background, and signal-sending.

Some sub-cases are supported in Linux too, but not the general mechanism I imagined here. For example, Linux has read and write calls that run in the background (asynchronous I/O), and it can send a signal when a file descriptor is ready for I/O, but this is not extended to other operations, like opening a FIFO.

Addressing them out of order, the second point that I made is that cooperative multi-threading doesn't get you the ability to take advantage of multiple CPUs. (Or even multiple virtual CPUs, see Intel's hyperthreading.) That remains true.

The first point was that any blocking call anywhere blocks the whole program. Yes, there are strategies to avoid making blocking calls. However those strategies might not work in every case, and depend on everyone's cooperation. Even if a strategy can work, unless it works on every OS of interest, Parrot will have to implement the blocking forms. Furthermore in the real world, you have to accept that some extension authors will slip blocking calls in.

Therefore even though cooperative multi-tasking can work smoothly in theory, in practice it doesn't.

This claim isn't just cynicism on my part. Historically cooperative multi-tasking has been the first kind of multi-tasking that people reach for. After all it has lots of advantages. It is simple. You can trivially avoid lots of nasty synchronization bugs. In theory it can work wonders. But again and again experience showed that people don't all cooperate, and people have wound up biting the bullet and accepting pre-emptive multi-tasking.

Hmm, so is there a way to have a couple of threads efficiently work on the same large data buffer? The simplest case is a producer/consumer flow, where they pipe data from one to another. Thread::Conveyor looks like it's for discrete lumps, not a stream of bytes. I'm afraid a filehandle based on a scalar, used as a pipe, will be hideous in this situation!

Now you may wonder why Perl ithreads didn't use fork()? Wouldn't that have made a lot more sense? Well, I wasn't involved in the thread design process at the time, so I have no idea what the exact reasons were.

Hysterical raisins. Surprisingly good ones, but still hysterical raisins. ithreads were originally developed to emulate fork() on Windows, which doesn't have such a call; thus, they couldn't be built on fork.

Besides, as your forks module demonstrates, most of the speed wins of using fork() on systems that have it would be lost in the interthread communication. The general consensus seems to be that you should use ithreads in code with a lot of interthread communication, a magic open (or such) in code with a little communication, and a raw fork in code with very little communication.

Yikes! What is that! "Scalars leaked: 1". Well, yes, that's one of the remaining problems/features/bugs of the Perl ithreads implementation. This particularly seems to happen when you start threads at compile time. From practical experience, I must say it seems to be pretty harmless.

The leaked scalar is the result of returning a closure from a thread. In many cases it is, as mentioned, harmless. However, in some implementations of Perl, returning a closure from a thread may cause the interpreter to crash (i.e., dump core). Caveat emptor.

> But that's just because it does a fork(), which in modern *nixes is very fast.

This unfortunately is no longer true: while pages are still 4 KB, memory sizes have grown rapidly over the last few years. If a process uses 500 MB, it has around 125,000 pages, and just copying the page descriptor tables takes about 0.5 seconds.

Well, a COWed approach to shared variables is being considered for Ponie, but that's still at least a year or so in the future. And that doesn't seem to fix the non-shared data copying problem when a thread is started.

I think you meant a COWed approach to non-shared variables, and in fact that should fix the data copying problem.

Remember, a shared variable is nothing other than an ordinary tied variable, and it is cloned like any other variable when a thread is started. Whenever a thread accesses the value of a shared variable, its value is copied from the hidden shared-variables thread into the thread-local copy, and then presented to the outside world as a "normal" variable. Whenever a thread updates a shared variable, the value is updated in the thread-local version as well as in the hidden shared-variables thread.

There is one more problem with threads, even with Perl 5.8.8, on Win32 machines with multiple processors.
Sometimes, even if it appears that the threaded scripts are working, they actually are not.
I found out that running the scripts on a single CPU will generally solve the problem.

I have the following queries:
How robust are ithreads in Perl 5.8.8? Has the problem with shared variables been sorted out, or is the Thread::Tie module still preferable?
Package variables (declared with 'our') are placed in the symbol table of the package. So when we import a package with the 'use' directive into the main package, is it safe to declare them as shared variables?

Nice write-up. Any idea how to get the actual system (Windows) thread ID value rather than the Perl-level TID? I have several pools of threads and I believe one is going into a deadlock, but I cannot determine which one. Thank you.