Re: [clisp-devel] multi process design doc?

I hope someone out there is willing to discuss this with me.
Who out there is at all interested in this topic?
As a design document this left a lot to be desired.
I couldn't even tell what the goals were. I get the feeling that
the desire is to allow multiple processors to use the lisp heap at
the same time. In that case I still wonder what the desired locking
granularity is. If there's any other documentation for this design
or any other design on which it is based, or for that matter, any
other design with similar goals, I'd appreciate a pointer.
1. Global C variables
2. The C functions
3. Make socket_getpeername reentrant.
4. Make libreadline reentrant.
About these I confess total ignorance.
5. The access to current_thread() must be very fast. It seems best
to do it this way: Choose a very large memory region (say, 1 GB),
and divide it up into 8 MB chunks. Each of the chunks is used as the
stack for a possible thread. This gives room for 128 threads, not too
bad. Given a stack pointer SP, SP & 0xff800000 depends only on the thread.
Therefore we can just put the `struct thread_' at exactly this address.
(Using mmap, of course.)
This is one of the things I don't like. On today's machines with only
2G available VM it's really a problem to allocate half of it to
stacks. Now that clisp is capable of using > 16M heap it's quite
possible that you might want to use as much of that VM as you can get
hold of for your own data.
Also, while you might think that 100 processes is plenty, there are
good reasons for wanting to get more. Most will be idle most of the
time, of course, and also most will use very little stack most of the
time. That's why I'd like to pay for only (a small amount more than)
the amount of stack they are actually using rather than the maximum
conceivable stack size for each. This problem may well disappear when
we get bigger VMs.
6. It doesn't make sense to use the same symbol per-thread and across
threads.
I don't understand why you think so.
Why should I not have a global *print-pretty* for instance, and still
be able to bind it in one thread?
Therefore `symbol-value' and `setf symbol-value' are still O(1):
I realize this is your main motivation for this scheme.
And you want to make context switch extremely fast.
However I think these can be done in other ways.
I also don't think that special variable access is so critical as to
influence the design in such a basic way.
For more details, see doc/multithread.txt.
Where can I find that?
7. constobj.d
(back to total ignorance)
8. GC must walk multiple STACKs.
of course
9. The *print-circle* get_circularities() must be rewritten to not use
mark bits. Instead, use a 3-level bitmap, similar to the Linux kernel's
page tables.
of course, or use a more standard method of circular printing
10. It must be possible to set breakpoints in eval, funcall, apply, and
the bytecode interpreter. A global variable suffices in the beginning;
later it can be a real breakpoint (implemented via mprotect()). This
is hairy.
What does this have to do with multiple processes/threads?
11. This mechanism is used by the GC: Before GC starts, it has to stop
all threads. When GC is terminated, the threads are woken up.
If "this mechanism" means breakpoints in eval etc. then I don't see
it. GC is normally triggered by cons. You do have to be able to stop
other threads from reading and writing the heap before you gc, of
course. But this assumes you were expecting to allow multiple threads
to access the heap at the same time in the first place. If so, it
seems to me there are very big issues about locking and what you
guarantee to be atomic. In either case, though, I'd expect that it
would suffice to break at a few critical points in the bytecode
interpreter, which would probably be the same points where you'd check
for interrupts.
12. Locks: I'm now convinced that having every object potentially
contain a lock is too bad (because it causes a hash table access, not
just a memory access, for every synchronized operation).
I agree that you don't want every object to contain a lock.
But could we start with something a little more basic?
What operations do you want to guarantee to be atomic?
Obviously you don't want it to appear to the user that everything is
atomic or he gets nothing out of multiple processes. On the other
hand, you at least have to protect against errors that the programmer
has no way to foresee or prevent, e.g., if he does (incf x) in two
different processes then x should end up incremented by two, not one.
(So far I don't even see how you plan to guarantee that.)
On the other hand, if you do (length x) in one process while in
another you're doing (setf x (sort x)) is it required that you get the
same answer that you would have before or after the sort? Or is it
the programmer's job to prevent these from overlapping? If you try to
protect the programmer from this then you have to do something like
- start the sort by traversing the whole list and write locking every
list cell
- similarly traverse the list read locking every cell at the start of
the length function, then perhaps count the cells as you traverse the
list again unlocking them.
- and of course, back up and wait if you can't get the locks.
Deadlock detection is mandatory. Just as every user is happy about
type checking (which correct programs don't need!), they will be
happy about an error message instead of deadlock (although correct
programs don't make deadlocks :-)).
Are you talking about things that are automatically locked by the
system or about the locks that the programmer introduces? I would
hope that the system introduced locks cannot generate deadlock.
Deadlock detection only works if there is a clear notion of a lock
"holder". In general, a lock can have multiple holders, but no thread
can increment a lock's count without taking any responsibility.
You evidently have something very specific in mind here, but I can't
tell what it is.
There are two kinds of data structures:
- Low-level locks...
- Lockable objects...
It's hard to tell what these are meant to accomplish.
13. Locks at a lower level:
- Spin locks, see xthread.d.
I don't see this either. Where is it?
14. Gilbert's catalogue:
with-timeout
This can be done in lisp if you make sleep work right.
process-suspend/continue, process-kill
(only to be used as emergency tools)
why only as emergency tools?
father/child-notions among threads
why is this needed or even appropriate?


Ok, I got a source and looked at xthread.d and multithread.txt.
They don't contain much.
My previous questions stand.
I'd especially appreciate if someone could provide more references,
for a similar design if not for this one.
Is the goal to allow multiple processors to read and write the lisp
heap at the same time? If so, where is the proposed boundary between
locking done by the system and that which is the programmer's
responsibility?
6. It doesn't make sense to use the same symbol per-thread and across
threads.
multithread.txt does add a little detail here. It suggests that of
the specials in the lisp package only *features* and *modules* would
be global. So I use this as an example. Why should I not be able to
bind *features* in a thread? For instance, I may have different
read-eval-print loops in different threads and want them to read with
different features.

>>>> In message <200001192251.OAA14851@...>
>>>> On the subject of "Re: [clisp-devel] multi process design doc?"
>>>> Sent on Wed Jan 19 17:58:34 EST 2000
>>>> Honorable Don Cohen <donc@...> writes:
>> 3. Make socket_getpeername reentrant.
it calls non-reentrant gethostbyaddr(3)
>> For more details, see doc/multithread.txt.
>>
>> Where can I find that?
in the CLISP source distribution, in the doc top-level subdirectory
>> - Spin locks, see xthread.d.
>>
>> I don't see this either. Where is it?
in the CLISP source distribution, in the src top-level subdirectory
find(1) is your friend!
--
Sam Steingold (http://www.podval.org/~sds/)
Micros**t is not the answer. Micros**t is a question, and the answer is Linux,
(http://www.linux.org) the choice of the GNU (http://www.gnu.org) generation.
We're too busy mopping the floor to turn off the faucet.