On Wed, Feb 27, 2002 at 11:24:17AM +1100, Rusty Russell wrote:> On Mon, 25 Feb 2002 10:00:25 -0500> Hubertus Franke <frankeh@watson.ibm.com> wrote:> > > Rusty, since I supplied one of those packages available under lse.sourceforge.net> > let me make some comments.> > > > (a) why do you need to pin. I simply use the user level address (vaddr) > > and hash with the <mm,vaddr> to obtain the internal object.> > This also gives me full protection through the general vmm mechanisms.> > I pin while sleeping for convenience, so I can get a kernel address. It's> only one page (maybe 2). I could look up the address every time, but then I> need to swap the page back in anyway to look at it.

As stated above, I allocate a kernel object <kulock_t> on demand andhash it. This way I don't have to pin any user address. What does everybodythink about the merit of this approach versus the pinning approach?Most importantly I think the merit of both approaches stems from the factthat the access is purely through the userlevel address rather thanthrough a handle. In your case, can the lock be allocated at differentvirtual addresses in the various address spaces.How do you cleanup ? Do you search through the hashtable ? In other wordswhat happens if you SIGTERM a process ?> > > (b) I like the idea of mmap or shmat with special flags better than going > > through a device driver.> > Me too, but I'd rather have people saying "make it a syscall" than "eww...> not another special purpose syscall!" 8)

One more syscall its all that needed like semop(). I multiplex all operationsthrough this interface. Agree, its more scalable then a device driver assomebody pointed out that the device has locks associated with it.> > > (c) creation can be done on demand, that's what I do. If you never have > > to synchronize through the kernel than you don't create the objects. > > There should be an option to force explicite creation if desired.> > Absolutely, except there is no real "creation" event. Adding a "here be> semaphores" syscall is sufficient and useful (and also makes it easy to> detect that there is no FUS support in the kernel).

That's what I have done. All internal objects are created on the fly.Somebody pointed out to me that RT programs might want to allocate the internal object apriori, in that case the forcing the creation would berequired.

> > > (d) The kernel should simply provide waiting and wakeup functionality and > > the bean counting should be done in user space. There is no need to > > pin or crossmap.> > See above.

Yip, no state sharing required between kernel and user. I analyzed this vary carefully, particularly for spinning locks, I can show you sequencesof events that get you in real trouble because the atomic update in userspace and the waiting are non atomic. As far as I am concerned separationis imperative.

> > > (e) I really like to see multiple reader/single writer locks as well. I > > implemented these > > Hmmm... my current implementatino only allows down one and up one> operations, but off the top of my head I don't see a no reason this couldn't> be generalized. Then:> 1) Initialize at INT_MAX> 2) down_read = down 1> 3) down_write = down INT_MAX> > Sufficient?>

No, you effectively need two queues or you have to basically replicatethe low level semaphore routines to walk the list and identify thesequence (R*W)* and only wake those up that are the same. There are manyissues in the wakeup. Do I want to wakeup all readers waiting regardlessof intermittent writers....> > (f) I also implemented convoy avoidance locks, spinning versions of > > user semaphores. All over the same simple interface.> > ulocks_wait(read_or_write) and ulocks_signal(read_or_write, num_to_wake_up).> > Ofcourse to cut down on the interface a single system call is enough.> > Interesting. Something like this might be needed for backwards> compatibility anyway (spin & yield, at least).

This is a must. I talked to database folks, they can see huge utilizationout of these locks. I hence echo again Ben LaHaises opinion, the kernelmechanism must be flexible enough to support more complex locking stuff.I have added in my next update on lse download support for queues.

> > > (g) I have an extensive test program and regression test <ulockflex>> > that exercises the hell out of the userlevel approach. > > Excellent. I shall grab it and take a look!

Indeed this is/was a lifesaver, it points out so many interesting thingswith respect to bugs and performance when the rubber hits the road.

> > Thanks for the feedback,> Rusty.

Let's keep the sync going. At this point we need to figure out whatis the right approach for the kernel. Is the explicite kernel object approach I have chosen better/worse/equal to the pinning approach ?Inputs from everyone please. We need to score against codepath length, cleanup, memory requirements

> -- > Anyone who quotes me in their sig is an idiot. -- Rusty Russell.