Now, key of course could be a pointer, but there's no particular reason it has to be. This is easily a superset of futex; if you want,
you could just have one global Waitset object, make key an int pointer, and check *ptr in between PrepareWait and Wait,
and that would give you futex. But you can do much more with this.

I prefer having a "waitset" object to put the waits on (like KeyedEvent), rather than making it global/static (like futex). The advantages
are 1. efficiency and 2. multiple meanings for a single "key". It's more efficient because you can have different waitsets for different
uses, which makes each one cover fewer waits, which makes all the lookups faster. (That is, rather than 100 global waits pending, maybe you
have 10 each on 10 different waitsets.) The other advantage is that you can reuse the same value for key without confusing the system :
you could have one Waitset where key is a pointer, another where key is an internal handle number, etc.

2. A proper cond_var with waker-side condition checking.

First of all, a decent cond_var API combines a lot of the disjoint junk in the posix API. It should
include the mutex, because that allows for a vastly more efficient implementation :

A good implementation should do the compound ops (signal_unlock, etc) atomically. But I wouldn't require that
because it's too hard.

But that's just background. What you really want is to put the condition check in the API. It should be :

void wait_lock( [] { wake condition } );

The spec of the API is that "wake condition" is some code that will be run with the mutex locked, and when the function exits
you will own the mutex and the condition is true. Then client usage is like :

Comments:

Wake side condition testing is pretty cute, but it might be too hard to implement efficiently. You basically need to have a locked wait list under which you evaluate the condition, stretching out your lock hold time on the wake side. An alternate approach would be to just have more CVs, each of which corresponds to a condition that someone might wait for.

You generally should do the actual waking outside the lock, because thread scheduling is pretty expensive. Windows has the wait morphing optimization for those who signal a cv under an SRW lock, so the scheduling actually occurs when the lock is released.

"Wake side condition testing is pretty cute, but it might be too hard to implement efficiently. You basically need to have a locked wait list under which you evaluate the condition, stretching out your lock hold time on the wake side."

Part of the point is that if the OS provides an API like this (rather than emulating it in user space), it could be super efficient.

The wait list is just the list of threads that the OS already has sleeping on that signal; no additional list is needed. The API always has the CV mutex held when that list is checked, so no additional mutex is needed either. The only addition is that the OS has to test the condition before waking a thread.

But that can only be a performance *win*. It's of course much better to test the condition on the waker side than to blindly wake a thread and have it check the condition itself. You can construct pathological cases, like a thread with a very expensive wait condition that sits at the head of the list and is never true - but that same case would be even *worse* with the alternative (wakee-side checking, or out-of-cv checking).

And of course you still wouldn't want to just stick every thread on the same CV; that would be an abuse of the system. It's mainly for cases where you have multiple wait conditions that are interrelated but slightly different, like several threads that might consume the same item but have slightly different conditions for doing so.