Comments

Sometimes condition variables are the best fit for a problem, and sometimes channels are. They don't compose well, though.

I've increasingly been finding myself in a position where I would like to select on both channel(s) and a condition variable. This has come up a few times in http2, where I use condition variables for flow control, but there are plenty of channels (including the net/http.Request.Cancel channel) in play too. Sometimes I need to wait on several of these at once: for flow control, for a user to cancel their request, for the peer side to close their connection, etc.

I would love to be able to select on a condition variable, like if the sync.Cond type had a method:

```go
// WaitChan returns a channel which receives a value when awoken
// by Broadcast or Signal. Unlike Wait, WaitChan does not unlock
// or lock the c.L mutex.
func (c *Cond) WaitChan() <-chan struct{} {
	// ...
}
```

The semantics would be that receiving from it acts like a runtime_Syncsemacquire on c.sema. We'd need to define what happens if another case in the select fires first (does the semacquire get canceled?).

This comment has been minimized.

My expertise in this area is very limited, but as I understand it, it is important that Wait atomically unlock c.L and suspend execution. Maybe in your sample code this will work out, in that Unlock is immediately followed by select and we always have cooperative scheduling. But if someone added other code just before the select in order to prepare for it (e.g. calling a function to decide whether to nil out a channel), then we'd have a race. Seems like a footgun. And if we moved to pre-emptive scheduling (or used another toolchain), there might be no way to write the code correctly.

I don't see how this is any more of a footgun than anybody using the sync package in general. Or anybody using two goroutines and sharing global state without the sync package. Bad Go code is full of data races.

I'm proposing this as a mechanism for people who do know what they're doing. Not as some recommended high-level mechanism.

This comment has been minimized.

@josharian I think what you are thinking of is that it is critical that a condition variable not miss a signal/broadcast in between the unlock and the goroutine suspension. Other than that there is no implied atomicity. A simple implementation of condition variables can easily make that mistake--unlock, check signal, (receive unnoticed signal), suspend. Our implementation doesn't have that problem. The required atomicity is in runtime.notifyListWait, via the internal lock it acquires.

This comment has been minimized.

A more generic solution is perhaps for the compiler to pass stack frame size hints to the runtime. For example, in Ian's example, the compiler could recognize that the function will usually use very little stack, even less than the minimum stack, so it could hint the runtime to start the goroutine with a smaller stack.

In general, if we make the compiler determine the initial stack size for a new goroutine, could that solve at least half of the problem? (Goroutine leaking might still occur, but that's another issue.)

This comment has been minimized.

Kinda. Really I want a magic type of hchan (a sentinel value of hchan.elemtype) which makes hchan.buf mean it's a pointer to a notifyList, and have the select implementation do the notifyListAdd on entry to select; then, if a separate case fires in the select, we do something like a notifyListRemove and unregister that waiter.

Otherwise I'm afraid of the leaks where my condition variable is woken up (because, say, I selected on my context.Done()), but then I find that my condition is satisfied and I exit the function, and never wait on the condition variable again. I need to tell the cond.Wait running in the goroutine to stop. Which makes me think it's better done in the implementation of select with a special channel type.

This wouldn't be a language change. But it would involve modifying the runtime to support an API addition.

This comment has been minimized.


Could you design a user-facing sync.Cond API for this?

We need to automatically lock cond.L when the channel unblocks, and the unlock needs to happen for the goroutine that is actually winning the race. This seems too magical to me. Maybe I'm missing something.

This comment has been minimized.

Something that has come to trouble me about this: channels are level-triggered. A channel has a value or it does not, and values do not disappear until somebody reads them.

Condition variables are edge-triggered. A call to the Signal method is a trigger that wakes up anything that is waiting right now. If nothing is waiting, nothing happens.

If we turn an edge-triggered mechanism into a level-triggered mechanism, then we either do something laborious to drain the level (read the value from the channel) if nothing needs it, or we have too many triggers.

This is particularly obvious when we consider the Broadcast method. The Broadcast method is easy to understand in terms of edge-triggering: everything that is waiting right now is alerted. Implementing that as level-triggering, without race conditions, is much more complex. Do we wake all select statements waiting on the channel right now? What value do we hand them?

The select statement, unlike channels, can be either edge-triggered or level-triggered. It would be meaningful to design some way to attach an edge trigger to a select statement. It just wouldn't go through a channel.

In particular, a send-with-default converts an edge-triggered event into a level-triggered event by "latching" on the first edge-crossing. A receive-with-default converts a level-triggered event to an edge-triggered event by "unlatching" the channel when the edge-trigger no longer applies.

The Broadcast method is easy to understand in terms of edge-triggering: everything that is waiting right now is alerted. Implementing that as level-triggering, without race conditions, is much more complex. Do we wake all select statements waiting on the channel right now? What value do we hand them?

On a Broadcast, we would wake all select statements that have evaluated the case to receive it (and reacquire the lock before unblocking the case, which would either require a language change or a very deep runtime hook).

With the "channel as cond-var" pattern we can implement without such a hook, Broadcast corresponds to close (and unfortunately you end up needing to allocate a new channel every time you Broadcast).

With a built-in mechanism, we could do even better: we could have the Signal and Broadcast operations receive values, and the Wait operation return the value from the corresponding signal. (If we want something compatible with sync.Cond, I would propose that we send c.L as the value.) You could imagine using the same (or a similar) mechanism for implementing futures natively: a future is a selectable that can be sent to exactly once but received from any number of times. And there are also some interesting possibilities involving Broadcast with a channel (or another Cond) as a parameter...

The tricky part, I think, is constructing the API such that either the sender or the receiver can abandon a case after it has marked the other side as "ready".

This comment has been minimized.

If we do this, we might also want to extend it to sync.Lockers. I'm working with some code now where I have a sync.Mutex that might be held by someone else for a long time and a context.Context to respect. For the same reasons as Brad already outlined, I'd like to be able to use a select to either acquire the mutex or read from ctx.Done().

This comment has been minimized.

Sorry to stumble in, but this edge/level-trigger goroutine leak issue reminds me of an old golang-dev thread [1] about "non-parallel channels". On that thread, I (admittedly without much thought) proposed a new way to make a channel/goroutine pair where the two are intimately tied, such that when all receivers are garbage collected, the goroutine can also be garbage collected. It's a language change, and not fully-thought-out, but perhaps some such thing could present a more general solution to this type of goroutine leak without needing special handling for select or timer-specific channels.

This comment has been minimized.

For example, I have an implementation of a connection pool which is cleanly expressed with a mutex and a sync.Cond. If there are no available connections nor capacity to create new ones, a caller waits to borrow with cond.Wait. I want to be able to pass a context.Context through and cancel the wait, so it would be nice to select on both cond.Wait and ctx.Done().

This comment has been minimized.

@cespare Are you running into significant cases where you can't easily replace the sync.Cond with a channel (for example, due to too much overhead from allocating the channel)? If so, could you describe them in a bit more detail?

This comment has been minimized.


@zombiezen Provided that you can keep the locked:unlocked ratio low (to reduce contention), I would only use a mutex for the Cond, not for the Mutex itself.

But I think the example you linked is a bit too simple anyway: the Cond usage you're showing there looks more like Signal than Broadcast, and for that kind of usage the channel can function as both the Mutex and the Cond: https://play.golang.org/p/beoQFQEx0e

FWIW I have found sync.Cond particularly useful when dealing with one-to-many goroutine topologies (for example when one goroutine is changing state and wants to broadcast the fact of the changed state to many listeners, but without blocking or maintaining per-listener state).

This comment has been minimized.

@bcmills You're right - I could use a channel there, although it's a pity to have to allocate a channel for every message send. I suspect there would be a significant performance hit in a micro-benchmark, but it's probably negligible compared to other work.

This comment has been minimized.

@bcmills I already need a mutex to protect other internal state. It's very natural and clean to express my code with a sync.Cond that piggybacks on that mutex. The changes required to change the Cond to a channel aren't immediately obvious to me, but I'll give it some thought and provide some sample code in a few weeks when I'm back at a computer.


It would be a considerable improvement, both in readability and runtime resource use, if this code could select on a channel and a Cond together. Currently, it allocates a new channel plus one or two goroutines for every waiter, which is a big waste.

Note that we use a single Cond here, but each waiter is waiting on a slightly different condition. They each want to know when the incrementing counter reaches a particular threshold, but the threshold can vary for each waiting goroutine. This is a poor fit for channels, as you can see from the code. We could modify this code to make a different efficiency tradeoff (one channel per threshold rather than one channel per waiting goroutine), but it would not improve readability.

This comment has been minimized.


We could [use] one channel per threshold rather than one channel per waiting goroutine[,] but it would not improve readability.

@kr The code you linked to seems like it leaks the BlockWaiter and BlockSoonWaiter goroutines for an arbitrarily long time (forever, if the condition never holds but the caller cancels their Context). That seems much more severe than a readability issue, and it wouldn't obviously change if you could select on the sync.Cond.

This comment has been minimized.

Indeed, the BlockWaiter goroutine exists until its condition is satisfied. That's part of what I was referring to by "wasteful". (Note that, in context, this is a reasonable behavior because we expect the counter to make steady progress and eventually to satisfy any threshold.)

If we could select on a Cond, there'd be no need for us to create these one-off goroutines in the first place.

This comment has been minimized.


If we could select on a Cond, there'd be no need for us to create these one-off goroutines in the first place.

I guess that's part of my point? It looks like you'd already be better off without creating these one-off goroutines.

A heap that associates thresholds with channels would let you register and wake up each waiter in O(log n) for a maximum queue depth of n waiters. With a selectable sync.Cond, each threshold is O(n) due to spurious wakeups. That's the difference between O(n log n) and O(n^2) total running time.

I don't know what kind of constant factors you're dealing with, but to me, O(n log n) with n extra allocations seems a lot less wasteful than O(n^2), especially when you factor in the extra cache contention from each of those spurious wakeups locking and unlocking the mutex.

This comment has been minimized.

It looks like you'd already be better off without creating these one-off goroutines.

A heap that associates thresholds with channels would let you register and wake up each waiter in O(log n) for a maximum queue depth of n waiters.

Yeah, that would be the tradeoff I mentioned above. If anything, it would make the code even more complicated. It would certainly not improve readability beyond what we have already, so we wouldn't be better off. As for resource use, our current code is efficient enough in practice, it's not a problem. If we could select on a Cond, the code would become both more readable and more efficient.

Here I'm primarily concerned with code complexity and readability. In other scenarios, the efficiency improvement could be make-or-break; for us, it would be a nice bonus.

(Maybe don't focus too much on efficiency at the expense of complexity. Readability is extremely important!)

I don't know what kind of constant factors you're dealing with, but to me, O(n log n) with n extra allocations seems a lot less wasteful than O(n^2).

I couldn't answer that without measuring both options. But it seems like either one could benefit to some degree from selecting on a Cond.

This comment has been minimized.

@jba sync.Cond itself is not subject to spurious wakeups, but if you're using the same Cond to signal crossings of many independent thresholds, each waiter gets signaled for potentially many threshold crossings that aren't the one it is interested in.

This comment has been minimized.

@cespare Are you running into significant cases where you can't easily replace the sync.Cond with a channel (for example, due to too much overhead from allocating the channel)? If so, could you describe them in a bit more detail?

Let's set aside efficiency for now (though in some cases, as others have pointed out, that's definitely a relevant concern).

I took one of my sync.Cond use cases, the connection pool that I mentioned over in #21165 (comment), and extracted some of the interesting stuff into an example that I can share. (The example does not include the full complexity, but I think it's interesting enough to discuss.)

I included the sync.Cond-based implementation (which is what we use in our real code) and I also wrote a channel-based implementation modeled on your demo code at https://play.golang.org/p/beoQFQEx0e.

I'd be interested in what you think about the code. Do you see simplifications that preserve the desired properties that I listed?

I found that writing the channel-based version was a fun and interesting exercise, and I had to think pretty hard to come up with something that works (and to convince myself that it's correct). I had a bunch of previous iterations that had fairly subtle bugs. I'm sure that the cond-based solution is also bug-prone in this way, but on balance, I feel like the channel-based version is significantly trickier/subtler.

This comment has been minimized.

Also running into this issue, where I have a gRPC call that waits for a state transformation on a server and reports changes. If clients cancel these blocking/waiting RPCs, I currently leave goroutines behind, waiting on the condition variable.

What about having a func (c *Cond) WaitWithContext(context.Context) error? By using context objects, support for timeouts comes for free.

This comment has been minimized.

I just found an implementation of a condition variable using channels at https://github.com/anacrolix/missinggo/blob/master/chancond.go . It's similar to some of the backup slides in @bcmills talk referenced above. While it seems useful for event notifications, because of its locking behaviors it would be prone to race conditions if you tried to use it for some of the things one might traditionally use a condition variable for.