in server/handler, when a message is received on the transport, farm the handling out to a future (this is half-baked & caused additional intermittent failures)

The debugging I've been doing (print statements galore) suggests that when this problem happens, the client does send back the proper :stdin, but the add-stdin handler doesn't pick it up in time to write to the :stdin-writer before the timeout expires.

My current vague suspicion is that it's a CPU contention thing between the server and client.

It's kind of hard for me to reason about the execution of the response-values lazy seq, and where the blocking actually happens. And of course adding logging changes the behavior sometimes. I wouldn't be opposed to adding sleeps in there, but I expect those kinds of things would still be flaky later, on build machines in particular :/
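
For what it's worth, here is a rough sketch of where the blocking lands in a lazily produced response sequence. It is a Python generator standing in for a Clojure lazy seq, not nREPL's actual code; the `response_values` function here, the `events` log, and the message shape are all made up for illustration, and Clojure lazy seqs also cache realized elements, which generators do not.

```python
import queue

events = []  # hypothetical trace of what happens, and in what order

def response_values(q, timeout):
    """Lazily yield the "value" of each message; ends when a receive times out."""
    while True:
        events.append("recv")
        try:
            msg = q.get(timeout=timeout)  # the blocking call
        except queue.Empty:
            events.append("timed out")
            return
        yield msg["value"]

q = queue.Queue()
q.put({"value": 1})

seq = response_values(q, timeout=0.05)
events.append("seq created")     # nothing has been received yet
first = next(seq)                # the first receive happens here, in the consumer
events.append("first consumed")
rest = list(seq)                 # the next receive waits out the full timeout
```

Creating the sequence does no work. Each blocking receive runs on whichever thread pulls the next element, so the wait shows up at the consumer, wherever that happens to be in the pipeline.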

Incidentally, bumping the client timeout up to Long.MAX_VALUE helps a bit with the stdin failures, but doesn't solve them. It also creates new ones: session-lifecycle, unknown-op, and read-timeout block forever trying to reduce across the whole seq in combine-responses. It was a bit surprising to me to see that reduce in the response-values pipeline, and I pursued that at one point as well, but it hasn't gotten me anywhere. Conversely, decreasing the timeout makes the failures easier to reproduce.
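
To illustrate why an unbounded timeout and a reduce over the whole seq don't mix, here is a minimal Python sketch. It is an analogy only, not nREPL's implementation: `response_seq`, `until_done`, and `combine` are hypothetical stand-ins, and I'm assuming the response seq only ends when a receive times out.

```python
import queue
from functools import reduce

def response_seq(q, timeout):
    """Yield messages until a receive times out (timeout=None waits forever)."""
    while True:
        try:
            yield q.get(timeout=timeout)
        except queue.Empty:
            return  # a timed-out receive is the only way this seq ends

def until_done(responses):
    """Stop after the first message whose status contains "done"."""
    for msg in responses:
        yield msg
        if "done" in msg.get("status", ()):
            return  # never pulls from the underlying seq again

def combine(responses):
    """Merge every message into one dict; needs the seq to end."""
    def merge(acc, msg):
        for k, v in msg.items():
            acc.setdefault(k, []).append(v)
        return acc
    return reduce(merge, responses, {})

def filled_queue():
    q = queue.Queue()
    q.put({"out": "hi"})
    q.put({"status": ["done"]})
    return q

# Finite timeout: the reduce returns once a receive times out.
by_timeout = combine(response_seq(filled_queue(), timeout=0.05))

# Unbounded timeout: safe only because the seq is cut off at "done".
# combine(response_seq(filled_queue(), timeout=None)) would never return.
by_done = combine(until_done(response_seq(filled_queue(), timeout=None)))
```

Under that assumption, a reduce over the raw seq can only finish when the timeout fires, so an effectively infinite timeout turns it into a hang unless something else terminates the seq first.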