The idea is that clients should connect randomly to one of the server's addresses, to distribute the traffic between the two ISPs. On the server, the multihome option is used so that replies to traffic coming in from one ISP go out through the same ISP.
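A minimal sketch of such a setup might look like the following. It assumes UDP on the default port 1194, a hypothetical second server address 2.2.2.2 alongside 1.1.1.1, and remote-random as the way clients pick an address; only multihome itself is taken from the description above.

```
# --- server side (sketch) ---
proto udp
port 1194
# No "local" directive: the server listens on all addresses,
# and multihome makes replies leave from the address the
# client's packet arrived on.
multihome

# --- client side (sketch) ---
client
proto udp
remote 1.1.1.1 1194
remote 2.2.2.2 1194    # hypothetical second address
# Pick one of the remote entries at random on each connection attempt.
remote-random
```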

Problem: under certain mysterious circumstances, some clients are not able to connect. Better (or worse): the client connects, say to 1.1.1.1, the handshake completes successfully (up to the "Initialization Sequence Completed" message), but as soon as data traffic begins to flow, the client log fills with these messages:

Some investigation on the server side, raising the debug level, shows that the server changes the outgoing interface after the initialization sequence is completed and the first data packet is sent to the client (in bold below):

To make things worse, the problem does not always happen. The first client that connects always works fine, regardless of the IP address it connects to. If a different client connects, that works fine as well, again regardless of the IP. But if a client connected to one server's IP disconnects and then reconnects after a short time, and that new connection goes to the server's other IP, the problem happens.
To add some fun, I have another virtually identical setup in production, and that one works flawlessly.

After some days of unsuccessful troubleshooting, there is still no progress on the problem. Searching the Internet turns up an old message on the mailing list that hints at something similar, but it's not clear whether it applies to the case in question, so it's no help.
Being lost and thinking of some bug or some other obscure corner case (though the config looked straightforward), I ask for help on the openvpn-devel mailing list, where another (different) issue related to multihome was being discussed (and, btw, sorry for hijacking the thread).

Well, the reply surprised me. James suggested adding nobind to the **client** configuration, while I was expecting something weird on the server instead. And adding nobind to the client config did indeed do the trick. A quick check of the client config in the working production environment reveals that the clients there are indeed using nobind.
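In terms of the client config, the fix is a single directive. The fragment below is only a sketch: the surrounding lines (UDP, port 1194, the second address 2.2.2.2, remote-random) are assumptions for illustration, and nobind is the one line that matters.

```
client
proto udp
remote 1.1.1.1 1194
remote 2.2.2.2 1194    # hypothetical second address
remote-random
# Do not bind to a fixed local port; the OS assigns a
# dynamic source port for each new connection.
nobind
```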

Here's the official explanation for the behavior:

Using nobind on the client for UDP client connections generates a socket
with a dynamic source port number. This is key because it means that
when the client reconnects, it does so with a new source port number,
and this allows OpenVPN to detect that the initial UDP packet represents
a new connection, and is not part of the old connection.

The problem is that when nobind is not used, the source port on the new
connection is recycled -- it's the same as the old connection. So when
OpenVPN sees the connection-initiating packet, after the client switches
over to the secondary server address, it gets confused because it
doesn't expect sessions from a given source address to change its
destination address mid-session.