So far I never had a need to look in detail at how the OpenVPN protocol actually
looks on the wire. It seems not many people have taken that close a look, as the
Wireshark plugin is fairly recent (from 2012, I think), while OpenVPN has been
around for about ten years longer than that. If I were an OpenVPN developer, the
Wireshark plugin would be the first thing I'd write to help with debugging and
development. At least that's what I've been doing for everything from OpenPCD to
SIMtrace, and for the various GSM and other protocols I encounter...

The reason for my current investigation is a set of rather strange and as yet
unexplained problems when running OpenVPN over high-latency satellite links.
I'm not talking about high-bandwidth VSAT or systems with dedicated or
guaranteed bandwidth. The links I'm seeing often have a round-trip time (RTT,
as measured by ICMP echo) of 2 seconds, sometimes even 5. This is of course not
only the satellite link itself, but includes queuing on the ground, possibly in
the space segment, and of course in the terminal, including (possibly) access
arbitration.

What struck me as _very_ odd is that OpenVPN sends tons of UDP messages of
ridiculously small size during the TLS handshake when bringing up the tunnel.
Further investigation shows that it actually configures an internal MTU of
'0' for the link, which in practice seems to limit each packet to 100 bytes of
control payload. Add the HMAC and the OpenVPN header, and you end up with 124 to
138 bytes of UDP payload.
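To make that per-packet overhead a bit more concrete, here is a minimal Python sketch that splits such a control packet into its header fields. It assumes --tls-auth with a 20-byte HMAC-SHA1 signature and the field layout as I understand it from the OpenVPN sources and the Wireshark dissector; the class and function names are made up for illustration. The overhead of each packet varies with the HMAC algorithm and with how many ACKs are piggy-backed in the header, which would explain why the UDP payload size is a range rather than a fixed number.

```python
import struct
from dataclasses import dataclass, field
from typing import List, Optional

# Opcodes live in the high 5 bits of the first byte, the key ID in the low 3 bits.
P_CONTROL_V1 = 4
P_ACK_V1 = 5


@dataclass
class ControlPacket:
    opcode: int
    key_id: int
    session_id: bytes
    hmac: bytes
    packet_id: int              # replay-protection packet ID (tls-auth)
    net_time: int               # replay-protection timestamp (tls-auth)
    acks: List[int] = field(default_factory=list)
    remote_session_id: bytes = b""
    message_packet_id: Optional[int] = None  # absent in pure ACK packets
    payload: bytes = b""        # TLS record fragment carried by this packet


def parse_control(pkt: bytes, hmac_len: int = 20) -> ControlPacket:
    """Split a --tls-auth protected control packet into its header fields.

    Assumes HMAC-SHA1 (20 bytes) unless told otherwise. The HMAC is not
    verified, --tls-crypt (which uses a different layout) is not handled,
    and there is no length validation beyond what indexing and
    struct.unpack_from enforce.
    """
    opcode, key_id = pkt[0] >> 3, pkt[0] & 0x07
    off = 1
    session_id = pkt[off:off + 8]
    off += 8
    hmac = pkt[off:off + hmac_len]
    off += hmac_len
    packet_id, net_time = struct.unpack_from("!II", pkt, off)
    off += 8
    n_acks = pkt[off]
    off += 1
    acks = list(struct.unpack_from(f"!{n_acks}I", pkt, off))
    off += 4 * n_acks
    remote_session_id = b""
    if n_acks:  # the peer's session ID only follows a non-empty ACK array
        remote_session_id = pkt[off:off + 8]
        off += 8
    message_packet_id = None
    if opcode != P_ACK_V1:  # ACK-only packets carry no message packet ID
        (message_packet_id,) = struct.unpack_from("!I", pkt, off)
        off += 4
    return ControlPacket(opcode, key_id, session_id, hmac, packet_id, net_time,
                         acks, remote_session_id, message_packet_id, pkt[off:])
```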

Now consider that the server certificate (possibly even including a CA
certificate) can be quite large, plus all the gazillions of TLS handshake
options in the ServerHello, the first message from server to client. This means
that OpenVPN transmits that ServerHello in something like 40 to 60 fragments of
100 bytes each! And each of those fragments has to be acknowledged by the
remote end, leading to 80 to 120 UDP/IP packets _only_ for the delivery of the
TLS ServerHello.
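The arithmetic behind these numbers is simple enough to write down. The following Python sketch assumes no packet loss, exactly one transmission per fragment and, by default, one separate ACK packet per fragment, as seen in the trace; the function name and the fragments_per_ack parameter are my own, for illustration only.

```python
import math
from typing import Tuple


def handshake_packets(message_len: int, payload_per_packet: int = 100,
                      fragments_per_ack: int = 1) -> Tuple[int, int]:
    """Return (fragments, total UDP packets) needed to deliver one TLS message.

    Assumes every fragment is sent exactly once (no loss, no retransmission)
    and that one ACK packet acknowledges `fragments_per_ack` fragments.
    """
    if message_len <= 0 or payload_per_packet <= 0 or fragments_per_ack <= 0:
        raise ValueError("all arguments must be positive")
    fragments = math.ceil(message_len / payload_per_packet)
    acks = math.ceil(fragments / fragments_per_ack)
    return fragments, fragments + acks


# 40 to 60 fragments of 100 bytes correspond to a 4000 to 6000 byte ServerHello.
print(handshake_packets(4000))  # (40, 80)
print(handshake_packets(6000))  # (60, 120)
```

With a hypothetical 1000-byte limit on the control payload instead, handshake_packets(4000, 1000) and handshake_packets(6000, 1000) return (4, 8) and (6, 12).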

Then you start reviewing the hundreds of OpenVPN configuration options, many of
them related to MTU, MSS, fragmentation, etc. There is none for that insanely
small default of 100 bytes for control packets during the handshake. I even read
through the related source code, only to find that this behavior indeed seems to
be hard-coded. Thanks to Free Software, some time later I had written a patch to
add such an option. It seems to work on both client and server, and brings the
ServerHello down to a much smaller 4 to 6 messages.

The fun continues when you see that the timeout for re-transmitting fragments that
have not yet been ACKed is 2 seconds. At my satellite RTTs this of course leads
to lots of unneeded re-transmissions, simply because the ACK hasn't made its way
back to the sender of the original message yet. Luckily there's a configuration
option for that.
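A rough model shows how a short retransmission timer interacts with long RTTs. OpenVPN's --tls-timeout option (default: 2 seconds) appears to be the setting in question. The Python sketch below counts how often a single fragment is sent before the ACK for its first copy can possibly arrive. It assumes the first copy gets through, ignores processing delays, and treats the back-off factor as a parameter; the function name copies_before_ack is hypothetical.

```python
def copies_before_ack(rtt: float, timeout: float, backoff: float = 2.0) -> int:
    """Count transmissions of one fragment before the first ACK is back.

    Assumes the first copy is delivered and ACKed at once, so its ACK arrives
    exactly `rtt` seconds after the first transmission, and that the
    retransmission timer is multiplied by `backoff` after each retry
    (backoff=1.0 models a constant timer).
    """
    if rtt <= 0 or timeout <= 0 or backoff < 1.0:
        raise ValueError("rtt and timeout must be positive, backoff >= 1")
    copies = 0
    t = 0.0            # time of the next (re)transmission
    current = timeout  # current retransmission interval
    while t < rtt:     # the ACK for the first copy is still in flight
        copies += 1
        t += current
        current *= backoff
    return copies


print(copies_before_ack(5, 2, backoff=1.0))  # 3: constant 2 s timer
print(copies_before_ack(5, 2))               # 2: timer doubles after each retry
print(copies_before_ack(5, 10))              # 1: timeout larger than the RTT
```

Every copy beyond the first is wasted airtime, and that applies per fragment, so with dozens of fragments in flight the waste adds up quickly.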

After applying the patch and changing that option, the protocol trace looks much
saner. However, I still have problems establishing a tunnel in a number of cases.
For some odd reason, the last fragment of the ServerHello is not acknowledged by
the client, regardless of whether a patched or unpatched OpenVPN is used. After
transmitting N fragments, I only ever get acknowledgements up to fragment N-1.
That last fragment is then re-transmitted by the server with exponential
back-off, until some 60 seconds later the server gives up because the TLS
handshake didn't finish within that time. Extending the TLS handshake timeout
to 120 seconds doesn't help either.
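Exponential back-off also hints at why a longer handshake timeout doesn't buy much. OpenVPN's --hand-window option (default: 60 seconds) appears to be that timeout. Assuming the retransmission interval simply doubles after each attempt without any cap, and ignoring the time already spent before the last fragment was first sent, the Python sketch below lists the send times of a packet that is never acknowledged within the window (retry_times is a made-up name).

```python
from typing import List


def retry_times(initial_timeout: float, window: float,
                backoff: float = 2.0) -> List[float]:
    """Send times (seconds after the first transmission) of a packet that is
    never ACKed, up to the end of the handshake window."""
    if initial_timeout <= 0 or window <= 0 or backoff < 1.0:
        raise ValueError("timeout and window must be positive, backoff >= 1")
    times: List[float] = []
    t, current = 0.0, initial_timeout
    while t < window:
        times.append(t)
        t += current        # wait for the current interval to expire
        current *= backoff  # then double (by default) the next interval
    return times


print(retry_times(2, 60))   # [0.0, 2.0, 6.0, 14.0, 30.0]
print(retry_times(2, 120))  # [0.0, 2.0, 6.0, 14.0, 30.0, 62.0]
```

Under these assumptions, doubling the window from 60 to 120 seconds adds just a single extra retransmission, which would not help much if the loss of that fragment is systematic rather than random.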

I'm not quite sure why something like the first 39 fragments all get delivered
reliably and acknowledged, while the last one (fragment 40) never makes it to
the remote side. That's certainly not random packet loss, but a very
deterministic one. Let's see if I can still manage to find out what that might
be...