How unreliable is UDP?

16 Oct 2014

I realized something recently: I know virtually nothing about UDP. Oh, I know it's connectionless, has no handshaking and thus doesn't provide any guarantees about delivery or ordering. But, in practice, what does that actually mean?

I set up 5 VPSes to send each other a few UDP packets over a 7 hour period. I didn't send much traffic (though sending more is certainly worth trying). Every 9-11 seconds, each server randomly picked a target and sent it 5-10 packets ranging from 16 to 1016 bytes.

2 servers were in the same data center in New Jersey (NJ 1 and NJ 2), with 1 each in LA, Amsterdam (NLD) and Tokyo (JPN).
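For a rough idea of what such a sender could look like, here is a minimal sketch. It is not the code used for this test. The header layout (a sender id plus a per-pair counter, 8 bytes each) and the names `build_packet`, `parse_header` and `run_sender` are assumptions made for illustration; only the timing, burst size and packet sizes come from the description above.

```python
import os
import random
import socket
import struct
import time

# Assumed header layout: sender id + per-pair counter, 8 bytes each = 16 bytes.
HEADER = struct.Struct("!QQ")


def build_packet(sender_id: int, counter: int) -> bytes:
    """Return a 16-byte header followed by 0-1000 random payload bytes."""
    payload = os.urandom(random.randint(0, 1000))
    return HEADER.pack(sender_id, counter) + payload


def parse_header(packet: bytes) -> tuple:
    """Recover (sender_id, counter) from a received packet."""
    return HEADER.unpack_from(packet)


def run_sender(sender_id: int, targets: list) -> None:
    """Every 9-11 seconds, send a burst of 5-10 packets to one random target.

    `targets` is a list of (host, port) tuples.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    counters = {target: 0 for target in targets}  # one counter per target
    while True:
        target = random.choice(targets)
        for _ in range(random.randint(5, 10)):
            counters[target] += 1
            sock.sendto(build_packet(sender_id, counters[target]), target)
        time.sleep(random.uniform(9, 11))
```

The receiver only needs `parse_header` to know who sent a packet and where it falls in that sender's sequence, which is enough to measure both loss and ordering.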

[Un]Reliability

The first thing I wanted to know was how unreliable UDP was. Are we talking about a delivery rate of 25%? 50%? 75%?

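Packets Received (received/sent; rows are senders, columns are receivers)

| Sender | NJ 1      | NJ 2      | LA        | NLD       | JPN       |
|--------|-----------|-----------|-----------|-----------|-----------|
| NJ 1   | -         | 2981/2981 | 2888/2889 | 2964/2964 | 3053/3054 |
| NJ 2   | 3016/3016 | -         | 3100/3101 | 2734/2735 | 3054/3054 |
| LA     | 2901/2941 | 2932/2975 | -         | 2938/2942 | 2712/2712 |
| NLD    | 3038/3038 | 2771/2772 | 2724/2724 | -         | 2791/2791 |
| JPN    | 2551/2552 | 2886/2886 | 2836/2838 | 2887/2887 | -         |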

These numbers were better than I had expected. I was specifically thinking NLD <-> JPN would see above-normal loss, but there was none. Data sent out of LA, specifically to the two servers in NJ, seems to have struggled somewhat. Was there a pattern?
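To put numbers on that, here is a small sketch that turns the received/sent pairs from the table above into loss counts and delivery percentages. The `results` dictionary is copied from the table; the function name `loss_report` is made up for this example.

```python
# (received, sent) per (sender, receiver) pair, copied from the table above.
results = {
    ("NJ 1", "NJ 2"): (2981, 2981), ("NJ 1", "LA"): (2888, 2889),
    ("NJ 1", "NLD"): (2964, 2964), ("NJ 1", "JPN"): (3053, 3054),
    ("NJ 2", "NJ 1"): (3016, 3016), ("NJ 2", "LA"): (3100, 3101),
    ("NJ 2", "NLD"): (2734, 2735), ("NJ 2", "JPN"): (3054, 3054),
    ("LA", "NJ 1"): (2901, 2941), ("LA", "NJ 2"): (2932, 2975),
    ("LA", "NLD"): (2938, 2942), ("LA", "JPN"): (2712, 2712),
    ("NLD", "NJ 1"): (3038, 3038), ("NLD", "NJ 2"): (2771, 2772),
    ("NLD", "LA"): (2724, 2724), ("NLD", "JPN"): (2791, 2791),
    ("JPN", "NJ 1"): (2551, 2552), ("JPN", "NJ 2"): (2886, 2886),
    ("JPN", "LA"): (2836, 2838), ("JPN", "NLD"): (2887, 2887),
}


def loss_report(results):
    """Return (sender, receiver, lost, delivered %) rows, worst loss first."""
    rows = []
    for (sender, receiver), (received, sent) in results.items():
        rows.append((sender, receiver, sent - received, 100.0 * received / sent))
    return sorted(rows, key=lambda row: row[2], reverse=True)


for sender, receiver, lost, pct in loss_report(results)[:3]:
    print(f"{sender} -> {receiver}: lost {lost}, delivered {pct:.2f}%")
```

Even the worst pair (LA to NJ 2, 43 packets lost out of 2975) still delivered well over 98% of its packets.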

First, I thought maybe the size of the packet would be an issue. Admittedly, I kept them small (16 byte header, 0-1000 byte payload):

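Packet Loss Per Size (bytes)

| Size (bytes) | 0-115 | 116-215 | 216-315 | 316-515 | 516-715 | 716-915 |
|--------------|-------|---------|---------|---------|---------|---------|
| Packets lost | 13    | 11      | 12      | 13      | 23      | 23      |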

Nothing obvious there. Did the packet loss happen around the same time? Unfortunately, I didn't keep timestamps (why?!), but I did keep a counter per pair. If you look at the 43 packets that failed to make it from LA to NJ2, 29 were lost during two periods of roughly 1 minute each. The LA to NJ1 packet loss also largely happened during 2 short periods.
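Without timestamps, bursts of loss can still be spotted from the counters alone: a run of consecutive missing counters means consecutive packets were lost. A minimal sketch, assuming counters for a pair start at 1 and increase by 1 per packet (the function name `missing_runs` is hypothetical):

```python
def missing_runs(received, total):
    """Return (first_missing_counter, run_length) for each run of lost packets.

    Assumes the sender numbered its packets 1..total for this pair.
    """
    seen = set(received)
    runs = []
    start = None  # first counter of the run currently being tracked
    for counter in range(1, total + 1):
        if counter not in seen:
            if start is None:
                start = counter
        elif start is not None:
            runs.append((start, counter - start))
            start = None
    if start is not None:  # the run extends to the last packet sent
        runs.append((start, total + 1 - start))
    return runs


print(missing_runs([1, 2, 5, 6, 9], 10))  # [(3, 2), (7, 2), (10, 1)]
```

Since packets went out in bursts of 5-10 every 9-11 seconds, a long run of missing counters corresponds to several lost bursts in a row, which is how counters can stand in for rough time periods.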

Ordering

The other thing I was interested in was ordering.

The first way I looked at this was to count the inversions in the array of counters, in the order they were received. Essentially, that's the number of pairs that are out of order. If you have an array with the values 10, 8, 3, 7, 4, you end up with 8 inversions, which is also the number of adjacent swaps needed to sort it ((10, 8), (10, 3), (10, 7), (10, 4), (8, 3), (8, 7), (8, 4), (7, 4)).
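As an illustration, here is one way to count inversions. This sketch uses the standard merge-sort approach, which may or may not be how the numbers below were computed; the function name `count_inversions` is made up for this example.

```python
def count_inversions(values):
    """Count pairs (i, j) with i < j and values[i] > values[j]."""

    def sort_and_count(items):
        if len(items) <= 1:
            return items, 0
        mid = len(items) // 2
        left, left_count = sort_and_count(items[:mid])
        right, right_count = sort_and_count(items[mid:])
        merged = []
        count = left_count + right_count
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                # right[j] is smaller than every remaining element of left,
                # so it forms an inversion with each of them.
                merged.append(right[j])
                j += 1
                count += len(left) - i
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged, count

    return sort_and_count(list(values))[1]


print(count_inversions([10, 8, 3, 7, 4]))  # 8
```

A simple double loop over all pairs gives the same answer and is fast enough for a few thousand packets; the merge-sort version just scales better.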

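Inversions (rows are senders, columns are receivers)

| Sender | NJ 1 | NJ 2 | LA   | NLD  | JPN  |
|--------|------|------|------|------|------|
| NJ 1   | -    | 0    | 2994 | 2581 | 4658 |
| NJ 2   | 0    | -    | 3147 | 2459 | 4645 |
| LA     | 3980 | 3861 | -    | 3237 | 4010 |
| NLD    | 3125 | 1826 | 3133 | -    | 4189 |
| JPN    | 3920 | 4417 | 4147 | 4425 | -    |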

Don't know about you, but I'm not sure I find that useful. It sure seems high. Of course, one of the reasons to use UDP is when you're able to discard some packets. If you send 10 000 packets, and they're all ordered, except that the last one is somehow first, you can just discard it rather than doing 9999 swaps.

What if we discard any packet that comes after a later packet we've already processed (later meaning the counter is greater)? For example, if we get 1, 5, 4, 3, 6, 7, we'd discard 4 and 3 since we've already seen 5. How many "good" packets would that leave?

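# of ordered packets (rows are senders, columns are receivers)

| Sender | NJ 1 | NJ 2 | LA   | NLD  | JPN  |
|--------|------|------|------|------|------|
| NJ 1   | -    | 2981 | 1514 | 1658 | 1123 |
| NJ 2   | 3016 | -    | 1627 | 1483 | 1161 |
| LA     | 1227 | 1259 | -    | 1485 | 1067 |
| NLD    | 1407 | 1645 | 1220 | -    | 1096 |
| JPN    | 980  | 1083 | 1141 | 1087 | -    |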
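The discard rule described above is short enough to sketch in a few lines. This is an illustration rather than the code that produced the table, and the name `keep_in_order` is hypothetical:

```python
def keep_in_order(counters):
    """Keep only packets whose counter is greater than every counter seen so far."""
    kept = []
    highest = None
    for counter in counters:
        if highest is None or counter > highest:
            kept.append(counter)
            highest = counter
        # Otherwise the packet arrived after a later one and is discarded.
    return kept


print(keep_in_order([1, 5, 4, 3, 6, 7]))  # [1, 5, 6, 7]
```

The number of "good" packets for a pair is then `len(keep_in_order(received_counters))`. Note that this version also drops duplicates, since a repeated counter is not greater than the highest one seen.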

As a slight tweak, what if we group packets into batches of 5, sort each batch, then re-apply the discarding logic above:

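# of ordered packets (with grouping; rows are senders, columns are receivers)

| Sender | NJ 1 | NJ 2 | LA   | NLD  | JPN  |
|--------|------|------|------|------|------|
| NJ 1   | -    | 2981 | 2061 | 2235 | 1807 |
| NJ 2   | 3016 | -    | 2214 | 2041 | 1889 |
| LA     | 1868 | 1873 | -    | 2066 | 1720 |
| NLD    | 2200 | 2273 | 1920 | -    | 1712 |
| JPN    | 1541 | 1804 | 1735 | 1732 | -    |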
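The grouping tweak can be sketched as follows. The exact grouping used for the table isn't spelled out, so this assumes the simplest reading: split the arrival sequence into consecutive batches of 5, sort each batch by counter, then drop anything that still arrives after a later packet. The name `keep_in_order_grouped` is made up for this example.

```python
def keep_in_order_grouped(counters, group_size=5):
    """Sort packets within consecutive batches, then discard late arrivals."""
    # Step 1: buffer `group_size` packets at a time and sort each batch.
    regrouped = []
    for start in range(0, len(counters), group_size):
        regrouped.extend(sorted(counters[start:start + group_size]))

    # Step 2: keep only packets newer than everything already kept.
    kept = []
    highest = None
    for counter in regrouped:
        if highest is None or counter > highest:
            kept.append(counter)
            highest = counter
    return kept


print(keep_in_order_grouped([1, 5, 4, 3, 6, 7]))  # [1, 3, 4, 5, 6, 7]
```

The trade-off is latency: the receiver has to wait for a full batch before it can process any packet in it.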

Conclusion

It's hard to draw any conclusions without running this for longer and with more data. Still, it seems that UDP reliability is pretty good. Distance usually involves more hops and each hop increases the risk of something going wrong, but if things are normally ok, then distance doesn't seem to be an issue.

What is an issue is ordering. Here, distance does appear to be a bigger factor. By grouping the packets we see a substantial and expected improvement. In a lot of cases, ordering might not matter. Unless you're streaming, it's possible that simply keeping a timestamp and re-ordering on the receiving side would work.

I'd like to test more things: more data, over a longer period of time, and from more locations. I'd also like to compare the performance to TCP. But, overall, I feel that the better-than-I-expected reliability makes UDP something I should keep in my toolbox.