How Can the Packet Size Be Greater than the MTU?

So you’ve got a problem and you decide to fire up Wireshark and take a capture. When you look at the packets you see a bunch of them that are far larger than the 1500 byte MTU.

HOW CAN THIS BE?!?!?

There’s something you need to know about taking captures on the host that is sending data. Let’s say you’re uploading some data to a server while capturing packets on your machine. You look at the capture and see something like this:

Clearly these large packets exceeding the MTU must be part of the problem, right? Probably not. Here’s why.

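Here’s the kicker: Wireshark uses libpcap or WinPcap to grab the data before it gets handed to the NIC. With TCP segmentation offload (TSO, also called large send offload or LSO) enabled, the TCP stack hands the NIC one big chunk of data and the NIC splits it into MSS-sized segments. The capture shows the big chunk, not the segments that actually go out on the wire.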

Check it out:

So you don’t see the actual packets that are put on the wire unless you capture outside the sending host with a tap or span port. This is one of several reasons it’s a good idea to capture traffic outside of the hosts involved in the connection whenever possible.
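To make the segmentation step concrete, here is a minimal Python sketch of what the NIC roughly does with one oversized segment handed down by the stack. The function name `split_segment` and the sample values (starting sequence number 1, MSS of 1460) are made up for illustration. Real hardware also rewrites IP lengths, checksums, and TCP flags, which this sketch ignores.

```python
def split_segment(seq, payload_len, mss):
    """Split one large TCP segment into MSS-sized pieces.

    Returns a list of (sequence_number, length) pairs, one per segment
    as it would appear on the wire. Hypothetical helper for illustration.
    """
    if mss <= 0:
        raise ValueError("mss must be positive")
    segments = []
    offset = 0
    while offset < payload_len:
        length = min(mss, payload_len - offset)
        # TCP sequence numbers are 32-bit and wrap around.
        segments.append(((seq + offset) % 2**32, length))
        offset += length
    return segments


# One 4380-byte segment seen in the host capture, MSS assumed to be 1460.
for wire_seq, wire_len in split_segment(seq=1, payload_len=4380, mss=1460):
    print(f"seq={wire_seq} len={wire_len}")
```

Each piece starts where the previous one ended, so the receiver sees the same byte stream either way. Only the packet boundaries differ between the two captures.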

Here’s what the data looks like captured on the sender and then arriving at the receiver after it has been segmented:

This behavior makes TCP sequence number analysis a pain in the ass. If you’re a network troubleshooter using packet analysis, you’ve GOT to be comfortable doing sequence number analysis.
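The arithmetic behind sequence number analysis is simple: the next expected sequence number is the current sequence number plus the payload length. A rough Python sketch of that check for one direction of a connection is below. The function name `classify` and the labels are hypothetical, and sequence number wraparound and SYN/FIN flags are ignored to keep it short.

```python
def classify(segments):
    """Label each (seq, length) pair, in capture order, against the
    next expected sequence number. Hypothetical helper; ignores
    32-bit wraparound and SYN/FIN."""
    results = []
    expected = None
    for seq, length in segments:
        if expected is None or seq == expected:
            label = "in order"
        elif seq > expected:
            label = f"gap of {seq - expected} bytes"
        else:
            label = "below expected (retransmission or out-of-order)"
        results.append((seq, length, label))
        # Only move the expected value forward, never backward.
        end = seq + length
        if expected is None or end > expected:
            expected = end
    return results


capture = [(1, 1460), (1461, 1460), (4381, 1460), (2921, 1460)]
for seq, length, label in classify(capture):
    print(f"seq={seq} len={length}: {label}")
```

The same check works on an oversized packet from a host capture: a 4380-byte segment at sequence number 1 is followed by sequence number 4381, just as three 1460-byte segments would be.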

I saw someone post on reddit the other day asking about sequence number interpretation in tcpdump output. The most upvoted comment said that they had been looking at tcpdump output for 15 years and that they had never had to calculate sequence numbers.

WHUT?

I mean, what have you been doing for 15 years, son?

Anyway.

There’s another side to it that I recently saw for the first time: Large Receive Offload (LRO), also known as Receive Segment Coalescing (RSC). This is the same thing but in reverse. The NIC coalesces TCP segments it receives from a remote host into larger packets before sending them up to the TCP stack. Again, offloading this to the NIC is a performance enhancement, but it's a pain in my ass.
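Receive-side coalescing can be sketched the same way: contiguous segments get glued together before the stack (and your capture) sees them. This is a simplified Python illustration, not how any particular NIC or driver implements it. The name `coalesce` and the `max_len` cap are assumptions, and real implementations also consider flags, TCP options, and timers.

```python
def coalesce(segments, max_len=65535):
    """Merge back-to-back (seq, length) pairs into larger ones.

    A segment is merged into the previous one only if it starts exactly
    where the previous one ended and the result stays within max_len.
    Hypothetical helper; max_len is an assumed cap for illustration.
    """
    merged = []
    for seq, length in segments:
        if merged:
            last_seq, last_len = merged[-1]
            contiguous = last_seq + last_len == seq
            if contiguous and last_len + length <= max_len:
                merged[-1] = (last_seq, last_len + length)
                continue
        merged.append((seq, length))
    return merged


on_wire = [(1, 1460), (1461, 1460), (2921, 1460), (5841, 1460)]
print(coalesce(on_wire))
```

In this example the first three segments are contiguous and become one 4380-byte packet, while the last one stays separate because of the gap before it.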

Check out this capture taken on the client. Notice that this large frame is coming from the server and there’s no way it could have traversed a WAN without fragmentation, so it must be LRO.

One time, I got so annoyed at this behavior that I wrote a Perl script to break large packets in a capture file into MSS-sized packets just to make sequence number analysis easier. I don’t know if anyone is interested in that, but I could post it if y’all wanted. Of course, if you plan ahead you could just disable segmentation offloading before taking the capture.

So next time you take captures on a host sending and receiving traffic, do not be alarmed if you see Really Big Packets™.


About the Author

I like being the hero. Being able to drop a bucket of root cause analysis on a burning network problem has made me a hero (to some people) and it feels real good, y’all. Get good at packet analysis and be the hero too.
I also like french fries.


(22) comments

Derek
August 18, 2014

Thanks Kary! Another great insight! Oh hey, I recommended packetbomb to a guy on reddit in /r/networking who was looking for some help with a file server performance issue. Hope you don’t mind. Thanks again!

In my humble opinion captures should never be taken on client or server unless you can live with the drawbacks and are aware of them. So I would not complain about LSO or LRO, CRC errors etc. if doing local captures, because that’s just what happens if it is done that way.

Also, I would never write a script to break up packets into MSS sizes. When the source (local capture) is already “artificial” it can only get worse by assuming things that may not have happened that way on the wire. E.g. you can only guess the timings etc. But again, if you can live with the drawbacks, go ahead :-)

Cheers,
Jasper

P.S.: there are tons of guys out there that think they know all about TCP, but give them one simple sequence to track and they fail every single time.


krishna
August 31, 2015

Nice article to help clear doubts. It would be great if the packet dumps mentioned above were attached somewhere so that users can see for themselves how packets look under TSO. One thing not clear to me is whether, when TSO is done, the original TCP options are copied to all the segments or some changes are made to them. I am trying to understand how TSO and MPTCP co-exist (if they do).


Ernest
November 21, 2016

Hello Kary

Thanks for posting, very useful information. I have looked at a handful of Wireshark traces now and have seen ‘TCP Segment of reassembled PDU’. By the way, what does PDU stand for? Physical Data Unit?

So basically, are you saying that if this offload behaviour is in action, it is impossible to deduce anything sensible from the TCP sequence/acknowledgement numbers in the normal fashion? Or am I misunderstanding that point?

Yes, I would be very interested in the Perl script please (I will likely turn it into a PowerShell script as I am working on Windows)

One last question please

Let’s say I have to capture on a Windows Server (as the Cisco guys will not set up a span port for me). I then turn off offloading on the Windows NIC. If the host at the other end of the connection (a storage appliance, for example) has offloading enabled, will I also have issues with Seq/Ack numbers?

So, supposing that I don’t have access to the network switch. If I use a third machine with a soft switch between the sender and receiver and configure the soft switch to dump all frames to the hard drive, do you think that would be any different? I’m not even sure if the soft switches available today have that feature. Ignoring, of course, if the traffic is such that it can be captured in software without losing frames.

I have captured the traffic from the vyatta, which is the FW in front of my server. I still see packets there that are greater than the MTU. Is it possible that the vyatta is also doing the reverse LRO as you explained above?

On the dump I am seeing packets with 2764 bytes, although the MTU is set to 1400 on the interface of my server. How is that possible?