Small Intro

Andrew Hutchings (aka LinuxJedi) has worked on many of the Open Source software projects that make up the Internet. He now works from home in the UK for MariaDB developing their MariaDB ColumnStore engine.

All views / opinions on this blog are his own and not necessarily those of his employer or anyone else.

Protocol reverse engineering with tcpdump

Sometimes network protocols don’t behave entirely as documented. Other times there is no documentation at all beyond the code. Either way, you sometimes need to sniff the traffic of a connection to find out what is really going on.

Whilst I have been working on MariaDB ColumnStore for a year now, there are still some parts of the codebase I know little about. I recently had to write some code that worked with the ColumnStore network protocol, but in a few places it was difficult to understand exactly what was happening just by looking at the code. This is where tcpdump came in.

tcpdump is a powerful tool for sniffing the raw packet data of network connections. It can be very verbose, showing parts of the TCP/IP handshake, headers, etc. This is often far more than I need for reverse engineering network protocols, so I use tcpflow to filter the results. The final command looks a little like this:

sudo tcpdump -i lo -l -w - port <PORT> | tcpflow -D -C -r -

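Breaking this down: tcpdump listens on the loopback interface (-i lo) for traffic on the given port, line-buffers its output (-l), and writes the raw packets to standard output (-w -). tcpflow then reads those packets from the pipe (-r -) and prints just the payload data as hex on the console (-D -C).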

If we look at port 8616 (DBRM controller) for ColumnStore the end result can look a little like this during a small insert query:

From reading the ColumnStore messaging code I know that “37c1 fb14” is the header of an uncompressed packet and that the next 4 bytes are the packet length. The byte after that is usually the packet type (or response), which we can look up in the relevant enums. From there we can figure out the rest of the packet contents. I won’t go into details here, but on some occasions it required printing this data out and using highlighters to mark up the parts of the packet.
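As a rough illustration of that layout, here is a minimal Python sketch that splits a captured byte stream into packets. It is not ColumnStore code. The little-endian length, the idea that the length counts only the bytes after the 8-byte header, and the names MAGIC, HEADER_SIZE and split_packets are all assumptions made for the example, and the sample bytes are made up.

```python
# Uncompressed packet header, exactly as it appears in the hex dump.
MAGIC = bytes.fromhex("37c1 fb14")
# 4 bytes of header marker followed by 4 bytes of packet length.
HEADER_SIZE = 8


def split_packets(stream, byteorder="little"):
    """Return a list of (packet_type, payload) tuples found in stream.

    Assumptions (not confirmed by the original post): the length field
    uses the given byte order and counts only the bytes that follow the
    8-byte header. An incomplete trailing packet is ignored.
    """
    packets = []
    offset = 0
    while offset + HEADER_SIZE <= len(stream):
        if stream[offset:offset + 4] != MAGIC:
            raise ValueError(f"unexpected header at offset {offset}")
        length = int.from_bytes(stream[offset + 4:offset + 8], byteorder)
        start = offset + HEADER_SIZE
        end = start + length
        if end > len(stream):
            break  # the capture stops part-way through this packet
        payload = stream[start:end]
        # The first payload byte is usually the packet type (or response).
        packet_type = payload[0] if payload else None
        packets.append((packet_type, payload))
        offset = end
    return packets


if __name__ == "__main__":
    # Made-up sample in the same "xxxx xxxx" hex style as the dump.
    sample = bytes.fromhex("37c1 fb14 0300 0000 1a01 02")
    for packet_type, payload in split_packets(sample):
        print(f"type=0x{packet_type:02x} payload={payload.hex()}")
```

If the decoded lengths look implausibly large, passing byteorder="big" is the first thing to try. The packet type values still need to be matched by hand against the enums in the source.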

This method has also been extremely useful for other things in the past, such as debugging MySQL’s replication protocol. It is definitely part of my toolset for working on network daemons. If there are any similar tools you use, please put them in the comments below. I’m always interested in improving my workflow and toolset.