During the course of my work on botnet security we have had to deal with mammoth traffic traces captured at a local ISP. While analyzing the traffic we needed to extract traffic for some certain hosts out of large pcap files. An obvious solution would be to run tshark once for each host, filtering the traffic for that particular IP and writing it to a separate pcap file. However with the number of hosts approaching thousands and the pcap traces approaching terabytes in size tshark didn’t really fit the bill.

Initially I thought of writing a splitter in Python but my colleague’s aversion for using Python on large network traces coupled with lack of maintenance of libpcap bindings resulted in me going for C/libpcap directly. The new C-based slicer is available at our GitHub respository. It needs glib to compile though, as I needed a hash table implementation for maintaining the list of hosts that need to be sliced. The Makefile in the repository should take care of compiling with the appropriate flags.

Onto the performance, the speed of slicing is only throttled by libpcap‘s own read/write throughput as most of the remaining work is done in constant time. It took only 71 minutes (or 1.1 hours) to slice 1019 hosts out of a 180 GB pcap file on 2.5 GHz CPU. In simpler words, it’s lightning fast.

Right now the script does its job well enough. If someone needs to package it I’ll prefer removing the glib dependency in favor of perhaps glibc‘s own hash table implementation (search.h). In any case, I hope it proves helpful for other people playing with large pcap files.

Kami bhai, I might start blogging again, and you told me to let you know if I wanted a new domain. Don’t like the name I have right now either. Haha. If you can do something, email or comment on my blog.