Version 1.1, April 28, 2006
Lots of changes and improvements by David Koconis at ICSA labs (thanks,
Dave!). These are summarized below:
1. Replacing pcap IP Addresses (courtesy ICSA labs)
Changed algorithm for assigning rewritten IP addresses. The new
format is X.HID.N.N, where
- The first byte (X) can be either a constant - provided by the
user on the command line - or taken from the first byte of the IP
address in the original packet.
- HID is the handler ID. This method allows for 254 consecutive
handlers (values 0 and 255 are reserved in the second octet).
- The last 2 octets (N.N) are chosen at random
and guaranteed to be unique within a pcap.
Keeping the first octet from the original pcap yields rewritten
addresses that resemble the originals, while the random last two
octets still introduce randomness and uniqueness into the address
space. Use the -d flag on the command line to activate this
behavior.
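The X.HID.N.N scheme described above can be sketched as follows. This is
only an illustration, not Tomahawk's code: the function name
make_rewriter and its parameters are hypothetical, and the choice to draw
each random octet from 1..254 (so that .0 and .255 are avoided) is an
assumption based on the notes in the next section.

```python
import random

def make_rewriter(handler_id, first_octet=None, seed=None):
    """Return a function that maps original IPv4 strings to X.HID.N.N."""
    if not 1 <= handler_id <= 254:
        raise ValueError("handler ID must be in 1..254")
    rng = random.Random(seed)
    mapping = {}   # original address -> rewritten address
    used = set()   # (N, N) pairs already handed out for this pcap

    def rewrite(original):
        if original in mapping:
            return mapping[original]
        if len(used) >= 254 * 254:
            raise RuntimeError("no unique N.N pairs left")
        # X is either the user-supplied constant or the original first octet.
        x = first_octet if first_octet is not None else int(original.split(".")[0])
        while True:
            pair = (rng.randint(1, 254), rng.randint(1, 254))
            if pair not in used:
                used.add(pair)
                break
        mapping[original] = f"{x}.{handler_id}.{pair[0]}.{pair[1]}"
        return mapping[original]

    return rewrite
```

Calling the returned function twice with the same original address gives
the same rewritten address, so both directions of a flow stay consistent.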
2. Use of Broadcast, Multicast, and Network Addresses (courtesy ICSA labs)
Previously Tomahawk did not handle broadcast, multicast, and network
addresses differently than a normal host or server IP. Thus Tomahawk
wouldn't stop itself from replacing broadcast IP addresses that were
in pcaps with something else - such as 205.160.10.16. As a result
traffic like bootp and DHCP would be replayed with destination IP
addresses that are never seen in the real world.
Now when Tomahawk reads in IP addresses it determines whether or not
the traffic can be faithfully reproduced. If it decides some traffic
cannot be faithfully reproduced, then Tomahawk does not replay those
packets. The following traffic cannot be faithfully reproduced
and is therefore not replayed: multicast traffic, traffic where
the first 2 octets are 0.0, and broadcast traffic where either of
the first 2 octets is 255. Tomahawk cannot faithfully reproduce
this traffic largely because the handler ID occupies the second
octet: there is presently no way to reconcile a packet that needs
a 0 or a 255 in that octet with a handler ID that is likely to be
some other value. Also ignored is any traffic other than TCP, UDP,
and ICMP. So, traffic like EIGRP - for example - is never replayed.
Additionally, Tomahawk is now able to preserve broadcasts where
either or both of the last two octets in an IP address are 255 or 0.
If the last octet alone is 255 Tomahawk will choose a random value
for the 2nd to last octet.
To summarize, these changes ensure that Tomahawk will not replay
multicast traffic, traffic with addresses typically reserved for
networks, and some broadcasts. It is flexible enough however to
preserve broadcasts indicated by either of the last 2 octets from
the original pcap IP address. And in general it is more careful
when assigning IP addresses such that .0 and .255 are avoided (except
as mentioned when preserving broadcasts indicated by one or more of
the last 2 octets).
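The filtering rules in this section can be summarized in a short
sketch. The function name is_replayable is hypothetical, and the exact
tests Tomahawk performs may differ; the protocol numbers are the standard
IANA values for ICMP (1), TCP (6), and UDP (17).

```python
import ipaddress

REPLAYABLE_PROTOCOLS = {1, 6, 17}  # ICMP, TCP, UDP

def is_replayable(addr, protocol):
    """Apply the address and protocol rules described above to one address."""
    if protocol not in REPLAYABLE_PROTOCOLS:
        return False                      # e.g. EIGRP (protocol 88)
    ip = ipaddress.IPv4Address(addr)
    octets = ip.packed                    # four bytes, most significant first
    if ip.is_multicast:
        return False                      # 224.0.0.0/4
    if octets[0] == 0 and octets[1] == 0:
        return False                      # first 2 octets are 0.0
    if octets[0] == 255 or octets[1] == 255:
        return False                      # broadcast in either of the first 2 octets
    return True                           # includes x.y.z.255 style broadcasts
```

A packet would be skipped if either its source or its destination
address fails this check.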
3. Tomahawk and IP Fragments (courtesy ICSA labs)
Previously, Tomahawk could not be used to replay fragmented TCP and
UDP packets. Here's why: Tomahawk looked at all frames the same.
It always attempted to rewrite the TCP & UDP header checksums.
However IP fragments, other than the first fragment, do not include
a TCP or UDP header. But Tomahawk didn't realize this. Instead
Tomahawk wrongly assumed all packets are non-fragments, determined the
memory location where the transport layer header checksum should be,
and rewrote the data beginning where the checksum would have been -
had it not been a fragment.
Resolution: Tomahawk now checks to see if each packet is either
the first fragment OR not a fragment at all. If either of these is
true it recalculates and rewrites the TCP and UDP header checksums.
Thus when the packet is a fragment but not the first fragment,
it makes no such calculation and doesn't make the mistake of
overwriting anything.
This enables us to use Tomahawk to properly replay pcaps with TCP
and UDP fragments in them.
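A minimal sketch of the fragment test, assuming the caller passes the
bytes starting at the IPv4 header. The function name is hypothetical. A
fragment offset of zero covers both cases named above: an unfragmented
packet and the first fragment of a fragmented one.

```python
import struct

def needs_l4_checksum_rewrite(ip_header):
    """True if a TCP/UDP header (and so its checksum) is present in this packet."""
    flags_and_offset = struct.unpack("!H", ip_header[6:8])[0]
    fragment_offset = flags_and_offset & 0x1FFF  # low 13 bits, in 8-byte units
    protocol = ip_header[9]                      # 6 = TCP, 17 = UDP
    return fragment_offset == 0 and protocol in (6, 17)
```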
4. Logging the Blocked Packet (courtesy ICSA labs)
When a Network IPS blocked a packet replayed by Tomahawk this caused
Tomahawk to die. However, it was difficult to determine which packet
was blocked and caused Tomahawk to die especially when modified IP
addresses were used.
Resolution: A command line switch (-L) was added to run Tomahawk
in "logging mode". In logging mode, Tomahawk provides enough
information about the offending packet to greatly simplify the
process of finding it. This speeds the process of stripping out
each flow from the pcap where one or more of the packets in the flow
was blocked. As a result we are able to replay the pcap through
the Network IPS in its entirety.
5. TCP and UDP Checksums (courtesy ICSA labs)
Tomahawk uses a shortcut to calculate TCP and UDP header checksums.
Tomahawk begins with the *replacement* source and destination IP
addresses and the *original* checksum from the pcap. Tomahawk then
performs some calculations to determine the replacement checksum.
This works only when the original checksum in the pcap is correct.
Unfortunately, Tomahawk fails to consider when the original checksum
is incorrect. Tomahawk also fails to consider when the original
checksum is all 0's - a situation permitted by the UDP RFC (i.e.,
calculating the UDP checksum is optional - and if it is not calculated
it must be all 0's.).
Resolution: Now if Tomahawk sees a UDP packet with all 0's for a
checksum, it leaves the checksum as is instead of modifying it.
We did not correct the problem of pcaps with incorrect checksums
in Tomahawk. Instead, we added a step in creating clean background
traffic pcaps that strips out flows where one or more packets have an
incorrect checksum. Clearly, some Network IPS devices may not like
it if there are packets with incorrect checksums especially since
the UDP RFC says these can be silently discarded and we wouldn't
want our background traffic stopped because of such traffic.
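The notes above do not spell out the shortcut calculation. One common
way to do it is the incremental update from RFC 1624, sketched here as an
assumption about what such a shortcut looks like rather than as
Tomahawk's actual code. All function names are hypothetical. Like the
shortcut described above, it only yields a correct result when the
original checksum was correct.

```python
def ones_sum(words):
    """One's-complement sum of 16-bit words with end-around carry."""
    total = 0
    for w in words:
        total += w
        total = (total & 0xFFFF) + (total >> 16)
    return total

def addr_words(addr):
    """Split a dotted-quad IPv4 address into two 16-bit words."""
    b = [int(p) for p in addr.split(".")]
    return [(b[0] << 8) | b[1], (b[2] << 8) | b[3]]

def update_checksum(old_csum, old_addrs, new_addrs, is_udp=False):
    """Adjust a TCP/UDP checksum after the pseudo-header addresses change."""
    if is_udp and old_csum == 0:
        return 0                          # checksum not computed; leave as is
    words = [~old_csum & 0xFFFF]
    for a in old_addrs:                   # subtract the old address words
        words += [~w & 0xFFFF for w in addr_words(a)]
    for a in new_addrs:                   # add the replacement address words
        words += addr_words(a)
    new_csum = ~ones_sum(words) & 0xFFFF
    if is_udp and new_csum == 0:
        new_csum = 0xFFFF                 # UDP sends a computed zero as all ones
    return new_csum
```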
6. Tomahawk and Products that Alter the TTL Field (courtesy ICSA labs)
Some Network IPS devices modify the TTL field, setting it to
the lowest value that they have seen in a stream of traffic.
Dynamically modifying frames like this caused headaches for Tomahawk.
Here's why: Suppose Tomahawk sends frames from eth0 through the
Network IPS to eth1. Because Tomahawk changes the IP addresses,
ethernet addresses, header checksums, etcetera - there needs to be
some means on the receive side to ensure that the frame received at
eth1 corresponds to the frame sent out of eth0. Once it ensures
that the sent frame has been accounted for, Tomahawk can send out
the reply frame in the same stream on eth0.
But Tomahawk doesn't want to chew up cycles replacing the addresses
in the received frames with the actual addresses, then recalculate
the checksums, and then compare what it received to what it thinks
it should have received. So, before doing any other work Tomahawk
checks if the lengths of the frames are the same. And if so, then
Tomahawk smartly gets around this extra processing by zeroing out
the IP addresses and checksums (and perhaps a couple other things)
on both the frame it received and on the frame it thinks it should
have received. Finally, Tomahawk compares the first 100 bytes of
these two frames and verifies that they are indeed the same.
This works great and saves CPU processing until vendor products
start to mess with other fields - like the TTL field. Here's an
example of the headache for Tomahawk: The simple, benign http pcap
that comes with Tomahawk has a higher TTL value for one of its FINs
than any of the other packets. As a result some vendor products
reset this to the lowest TTL value previously seen in the stream.
Once the frame is received and after zeroing out the fields, the
100 byte comparison shows that the received frame (which had its TTL
modified) differs from the original. As a result Tomahawk concludes
that the frames are not the same and times out waiting for a frame
since the one it received is not the expected one.
Resolution: We modified the means by which Tomahawk determines whether
or not the frame received is the same as the original frame that
should have been received. Tomahawk now zeroes out not only the
IP addresses, ethernet addresses, and checksums on both the frame
received and the original frame but also zeroes out the TTL field
on both frames as well. Then it still compares the first 100 bytes.
In this way Tomahawk can still accurately decide whether or not the
frames are most likely the same. If so, then the pcap does not die
when it is replayed.
This enables us to test Network IPS products with Tomahawk that may
sometimes dynamically change things such as the TTL field.
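The comparison can be sketched as below. The field offsets assume an
untagged Ethernet II frame carrying IPv4, and the list of masked fields is
an illustration of the fields named above, not a copy of what Tomahawk
masks. The names masked and frames_match are hypothetical.

```python
COMPARE_LEN = 100   # only the first 100 bytes are compared
ETH_LEN = 14        # Ethernet II header length, no VLAN tag (assumption)

def masked(frame):
    """Return the first COMPARE_LEN bytes with rewritten fields zeroed."""
    buf = bytearray(frame[:COMPARE_LEN])
    fields = [
        (0, 12),    # destination and source ethernet addresses
        (22, 1),    # IP TTL
        (24, 2),    # IP header checksum
        (26, 8),    # source and destination IP addresses
    ]
    if len(buf) > 23:
        l4 = ETH_LEN + (buf[14] & 0x0F) * 4   # start of transport header
        if buf[23] == 6:
            fields.append((l4 + 16, 2))       # TCP checksum
        elif buf[23] == 17:
            fields.append((l4 + 6, 2))        # UDP checksum
    for off, length in fields:
        end = min(off + length, len(buf))
        if off < end:
            buf[off:end] = bytes(end - off)
    return bytes(buf)

def frames_match(received, expected):
    """Cheap length check first, then compare the masked prefixes."""
    if len(received) != len(expected):
        return False
    return masked(received) == masked(expected)
```

Because only a prefix is compared, two frames that differ solely beyond
byte 100 are still treated as the same frame, which is why the text says
the frames are "most likely" the same.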
7. Code Efficiency (courtesy ICSA labs)
Tomahawk allows the user to pass a rate limiting value from the
command line. Keeping track of the number of bytes sent out, Tomahawk
performs some math using the system clock through the ReadSysClock()
function to rate limit traffic. Due to this functionality, the
code when executed is not as efficient as it could be. In fact,
profiling the code shows that Tomahawk spends 6-8% of its time in
the ReadSysClock() function - even when no rate limiting parameters
are passed to it from the command line.
Resolution: Calls to ReadSysClock() are no longer performed when
rate limiting is not chosen at the command line. The result is more
efficient code that when executed improves throughput. This increase
in throughput for a single Tomahawk box ultimately helps decrease
the number of Tomahawk engines and switches in our testbed, thereby
decreasing both expenses and testbed complexity.
8. Running Multiple Tomahawk Processes at Once (courtesy ICSA labs)
If one were to start two Tomahawk instances on the same box and
use the same IP address space for replacement IP addresses, it was
impossible to prevent overlap of IP addresses and handlers used by the
other Tomahawk process. As a result, one process might determine that
a frame belonged to it when in fact it belonged to the other process.
Because of this potential for overlap, only 1 Tomahawk process per
box was reliably possible.
Resolution: Users can now specify the starting and ending handler ID.
Since the handler is now the 2nd octet in the IP address, overlaps in
the address space and handlers are avoidable. For example handler
.1 through .23 may be chosen for one Tomahawk instance while .24
through .46 may be chosen for another. If the 16 net is chosen for
both Tomahawk processes, then one would run from 16.1.x.y through
16.23.x.y while the other would run from 16.24.x.y through 16.46.x.y.
So, even though the IP address space specified may be similar,
Tomahawk processes will not inadvertently steal each other's frames.
This capability was added originally in hopes that having more than
1 Tomahawk process running on the same box would increase throughput
and reduce cost for lab infrastructure. Unfortunately, we did not
find that this markedly improved throughput: each Tomahawk process
must examine all traffic coming into the NIC, and each process
ultimately discards the half of the traffic that was meant for
the other process. Note that we did
find this functionality useful for testing the logging of exploits
when repeatedly sent to and from different IP addresses.
9. What happens if the pcap Contains > 65535 Unique IPs? (courtesy
ICSA labs)
Tomahawk may have problems if a pcap has > 2^16 unique IP addresses.
There was no check for this possibility.
Resolution: There is now a check to see if there are > 2^16 unique
IP addresses in a pcap. When encountered it gets reported and
then Tomahawk exits. This may be an unlikely problem as most of
our larger pcaps had ~1-2K unique IPs. Nevertheless the check was
added as a precaution since it is one less problem to debug.
10. Handling Truncated Packets (courtesy ICSA labs)
If a pcap is captured with a non-1518 snaplength, then there may
be some number of non-full frames. libpcap captures information
related to this before each frame in a pcap. In fact libpcap writes
two values before each frame in a pcap. The first is the size of
the frame that was captured and the second is the size of the
frame that was on the wire. If these values are not equal, it means a
non-full frame was captured. Tomahawk dies when it encounters the
first truncated, non-full frame.
Resolution: After determining that you have a pcap with some non-full
frames it still may be salvageable for use with Tomahawk. You just
have to identify the non-full frames and strip out all those frames
as well as their corresponding flows. But you do not want to have
to do this one at a time - i.e., find one, strip it out along with
its corresponding flow, replay the pcap a second time, find the next
one, strip it out along with its corresponding flow, repeat, repeat,
repeat. Therefore in log mode, we added code to log all truncated
packets to a file instead of stopping at the first truncated packet.
Then if there were any truncated frames Tomahawk exits after
reading through the entire pcap and logging this information.
Now since they've been identified you can collectively strip out
all the truncated frames and their corresponding flows at once.
Then you can make a decision based on what remains as to whether
the pcap is worthwhile or not.
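Truncated frames can also be located outside Tomahawk by reading the
record headers directly. The sketch below assumes the classic pcap file
format with microsecond timestamps: a 24-byte file header, then a 16-byte
header per record holding the timestamp seconds, timestamp microseconds,
captured length, and on-the-wire length, in that order. The function name
find_truncated is hypothetical.

```python
import struct

def find_truncated(pcap_bytes):
    """Return 1-based indexes of records captured shorter than on the wire."""
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"                       # file written little-endian
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"                       # file written big-endian
    else:
        raise ValueError("not a classic microsecond pcap file")
    truncated = []
    pos, index = 24, 0                     # skip the 24-byte file header
    while pos + 16 <= len(pcap_bytes):
        _sec, _usec, caplen, wirelen = struct.unpack_from(
            endian + "IIII", pcap_bytes, pos)
        index += 1
        if caplen < wirelen:
            truncated.append(index)
        pos += 16 + caplen                 # record header plus captured bytes
    return truncated
```

The returned indexes identify the frames whose flows would then be
stripped from the pcap in one pass.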
11. Send Groups - How Many Frames From How Many Flows Can
Tomahawk Send? (courtesy ICSA labs)
Tomahawk's algorithm for choosing what frames to send could, in
some circumstances, not run as fast as possible. The algorithm was
modified to improve performance.
The original algorithm begins with Tomahawk looking at the first frame
in the trace. That frame is then part of the first group sent out.
The interface that the first frame is sent out is noted by Tomahawk.
The next frame is then reviewed. If it was supposed to go out a
different interface than the first, the frame is not added to that
first group and it is not sent. No further progression into the
trace is possible until that first frame comprising the first group
is sent and received. This is true even though the default value of m
(20) has not been reached.
The maximum number of outstanding frames that can be sent in a
group by Tomahawk (i.e., denoted by 'm') is a command-line option.
The default value is 20. This may lead one to believe that Tomahawk
often sends groups of 20 frames at a time. However, in reality,
it is seldom true that 20 consecutive frames in a trace are sent
out the same interface. The normal group size is almost always less
than 5 and very frequently just 1 or 2 before a frame going out the
other interface is encountered.
Here's an example of how the previous version of Tomahawk could slow
things down:
Suppose multiple TCP packets (e.g., 5) are sent from an http server
to the client to download a file and the packets pass through the
Network IPS device in the packet order 5-1-2-3-4. The Network IPS
caches packet 5 temporarily since it is out of order until 1-2-3-4
have arrived or a timeout has been reached. In a pcap, however,
the packets may be stored as 5-1-2-X-3-4 (for example), where X
is a packet from some other flow that needs to go out a different
interface than packets 5-1-2-3-4. Tomahawk's send group then
will only consist of 5-1-2 even with a higher value for m than 3.
The other packets will not be sent until these 3 have been received.
When the Network IPS receives this group of 3, it caches them since
it is expecting to receive packets 3 & 4 imminently. When it does
not, throughput performance suffers and Tomahawk slows down.
Resolution: The new algorithm for deciding what frames can be sent
in a given group takes into account the interface that a frame is
sent on AND the flow that the frame belongs to. The algorithm adds
the first frame (i) to the send group. It stores the interface that
frame i should be sent out on as well as a hash value indicating
the flow to which the frame belongs.
The next frame (i+1) is then reviewed. If there are no frames in the
send group that belong to the same flow as the frame under review,
then it is added to the group (regardless of the interface it must be
sent out on). If there is at least one frame in the send group that
belongs to the same flow, then the frame is not added to the group
unless the interface which the frame should be sent out matches the
interface for the existing frame(s) in that flow that are already
in the group.
If the frame can not be added to the current send group (e.g., since
it is a response to a frame already in the group), then the new
algorithm will progress up to 500 packets further into the trace.
Additional frames will then be added to the send group until all
frames for one side of each started flow in the send group are
collected or m packets have been collected. Once m packets have
been added to the send group the algorithm will now look ahead 500
additional packets only to try and complete any open flows begun in
the send group.
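The membership rule and bounded lookahead can be illustrated with a
simplified sketch. It is not Tomahawk's implementation: frames are
modeled as (flow, interface) pairs, the function name is hypothetical,
and two simplifications are assumed. First, once a frame of a flow has
been skipped, later frames of that flow are also skipped so in-flow order
is preserved. Second, the extra lookahead that happens after m frames
have been collected is not modeled.

```python
def build_send_group(frames, start, m=20, lookahead=500):
    """frames: list of (flow_id, interface). Returns indexes for one send group."""
    group = []
    flow_iface = {}    # flow -> interface used by that flow's frames in the group
    closed = set()     # flows that already had a frame skipped (assumption)
    skipped_at = None  # index of the first frame that could not be added
    for idx in range(start, len(frames)):
        if len(group) >= m:
            break
        if skipped_at is not None and idx - skipped_at > lookahead:
            break      # do not search more than `lookahead` frames further
        flow, iface = frames[idx]
        # A frame joins if its flow is new to the group, or if it goes out
        # the same interface as the flow's frames already in the group.
        if flow not in closed and flow_iface.get(flow, iface) == iface:
            flow_iface[flow] = iface
            group.append(idx)
        else:
            closed.add(flow)
            if skipped_at is None:
                skipped_at = idx
    return group
```

For the 5-1-2-X-3-4 example above, with X belonging to a different flow,
this sketch places all six frames in one group, whereas the original
algorithm stopped after 5-1-2.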
12. Switched to Local Versus Global Run-states (courtesy ICSA labs)
Tomahawk used a global run-state rather than separate, per handler
run-states. The Tomahawk process was running, stopped, or stalled.
Also, it was never clear if the timeouts were accurate since a global
variable kept track of them rather than a local, per-handler variable.
As a result there was only a global measure of progress being made
through a trace. When a handler would send frames from its send
group it would start a global stopwatch and compare that to the
maximum time prior to a timeout. There was no way to ensure that
frames sent from other handlers' send groups would time out properly.
Resolution: Now, each handler keeps track of its own run-state. There
is no longer a stalled state. Now each handler can be running
or stopped. It's no longer a per Tomahawk process issue; instead
it is a per handler issue. This way timeouts are more accurately
computed. To do that, each handler records the time after sending
its send group. The handler is periodically polled to see if frames
have been received since the last check. If so, the handler re-sends
only those frames in the send group that were not received, records
the time, and resets the timeout status. If not, it checks to see if
the timeout value has been reached. After these changes you can get
throughputs in the 500-600Mbps range with 15-20 handlers instead of having
to use 100 handlers for that same level of throughput as in the past.
13. Ensuring Unambiguous Assignment of Frames to Interfaces (courtesy
ICSA labs)
Tomahawk looks through pcaps and makes IP address assignments
to interfaces. But it does not check to make sure the assignments
are unambiguous. For example, suppose there are 3 flows in a pcap.
The first flow begins with A as the client sending to B. In the
second flow C is the client sending to D. The last is B sending to D.
In the first flow Tomahawk determines that since it has never seen IP
address A that all packets where that IP is the source will be sent
out of eth1. Tomahawk then determines that since it has never seen
destination IP address B that all packets where that IP is the source
will be sent out of eth2. In the second flow Tomahawk makes the same
determination. It determines that since it has never seen IP address C
that all packets where that IP is the source will be sent out of eth1.
And Tomahawk then determines that since it has never seen IP address D
that all packets where that IP is the source will be sent out of eth2.
The problem then occurs with the third flow where B is the client
sending the first packet in the flow to D. Tomahawk has seen both of
these IP addresses. However, previously B and D were both servers;
now B is a client. This time Tomahawk looks at the first packet in
the flow and sees that B is the source. Having seen B's IP address
Tomahawk is prepared to send that packet out of eth2 as it had
done previously. Before doing this it looks at the destination IP.
Tomahawk had also sent packets out of eth2 where D was the source.
So, Tomahawk overrides its initial impulse to send that frame out
of eth2. Having looked at the destination IP last it decides that
it will send the frame out of eth1 since all packets where D is the
source went previously out of eth2. The result is that packets in
this flow that have B as the source IP will be sent out of eth1.
This could lead to problems for a Network IPS that sits in between.
A Network IPS that is paying attention will notice that frames
having a source IP address = B are arriving on both interfaces of
the same segment.
Resolution: A "warning" mode (-W) switch was added to the Tomahawk
command line. Tomahawk can now be run first in warning mode to
determine where ambiguous assignments like this are being made.
Tomahawk now will write all these packets to a file. They can then
be subsequently removed from the pcap via other means.
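The kind of check that warning mode performs can be sketched as follows.
This is a simplified model, not Tomahawk's code: each flow is reduced to
its (client, server) address pair, the function name is hypothetical, and
the rule for placing a previously unseen address opposite its peer is an
assumption.

```python
def find_ambiguous_flows(flows):
    """flows: list of (client_ip, server_ip) in trace order."""
    side = {}        # ip -> "eth1" (client side) or "eth2" (server side)
    ambiguous = []
    for client, server in flows:
        c, s = side.get(client), side.get(server)
        if c is None and s is None:
            c, s = "eth1", "eth2"                    # both new: default sides
        elif c is None:
            c = "eth1" if s == "eth2" else "eth2"    # opposite the known peer
        elif s is None:
            s = "eth2" if c == "eth1" else "eth1"
        if c == s:
            # Both addresses were already pinned to the same interface.
            ambiguous.append((client, server))
            continue
        side[client], side[server] = c, s
    return ambiguous
```

With the three flows from the example above (A to B, C to D, then B to D),
only the third flow is reported.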
14. Rate Limiting Improvement (courtesy ICSA labs)
Since the cbq service is not particularly reliable at rate limiting
above 40Mbps, we can use the Tomahawk rate limiting feature when
necessary. However, it too wasn't as accurate as it could be.
Previously, the rate was computed from the start time of the Tomahawk
process by dividing the number of bytes sent by the elapsed time.
Suppose you set the rate limiting to 400Mbps. Because the average
covered the whole run, a long period below the specified rate (say
100Mbps) could be followed by a prolonged period at 700Mbps
(400Mbps + 300Mbps) while the overall average caught up.
Resolution: This situation was alleviated by a new algorithm. The
implementation of the rate limiting algorithm was modified so that
it will time average the send rate over the last second.
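A sliding one-second window is one way to obtain this behavior. The
sketch below is an assumption about the approach, not the actual
implementation. The class name and its methods are hypothetical, and the
caller supplies the clock value so the logic can be exercised without
real time passing.

```python
from collections import deque

class WindowedRate:
    """Limit the send rate averaged over the most recent `window` seconds."""

    def __init__(self, limit_bps, window=1.0):
        self.limit_bps = limit_bps
        self.window = window
        self.sent = deque()        # (timestamp, byte count) per send
        self.bytes_in_window = 0

    def _expire(self, now):
        # Forget sends that have fallen out of the averaging window.
        while self.sent and self.sent[0][0] <= now - self.window:
            _, nbytes = self.sent.popleft()
            self.bytes_in_window -= nbytes

    def may_send(self, nbytes, now):
        self._expire(now)
        bits = (self.bytes_in_window + nbytes) * 8
        return bits <= self.limit_bps * self.window

    def record(self, nbytes, now):
        self.sent.append((now, nbytes))
        self.bytes_in_window += nbytes
```

Because sends older than the window are forgotten, a long quiet period
earns no credit, so the rate cannot later exceed the limit for a
prolonged time.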
15. Handler ID Assignment (courtesy ICSA labs)
The function that assigns the ID to a newly created handler was
modified so it will begin looking for a free ID to use that is
one greater than the highest in the set of handler IDs that were
most recently used. This prevents a newly created handler from
reusing the ID of a handler that just finished. This change was
needed since some Network IPS devices will block the second Handler
instance because, if the handler IDs are the same, then the flows
will have the exact same IP addresses as traffic already seen.
Note that since only 10-15 handlers are now needed to achieve high
throughput rates, as opposed to the 100-200 handlers needed in the
past, Tomahawk is far less likely to cycle back through old
handler IDs.
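The ID search can be sketched like this. The function name and
parameters are hypothetical. The first and last bounds stand in for the
user-specified starting and ending handler IDs from item 8, and wrapping
back to the start of the range when the end is reached is an assumption.

```python
def next_handler_id(in_use, highest_recent, first=1, last=254):
    """Return the first free ID after highest_recent, wrapping within the range."""
    span = last - first + 1
    for step in range(1, span + 1):
        candidate = first + (highest_recent - first + step) % span
        if candidate not in in_use:
            return candidate
    return None   # every ID in the range is in use
```

For example, if handlers 2 and 3 are running and handler 1 has just
finished, the search starts at 4 rather than reusing 1.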