Hello and welcome to Part 11 of this series, wherein we finally use
some of the code I prototyped way back when I was planning this series.

Where are we standing?

Let's review the progress we've made in the first 10 parts: first, we
started thinking about what it takes for computers to communicate. Then
we followed a rough outline of the various standards and protocols
that have emerged since the 1970s.

We took a few articles to get comfortable with binding Win32 APIs, and
built a ping program on top of its ICMP facilities. Then we dove into
WMI (Windows Management Instrumentation), and right back into Win32 APIs,
just so we could find the “default network interface”.

We started looking at raw network traffic, and parsing Ethernet frames with
nom, then we took a detour through a few ways to deal with error handling
in Rust.

Now, we didn't want our packet sniffer (currently the ersatz binary
in the crate of the same name) to be too noisy, so we filtered out non-ICMP
traffic, but we did that in a sort of wonky way:

Where process_packet() does the actual Ethernet parsing. This was always
meant to be replaced with, y'know, actual filtering - because submitting a
form with “abcdefghijkl” over HTTP would definitely be caught by that
filter, whereas ICMP traffic that isn't sent from Windows's PING.exe
would be missed.
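For illustration, here's a hypothetical reconstruction of that kind of payload-matching filter (not the series' exact code, and the function name is made up): it searches the raw frame for the run of lowercase ASCII letters that PING.exe uses as its ICMP payload, which is exactly why unrelated traffic carrying the same bytes slips through.

```rust
// Hypothetical reconstruction of a payload-matching "filter": instead of
// looking at the protocol field, it scans the raw frame for a fragment of
// the lowercase-alphabet payload that Windows's PING.exe sends.
fn looks_like_ping(frame: &[u8]) -> bool {
    frame.windows(12).any(|window| window == b"abcdefghijkl")
}

fn main() {
    // an HTTP request body carrying the same bytes matches too...
    let http = b"POST /form HTTP/1.1\r\n\r\nfield=abcdefghijkl";
    println!("{}", looks_like_ping(http)); // prints "true"
}
```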

For us to filter things more accurately, we'll need to parse IPv4 packets.

Much like we did for Ethernet, let's take a look at IPv4 packet structure:

There are a lot of interesting things on this diagram - and also, just a lot of
stuff in general, but let's focus on the one thing we want to do: filter only
ICMP traffic - and ignore the rest for the time being:
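To make the diagram concrete, here's a minimal hand-rolled sketch of the fixed 20-byte part of the header in plain Rust. The field names are illustrative, and it uses manual shifts and masks; the series itself does this with nom's bit-level parsers instead:

```rust
// Illustrative sketch of the fixed part of an IPv4 header (20 bytes).
// Field names are ours, not the series' exact types.
#[derive(Debug)]
struct Ipv4Header {
    version: u8,          // 4 bits on the wire
    ihl: u8,              // 4 bits: header length in 32-bit words
    dscp: u8,             // 6 bits
    ecn: u8,              // 2 bits
    length: u16,          // total length in bytes
    identification: u16,
    flags: u8,            // 3 bits
    fragment_offset: u16, // 13 bits
    ttl: u8,
    protocol: u8,         // 1 = ICMP, 6 = TCP, 17 = UDP
    checksum: u16,
    source: [u8; 4],
    destination: [u8; 4],
}

fn parse_ipv4_header(b: &[u8]) -> Option<Ipv4Header> {
    if b.len() < 20 {
        return None;
    }
    Some(Ipv4Header {
        version: b[0] >> 4,
        ihl: b[0] & 0x0f,
        dscp: b[1] >> 2,
        ecn: b[1] & 0x03,
        length: u16::from_be_bytes([b[2], b[3]]),
        identification: u16::from_be_bytes([b[4], b[5]]),
        flags: b[6] >> 5,
        fragment_offset: u16::from_be_bytes([b[6] & 0x1f, b[7]]),
        ttl: b[8],
        protocol: b[9],
        checksum: u16::from_be_bytes([b[10], b[11]]),
        source: [b[12], b[13], b[14], b[15]],
        destination: [b[16], b[17], b[18], b[19]],
    })
}

fn main() {
    // a made-up ICMP header: version 4, IHL 5, length 60, TTL 128, protocol 1
    let bytes = [
        0x45, 0x00, 0x00, 0x3c, 0x12, 0x34, 0x00, 0x00, 0x80, 0x01, 0x00, 0x00,
        192, 168, 1, 16, 35, 186, 224, 53,
    ];
    println!("{:?}", parse_ipv4_header(&bytes).unwrap());
}
```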

192.168.1.16 is my laptop's IP address on the local network,
whereas 35.186.224.53 belongs to Google.

We've captured that network packet as it arrived at the network interface
(a wireless NIC), so it makes sense that the destination IP is a local one,
rather than my public internet IP (which you won't see in any of those logs).

Addresses starting with a number between 224 and 239 are used for IP
multicast. IP multicast is a technology for efficiently sending the same
content to multiple destinations. It is commonly used for distributing
financial information and video streams, among other things.
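That 224-239 first-octet range is the 224.0.0.0/4 block, so spotting a multicast address only takes a look at the first octet. A quick sketch (std's own Ipv4Addr::is_multicast() performs the same check):

```rust
use std::net::Ipv4Addr;

// 224.0.0.0/4: a first octet between 224 and 239 means multicast
fn is_multicast(addr: Ipv4Addr) -> bool {
    (224..=239).contains(&addr.octets()[0])
}

fn main() {
    let a: Ipv4Addr = "239.255.255.250".parse().unwrap();
    // std agrees with the hand-rolled check
    println!("{} {}", is_multicast(a), a.is_multicast()); // prints "true true"
}
```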

Another TCP packet, this time going out to 52.157.234.37, which belongs
to… Microsoft. Seeing as I'm running all of this from Windows 10, it's not
too surprising. Win 10 does phone home quite a bit, for analytics but also
just “are we still online?” checks.

(Note: 8.8.8.8 is one of Google's DNS servers, and 93.184.216.34
belongs to “Edgecast”, which seems to be Verizon's CDN offering. I'm listening
to Spotify while writing this, so I'm satisfied by this explanation!)

Cleaner.

We'll also only print ICMP packets, and only the IP packet part, not
the whole Ethernet frame.
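A hedged sketch of what that filtering step can look like, with a made-up Protocol enum (the byte values, though, are the real IANA-assigned protocol numbers that live in the IPv4 header):

```rust
#[derive(Debug, PartialEq)]
enum Protocol {
    Icmp,
    Tcp,
    Udp,
    Other(u8),
}

// the IPv4 "protocol" field is a single byte with IANA-assigned values
fn protocol_from_byte(b: u8) -> Protocol {
    match b {
        1 => Protocol::Icmp,
        6 => Protocol::Tcp,
        17 => Protocol::Udp,
        other => Protocol::Other(other),
    }
}

fn main() {
    for byte in [1u8, 6, 17, 47] {
        // only ICMP gets printed; everything else is silently ignored
        if protocol_from_byte(byte) == Protocol::Icmp {
            println!("ICMP packet - would print the IP packet here");
        }
    }
}
```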

Now it's complaining about error types. And, that part makes sense too!

When a nom parser returns an error, it includes the input that it failed to
parse - up until now, a byte slice. But we're between bytes now, so it makes
sense that to know the precise position of an error, we need not a byte slice,
but a tuple of a byte slice and a bit offset into it.

In our case, since we own our custom Error type, we definitely can
implement nom's ErrorConvert trait for it.

Now, what we're actually doing is converting a bit-level error to
a byte-level error. Our top-level error type is still Error<I>, where
I is &[u8], so we're going to need to convert that (&[u8], usize)
(where the usize is a bit offset) into a byte slice.

TL;DR we're going to have to cut somewhere.

// in `src/parse.rs`
use nom::{ErrorConvert, Slice};
use std::ops::RangeFrom;

impl<I> ErrorConvert<Error<I>> for Error<(I, usize)>
where
    I: Slice<RangeFrom<usize>>,
{
    fn convert(self) -> Error<I> {
        // alright pay close attention.
        // `self` (the input) is a bit-level error. since it's
        // our custom error type, it can contain multiple errors,
        // each with its own location. so we need to convert them all
        // from bit-level to byte-level
        let errors = self
            .errors
            // this moves every element of `self.errors` into the
            // iterator, whereas `iter()` would have given us references.
            .into_iter()
            // this converts bit-level positions to byte-level positions
            // (ie. plain old slices). If we're not on a byte boundary,
            // we take the closest byte boundary to the left.
            .map(|((rest, offset), err)| (rest.slice(offset / 8..), err))
            // this gives us a Vec again
            .collect();
        Error { errors }
    }
}

The way paste works is: it comes with two macros, and within either one,
you can use a special notation to do “token pasting” - in this case, we want
to paste u with $width (which happens to be 4 in this invocation).

DSCP is 0 - the Differentiated Services Code Point classifies traffic for
quality-of-service purposes (prioritizing real-time streams, for example),
not exactly necessary for ICMP.

ECN is 0 - no Explicit Congestion Notification.

length is 60, which is enough for our entire header (20 bytes) and 40 bytes
of data (which fits an 8-byte ICMP header plus PING.exe's 32-byte payload).

identification is used to regroup IP packets that belong to
the same datagram. For outgoing packets (to Google), we see a seemingly
random number, the number our OS network stack picked. For incoming packets,
we see only zero, presumably because it was stripped somewhere up the line?

flags is zero in both cases, because the first bit is reserved and must
be zero, the second is “don't fragment” - which would mean that this IP
datagram should not be split across several packets, and the third is “more
fragments” which would mean that this packet belongs to a series of packets
all belonging to the same datagram.

fragment offset is also only used for, well, fragmented datagrams (but
ours fit in just one packet, so, being the only packet, it's zero).
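Those three flag bits share a byte with the top of the 13-bit fragment offset; a short sketch of how the two interesting flags unpack from byte 6 of the header (the function name is ours):

```rust
// byte 6 of the IPv4 header: [reserved | DF | MF | top 5 bits of frag offset]
fn unpack_flags(byte6: u8) -> (bool, bool) {
    let dont_fragment = byte6 & 0b0100_0000 != 0; // "don't fragment"
    let more_fragments = byte6 & 0b0010_0000 != 0; // "more fragments"
    (dont_fragment, more_fragments)
}

fn main() {
    // 0x40 is the common "don't fragment, no more fragments" pattern
    println!("{:?}", unpack_flags(0x40)); // prints "(true, false)"
}
```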

checksum we'll hear more about in the future, but for now we'll observe
that it's also zeroed - most likely checksum offloading: the NIC computes
and fills in the checksum after the point where our raw capture sees the
packet.

The ttl field is set to 128 for outgoing packets - hey, that's the TTL
we specified in our own ping tool from earlier! The reply comes back with a
TTL of 54 - and since the remote host sets its own initial TTL on the reply
(commonly 64), that suggests around 64 - 54 = 10 hops on the way back.

Finally, the source and destination IP addresses are still correct, which
means we probably got the whole parsing right. Without a single bit shift!

The IP header contains many values that are not byte-aligned - 3-bit
integers, 13-bit integers, etc. It contains a checksum that lets us check the
integrity of the whole packet, information about fragmented datagrams (sent
as a series of packets), information about the number of hops the packet will
still be routed for, and more.

We've used the nom crate for bit parsing, the custom_debug_derive crate
for human-friendlier Debug implementations, the paste crate for additional
macro powers, and the ux crate for integers with non-multiple-of-8 widths.