It.. seems to work? Although it's hard to tell, because I haven't printed the
contents of each packet. And I haven't printed the contents of each packet,
because I don't want to post raw traffic from and to my own computer on the
internet, even if I can't make sense of it right now - somebody might!

Sniffing the dumb way

I can think of a way to make sure it works.

Remember in Part 2 when we discovered that
Windows's ping.exe sent lowercase letters of the alphabet?

If the payload for ICMP echo packets isn't compressed (and compression seems unlikely),
then we should be able to find it in the packets we're sniffing. Let's give it a go.

Wait, frick, no, &[u8] doesn't have .find. It has contains but that's
just for a single element (a single u8) - which is not helpful here.

How do we find if a slice contains another slice?

Well, there are certainly smart ways to do it, but for the time being we
can simply use windows iterators.

haystack.windows(N) gives us an iterator over all contiguous (overlapping) subslices
of length N of haystack. If we call it with N = needle.len(), we can test
all those subslices for equality with needle, and stop whenever we find one
that matches!
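Here's a minimal sketch of that idea. The function name `contains_subslice` is made up for illustration, and the empty-needle check is an extra precaution (`windows` panics when asked for windows of length 0):

```rust
/// Returns true if `needle` appears as a contiguous run anywhere in `haystack`.
fn contains_subslice(haystack: &[u8], needle: &[u8]) -> bool {
    // `windows(0)` panics, so treat the empty needle as trivially present.
    if needle.is_empty() {
        return true;
    }
    // If the needle is longer than the haystack, `windows` yields nothing
    // and `any` returns false.
    haystack
        .windows(needle.len())
        .any(|window| window == needle)
}
```

This compares every window against the needle, so it's not the fastest approach for large inputs, but for packet-sized buffers it should be plenty.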

So we know that somewhere in there, there's probably some ICMP packets.
But what is it exactly we're getting? Ethernet frames? IP packets? It could
be either, depending on the library
rawsock is using under the hood.

If it is an Ethernet frame, then it should have the following structure:

And the EtherType for IPv4 is 0x0800. So if we read a 16-bit
integer at position 12, we should be good?

But how do we get a u16 from a &[u8]? Well, by now, we know an unsafe
way to do it:

Yeah! Seems okay. We don't know yet if they're actually ICMP packets, but at least,
it looks like we're getting Ethernet frames that contain IPv4 packets. Either that, or
many coincidences are happening in a row (which is always a possibility, because computers).
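For comparison, the same check can probably be done without `unsafe`. This is only a sketch: the names `ether_type` and `looks_like_ipv4` are hypothetical, and it assumes the buffer really does start with an Ethernet header, so that the EtherType sits at byte offset 12 in network (big-endian) byte order.

```rust
/// EtherType value for IPv4, as mentioned above.
const ETHERTYPE_IPV4: u16 = 0x0800;

/// Reads the big-endian 16-bit integer at byte offset 12, or returns
/// `None` if the frame is too short to contain one.
fn ether_type(frame: &[u8]) -> Option<u16> {
    // `get` does the bounds check for us instead of panicking.
    let bytes = frame.get(12..14)?;
    Some(u16::from_be_bytes([bytes[0], bytes[1]]))
}

/// True if the frame (assumed to be Ethernet) claims to carry IPv4.
fn looks_like_ipv4(frame: &[u8]) -> bool {
    ether_type(frame) == Some(ETHERTYPE_IPV4)
}
```

Returning an `Option` means a truncated frame turns into a `None` rather than a read past the end of the slice.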

Two MACs in a rowboat

How about we check that the MAC addresses look reasonable? We know MAC
addresses look something like 12:34:56:78:9A:BC, so, let's make a quick struct.
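One possible shape for such a struct, as a sketch: the name `MacAddr` and the choice to implement `Display` are assumptions made for illustration. It just wraps six bytes and prints them as colon-separated, zero-padded uppercase hex.

```rust
use std::fmt;

/// A 6-byte MAC address (hypothetical name, for illustration).
#[derive(Clone, Copy, PartialEq, Eq)]
struct MacAddr([u8; 6]);

impl fmt::Display for MacAddr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let [a, b, c, d, e, g] = self.0;
        // Two uppercase hex digits per byte, zero-padded, colon-separated.
        write!(
            f,
            "{:02X}:{:02X}:{:02X}:{:02X}:{:02X}:{:02X}",
            a, b, c, d, e, g
        )
    }
}
```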

This is not the last we've seen of Ethernet, so, let's make an ethernet module.

We'll use x64dbg to find out. We can just open up the
wat\target\release directory in explorer and drag our .exe into the
x64dbg window.

To find our function, we can use the “Symbols” tab:

Double-clicking on the symbol brings us to its disassembly:

Yeah that's uhh pretty short.

movzx eax, word ptr ds:[rcx]
rol ax, 8
ret

Let's set a breakpoint and start debugging:

Here we are at the very start of read_u16:

There's only one argument passed to read_u16, and it's a slice.
It appears the address of the slice's contents is passed through
the RCX register, which seems correct on Microsoft x64.

movzx eax, word ptr ds:[rcx]

We're reading a word (two bytes, 16 bits) from memory, starting from
the address contained in the RCX register. We're also zero-extending
(that's the zx in movzx), so that the rest of EAX contains zeroes.

Which, after movzx, it does!

All that's left is to rotate the AX register (16-bit wide) left 8 bits and…

rol ax, 8

wait.. just rotate left? Does it wrap around?

Yeah! Apparently it does.
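We can convince ourselves from the Rust side, too. In this small sketch, the helper name `rol8` is made up, and `rotate_left` is the standard integer method. For a 16-bit value, rotating by 8 bits swaps the two bytes, which is exactly the little-endian to big-endian conversion we need.

```rust
/// Mirrors `rol ax, 8`: rotate a 16-bit value left by 8 bits.
/// Bits shifted out of the top come back in at the bottom.
fn rol8(x: u16) -> u16 {
    x.rotate_left(8)
}
```

On a little-endian machine, the bytes `08 00` in memory load as `0x0008`, and `rol8(0x0008)` gives `0x0800`. For a `u16`, that's the same thing `swap_bytes` does.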

We now have the u16 we want in RAX, and, what a coincidence, that's also
the register used to return integers in the Microsoft x64 calling convention.

We'll be using version 5, which I hear is significantly better than the
previous releases.

In particular, nom 5 is based on impl Fn rather than macros, making the
code easier to read and write, and even giving a performance boost!

$ cargo add nom
Adding nom v5.0.1 to dependencies

nom is a parser combinators library, which means we'll get to.. combine..
parsers.

A parser is just a function that takes an input, and returns a result.
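To make that concrete without pulling in nom yet, here's a hand-rolled sketch. Everything in it (`ParseResult`, `be_u16_by_hand`, `two_u16s`) is a hypothetical name, and the error type is simplified to a string. It only mirrors the general "remaining input plus parsed value" shape, not nom's actual types.

```rust
/// Result of a hand-rolled parser: the remaining input plus the parsed
/// value, or a message describing the failure.
type ParseResult<'a, T> = Result<(&'a [u8], T), &'static str>;

/// Parses one big-endian u16 from the front of `input`.
fn be_u16_by_hand(input: &[u8]) -> ParseResult<'_, u16> {
    if input.len() < 2 {
        return Err("need at least 2 bytes");
    }
    let (head, rest) = input.split_at(2);
    Ok((rest, u16::from_be_bytes([head[0], head[1]])))
}

/// Combining parsers: run one after the other, threading the leftover
/// input from the first into the second.
fn two_u16s(input: &[u8]) -> ParseResult<'_, (u16, u16)> {
    let (input, a) = be_u16_by_hand(input)?;
    let (input, b) = be_u16_by_hand(input)?;
    Ok((input, (a, b)))
}
```

The "combine" part is `two_u16s`: it doesn't know how to read bytes itself, it just chains smaller parsers and passes the remaining input along.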

If we look for be_u16 in nom's documentation we'll find two variants:
one in nom::number::complete and one in nom::number::streaming. We're
only interested in the former, as we have complete Ethernet frames available.

We're also going to adjust our process_packet() function a bit - by
matching the error more precisely.

As long as we never use the cut combinator, and never use any streaming
parsers, we should only ever get nom::Err::Error, never nom::Err::Failure
(from cut) or nom::Err::Incomplete (from streaming parsers).
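The shape of that match can be sketched with a local stand-in enum. To be clear, `ParseErr` below is not nom's type: it only borrows the three variant names from `nom::Err`, with the payloads simplified to strings, and `describe` is a hypothetical helper.

```rust
/// Local stand-in with the same three variant names as `nom::Err`.
/// This is NOT nom's type; payloads are simplified to strings.
enum ParseErr {
    Incomplete,
    Error(String),
    Failure(String),
}

/// Describes what each variant would mean for us, under the assumptions
/// above (no `cut`, no streaming parsers).
fn describe(err: &ParseErr) -> String {
    match err {
        // The only variant we expect to see in practice.
        ParseErr::Error(msg) => format!("parsing error: {}", msg),
        // Would only come from `cut`, which we never use.
        ParseErr::Failure(msg) => format!("unexpected failure: {}", msg),
        // Would only come from streaming parsers, which we never use.
        ParseErr::Incomplete => "unexpected incomplete".to_string(),
    }
}
```

Matching all three variants explicitly means that if one of those assumptions ever stops holding, it shows up as a distinct message rather than being lumped in with ordinary parse errors.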

That already looks a lot better. I wonder if it's useful to
print the input for every line though. What if we give a slightly
longer truncated Ethernet frame, so that it fails in, say, the middle
of reading the EtherType?

What happens if we start pinging google.com instead? At the moment, it
resolves to [2a00:1450:4007:817::200e] for me and that.. doesn't look like
IPv4.

$ cargo run --quiet
Listening for packets...
thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', src\libcore\option.rs:378:21
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.