Thursday, October 22, 2015

Probes from the IPv6 nebula

Introduction

I'm going to break down some of my recent work in the area of IPv6 and unsolicited scans that originate from the internet. I've identified a number of patterns and have followed the data to reach several conclusions. Although this is a work in progress, I'm starting to unravel what appears to be a sophisticated network of IPv6 scanner and harvester hosts. These hosts work together to probe internet-connected devices by connecting to their temporary, randomly assigned IPv6 addresses, which are often regarded as private or hidden.

The Landscape

The IPv6 address space is vast. That fact has been pounded into our heads for years now, and it is pointed out in almost every IPv6 document in existence. A home with a /64 prefix has roughly 18 quintillion (2^64) addresses available to use at will. Yet fewer than 0.000000000000002% of those addresses ever get used. It’s a vast nebula of address space.
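For reference, here is the arithmetic behind those figures. It assumes 20 addresses in use, the high end of the 10-20 per /64 mentioned below. The result comes in comfortably under the quoted 0.000000000000002% (2 × 10^-15 %).

```latex
2^{128-64} = 2^{64} = 18\,446\,744\,073\,709\,551\,616 \approx 1.8 \times 10^{19}

\frac{20}{2^{64}} \times 100\% \approx 1.1 \times 10^{-16}\,\%
```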

It’s so vast, in fact, that performing a traditional network scan, address by address, would take many lifetimes to complete. That option is basically off the table for any sort of network scanning or discovery effort.
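To put "many lifetimes" in numbers, the sketch below estimates how long a brute-force sweep of a single /64 would take. The probe rates are illustrative assumptions, not measurements of any real scanner.

```python
# Rough estimate of how long a brute-force sweep of one /64 would take.
# The probe rates used below are illustrative assumptions only.

ADDRESSES_PER_64 = 2 ** 64
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_sweep(probes_per_second: float) -> float:
    """Return the years needed to probe every address in a /64 once."""
    return ADDRESSES_PER_64 / probes_per_second / SECONDS_PER_YEAR


for rate in (1e6, 1e9):
    print(f"{rate:.0e} probes/s -> {years_to_sweep(rate):,.0f} years")
```

At a million probes per second, the sweep works out to roughly 585,000 years. Even at an optimistic billion probes per second, it is still roughly 585 years for one home network.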

Some assume that because the address space can’t be scanned, the 10-20 IP addresses that are used within their /64 are neatly tucked away, safe and secure from outside predators.

But that couldn’t be further from reality…

Observations of a New Trend

I’ve observed a steady increase in rather intelligent IPv6 probes and scans targeting individual random, privacy extension (PE) IPv6 addresses within my home network. There’s nothing special about my particular network; I would assume these scans target many home and corporate networks. The method behind these scans is particularly clever, though. In every case, the address being scanned is a randomized IPv6 address (a privacy address) allocated and used by a device for a short period of time, usually less than two days. These addresses should be private and hidden from attackers… but it seems they aren’t.
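To see why these addresses can't simply be guessed, it helps to picture how one is formed: the network's /64 prefix plus a 64-bit interface identifier that the device picks at random. The sketch below draws the identifier at random and simplifies away the details of RFC 4941, which specifies the actual procedure and lifetimes. `make_temporary_address` is a hypothetical name.

```python
import ipaddress
import secrets


def make_temporary_address(prefix: str) -> ipaddress.IPv6Address:
    """Combine a /64 prefix with a random 64-bit interface identifier.

    Simplified sketch: real privacy-extension implementations (RFC 4941)
    follow a specific generation procedure, avoid reserved identifiers,
    and rotate addresses on a lifetime timer. None of that is modeled here.
    """
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("expected a /64 prefix")
    interface_id = secrets.randbits(64)  # 64 random bits for the host part
    return network[interface_id]         # prefix + identifier


print(make_temporary_address("2001:db8:1:2::/64"))
```

An outside party can only learn an address like this by seeing it, for example as the source of a connection. That is why the scans described here imply the address leaked from somewhere.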

I was first alerted to the existence of these scans as I was parsing through my firewall logs in Splunk. My firewall logs go back several years, and I can account for every packet going in and out of the home network. This lends itself well to digital forensics and packet sleuthing, and I find it endlessly interesting to analyze the latest techniques used by network attackers. Each week I see new forms of attacks and clever new ways of scanning. Any time there is a new vulnerability in the wild, I can see the spike in traffic to the affected port(s). Interestingly, I can often see a significant spike before the disclosure is even made public.

In the case of the IPv6 scans against the PE addresses within my network, I can see the scans targeting specific hosts. Clearly the entity (probably a script) doing the scans has been given the exact 128-bit randomized IPv6 address of each host it scans; it's certainly not guessing. The question is, who or what provides this information to the scanner? A quick search for outbound packets from my network to the scanner host comes up empty. My hosts never contact the scanner address directly, so it must be getting the private addresses indirectly, from another host that I am connecting to.

Dissecting the Attacker's Methods

It’s reasonable to assume that a privacy extension (PE) IPv6 address isn’t known to anyone until it comes into existence. Some time after it is first used, one of the numerous hosts it connects to is apparently being a little evil and passing the address on to the scanner.

This is the most reasonable explanation, I think. I can think of others as well, for example malware on a PC passing its IPv6 addresses to the scanner over IPv4. Those seem less likely, though, so I’ve been focusing on the first hypothesis.

So if the scanner is somehow given each private IPv6 address by one of the IPv6 sites the good host accesses, then we can start to narrow down the culprit or culprits. Splunk and that vast repository of packet logs I mentioned make this possible.

Testing the Hypothesis

The first thing that came to mind is to build a set for each private IPv6 address that was scanned, containing all the hosts it connected to, and then look for intersections between those sets. I think this approach is promising, although it makes a number of assumptions if optimal results are to be expected. First and foremost, it assumes that only one remote host triggers the scanning, not multiple different remote hosts.

I think that in order to proceed with this hypothesis we have to accept this assumption. To minimize its effect on the result, we can work with smallish sample sets, say 30 days of data at a time.
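The set-intersection idea can be sketched in a few lines of Python. It assumes the outbound contacts for each scanned privacy address have already been exported from the firewall logs into a dictionary. The function name and the documentation-prefix (2001:db8::/32) addresses are hypothetical placeholders, not real hosts.

```python
def common_destinations(contacts: dict[str, set[str]]) -> set[str]:
    """Return remote hosts contacted by *every* scanned privacy address.

    `contacts` maps each scanned privacy address to the set of remote
    addresses it connected to (outbound traffic only).
    """
    if not contacts:
        return set()
    # set.intersection(a, b, c, ...) returns a new set of shared elements.
    return set.intersection(*contacts.values())


contacts = {
    "2001:db8:1:2:5c3e:1f2a:9b47:d801": {"2001:db8:aaaa::1", "2001:db8:bbbb::1"},
    "2001:db8:1:2:e07:44b1:2c9d:6a10": {"2001:db8:aaaa::1", "2001:db8:cccc::1"},
    "2001:db8:1:2:91f0:7d2:b3e8:4c55": {"2001:db8:aaaa::1", "2001:db8:bbbb::1",
                                         "2001:db8:cccc::1"},
}

print(common_destinations(contacts))  # the single host shared by all three sets
```

A strict intersection only works under the single-trigger assumption above. If two different remote hosts each leak some of the addresses, the intersection can come back empty even though both culprits are present in the data.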

So over the last 30 days, I’ve seen the
following scans against privacy extension addresses on my network (all were of
course blocked at the firewall):

The Splunk query I used for this does a lot of things… It first builds a list of my internal IPv6 addresses that have been targeted by probes. It then feeds that list into the outer query as a search filter, but first it reverses the DST/SRC fields so that it can find all outbound connections made by those private addresses.

Then, for each destination visited by those addresses, it performs a distinct count of the probed addresses that visited it. That distinct count is the number of sets in which the destination appears.

In essence, the query builds the sets of externally contacted sites I mentioned earlier and finds their intersection indirectly by showing the top matches first. In other words, sort the results in descending order by distinct count, and the hosts at the top are the ones contacted by each privacy address that was probed.

Stated even more simply, it identifies the
most likely suspects that may have triggered the scans.

It isn’t a perfect query yet; I’m working on that. It has a high probability of false positives, especially for sites that are always contacted by every address.
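The ranking logic described above can be sketched in Python over a list of log records. This is not the Splunk query itself, only an equivalent illustration. The `FlowRecord` fields, the function name, and all addresses are hypothetical placeholders.

```python
from collections import defaultdict
from typing import NamedTuple


class FlowRecord(NamedTuple):
    """One firewall log entry; the field names are hypothetical."""
    src: str
    dst: str
    inbound: bool   # True for traffic arriving from the internet
    blocked: bool   # True if the firewall dropped it


def rank_suspects(records: list[FlowRecord]) -> list[tuple[str, int]]:
    # Step 1: internal addresses targeted by blocked inbound probes.
    probed = {r.dst for r in records if r.inbound and r.blocked}

    # Step 2: swap roles and find outbound traffic whose *source* is a probed address.
    visitors: dict[str, set[str]] = defaultdict(set)
    for r in records:
        if not r.inbound and r.src in probed:
            visitors[r.dst].add(r.src)

    # Step 3: distinct count of probed addresses per destination, highest first
    # (ties broken alphabetically so the output is deterministic).
    ranked = [(dst, len(srcs)) for dst, srcs in visitors.items()]
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


SCANNER = "2001:db8:5ca9::1"
PE_A, PE_B, PE_C = "2001:db8:1:2::a", "2001:db8:1:2::b", "2001:db8:1:2::c"
X, Y, Z = "2001:db8:aaaa::1", "2001:db8:bbbb::1", "2001:db8:cccc::1"

records = [
    FlowRecord(SCANNER, PE_A, inbound=True, blocked=True),
    FlowRecord(SCANNER, PE_B, inbound=True, blocked=True),
    FlowRecord(PE_A, X, inbound=False, blocked=False),
    FlowRecord(PE_A, Y, inbound=False, blocked=False),
    FlowRecord(PE_B, X, inbound=False, blocked=False),
    FlowRecord(PE_B, Z, inbound=False, blocked=False),
    FlowRecord(PE_C, Y, inbound=False, blocked=False),  # PE_C was never probed
]

print(rank_suspects(records))
```

Here X tops the list because both probed addresses contacted it. The false-positive problem shows up the same way: a site that every device talks to all the time will also score the maximum count.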

Observations

I sent a number of honeypot probes to those suspect addresses, each from a unique IPv6 source address, to see if I could trigger a response scan. So far I’ve tried about 50, and none resulted in a response scan. I have a hunch that I need to do more than a simple TCP connect to the host in order for it to harvest my IP and pass it to the scanner.

This is where I’m at now: brainstorming a new hypothesis that I can test.
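The point of using a unique source address per honeypot probe is attribution. If a response scan ever does arrive, the address it targets identifies which suspect leaked it. The sketch below shows only that bookkeeping. It assumes the prober controls a /64 and can bind outgoing connections to arbitrary addresses within it; sending the probes is not shown. `ProbeLedger` and its methods are hypothetical names.

```python
import ipaddress
import secrets
from typing import Optional


class ProbeLedger:
    """Hypothetical bookkeeping for honeypot probes.

    Each probe to a suspected harvester gets its own never-reused source
    address from a /64, so a later inbound scan of that address points
    back at the one suspect that saw it.
    """

    def __init__(self, prefix: str):
        self.network = ipaddress.IPv6Network(prefix)
        self.sent: dict[ipaddress.IPv6Address, str] = {}

    def new_source_for(self, suspect: str) -> ipaddress.IPv6Address:
        host_bits = self.network.max_prefixlen - self.network.prefixlen
        while True:
            candidate = self.network[secrets.randbits(host_bits)]
            if candidate not in self.sent:  # never reuse a source address
                self.sent[candidate] = suspect
                return candidate

    def attribute(self, scanned: str) -> Optional[str]:
        """Return the suspect that was shown this address, if any."""
        return self.sent.get(ipaddress.IPv6Address(scanned))


ledger = ProbeLedger("2001:db8:1:2::/64")
src = ledger.new_source_for("2001:db8:aaaa::1")
# ...connect to the suspect from `src` here, then check each later inbound scan:
print(ledger.attribute(str(src)))
```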

Conclusions

Scans are not launched from the same IP addresses that harvest the IPv6 addresses, so the scanner network consists of numerous nodes.

Scans are triggered once and not repeated for days to weeks, so the scanner or its controller must keep a persistent database of what has been scanned.

Not every PE address gets scanned, perhaps because a very specific malware site or script has to run in the browser to contact the harvester host.

Triggering a scan doesn't seem to be as simple as just opening an HTTP connection to the harvester. This conclusion is supported only by circumstantial evidence: the 50 probes that I sent to the suspected harvesters didn't result in any response scans.

Many of the suspected harvesters are owned by Google / Facebook, so perhaps this is actually some sort of pet project to collect internet statistics on connecting devices, or a way of detecting bots.

In the case of scans originating from Shodan scanner hosts, at least one set of harvesters resides on the public torrent networks (downloading a CentOS ISO from a torrent results in reverse scans from Shodan).

Also Noteworthy

These scans are probably yielding good results for the attackers, for many reasons, including:

Devices with IPv6 addresses often don't have firewalls!!! Android phones, for example.