Tutorial: Geolocation and WHOIS behind it

Description

This tutorial details the different features of T2 concerning geolocation and the determination of the organization behind an IP address. There are two options:

basicFlow: T2's own geolocation and organization labeling

geoip: open-source geolocation based on the geoIP / maxmind DB

Note that the geoip DB is considerably slower than basicFlow. Admittedly, geoip currently produces more information, such as the city, which basicFlow will also provide in the next release of 0.8.4 while still being faster.

Preparation

To do so, we first need to prepare T2. If you did not complete the previous tutorials, just follow the procedure described below.

First, restore T2 into a pristine state by removing all unnecessary or older plugins from the plugin folder ~/.tranalyzer/plugins and compile the following plugins.

If you have not yet created separate data and results directories, please do so now in another bash window; this facilitates your workflow:

$ mkdir ~/data
$ mkdir ~/results
$ cd ~/data

The anonymized sample PCAP used in this tutorial can be downloaded here: faf-exercise.pcap. Please extract it into your data folder. Now you are all set for T2 IP labeling experiments.

basicFlow subnet and IP labeling

T2 provides its own geolabeling and IP identification service, so there is no longer any need to look up every IP address in a maxmind DB or via whois. The necessary files are updated with each version of T2. The bzip2-compressed subnet files for IPv4/6 are extracted by the autogen.sh script or by t2build, using the programs under utils. We will look at them below.

Open basicFlow.h and look for the user-defined switches concerning subnets as shown below:

$ vi basicFlow.h

BFO_SUBNET_TEST activates the subnet labeling. It is switched on by default. If the GRE, L2TP or TEREDO output switches (not shown here) are activated, subnet labeling can be activated separately for these addresses. We leave them off because the pcaps in this tutorial do not contain any of these encapsulations.

To stay close to the default geoip plugin output, we switch on the Autonomous System Number (ASN) and the longitude/latitude output as indicated below. We leave the HEX option off; it toggles between a human-readable whois output and a hex-coded one. The latter is a powerful selection mechanism when searching large flow files.

The SUBRNG constant defines the search mode, either CIDR or ranges. The range mode has the advantage that any range can be defined by one single line whereas the CIDR notation would need many lines in the subnet file. We leave it at the default CIDR.
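To see why a single range line can replace several CIDR lines, here is a minimal Python sketch that is independent of T2. It uses the standard ipaddress module to expand an arbitrary range into the CIDR blocks needed to cover it. The addresses are made up for illustration and do not come from the shipped subnet files.

```python
import ipaddress

def range_to_cidrs(first, last):
    """Return the minimal list of CIDR blocks covering first..last inclusive."""
    start = ipaddress.ip_address(first)
    end = ipaddress.ip_address(last)
    return [str(net) for net in ipaddress.summarize_address_range(start, end)]

# Hypothetical range that does not align to a single prefix
cidrs = range_to_cidrs("192.168.1.10", "192.168.1.40")
for cidr in cidrs:
    print(cidr)
```

In range mode this would be one entry; in CIDR mode the same addresses need one line per printed block.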

The WHOLEN constant defines the length of the WHOIS column in the basicFlow output including the “\0”.

SUBVERS defines the subnet version. Different versions are NOT compatible. t2build will warn you if there is a discrepancy. So leave it at the default value.

Save all open files and rebuild basicFlow, basicStats and connStat, because basicStats and connStat depend on the subnetHL4.c routines if BFO_SUBNET_TEST is activated. You may also simply rebuild all plugins built so far, which is shorter to type. Then run t2 on the pcap. Instead of editing the files by hand, you can also use the t2conf command:

A value of 666 in the longitude and latitude columns means that no location is defined, which is also indicated by a radius of -1. If you look in the subnets4.txt file, you can confirm the IPv4 labeling. We will look at these files in detail below, in the section Internal WHOIS subnet your own.
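When post-processing a flow file, rows carrying the 666 placeholder usually need to be dropped before plotting or mapping. The following Python sketch shows one way to do that for a tab-separated file. The column names ip, lat and lon and the sample rows are placeholders invented for this example; substitute the names from your own flow file header.

```python
import csv
import io

NO_LOCATION = 666.0  # value described above: no location defined

def located_rows(flow_text, lat_col, lon_col):
    """Yield rows whose latitude and longitude are both real coordinates."""
    reader = csv.DictReader(io.StringIO(flow_text), delimiter="\t")
    for row in reader:
        lat, lon = float(row[lat_col]), float(row[lon_col])
        if lat != NO_LOCATION and lon != NO_LOCATION:
            yield row

# Made-up sample; the header names are NOT T2's real column names
sample = (
    "ip\tlat\tlon\n"
    "10.0.0.1\t666\t666\n"
    "198.51.100.7\t47.0\t8.0\n"
)
rows = list(located_rows(sample, "lat", "lon"))
print([r["ip"] for r in rows])
```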

TOR address labeling

By default, TOR addresses are integrated into the subnet file by the subconv script under basicFlow/utils when t2build or autogen.sh is invoked. To switch this off, edit the autogen.sh file and remove the -t option. Below, a flow file is shown in which TOR addresses are present. I currently do not have an anonymized pcap for you to play with. I’m on it.

Note that the end report indicates that TOR addresses are present. In the flow file, TOR addresses are labeled with TOR. Alternatively, you can select all TOR traffic with the TORADD bit in flowStat, as shown below.

Note the geoIP DBs GeoLiteCity.dat.gz and GeoLiteCityv6.dat.gz, as well as the maxmind2 DB GeoLite2-City.mmdb.gz. If you move into the scripts folder, you will see two scripts:

genkml.sh (map coordinates to Google Earth)

updatedb.sh (update DB)

The first converts a flow file into a Google Earth KML file, producing an earth view with the locations of the various IPs. The second updates the DBs. Please refer to the documentation and the doc folder for detailed information.

Now move to the src directory and look into the geoip.h file:

$ cd src
$ ls
geoip.c geoip.h Makefile.am
$ vi geoip.h

The important setting is the type of DB. Since 0.8.4, the default is the maxmind DB. As you can see, the classification of srcIP and dstIP can be enabled separately. The output of country, city, language, etc. can also be enabled individually. For this tutorial we leave everything in the default configuration as shown below.

Hex code labeling

As mentioned above, t2 supports hex code labeling, which is a powerful flow selection mechanism because integer AND operations are much faster than string comparisons. Open basicFlow.h and set BFO_SUBNET_HEX to 1, then rebuild everything and rerun t2, as indicated below.
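Separately from the T2 output itself, the selection idea can be sketched in a few lines of Python: parse the hex label as an integer and compare it under a mask. The bit layout used here (upper 8 bits as a category, lower 24 bits as an organization id) and all label values are invented for illustration; this section does not specify how T2 actually encodes its hex labels, so check the basicFlow documentation for the real layout.

```python
def matches(label_hex, mask, value):
    """True if the bits of label_hex selected by mask equal value."""
    return (int(label_hex, 16) & mask) == value

# Hypothetical layout, NOT T2's real encoding:
# upper 8 bits = category code, lower 24 bits = organization id
CATEGORY_MASK = 0xFF000000

labels = ["0x12000a01", "0x34000a01", "0x12ffff00"]
selected = [l for l in labels if matches(l, CATEGORY_MASK, 0x12000000)]
print(selected)
```

A single AND plus an integer comparison replaces what would otherwise be a substring or regex match per flow.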

Internal WHOIS subnet your own

Which admin has never asked himself WHO, WHERE and WHY the fuck somebody is doing what he is doing, or how to find an in-house IP such as 10.23.4.5? Yeah, I did, lots of times, and got weary of looking things up in Excel sheets, logs or, if I was lucky, DBs. Now try to do that for 1000 addresses and hand over a report in no time.

As the private IPv4/6 address space is hopefully only documented inside your organization, we need to build our own subnet file. Building one is fairly easy if the mapping from IP to location and organization is available as a tab-separated or CSV file. So that you can extend or rewrite the current subnet files, T2 ships with the .txt version and the scripts to convert it to the T2-compatible binary version. That is why the initial build of basicFlow takes a bit longer.

Let’s now look at the basicFlow directory after the plugin is compiled. The HL.txt files are intermediate files on the way to the binary format HL.bin. The original is the decompressed subnets4/6.txt file, which contains all the information.

You can now write your own subnet file or modify the original one. Make a copy of subnets4.txt first, so that you have an easy way to restore the default. Let’s define the 192.168. network more precisely by adding two more lines describing the Knoedelrutschen company with one /24 and one /28 network:

Because autogen.sh decompresses subnets4.txt.bz2 and thus overwrites the subnet file, we first need to bzip2 our subnets4.txt and then build basicFlow with the -f option. For beginners, that is the easiest way to regenerate the binary and ship it to the .tranalyzer/plugins folder. Then rerun t2 on the pcap.
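When a /24 and a /28 are both added inside 192.168., an address can fall into several nested entries at once. The Python sketch below shows how such nesting can be reasoned about by picking the entry with the longest prefix. The networks and labels are hypothetical, and the most-specific-wins rule is an assumption made for this illustration, not a statement about how T2's lookup is implemented.

```python
import ipaddress

# Hypothetical entries (network, label); not taken from subnets4.txt
ENTRIES = [
    ("192.168.0.0/16", "private"),
    ("192.168.1.0/24", "Knoedelrutschen office"),
    ("192.168.1.16/28", "Knoedelrutschen lab"),
]

def most_specific(ip, entries=ENTRIES):
    """Return the label of the longest-prefix entry containing ip, or None."""
    addr = ipaddress.ip_address(ip)
    hits = []
    for cidr, label in entries:
        net = ipaddress.ip_network(cidr)
        if addr in net:
            hits.append((net.prefixlen, label))
    if not hits:
        return None
    # Longest prefix = most specific network
    return max(hits)[1]

for ip in ("192.168.1.20", "192.168.1.200", "192.168.7.7", "10.0.0.1"):
    print(ip, most_specific(ip))
```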

So far we have used the CIDR mode; let’s now test the range mode. Open utils.h and set SUBRNG to 1, or use the t2conf command below.

$ t2conf basicFlow -D SUBRNG=1
$

Now t2 selects the third column in the subnet file. Add a new /28 network as listed below. If an entry has a dash in the CIDR column and CIDR mode is configured, the entry is ignored, as the range is definitely not CIDR. You can have any values in the CIDR or range column, since a non-CIDR range would otherwise have to be written as several CIDR rows. Here we clearly have a non-CIDR network, and we are in range mode anyway. The SW and HW engineers are now separated.
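To illustrate what a lookup against first/last address pairs looks like, here is a small Python sketch using a binary search over sorted range starts. The two ranges and their labels are hypothetical stand-ins for the SW and HW engineer networks, and the code is only a model of range matching in general, not T2's implementation.

```python
import bisect
import ipaddress

# Hypothetical non-overlapping ranges, sorted by first address
RANGES = [
    ("192.168.1.16", "192.168.1.25", "SW engineers"),
    ("192.168.1.26", "192.168.1.31", "HW engineers"),
]
_STARTS = [int(ipaddress.ip_address(first)) for first, _, _ in RANGES]

def lookup(ip):
    """Return the label of the range containing ip, or None."""
    key = int(ipaddress.ip_address(ip))
    # Index of the last range whose first address is <= key
    i = bisect.bisect_right(_STARTS, key) - 1
    if i < 0:
        return None
    _, last, label = RANGES[i]
    return label if key <= int(ipaddress.ip_address(last)) else None

for ip in ("192.168.1.20", "192.168.1.28", "192.168.1.40"):
    print(ip, lookup(ip))
```

Note that the first range (.16 to .25) covers ten addresses and therefore cannot be written as a single CIDR block, which is exactly the case range mode is meant for.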