Mapping IP addresses to country codes

Extreme Optimization can lead to remarkable improvements in both speed and size.
In this article, we will see an example of how we can modify a standard
algorithm using our knowledge of the problem at hand to achieve significant
performance gains.

The Problem

We build on an article written by R. Reyes called
Optimized IP to ISO3166 Country Code Mapping in C#. The
objective is to perform fast lookup of the country code corresponding to an IP
address in a table of over 52,000 entries while keeping memory usage to a
minimum. We are only interested in the current IPv4 addresses, consisting of a
sequence of 4 byte values in the range 0-255.

The existing solution

Mr. Reyes introduces the PATRICIA trie structure (pronounced ‘try’ according to
Donald Knuth) to efficiently perform the lookups. We will not repeat the
details of the implementation here. They can be found in the original article.

The Extreme Optimization

Extreme Optimization™ begins with a thorough understanding of the problem we are
solving. We combine this understanding with in-depth knowledge of existing
algorithms

Storing IP addresses more efficiently

The original algorithm uses a BitVector class to store the keys. Keys are
represented internally as a Byte array. Only the distinctive portion of the key
is stored, so many vectors will actually be only one byte long. However, a lot
of shifting of bits is involved in isolating the relevant portion of the key
and to check for matches.

A first observation is that we do not need a general BitVector class to store
our data. Our keys are sequences of up to 32 bits. They fit perfectly into a 32
bit unsigned integer. This will eliminate the overhead of managing the array
object.

Furthermore, we can eliminate the excessive shifting by storing the entire key
in every node and using the bitwise Xor operator. The boolean Xor (exclusive
OR) operator returns true if its operands are different and false if they are
the same. The bitwise version compares identically positioned bits in two
numbers. It assigns 0 to the corresponding bit in the result when the bits are
the same, and 1 when they are different.

To find the shared portion of two keys, we Xor the two values. The key length is
the number of leading zeros in the result. We can find this number by comparing
the result to powers of two. For a key length of 32 bits, if the result is
smaller than 2^n, we know that the first 32-n bits of the two
keys are the same.

Here is an example. We want to find the common key length of two IP addresses,
IP1= 207.176.130.34 and IP2= 207.176.151.11.

IP Address

Binary value

Numerical value

IP1 (207.176.130.34)

11001111.10110000.10000010.00100010

411,303,936

IP2 (207.176.151.11)

11001111.10110000.10010111.00001011

411,336,704

IP1 xor IP2

00000000.00000000.00010101.00101001

5,417

The result, 5417, is smaller than 8192 = 2^13. Therefore, the two IP’s have the
first 32 - 13 = 19 bits in common.

When comparing a key value to the key values of the children of a node, we can
start counting up from the key length of the parent node.

Optimizing Patricia tries for binary keys

Secondly, we are dealing with binary data. The acronym PATRICIA is short for
Practical Algorithm To Retrieve Information Coded In Alphanumeric. This
name suggests that Patricia tries are best suited for alphanumerical data,
where every node can have many children. The keys in our problem are sequences
of 0’s and 1’s. This means that any node can have at most two children. One
child will have a prefix starting with a 0. The other will have a prefix
starting with a 1.

This means we can eliminate the ArrayList we have been using to store the
references to the child nodes. We replace it with two references, one each for
the child whose key prefix starts with a 0 or a 1, respectively. If either
child is not present, the corresponding reference is null.

An ArrayList is a relatively expensive structure to store very short lists. In
addition to the usual overhead associated with reference types, it also
allocates space for unused entries. Eliminating both the overhead and the
unused space should significantly reduce our memory requirements.

When looking up values in the trie, we can simply use the value of the first bit
after the current key length to select the next child node. We can then quickly
check that our value matches the full length of the child node.

Further Optimizations

A trie can work with one root element. The first few steps of the lookup will be
spent working through the first byte or 8 bits of the IP address. If we create
a separate root for each possible value of the first bits in the key, we
eliminate 6 or 7 unnecessary recursive steps. We can do even better by
pre-selecting on more leading bits. We will call this number of bits the index
length.

We have to be somewhat careful, however. Every root node spans a certain range
of IP addresses. Some entries in our input tables will span a range that covers
more than one root node. We will have to add these entries to all the
appropriate root nodes, adjusting the range to fit that root nodes' range. This
will come at a small penalty in the memory footprint, but the speed gain is
significant.

If we use an index length of at least 1, we can use Int32 values
for the keys instead of the non CLS-compliant UInt32 without any
complications resulting from using signed numbers. A key passed to a node will
always have at least the leading bit in common with the node's key. The leading
bit of the result of the XOR operations will therefore always be zero,
indicating a positive value.

Looking at the code

The new project contains three classes.

BinaryTrie represents a trie with 32 bit binary values as its
keys.

BinaryTrieNode represents a node in BinaryTrie.

IPCountryTable represents a lookup table. It inherits from BinaryTrie.

BinaryTrie has two constructors. The first constructor takes one
argument, indexLength, which is the number of bits to use in
pre-selection. It creates the _roots array which will contain the
root nodes. The default constructor uses an index length of 1.

The Add method adds a key to the trie and returns the BinaryTrieNode
corresponding to the key. The AddInternal method does the actual
work of adding a key to the trie. If no root node exists for the given index,
it creates one. If the root node exists, the request is passed on to the root
node.

BinaryTrieNode represents an entry in the trie. It contains the
following fields:

_key is a UInt32 value containing the key of the current node.

_keyLength is the number of bits in the current key prefix.

_userData points to any user-supplied object associated with the
key. In this instance, this will be the Country Code associated with the node.

_zero contains a reference to the child node whose prefix starts
with a 0.

_one contains a reference to the child node whose prefix starts
with a 1.

BinaryTrieNode has one internal constructor that is used by BinaryTrie
to create a root node. It also contains two private constructors. One creates a
shallow copy of a node. The other creates a node using the data that was passed
to the Add method. It calculates the key length based on the
number of addresses available to the network. Note that APNIC provides this
information directly.

Most of the hard work of building the trie is done in the internal AddInternal()
method. We first calculate the common key length of the current node and the
new entry. If the match is complete, i.e. the common key length is at least the
key length of the current node, the new entry is passed on to a child node.
Which child we work with depends on the first bit after the current key.

If the match is incomplete, the current node is replaced with a new node with
the common portion of the key as its key. It has two children: (a copy of) the
current node, and a node for the new entry. In both cases, the bit at position
_keyLength+1 determines which of the two child nodes gets to go in
the _zero and _one slots, respectively.

The lookup is also passed on from IPCountryTable to the root IPCountryTableEntry.
If the lookup key matches one of the child node keys, the request is passed on
to that child node. Otherwise, we have found the best match.

Benchmark results

Using the same data set and test code as the original article, we found the
following results. The Optimized version #1 refers to the original version of
the code as posted here on May 13, 2003. The Optimized version #2 refers to the
latest code, which includes the further optimizations.

Original version

Optimized version #1

Optimized version #2

Result

Result

Change

Result

Change

Memory usage

9,409 KB

2,578 KB

- 72.6%

2,963 KB

- 68.5%

Load time

7.361s

2.864s

- 61.1%

1.858s

- 74.8%

25,000 lookups

3.475s

0.2567s

- 92.6%

0.1997s

- 94.3%

Lookups/second

7,194

97,048

+ 1,249%

125,190

+ 1,640%

Here is a graphical representation of the results:

Possible improvements

Is there room for even more optimization? Possibly.

A single country code will have many nodes referencing it. We can eliminate a
node if, for each of the IP addresses in its range, the best matching node in
the resulting trie has the same country code as the node we eliminated. This
will happen if the node's parent node has the same country code.

After the trie has been built, we can go through all the nodes iteratively and
eliminate unnecessary nodes. If we can eliminate a significant number of nodes,
the memory usage will decrease and the lookup speed will increase accordingly.

As it turns out, the databases already reflect this optimization. Only a
handful of nodes can be trimmed from the trie. There is no significant
reduction in memory usage or speed, at most 1 or 2%.

Conclusion

The two main objectives were minimizing memory usage and maximizing speed. We
have further optimized an already optimized solution to produce dramatic
improvements in both areas. We originally reduced memory usage by 3/4, and
increased the lookup speed by a factor of more than 13.

With the latest version, we give up a little bit of memory for an additional 30%
increase in performance. We also untangled the binary trie code from the IP
lookup code. It can now be used independently.

References

Update

APNIC (the Asia Pacific Network Information Centre), one of four organizations
that publishes the data used in this article, recently changed the format of
its statistics file to conform to the format used by the other three
organizations. The size field in the database now contains the total
size of the segment rather than the number of bits. To use files in this new
format, you will need to make a change in the source code. In the call to LoadStatisticsFile
which loads the APNIC data file, set the calculateKeyLength parameter
to true.

Update #2

As a further consequence of these changes in database format, the original
assumption that the length of a range of IP addresses in a database record is
always a power of two is no longer valid. We have adjusted the code in LoadStatisticsFile
to reflect this change.