Licensing & Copyright

The copyrighted material on this page is made available to anyone wishing to use, modify, copy, or redistribute it subject to the terms and conditions of the GNU General Public License. The scripts published on this page are distributed in the hope that they will be useful, but WITHOUT ANY WARRANTY expressed or implied, including the implied warranties of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. For further information, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.

This has been on my to-do list for a while now; I just had to find the time to write it. Having watched the scripts documented on this page being adopted across various open source projects over the past several years, I felt the time had finally come to unify them into a single script that runs on most modern-day Linux distributions. That script is now documented below.

Simply change to the directory where you would like the GeoIP.acl file to be created and then invoke this script. It will fetch all the necessary GeoIP data directly from the MaxMind website and create a single ACL file containing country-specific ACL entries for both the IPv4 and IPv6 address spaces.

BUG FIX ANNOUNCEMENT

If you accessed this page before the 1st of January 2010, and are therefore using these scripts as they were published before that date, note that they have since been changed to address two issues that came to light.

The first is a change to the faster, recursive script. The change is nothing major, but it slightly reduces execution time by splitting IP ranges while generating the GeoIP.acl file rather than while creating the CBE (Country,Begin,End) CSV file. Only the point at which the range splitting takes place has changed: because the CBE file now contains fewer lines, grep has fewer lines to pattern-match against, which marginally reduces the execution time of the script.

The second fix has been made to all scripts. It came about when I noticed that the recursive awk function could not correctly split extremely large IP ranges, with a magnitude exceeding about 2^31. For example, giving the script the range 0 to 2147483647 resulted in it printing 0.0.0.0/0 rather than 0.0.0.0/1. I traced this issue to a rounding anomaly in awk's printf function, and the solution is simply to ensure that every occurrence of the logarithmic division calculation in each script is truncated to a whole number using the int function. This bug has probably not caused people much grief, because the ranges supplied within the MaxMind GeoIP CSV file are nowhere near a magnitude of 2^31 (the largest IP range listed at the time of writing has a magnitude of 2^26, representing the network 28.0.0.0/6 in the United States). Nevertheless, this was a bug and has now been fixed in the scripts published below.
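The truncation fix can be illustrated outside awk. The Python sketch below is not the author's script, and its function names are hypothetical: it contrasts the log-division approach (truncated with int(), as in the fix) with an integer-only calculation based on int.bit_length(), which cannot suffer from floating-point rounding at all, and then computes the prefix length for the 0 to 2147483647 example.

```python
import math

def magnitude_float(size: int) -> int:
    """Largest k with 2**k <= size, via log division truncated with int().
    Floating point: near exact powers of two the quotient may land a hair
    above or below the integer, so this mirrors the awk approach, not a proof."""
    return int(math.log(size) / math.log(2))

def magnitude_exact(size: int) -> int:
    """Same quantity using integer arithmetic only (no rounding possible)."""
    return size.bit_length() - 1

def prefix_for_aligned_range(begin: int, end: int, width: int = 32) -> int:
    """Prefix length for a range that is exactly one aligned network."""
    size = end - begin + 1
    return width - magnitude_exact(size)

print(prefix_for_aligned_range(0, 2147483647))  # 1, i.e. 0.0.0.0/1
```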

Overview

I was recently asked by my employer to bring our DNS in-house from UltraDNS, where we originally hosted all our domain names. Due to various requirements within the company, they were utilising UltraDNS's geo-targeting feature to enable internet users in different areas of the world to resolve hosts on their domains to varying IP addresses, depending on the geographical (country) location of these users.

Having already been exposed to BIND's views feature some years ago, I searched the web for a way to make BIND geo-aware. There is not much documentation about this, but I found one solution which involved patching the BIND source code. All well and good but, in all honesty, this seemed like using a sledgehammer to crack a nut. Besides, our company does not like patching (hacking) source code unless there is a real need to do so, as it normally entails ongoing maintenance: the changes have to be refitted into each new revision of the BIND source code as and when the ISC releases newer versions of BIND.

I analysed the BIND patching method further and found that it still relies on two fundamental things to achieve a geo-aware DNS setup: BIND's views feature and the freely downloadable GeoIP data available from MaxMind. It was then that I realised that, to make BIND geo-aware, all that is required is to reformat the data in the MaxMind GeoIP CSV file into something which BIND will accept in its configuration file. The easiest and most manageable way to achieve this is by using the BIND Access Control List clause, but herein lies the problem: the MaxMind GeoIP CSV file operates on IP ranges, whereas BIND ACLs operate on IP networks, in classic net/mask notation. So, basically, I had to devise a method to transform MaxMind IP ranges into BIND ACLs. This is achieved by the Linux BASH script(s) shown below.
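To make the range versus network mismatch concrete, the sketch below uses Python's standard ipaddress module (a swapped-in tool for illustration only; the article's scripts use BASH and awk) to express two IPv4 ranges in net/mask form. The first range happens to be a single network; the second cannot be written as one network and needs two. The range_to_networks name is hypothetical.

```python
import ipaddress

def range_to_networks(begin: str, end: str) -> list[str]:
    """Express an inclusive IP range as the minimal list of CIDR networks."""
    first = ipaddress.ip_address(begin)
    last = ipaddress.ip_address(end)
    return [str(net) for net in ipaddress.summarize_address_range(first, last)]

print(range_to_networks("1.0.0.0", "1.0.0.255"))  # ['1.0.0.0/24']
print(range_to_networks("1.0.1.0", "1.0.3.255"))  # ['1.0.1.0/24', '1.0.2.0/23']
```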

The result is the automatic creation of a single and maintainable GeoIP.acl include file that can be instantly added into any already running BIND DNS server, without the requirement for source code patching and recompilation, producing a geo-aware production-ready DNS server in a matter of minutes.

Linux BASH script(s) to fetch, unzip, reformat and generate the GeoIP.acl include file for BIND

There are two different BASH scripts documented below which will generate the GeoIP.acl include file for BIND. The second is an improvement over the first, but I've left the first documented anyway as it was my original implementation. The first uses an iterative BASH loop (slower), whereas the second uses a recursive AWK function (much faster). Both achieve exactly the same result using different programming constructs. For speed and efficiency, I recommend the second, recursive script.

NOTE: By default, some distributions of Linux use a non-GNU version of AWK which lacks the bitwise AND function. In this instance, GAWK must be installed (the GNU version of AWK) for the scripts below to function correctly (thanks to Ruben for pointing this out).

Each script will attempt to download the latest MaxMind GeoIP CSV file (which is actually a ZIP file). Once downloaded, the ZIP file is kept and reprocessed each time the script is executed; removing it and rerunning the script will force another fetch from MaxMind. Once the ZIP file has been fetched, each script will unzip it, reformat the enclosed GeoIP CSV file (taking several passes to do this if the iterative version is used) and then generate the file GeoIP.acl, which is the include file that can be added to BIND's configuration to make it geo-aware.

How do these scripts work?

I won't go into the technicalities of how these scripts work (this is left as an exercise for the reader) but, in brief, the first, iterative script creates a new CSV file containing 3 fields (Country,Begin,End) and then repeatedly searches for and splits these IP ranges on network boundaries. The result is a CSV file that covers exactly the same IPs as before, but whose ranges now start and end on values that allow each range to be expressed concisely in net/mask notation. The final part of the script then uses this CSV file to generate the GeoIP.acl include file.
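The iterative, multi-pass idea can be sketched in Python as follows. This is a hypothetical reconstruction under stated assumptions, not the original BASH loop: rows are (Country,Begin,End) tuples with integer IPs, and each pass splits any row that is not already a single aligned network into its largest leading network plus the remainder, repeating until a pass changes nothing.

```python
def is_network(begin: int, end: int) -> bool:
    """True if [begin, end] is exactly one aligned network."""
    size = end - begin + 1
    # Power-of-two size, and begin aligned on that size.
    return size & (size - 1) == 0 and begin % size == 0

def largest_block(begin: int, end: int) -> int:
    """Size of the biggest aligned network that starts at begin and fits before end."""
    size = 1 << ((end - begin + 1).bit_length() - 1)  # fit within the range
    while begin % size:                                # then respect alignment
        size >>= 1
    return size

def split_passes(rows):
    """Repeat splitting passes over CBE rows until every row is one network.
    Returns (rows, number_of_passes)."""
    rows = list(rows)
    passes = 0
    while True:
        passes += 1
        changed = False
        out = []
        for country, begin, end in rows:
            if is_network(begin, end):
                out.append((country, begin, end))
            else:
                size = largest_block(begin, end)
                out.append((country, begin, begin + size - 1))
                out.append((country, begin + size, end))
                changed = True
        rows = out
        if not changed:
            return rows, passes

# 10.0.0.1 - 10.0.0.6 for a made-up country code "ZZ".
rows, passes = split_passes([("ZZ", 167772161, 167772166)])
print(passes)  # 4
```

Each pass is cheap, but a badly aligned range needs several passes, which is consistent with the article's note that the iterative version is the slower of the two.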

The second recursive script achieves the same result faster by creating a new CSV file as before, containing 3 fields (Country,Begin,End), and then performs recursive range splitting "on the fly" within awk itself, for each country, to generate the GeoIP.acl include file.
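A recursive variant in the same spirit might look like this. It is a sketch, not the author's awk function: it cuts a range at the network boundary given by the highest bit in which begin and end differ and recurses on both halves, so the recursion depth never exceeds 32 for IPv4. The acl_text helper, and the convention of naming each ACL after its country code, are assumptions for illustration.

```python
import ipaddress

WIDTH = 32  # IPv4

def split_range(begin: int, end: int, out: list) -> None:
    """Recursively split the inclusive range [begin, end] into aligned networks,
    appending "a.b.c.d/prefix" strings to out in ascending order."""
    size = end - begin + 1
    # Base case: a power-of-two sized range aligned on its own size is one network.
    if size & (size - 1) == 0 and begin % size == 0:
        prefix = WIDTH - (size.bit_length() - 1)
        out.append(f"{ipaddress.IPv4Address(begin)}/{prefix}")
        return
    # Otherwise cut at the boundary given by the highest differing bit and recurse.
    bit = (begin ^ end).bit_length() - 1
    boundary = (end >> bit) << bit
    split_range(begin, boundary - 1, out)
    split_range(boundary, end, out)

def acl_text(cbe_rows) -> str:
    """Build one acl clause per country from (country, begin, end) rows.
    Naming each ACL after its country code is a hypothetical convention."""
    by_country: dict[str, list[str]] = {}
    for country, begin, end in cbe_rows:
        split_range(begin, end, by_country.setdefault(country, []))
    lines = []
    for country, nets in by_country.items():
        lines.append(f'acl "{country}" {{')
        lines.extend(f"    {net};" for net in nets)
        lines.append("};")
    return "\n".join(lines) + "\n"

print(acl_text([("ZZ", 0, 2147483647)]), end="")  # acl "ZZ" { 0.0.0.0/1; }; over 3 lines
```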

Once either of these scripts has finished running, you can slot the newly created GeoIP.acl file straight into your existing BIND configuration by adding the line:

include "/path/to/GeoIP.acl";

to named.conf. It will then be possible to create custom geo-views within BIND, like this:

If you decide to run these scripts from cron on your BIND name server(s), do remember to reload named afterwards (normally achieved by running the command service named reload on Red Hat/CentOS) so that the new ACL definitions in the GeoIP.acl file are loaded into BIND's memory.

Summary

I hope this article proves useful for others (that's why I have documented it). Interestingly, my original implementation of this used a PHP script coupled with MySQL, loading the MaxMind CSV file into a database table and then running SELECT, UPDATE and INSERT queries to split up the IP ranges. Whilst this worked, it depended on having PHP and MySQL installed and configured. The above scripts achieve exactly the same thing using only BASH commands and utilities, such as awk, grep and sort, which, in my view, is far cleaner!

Incidentally, it is actually possible to produce the GeoIP.acl file without using grep or any intermediate CSV file (shown below). These scripts may be used instead, but with markedly longer execution times; because of this, an echo statement that outputs the current country code to standard error has been added to their main loops to give an indication of progress while the scripts are running.

We can marginally reduce the execution time of the above script by adjusting its awk line to match the current country using a regular expression, as opposed to setting the awk variable c and then checking if c == $10, as follows:

Do note, however, that I personally prefer the previous grep method, as it is much faster than these two scripts. It first reformats the data within the CSV file into something that allows fast regex pattern matching on the country field (by moving this field to the beginning of each line), leaving awk to take care of the more complicated task of IP range splitting on the begin (2nd) and end (3rd) integer IP fields.
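The CBE reformatting step behind the grep method can be sketched in Python. The sketch assumes the legacy MaxMind GeoIP country CSV layout of quoted begin IP, end IP, begin number, end number, country code and country name; the sample rows and the country codes AA and ZZ are invented for illustration. Once the country code leads each line, selecting one country is just an anchored prefix match, the equivalent of grep "^ZZ,".

```python
import csv
import io
import re

# Invented rows in the assumed legacy layout:
# "begin_ip","end_ip","begin_num","end_num","country_code","country_name"
SAMPLE = (
    '"1.0.0.0","1.0.0.255","16777216","16777471","AA","Example A"\n'
    '"1.0.1.0","1.0.3.255","16777472","16778239","ZZ","Example Z"\n'
)

def to_cbe(maxmind_csv: str) -> list[str]:
    """Reorder each row into Country,Begin,End so the country leads the line."""
    cbe = []
    for row in csv.reader(io.StringIO(maxmind_csv)):
        if not row:
            continue  # skip blank lines
        begin_num, end_num, country = row[2], row[3], row[4]
        cbe.append(f"{country},{begin_num},{end_num}")
    return cbe

def lines_for(country: str, cbe_lines: list[str]) -> list[str]:
    """Anchored prefix match on the leading country field."""
    pattern = re.compile(rf"^{re.escape(country)},")
    return [line for line in cbe_lines if pattern.match(line)]

print(lines_for("ZZ", to_cbe(SAMPLE)))  # ['ZZ,16777472,16778239']
```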

Over the last decade, IPv6 has become more and more mainstream. More recently, on the 3rd of February 2011, IANA allocated the last five remaining IPv4 /8 blocks, one to each of the five RIRs, completely exhausting the IANA pool; there is now no further free IPv4 address space left at IANA for allocation. Due to this, I predict demand for IPv6 adoption is likely to rise over the coming years.

Although I have not yet seen any requirement for geo-aware DNS serving on the IPv6 network, I imagine this will gradually become necessary as services begin to migrate from IPv4 to IPv6. BIND already handles IPv6 addresses within its ACLs, so I have published further scripts below that create a GeoIPv6.acl include file containing IPv6 net/mask entries, using the freely downloadable GeoIPv6 CSV file available from MaxMind.

It was a challenge to come up with a working solution using the same principles as the above scripts, but across a much larger address space: IPv6 uses a 128-bit address space, compared with only 32 bits for IPv4. The scripts above get away with using simple BASH utilities such as awk to do the necessary IP range splitting on 32-bit values but, as I found out, awk is unable to handle numbers in the realm of 64 bits and beyond. So I've had to bring various other Linux utilities into play to achieve this.
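The limitation is easy to demonstrate. awk implementations typically store every number as an IEEE 754 double, whose 53-bit significand cannot represent all larger integers exactly. In the Python sketch below, Python floats (also doubles) stand in for awk's numbers, and Python's arbitrary-precision int is shown for comparison.

```python
import ipaddress

# Python floats, like awk's numbers in typical implementations, are IEEE 754
# doubles with a 53-bit significand.
big = 2**53
print(float(big + 1) == float(big))  # True: the "+ 1" is rounded away

# An IPv6 address as an integer needs up to 128 bits ...
addr = int(ipaddress.IPv6Address("2001:db8::1"))
print(int(float(addr)) == addr)  # False: low-order bits are lost in a double

# ... while Python ints are arbitrary precision, so exact arithmetic still works.
print((addr + 1) - addr)  # 1
print(addr.bit_length())  # 126
```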

In order to handle numbers up to and beyond 64 bits in magnitude, one has to look at other programming languages and the libraries they offer. After evaluating currently available languages such as Python (which handles large numbers out of the box) and PHP (which can only handle large numbers with an additional library installed), I decided to go with Perl. On most standard installs, Perl has the bignum library available and ready to go. This library is transparent: as soon as it is included in a script, all number processing automatically uses it. It supports all the necessary operations, such as the bitwise AND that the above scripts make use of. However, when writing the Perl script below, I ran into an inconsistency with the log function whilst using the bignum library and, for anything above 64 bits, bignum also exhibited major rounding anomalies. To sidestep this curveball, I decided to bring the common Linux arbitrary precision calculator bc into play to take over both of these roles. Together, Perl and bc offer the accuracy and speed required to split decimal IP ranges with magnitudes of 64 bits and beyond.

So, here are the scripts. The first script is, as before, a standard BASH script (called GeoIPv6.sh). It is much the same as before but rather than piping the filtered grep lines to awk, it pipes them to a newly created Perl script instead. It also contains some further adjustments at the top to download the latest GeoIPv6 CSV file from MaxMind's servers, as well as an optional pipe of the Perl script output to sed to abbreviate IPv6 addresses to their "double-colon (::) notation" equivalent.
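The double-colon abbreviation corresponds to what Python's ipaddress module calls the compressed form, which makes it a handy cross-check for the output of the optional sed step. The abbreviate helper below is hypothetical and not part of the published scripts.

```python
import ipaddress

def abbreviate(entry: str) -> str:
    """Hypothetical helper: rewrite a fully written-out IPv6 net/mask entry
    in double-colon (::) notation."""
    return ipaddress.IPv6Network(entry).compressed

print(abbreviate("2001:0db8:0000:0000:0000:0000:0000:0000/32"))   # 2001:db8::/32
print(abbreviate("2001:0db8:0000:0000:0000:0000:0000:0001/128"))  # 2001:db8::1/128
```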

This Perl script effectively reads from standard input in precisely the same way as the original awk script does (expecting each line to be in the format of a CBE (Country,Begin,End) CSV file) but, unlike awk, can perform IP range splitting on 128 bit decimal numbers, printing IPv6 net/mask entries to standard output. Note the use of a dual pipe to the Linux arbitrary precision calculator bc to manage the logarithmic division calculation and also to accurately truncate values before they are passed to the printf function (done by a small for loop that places these entries into an array). Most importantly, note that we must increase the default scale of 20 within bc to at least 40 to be able to accurately cope with the logarithmic division calculation. Observe:

The reason we open a dual pipe to bc within Perl is to avoid forking a separate bc process each time we need to perform a division calculation (forking a new process is costly in terms of CPU time). By opening a dual pipe to a single persistent bc process, we can simply pass each calculation into it and read the result back out quickly. The IPC::Open2 Perl module is required for dual pipes, and this may need to be installed on your system.
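For readers who want to experiment, here is a minimal Python sketch of the job the Perl script performs: reading CBE lines from standard input and writing IPv6 net/mask entries. It is not the author's Perl or Python script; it relies on Python's native arbitrary-precision integers and bit operations rather than bc or mpmath, and the trailing semicolon on each output line is an assumption about how the BASH wrapper consumes the output.

```python
import ipaddress
import sys

WIDTH = 128

def range_to_cidrs(begin: int, end: int):
    """Yield (network_int, prefix_len) pairs covering [begin, end] exactly,
    using exact integer arithmetic (Python ints are arbitrary precision)."""
    while begin <= end:
        # Largest power of two that fits in the remaining range ...
        size = 1 << ((end - begin + 1).bit_length() - 1)
        # ... capped by begin's alignment (begin & -begin is its lowest set bit).
        if begin:
            size = min(size, begin & -begin)
        yield begin, WIDTH - (size.bit_length() - 1)
        begin += size

def main(stream=sys.stdin, out=sys.stdout) -> None:
    """Read Country,Begin,End lines and write one IPv6 net/mask entry per network."""
    for line in stream:
        line = line.strip()
        if not line:
            continue
        _country, begin, end = line.split(",")
        for net, prefix in range_to_cidrs(int(begin), int(end)):
            out.write(f"{ipaddress.IPv6Address(net)}/{prefix};\n")
```

Calling main() with standard input redirected from a CBE file writes the entries to standard output; the addresses come out already in double-colon notation because that is how ipaddress formats them.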

Once these two scripts have been created, it will be possible to run ./GeoIPv6.sh to generate the GeoIPv6.acl include file for BIND. Note that the execution time here will be far greater than before, since we are using Perl with bignum support, and passing division calculations to a separate persistent bc process. As such, the BASH script has been modified to output the current country code being processed to standard error to indicate progress. Once the script has completed execution, the GeoIPv6.acl include file will have been created in the current working directory, which looks something like this:

Faster Python Implementation of the above Perl Script

I have recently been learning Python at my current place of employment. After much procrastination, below is my Python implementation of the original Perl script above. When used within the BASH script, it reduces the execution time for generating the GeoIPv6.acl include file to about one quarter of that taken when the Perl script is used. This is mainly because it is self-contained: rather than making external calls to the common Linux arbitrary precision calculator bc (a separate process), it uses the mag function from the additional Python library mpmath to determine the magnitude of the IPv6 ranges it may have to split.

John 'Warthog9' Hawley, the chief administrator of www.kernel.org (a high-traffic site which implemented BIND GeoDNS on the 19th of September 2008 via patching), recently contacted me about this HOWTO with some interesting points concerning the implications of using this ACL method over BIND source code patching. I will briefly discuss this here, as it will affect which route you take when implementing GeoDNS within BIND.

In a nutshell, patching BIND for GeoDNS support results in a DNS server that can answer queries far faster than one using this ACL method (I have confirmed this; it is quite easy to test; see below). This is because the MaxMind binary database is a binary search tree data structure, walked one address bit at a time, so the worst-case number of lookups required to determine the country of an IPv4 address is 32 iterations (and usually far fewer). Similarly, for their IPv6 binary database, this number rises to 128 iterations. As you can imagine, patching the MaxMind GeoIP C library directly into BIND will result in a server that can process, look up and answer DNS queries in very few CPU cycles. As such, if your DNS servers are high-traffic servers, responding to many DNS requests per second, it would be advisable to go down the source code patching route.
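The lookup bound follows from walking the tree one address bit per level. The toy binary trie below is an illustration only (MaxMind's on-disk format is different, and the class names are hypothetical): it stores networks by their prefix bits and reports how many levels a lookup visited, which can never exceed 32 for IPv4.

```python
import ipaddress

class Node:
    __slots__ = ("children", "country")

    def __init__(self):
        self.children = [None, None]  # child for bit 0, child for bit 1
        self.country = None

class BinaryTrie:
    """Toy bit-wise trie: one level per address bit (IPv4 => at most 32)."""

    def __init__(self, width: int = 32):
        self.width = width
        self.root = Node()

    def insert(self, network: str, country: str) -> None:
        net = ipaddress.ip_network(network)
        addr = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):
            bit = (addr >> (self.width - 1 - i)) & 1
            if node.children[bit] is None:
                node.children[bit] = Node()
            node = node.children[bit]
        node.country = country

    def lookup(self, address: str):
        """Return (country, levels_visited) using longest-prefix match."""
        addr = int(ipaddress.ip_address(address))
        node, best, steps = self.root, self.root.country, 0
        for i in range(self.width):
            node = node.children[(addr >> (self.width - 1 - i)) & 1]
            if node is None:
                break
            steps += 1
            if node.country is not None:
                best = node.country
        return best, steps

trie = BinaryTrie()
trie.insert("10.0.0.0/8", "AA")
trie.insert("10.1.0.0/16", "ZZ")
print(trie.lookup("10.1.2.3"))  # ('ZZ', 16)
```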

Alternatively, if maintainability is more important to you, the ACL method described in this HOWTO is still a viable option, but at the cost of a substantial performance hit. According to John (who has been discussing this with Paul Vixie, the primary author and architect of BIND until release 8), the ACL feature was never designed to store and hold the number of ACL entries that the above scripts generate for GeoDNS purposes. This I can believe, as the scripts above (for IPv4) produce an ACL definition file containing over 200,000 ACL entries, which BIND has to load and then hold in memory once launched. I am not fully aware of the data structures BIND uses to store ACLs, but they are likely to be far less efficient for this purpose than the simple binary search tree that MaxMind offer with their binary GeoIP databases. For this reason, the ACL method described in this HOWTO will result in a far slower DNS server, depending on how many views you create and the ACLs assigned to them.

To give you an idea of just how much of a performance hit this ACL method induces, I have a small low-power server on my network running a CentaurHauls VIA Nehemiah CPU @ 1 GHz (2000 BogoMips) with a 192.168.0.0/16 IP address (see RFC 1918; all other hosts on my LAN are in this network so none of them would be a match in any of the above ACLs). When loading BIND with the GeoIP.acl include file, and creating a catch-all view that matches any client (not using any of the ACLs in the GeoIP.acl include file), the DNS response time tends to be about 2 ms. If, however, another view is created before this catch-all one in named.conf, and the clause:

is added to this view (forcing BIND to attempt a match against every single ACL definition inside the GeoIP.acl file), the response time soars to around 85 ms. In other words, the extra work we have asked BIND to do, to check whether any of the ACLs match a client with an IP address in 192.168.0.0/16, has slowed it down by a factor of about 40 (a rough estimate only), which is a substantial performance hit that needs to be considered. For this reason, if using the ACL method described in this HOWTO, try to limit the number of views you create and the number of ACLs assigned to them, as this will reduce the amount of work BIND has to do when answering DNS queries made to it.
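Why a client that matches nothing is the worst case can be illustrated with a naive first-match scan. This sketch is an assumption for illustration only: BIND's internal ACL data structures are not known here, so it merely models the general cost of checking entries one after another until a match is found. The function name and the sample ACLs are hypothetical.

```python
import ipaddress

def first_matching_acl(client: str, acls):
    """Check every network of every ACL in order, as a naive linear matcher would.
    Returns (acl_name or None, entries_checked)."""
    ip = ipaddress.ip_address(client)
    checked = 0
    for name, networks in acls:
        for net in networks:
            checked += 1
            if ip in net:
                return name, checked
    return None, checked  # no match: every entry was examined

acls = [
    ("AA", [ipaddress.ip_network("10.0.0.0/8"), ipaddress.ip_network("172.16.0.0/12")]),
    ("ZZ", [ipaddress.ip_network("1.0.0.0/24")]),
]
print(first_matching_acl("10.1.1.1", acls))      # ('AA', 1)
print(first_matching_acl("192.168.0.10", acls))  # (None, 3)
```

With over 200,000 entries instead of three, a non-matching RFC 1918 client would force every entry to be examined under this model.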

In short, you should determine if speed (source code patching) or maintainability (ACL include file) is of more importance to you and be fully aware of the pros and cons of each method of GeoDNS implementation within BIND. As a systems administrator, use your head to decide which method to go with. As www.kernel.org is a global site, ranked around 10,000 across all sites on the internet (according to Alexa), John has done the right thing and gone with the patching method when deploying BIND GeoDNS servers for Kernel.org.