For some years now at APNIC Labs we've been conducting a measurement exercise intended to measure the extent to which IPv6 is being deployed in the Internet. This is not a measurement of IPv6 traffic volumes, nor of IPv6 routes, nor of IPv6-capable servers. This is a measurement of the IPv6 capabilities of devices connected to the Internet, and is intended to answer the question: what proportion of devices on the Internet are capable of supporting an IPv6 connection?

We've often been asked about our measurement methodology, and this article is intended to describe in some detail how we perform this measurement.

General Approach

Using Flash and JavaScript, clients’ web browsers are inducted into a measurement of their capabilities to use IPv6, based on the scripted fetch of a set of ‘invisible’ 1x1 pixel images. Each test is intended to isolate a particular capability of the client, insofar as a successful fetch of the object associated with a test marks the client as capable in that aspect.

We have a set of five basic properties that are available in the test: IPv4-only, IPv6-only, Dual Stack, Dual Stack with unresponsive IPv6 and Dual Stack with unresponsive IPv4.

We are interested in the behaviour of the DNS transport as well as in the behaviour of the HTTP transport.

The URL of each of the images is constructed using labels that describe the object's transport properties in DNS resolution and in the HTTP transport for the web fetch. For example, a URL of http://xxx.r6.td.labs.apnic.net/1x1.png describes a web object that is only accessible using IPv6 (‘r6’), where the domain name itself is served by authoritative name servers that can respond to DNS resolver queries made over both IPv4 and IPv6 (‘td’), while a URL of http://xxx.rd.t6.labs.apnic.net/1x1.png describes a dual stack web object (‘rd’) whose domain name is served by authoritative name servers that are reachable only over IPv6 (‘t6’). The complete name structure of the various tests is provided in the following table:
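This label convention could be sketched in JavaScript along the following lines; the function and parameter names here are illustrative, not the code used in the experiment.

```javascript
// Illustrative sketch: assemble a test URL from the HTTP ('r') and
// DNS ('t') behaviour labels described in the text.
function buildTestUrl(httpLabel, dnsLabel, uniqueId) {
  // httpLabel: one of '4', '6', 'd', 'x', 'z' (HTTP transport)
  // dnsLabel:  one of '4', '6', 'd', 'x', 'z' (DNS transport)
  return 'http://' + uniqueId +
         '.r' + httpLabel +
         '.t' + dnsLabel +
         '.labs.apnic.net/1x1.png';
}
```

For example, buildTestUrl('6', 'd', 'xxx') reproduces the first URL quoted above.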

Behaviour                        Value
---------                        -----
DNS transport (label prefix)     t
HTTP transport (label prefix)    r
IPv4-only                        4
IPv6-only                        6
Dual Stack                       d
Dual Stack, unresponsive IPv6    x
Dual Stack, unresponsive IPv4    z

In order to ensure that each client is forced to perform both the DNS lookups and the web object fetches from the experiment’s servers, and not use locally cached values, we make use of dynamic name generation and wildcard DNS capabilities to generate a unique string as part of the object's name. This unique string is used in the DNS part of the URL, and is also used as an argument to the resource name part of the URL. Each client is served with a unique name value, and all the tests presented to the client share the same name value, so that at the server end we can match the operations that were performed in the context of each test instance. As these domain name components map to a wildcard in the DNS zone, this does not increase the complexity or time taken to perform DNS resolution. The components of this unique string value include the time of day (seconds since 1 January 1970 00:00 UTC), a random number, and experiment version information.
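The unique-label construction might be sketched as follows. The field prefixes (u, s, v) mirror those visible in the example URL later in this article, but the exact format here is an assumption.

```javascript
// Hedged sketch: build a unique per-client label from the time of day,
// a random number, and experiment version information.
function makeUniqueLabel(version) {
  var seconds = Math.floor(Date.now() / 1000);  // seconds since the Unix epoch
  var rand = Math.floor(Math.random() * 1e10);  // random component
  return 'u' + rand + '.s' + seconds + '.v' + version;
}
```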

The time taken by the client to fetch each URL is recorded by the client-side script.

The set of URLs concludes with a “result” URL. This URL is triggered either when all the other URLs have been loaded, or when a local timer expires. The fetch of this “result” URL includes, as arguments to the GET command, the results (and individual timer values) of the fetch operations of all the other URLs. The default value of this timer for result generation is 10 seconds.
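The packing of per-test timer values into the GET arguments of the “result” fetch might look like the following sketch; the function name, argument encoding and results hostname format are assumptions for illustration.

```javascript
// Hedged sketch: encode each test's elapsed time (or 'null' if the test
// did not complete before the timer expired) as GET arguments on the
// final "result" URL.
function buildResultsUrl(uniqueId, timings) {
  // timings: map of test label -> elapsed ms, or null if not completed
  var args = Object.keys(timings).map(function (k) {
    var v = timings[k] === null ? 'null' : timings[k];
    return k + '=' + v;
  }).join('&');
  return 'http://' + uniqueId + '.results.g.labs.apnic.net/1x1.png?' + args;
}
```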

As well as getting the client to perform self-timing of the experiment, we also direct all traffic associated with the experiment (the authoritative DNS name servers that will receive the DNS queries and the web servers that will receive the HTTP fetches) to a server that is logging all traffic. We perform logging at the DNS, HTTP and packet level. These logs provide server-side information on the nature of each client's capabilities, such as the client-resolver relationship, apparent RTT in DNS and web fetch, IPv4 and IPv6 capability, MTU, and TCP connection failure rates.

Client-side Code

We use two forms of encoding of this method: Flash and JavaScript. They are used in different experiment contexts.

Flash permits embedding of the measurement in advertising channels that use Flash media for image ads. This channel delivers large volumes of unique clients, and campaigns can be targeted by keyword or economy, or can exclude specific IP ranges.

There are a number of weaknesses in Flash, most notably that Flash code is not loaded on some popular mobile platforms, including Apple’s mobile platforms. It has also been observed that the Flash engine does not appear to perform consistent client-side timer measurements, probably due to a more complex internal object scheduler within the Flash engine. We have also observed that the Flash engine does not preserve fetch order, so that the order of objects to fetch generated by the Flash ActionScript is not necessarily the order in which the Flash engine will perform the fetches. The most common permutation is that the Flash engine reverses the object fetch order as it retrieves the set of objects.

JavaScript permits embedding of the measurement in specific host websites. There are two variants of this script. One is where the JavaScript is directly inserted into the host web page, and the other form is as a user-defined code extension to Google's Analytics code. In the latter case the web administrator can use the Analytics reports to view the IPv6 capabilities of the site's visitors in addition to the other Analytics reports. The website does not itself have to be IPv6 enabled: the tests cause the client to interact with our experiment servers, and the IPv6 capability is measured between the client and these servers. In this case there is no control over who performs the test: the test is performed by all end clients who visit the site where the JavaScript is embedded.

JavaScript appears to be more widely supported than Flash. However, because JavaScript uses code embedded in web sites, the number and diversity of clients being tested in this manner depends on the visitor profile of the hosting website. Many web sites have a large volume of repeat clients, so the tested client population of the JavaScript test appears to record a particular profile of capability (for example, we have observed an anomalously high proportion of IPv6 capability in the clients who use APNIC's whois web service). The JavaScript code can also be configured via cookies not to re-sample a particular client within a certain period (the cookie has a default retry value of 24 hours), in order to counter, to some extent, measurement bias generated by repeat visitors to the site.
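The cookie-based re-sample suppression might be sketched as follows; the cookie name and timestamp format here are assumptions, not the deployed code.

```javascript
// Hedged sketch: run the test only if no recent-enough timestamp cookie
// is present. cookieString is document.cookie (or '' in testing).
function shouldRunTest(cookieString, nowSeconds, retrySeconds) {
  var m = /(?:^|;\s*)labs_last_run=(\d+)/.exec(cookieString || '');
  if (!m) return true;  // client has never been sampled
  // re-sample only once the retry interval (default 86400s) has elapsed
  return nowSeconds - parseInt(m[1], 10) >= retrySeconds;
}
```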

The original versions of the test code explicitly enumerated the individual URL tests to be executed. There is a more recent variant of both the Flash and JavaScript code that includes a runtime configuration server. In this variant of the test code, the client will initially perform a fetch from a configuration server. This server will return the set of URLs to be used for the test. This allows the parameters of the test to be varied on the fly without having to reload the JavaScript that was embedded in the web page, or re-submit the ad with the embedded Flash script.
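The client side of this configuration exchange might be sketched as follows; the one-URL-per-line response format and the function name are assumptions, not the deployed protocol.

```javascript
// Hedged sketch: parse a configuration-server response into the list of
// test URLs to fetch, assuming one URL per line.
function parseConfigResponse(text) {
  return text.split('\n')
             .map(function (s) { return s.trim(); })
             .filter(function (s) { return s.length > 0; });
}
```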

Server Configuration

We use three servers for this experiment. One is located in Australia, one in Germany and one in the United States. One server is a Linux-based host, while the other two use FreeBSD as their host OS.

The servers use Apache for the web server, BIND for the DNS server, and tcpdump for the packet capture. They are also configured with a local Teredo server and a local 6to4 relay.

Where possible, we use 16 different addresses in both IPv4 and IPv6.

When running these tests on a highly visited web page, or using a high volume ad campaign, we have noted that there can be relatively large peak demands for web fetches on our web servers. The experiment’s webservers need sufficient capacity to handle hundreds of queries per second, which means using a system configuration that has thousands of pre-forked http daemons and kernel configuration support for thousands of open/active TCP sessions. This also requires servers with large memory configuration. Sufficient disk is required to ensure tcpdump and server logs can be held for a continuous cycle of experiments.

Post processing is currently performed in a central log archive, to integrate all sources of experiment data into a collated experiment log, which is then post processed on a daily basis.

DNS configuration

The DNS part of the experiment configuration depends on the ‘wildcard’ DNS record. All zones which serve terminal fully qualified domain names have a wildcard record which maps any name under that domain to the IPv4 or IPv6 address for the head server.

For the current experiments in both DNS and IPv6 capability, four distinct subdomains are registered under a single parent prefix.

The master server generates an experiment set for the client based on a basic geo-location mapping of the client's address to a geographic region. This is done as a rudimentary load balancing exercise, and, more importantly, to minimize the round trip time between the server and the client, and thereby avoid, to some extent, retransmits and timeouts at the client side while performing the experiment. The mapping of address to region is intentionally quite coarse, and some traffic inevitably goes to a distant head server, but this does not appear to have had a significant impact on the measurement outcomes.

The parent domains are provisioned with identical sub-domains, which characterize DNS transport by the listed NS delegations. A domain which is delegated only to IPv6-reachable name servers cannot be successfully resolved by a DNS resolver which does not have access to IPv6 transport; consequently, such a client is never told the experiment server's IP address. A domain whose name servers are dual-stacked may be resolved over either IPv4 or IPv6 DNS transport.

As shown in the DNS zone file above, f.labs.apnic.net has 5 subdomains:

t4.f.labs.apnic.net DNS NS is on IPv4 only
t6.f.labs.apnic.net DNS NS is on IPv6 only
td.f.labs.apnic.net DNS NS is dual-stacked
tx.f.labs.apnic.net DNS NS is dual-stacked, but IPv6 is unreachable
tz.f.labs.apnic.net DNS NS is dual-stacked, but IPv4 is unreachable
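As a hedged illustration of how these delegations might look in zone-file form (the addresses below are drawn from the documentation ranges, and the server names are assumptions, not the production zone data):

```
; Illustrative sketch only -- not the production zone file.
; $ORIGIN f.labs.apnic.net.
t4        NS    ns1.t4.f.labs.apnic.net.
ns1.t4    A     203.0.113.10        ; IPv4-only name server
t6        NS    ns1.t6.f.labs.apnic.net.
ns1.t6    AAAA  2001:db8::10        ; IPv6-only name server
td        NS    ns1.td.f.labs.apnic.net.
ns1.td    A     203.0.113.11        ; dual-stack name server
ns1.td    AAAA  2001:db8::11
```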

Separately, results.g.labs.apnic.net and *.results.g.labs.apnic.net (a wildcard) are defined as an IPv4 A record.

Within each subdomain (t4, t6, td, tx and tz) a further family of subdomains is defined. For example, the following is the zone file for t4.f.labs.apnic.net:

Therefore from this delegation chain, an experiment configuration server can request a client to fetch an experiment such as:
http://t10000.u8738132781.s1367808039.i333.v6024.r4.t4.f.labs.apnic.net/1x1.png

The t10000.u8738132781.s1367808039.i333.v6024 part is all matched by the wildcard, under the r4.t4.f.labs.apnic.net domain.

With 5 t* subdomains and 5 r* subdomains a total of 25 domains have to be populated, each slightly different, respecting the NS and A/AAAA combinations which have to apply to that experiment.

We operate the experiment’s servers with 11 discrete BIND processes, each listening to a different IPv4 and IPv6 address. One server is used for the parent domains. One server is used for t4, one for t6, one for td, one for tx and one for tz. One server is used for all subdomains that include the r4 forms (r4.t4, r4.t6, r4.td, r4.tx, r4.tz). One is used for all r6 forms, one for rd, one for rx and one for rz. This separation of parent and child in the DNS servers ensures the integrity of the IP behaviours in the DNS, as within this structure of authoritative server separation, the authoritative name server for the parent is unable to answer questions that can be resolved by the authoritative name server for the child.

Web server and Client code

The Apache webserver needs to be configured to accept all local IP bindings and use ‘virtual server’ configuration to service them. In the simple configuration model we use the ability of Apache httpd 2.2 to define a default virtual server, which captures all otherwise undefined instances. This framework is suitable to be the handler for all incoming 1x1.png requests.
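A minimal sketch of such a catch-all virtual server in Apache 2.2 might look like the following; the server name and document root are illustrative, not the production configuration.

```
# Hedged sketch of an Apache 2.2 default virtual host.
NameVirtualHost *:80
<VirtualHost _default_:80>
    ServerName default.labs.apnic.net
    DocumentRoot /var/www/experiment
    # Every otherwise-unmatched Host: header lands here,
    # so a single handler can serve all 1x1.png requests.
</VirtualHost>
```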

.HTACCESS file configuration

For .htaccess, we define a few extra behaviours to force the expiry of served images to be in the past, and to ensure we can serve compressed data (this significantly reduces the load time of the .js).
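One way this could be expressed (a sketch assuming mod_expires, mod_headers and mod_deflate are loaded; the exact directive set in the production .htaccess may differ):

```
# Illustrative sketch: mark served images as already expired ...
ExpiresActive On
ExpiresByType image/png "access"
Header set Cache-Control "no-cache, no-store, must-revalidate"

# ... and compress text responses, including the JavaScript test code.
AddOutputFilterByType DEFLATE text/html application/javascript
```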

Log configuration

To ensure that the logfile includes the specific virtual server called by the client, the %{Host}i field captures the Host: HTTP header value from the client's initial query.

The 1x1.png is served with a query-string list of arguments which also records the specific runtime fetch, captured in another log field. The combination of these ensures that within the logfile we can correlate the specific experiment, and its returned results, with the DNS and tcpdump logs.
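A log format along these lines would capture both fields (a sketch; the format name and field ordering are illustrative, not the production configuration):

```
# Hedged sketch: combined-style log plus the Host: header. The %r field
# records the full request line, including the 1x1.png query string.
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Host}i\"" experiment
CustomLog /var/log/apache/experiment.log experiment
```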

Runtime Configuration

The runtime configuration is achieved by a CGI handler, which is called by the flash logic, or embedded in the <script>….</script> for the JavaScript instance.

This represents a URL to use for each of four tests, and a URL to use to pass the results of the four tests back to the server.

The head-end has used the Apache REMOTE_ADDR environment variable to select which head-end server to use, based on the parent /8 block. This provides a crude granularity to the responsible RIR, mapping ARIN and LACNIC to node G, AFRINIC and RIPE to node H, and APNIC to node F.

The AdvertID= variable permits the head-end CGI handler to select different experiment criteria, so we can use the same control logic to run DNSSEC and IPv6 experiments, or vary the experiment behaviour slightly for a subset of clients.

This call is embedded directly in the flash experiment (see code in appendix).

JavaScript

Two forms of JavaScript are used. One is a pre-defined .js file which can be included, along with configuration, in a website, and then subsequently hand-tuned by the website manager to perform variants of the experiment and send results to Google Analytics.

This shows a small embedded instance of the head-end configuration, which sends a download of the complete .js to the client, which then executes this JavaScript. This variant of the code uses the head-end server to ‘minimize’ the JavaScript to the specific test set required for this experiment, and is therefore not a fully general case.

The generalized JavaScript code, and this specific code, is included as an example in the appendix.

Appendix 1: Flash Code

The Flash code has been created using Haxe as a development language, which the Haxe compiler (itself written in ML) converts into ActionScript compatible with Flash version 8.

Appendix 2: Reduced JavaScript

This version has a slightly different configuration and runtime invocation model, because it is embedded as a function call directly in the markup rather than as a text/javascript reference to a specific .js.

The head-end configuration engine for this is also running other experiments, using a file-backed mechanism which specifies the URLs for experiments from back-end configuration master files.

Appendix 3: Collating the Data

Data about each experiment has three sources of capture:

TCPDUMP of port 53 (DNS) and port 80 (web) to the authoritative DNS server and webserver, which are co-resident on the same host, even if using different IP addresses. Additional captures can be made of Teredo and 6to4 tunnel bindings, to detect tunnel-specific behaviours.

DNS query logs

Web query logs.

All three sources should contain events which relate to a specific u*.s* instance, unique to one user.
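The collation step might be sketched as follows: group records from the three sources by the unique experiment key. The record shape assumed here (each record carrying a pre-extracted key and a source tag) is illustrative, not the production format.

```javascript
// Hedged sketch: collate DNS, web and pcap records by the u*.s* key
// that is unique to one user.
function collateByExperiment(records) {
  // records: array of { key: 'u123.s456', source: 'dns'|'web'|'pcap', ... }
  var byKey = {};
  records.forEach(function (r) {
    (byKey[r.key] = byKey[r.key] || []).push(r);
  });
  return byKey;
}
```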

The query logs in DNS provide the relationship of this experiment-id to a specific resolver, or set of resolvers conducting the DNS queries. The query logs additionally identify DNS flags (EDNS0, DNSSEC OK, Checking Disabled…) and transport (UDP or TCP, IPv4 or IPv6).

The query logs in the web provide the client's IPv4 and IPv6 addresses for the experiments, and permit the specific IPv4 and IPv6 pair to be related to each other. The *.results. log lines record, for each experiment ID, the client's own measurement of the completion times for the experiments (or ‘null’ if not completed), which provides detail on the client-side view of experiment behaviour.

Disclaimer

The views expressed are the authors' and not those of APNIC, unless APNIC is specifically identified as the author of the communication. APNIC will not be legally responsible in contract, tort or otherwise for any statement made in this publication.