At work I was given the task of implementing a basic device profiler service
that classifies incoming HTTP requests into a set of groups (desktop, tablet,
mobile, etc.) using the User-Agent header. Such a classification opens up a
multitude of new dimensions, on both the client side and the server side, for
interface and content customizations tailored to the device. For instance, you
can disable some of your fancy interactions (e.g., mouseover events) that do
not work properly on a touch screen. Or you might want to prioritize console
games for a client who uses a PlayStation to browse your shop.

I first investigated and, where possible, evaluated the existing solutions in
the wild, including the commercial ones, and eventually decided to go with
Apache DeviceMap, which employs OpenDDR in the background. In this blog post, I
summarize the experience I collected throughout this pursuit.

Parsing a User-Agent string

While Section 5.5.3 of the HTTP/1.1: Semantics and Content specification has a
lot to say about the formatting of the User-Agent header, one thing is certain:
there are no strict rules that force the header into a machine-readable format.
Consider the following examples:

So concluding which part denotes the product name, version, etc. is a fuzzy,
tedious, and error-prone task. That being said, there is another thing we can
do here! Every User-Agent is more or less unique to the device that the
software runs on. Hence, if we can come up with a database that maps User-Agent
strings to devices, we can use it to find the device behind a given User-Agent.
This is where the term Device Description Repository (DDR) kicks in:

The Device Description Repository is a concept proposed by the Mobile Web
Initiative Device Description Working Group (DDWG) of the World Wide Web
Consortium. The DDR is supported by a standard interface and an initial core
vocabulary of device properties. Implementations of the proposed repository
are expected to contain information about Web-enabled devices (particularly
mobile devices).
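
To make the lookup idea concrete, here is a minimal sketch of a repository that maps User-Agent fragments to device attributes. Everything in it is an assumption made for illustration: the `ToyDdr` class, its substring matching, the two sample devices, and the attribute names are hypothetical and do not reflect the W3C DDR interface or any real product, which rely on far larger databases and smarter matching.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Toy device description repository: User-Agent fragments mapped to device attributes. */
final class ToyDdr {

    // Insertion-ordered, so more specific fragments should be registered first.
    private final Map<String, Map<String, String>> attributesByFragment = new LinkedHashMap<>();

    /** Registers a device under a fragment expected to occur in its User-Agent header. */
    ToyDdr register(String userAgentFragment, Map<String, String> attributes) {
        attributesByFragment.put(userAgentFragment.toLowerCase(Locale.ROOT), attributes);
        return this;
    }

    /** Returns the attributes of the first registered fragment found in the header, if any. */
    Optional<Map<String, String>> lookup(String userAgent) {
        if (userAgent == null) {
            return Optional.empty();
        }
        String normalized = userAgent.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, Map<String, String>> entry : attributesByFragment.entrySet()) {
            if (normalized.contains(entry.getKey())) {
                return Optional.of(entry.getValue());
            }
        }
        return Optional.empty();
    }

    /** A tiny hand-written repository; entries and attribute names are made up for illustration. */
    static ToyDdr sample() {
        return new ToyDdr()
                .register("iPad", Map.of("vendor", "Apple", "model", "iPad", "is_tablet", "true"))
                .register("PlayStation 4",
                        Map.of("vendor", "Sony", "model", "PlayStation 4", "is_tablet", "false"));
    }

    public static void main(String[] args) {
        ToyDdr ddr = sample();
        // A header containing "iPad" resolves; an unknown client yields an empty Optional.
        System.out.println(ddr.lookup("Mozilla/5.0 (iPad; CPU OS 7_0 like Mac OS X) AppleWebKit/537.51.1"));
        System.out.println(ddr.lookup("curl/7.35.0"));
    }
}
```

The point of the sketch is the shape of the problem: the header is never parsed into product and version fields, it is only matched against known devices.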

The Rise of DDR

WURFL (Wireless Universal Resource FiLe) was the first community effort focused
on mobile device detection and dates back to 2007. While WURFL was initially
released under an “open source / public domain” license, in June 2011 the
project’s founders formed ScientiaMobile to provide commercial mobile device
detection support and services using WURFL. As of now, the ScientiaMobile WURFL
APIs are licensed under a dual-license model, using the AGPL license for
non-commercial use and a proprietary commercial license. In a world dominated
by capitalism, the current version of the WURFL database itself is no longer
open source. Inspired by WURFL and motivated by the gap in the market, it did
not take long for alternative companies to emerge, including, but not limited
to, DeviceAtlas, Handset Detection, and 51degrees.

So how far can one go using a DDR to detect the properties of a device by just
looking at the User-Agent header? Below is a sample output that I collected
from 51degrees:

As you can see, what they can get by just looking at your User-Agent header is
(to put it mildly) a lot!

Long live F/OSS!

As usual, the community’s response did not take long, and the most recent open
source version of WURFL (dating back to 2011) was forked as the OpenDDR
project. Since then, the community has kept the database up to date through the
efforts of individual contributors.

While the OpenDDR file format allows a hierarchical device representation as in
WURFL, it instead maps each device to a set of attributes explicitly. For
comparison, see how WURFL takes advantage of its hierarchical device
representation:
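
The difference between the two representations can be sketched in a few lines of Java. In the hierarchical style, a device lists only the attributes that differ from its parent and inherits the rest (WURFL entries name their parent through a `fall_back` attribute). Flattening that chain yields the explicit per-device attribute set. The `DeviceHierarchy` class, the device ids, and the attribute names below are hypothetical and only illustrate the mechanism; they are not taken from either database.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Toy model of a device hierarchy where children inherit attributes from a fallback device. */
final class DeviceHierarchy {

    private static final class Device {
        final String fallbackId;              // null marks the root of the hierarchy
        final Map<String, String> attributes; // only what differs from the fallback

        Device(String fallbackId, Map<String, String> attributes) {
            this.fallbackId = fallbackId;
            this.attributes = attributes;
        }
    }

    private final Map<String, Device> devices = new HashMap<>();

    DeviceHierarchy add(String id, String fallbackId, Map<String, String> attributes) {
        devices.put(id, new Device(fallbackId, attributes));
        return this;
    }

    /** Walks the fallback chain and returns the explicit (flattened) attribute set. */
    Map<String, String> flatten(String deviceId) {
        Map<String, String> flat = new TreeMap<>();
        Device device = devices.get(deviceId);
        // A chain can never be longer than the number of devices; the bound guards against cycles.
        for (int hops = 0; device != null && hops < devices.size(); hops++) {
            device.attributes.forEach(flat::putIfAbsent); // values closer to the device win
            device = device.fallbackId == null ? null : devices.get(device.fallbackId);
        }
        return flat;
    }

    /** Resolves a single attribute, inheriting it from an ancestor when necessary. */
    Optional<String> resolve(String deviceId, String attribute) {
        return Optional.ofNullable(flatten(deviceId).get(attribute));
    }

    /** Hypothetical three-level hierarchy used for illustration. */
    static DeviceHierarchy sample() {
        return new DeviceHierarchy()
                .add("generic", null, Map.of("is_tablet", "false", "is_wireless_device", "false"))
                .add("generic_android", "generic",
                        Map.of("device_os", "Android", "is_wireless_device", "true"))
                .add("acme_tab_10", "generic_android", Map.of("model", "Tab 10", "is_tablet", "true"));
    }

    public static void main(String[] args) {
        System.out.println(sample().flatten("acme_tab_10"));
    }
}
```

The hierarchical form keeps entries short, whereas the flattened form makes each entry self-contained and readable without following any chain.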

Apache DeviceMap

While a DDR exposes an almost exhaustive set of device vendors, models, and
attributes, it does not provide a search mechanism for this swamp. Apache
DeviceMap (which is graduating from incubation as of this writing) is a project
that fills this gap. DeviceMap basically ships two fundamental Maven artifacts:
an OpenDDR clone (devicemap-data) and a driver (devicemap-client). The driver
is available for the Visual Basic, C#, and Java programming languages.

Usage of Apache DeviceMap is pretty straightforward. You first include the
necessary set of dependencies in your POM file:

As you can see, the output is not as detailed as the one we got from 51degrees.
Almost all of the crucial data is missing and there are some mistakes in
certain entries like display width and height. That being said, it got three
important bits right: is_desktop, is_bot, and is_tablet. Note that there is not
much of a mechanism to verify the correctness of the attributes returned by the
employed engine. That is, the nature of the problem also implies the absence of
a verification mechanism.
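
Those three attributes are already enough to drive a coarse device grouping. The sketch below assumes the resolved attributes arrive as a plain `Map<String, String>` with `"true"`/`"false"` values. The `DeviceGroupClassifier` class, the group names, the precedence order, and the decision to treat any other resolved device as mobile are my assumptions for illustration, not part of DeviceMap.

```java
import java.util.Map;

/** Maps a resolved attribute set to a coarse device group. */
final class DeviceGroupClassifier {

    enum DeviceGroup { BOT, DESKTOP, TABLET, MOBILE, UNKNOWN }

    /**
     * Bots are checked first, since a crawler may also carry desktop-like attributes.
     * Assumption: a resolved device that is neither bot, desktop, nor tablet is mobile.
     */
    static DeviceGroup classify(Map<String, String> attributes) {
        if (attributes == null || attributes.isEmpty()) {
            return DeviceGroup.UNKNOWN; // the User-Agent could not be resolved
        }
        if (isTrue(attributes, "is_bot")) {
            return DeviceGroup.BOT;
        }
        if (isTrue(attributes, "is_desktop")) {
            return DeviceGroup.DESKTOP;
        }
        if (isTrue(attributes, "is_tablet")) {
            return DeviceGroup.TABLET;
        }
        return DeviceGroup.MOBILE;
    }

    // Boolean.parseBoolean treats a missing (null) value as false.
    private static boolean isTrue(Map<String, String> attributes, String name) {
        return Boolean.parseBoolean(attributes.get(name));
    }

    public static void main(String[] args) {
        System.out.println(classify(Map.of("is_desktop", "false", "is_bot", "false", "is_tablet", "true")));
    }
}
```

Keeping the grouping in one small function makes it easy to adjust the precedence or add groups once more attributes turn out to be reliable.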

The Grand Decision

If you have had a chance to check out the websites of the commercial DDR and
User-Agent detection solutions, you will have noticed the giant IT leaders
(Google, Facebook, PayPal, etc.) in their customer lists. Hence, it was a
little bit tempting to go with a commercial solution. That being said, I also
wanted to evaluate the performance of Apache DeviceMap. For that purpose, I
collected a couple of months' worth of visitor data at work and tried to
resolve the User-Agent headers. The results were very promising: Apache
DeviceMap managed to resolve almost 90% of all the collected User-Agent
strings. That being said, resolving a User-Agent (that is, matching the given
User-Agent against the DDR and returning a set of attributes) does not imply
the correctness of the returned attributes. To check correctness as well, I
repeated the same experiment with almost two dozen different devices at work,
and it succeeded for almost every one of them.
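
The coverage measurement itself is simple to reproduce. The sketch below computes the share of collected User-Agent strings that a resolver can match and lists the distinct unresolved ones. The `CoverageReport` class and the `stubResolver` used in `main` are hypothetical: the resolver is passed in as a plain function, and in practice it would wrap whichever detection engine is under evaluation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Measures how many collected User-Agent strings a resolver can match. */
final class CoverageReport {

    /** Fraction (0.0 to 1.0) of headers for which the resolver returned attributes. */
    static double coverage(List<String> userAgents,
                           Function<String, Optional<Map<String, String>>> resolver) {
        if (userAgents.isEmpty()) {
            return 0.0;
        }
        long resolved = userAgents.stream()
                .filter(userAgent -> resolver.apply(userAgent).isPresent())
                .count();
        return (double) resolved / userAgents.size();
    }

    /** Distinct headers the resolver could not match, in first-seen order. */
    static List<String> unresolved(List<String> userAgents,
                                   Function<String, Optional<Map<String, String>>> resolver) {
        return userAgents.stream()
                .distinct()
                .filter(userAgent -> !resolver.apply(userAgent).isPresent())
                .collect(Collectors.toList());
    }

    /** Stand-in resolver for the demo: pretends only Mozilla-style headers are known. */
    static Optional<Map<String, String>> stubResolver(String userAgent) {
        if (userAgent.contains("Mozilla")) {
            return Optional.of(Map.of("is_desktop", "true"));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> collected = Arrays.asList(
                "Mozilla/5.0 (X11; Linux x86_64)",
                "Mozilla/5.0 (iPad; CPU OS 7_0 like Mac OS X)",
                "curl/7.35.0",
                "Mozilla/5.0 (Windows NT 6.1)");
        System.out.println(coverage(collected, CoverageReport::stubResolver));
        System.out.println(unresolved(collected, CoverageReport::stubResolver));
    }
}
```

Note that this only measures whether a header was matched at all; as stated above, it says nothing about whether the returned attributes are correct.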

Since the project that I am working on is still in its early stages and the
initial results are more than satisfactory, we decided to go with Apache
DeviceMap. Further, we managed to increase its coverage up to 99% by manually
adding some entries to the database. Indeed, we reported a majority of those
enhancements back to the Apache DeviceMap project.