Category: tech

In 2002, the European Community introduced Directive 2002/58/EC, commonly known as the Directive on privacy and electronic communications. Among other provisions, it contains article 5(3), which has earned it the nickname “the cookie directive”: the subarticle states that information may be stored in or retrieved from end-user computers only if the user is made aware of this and is given the opportunity to refuse the storage or retrieval. In 2009 the directive was amended (2009/136/EC) so that storage or retrieval is only permitted if the user has given his or her consent.

The full text of the amended subarticle is as follows:

3. Member States shall ensure that the storing of information, or the gaining of access to information already stored, in the terminal equipment of a subscriber or user is only allowed on condition that the subscriber or user concerned has given his or her consent, having been provided with clear and comprehensive information, in accordance with Directive 95/46/EC, inter alia, about the purposes of the processing. This shall not prevent any technical storage or access for the sole purpose of carrying out the transmission of a communication over an electronic communications network, or as strictly necessary in order for the provider of an information society service explicitly requested by the subscriber or user to provide the service.

This regulation is generally understood to apply to the HTTP State Management Mechanism (RFC 6265, earlier RFC 2965 and RFC 2109), most commonly known as “cookies”. In fact, preamble 25 of 2002/58/EC and preamble 66 of 2009/136/EC explicitly mention cookies as one example of such mechanisms. National regulations, and guidelines in particular, have focused on this one mechanism for storing and accessing information on end-user computers over a network.

But the directive text can clearly apply to mechanisms other than HTTP cookies. Mechanisms that permit similar storage and retrieval of information include the Local Shared Object mechanism found in Flash, the userData functionality in Internet Explorer and, more recently, a variety of mechanisms being defined and implemented under the HTML5 umbrella.

Two questions are therefore interesting:

Are there mechanisms in HTML5 that allow user tracking (including by third parties) in a way that is not subject to the consent requirement?

Are there mechanisms in HTML5 that raise no privacy concerns, yet are subject to the consent requirement?

The first question is the most sensitive, and the hardest to answer. But consider a JavaScript file that is served by a third-party ad network and included by a number of unrelated content sites. Suppose such a script does the following:

1. Generates a local GUID on the client (i.e. an identifier that the ad network did not choose)

2. Stores this GUID in the browser's local storage

3. Sends this GUID back to the ad network using a background XMLHttpRequest (presumably together with some other information, such as the URL of the page embedding the script)

(Steps 1-2 are skipped if the GUID is already present in local storage)
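The steps above can be sketched roughly as follows. This is a minimal illustration, not code from any real ad network: the key name `tracker-guid`, the `KeyValueStore` interface, the `report` callback and the example URLs are all assumptions made for the sketch. In a browser, the store would be `window.localStorage` and `report` would wrap a background XMLHttpRequest; here both are replaced by in-memory stand-ins so the logic can be run anywhere.

```typescript
// Minimal key-value interface modeled on the browser's localStorage
// (assumption: a real script would use window.localStorage directly).
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical transport; in a browser this could be a background XMLHttpRequest.
type Report = (guid: string, pageUrl: string) => void;

const STORAGE_KEY = "tracker-guid"; // hypothetical key name

// Generates a random identifier on the client, so the ad network never chooses it.
function generateGuid(): string {
  return "xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx".replace(/[xy]/g, (c) => {
    const r = Math.floor(Math.random() * 16);
    const v = c === "x" ? r : (r & 0x3) | 0x8;
    return v.toString(16);
  });
}

// Generate and store a GUID unless one is already present, then report it.
function track(store: KeyValueStore, report: Report, pageUrl: string): string {
  let guid = store.getItem(STORAGE_KEY);
  if (guid === null) {
    guid = generateGuid();
    store.setItem(STORAGE_KEY, guid);
  }
  report(guid, pageUrl);
  return guid;
}

// In-memory stand-in for local storage, for illustration only.
class MemoryStore implements KeyValueStore {
  private data: { [key: string]: string } = {};
  getItem(key: string): string | null {
    return Object.prototype.hasOwnProperty.call(this.data, key) ? this.data[key] : null;
  }
  setItem(key: string, value: string): void {
    this.data[key] = value;
  }
}

// Two page views on unrelated (hypothetical) sites sharing the same store.
const sharedStore = new MemoryStore();
const reported: string[] = [];
const collect: Report = (guid, pageUrl) => {
  reported.push(guid + " " + pageUrl);
};
const firstVisit = track(sharedStore, collect, "https://site-a.example/article");
const secondVisit = track(sharedStore, collect, "https://site-b.example/news");
```

Both calls report the same identifier, which is what makes cross-site tracking possible. Note that the sketch assumes the same storage is visible on both pages, for instance because the script runs in a frame served from the ad network's own origin; browser local storage is otherwise separated per origin.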

Such a script has the same ability as a third-party cookie to track a user’s movement across sites, and to assign a user (or rather his or her computer) a permanent identifier. But does it require consent according to article 5(3)? One way to argue that it does not is to take note of preamble 66: “Third parties may wish to store information on the equipment of a user, or gain access to information already stored, for a number of purposes”. It may be argued that in steps 1-2 it is not the third party (the ad network) that stores the information (indeed, the ad network does not know what information is stored). If the third party has not stored the information, then the gaining of information in step 3 might not be subject to the rule either, since the wording seems to require that information gained by a third party must have been previously stored by the same party. (If there is no requirement that the information gained must have been stored by the same party, one must note that every third party whose resources are included by a web page automatically gains access to a lot of information, such as the User-Agent string, and ask whether that information gaining is subject to the directive as well.)

I will concede that this argument is not strong, as it rests on the assumption that steps 1-2 do not constitute information storage by the third party, even though the third party is responsible for sending the JavaScript code that ultimately results in information being stored. The approach seems functionally equivalent to traditional HTTP cookie-based storage of information. The difference is that with this method, the third party does not specify what information should be stored. Could this not be significant?

The second question seems easier to answer. Consider offline web applications. These are web pages that contain a reference to all resources (HTML, JavaScript, CSS) they require in order to work. A browser supporting offline applications will download all these resources so that the application works even if there’s no internet connection. Note that if the browser does not support offline apps, they still work; they just require you to be online. A simple example containing a version of the Halma game is described by Mark Pilgrim.
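In the HTML5 application cache, the “reference to all resources” takes the form of a plain-text manifest file that starts with the line `CACHE MANIFEST` and lists the files to store. As a small sketch of what such a file contains, the hypothetical helper below builds the manifest text; the function name, the file names and the version comment are assumptions for illustration and are not taken from Mark Pilgrim's example.

```typescript
// Builds the text of an HTML5 application cache manifest.
// The resource names passed in below are hypothetical.
function buildCacheManifest(resources: string[], version: string): string {
  const lines = [
    "CACHE MANIFEST",        // required first line of every manifest
    "# version " + version,  // comment; changing it signals that the files changed
    "CACHE:",                // section listing resources to store locally
  ].concat(resources);
  return lines.join("\n") + "\n";
}

const manifest = buildCacheManifest(["halma.html", "halma.js", "halma.css"], "1");
```

Every file named in the manifest ends up stored on the user's machine, which is the storage discussed next.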

This mechanism causes information to be stored on the end user's computer. This storage is not strictly necessary in order to provide the service (remember, the app works without the mechanism if the user is online; offline support is just a nice-to-have). No information is ever accessed by the provider of the game, but access is not a requirement of the directive: storing of information is enough. Thus, consent is needed. And yet there are no privacy concerns (no personally identifiable information is ever retrieved).

The aim of article 5(3) was to regulate certain usages of cookies perceived to be illegitimate. But it was written to be technology neutral, as new techniques similar to HTTP cookies were sure to be created after the directive (the diabolical evercookie uses 12 additional mechanisms, including a brilliantly twisted way of storing information in the user's browser history of visited URLs). The problem is that such mechanisms are only similar, not identical. This makes writing technology-neutral legislation really difficult.
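The reasoning in this paragraph can be written down as a small decision function. This is only a sketch of the reading of article 5(3) used in this post, not an authoritative statement of the law; the type and field names are assumptions made for the illustration.

```typescript
// Simplified model of how this post reads article 5(3).
// All names are hypothetical.
interface StorageUse {
  storesOrAccessesTerminalData: boolean;         // stores, or reads, data on the user's equipment
  solelyForTransmission: boolean;                // first exemption in article 5(3)
  strictlyNecessaryForRequestedService: boolean; // second exemption in article 5(3)
}

function consentRequired(use: StorageUse): boolean {
  // Outside the scope of the rule: nothing is stored or accessed.
  if (!use.storesOrAccessesTerminalData) return false;
  // Inside the scope: consent is needed unless one of the exemptions applies.
  return !(use.solelyForTransmission || use.strictlyNecessaryForRequestedService);
}

// The offline Halma game: stores files, but works online without doing so.
const offlineHalma: StorageUse = {
  storesOrAccessesTerminalData: true,
  solelyForTransmission: false,
  strictlyNecessaryForRequestedService: false,
};

// For contrast: storage without which a requested service cannot work at all.
const necessaryStorage: StorageUse = {
  storesOrAccessesTerminalData: true,
  solelyForTransmission: false,
  strictlyNecessaryForRequestedService: true,
};

const halmaNeedsConsent = consentRequired(offlineHalma);
const necessaryNeedsConsent = consentRequired(necessaryStorage);
```

Under these assumptions the offline game requires consent and the strictly necessary storage does not. Nothing in `StorageUse` records whether personal information is involved, which mirrors the point made above: on this reading, the privacy impact never enters the test.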

The rest of the thesis consists of two appendices (the first describing the system prototype in detail, including how to run it yourself, the second describing the “gold standard” tests we’ve evaluated the system against) and the bibliography.

That is all for this time! If you’ve read all chapters so far, I’d really appreciate your comments and suggestions for improvement.

Having described relevance and information retrieval in general and in a legal context, reviewed previous work in the field, and designed and evaluated a better relevance ranking method, are we done? No, we’ve only just started! Here are some pointers on how this approach might be improved.

Finally, this is the heart and soul of this thesis (even if it’s only a few pages). A system designed for better legal relevance ranking is described and evaluated. Although primitive, being based only on simple known link analysis algorithms, it seems to perform really well compared to traditional ranking methods.

There have been many attempts to improve relevance judgments in legal information retrieval systems, with many approaches. This chapter describes some of these, to better understand the context in which the prototype system described in the next chapter appears.

Relevance can be interpreted in many ways, from subjective to objective. Which interpretations are built into traditional information retrieval systems, and what properties do these manifestations of relevance have? The use of IR for legal information has a long history. How does legal information retrieval correspond to the legal method, and can we improve on this correspondence, e.g. by creating a relevance ranking function more in line with what is considered legally relevant?

In order to define a better relevance ranking method, we need to delve deep into what relevance really is, and what aspects of it we can measure in an information retrieval system. We also examine what relevance means in a legal context, and how it is connected to other concepts such as authority and what clues to relevance we can find in legal information.

The first chapter sets the scene by describing the basis for information retrieval systems, legal information and how it is used, as well as the motivation for improving the former so that we can use the latter better. It also contains a description of the method used in the thesis, as well as the general structure of it.

My graduate thesis, somewhat loftily titled “Towards a theory of jurisprudential relevance ranking – Using link analysis on EU case law”, has been submitted to and approved by my supervisor. It has taken far too long since I first started working on it, but I’m very satisfied that it is finally finished. Except that it’s not really finished, since I hope to rework and extend it with the aim of publishing it in some other form. Which is why I’m soliciting feedback on it.

Over the coming week, I’ll be publishing a chapter at a time. Each chapter will be available in PDF form and also inline in the form of images, since that was the best conversion to a web-friendly format I could manage… (also note that the pagination differs slightly between the PDF and the web version).

If you are at all interested in legal informatics, information retrieval, jurisprudence or just what we really mean when we say that something is relevant, I hope you will find the time to read the chapters and maybe also give me your feedback below.

We’ll be kicking off with the front matter of the thesis. It does not contain anything substantial in itself, but it has a very neat Gephi-drawn cover and some interesting quotes. The table of contents should give you an idea of what it is about.