4 June 2014

A recent paper by myself and some colleagues
(Prof. Renée Hutchins of the University of Maryland Law School,
Prof. Tony Jebara of the Columbia University Computer Science Department,
and Sebastian Zimmeck, a Columbia CS PhD student who is also an attorney)
shows how to use computer science, and in particular a field called machine
learning, to answer two very specific questions in Fourth Amendment law:
is there some scientific basis for accepting the mosaic theory; and if so,
at what point is it reached? The paper has drawn some press coverage;
it's also drawn some criticism from Prof. Orin Kerr of the George
Washington University Law School.
Not surprisingly, we stand by our conclusions.
(I should note that we also think that the
article about it is a fair representation of what we said.)

What is the mosaic theory? Basically, it's the concept that while a single
observation, say of someone's location, might not be a search under
Fourth Amendment law, a whole series of observations collectively might be
one.
Prof. Kerr explained it this way:

Under the mosaic theory, searches can be analyzed as a collective
sequence of steps rather than as individual steps. Identifying
Fourth Amendment searches requires analyzing police actions over
time as a collective "mosaic" of surveillance; the mosaic can
count as a collective Fourth Amendment search even though the
individual steps taken in isolation do not.

The sequence of a person's movements can reveal still more; a
single trip to a gynecologist's office tells little about a woman,
but that trip followed a few weeks later by a visit to a baby
supply store tells a different story. A person who knows all of
another's travels can deduce whether he is a weekly church goer, a
heavy drinker, a regular at the gym, an unfaithful husband, an
outpatient receiving medical treatment, an associate of particular
individuals or political groups—and not just one such fact
about a person, but all such facts.

The privacy invasion, then, comes not from the individual observations
themselves, but from what can be inferred from them:

The whole of one's movements over the course of a month is not
constructively exposed to the public because, like
a rap sheet, that whole reveals far more than the individual
movements it comprises.

In other words, the mosaic theory asserts that the issue is not just the
series of observations but also what else they imply:

the whole of one's movements is not
exposed constructively even though each individual movement is exposed,
because that whole reveals more—sometimes a great deal
more—than does the sum of its parts.

This is the definition of the mosaic theory that we're using.

Do mosaics exist, as a scientific concept if not necessarily a legal
one? Our review of the academic literature shows that the answer is
"yes". Experimenters have found that by using machine
learning, they could predict things like
ethnicity and partnered status just from location data.
Whether or not it can predict "trips to the psychiatrist, the plastic
surgeon, the abortion clinic, the AIDS treatment center, the strip club,
the criminal defense attorney, the by-the-hour motel, the union meeting,
the mosque, synagogue or church, the gay bar and on and on" (to
quote Justice Sotomayor's concurrence in United States v. Jones)
is still an open question,
but the basic idea is sound. That is our first major conclusion:
conceptually at least, automated tools can make predictions that go
well beyond the directly observed data, thus validating the
underpinning of the theory.

The second question we addressed was posed by Prof. Kerr himself:

For example, what is the standard for the mosaic?
How should courts aggregate conduct to know when a sufficient
mosaic has been created?

Again, the precise questions are not answered in the technical literature;
however, there is sufficient data to let us draw a tentative line at about
one week. A paper,
"Limits of Predictability in Human Mobility",
by Song, Qu, Blumm, and Barabási,
is one source for our conclusion: they found that, not
surprisingly, most people's weekday schedules are predictable;
more surprisingly, they wrote:

we did not find significant changes in user regularity over the weekends
compared with their weekday mobility, which suggested that
regularity is not imposed by the work schedule, but potentially is
intrinsic to human activities.

and

In summary, the combination of the empirically determined user
entropy and Fano's inequality indicates that there is a potential
93% average predictability in user mobility, an exceptionally high
value rooted in the inherent regularity of human behavior. Yet it
is not the 93% predictability that we find the most surprising.
Rather, it is the lack of variability in predictability across the
population.
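
For readers curious how an entropy figure turns into a predictability
bound, here is a minimal sketch in Python. It assumes the usual form of
Fano's inequality, S = H(p) + (1 - p) log2(N - 1), where S is the entropy
of a person's location history in bits, N is the number of distinct
locations, and p is the best achievable prediction accuracy. The function
names are hypothetical and chosen for illustration; this is not code or
data from the Song et al. paper, and the 93% figure depends on entropies
measured there, which are not reproduced here.

```python
import math


def fano_rhs(p, n_locations):
    """Entropy (in bits) consistent with predictability p over n_locations."""
    if p >= 1.0:
        return 0.0
    binary_entropy = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return binary_entropy + (1 - p) * math.log2(n_locations - 1)


def max_predictability(entropy_bits, n_locations, tol=1e-9):
    """Solve entropy_bits = fano_rhs(p, n_locations) for p by bisection.

    fano_rhs falls from log2(n_locations) at p = 1/n_locations to 0 at
    p = 1, so there is exactly one solution in that interval.
    """
    if n_locations < 2:
        raise ValueError("need at least two distinct locations")
    if not 0.0 <= entropy_bits <= math.log2(n_locations):
        raise ValueError("entropy must lie between 0 and log2(n_locations)")
    lo, hi = 1.0 / n_locations, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if fano_rhs(mid, n_locations) > entropy_bits:
            lo = mid  # too much entropy at mid, so the answer is higher
        else:
            hi = mid
    return (lo + hi) / 2
```

Lower entropy yields a higher bound: at zero entropy the bound is 1, and
at the maximum entropy, log2(N), it falls to 1/N, which is no better than
guessing.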

[The location of a
person] is likely to be a good predictor of [the person's]
location exactly
one week from now.

In other words, where you are today, whether today is a work day or not,
is an excellent predictor of where you'll be a week hence. If the issue
is what can be learned beyond the direct observations, a week's worth of
data may suffice. This is not firm—again, we need to do more
precise experiments aimed at legally interesting questions—but
it provides some rational basis for setting a limit.
The Massachusetts Supreme Judicial Court
held that two weeks of monitoring
"was more than sufficient to intrude upon the defendant's expectation of
privacy safeguarded by art. 14",
but they explicitly declined to say what the limit should be. One week
may not always be the answer, but there are some grounds, based on both
scientific evidence and intuitive reasoning, for adopting our limit for
now, pending insights from further research.
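
The week-ahead regularity described above can be checked on any location
trace with a very simple test: predict that a person will be wherever they
were exactly one week earlier, and count how often that is right. The
following Python sketch does just that. It assumes one location label per
hour, and the function name is hypothetical; it is an illustration of the
idea, not the method used in our paper or in the studies we cite.

```python
HOURS_PER_WEEK = 7 * 24


def weekly_lag_accuracy(locations, lag=HOURS_PER_WEEK):
    """Fraction of observations that match the one made `lag` steps earlier.

    `locations` is a list with one location label per time step
    (assumed hourly here, so the default lag is one week).
    """
    if lag < 1:
        raise ValueError("lag must be at least 1")
    if len(locations) <= lag:
        raise ValueError("need more than one lag period of observations")
    # Pair each observation with the one made `lag` steps before it.
    pairs = zip(locations[lag:], locations[:-lag])
    hits = sum(1 for now, before in pairs if now == before)
    return hits / (len(locations) - lag)
```

A trace that repeats perfectly from week to week scores 1.0; real traces
will score lower, and how much lower is exactly the empirical question.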

Have we settled the question of the mosaic theory? No, absolutely not.
Prof. Kerr posed several other difficult questions, not the least
of which is whether or not courts should adopt the mosaic theory as
a legal matter. That said, we think we have answered two of the thornier
ones: whether or not mosaics in fact exist, and if so when they occur.