How the Snowden Saga will End

To understand how something will end, we have to understand how it began.

Around 2003, someone very very high up in the intelligence community
visited Cornell. You might think you know people high up in the
intelligence community; this person was a click higher. Naturally, I
signed up to chat with him. I remember that he walked into my office
with the aura of Brigadier Gen. Jack D. Ripper having sent the entire wing in
on Wing Attack Plan R. Underneath the bravado though, there was a strong
dash of Gen. Buck Turgidson, and in any case, he had no
military background and would have been too plump to make PFC if
he had enlisted. The protruding belly betrayed years spent managing physicists,
whose research seems to run on beer as much as it does on taxpayer money
that is spent on gadgets that promise to "unleash the secrets
of the cosmos and the inner workings of God," with the latter added purely
to win Republican support in Congress, but end up refining a constant
by an inconsequential amount.

It's not that the US surveillance industry is afraid of the public, but they do deny them their essence existence.

Once he decisively took a seat in my office, he leaned back, and I
could imagine him taking a puff from a non-existing cigar,
Ripper-style. This person knew how to pause. He then said "So. You're
a systems guy. You must know about systems." I braced for a question
from left field, probably involving fluoridation and the purity of our
precious bodily fluids. "What if I had a graph. A really, really
large graph. Billions of nodes. Trillions of edges. Let's say every
node on the graph was a person. Edges between people described phone
calls, interactions, stuff like that." He paused for dramatic
effect, as he mentally took another puff from his non-existent
cigar. "How would you find bin Laden?"

The man was literally looking for bin Laden, in my office.

And he was using Big Data to do so, long before it was a trendy buzzword.

Big Brother discovers Big Data

It has been no secret whatsoever that the intelligence
community in the US has been collecting and collating data at a massive scale.
And any techie will tell you that you can't operate at "Google-scale" while
remaining compliant with all those pesky laws and regulations that
restrict the government from performing domestic surveillance.
This data is inherently tainted and inseparably intertwined.
Anyone who pretends otherwise has a Total Information Awareness program
to sell under a disguise.

So, I knew it, you knew it, and everyone else knew it long before Snowden
revealed awfully designed PowerPoint slides that confirmed what we all knew.
And thank God for that, so we can now discuss where we are and how to
proceed from here.

But to have a rational discussion, let's first dispense with the false
indignation and cheap righteousness that seem to run freely in DC these days.

We Were All Snowden Once

And we failed to take a stand. We consumed reports about "terrorist
chatter" without asking how those reports were compiled. We knew
that Atta and friends were not hanging out on IRC in the #terror channel,
chattering away. In fact, they were using messages left in webmail Drafts folders
to evade detection, and to get at this information, you either use
human intelligence (HUMINT) or signals intelligence (SIGINT). The
news reports clearly implied that we were hearing the distilled
result of massive SIGINT, and the nation ate this stuff up.
So if any Congressperson is going to get indignant now and get upset
about what was happening, let me remind them:

Das Parfum seemed like a creepy novel until the scent jars of the Stasi were revealed.

We knew it because we were warned. There was no shortage of people who
argued, while the PATRIOT Act was being passed, that the
ensuing culture of surveillance was going to erode values we hold near
and dear. Many people urged caution, and noted that
we needed to define that which we hold sacred, so we can uphold those
principles even as we fight terror. Kind of the way Norway did when
tackling the threat Breivik posed to their open society.

But we threw that all away. It was the result of a decades-long
process of wussification, enabled by lack of leadership. John Wayne
was long dead and gone, the Marlboro Man had died of emphysema, Kurt Cobain had
offed himself, and Superman was on a ventilator. There was no one who
seemed to possess a pulse and a sense of American principles; in fact, one of
the top people in charge lacked both. I cannot overemphasize the
damage this overreach did to our identity as a nation. Even from where
I occasionally sit, on the admissions committee at an Ivy League
university, its impact is pretty clear: the number and quality of our
overseas applicants dropped as we squandered our moral high ground.

We knew it because we told them to do it. Massive data
collection is something that the intelligence community is explicitly
tasked to do. They would be remiss in their duties if they did not
actually attempt to collect everything they can collect. This
community consists of hard-working, reasonable individuals who have been
handed an impossible, ill-defined mandate;
namely, protect US interests here and abroad against all possible
threats. And the mandate comes from a public that has grown exceedingly
soft, one that is all too ready to compromise its core principles.
It is no surprise that they will encounter gray areas, and in
fact, it is not even a surprise that they will overzealously venture
into not-so-gray areas.

And we actually need the NSA to perform signals intelligence. Shortly after the
Snowden leaks, there were calls for ceasing all SIGINT or defunding the NSA. Anyone who
advocates such an extreme position is either not living here or hopelessly naive.

But the boundaries of such data collection have to
comply with our values and have to be arrived at through a collective
debate. "The Johnsons do it"
is no reason to compromise that which
defines us, even, or especially if, the Johnsons are Chinese or French.
The same establishment that is
currently using the French as justification for surveillance
was calling them all kinds of derogatory surrendering
simian names just a short while ago.

Tigers have stripes and massive data collection of all kinds is what
this community does with tax money. This is why checks and balances
are necessary.

We knew it because it's in their DNA.
Massive data collection is nothing new. It was only
a few years ago that the Stasi archives were opened up to reveal
"Geruchsproben,"
carefully catalogued jars containing the smell
of their citizens. The Stasi had field agents
collect smell samples, sometimes by having the citizens sit on special
chairs, or sometimes by breaking into people's homes and literally stealing their underwear.

She has no reason to fear the underwear-stealing Stasi.

The obvious reaction is to get outraged and demand to know "under what
conditions would the smell of an individual be of any use to an
intelligence officer?" As with most obvious questions, this question,
its answer ("to sic the hounds"), and the ensuing debate are all
a waste of time. One can
already imagine the headlines. Wired will discuss, at length, the
mechanics of tracking people by scent, with a special highlight on a
nose designed out of commercial-off-the-shelf components by the MIT
Media Lab that outperforms a hound in carefully controlled laboratory
experiments, as long as that laboratory is squarely inside the MIT
Media Lab. DailyKos will ring out with indignation, while the National Review will
ask why anyone would use perfume if they didn't have something to hide.
When the topic is thoroughly exhausted, when it is so universally accepted that
the act of speaking up carries absolutely no political risk,
mainstream writers like Thomas Friedman will jump into the fray. The
only sane voice in all of this will be a short letter by a NYT reader
from Kalamazoo, Michigan, pointing out to no one in
particular on page C6 that "to sic a hound on a person's scent trail,
you need a starting point for the scent, for the hound cannot perform
investigative analysis to locate someone, and if you've got a starting
point, why do you need the scent sample?"

By the way, speaking of DNA: You can be sure that there are massive DNA databases in cold storage, and it'll take a Snowden Junior for that other discussion to come to light. We know that the intelligence community collected DNA data at the risk of forever tainting vaccination efforts worldwide. How surprised would anyone be
if busboys in trendy DC bars near Embassy Row earn an income on the side, swiping saliva from
the used utensils of foreign emissaries, or if worldwide bone-marrow registries have been compromised to compile aggregate DNA data for different ethnic groups?

Once again, this industry collects every kind of information when it's left unchecked.
All the soul searching and righteous indignation from inside the beltway is cheap,
after-the-fact posturing.

For the right question is: what will happen now that the Snowden leaks are public?
Having re-directed the emotional angst that this topic attracts, we can now dispassionately
analyze how we will find the balance point between national security interests and
privacy concerns. But to make any prediction, we need a framework.

A Framework for Analysis

There has been a lot of discussion on how the Snowden saga will end. I am
referring of course to the actual part of this story that has societal
consequences, not to the human interest story around Snowden that the
media wants to play up. While the questions that relate to Snowden-the-man
are interesting, e.g. "what motivated him to give up his life in Hawaii?", "did he time his leak
to undermine the Sino-US talks?", and most importantly, "what is it like being
stuck in an airport for life?", they are essentially of very little consequence
to the rest of us.

Instead, I want to offer a simple framework for
analyzing Internet policy issues, unabashedly cribbed from the
Internet visionary David Clark, but amended with a quantitative model
that can tell us what to expect. So, let's do some soft science
on a topic that involves technology, policy and society.

Three Forces

All online policy emerges as a result of three competing forces.
Think of them as vectors whose net sum determines what happens online
at the end of the day. Once we sort out the strength and direction of
each force, it becomes really trivial to figure out what will happen next.

Here are the forces, their direction, and their magnitude:

Military/Political:

This force vector consists of the
military, intelligence and political establishment, whose aims are
to keep online social movements in check. Its ostensible goal is
to control all online interactions. The force vector points in a
direction much like mainstream media today: left unchecked, this
force will reduce the number of voices, limit how they can be
expressed, and eliminate all discourse except for a handful of
corporate-approved messages. This is the right arm
of Big Brother, its fingers so swole from the 5am beltway
jog and the CrossFit that follows that typing out nuanced computer
policy is difficult.

It's this force vector that tried to ban encrypted communications
for decades and tried to make it illegal to export three-line
Perl scripts for encryption. It's
the same forces that advocated, and still continue to advocate, a
"driver's license for the Internet." Everyone who has been on an
online forum and engaged in a meaningless online fight has
felt the urge to hunt down whoever is hiding behind that pseudonym.
When I ran an online forum where political discussions took place,
the main request I received from
the older folks was that I "demand users to fax me their identity cards."
While most young people have deeply understood that this is neither
possible nor desirable, the people who are in a position to make
Internet policy proposals have yet to learn this lesson.

So it stands to reason that some of the more aspiring members of
this community will want to collect everything everyone does
online, keep it in a big datacenter in the middle of the country,
and run queries against it to see who's up to what.

And these folks have an enormous budget. Unopposed, they're
unstoppable, for what kind of a president can overrule this
community, and still count on them to feed him useful analysis
later on? The only thing tempering the destruction that this force
could unleash is that it historically has not been technically
savvy, but this is, evidently, changing.

Commerce:

This force vector consists simply of the collective
economic interests of companies that fund elections. And it points
in the direction of making the Internet a "pay-for-play"
environment, of maximizing revenue extraction from Internet
users. It has no ethics or higher goals; it is axiomatic, in our
current times, that companies are motivated solely by greed
and answerable solely to their own shareholders, regardless of the
fact that they rely on countless resources from their surrounding
society. This single-minded drive for profit maximization actually
makes it easier to analyze this force.

The commerce vector typically operates synergistically with the
military/political vector, where one herds the online populace to
a few corporate-owned and operated choke points, and the other
extracts profits. But the commerce force does break ranks with the
M/P force, to create more communication channels instead of fewer
venues, to create more interactions among users instead of
suppressing them, and so forth. And as powerful as the
military/political force vector is, and as many dollars as they
command, the commerce force vector commands two to three orders of
magnitude more. So they handily beat the Military/Political forces
every time they point in different directions.

What happened with encryption makes the relative magnitudes of the
forces very clear. For decades, it was US policy to prohibit the
export of cryptographic algorithms. And the cool geeky t-shirts
with those Perl three-liners did absolutely nothing to change
policy -- the geeks were too few in number, and "freedom to share
cryptographic algorithms" was not exactly a cause the lay public could
rally behind. But the moment the US computer industry decided
that, for its own competitiveness, it needed strong encryption on
the Internet, the politicians suddenly discovered that the First
Amendment applies to crypto just as surely as it applies to
everything else. The transformation was overnight, and it brought
us good things, like SSL and online commerce and a host of other
developments that make the world a much more interesting, and
better, place.

Public:

This force vector consists simply of the collective
human interests of the people who use the network. It is by far
the most powerful force, but has a number of shortcomings: it is
slow to awaken, not technically sophisticated, and easy to derail
and divide into factions over trivial concerns. But once the
giant is awake, absolutely nothing can stand in its path.

What makes the public stand up and take a stance? No one knows.
The Arab Spring was precipitated by a street vendor who, after the
police took away his cart, got so depressed that he decided
to set himself on fire, and before we knew it, dictators across
many continents were spinning up their chopper blades. The Turkish
uprising was precipitated by a couple of trees in a park. The second
wave of Brazilian uprisings was over a 10-cent hike in bus fares. This makes
this force terrifying, because when the giant shows signs of
awakening, when his eyelids flutter and he's asking questions
trying to get his bearings, it's too late.

Vector Sum

I propose a simple technique to decide which of these forces will reign supreme based
on simple high-school math with a dash of historical analysis.

On issues of Internet governance, I propose the use of dollars as a common, universal
unit of strength for measuring societal forces.

Sum up the amount of spending that the M/P force field is going to need
to achieve its goals. There are slightly different multipliers for
"new funding to be appropriated" versus "already allocated funding,"
as laying people off meets more resistance than new pork-barrel spending,
but a unit multiplier is sufficient for first-cut analysis. Add to this
the cost of catastrophes that the policies could prevent, multiplied
by their likelihood.

Sum up the amount of revenue the commercial sector stands to gain or lose
under different policy regimes.

Sum up the amount of economic activity by which the proposed policies
affect citizenry.

Doing all this quantitatively is difficult, but engineering is all about back of the
envelope calculations.

In this case, the first number is simply the sum that the intelligence community
spends on eavesdropping at large scale, combined with very
tiny likelihoods for events that have modest costs. Contrary to the $30M claim in
the Snowden slide deck, the cost of the datacenters and the analysis personnel
will likely be in the single to very low double digit billions.

The second sum is the dollar amount that the cloud providers would lose due
to direct loss of revenue from antsy customers, especially foreign ones, as well
as the indirect costs of losing dominance in their field.

Finally, the last sum is the value people place on activities that cannot take place in a
surveillance society. It is almost impossible to estimate this, for we cannot know, say, the dollar figure
a dissident would place on being able to blog unfettered, or perhaps, the dollar value Anthony Weiner would
place on keeping Carlos Danger's emails private. But a good proxy for this metric is simply the amount of
money people already spend on underground activities, a small percentage of which would be curtailed in a
surveillance state. Given that our underground economy is roughly 1-2 trillion dollars, even tiny
percentages have impact.

So my rough guesstimate is that the forces are aligned in the ratio 1:1:3, with
an alliance of the public and commercial interests that overpowers the M/P establishment
in favor of transparency and online privacy guarantees.
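
To make the arithmetic concrete, here is a minimal sketch of the vector sum in Python. The dollar figures are placeholders picked only to reproduce the rough 1:1:3 guess above (a hypothetical $10B each for the M/P and commerce forces, and 2% of a $1.5 trillion underground economy for the public); they are assumptions, not measurements. The names Force, net_force and ratio are invented for illustration, and collapsing each force to a single signed direction is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Force:
    name: str
    dollars: float   # magnitude in billions of dollars (placeholder value)
    direction: int   # +1 pushes toward privacy, -1 toward surveillance


def net_force(forces):
    """Signed sum of all forces; a positive result favors privacy."""
    return sum(f.dollars * f.direction for f in forces)


def ratio(forces):
    """Magnitudes normalized by the smallest force, e.g. [1.0, 1.0, 3.0]."""
    smallest = min(f.dollars for f in forces)
    return [round(f.dollars / smallest, 1) for f in forces]


# Assumed, illustrative numbers only.
forces = [
    Force("military/political", 10.0, -1),
    Force("commerce", 10.0, +1),
    Force("public", 0.02 * 1500.0, +1),  # 2% of a $1.5T underground economy
]

print("ratio M/P : commerce : public =", ratio(forces))
print("net force (billions, + favors privacy):", net_force(forces))
```

Flipping the commerce direction to -1 models the usual case where commerce and M/P pull together; with these placeholder figures the public alone would still tip the sum, but only barely.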

Let's sanity check: measuring by the highly scientific metric of column inches in newspapers that I have
personally seen (yes, I'm fully aware of the bias and write this with a tongue firmly in cheek. I would
love it if someone would help do a proper quantitative analysis), I see an
approximate 2-to-1 ratio. For every unabashedly condescending, non-self-reflecting, pro-surveillance gung-ho article in WaPo,
the beltway insider rag that once attributed the invention of
email to a self-proclaimed child-prodigy from MIT who made misleading claims through Wikipedia, there are at least two
articles critical of the alleged activities. The cloud companies have already started to
feel the sting of lost profits and have initiated a push for transparency. And in line with our estimates,
the reaction from the public has been much stronger than the tepid call for transparency from industry.

The numbers suggest that the US will emerge out of the Snowden debacle with a set
of processes that prohibit the kind of domestic surveillance that Snowden exposed.
But the forces are fairly close, and the victory will be a highly qualified one.
We'll get the minimal set of changes such that a figurehead can say
"we do not perform domestic surveillance" with a straight face,
for a specific definition of every word in that sentence.

Loopholes

For instance, we may be left with loopholes big enough to drive trucks
through, say, trucks containing UK data from the US to the UK, and
lorries containing US data from the UK to us. Cloud computing makes it
trivial to escape pesky jurisdictional obstacles by sending queries that
used to go to a datacenter in Utah to, say, Ireland instead. Closing
this loophole will prove to be difficult, because information-sharing
between different nations is actually desirable, and the legal system
is better at binary decisions than at questions of degree.

Or we may end up with only token changes to the way court orders are retroactively
issued. The idea behind the current scheme is that the surveillance engine can collect
data now, under hot pursuit, and justify it later. While there is merit to the hot
pursuit argument, its unrestricted use fosters an environment where results come
first, principles are an afterthought, and there are no effective checks against overreach.

Or we may never get what is most needed: strict limits on
how the data, once collected, is used. We now know that a contractor
in Hawaii has access to the entire crown jewels of the NSA, namely,
"metadata" about the kind of information they can collect and analyze. How
many contractors have access to the lower-value phone call "metadata"? There are
now naive proposals to avoid a second Snowden mishap by doubling up
every analyst and having someone look over their designated buddy's
shoulder. Besides doubling employment in the DC metro area, this
ill-conceived attempt only makes it more likely for data to fall
into the wrong hands. What we need are trustworthy processes, backed
perhaps by trustworthy operating systems that can provide assurance that
no one, not even system administrators, can violate a policy associated
with a piece of data. Anything less opens us up to a "deep state," where
certain elements within government misappropriate the surveillance data
for their own ends.

A Success Indicator

In big battles, it's instructive to pay special attention to certain smaller engagements
that serve as litmus tests. In this case, I expect the
"abandoned email loophole" to serve this purpose. The 1986 Electronic Communications Privacy Act
classifies old emails left on a server for more than 180 days as "abandoned" and gives the government
authority to read these emails without a warrant. This is clearly an outdated loophole, one that
would undermine discerning users' and companies' willingness to store emails in the cloud.
If the cloud providers really feel the sting of user apprehension about surveillance, and if they
really put their weight into fighting on the same side as the public, this loophole
would quickly be shut down. Whether or not it is indeed firmly closed will be an indicator of
genuine change in surveillance policy.

Overall, it's time to be somewhat optimistic: the fundamentals point to a ground shift,
where the commerce and public forces are now exerting an influence on previously unchecked
elements in government, and the net vector points towards a freer, better Internet. But
there are reasons for concern: the battle will be drawn out, victories will be partial, and
the extent to which loopholes get left behind will determine how much privacy we have online.

Dr. Strangelove meets Gen. Ripper

He can actually walk if we let him.

Back in 2003, the rest of my conversation with my esteemed visitor was
a lot of fun. I told him what I'd do to handle such a large
graph. And I gently told him that I wasn't building such a graph
database. I wasn't then, but I am now. We have a system called Weaver
in the works, inspired by the revolutionary HyperDex database. I realize that its utility is much
lower now that bin Laden has been located and rightfully dispatched,
but there will undoubtedly be other massive data sources, and we'll need
systems that can handle them. For we cannot afford a graph database gap,
any more than we can afford a mineshaft gap.

But in addition to such a database, we need tightened definitions for what kinds
of surveillance data can be collected, as well as technical and legal measures to
ensure that the data is used solely in accordance with appropriate policies. Interestingly,
there are technologies that can
restrict what users, including Snowden-like "super users", can do with data. Once
we re-establish our principles, we have the technical means to enact them. But first,
the current era of covert, boundless data collection must come to an end.

Related

In the movie, Dr. Strangelove meets with the President and
Gen. Turgidson but not Gen. Ripper. It had long been an
open question what would have happened had the technocrat met the
power-hungry general. Thanks to Snowden, we now know.