August 7, 2012

EXECUTIVE SUMMARY
Yesterday, Microsoft published their CU-RTC-Web WebRTC API proposal as an
alternative to the existing W3C WebRTC API being implemented in Chrome
and Firefox. Microsoft's proposal is a "low-level API" proposal which
basically exposes a bunch of media- and transport-level primitives to
the JavaScript Web application, which is expected to stitch them
together into a complete calling system. In contrast to the current
"mid-level" API, the Microsoft API moves a lot of complexity from the
browser to the JavaScript, but the authors argue that this makes it more
powerful and flexible. I don't find these arguments that convincing,
however: a lot of them seem fairly abstract and rhetorical and when
we get down to concrete use cases, the examples Microsoft gives seem
like things that could easily be done within the existing framework.
So, while it's clear that the Microsoft proposal is a lot more work
for the application developer, it's a lot less clear that it's
sufficiently more powerful to justify that additional complexity.

Microsoft's arguments for the superiority of this API fall into
three major categories:

It is a better fit for the way the Web has traditionally worked
("key Web tenets" such as statelessness).

It will be easier to make it interoperate with existing VoIP
endpoints.

It allows the development of applications that would otherwise
be difficult to develop with the existing W3C API.

Like any all-new design, this API has the significant advantage (which
the authors don't mention) of architectural cleanliness. The existing
API is a compromise between a number of different architectural
notions and, like any hybrid proposal, has points of ugliness where
those notions come into contact with each other (especially in the
area of SDP.) However, when we actually look at functionality rather
than elegance, the advantages of an all-new design---not only one
which is largely not based on preexisting technologies but one which
involves discarding most of the existing work on WebRTC itself---start
to look fairly thin.

Looking at the three claims listed above: the first seems more
rhetorical than factual. It's certainly true that in the early
years of the Web designers strove to keep state out of the Web
browser, but that hasn't been the case with rich Web applications
for quite some time. To the contrary, many modern HTML5 technologies
(localStorage, WebSockets, HSTS, WebGL) are about pushing state onto the
browser from the server.

The interoperability argument is similarly weakly supported.
Given that JSEP is based on existing VoIP technologies, it
seems likely that it is easier to make it interoperate with
existing endpoints since it's not first necessary to implement
those technologies (principally SDP) in JavaScript before
you can even try to interoperate. The idea here seems to be that
it will be easier to accommodate existing noncompliant endpoints
if you can adapt your Web application on the fly, but given the
significant entry barrier to interoperating at all, this
seems like an argument that needs rather more support than
MS has currently offered.

Finally, with regard to the question of the flexibility/JavaScript
complexity tradeoff, it's somewhat distressing that the specific
applications that Microsoft cites (baby monitoring, security cameras,
etc.) are so pedestrian and easily handled by JSEP. This isn't of
course to say that there aren't applications which we can't currently
envision which JSEP would handle badly, but it rather undercuts this
argument if the only examples you cite in support of a new design are
those which are easily handled by the old one.

None of this is to say that CU-RTC-Web wouldn't be better in some
respects than JSEP. Obviously, any design has tradeoffs and as I
said above, it's always appealing to throw all that annoying legacy
stuff away and start fresh. However, that also comes with a lot of
costs, and before we go down that path we really need to have a far
better picture of what benefits, other than elegance, starting
over would bring to the table.

BACKGROUND
More or less everyone agrees about the basic objectives of the WebRTC
effort: to bring real-time communications (i.e., audio, video, and
direct data) to browsers. Specifically, the idea is that Web
applications should be able to use these capabilities directly. This
sort of functionality was of course already available either via
generic plugins such as Flash or via specific plugins such as Google
Talk, but the idea here was to have a standardized API that was built
into browsers.

In spite of this agreement about objectives, from the beginning there
was debate about the style of API that was appropriate, and in particular
how much of the complexity should be in the browser and how much in
the JavaScript. The initial proposals broke down into two main flavors:

High-level APIs — essentially a softphone in the browser. The Web
application would request the creation of a call (perhaps with some
settings as to what kinds of media it wanted) and then each browser
would emit standardized signaling messages which the Web application
would arrange to transit to the other browser. The original WHATWG
HTML5/PeerConnection spec was of this type.

Low-level APIs — an API which exposed a bunch of primitive
media and transport capabilities to the JavaScript. A browser that
implemented this sort of API couldn't really do much by itself.
Instead, you would need to write something like a softphone in
JavaScript, including implementing the media negotiation, all the
signaling state machinery, etc. Matthew Kaufman from Microsoft
was one of the primary proponents of this design.

After a lot of debate, the WG ultimately rejected both of these and
settled on a protocol called JavaScript Session Establishment Protocol
(JSEP), which is probably best described as a mid-level API. That
design, embodied in the current specifications
[http://tools.ietf.org/html/draft-ietf-rtcweb-jsep-01 and
http://dev.w3.org/2011/webrtc/editor/webrtc.html],
keeps the transport
establishment and media negotiation in the browser but moves a fair
amount of the session establishment state machine into the JavaScript.
While it doesn't standardize signaling, it also has a natural mapping
to a simple signaling protocol as well as to SIP and Jingle, the two
dominant standardized calling protocols. The idea is supposed to be
that it's simple to write a basic application (indeed, a large number
of such simple demonstration apps have been written) but that
it's also possible to exercise advanced features by manipulating
the various data structures emitted by the browser. This is obviously
something of a compromise between the first two classes of proposals.

The decision to follow this trajectory was made somewhere around six
months ago and at this point Google has a fairly mature JSEP
implementation available in Chrome Canary, while Mozilla has a less
mature implementation which you can compile yourself but which hasn't
been released in any public build.

Disclaimer: I have been heavily involved with both the IETF and
W3C working groups in this area and have contributed significant
chunks of code to both the Chrome and Firefox implementations. I am
also currently consulting for Mozilla on their implementation. However,
the comments here are my own and don't necessarily represent those of any other
organization.

WHAT IS MICROSOFT PROPOSING?
What Microsoft is proposing is effectively a straight low-level API.

There are a lot of different API points, and I don't plan to discuss
the API in much detail, but it's helpful to talk about the API
some to get a flavor of what's required to use it.

RealTimeMediaStream -- each RealTimeMediaStream represents a single
flow of media (i.e., audio or video).

RealTimeMediaDescription -- a set of parameters for the
RealTimeMediaStream.

RealTimeTransport -- a transport channel which a RealTimeMediaStream
can run over.

RealTimePort -- a transport endpoint which can be paired with a
RealTimePort on the other side to form a RealTimeTransport.

In order to set up an audio, video, or audio-video session, then, the JS
has to do something like the following:

Acquire local media streams on each browser via the getUserMedia()
API, thus getting some set of MediaStreamTracks.

Create RealTimePorts on each browser for all the local network
addresses as well as for whatever media relays are available/
required.

Communicate the coordinates for the RealTimePorts from each
browser to the other.

On each browser, run ICE connectivity checks for all combinations
of remote and local RealTimePorts.

Select a subset of the working remote/local RealTimePort pairs
and establish RealTimeTransports based on those pairs.
(This might be one or might be more than one depending on
the number of media flows, level of multiplexing, and the
level of redundancy required).

Determine a common set of media capabilities and codecs between
each browser, select a specific set of media parameters, and
create matching RealTimeMediaDescriptions on each browser
based on those parameters.

Attach the remote RealTimeMediaStreams to some local display
method (such as an audio or video tag).

For comparison, in JSEP you would do something like:

Acquire local media streams on each browser via the getUserMedia()
API, thus getting some set of MediaStreamTracks.

Create a PeerConnection() and call AddStream() for each of the
local streams.

Create an offer on one browser, send it to the other side,
create an answer on the other side and send it back to the
offering browser. In the simplest case, this just involves
making some API calls with no arguments and passing the
results to the other side.

The PeerConnection fires callbacks announcing remote media
streams which you attach to some local display method.
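
To make the comparison concrete, here is a rough sketch of what the offering side looks like under JSEP, using the names from the W3C draft. This is illustrative only: the signalingChannel object is an assumed application-provided transport, vendor prefixes and error handling are omitted, and the exact signatures changed repeatedly across draft versions.

var pc = new RTCPeerConnection({ iceServers: [{ url: 'stun:stun.example.org' }] });

// Hand the local media to the browser and let it generate the SDP offer.
navigator.getUserMedia({ audio: true, video: true }, function (stream) {
  pc.addStream(stream);
  pc.createOffer(function (offer) {
    pc.setLocalDescription(offer);
    signalingChannel.send(JSON.stringify(offer));   // ship the offer to the peer
  });
});

// The answer comes back over the application's own signaling channel.
signalingChannel.onmessage = function (msg) {
  pc.setRemoteDescription(new RTCSessionDescription(JSON.parse(msg)));
};

// The browser announces remote media, which we just attach to a <video> tag.
pc.onaddstream = function (evt) {
  document.querySelector('video').src = URL.createObjectURL(evt.stream);
};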

As should be clear, the CU-RTC-Web proposal requires significantly
more complex JavaScript, and in particular requires that JavaScript to
be a lot smarter about what it's doing. In a JSEP-style API, the Web
programmer can be pretty ignorant about things like codecs and
transport protocols, unless he wants to do something fancy, but with
CU-RTC-Web, he needs to understand a lot of stuff to make things work
at all. In some ways, the JSEP approach is a much better fit for the
traditional Web approach of having simple default behaviors which fit a
lot of cases but which can then be customized, albeit in ways that
are sometimes a bit clunky.

Note that it's not as if this complexity doesn't exist in JSEP;
it's just been pushed into the browser so that the Web programmer doesn't
have to see it. As discussed below, Microsoft's argument is that this
simplicity in the JavaScript comes at a price in terms of flexibility
and robustness, and that libraries will be developed (think jQuery)
to give the average Web programmer a simple experience, so that
they won't have to absorb a lot of complexity themselves. However,
since those libraries don't exist yet, it seems kind of unclear how
well that's going to work.

ARGUMENTS FOR MICROSOFT'S PROPOSAL
Microsoft's proposal and the associated blog post make a number of
major arguments for why it is a superior choice (the proposal just came
out, so there haven't really been any public arguments for why
it's worse). Combining the two, you would get something like
this:

That the current specification fails to honor "key Web tenets",
specifically that it's not stateless and that you can only make
changes when in specific states. Also, that it depends on
the SDP offer/answer model.

That it doesn't allow a "customizable response to changing network
quality".

That it doesn't support "real-world interoperability" with
existing equipment.

That it's too tied to specific media formats and codecs.

That JSEP requires a Web application to do some frankly inconvenient
stuff if it wants to do something that the API doesn't have explicit
support for.

That it's inflexible and/or brittle with respect to new applications
and in particular that it's difficult to implement some specific
"innovative" applications with JSEP.

Below we examine each of these arguments in turn.

FITTING WITH "WEB TENETS"
MS writes:

Honoring key Web tenets - The Web favors stateless interactions which
do not saddle either party of a data exchange with the
responsibility to remember what the other did or expects. Doing
otherwise is a recipe for extreme brittleness in implementations;
it also raises considerably the development cost which reduces the
reach of the standard itself.

This sounds rhetorically good, but I'm not sure how accurate it is.
First, the idea that the Web is "stateless" feels fairly anachronistic
in an era where more and more state is migrating from the server to the
browser. To pick two examples, WebSockets involves forming a fairly
long-lived stateful two-way channel between the browser and the server, and
localStorage/IndexedDB allow the server to persist data semi-permanently on
the browser.
Indeed, CU-RTC-Web requires forming a nontrivial amount of state on
the browser in the form of the RealTimePorts, which represent actual
resource reservations that cannot be reliably reconstructed if
(for instance) the page reloads. I think the idea here is supposed
to be that this is "soft state", in that it can be kept on the
server and just reimposed on the browser at refresh time, but as
the RealTimePorts example shows, it's not clear that this is the case.
Similar comments apply to the state of the audio and video devices
which are inherently controlled by the browser.

Moreover, it's never been true that neither party in the data exchange
was "saddled" with remembering what the other did; rather, it used
to be the case that most state sat on the server, and indeed, that's
where the CU-RTC-Web proposal keeps it. WebRTC is also the first time we have
really built Web-based peer-to-peer apps. Pretty much all previous
applications have been client-server applications, so it's hard to
know what idioms are appropriate in a peer-to-peer case.

I'm a little puzzled by the argument about "development cost"; there
are two kinds of development cost here: that to browser implementors
and that to Web application programmers. The MS proposal puts
more of that cost on Web programmers whereas JSEP puts more of
the cost on browser implementors. One would ordinarily think that
as long as the standard wasn't too difficult for browser implementors
to develop at all, then pushing complexity away from Web programmers
would tend to increase the reach of the standard. One could of course
argue that this standard is too complicated for browser implementors
to implement at all, but the existing state of Google and Mozilla's
implementations would seem to belie that claim.

Finally, given that the original WHATWG draft had even more state in
the browser (as noted above, it was basically a high-level API), it's
a little odd to hear that Ian Hickson is out of touch with the "key
Web tenets".

CUSTOMIZABLE RESPONSE TO CHANGING NETWORK QUALITY
On adapting to network conditions, MS writes:

Real time media applications have to run on networks with a wide
range of capabilities varying in terms of bandwidth, latency, and
noise. Likewise these characteristics can change while an
application is running. Developers should be able to control how the
user experience adapts to fluctuations in communication quality. For
example, when communication quality degrades, the developer may
prefer to favor the video channel, favor the audio channel, or
suspend the app until acceptable quality is restored. An effective
protocol and API will have to arm developers with the tools to
tailor such answers to the exact needs of the moment, while
minimizing the complexity of the resulting API surface.

It's certainly true that it's desirable to be able to respond to
changing network conditions, but it's a lot less clear that the
CU-RTC-Web API actually offers a useful response to such changes. In
general, the browser is going to know a lot more about what the
bandwidth/quality tradeoff of a given codec is going to be than most
JavaScript applications will, and so it seems at least plausible that
you're going to do better with a small number of policies (audio is
more important than video, video is more important than audio, etc.)
than you would by having the JS try to make fine-grained decisions
about what it wants to do. It's worth noting that the actual
"customizable" policies that are proposed here seem pretty simple.
The idea seems to be not that you would impose policy on the browser
but rather that since you need to implement all the negotiation
logic anyway, you get to implement whatever policy you want.

Moreover, there's a real concern that this sort of adaptation will
have to happen in two places: as MS points out, this kind of network
variability is really common and so applications have to handle it.
Unless you want to force every JS calling application in the universe
to include adaptation logic, the browser will need some (potentially
configurable and/or disableable) logic. It's worth asking whether
whatever logic you would write in JS is really going to be enough
better to justify this design.

REAL-WORLD INTEROPERABILITY
On interoperability, MS writes:

it shows no signs of offering real world interoperability with
existing VoIP phones, and mobile phones, from behind firewalls and
across routers and instead focuses on video communication between
web browsers under ideal conditions. It does not allow an
application to control how media is transmitted on the network.

I wish this argument had been elaborated more, since it seems like
CU-RTC-Web makes interoperability harder, not easier. In
particular, since JSEP is based on existing technologies such as SDP
and ICE, it's relatively easy to build Web applications which gateway
JSEP to SIP or Jingle signaling (indeed, relatively simple prototypes
of these already exist). By contrast, gatewaying CU-RTC-Web signaling
to either of these protocols would require developing an entire
SDP stack, which is precisely the piece that the MS guys are implicitly
arguing is expensive.

Based on Matthew Kaufman's mailing list postings, his concern seems to
be that there are existing endpoints which don't implement some of the
specifications required by WebRTC (principally ICE, which is used to
set up the network transport channels) correctly, and that it will be
easier to interoperate with them if your ICE implementation is written
in JavaScript and downloaded by the application rather than in C++ and
baked into the browser. This isn't a crazy theory, but I think there are
serious open questions about whether it is correct. The basic problem
is that it's actually quite hard to write a good ICE stack (though
easy to write a bad one). The browser vendors have the resources to
do a good job here, but it's less clear that random JS toolkits that
people download will actually do that good a job (especially if they
are simultaneously trying to compensate for broken legacy equipment).
The result of having everyone write their own ICE stack might be good
but it might also lead to a landscape where cross-Web application interop
is basically impossible (or where there are islands of noninteroperable
de facto standards based on popular toolkits or even popular toolkit
versions).

A lot of people's instincts here seem to be based on an environment
where updating the software on people's machines was hard but
updating one's Web site was easy. But about half of the population
of browsers (Chrome and Firefox) do rapid auto-updates, so they
actually are generally fairly modern. By contrast, Web applications
often use downrev versions of their JS libraries (I wish I had survey
data here, but it's easy to see just by opening up a JS debugger
on your favorite sites). It's not at all clear that the
"JS is easy to upgrade/native is hard" dynamic holds up any more.

TOO TIED TO SPECIFIC MEDIA FORMATS AND CODECS
The proposal says:

A successful standard cannot be tied to individual codecs, data
formats or scenarios. They may soon be supplanted by newer versions,
which would make such a tightly coupled standard obsolete just as
quickly. The right approach is instead to support multiple media
formats and to bring the bulk of the logic to the application layer,
enabling developers to innovate.

I can't make much sense of this at all. JSEP, like the standards that
it is based on, is agnostic about the media formats and codecs that
are used. There's certainly nothing in JSEP that requires you to use
VP8 for your video codec, Opus for your audio codec, or anything
else. Rather, two conformant JSEP implementations will converge on a
common subset of interoperable formats. This should happen
automatically without Web application intervention.

Arguably, in fact, CU-RTC-Web is *more* tied to a given codec because
the codec negotiation logic is implemented either on the server or in
the JavaScript. If a browser adds support for a new codec, the Web
application needs to detect that and somehow know how to prioritize it
against existing known codecs. By contrast, when the browser
manufacturer adds a new codec, he knows how it performs compared to
existing codecs and can adjust his negotiation algorithms accordingly.
Moreover, as discussed below, JSEP provides (somewhat clumsy)
mechanisms for the user to override the browser's default choices.
These mechanisms could probably be made better within the JSEP
architecture.

Based on Matthew Kaufman's interview with Janko Roettgers
[http://gigaom.com/2012/08/06/microsoft-webrtc-w3c/],
it seems like
this may actually be about the proposal to have a mandatory-to-implement
video codec (the leading candidates seem to be H.264 or
VP8). Obviously, there have been a lot of arguments about whether
such a mandatory codec is required (the standard argument in favor
of it is that then you know that any two implementations have
at least one codec in common), but this isn't really a matter
of "tightly coupling" the codec to the standard. To the contrary,
if we mandated VP8 today and then next week decided to mandate
H.264 it would be a one-line change in the specification.
In any case, this doesn't seem like a structural argument about
JSEP versus CU-RTC-Web. Indeed, if IETF and W3C decided to ditch
JSEP and go with CU-RTC-Web, it seems likely that this wouldn't
affect the question of mandatory codecs at all.

THE INCONVENIENCE OF SDP EDITING
Probably the strongest point that the MS authors make is that if the
API doesn't explicitly support doing something, the situation is kind
of gross:

In particular, the negotiation model of the API relies on the SDP
offer/answer model, which forces applications to parse and generate
SDP in order to effect a change in browser behavior. An application
is forced to only perform certain changes when the browser is in
specific states, which further constrains options and increases
complexity. Furthermore, the set of permitted transformations to SDP
are constrained in non-obvious and undiscoverable ways, forcing
applications to resort to trial-and-error and/or browser-specific
code. All of this added complexity is an unnecessary burden on
applications with little or no benefit in return.

What this is about is that in JSEP you call CreateOffer() on a
PeerConnection in order to get an SDP offer. This doesn't actually
change the PeerConnection state to accommodate the new offer; instead,
you call SetLocalDescription() to install the offer. This gives
the Web application the opportunity to apply its own preferences
by editing the offer. For instance, it might delete a line containing
a codec that it didn't want to use. Obviously, this requires a lot
of knowledge of SDP in the application, which is irritating to say
the least, for the reasons in the quote above.
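
To give a flavor of what this "SDP editing" looks like, here is a hedged sketch of stripping a single codec out of an offer before installing it. This is deliberately simplified: a robust implementation would need real SDP parsing and would also have to handle multiple m= sections, fmtp/rtcp-fb attributes for the removed payload types, and so on.

function removeCodec(sdp, codecName) {
  var lines = sdp.split('\r\n');
  // Payload types advertised for the codec, e.g. "a=rtpmap:9 G722/8000" -> "9".
  var pts = lines.filter(function (l) {
    return l.indexOf('a=rtpmap:') === 0 && l.indexOf(codecName) !== -1;
  }).map(function (l) {
    return l.split(':')[1].split(' ')[0];
  });

  return lines.filter(function (l) {
    // Drop the rtpmap/fmtp attribute lines for those payload types.
    return !pts.some(function (pt) {
      return l.indexOf('a=rtpmap:' + pt + ' ') === 0 ||
             l.indexOf('a=fmtp:' + pt + ' ') === 0;
    });
  }).map(function (l) {
    // And remove the payload types from the m= line's format list.
    if (l.indexOf('m=') !== 0) return l;
    var parts = l.split(' ');
    return parts.slice(0, 3).concat(parts.slice(3).filter(function (pt) {
      return pts.indexOf(pt) === -1;
    })).join(' ');
  }).join('\r\n');
}

// pc.createOffer(function (offer) {
//   offer.sdp = removeCodec(offer.sdp, 'G722');
//   pc.setLocalDescription(offer);
// });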

The major mitigating factor is that the W3C/IETF WG members intend to
allow most common manipulations to be made through explicit settings
parameters, so that only really advanced applications need to know
anything about SDP at all. Obviously opinions vary about how good a
job they have done, and of course it's possible to write libraries
that would make this sort of manipulation easier. It's worth noting
that there has been some discussion of extending the W3C APIs to have
an explicit API for manipulating SDP objects rather than just editing
the string versions (perhaps by borrowing some of the primitives in
CU-RTC-Web). Such a change would make some things easier while not
really representing a fundamental change to the JSEP model. However,
it's not clear if there are enough SDP-editing tasks to make this
project worthwhile.

With that said, in order to have CU-RTC-Web interoperate with
existing SIP endpoints at all, you would need to know far more about
SDP than would be required to do most anticipated transformations in a
JSEP environment, so it's not as if CU-RTC-Web frees you from SDP if
you care about interoperability with existing equipment.

SUPPORT FOR NEW/INNOVATIVE APPLICATIONS
Finally, the MSFT authors argue that CU-RTC-Web is more flexible
and/or less brittle than JSEP:

On the other hand, implementing innovative, real-world applications
like security consoles, audio streaming services or baby monitoring
through this API would be unwieldy, assuming it could be made to
work at all. A Web RTC standard must equip developers with the
ability to implement all scenarios, even those we haven't thought
of.

Obviously the last sentence is true, but the first sentence provides
scant support for the claim that CU-RTC-Web fulfills this requirement
better than JSEP. The particular applications cited here, namely audio
streaming, security consoles, and baby monitoring, seem not only
doable with JSEP, but straightforward. In particular, security
consoles and baby monitoring just look like one way audio and/or video
calls from some camera somewhere. This seems like a trivial subset of
the most basic JSEP functionality. Audio streaming is, if anything,
even easier. Audio streaming from servers already exists without any
WebRTC functionality at all, in the form of the audio tag, and audio
streaming from client to server can be achieved with the combination
of getUserMedia and WebSockets. Even if you decided that you wanted to
use UDP rather than WebSockets, audio streaming is just a one-way
audio call, so it's hard to see that this is a problem.

In e-mail
to the W3C WebRTC mailing list, Matthew Kaufman mentions the
use case of handling page reload:

An example would be recovery from call setup in the face of a
browser page reload... a case where the state of the browser must be
reinitialized, leading to edge cases where it becomes impossible with
JSEP for a developer to write Javascript that behaves properly in all
cases (because without an offer one cannot generate an answer, and
once an offer has been generated one must not generate another offer
until the first offer has been answered, but in either case there is
no longer sufficient information as to how to proceed).

This use case, often called "rehydration", has been studied a fair bit
and it's not entirely clear that there is a convenient solution with
JSEP. However, the problem isn't the offer/answer state, which is actually
easily handled, but rather the ICE and cryptographic state, which
are just as troublesome with CU-RTC-Web as they are with JSEP
[for a variety of technical reasons, you can't just reuse the
previous settings here.] So, while rehydration is an issue, it's
not clear that CU-RTC-Web makes matters any easier.

This argument, which should be the strongest of MS's arguments, feels
rather like the weakest. Given how much effort has already gone into
JSEP, both in terms of standards and implementation, if we're going to
replace it with something else that something else should do something
that JSEP can't, not just have a more attractive API. If MS can't
come up with any use cases that JSEP can't accomplish, and if in fact
the use cases they list are arguably more convenient with JSEP than
with CU-RTC-Web, then that seems like a fairly strong argument that we
should stick with JSEP, not one that we should replace it.

What I'd like to see Microsoft do here is describe some applications
that are really a lot easier with CU-RTC-Web than they are with
JSEP. Depending on the details, this might be a more or less convincing
argument, but without some examples, it's pretty hard to see
what considerations other than aesthetic would drive us towards
CU-RTC-Web.

Acknowledgement
Thanks to Cullen Jennings, Randell Jesup, Maire Reavy, and Tim Terriberry
for early comments on this draft.

July 19, 2012

The other day I went to Home Depot to buy some party supplies
(incidentally, check out the party invitation
here
and the bonus Web site
here. It's some of my better work.).
One of the things
I wanted was a set of rope lights. I eventually picked up
three sets of 48' lights at $36.48 each. However, when I went to ring them up (you know
Home Depot is almost all self-check, right?) two rang up at
$62.48.

Looking closely, what happened is that the lights were packaged
in clear plastic clamshell packaging with two paper labels, one in
the front and one in the back. The paper label in the front showed
the 48' lights listed above. The back label (the one with the bar code)
showed 27' LED lights
(LEDs are cooler and cool == expensive).
It took a while for Home Depot to sort the problem out. Customer
service's initial reaction was that someone had returned a set
of the cheap lights but swapped the back labels so that they
could get a larger refund. But then they had some more
lights pulled off the shelf and they were mismatched as well,
so things started to look a bit confused. Eventually, they
just pulled the back labels out of the package (I guess to make
it hard for me to do a return) and sent me
on my way.

Here's the screwed up thing: nobody in this entire transaction was
sure which set of actual lights I had in my hand. The matching
package (the one which had rung up as expected) looked a lot like
the other two packages, but really these things look pretty similar
and after all we didn't know that any of the packages was right.
I offered to take them out and measure them for length, but nobody
seemed interested. So, at the time I walked out the door it seemed
quite possible that Home Depot had sold me $188 worth of lights for
$109. Of course, I assured them that I would bring them back
if they turned out to be the LED lights, but they had no way
of knowing I actually would (or of verifying if I did or not).
I actually tried to explain this several times, but nobody
seemed to care and eventually I gave up and left.

July 16, 2012

One of the most common responses to the Rizzo/Duong "BEAST" attack was
"why not just deploy TLS 1.1?" See, for instance, this incredibly long
Bugzilla
bug about TLS 1.1 in
Network
Security Services (NSS), the SSL/TLS stack used by both Chrome and
Firefox. Unfortunately, while TLS 1.1 deployment is a good idea in and
of itself, it turns out not to be a very useful defense against this
particular attack. The problem isn't that servers don't support TLS 1.1
(though most still don't) but rather that the attacker can force
a client and server which both implement TLS 1.1 to negotiate TLS 1.0
(which is vulnerable).

Background: Protocol Negotiation and Downgrade Attacks
Say we are designing a new protocol to remotely control toasters,
the Toaster Control Protocol (TCP). TCP has a client
controller, the Toaster Control Equipment (TCE), and a
device responsible for toasting the bread, the Toaster Heating
Equipment (THE). We'll start by developing
TCP 1.0, but we expect that as time goes on we'll want to add
new features and eventually we'll want to deploy TCP 2.0. So,
for instance, maybe TCP 1.0 will only support toasters up to
two slots, but TCP 2.0 will add toaster ovens (as has been
widely observed, TCP 3.0 will allow you to send and receive
e-mail). We may also change the protocol encoding between
versions, so TCP 1.0 could have an ASCII representation whereas
TCP 2.0 adds a binary encoding to save bits on the wire.
For obvious reasons, each version doesn't roll out all at
once, so I might want my TCP 2.0 TCE to talk to my TCP 1.0 THE.
Obviously, that communication will be TCP 1.0, but if I
later add a TCP 2.0 toaster oven, I want that to communicate
with my TCE using TCP 2.0.

One traditional way to address this problem is to have some sort
of initial handshake in which each side advertises its capabilities
and they converge on a common version (typically the most recent common
version). So, for instance, my TCE would say "I speak 2.0"
but if the THE says "I only speak 1.0" then you end up
with 1.0. On the other hand if the TCE advertises 2.0 and the
THE speaks 2.0, then you end up with 2.0. As in:

Another common approach is to have individual feature negotiation
rather than version numbers. For instance, the TCE might say
"do you know how to make grilled cheese" and the THE would say
"yes" or "no". In that case, you can roll out individual features
rather than have a big version number jump.
Sometimes, systems will have both types of negotiation,
with the version number indicating a pile of features that
go together and also being able to negotiate individual
features. TLS is actually one such protocol, though the
features are called "extensions" (not an uncommon name for this). So you
get something like:

[Sequence diagram: TCE → THE: "Hello, I do 'toaster oven', 'grilled cheese', 'bagels'"; THE → TCE: "I can do 'bagels'"; TCE → THE: "OK, let's toast some bagels".]
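
In code, this naive style of negotiation amounts to something like the following sketch (all of the names here are made up for illustration):

// Pick the highest version and the feature set that both sides support.
function negotiate(client, server) {
  var version = Math.min(client.maxVersion, server.maxVersion);
  var features = client.features.filter(function (f) {
    return server.features.indexOf(f) !== -1;
  });
  return { version: version, features: features };
}

// negotiate({ maxVersion: 2, features: ['toaster-oven', 'grilled-cheese', 'bagels'] },
//           { maxVersion: 1, features: ['bagels'] })
//   => { version: 1, features: ['bagels'] }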

For non-security protocols, or rather ones where you
don't need to worry about attackers, or rather those where you don't
think you need to worry about attackers, this kind of approach mostly works
pretty well, though there's always the risk that someone will
screw up their side of the negotiation. With protocols
that are security relevant, however, things are a little
different. Let's say that in TCP 2.0 we decide to add
encryption. The negotiation looks pretty much the same as before, except
that now an attacker in the middle can rewrite the hello messages so that
each side believes the other can only do 1.0:

This is what's called a downgrade attack or a bid-down attack.
Even though in principle both sides could do version 2.0 (and an
encrypted channel), the attacker has forced them down to 1.0
(and a clear channel). Similar attacks can be mounted
against negotiation of cryptographic features. Consider,
for instance, the case where we are negotiating cryptographic
algorithms and each side supports both AES (a strong algorithm)
and DES (a weak algorithm), and the attacker forces both sides
down to DES:

[Sequence diagram: TCE → Attacker: "I can do AES, DES"; Attacker → THE: "I can do DES"; THE → TCE: "OK, let's do DES"; traffic is then encrypted with DES.]

There are two basic defenses against this kind of downgrade attack.
The first is for each side to remember the other side's capabilities
and complain if those expectations are violated. So, for instance,
the first time that the TCE and THE communicate, the TCE
notices that the THE can do TCP 2.0 and from then on it refuses
to do TCP 1.0. Obviously, an attacker can downgrade you on the
first communication, but if you ever get a communication without
the attacker in the way, then you are immune from attack
thereafter (at least until both sides upgrade again). This
isn't a fantastic defense for a number of reasons, but it's
more or less the best you can do in the non-cryptographic setting.
In the setting where you are building a security protocol, however,
there's a better solution. Most association-oriented security
protocols (SSL/TLS, IPsec, etc.) have a handshake phase where
they do version/feature negotiation and key establishment, followed
by a data transfer phase where the actual communications happen.
In most such protocols, the handshake phase includes an integrity
check over the handshake messages. So, for instance, in SSL/TLS,
the Finished messages include a Message Authentication
Code (MAC) computed over the handshake and keyed with the
exchanged master secret:
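
(For reference, in TLS 1.2 the Finished value is computed roughly as verify_data = PRF(master_secret, finished_label, Hash(handshake_messages)), so both sides mix the entire handshake transcript, including the version and cipher suite offers, into a check keyed with the master secret.)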

Any tampering with any of the handshake values causes the handshake to
fail. This makes downgrade attacks more difficult: as long as the
weakest shared key exchange protocol and the weakest shared MAC are
sufficiently strong (both of these things are true for TLS), then
pretty much everything else can be negotiated safely, including
features and version numbers.
[Technical note: SSL version 2 didn't have anti-downgrade defenses
and so there are some other anti-downgrade mechanisms in
SSL/TLS as well.]
This is why it's so important to establish a baseline level of
cryptographic security in the first version of the protocol, so
that you can prevent downgrade attacks to the nonsecure version.

Attacks on TLS 1.1 Negotiation
Based on what I said above, it would seem that rolling out TLS 1.1
securely would be no problem. And if everything was perfect, then
that would indeed be true. Unfortunately, everything is not perfect.
In order for version negotiation to work properly, a version X
implementation needs to accept offers of version Y > X
(although of course it will negotiate version X).
However, some nontrivial number of TLS servers and/or intermediaries
(on the order of 1%) will not complete the TLS handshake if TLS 1.1 is offered
(I don't mean they negotiate 1.0 but instead an error is observed).
There are similar problems (though less extensive) with TLS extensions
and with offering TLS 1.0 as opposed to SSLv3.

No browser wants to break on 1% of the sites in the world, so
instead when some browser clients (at least Chrome and Firefox)
encounter a server which throws some error with a modern
ClientHello, they seamlessly fall back to older
versions. I.e., something like this (the exact details of the fallback order depend on the browser):
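
Schematically, the fallback amounts to something like this sketch (the connect() helper is hypothetical; real browsers do this deep inside their network stacks):

// connect() is assumed to throw on *any* handshake failure, whether the
// cause is a version-intolerant server or an attacker forging a RST.
var versionsToTry = ['TLS 1.1', 'TLS 1.0', 'SSLv3'];

function connectWithFallback(host) {
  for (var i = 0; i < versionsToTry.length; i++) {
    try {
      return connect(host, versionsToTry[i]);
    } catch (e) {
      // Can't tell a broken server from an active attacker, so just
      // retry with the next-lower version.
    }
  }
  throw new Error('no supported version worked');
}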

It seems very likely that browsers will continue this behavior for
negotiating TLS 1.1 and/or 1.2.
Here's the problem: this fallback happens outside of the ordinary
TLS version negotiation machinery, so it's not protected by any of
the cryptographic checks designed to prevent downgrade attack.
Any attacker can forge a TCP FIN or RST, thus forcing clients
back to SSLv3, TLS 1.0, or whatever the lowest version they support
is. The attack looks like this:

The underlying problem here is that the various extension mechanisms
for TLS weren't completely tested (or in some cases, specified; extensions in particular weren't
part of SSLv3), and so the browsers have to fall back on ad hoc
feature/version negotiation mechanisms. Unfortunately, those mechanisms,
unlike the official mechanisms, aren't secure against downgrade
attack.1

There is, however, one SSL/TLS negotiation mechanism that
is extremely reliable: cipher suite negotiation. In TLS,
each cipher suite is rendered as a 16-bit number: the client
offers a pile of cipher suites and the server selects the
one it likes. Because new cipher suites are introduced
fairly regularly, and ignoring unknown suites is so easy,
this mechanism has gotten a lot of testing, and it works
pretty well, even through nearly all intermediaries. The result
is that if you really need to have downgrade attack resistance,
you need to put something in the cipher suites field. This is
the idea behind the Signaling Cipher Suite Value used
by the TLS Renegotiation Indication Extension [RFC 5746].
Recently, there have been
several proposals that are intended
to indicate TLS 1.1 and/or extension support in the cipher suite
field. The idea here is to allow detection of version rollback
attacks. Once you can detect version rollback, then you can
use the ordinary handshake anti-tampering mechanisms to detect
removal of extensions.2

The bad news about these mechanisms is that they require upgrading
the server to detect the new cipher suite. On the other hand, they
can be incrementally deployed.
(Yngve Pettersen has a client-side only proposal which leverages
the RI SCSV to a similar end, but relies on the assumption that
any server which does RI is modern enough to handle extensions
properly).

What's the lesson here? Minimally, this kind of negotiation facility
needs to be clearly specified from the start and then extensively
tested (and hopefully exercised as soon as possible). Once you've
got a significant installed base of noncompliant implementations,
it gets very difficult to distinguish a noncompliant peer from
a downgrade attack and thus problematic to refuse to connect to
apparently noncompliant peers.

1 Note that this isn't always a big deal. Consider, for
instance, the TLS Server Name Indication message, which allows a server
to host multiple HTTPS sites on the same IP. The attacker could force
an SNI downgrade, but this will generally just cause a connection
failure, which they could easily have done by forging an
RST for every connection. Downgrade attacks are mostly an issue
when the attacker is forcing you to a weaker security posture, rather
than just breaking stuff.

April 9, 2012

The IETF RTCWEB WG has been operating on a fast track with
an interim meeting between each IETF meeting.
Since we needed to schedule a lot of meetings, I
thought it might be instructive to try to analyze a bunch
of different locations to figure out the best strategy. Here's
a lightly edited version of my post to the RTCWEB WG trying to
address this issue.

Note that I'm not trying to make any claims about what the best set of
venues is. It's obviously easy to figure out any statistic we want
about each proposed venue, but how you map that data to "best" is a
much more difficult problem. The space is full of Pareto optima,
and even if we ignore the troubling philosophical question of
interpersonal utility comparisons, there's some tradeoff
between minimal total travel time and a "fair" distribution of travel
times (or at least an even distribution).

METHODOLOGY
The data below is derived by treating both people and venues as
airport locations and using travel time as our primary instrument.

For each responder to the current Doodle poll, assign a home
airport based on their draft publication history. We're missing a
few people, but basically it should be pretty complete. Since
these people responded before the venues were known, this is at
least somewhat unbiased.

Compute the shortest advertised flight between each home airport
and the locations for each venue by looking at the shortest
advertised Kayak flights around one of the proposed interim
dates (6/10 - 6/13), ignoring price, but excluding "Hacker fares".
[Thanks to Martin Thomson for helping me gather these.]

This lets us compute statistics for any venue and/or combination
of venues, based on the candidate attendee list.
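
In code, the per-venue statistics come down to something like the following sketch (the hours[attendee][venue] data layout is an assumption for illustration):

// hours[attendee][venue] = round-trip travel time in hours.
function venueStats(hours, venue) {
  var times = Object.keys(hours).map(function (who) { return hours[who][venue]; });
  var mean = times.reduce(function (a, b) { return a + b; }, 0) / times.length;
  var sorted = times.slice().sort(function (a, b) { return a - b; });
  var median = sorted[Math.floor(sorted.length / 2)];
  var variance = times.reduce(function (a, t) {
    return a + (t - mean) * (t - mean);
  }, 0) / times.length;
  return { mean: mean, median: median, sd: Math.sqrt(variance) };
}

// For a rotation like SFO/BOS/ARN, average each attendee's three round trips
// first and then compute the same statistics over those per-attendee averages.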

The three proposed venues:

San Francisco (SFO)

Boston (BOS)

Stockholm (ARN)

Three hubs not too distant from the proposed venues:

London (LHR)

Frankfurt (FRA)

New York (NYC) (treating all NYC airports as the same location)

Also, Calgary (YYC), since the other two chair locations (BOS and SFO)
were already proposed as venues, and I didn't want Cullen to feel
left out.

RESULTS
Here are the results for each of the above venues, measured in total
hours of travel (i.e., round trip).

XXX/YYY/ZZZ is a three-way rotation of XXX, YYY, and ZZZ. Obviously, mean
and median are intended to be some sort of aggregate measure of travel
time. I don't have any way to measure "fairness", but SD is intended
as some metric of the variation in travel time between attendees.

This was a quick hack, so there may be errors here, but nobody has pointed
out any yet.

OBSERVATIONS
Obviously, it's hard to know what the optimal solution is without
some model for optimality, but we can still make some observations
based on this data:

If we're just concerned with minimizing total travel time, then we
would always meet in New York, since it has both the shortest mean travel
time and the shortest median travel time, but as I said above, this
arguably isn't fair to people who live either in Europe or California,
since they always have to travel.

Combining West Coast, East Coast, and European venues gives
comparable (or at least not too much worse) mean/median values to those of
NYC, with much lower SDs. So, arguably that kind of mix is more fair.

There's a pretty substantial difference between hub and non-hub
venues. In particular, LHR has a median travel time 7 hours less than
ARN, and the SFO/NYC/LHR combination has a median/mean travel time
about 2 hours less than SFO/BOS/ARN (primarily accounted for by the
LHR/ARN difference). [Full disclosure, I've favored Star Alliance hubs
here, but you'd probably get similar results if, for instance, you
used AMS instead of LHR.]

Obviously, your mileage may vary based on your location and feelings
about what's fair, but based on this data, it looks to me like a
three-way rotation between West Coast, East Coast, and European hubs
offers a good compromise between minimum cost and a flat distribution
of travel times.

March 3, 2012

Something annoying but also instructive happened during my build of
Chromium today. Everything started when I checked out a clean version
and went to do a build, only to be greeted with the following
exciting error:

Luckily, I've run into this problem before so I know what the problem is.
The script third_party/WebKit/Source/WebCore/WebCore.gyp/mac/adjust_visibility.sh,
which does some library mangling, uses file to determine
what kind of library it's dealing with. Unfortunately, it invokes
file with an unqualified name, and since MacPorts
wants to put itself at the beginning of PATH this
means that you get the file implementation from MacPorts which
has a slightly different output than the system file. The result
is that adjust_visibility.sh decides that you
have a thin version of libWebKit...a and tries
to run ar on it. When ar fails, so does
the build.

The fix here is to move MacPorts below /usr/bin in your
path. I'd already done this (or so I thought) but it
turned out that MacPorts had inserted itself twice in .cshrc,
so I had to edit .cshrc and then run source .cshrc.
I did this, and after correcting a typo things looked good, and I
went to rerun the build, only to be greeted with:

I know what you're thinking here—or at least what I thought—someone
forgot to #include <set> and for some reason the automated
builds didn't catch it, perhaps due to some conditional compilation problem
getting triggered on Lion. But checking the source quite clearly showed
that set was being included. Moreover, other STL containers
like vector work fine. Changing from clang to GCC didn't
help here, so eventually I resorted to gcc -E. For those of
you who don't know, this runs the preprocessor but not the compiler and
so is really useful for diagnosing this kind of include error. Here's
the relevant portion of the result:

# 18 "./base/file_util.h" 2
# 1 "./set" 1
# 24 "./base/file_util.h" 2

It's a little hard to read, but if you know what to look for, it's telling you
that instead of including set from /Developer, where
the system include files live, the compiler is getting it from the
local directory. Now, you might ask what the heck a file named
set is doing in the local directory, especially as
when I looked it was totally empty. Naturally, it was
my fault, but it took a minute to realize what. Remember I said that
I had to correct a typo in .cshrc but now what the typo
was. Well, the problem was that I had written:

>set OSVER=`uname -r`

Instead of

set OSVER=`uname -r`

Of course, when I ran this it created a file called set in
the current directory and since the compile flags included the
current directory in the include path, the compiler duly included
it instead of the system include file. And since the file was
empty, there wasn't any definition of std::set
and we got a compile error. Time wasted by this error: 11 minutes
(not including writing this up).

February 25, 2012

As long-time EG readers will know, I've complained in the past that my
Prius has a feeble starter/electronics battery which is easy to
run down even by leaving the interior lights on. This despite the fact that the
Prius has a huge battery running the hybrid system to draw on. But I
certainly wouldn't want this: Michael DeGusta reports
that if you leave your Tesla parked for a long time (like months),
then the car bleeds enough power off of the battery to run the
auxiliary vehicle systems [parasitic load]
to drain it down into deep discharge
(and hence damage to the battery) territory:

A Tesla Roadster that is simply parked without being plugged in will
eventually become a "brick". The parasitic load from the car's
always-on subsystems continually drains the battery and if the
battery's charge is ever totally depleted, it is essentially
destroyed. Complete discharge can happen even when the car is plugged
in if it isn't receiving sufficient current to charge, which can be
caused by something as simple as using an extension cord. After
battery death, the car is completely inoperable. At least in the case
of the Tesla Roadster, it's not even possible to enable tow mode,
meaning the wheels will not turn and the vehicle cannot be pushed nor
transported to a repair facility by traditional means.

The amount of time it takes an unplugged Tesla to die varies. Tesla's
Roadster Owners Manual [Full Zipped PDF] states that the battery
should take approximately 11 weeks of inactivity to completely
discharge [Page 5-2, Column 3: PDF]. However, that is from a full 100%
charge. If the car has been driven first, say to be parked at an
airport for a long trip, that time can be substantially reduced. If
the car is driven to nearly its maximum range and then left unplugged,
it could potentially "brick" in about one week.1 Many other scenarios
are possible: for example, the car becomes unplugged by accident, or
is unwittingly plugged into an extension cord that is defective or too
long.

When a Tesla battery does reach total discharge, it cannot be
recovered and must be entirely replaced. Unlike a normal car battery,
the best-case replacement cost of the Tesla battery is currently at
least $32,000, not including labor and taxes that can add thousands
more to the cost.

There's been a lot of controversy about this report
(see, for instance, this defense), but Tesla's response seems to be consistent
with DeGusta's basic argument, as does the letter
that Jalopnik reproduces above:

All automobiles require some level of owner care. For example,
combustion vehicles require regular oil changes or the engine will be
destroyed. Electric vehicles should be plugged in and charging when
not in use for maximum performance. All batteries are subject to
damage if the charge is kept at zero for long periods of
time. However, Tesla avoids this problem in virtually all instances
with numerous counter-measures. Tesla batteries can remain unplugged
for weeks (even months), without reaching zero state of charge. Owners
of Roadster 2.0 and all subsequent Tesla products can request that
their vehicle alert Tesla if SOC falls to a low level. All Tesla
vehicles emit various visual and audible warnings if the battery pack
falls below 5 percent SOC. Tesla provides extensive maintenance
recommendations as part of the customer experience.

At present, then, the agreed upon facts seem to be that:

If you leave the Tesla's batteries at zero charge, battery
damage occurs.

If you leave a Tesla unplugged for long enough, even
with a charged battery, parasitic load from the vehicle
systems will eventually consume the battery's charge,
leaving you in state (1) above. [Note that this appears
to exceed the Lithium-Ion self-discharge rate, so it
likely is parasitic load.]

The controversy really seems to be about whose fault
this is, namely whether the customer should have known better,
whether Tesla notified them correctly, etc. I don't have
a Tesla so I don't care about that. I'm much more interested
in the engineering question of what's going on and what,
if anything, can be done about it.

The parasitic load thing isn't totally unfamiliar territory, of
course. Any modern vehicle has electronics and those need
power, which they get from the battery. Some do a better
job than others.
My BMW R1200GS motorcycle, for instance, has this
problem and the manual explicitly tells you to connect it to
a trickle charger (an expensive BMW model, of course, though
you can use a standard one if you're willing to do a tiny
bit of work) if you're not going to drive it for a while,
and I duly plug it into the wall whenever I get home.
If you don't do that, however, the worst you're going to be
out is a new lead-acid battery, which, depending on what
vehicle you have, leaves you out something like
$50-$200, not $40,000.

However, the level of load we're talking about here
seems awfully high. Remember that we're talking about a
battery capable of powering your car for 200 miles or
so on a single charge (53 kWh). In order to deplete
the battery in 11 weeks (~2000 hrs) you would need
continuous battery consumption of around 30 W.
For comparison, a Macbook Air has a 50 Wh battery
and gets something like 5 hours on a charge (call it 10 W), so it's
like the Tesla is running roughly three Airs at once 24x7.
It's natural to ask where all that power is
going, since you don't need anywhere near that
much to keep a vehicle on standby. One likely source seems
to be the battery cooling system, of which Wikipedia
says
"Coolant is pumped continuously through the ESS both when the car is running and when the car is turned off if the pack retains more than a 90% charge. The coolant pump draws 146 watts."
[Original reference and long discussion here.
Note that this post is due to Martin Eberhard, one of the Tesla
Founders but apparently no longer with the company at the time he wrote it. Thanks
Wayback Machine for preserving this!].

Obviously, if you have a load this high, then you're going
to deplete the battery. The question then becomes whether
there is some way of avoiding permanent battery damage as
the depletion gets to dangerous levels. The natural
thing to do is install some sort of cutoff that turns
off all power drain once you get close to that level.
This may end up blowing away a bunch of the car's
configuration (though really, it's not that hard to
store that stuff in flash memory, even though
historically manufacturers have tended not to), but
surely it's cheaper to reboot your car than replace
the entire battery pack. However, if the power is
going to the cooling system and the cooling system
is doing something important, like keeping the
battery from being damaged by excessive heat, then
this may not help.

Oh, one more thing. DeGusta claims that Tesla has the capability
to remotely monitor the battery and locate the car, and has
sent people out to fix it:

In at least one case, Tesla went even further. The Tesla service
manager admitted that, unable to contact an owner by phone, Tesla
remotely activated a dying vehicle's GPS to determine its location and
then dispatched Tesla staff to go there. It is not clear if Tesla had
obtained this owner's consent to allow this tracking5, or if the owner
is even aware that his vehicle had been tracked. Further, the service
manager acknowledged that this use of tracking was not something they
generally tell customers about.

February 11, 2012

Cryptography is great, but it's not so great if you get arrested and
forced
to give up your cryptographic keys. Obviously, you could claim that you've forgotten them
(remember that you need a really long key to thwart exhaustive
search attacks, so this isn't entirely implausible.) However, since
you also need to regularly be able to decrypt your data,
this means you need to be able to remember your password, so it's
not entirely plausible either, which means that you might end up
sitting in jail for a long time due to a contempt citation.
This general problem has been floating around the cryptographic
community for a long time, where it's usually referred to as
"rubber hose cryptanalysis", with the idea being that the attacker
will torture you (i.e., beat you with a rubber hose) until you
give up the key. This xkcd comic
sums up the problem. Being technical people, cryptographers have put a lot
of work into technical solutions, none of which are really
fantastic (see the Wikipedia
deniable encryption
page for one summary).

Threat model
As usual, it's important to think about the threat model, which in
this case is more complicated than it initially seems. We assume
that you have some encrypted data and that the attacker has
a copy of that data and of the encryption software you have
used. All they lack is the key. The attacker insists you
hand over the key and has some mechanism for punishing you
if you don't comply. Moreover, we need to assume that the attacker
isn't a sadist, so as long as there's no point in punishing you
further they won't. It's this last point that is the key to
all the technical approaches I know of, namely convincing the
attacker that they are unlikely to learn anything more by
punishing you further, so they might as well stop. Of course,
how true that assumption is probably depends on the precise
nature of the proceedings and how much it costs the attacker
to keep inflicting punishment on you. If you're being waterboarded
in Guantanamo, the cost is probably pretty low, so you probably
need to be pretty convincing.

Technical Approaches
Roughly speaking, there seem to be two strategies for dealing with
the threat of being legally obliged to give up your cryptographic
keys:

Apparent Compliance/Deniable Encryption.

Verifiable Destruction

Apparent Compliance/Deniable Encryption
The idea behind an apparent compliance strategy is that you
pretend to give up your encryption key, but instead you give
up another key that decrypts the message to an innocuous
ciphertext. More generally, you want a cryptographic
scheme which produces a given ciphertext C which maps onto a series of
plaintexts M_1, M_2, ... M_n via a set of keys
K_1, K_2, ... K_n. Assume for the moment that
only M_n is sensitive and M_1, ... M_n-1 are either fake
or real (but convincing) non-sensitive data. So, when you
are captured, you reveal K_1 and claim that you've
decrypted the data. If really pressed, you reveal
K_2 and so on.

The reason that this is supposed to work is that the
attacker is assumed to not know n. However,
since they have a copy of your software, they presumably
know that it's multilevel capable, so they know that
there may be more than one key. They just don't know
if you've given them the last key. All the difficult
cryptographic problems are about avoiding revealing
n. There are fancy cryptographic ways to do this
(the original paper on this is by
Canetti,
Dwork, Naor, and Ostrovsky), but
consider one simple construction. Take each message
M_i and encrypt it with K_i to form
C_i and then
concatenate all the results to form C. The
decryption procedure given a single key is to decrypt
each of the sub-ciphertexts in turn and discard any
which don't decrypt correctly (assume there is some
simple integrity check.) Obviously, if you have
a scheme this trivial, then it's easy for an attacker
to see how many keys there are just by insisting you
provide keys for all the data, so you also pad C
with a bunch of random-appearing data which you really
can't decrypt at all, which in theory creates plausible
deniability. This is approximately what
TrueCrypt
does:

Until decrypted, a TrueCrypt partition/device appears to consist of
nothing more than random data (it does not contain any kind of
"signature"). Therefore, it should be impossible to prove that a
partition or a device is a TrueCrypt volume or that it has been
encrypted (provided that the security requirements and precautions
listed in the chapter Security Requirements and Precautions are
followed). A possible plausible explanation for the existence of a
partition/device containing solely random data is that you have wiped
(securely erased) the content of the partition/device using one of the
tools that erase data by overwriting it with random data (in fact,
TrueCrypt can be used to securely erase a partition/device too, by
creating an empty encrypted partition/device-hosted volume within it).

How well this works goes back to your threat model. The attacker
knows there is some chance that you haven't revealed
all the keys and maybe if they punish you further you will give them
up. So, whether you continue to get punished depends on their
cost/benefit calculations, which may be fairly unfavorable to
you. The problem is worse yet if the attacker has any way
of determining what correct data looks like. For instance,
in one of the early US court cases on this,
In re Boucher,
customs agents
had seen (or at least claimed to have seen) child pornography
on the defendant's hard drive and so would presumably have known
a valid decryption from an invalid one. Basically, in any setting
where the attacker has a good idea of what they are looking for
and/or can check the correctness of what you give them, a deniable
encryption scheme doesn't work very well, since the
whole scheme relies on uncertainty about when you have actually
given up the last key.
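To make the simple concatenation construction above concrete, here is a minimal sketch in Python. It leans on the third-party cryptography package's Fernet recipe purely as a convenient authenticated cipher (so "doesn't decrypt correctly" is just a failed integrity check); the function names are mine, and since real Fernet tokens are distinguishable from random bytes, this illustrates the decrypt-and-discard logic but not the indistinguishability a real deniable scheme needs.

# A minimal sketch (not a real deniable-encryption scheme) of the
# concatenation construction above. Uses the third-party 'cryptography'
# package's Fernet recipe as the per-message cipher with a built-in
# integrity check.
import os
import base64
from cryptography.fernet import Fernet, InvalidToken

def build_volume(messages_and_keys, n_padding=4):
    """messages_and_keys: list of (plaintext bytes, Fernet key) pairs.
    Returns a blob with one segment per message plus random padding
    segments that decrypt under no key at all."""
    segments = [Fernet(k).encrypt(m) for m, k in messages_and_keys]
    for _ in range(n_padding):
        # Filler of roughly the same shape as a real segment. NOTE: real
        # Fernet tokens are distinguishable from random bytes, so this
        # shows the logic, not the actual deniability property.
        segments.append(base64.urlsafe_b64encode(os.urandom(96)))
    return b"\n".join(segments)

def decrypt_with_key(volume, key):
    """Decryption under a single key: try every segment, keep the ones
    whose integrity check passes, silently discard the rest."""
    f = Fernet(key)
    recovered = []
    for segment in volume.split(b"\n"):
        try:
            recovered.append(f.decrypt(segment))
        except InvalidToken:
            pass  # wrong key or padding: skip it
    return recovered

k1, k2 = Fernet.generate_key(), Fernet.generate_key()
vol = build_volume([(b"grocery list", k1), (b"the real secret", k2)])
print(decrypt_with_key(vol, k1))  # [b'grocery list']
print(decrypt_with_key(vol, k2))  # [b'the real secret']

A serious design would also make every segment (and the padding) indistinguishable from random data and randomize the segment order and count, which is where all the actual difficulty lives.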

Verifiable Destruction
An alternative approach that doesn't rely on this kind of ambiguity is
to be genuinely unable to decrypt the data and to have some way
of demonstrating this to the attacker. Hopefully, a rational
attacker won't continue to punish you once you've demonstrated that you
cannot comply. It's the demonstrating part that's the real problem here.
Kahn and Schelling famously
sum up the problem of how to win at "chicken":

Some teenagers utilize interesting tactics in playing "chicken."
The "skillful" player may get into the car quite drunk, throwing
whiskey bottles out the window to make it clear to everybody
just how drunk he is. He wears dark glasses so that it is
obvious that he cannot see much, if anything. As soon as the
car reaches high speed, he takes the steering wheel and throws
it out the window. If his opponent is watching, he has won.
If his opponent is not watching, he has a problem;

Of course, as Allan Schiffman once pointed out to me, the really
skillful player keeps a spare steering wheel in his car and throws
that out the window. And our problem is similar: demonstrating that
you have thrown out the data and/or key and that you don't have a spare
lying around somewhere.

The technical problem then becomes constructing a system that
actually works. There are a huge variety of potential technical
options here, but at a high-level, it seems like solutions
fall into two broad classes, active and passive. In an active
scheme, you actively destroy the key and/or the data.
For instance, you could have the key written on a piece of paper
which you eat, or there is a thermite charge on your computer
which melts it to slag when you press a button. In a passive
system, by contrast, no explicit action is required by you,
but you have some sort of deadman switch which causes the
key/data to be destroyed if you're captured. So, you might
store the data in a system like Vanish (although there are real questions about the security of Vanish per se),
or you have the key stored offsite with some provider who promises
to delete the key if you are arrested or if you don't check in every
so often.

I'm skeptical of how well active schemes can be made to work:
once it becomes widely known how any given commercial scheme works,
attackers will take steps to circumvent it. For instance,
if there is some button you press to destroy your data,
they might taser you first and ask questions later, to keep you from
pressing it. Maybe someone can convince me otherwise, but this
leaves us mostly with passive schemes (or semi-passive schemes
as discussed in a bit.) Consider the following strawman scheme:

Your data is encrypted in the usual way, but part of the
encryption key is stored offsite in some location inaccessible
to the attacker (potentially outside their legal jurisdiction
if we're talking about a nation-state type attacker).
The encryption key is stored in a hardware security module,
and if the key storage provider doesn't hear from you
(and you have to prove possession of some key) every
week (or two weeks or whatever), they zeroize the HSM, thus
destroying your key. It's obviously easy to build a system
like this where the encryption software automatically contacts
the key storage provider, proves possession, and thus resets
their deadman timer, so as long as you use your files every
week or so, you're fine.
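Here is a rough sketch, in Python, of the deadman-timer bookkeeping the key storage provider in this strawman would run. Everything here is hypothetical: a real deployment would keep the share inside the HSM and do proof of possession with a proper challenge-response signature rather than by comparing a stored token.

# Toy sketch of a key-escrow provider's deadman timer for the strawman
# scheme above. Names and the proof-of-possession mechanism are made up.
import hmac
import time

CHECKIN_INTERVAL = 7 * 24 * 3600  # one week, as in the example above

class EscrowedShare:
    def __init__(self, key_share: bytes, auth_token: bytes):
        self.key_share = key_share       # the offsite part of the key
        self.auth_token = auth_token     # stands in for proof of possession
        self.last_checkin = time.time()

    def checkin(self, presented_token: bytes) -> bool:
        """Client proves possession; on success the deadman timer resets."""
        if self.key_share and hmac.compare_digest(presented_token, self.auth_token):
            self.last_checkin = time.time()
            return True
        return False

    def expire_if_due(self) -> None:
        """Run periodically by the provider: destroy the share once the
        timer lapses, so the data becomes unrecoverable by anyone."""
        if self.key_share and time.time() - self.last_checkin > CHECKIN_INTERVAL:
            self.key_share = b""         # zeroize (inside a real HSM)

    def release_share(self) -> bytes:
        if not self.key_share:
            raise KeyError("share destroyed; data unrecoverable")
        return self.key_share

The client side is then just the encryption software calling checkin whenever you use your files, so the timer only runs out if you (or the people holding you) stop doing so.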

So, if you're captured, you just need to hold out until the
deadman timer expires and then the data really isn't recoverable
by you or anyone else. Of course, "not recoverable" isn't the
same as "provably not recoverable", since you could have kept
a backup copy of the keys somewhere—though the software could
be designed in a way that this was inconvenient, thus giving
some credibility to the argument that you did not. Moreover,
this design is premised on the assumption that there is actually
somewhere that you could store your secret data that the attacker
couldn't get it from. This may be reasonable if the attacker
is the local police, but perhaps less so if the attacker
is the US government. And of course any deadman system is
hugely brittle: if you forget your key or just don't refresh
for a while, your data is gone, which might be somewhat
inconvenient.

One thing that people often suggest is to have some sort of
limited-try scheme. The idea here is that the encryption
system automatically erases the data (and/or a master key)
if the wrong password/key is entered enough times. So, if you
can just convincingly lie N times and get the attacker
to try those keys, then the data is gone. Alternately, you
could have a "coercion" key which deletes all the data.
It's clear that you can't build anything like this in a software-only
system: the attacker will just image the underlying encrypted
data and write their own decryption software which doesn't
have the destructive feature. You can, however, build such
a system using hardware security modules (assume for now that
the HSM can't be broken directly.)
This is sort of a semi-passive scheme in that you are intentionally
destroying the data, but the destruction is produced by the
attacker keying in the alleged encryption key.
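As a sketch of the limited-try idea, here is roughly what the logic enforced inside the HSM might look like, written in Python for readability. The whole point is that this state and the master key live in tamper-resistant hardware, since the same logic in software could simply be bypassed as described above; the attempt limit, names, and use of plain SHA-256 (rather than a proper password KDF) are purely illustrative.

# Rough sketch of limited-try / coercion-key behavior as enforced inside
# a (hypothetical) HSM. Not a real design.
import hmac
import hashlib

class LimitedTryVault:
    MAX_ATTEMPTS = 5

    def __init__(self, password: bytes, coercion_password: bytes, master_key: bytes):
        # A real design would use a proper password KDF, not bare SHA-256.
        self._pw_hash = hashlib.sha256(password).digest()
        self._coercion_hash = hashlib.sha256(coercion_password).digest()
        self._master_key = master_key
        self._failures = 0

    def _wipe(self):
        self._master_key = None          # in real hardware: zeroize

    def unlock(self, candidate: bytes) -> bytes:
        if self._master_key is None:
            raise RuntimeError("master key destroyed")
        digest = hashlib.sha256(candidate).digest()
        if hmac.compare_digest(digest, self._coercion_hash):
            self._wipe()                 # "coercion" password: burn everything
            raise RuntimeError("master key destroyed")
        if hmac.compare_digest(digest, self._pw_hash):
            self._failures = 0
            return self._master_key
        self._failures += 1
        if self._failures >= self.MAX_ATTEMPTS:
            self._wipe()                 # too many wrong guesses: burn it
        raise ValueError("wrong password")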

The big drawback with any verifiable destruction system is that
it leaves evidence that you could have complied but didn't;
in fact, that's the whole point of the system. But this means
that the attacker's countermove is to credibly commit to punishing
you for noncompliance after the fact. I don't think this question
has ever been faced for crypto, but it has been faced in other
evidence-gathering contexts. Consider, for instance, the case
of driving under the influence: California requires you to
take a breathalyzer or blood test as a condition of driving
[*],
and refusal carries penalties comparable to those for being
convicted of DUI. One could imagine a more general legal regime
in which actively or passively allowing your encrypted data
to be destroyed once you have been arrested was itself illegal,
and with a penalty that was large enough that it would almost
never be worth refusing to comply
(obviously the situation would be different in extra-legal
settings, but the general idea seems transferable.) I'll defer
to any lawyers reading this about how practical such a law
would actually be.

Bottom Line
Obviously, neither of these classes of solution seems entirely
satisfactory from the perspective of someone who is trying to keep
their data secret. On the other hand, it's not clear that this is
really a problem that admits of a good technical solution.

January 23, 2012

[16] git checkout f4a56
Note: checking out 'f4a56'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:
git checkout -b new_branch_name
HEAD is now at f4a560b... Foo

As you may have gathered from this long warning,
you most likely don't want to be in a detached
HEAD state; you probably just meant to create a branch or wanted
to roll back a commit but typed the wrong thing. This is why
there are lots of pages about
what this means and how to get yourself out. My contribution to this literature can be
found below the fold.

January 22, 2012

On my way to Red Rock
today to do some work, I looked in my wallet to see if I had enough
money to afford my hot chocolate (paying for a $3.50 drink with a
credit card is a pretty lame move). Here's what I found:

After some sorting, it comes out as follows...

Currency    Count    Value (nominal)    Value (USD)
USD             3                  3           3.00
CAD             7                100          98.55
CZK             2               2100         106.40
GBP             1                 10          15.55
EUR             1                 20          25.79
INR             1                100           1.99
RUB             9               1570          49.97
Total          24                  -         301.25

In other words, out of 24 total pieces of paper valued at over $300,
I had three spendable pieces of paper valued at $3. Oh, and a couple
of United beverage vouchers which expire in 9 days.
I ended up going to the ATM.

January 21, 2012

You've of course heard by now that much of the Internet community
thinks that SOPA
and PIPA are bad, which is why on January 18, Wikipedia shut
itself down, Google had a black bar over their logo, etc. This
opinion is shared by much of the Internet technical
community, and in particular much has been made of the argument
by Crocker et al. that
DNSSEC and PIPA are incompatible. A number of the authors of the statement linked above
are friends of mine, and I agree with much of what they write in
it, but I don't find this particular line of argument that
convincing.

Background
As background, DNS has two kinds of servers:

Authoritative servers, which host the records for a given
domain.

Recursive resolvers, which are used by end users for
name resolution. Typically they also serve as a cache.

A typical configuration is for end-user machines to use
DHCP to
get their network configuration data, including IP address
and the DNS recursive resolvers to use. Whenever your
machine joins a new network, it gets whatever resolver that
network is configured for, which is frequently whatever
resolver is provided by your ISP. One of the requirements
of some iterations of PIPA and SOPA has been that recursive
resolvers would have to block resolution of domains
designated as bad. Here's the relevant text from PIPA:

(i) IN GENERAL- An operator of a nonauthoritative domain name system server shall take the least burdensome technically feasible and reasonable measures designed to prevent the domain name described in the order from resolving to that domain name's Internet protocol address, except that--

(I) such operator shall not be required--

(aa) other than as directed under this subparagraph, to modify its network, software, systems, or facilities;
(bb) to take any measures with respect to domain name lookups not performed by its own domain name server or domain name system servers located outside the United States; or
(cc) to continue to prevent access to a domain name to which access has been effectively disabled by other means; and
...

(ii) TEXT OF NOTICE.-The Attorney General shall prescribe the text of the notice displayed to users or customers of an operator taking an action pursuant to this subparagraph. Such text shall specify that the action is being taken pursuant to a court order obtained by the Attorney General.

This text has been widely interpreted as requiring operators of recursive
resolvers to do one of two things:

Simply cause the name resolution operation to fail.

Redirect the name resolution to the notice specified in (ii).

The question then becomes how one might implement these.

Technical Implementation Mechanisms
Obviously if you can redirect the name, you can cause the
resolution to fail by returning a bogus address, so let's look
at the redirection case first. Crocker et al. argue that DNSSEC is designed
to secure DNS data end-to-end to the user's computer. Thus, any
element in the middle which modifies the DNS records to redirect
traffic to a specific location will break the signature.
Technically, this is absolutely correct. However, it is mitigated
by two considerations.

First, the vast majority of client software doesn't do DNSSEC
resolution. Instead, if you're resolving some DNSSEC-signed
name and the signature is being validated at all it's most likely
being validated by some DNSSEC-aware recursive resolver,
like the ones Comcast has
recently deployed.
Such a resolver can easily modify whatever results it is
returning and that change will be undetectable to the vast
majority of client software (i.e., to any non-DNSSEC software).[1] So, at present, a rewriting requirement looks
pretty plausible.

Crocker et al. would no doubt tell you that this is a transitional
stage and that eventually we'll have end-to-end DNSSEC, so
it's a mistake to legislate new requirements
that are incompatible with that. If a lot of
endpoints start doing DNSSEC validation, then ISPs can't
rewrite undetectably. They can still make names fail to
resolve, though, via a variety of mechanisms. About this,
Crocker et al. write:

Even DNS filtering that did not contemplate redirection would pose
security challenges. The only possible DNSSEC-compliant response to a
query for a domain that has been ordered to be filtered is for the
lookup to fail. It cannot provide a false response pointing to another
resource or indicate that the domain does not exist. From an
operational standpoint, a resolution failure from a nameserver subject
to a court order and from a hacked nameserver would be
indistinguishable. Users running secure applications have a need to
distinguish between policy-based failures and failures caused, for
example, by the presence of an attack or a hostile network, or else
downgrade attacks would likely be prolific.[12]

...

12. If two or more levels of security exist in a system, an attacker
will have the ability to force a "downgrade" move from a more secure
system function or capability to a less secure function by making it
appear as though some party in the transaction doesn't support the
higher level of security. Forcing failure of DNSSEC requests is one
way to effect this exploit, if the attacked system will then accept
forged insecure DNS responses. To prevent downgrade attempts, systems
must be able to distinguish between legitimate failure and malicious
failure.

I sort of agree with the first part of this, but I don't really agree
with the footnote. Much of the problem is that it's generally easy
for network-based attackers to generate situations that simulate
legitimate errors and/or misconfiguration. Cryptographic authentication
actually makes this worse, since there are so many ways to
screw up cryptographic protocols.
Consider the case where the attacker overwrites
the response with a random signature. Naturally the signature
is unverifiable, in which case the resolver's only response is
to reject the records, as prescribed by the DNSSEC standards.
At this point you have effectively blocked resolution of the
name. It's true that the resolver knows that something is wrong
(though it can't distinguish between attack and misconfiguration),
but so what? DNSSEC isn't designed to allow name resolution in the
face of DoS attack by in-band active attackers. Recursive
resolvers aren't precisely in-band, of course, but
the ISP as a whole is in-band, which
is one reason people have talked about ISP-level
DNS filtering for all traffic, not just filtering at recursive
resolvers.

Note that I'm not trying to say here that I think that SOPA and PIPA
are good ideas, or that there aren't plenty of techniques for people
to use to evade them. I just don't think that it's really the case
that you can't simultaneously have DNSSEC and network-based DNS
filtering.

1. Technical note: As I understand it, a
client resolver that wants to validate signatures
itself needs to send the DO flag (to get the recursive
resolver to return the DNSSEC records) and the CD flag
(to suppress validation by the recursive resolver).
This means that the recursive resolver can tell when it's
safe to rewrite the response without being detected.
If DO isn't set, then the client won't be checking signatures.
If CD isn't set, then the recursive resolver can claim
that the name was unvalidatable and generate whatever error
it would have generated in that case (Comcast's deployment
seems to generate SERVFAIL for at least some types of misconfiguration.)
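To summarize the options discussed above and in this footnote, here is a small decision-logic sketch (in Python; the blocklist, function name, and notice-page address are made up) of what a filtering recursive resolver could do depending on the DO and CD bits it sees.

BLOCKED = {"badsite.example."}           # made-up blocklist entry

def blocked_name_response(qname: str, do_bit: bool, cd_bit: bool):
    """Return (action, detail) for a query, per the logic above."""
    if qname not in BLOCKED:
        return ("RESOLVE_NORMALLY", None)
    if not do_bit:
        # Client never sees DNSSEC records, so a rewrite goes unnoticed.
        return ("REWRITE", "192.0.2.1")  # stand-in notice-page address
    if not cd_bit:
        # Client leaves validation to us, so we can claim the name failed
        # to validate and return the usual error (e.g., SERVFAIL).
        return ("SERVFAIL", "claimed validation failure")
    # Client validates for itself: we can't forge a signed answer, so the
    # best available move is to make resolution fail outright.
    return ("SERVFAIL", "resolution fails; client sees breakage or attack")

The point is just that in every branch the blocked name either resolves to the notice page or fails to resolve, and only a validating client can even tell that the failure wasn't ordinary breakage.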