
August 7, 2012

EXECUTIVE SUMMARY
Yesterday, Microsoft published their CU-RTC-Web WebRTC API proposal as an
alternative to the existing W3C WebRTC API being implemented in Chrome
and Firefox. Microsoft's proposal is a "low-level API" proposal which
basically exposes a bunch of media- and transport-level primitives to
the JavaScript Web application, which is expected to stitch them
together into a complete calling system. By contrast to the current
"mid-level" API, the Microsoft API moves a lot of complexity from the
browser to the JavaScript but the authors argue that this makes it more
powerful and flexible. I don't find these arguments that convincing,
however: a lot of them seem fairly abstract and rhetorical, and when
we get down to concrete use cases, the examples Microsoft gives seem
like things that could easily be done within the existing framework.
So, while it's clear that the Microsoft proposal is a lot more work
for the application developer, it's a lot less clear that it's
sufficiently more powerful to justify that additional complexity.

Microsoft's arguments for the superiority of this API fall into
three major categories:

It better fits key "Web tenets", in particular the Web's
traditionally stateless style of application development.

It will be easier to make it interoperate with existing VoIP
endpoints.

It allows the development of applications that would otherwise
be difficult to develop with the existing W3C API.

Like any all-new design, this API has the significant advantage (which
the authors don't mention) of architectural cleanliness. The existing
API is a compromise between a number of different architectural
notions and, like any hybrid proposal, has points of ugliness where
those notions come into contact with each other (especially in the
area of SDP.) However, when we actually look at functionality rather
than elegance, the advantages of an all-new design---not only one
which is largely not based on preexisting technologies but one which
involves discarding most of the existing work on WebRTC itself---start
to look fairly thin.

Looking at the three claims listed above: the first seems more
rhetorical than factual. It's certainly true that in the early
years of the Web designers strove to keep state out of the Web
browser, but that hasn't been the case with rich Web applications
for quite some time. To the contrary, many modern HTML5 technologies
(localstore, WebSockets, HSTS, WebGL) are about pushing state onto the
browser from the server.

The interoperability argument is similarly weakly supported.
Given that JSEP is based on existing VoIP technologies, it
seems likely that it is easier to make it interoperate with
existing endpoints since it's not first necessary to implement
those technologies (principally SDP) in JavaScript before
you can even try to interoperate. The idea here seems to be that
it will be easier to accommodate existing noncompliant endpoints
if you can adapt your Web application on the fly, but given the
significant entry barrier to interoperating at all, this
seems like an argument that needs rather more support than
MS has currently offered.

Finally, with regard to the question of the flexibility/JavaScript
complexity tradeoff, it's somewhat distressing that the specific
applications that Microsoft cites (baby monitoring, security cameras,
etc.) are so pedestrian and easily handled by JSEP. This isn't, of
course, to say that there are no as-yet-unenvisioned applications
which JSEP would handle badly, but it rather undercuts this
argument if the only examples you cite in support of a new design are
those which are easily handled by the old one.

None of this is to say that CU-RTC-Web wouldn't be better in some
respects than JSEP. Obviously, any design has tradeoffs and as I
said above, it's always appealing to throw all that annoying legacy
stuff away and start fresh. However, that also comes with a lot of
costs and before we consider that we really need to have a far
better picture of what benefits other than elegance starting
over would bring to the table.

BACKGROUND
More or less everyone agrees about the basic objectives of the WebRTC
effort: to bring real-time communications (i.e., audio, video, and
direct data) to browsers. Specifically, the idea is that Web
applications should be able to use these capabilities directly. This
sort of functionality was of course already available either via
generic plugins such as Flash or via specific plugins such as Google
Talk, but the idea here was to have a standardized API that was built
into browsers.

In spite of this agreement about objectives, from the beginning there
was debate about the style of API that was appropriate, and in particular
how much of the complexity should be in the browser and how much in
the JavaScript. The initial proposals broke down into two main flavors:

High-level APIs — essentially a softphone in the browser. The Web
application would request the creation of a call (perhaps with some
settings as to what kinds of media it wanted) and then each browser
would emit standardized signaling messages which the Web application
would arrange to transit to the other browser. The original WHATWG
HTML5/PeerConnection spec was of this type.

Low-level APIs — an API which exposed a bunch of primitive
media and transport capabilities to the JavaScript. A browser that
implemented this sort of API couldn't really do much by itself.
Instead, you would need to write something like a softphone in
JavaScript, including implementing the media negotiation, all the
signaling state machinery, etc. Matthew Kaufman from Microsoft
was one of the primary proponents of this design.

After a lot of debate, the WG ultimately rejected both of these and
settled on a protocol called JavaScript Session Establishment Protocol
(JSEP), which is probably best described as a mid-level API. That
design, embodied in the current specifications
[http://tools.ietf.org/html/draft-ietf-rtcweb-jsep-01,
http://dev.w3.org/2011/webrtc/editor/webrtc.html],
keeps the transport
establishment and media negotiation in the browser but moves a fair
amount of the session establishment state machine into the JavaScript.
While it doesn't standardize signaling, it also has a natural mapping
to a simple signaling protocol as well as to SIP and Jingle, the two
dominant standardized calling protocols. The idea is supposed to be
that it's simple to write a basic application (indeed, a large number
of such simple demonstration apps have been written) but that
it's also possible to exercise advanced features by manipulating
the various data structures emitted by the browser. This is obviously
something of a compromise between the first two classes of proposals.

The decision to follow this trajectory was made somewhere around six
months ago and at this point Google has a fairly mature JSEP
implementation available in Chrome Canary while Mozilla has a less
mature implementation which you could compile yourself but hasn't been
released in any public build.

Disclaimer: I have been heavily involved with both the IETF and
W3C working groups in this area and have contributed significant
chunks of code to both the Chrome and Firefox implementations. I am
also currently consulting for Mozilla on their implementation. However,
the comments here are my own and don't necessarily represent those of any other
organization.

WHAT IS MICROSOFT PROPOSING?
What Microsoft is proposing is effectively a straight low level API.

There are a lot of different API points, and I don't plan to discuss
the API in much detail, but it's helpful to talk about the API
some to get a flavor of what's required to use it.

RealTimeMediaStream -- each RealTimeMediaStream represents a single
flow of media (i.e., audio or video).

RealTimeMediaDescription -- a set of parameters for the
RealTimeMediaStream.

RealTimeTransport -- a transport channel which a RealTimeMediaStream
can run over.

RealTimePort -- a transport endpoint which can be paired with a
RealTimePort on the other side to form a RealTimeTransport.

In order to set up an audio, video, or audio-video session, then, the JS
has to do something like the following:

Acquire local media streams on each browser via the getUserMedia()
API, thus getting some set of MediaStreamTracks.

Create RealTimePorts on each browser for all the local network
addresses as well as for whatever media relays are available/
required.

Communicate the coordinates for the RealTimePorts from each
browser to the other.

On each browser, run ICE connectivity checks for all combinations
of remote and local RealTimePorts.

Select a subset of the working remote/local RealTimePort pairs
and establish RealTimeTransports based on those pairs.
(This might be one or might be more than one depending on
the number of media flows, level of multiplexing, and the
level of redundancy required).

Determine a common set of media capabilities and codecs between
each browser, select a specific set of media parameters, and
create matching RealTimeMediaDescriptions on each browser
based on those parameters.

Attach the remote RealTimeMediaStreams to some local display
method (such as an audio or video tag).
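To make the connectivity-check step above concrete, here is a sketch of the "all combinations of remote and local RealTimePorts" pairing logic in plain JavaScript. The object shapes (family, addr) are invented for illustration; they are not taken from the CU-RTC-Web draft, whose RealTimePort interface is considerably richer.

```javascript
// Sketch of the candidate-pairing step above: form every viable
// remote/local pair, as an ICE implementation does before running
// connectivity checks. The object shapes here are invented for
// illustration, not taken from the CU-RTC-Web draft.
function pairCandidates(localPorts, remotePorts) {
  const pairs = [];
  for (const local of localPorts) {
    for (const remote of remotePorts) {
      // Only candidates of the same address family can be paired.
      if (local.family === remote.family) {
        pairs.push({ local, remote });
      }
    }
  }
  return pairs;
}

const localPorts = [
  { family: "ipv4", addr: "192.0.2.1:5000" },
  { family: "ipv6", addr: "[2001:db8::1]:5000" },
];
const remotePorts = [{ family: "ipv4", addr: "198.51.100.7:6000" }];

// Only the ipv4/ipv4 combination survives.
console.log(pairCandidates(localPorts, remotePorts).length); // 1
```

Even this toy version hints at the work the application takes on: a real ICE stack also has to prioritize the pairs, schedule the checks, and handle retransmissions, all of which JSEP keeps inside the browser.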

For comparison, in JSEP you would do something like:

Acquire local media streams on each browser via the getUserMedia()
API, thus getting some set of MediaStreamTracks.

Create a PeerConnection() and call AddStream() for each of the
local streams.

Create an offer on one browser, send it to the other side,
create an answer on the other side, and send it back to the
offering browser. In the simplest case, this just involves
making some API calls with no arguments and passing the
results to the other side.

The PeerConnection fires callbacks announcing remote media
streams which you attach to some local display method.
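The offer/answer round trip in the JSEP steps above can be modeled as a tiny state machine. This is a toy stand-in for illustration only: real code would call CreateOffer()/CreateAnswer() on a PeerConnection, which generates the actual SDP; the class and state names below are invented.

```javascript
// Toy model of the JSEP offer/answer exchange sketched above.
// Illustrative only; it stands in for the real PeerConnection
// API, which does the SDP generation and state tracking itself.
class ToyPeer {
  constructor() {
    this.state = "stable";
  }
  createOffer() {
    if (this.state !== "stable") throw new Error("offer already pending");
    this.state = "have-local-offer";
    return "offer-sdp"; // a real browser emits actual SDP here
  }
  receiveOffer(offerSdp) {
    if (this.state !== "stable") throw new Error("glare: both sides offered");
    // Collapses SetRemoteDescription + CreateAnswer + SetLocalDescription.
    return "answer-sdp";
  }
  receiveAnswer(answerSdp) {
    if (this.state !== "have-local-offer") throw new Error("unexpected answer");
    this.state = "stable";
  }
}

const caller = new ToyPeer();
const callee = new ToyPeer();
// The Web application relays these strings over its own signaling channel.
const offer = caller.createOffer();
const answer = callee.receiveOffer(offer);
caller.receiveAnswer(answer);
console.log(caller.state); // "stable": the handshake is complete
```

The point of the comparison is that this round trip, plus relaying the two opaque blobs, is essentially all the application logic JSEP requires in the simple case.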

As should be clear, the CU-RTC-Web proposal requires significantly
more complex JavaScript, and in particular requires that JavaScript to
be a lot smarter about what it's doing. In a JSEP-style API, the Web
programmer can be pretty ignorant about things like codecs and
transport protocols, unless he wants to do something fancy, but with
CU-RTC-Web, he needs to understand a lot of stuff to make things work
at all. In some ways, the JSEP approach is a much better fit for the
traditional Web approach of having simple default behaviors which fit
a lot of cases but which can then be customized, albeit in ways that
are sometimes a bit clunky.

Note that it's not like this complexity doesn't exist in JSEP;
it's just been pushed into the browser so that the user doesn't have
to see it. As discussed below, Microsoft's argument is that this
simplicity in the JavaScript comes at a price in terms of flexibility
and robustness, and that libraries will be developed (think jQuery)
to give the average Web programmer a simple experience, so that
they won't have to accept a lot of complexity themselves. However,
since those libraries don't exist, it seems kind of unclear how
well that's going to work.

ARGUMENTS FOR MICROSOFT'S PROPOSAL
Microsoft's proposal and the associated blog post make a number of
major arguments for why it is a superior choice (the proposal just came
out, so there haven't really been any public arguments for why
it's worse). Combining the blog posts, you would get something like
this:

That the current specification violates "fit with key web tenets",
specifically that it's not stateless and that you can only make
changes when in specific states. Also, that it depends on
the SDP offer/answer model.

That it doesn't allow a "customizable response to changing network
quality".

That it doesn't support "real-world interoperability" with
existing equipment.

That it's too tied to specific media formats and codecs.

That JSEP requires a Web application to do some frankly inconvenient
stuff if it wants to do something that the API doesn't have explicit
support for.

That it's inflexible and/or brittle with respect to new applications
and in particular that it's difficult to implement some specific
"innovative" applications with JSEP.

Below we examine each of these arguments in turn.

FITTING WITH "WEB TENETS"
MS writes:

Honoring key Web tenets-The Web favors stateless interactions which
do not saddle either party of a data exchange with the
responsibility to remember what the other did or expects. Doing
otherwise is a recipe for extreme brittleness in implementations;
it also raises considerably the development cost which reduces the
reach of the standard itself.

This sounds rhetorically good, but I'm not sure how accurate it is.
First, the idea that the Web is "stateless" feels fairly anachronistic
in an era where more and more state is migrating from the server to
the browser. To pick two examples, WebSockets involves forming a fairly
long-term stateful two-way channel between the browser and the server,
and localstore/localdb allow the server to persist data
semi-permanently on the browser.
Indeed, CU-RTC-Web requires forming a nontrivial amount of state on
the browser in the form of the RealTimePorts, which represent actual
resource reservations that cannot be reliably reconstructed if
(for instance) the page reloads. I think the idea here is supposed
to be that this is "soft state", in that it can be kept on the
server and just reimposed on the browser at refresh time, but as
the RealTimePorts example shows, it's not clear that this is the case.
Similar comments apply to the state of the audio and video devices
which are inherently controlled by the browser.

Moreover, it's never been true that neither party in the data exchange
was "saddled" with remembering what the other did; rather, it used
to be the case that most state sat on the server, and indeed, that's
where the CU-RTC-Web proposal keeps it. This is the first time we have
really built a Web-based peer-to-peer app. Pretty much all previous
applications have been client-server applications, so it's hard to
know what idioms are appropriate in a peer-to-peer case.

I'm a little puzzled by the argument about "development cost"; there
are two kinds of development cost here: that to browser implementors
and that to Web application programmers. The MS proposal puts
more of that cost on Web programmers whereas JSEP puts more of
the cost on browser implementors. One would ordinarily think that
as long as the standard wasn't too difficult for browser implementors
to develop at all, then pushing complexity away from Web programmers
would tend to increase the reach of the standard. One could of course
argue that this standard is too complicated for browser implementors
to implement at all, but the existing state of Google and Mozilla's
implementations would seem to belie that claim.

Finally, given that the original WHATWG draft had even more state in
the browser (as noted above, it was basically a high-level API), it's
a little odd to hear that Ian Hickson is out of touch with the "key
Web tenets".

CUSTOMIZABLE RESPONSE TO CHANGING NETWORK QUALITY
The proposal says:

Real time media applications have to run on networks with a wide
range of capabilities varying in terms of bandwidth, latency, and
noise. Likewise these characteristics can change while an
application is running. Developers should be able to control how the
user experience adapts to fluctuations in communication quality. For
example, when communication quality degrades, the developer may
prefer to favor the video channel, favor the audio channel, or
suspend the app until acceptable quality is restored. An effective
protocol and API will have to arm developers with the tools to
tailor such answers to the exact needs of the moment, while
minimizing the complexity of the resulting API surface.

It's certainly true that it's desirable to be able to respond to
changing network conditions, but it's a lot less clear that the
CU-RTC-Web API actually offers a useful response to such changes. In
general, the browser is going to know a lot more about what the
bandwidth/quality tradeoff of a given codec is going to be than most
JavaScript applications will, and so it seems at least plausible that
you're going to do better with a small number of policies (audio is
more important than video, video is more important than audio, etc.)
than you would by having the JS try to make fine-grained decisions
about what it wants to do. It's worth noting that the actual
"customizable" policies that are proposed here seem pretty simple.
The idea seems to be not that you would impose policy on the browser
but rather that since you need to implement all the negotiation
logic anyway, you get to implement whatever policy you want.

Moreover, there's a real concern that this sort of adaptation will
have to happen in two places: as MS points out, this kind of network
variability is really common and so applications have to handle it.
Unless you want to force every JS calling application in the universe
to include adaptation logic, the browser will need some (potentially
configurable and/or disableable) logic. It's worth asking whether
whatever logic you would write in JS is really going to be enough
better to justify this design.

REAL-WORLD INTEROPERABILITY
The proposal says:

it shows no signs of offering real world interoperability with
existing VoIP phones, and mobile phones, from behind firewalls and
across routers and instead focuses on video communication between
web browsers under ideal conditions. It does not allow an
application to control how media is transmitted on the network.

I wish this argument had been elaborated more, since it seems like
CU-RTC-Web is less focused on interoperability, not more. In
particular, since JSEP is based on existing technologies such as SDP
and ICE, it's relatively easy to build Web applications which gateway
JSEP to SIP or Jingle signaling (indeed, relatively simple prototypes
of these already exist). By contrast, gatewaying CU-RTC-Web signaling
to either of these protocols would require developing an entire
SDP stack, which is precisely the piece that the MS guys are implicitly
arguing is expensive.

Based on Matthew Kaufman's mailing list postings, his concern seems to
be that there are existing endpoints which don't implement some of the
specifications required by WebRTC (principally ICE, which is used to
set up the network transport channels) correctly, and that it will be
easier to interoperate with them if your ICE implementation is written
in JavaScript and downloaded by the application rather than in C++ and
baked into the browser. This isn't a crazy theory, but I think there are
serious open questions about whether it is correct. The basic problem
is that it's actually quite hard to write a good ICE stack (though
easy to write a bad one). The browser vendors have the resources to
do a good job here, but it's less clear that random JS toolkits that
people download will actually do that good a job (especially if they
are simultaneously trying to compensate for broken legacy equipment).
The result of having everyone write their own ICE stack might be good
but it might also lead to a landscape where cross-Web application interop
is basically impossible (or where there are islands of noninteroperable
de facto standards based on popular toolkits or even popular toolkit
versions).

A lot of people's instincts here seem to be based on an environment
where updating the software on people's machines was hard but
updating one's Web site was easy. But about half of the population
of browsers (Chrome and Firefox) now do rapid auto-updates, so they
actually are generally fairly modern. By contrast, Web applications
often use downrev versions of their JS libraries (I wish I had survey
data here, but it's easy to see just by opening up a JS debugger
on your favorite sites). It's not at all clear that the
"JS is easy to upgrade/native is hard" dynamic holds up any more.

TOO TIED TO SPECIFIC MEDIA FORMATS AND CODECS
The proposal says:

A successful standard cannot be tied to individual codecs, data
formats or scenarios. They may soon be supplanted by newer versions,
which would make such a tightly coupled standard obsolete just as
quickly. The right approach is instead to support multiple media
formats and to bring the bulk of the logic to the application layer,
enabling developers to innovate.

I can't make much sense of this at all. JSEP, like the standards that
it is based on, is agnostic about the media formats and codecs that
are used. There's certainly nothing in JSEP that requires you to use
VP8 for your video codec, Opus for your audio codec, or anything
else. Rather, two conformant JSEP implementations will converge on a
common subset of interoperable formats. This should happen
automatically without Web application intervention.
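As a sketch of what "converge on a common subset" means in practice: in offer/answer negotiation, the answering side keeps whichever offered codecs it also supports, in the offerer's preference order. The function and codec lists below are illustrative; JSEP itself mandates none of them.

```javascript
// Sketch of offer/answer codec convergence as described above:
// the answerer keeps whatever offered codecs it also supports,
// preserving the offerer's preference order. Codec names are
// illustrative; JSEP itself doesn't mandate any of them.
function negotiateCodecs(offered, supported) {
  return offered.filter((codec) => supported.includes(codec));
}

const offererCodecs = ["opus", "G722", "PCMU"];
const answererCodecs = ["PCMU", "opus"];
console.log(negotiateCodecs(offererCodecs, answererCodecs)); // ["opus", "PCMU"]
```

Note that if a browser adds a new codec, it simply shows up in the offered list and the intersection adjusts; neither side's application code needs to change, which is the point being made above.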

Arguably, in fact, CU-RTC-Web is *more* tied to a given codec because
the codec negotiation logic is implemented either on the server or in
the JavaScript. If a browser adds support for a new codec, the Web
application needs to detect that and somehow know how to prioritize it
against existing known codecs. By contrast, when the browser
manufacturer adds a new codec, he knows how it performs compared to
existing codecs and can adjust his negotiation algorithms accordingly.
Moreover, as discussed below, JSEP provides (somewhat clumsy)
mechanisms for the user to override the browser's default choices.
These mechanisms could probably be made better within the JSEP
architecture.

Based on Matthew Kaufman's interview with Janko Roettgers
[http://gigaom.com/2012/08/06/microsoft-webrtc-w3c/],
it seems like
this may actually be about the proposal to have a mandatory to
implement video codec (the leading candidates seem to be H.264 or
VP8). Obviously, there have been a lot of arguments about whether
such a mandatory codec is required (the standard argument in favor
of it is that then you know that any two implementations have
at least one codec in common), but this isn't really a matter
of "tightly coupling" the codec to the standard. To the contrary,
if we mandated VP8 today and then next week decided to mandate
H.264 it would be a one-line change in the specification.
In any case, this doesn't seem like a structural argument about
JSEP versus CU-RTC-Web. Indeed, if IETF and W3C decided to ditch
JSEP and go with CU-RTC-Web, it seems likely that this wouldn't
affect the question of mandatory codecs at all.

THE INCONVENIENCE OF SDP EDITING
Probably the strongest point that the MS authors make is that if the
API doesn't explicitly support doing something, the situation is kind
of gross:

In particular, the negotiation model of the API relies on the SDP
offer/answer model, which forces applications to parse and generate
SDP in order to effect a change in browser behavior. An application
is forced to only perform certain changes when the browser is in
specific states, which further constrains options and increases
complexity. Furthermore, the set of permitted transformations to SDP
are constrained in non-obvious and undiscoverable ways, forcing
applications to resort to trial-and-error and/or browser-specific
code. All of this added complexity is an unnecessary burden on
applications with little or no benefit in return.

What this is about is that in JSEP you call CreateOffer() on a
PeerConnection in order to get an SDP offer. This doesn't actually
change the PeerConnection state to accommodate the new offer; instead,
you call SetLocalDescription() to install the offer.
the Web application the opportunity to apply its own preferences
by editing the offer. For instance, it might delete a line containing
a codec that it didn't want to use. Obviously, this requires a lot
of knowledge of SDP in the application, which is irritating to say
the least, for the reasons in the quote above.
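For instance, removing a codec means finding its payload types in the a=rtpmap lines and scrubbing them from both the attribute lines and the m= line. A minimal sketch follows; the SDP snippet is invented for illustration, and a production version would need to handle fmtp dependencies, multiple m= sections, and much else.

```javascript
// Sketch of the SDP editing described above: strip one codec from
// an offer before passing it to SetLocalDescription(). The SDP
// snippet is invented for illustration; real offers are far larger.
function removeCodec(sdp, codecName) {
  const lines = sdp.split("\r\n");
  // Payload types bound to this codec by a=rtpmap lines.
  const pts = lines
    .filter((l) => l.startsWith("a=rtpmap:") && l.includes(codecName + "/"))
    .map((l) => l.match(/^a=rtpmap:(\d+)/)[1]);
  return lines
    // Drop the codec's rtpmap/fmtp attribute lines...
    .filter((l) => !pts.some((pt) =>
      l.startsWith(`a=rtpmap:${pt} `) || l.startsWith(`a=fmtp:${pt} `)))
    // ...and remove its payload types from the m= line.
    .map((l) => l.startsWith("m=")
      ? l.split(" ").filter((tok) => !pts.includes(tok)).join(" ")
      : l)
    .join("\r\n");
}

const offer = [
  "m=audio 9 RTP/SAVPF 111 0",
  "a=rtpmap:111 opus/48000/2",
  "a=rtpmap:0 PCMU/8000",
].join("\r\n");

console.log(removeCodec(offer, "opus"));
// m=audio 9 RTP/SAVPF 0
// a=rtpmap:0 PCMU/8000
```

Even this small example shows why SDP string-munging is unpopular: the application has to know the cross-referencing rules between m= lines and attributes to make a one-codec change.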

The major mitigating factor is that the W3C/IETF WG members intend to
allow most common manipulations to be made through explicit settings
parameters, so that only really advanced applications need to know
anything about SDP at all. Obviously opinions vary about how good a
job they have done, and of course it's possible to write libraries
that would make this sort of manipulation easier. It's worth noting
that there has been some discussion of extending the W3C APIs to have
an explicit API for manipulating SDP objects rather than just editing
the string versions (perhaps by borrowing some of the primitives in
CU-RTC-Web). Such a change would make some things easier while not
really representing a fundamental change to the JSEP model. However,
it's not clear if there are enough SDP-editing tasks to make this
project worthwhile.

With that said, in order to have CU-RTC-Web interoperate with
existing SIP endpoints at all, you would need to know far more about
SDP than would be required to do most anticipated transformations in a
JSEP environment, so it's not like CU-RTC-Web frees you from SDP if
you care about interoperability with existing equipment.

SUPPORT FOR NEW/INNOVATIVE APPLICATIONS
Finally, the MSFT authors argue that CU-RTC-Web is more flexible
and/or less brittle than JSEP:

On the other hand, implementing innovative, real-world applications
like security consoles, audio streaming services or baby monitoring
through this API would be unwieldy, assuming it could be made to
work at all. A Web RTC standard must equip developers with the
ability to implement all scenarios, even those we haven't thought
of.

Obviously the last sentence is true, but the first sentence provides
scant support for the claim that CU-RTC-Web fulfills this requirement
better than JSEP. The particular applications cited here, namely audio
streaming, security consoles, and baby monitoring, seem not only
doable with JSEP, but straightforward. In particular, security
consoles and baby monitoring just look like one way audio and/or video
calls from some camera somewhere. This seems like a trivial subset of
the most basic JSEP functionality. Audio streaming is, if anything,
even easier. Audio streaming from servers already exists without any
WebRTC functionality at all, in the form of the audio tag, and audio
streaming from client to server can be achieved with the combination
of getUserMedia and WebSockets. Even if you decided that you wanted to
use UDP rather than WebSockets, audio streaming is just a one-way
audio call, so it's hard to see that this is a problem.

In e-mail
to the W3C WebRTC mailing list, Matthew Kaufman mentions the
use case of handling page reload:

An example would be recovery from call setup in the face of a
browser page reload... a case where the state of the browser must be
reinitialized, leading to edge cases where it becomes impossible with
JSEP for a developer to write Javascript that behaves properly in all
cases (because without an offer one cannot generate an answer, and
once an offer has been generated one must not generate another offer
until the first offer has been answered, but in either case there is
no longer sufficient information as to how to proceed).

This use case, often called "rehydration," has been studied a fair bit
and it's not entirely clear that there is a convenient solution with
JSEP. However, the problem isn't the offer/answer state, which is actually
easily handled, but rather the ICE and cryptographic state, which
are just as troublesome with CU-RTC-Web as they are with JSEP
(for a variety of technical reasons, you can't just reuse the
previous settings here). So, while rehydration is an issue, it's
not clear that CU-RTC-Web makes matters any easier.

This argument, which should be the strongest of MS's arguments, feels
rather like the weakest. Given how much effort has already gone into
JSEP, both in terms of standards and implementation, if we're going to
replace it with something else that something else should do something
that JSEP can't, not just have a more attractive API. If MS can't
come up with any use cases that JSEP can't accomplish, and if in fact
the use cases they list are arguably more convenient with JSEP than
with CU-RTC-Web, then that seems like a fairly strong argument that we
should stick with JSEP, not one that we should replace it.

What I'd like to see Microsoft do here is describe some applications
that are really a lot easier with CU-RTC-Web than they are with
JSEP. Depending on the details, this might be a more or less convincing
argument, but without some examples, it's pretty hard to see
what considerations other than aesthetic would drive us towards
CU-RTC-Web.

ACKNOWLEDGEMENT
Thanks to Cullen Jennings, Randell Jesup, Maire Reavy, and
Tim Terriberry for early comments on this draft.

March 11, 2011

I've
complained before about Farhad Manjoo's shallow analysis of the social
implications of technical decisions, which seems to begin and end with what
would be convenient for him. His latest post
is an argument against anonymous comments on Internet forums/message boards/etc.
Manjoo writes:

I can't speak for my bosses, who might feel differently than I do. But
as a writer, my answer is no-I don't want anonymous
commenters. Everyone who works online knows that there's a direct
correlation between the hurdles a site puts up in front of potential
commenters and the number and quality of the comments it receives. The
harder a site makes it for someone to post a comment, the fewer
comments it gets, and those comments are generally better.

I can appreciate how Manjoo might feel like that. No doubt as
a writer it's annoying to get anonymous people telling you that you
suck (and much as I find Manjoo's writing annoying, I'm forced
to admit that even good writing gets that sort of reaction from
time to time). However, this claim simply isn't true—or
at least isn't supported by any evidence I know of. To the
contrary, the Slate comments section (which Manjoo endorses later in his
article) isn't really that great, and one of the most highly regarded blog
comment sections, Obsidian Wings,
is almost completely anonymous (though moderated), with the only
barrier to posting being a CAPTCHA. Similarly, some of the most entertaining
pure-comments sites, such as Fark,
only require e-mail confirmation, which, as Manjoo admits, is virtually
anonymous. I don't really know everything
that makes a good comments section work, but it's a lot more complicated
than just requiring people to use their real names.

I think Slate's commenting requirements-and those of many other
sites-aren't stringent enough. Slate lets people log in with accounts
from Google and Yahoo, which are essentially anonymous; if you want to
be a jerk in Slate's comments, create a Google account and knock
yourself out. If I ruled the Web, I'd change this. I'd make all
commenters log in with Facebook or some equivalent third-party site,
meaning they'd have to reveal their real names to say something in a
public forum. Facebook has just revamped its third-party commenting
"plug-in," making it easier for sites to outsource their commenting
system to Facebook. Dozens of sites-including, most prominently, the
blog TechCrunch-recently switched over to the Facebook system. Their
results are encouraging: At TechCrunch, the movement to require real
names has significantly reduced the number of trolls who tar the site
with stupid comments.

This is an odd claim since Facebook actually makes no real attempt
to verify your full name. Like most sites, they just verify that
there is some e-mail address that you can respond at. It's not
even clear how Facebook would go about verifying people's real
names. Obviously, they could prune out people who claim to be
Alan Smithee (though consider this), but the world is full of real John Smiths, so
why shouldn't I be another one of them?

What's my beef with anonymity? For one thing, several social science
studies have shown that when people know their identities are secret
(whether offline or online), they behave much worse than they
otherwise would have. Formally, this has been called the "online
disinhibition effect," but in 2004, the Web comic Penny Arcade coined
a much better name: The Greater Internet Fuckwad Theory. If you give a
normal person anonymity and an audience, this theory posits, you turn
him into a total fuckwad. Proof can be found in the comments section
on YouTube, in multiplayer Xbox games, and under nearly every politics
story on the Web. With so many fuckwads everywhere, sometimes it's
hard to understand how anyone gets anything out of the Web.

I don't disagree that this is to some extent true, though I would observe
that (a) the link Manjoo points to doesn't actually contain any
studies as far as I can tell, just an article oriented towards the lay public
and (b) it's not clear to what extent people's bad online behavior is a
result of anonymity. Some of the most vicious behavior I've seen online
has been on mailing lists where people's real-world identities (and employers!)
are well-known and in some cases the participants actually know each other
personally and are polite face-to-face.

As I said above, I don't think anyone really knows exactly what makes
a good online community (though see here for some thoughts on it by others), but
my intuition is that it's less an issue of anonymity than of
getting the initial culture right, in a way that it resists
trolling, flamewars, etc., or at least has a way to contain them.
In comments sections that work, when someone shows up and starts
trolling (even where this is easy and anonymous), the posters mostly
ignore it and the moderators deal with it swiftly, so it never gets
out of hand. Once the heat gets above some critical point on a regular basis,
though, these social controls break down and it takes a really big hammer to get things back
under control. It's not clear to me that knowing people's real names
has much of an impact on any of that.

January 22, 2011

Slate has published another Farhad Manjoo screed against unlimited Internet service.

And say hooray, too, because unlimited data plans deserve to
die. Letting everyone use the Internet as often as they like for no
extra charge is unfair to all but the data-hoggiest among us--and it's
not even that great for those people, either. Why is it unfair? For
one thing, unlimited plans are more expensive than pay-as-you-go plans
for most people. That's because a carrier has to set the price of an
unlimited plan high enough to make money from the few people who use
the Internet like there's no tomorrow. But most of us aren't such
heavy users. AT&T says that 65 percent of its smartphone customers
consume less than 200 MB of broadband per month and 98 percent use
less than 2 GB. This means that if AT&T offered only a $30 unlimited
iPhone plan (as it once did, and as Verizon will soon do), the 65
percent of customers who can get by with a $15 plan--to say nothing of
the 98 percent who'd be fine on the $25 plan--would be overpaying.

This seems extremely confused. First, it's generally true that
whenever a business offers a limited number of product offerings
with each at a fixed price that some people overpay because they
only want some cheaper offering that the company doesn't provide.
For instance, when I bought my last car, Audi insisted on
selling me the "winter sports package" (heated seats and a ski
bag). Now, I don't do a lot of skiing and I didn't want either
one, but that's the way the thing came. By Manjoo's logic, it was unfair
that I had to pay more for a ski bag I would never use (the heated
seats are great, by the way), but that's just the way the product
comes. Sure, I'd rather the company offered exactly the package
I wanted, but a limited number of offerings is just a standard
feature of capitalism.

It's worth observing that there's nothing special about
the "unlimited" plan in Manjoo's logic. (It's not really
unlimited anyway, since the network has some finite
amount of bandwidth available, which provides a hard upper
limit on how much data you can transfer in a month; it's just
that that limit is really high.) Say Verizon offered only
a 2GB plan: would he be whining that he only used 200 MB
of bandwidth and so was being made to overpay so Verizon
could make money on the 2GB-using bandwidth hogs?
So, this objection is pretty hard to take seriously.

Manjoo goes on:

But it's not just that unlimited plans raise prices. They also ruin
service. Imagine what would happen to your town's power grid if
everyone paid a flat rate for electricity: You and your neighbors
would set your thermostats really high in the winter and low in the
summer, you'd keep your pool heated year-round, you'd switch to
plug-in electric cars, and you'd never consider replacing your
ancient, energy-hogging appliances. As a result, you'd suffer frequent
brownouts, you'd curse your power company, and you'd all wish for a
better way. Economists call this a tragedy of the commons, and it can
happen on data networks just as easily as the power grid--faced with no
marginal cost, it's in everyone's interest to use as much of the
service as they can. When that happens, the network goes down for
everyone.

So, first, this is just wrong: it's actually reasonably common
for utilities to be included in people's leases, and yet when
that happens people don't automatically switch to plug-in cars
or start up home aluminum smelters.
That isn't to say that having to pay for each watt of power
doesn't have some impact on your consumption, but there is only
so much power that it's really convenient for people to use;
it's not like power being free causes consumption to spin off
into infinity. To take another example, it's absolutely standard
for local voice telephony service to be sold flat rate and yet practically
nobody leaves their phone line tied up 24x7 just in case they
want to say something to Mom and don't feel like taking the
trouble to dial the phone. (Full disclosure:
I actually have used dialup Internet as a replacement for a leased
line this way, but that's a pretty rare use case.)

The second problem with this claim is that computer networks
don't behave the way the electrical grid does in the face of contention.
Like the electrical grid, computer networks are sized for a certain
capacity, but unlike the grid, computers aren't built with the
assumption that that capacity is effectively infinite. If the
electrical grid in your area is operating at full capacity,
and you turn on your AC, this can cause a brownout because there
is no way for the power company to tell everyone to use 1% less
power and even if there was, many of the devices in question are
just designed to operate in a way where they draw constant power.
By contrast, computer network protocols are already designed
to operate in conditions where they can't use as much bandwidth
as they would like because non-infinite bandwidth is a basic feature
of the system. Even if there is no contention for the network,
applications need to work behind a variety of connection types
so people who build applications typically build them to automatically
adapt to how much throughput they are actually getting. For
instance, Netflix has adaptive streaming, which
means that it tries to detect how fast your network is and, if
it's slow, compresses the media harder to reduce the amount of
data it has to send. What this means is that unlike the electrical grid,
where your computer may just crash if it doesn't get enough power,
if the network suddenly gets slower, performance degrades relatively
smoothly.

The other thing you need to know is that in data networks congestion
is (almost) the only thing that matters. If nobody else is trying to
use the network right now then it's fairly harmless if you
decide to consume all the available capacity. What's important is
that when other people do want to use the network you back
off to give them room.
So, to the extent that there is a scarce resource, it's not
total download capacity but rather use of the network at
times when it's actually congested.
To a great extent, network protocols (especially TCP) already
attempt to back off in the face of congestion, but there's also
nothing stopping
the provider from deliberately imposing balance on you
(cf. fair queueing).
In either case, this is largely orthogonal to the
volume of data transferred; a cap on total transfer is an
extremely crude proxy for the kind of externality Manjoo is
talking about. Not only is it crude, it's inefficient: it discourages
use of the network which would be cost-free for others and of value
to the customer using the network.

All this stuff has of course been hashed out endlessly in the networking
economics literature and the above is only the barest sketch. Suffice
to say that just applying this sort of naive "tragedy of the commons"
analysis doesn't really get you very far.

September 14, 2009

David Coursey complains about how long it took
IEEE to develop 802.11n:

802.11n is the poster child for a standards process gone wrong. Seven
years after it began and at least two years after 802.11 "draft"
devices arrived, the IEEE has finally adopted a final standard for
faster, stronger, more secure wireless.

Ideally, standards arrive before the products that implement
them. However, the IEEE process moved so slowly that vendors adopted a
draft standard and started manufacturing hardware. After a few little
glitches, the hardware became compatible and many of us have--for
years--been running multivendor 802.11n networks despite the lack of
an approved standard.

...

If standards bodies expect to be taken seriously, they need to do
their work in reasonable periods. Releasing a "final" standard long
after customer adoption has begun is not only anti-climactic but
undercuts the value of the standards making process.

In this case, the process failed. The IEEE should either improve its
process or get out of the way and let industry leaders just create de
facto standards as they see fit. That is not preferable, but if the
IEEE process is stuck, it will be what happens.

My experience with IEEE standards making is limited, but I have
extensive experience with IETF's process, and I'm a little puzzled
as to what Coursey thinks the problem is here. Developing standards
is like developing any other technical artifact: you start out
with an idea, do some initial prototypes, test those prototypes,
modify the design in response to the testing, and iterate till you're satisfied.
Now, in the case of a protocol standard, the artifact is the
document that defines how implementations are supposed to behave,
and the testing phase, at least in part, is implementors building
systems that (nominally) conform to the spec and seeing how
well they work, whether they interoperate, etc.
With any complicated system, this process needs to include building
systems which will be used by end-users and seeing how they function
in the field. If you don't do this, you end up with systems which
only work in the lab.

There's not too much you can do to avoid going through these steps;
it's just really hard to build workable systems without a certain
level of testing. Of course, that still leaves you with the question
of when you call the document done. Roughly speaking, there are
two strategies: you can stamp the document "standard" before it's
seen any real deployment and then churn out a revision a few
years later in response to your deployment experience. Alternately,
you can go through a series of drafts, refining them in response
to experience, until eventually you just publish a finished standard,
but it's based on what people have been using for years. An intermediate
possibility is to have different maturity levels. For instance,
IETF has "proposed standards", "draft standards", and then "standards".
This doesn't work that well in practice: it takes so long to develop
each revision that many important protocols never make it past
"proposed standard."
In all three cases, you go through mostly the same system development
process, you just label the documents differently.

With that in mind, it's not clear to me that IEEE has done anything
wrong here: if they decided to take the second approach and
publish a really polished document, and
802.11n is indeed nice and polished and the new document
won't need a revision for 5+ years, then this seems like a fairly
successful effort. I should hasten to add that I don't know that this
is true: 802.11n could be totally broken. However, the facts that
Coursey presents sound like pretty normal standards development.

September 13, 2009

One of the results of Joe Wilson (R-South Carolina) calling President
Obama a liar on national TV was that money started pouring in, both
to Wilson and his likely opponent in 2010 (Rob Miller). Piryx,
who hosts Wilson's site, claims that on Friday and Saturday
they were subjected to a 10-hour DoS attack against their
systems:

Yesterday (Friday) around 3:12pm CST we noticed the bandwidth spike on
the downstream connections to Piryx.com server collocation
facility. Our bandwidth and packet rate threshold monitors went off
and we saw both traditional DOS bandwidth based attacks as well as
very high packet rate, low bandwidth ICMP floods all destined for our
IP address.

...At this point we have spent 40+ man hours, with 10 external techs
fully monopolized in researching and mitigating this attack.

To give a sense of scale, the attacks were sending us 500+ Mbps of
traffic, which would run about $147,500 per month in bandwidth
overages.

I think most people would agree that technical attacks on candidates
Web sites, donation systems, etc. aren't good for democracy—just
as it would be bad if candidates were regularly assassinated—and
it would be good if they didn't happen. While there are technical
countermeasures against DoS, they're expensive and only really
work well if you have a site with a lot of capacity so that you
can absorb the attack, which isn't necessarily something that
every hosting service provider (HSP) has.

This may turn out to be a bad idea, but it occurred to me that
one way to deal with this kind of attack might be for the
federal government to simply run its own HSP, dedicated solely
to hosting sites for candidates and to accepting payments on
their behalf. Such a site could be large enough—though
still comparatively small next to the big service providers—to
resist most DoS attacks. Also, to the extent to which everyone
ran their candidate sites there, it would remove the differential
effect of DoS attacks: sure you can DoS the site, but you're
damaging your own preferred candidate as much as the opposition.
Obviously, this doesn't help if the event that precipitates
the surge of donations massively favors one side, but in this
case, at least, both sides saw a surge. I don't know if this
is universally true though.

Of course, this would put the site operator (either the feds or
whoever they outsourced it to) in a position to know who donated
to which candidate, but in many cases this must be disclosed
anyway, and presumably if the operation was outsourced,
one could put a firewall in to keep the information not
subject to disclosure away from the feds.

July 24, 2009

Ed Felten writes
about the economic forces that drive cloud computing, arguing that a prime driver is the
desire to reduce administrative costs:

Why, then, are we moving into the cloud? The key issue is the cost of
management. Thus far we focused only on computing resources such as
storage, computation, and data transfer; but the cost of managing all
of this -- making sure the right software version is installed, that
data is backed up, that spam filters are updated, and so on -- is a
significant part of the picture. Indeed, as the cost of computing
resources, on both client and server sides, continues to fall rapidly,
management becomes a bigger and bigger fraction of the total cost. And
so we move toward an approach that minimizes management cost, even if
that approach is relatively wasteful of computing resources. The key
is not that we're moving computation from client to server, but that
we're moving management to the server, where a team of experts can
manage matters for many users.

This certainly is true to an extent and it's one of the driving factors
behind all sorts of outsourced hosting. Educated Guesswork, for instance,
is hosted on Dreamhost, in
large part because I didn't want the hassle of maintaining yet
another public Internet-accessible server. I'm not sure I would
call this "cloud computing", though, except retroactively.

That said, the term "cloud computing" covers a lot of ground
(see the Wikipedia
article), and I
don't think Felten's argument holds up as well when we look at
examples that look less like outsourced applications. Consider, for
example, Amazon's Elastic Compute Cloud (EC2). EC2 lets you rapidly
spin up a large number of identical servers on Amazon's hardware and
bring them up and down as required to service your load. Now, there is
a substantial amount of management overhead reduction at the hardware
level in that you don't need to contract for Internet, power, HVAC,
etc., but since you're running a virtualized machine, you still have
all the software management issues Ed mentions, and they're somewhat worse since
you have to work within Amazon's infrastructure (see here
for some complaining about this). Much of the benefit of an
EC2-type solution is extreme resource flexibility: if you have a
sudden load spike, you don't need to quickly roll out a bunch of
new hardware, you just bring up some EC2 instances. When the spike
goes away, you shut them down.

A related benefit is that this reduces resource consumption via a crude
form of statistical multiplexing: if EC2 is running a large number
of Web sites, they're probably not all experiencing spikes at
the same time, so the total amount of spare capacity required in
the system is a lot smaller.

Both of these benefits apply as well to applications
in the cloud (for instance, Ed's Gmail example). If you run your
own mail server, it's idle almost all the time. On the other hand,
if you use Gmail (or even a hosted service), then you are sharing
that resource with a whole bunch of different people and so
Google just needs enough capacity to service the projected
aggregate usage of all those people, most of whom aren't
using the system very hard (what, you thought that Google
really had 8G of disk for each user?). At the end of the day,
I suspect that the management cost Ed cites is the dominant
issue here, though, which, I suppose argues that lumping
outsourced applications ("software as a service") together
with outsourced/virtualized hardware as "cloud computing"
isn't really that helpful.

April 3, 2009

You may or may not have seen
this article
(Bill here courtesy
of Lauren Weinstein; þ Joe Hall):

Key lawmakers are pushing to dramatically escalate U.S. defenses
against cyberattacks, crafting proposals that would empower the
government to set and enforce security standards for private industry
for the first time.

OK, I'm going to stop you right there. I spend a large fraction
of my time with computer security people and I don't think I've
ever heard any of them use the term "cybersecurity", "cyberattacks",
or pretty much "cyber-anything", except for when they're making
fun of govspeak like this. Next they'll be talking about setting
up speed traps on the Information Superhighway. Anyway, moving
on...

The Rockefeller-Snowe measure would create the Office of the National
Cybersecurity Adviser, whose leader would report directly to the
president and would coordinate defense efforts across government
agencies. It would require the National Institute of Standards and
Technology to establish "measurable and auditable cybersecurity
standards" that would apply to private companies as well as the
government. It also would require licensing and certification of
cybersecurity professionals.

So, it's sort of credible that NIST would generate some computer
security standards. They've already done quite a few, especially
in cryptography and communications security, with, I think it's
fair to say, pretty mixed results. Some of their standards,
especially the cryptographic ones like DES, AES, and SHA-1 have
turned out OK, but as you start to move up the stack towards
protocols and especially systems, the standards seem increasingly
overconstrained and poorly matched to the kinds of practices
that people actually engage in. In particular, there have been
several attempts by USG to write standards about systems security
(e.g., Common Criteria
and the Rainbow Books), and
I think it's fair to say that uptake in the private sector has been
minimal at best. Even more limited efforts like FIPS-140 (targeted
at cryptographic systems) are widely seen as incredibly onerous
and a hoop that developers have to jump through, rather than a
best practice that they actually believe in.

I haven't gone through the bill completely, but check out this
fun bit:

(4) SOFTWARE CONFIGURATION SPECIFICATION
LANGUAGE.--The Institute shall, establish standard
computer-readable language for completely specifying the
configuration of software on computer systems widely used in
the Federal government, by government contractors and grantees,
and in private sector owned critical infrastructure information
systems and networks.

I don't really know what this means but it sounds pretty hard. Even
UNIX systems, which are extremely text-oriented, don't have what you'd
call a standard computer readable configuration language. More like
10 such languages, I guess. I'm definitely looking forward to hearing
about NIST's efforts to standardize sendmail.cf.

The licensing and certification clause seems even sillier. There are
plenty of professional security certifications you can get, but most
people I know view them as more a form of rent seeking by the
people who run the certifying classes than as a meaningful
credential. I don't think anyone I know has one of these
certifications. I'm just imagining the day when we're told Bruce
Schneier and Ed Felten aren't allowed to work on critical infrastructure
systems because they're not certified.

February 25, 2009

I've got some code that needs to convert an IP address
into a string. This is one of those cases where there's a
twisty maze of APIs, all slightly different. The traditional
API here is:

char *
inet_ntoa(struct in_addr in);

inet_ntoa() has two deficiencies, one important
and one trivial: it doesn't support IPv6 and it returns a
pointer to a statically allocated buffer, so it's not thread
safe (I'll let you figure out which is which). Luckily, there's another API: addr2ascii():

char *
addr2ascii(int af, const void *addrp, int len, char *buf);

If you pass buf=0,
addr2ascii() will return a pointer to a static
buffer like inet_ntoa(). However, if you pass it
an allocated buffer it will return the result in
buf. Unfortunately, if you actually try to
use addr2ascii() in threaded code you will
quickly discover something unpleasant, at least on FreeBSD: you occasionally
get the result "[inet_ntoa error]" or some fraction thereof.
The answer is hidden in the
EXAMPLES section of the man page:

In actuality, this cannot be done because addr2ascii() and
ascii2addr() are implemented in terms of the inet(3) functions, rather
than the other way around.

In other words, even though addr2ascii() doesn't
explicitly use a static buffer, since it depends on
inet_ntoa() it's still not thread safe.
In order to get thread safety, you need to use yet another
API: