Internet Evolution

Big Changes Ahead for Core Internet Protocols

Date: January 22, 2018

When the Internet started to become widely used in the 1990s, most traffic used just a few protocols: IPv4 routed packets, TCP turned those packets into connections, SSL (later TLS) encrypted those connections, DNS named hosts to connect to, and HTTP was often the application protocol using it all.

For many years, there were negligible changes to these core Internet protocols; HTTP added a few new headers and methods, TLS slowly went through minor revisions, TCP adapted congestion control, and DNS introduced features like DNSSEC. The protocols themselves looked about the same ‘on the wire’ for a very long time (excepting IPv6, which already gets its fair amount of attention in the network operator community.)

As a result, network operators, vendors, and policymakers that want to understand (and sometimes, control) the Internet have adopted a number of practices based upon these protocols’ wire ‘footprint’ — whether intended to debug issues, improve quality of service, or impose policy.

Now, significant changes to the core Internet protocols are underway. While they are intended to be compatible with the Internet at large (since they won’t get adoption otherwise), they might be disruptive to those who have taken liberties with undocumented aspects of protocols or made an assumption that things won’t change.

Why we need to change the Internet

There are a number of factors driving these changes.

First, the limits of the core Internet protocols have become apparent, especially regarding performance. Because of structural problems in the application and transport protocols, the network was not being used as efficiently as it could be, leading to end-user perceived performance (in particular, latency).

Second, the ability to evolve Internet protocols — at any layer — has become more difficult over time, largely thanks to the unintended uses by networks discussed above. For example, HTTP proxies that tried to compress responses made it more difficult to deploy new compression techniques; TCP optimization in middleboxes made it more difficult to deploy improvements to TCP.

Let’s have a look at what’s happened, what’s coming next, how it might impact networks, and how networks impact protocol design.

HTTP/2

HTTP/2 (based on Google’s SPDY) was the first notable change — standardized in 2015, it multiplexes multiple requests onto one TCP connection, thereby avoiding the need to queue requests on the client without blocking each other. It is now widely deployed, and supported by all major browsers and web servers.

From a network’s viewpoint, HTTP/2 made a few notable changes. First, it’s a binary protocol, so any device that assumes it’s HTTP/1.1 is going to break.

That breakage was one of the primary reasons for another big change in HTTP/2; it effectively requires encryption. This gives it a better chance of avoiding interference from intermediaries that assume it’s HTTP/1.1, or do more subtle things like strip headers or block new protocol extensions — both things that had been seen by some of the engineers working on the protocol, causing significant support problems for them.

Finally, HTTP/2 allows more than one host’s requests to be coalesced onto a connection, to improve performance by reducing the number of connections (and thereby, congestion control contexts) used for a page load.

Despite these changes, it’s worth noting that HTTP/2 doesn’t appear to suffer from significant interoperability problems or interference from networks.

TLS 1.3

TLS 1.3 is just going through the final processes of standardization and is already supported by some implementations.

Don’t be fooled by its incremental name; this is effectively a new version of TLS, with a much-revamped handshake that allows application data to flow from the start (often called ‘0RTT’). The new design relies upon ephemeral key exchange, thereby ruling out static keys.

This has caused concern from some network operators and vendors — in particular those who need visibility into what’s happening inside those connections.

For example, consider the datacentre for a bank that has regulatory requirements for visibility. By sniffing traffic in the network and decrypting it with the static keys of their servers, they can log legitimate traffic and identify harmful traffic, whether it be attackers from the outside or employees trying to leak data from the inside.

TLS 1.3 doesn’t support that particular technique for intercepting traffic, since it’s also a form of attack that ephemeral keys protect against. However, since they have regulatory requirements to both use modern encryption protocols and to monitor their networks, this puts those network operators in an awkward spot.

There’s been much debate about whether regulations require static keys, whether alternative approaches could be just as effective, and whether weakening security for the entire Internet for the benefit of relatively few networks is the right solution. Indeed, it’s still possible to decrypt traffic in TLS 1.3, but you need access to the ephemeral keys to do so, and by design, they aren’t long-lived.

At this point it doesn’t look like TLS 1.3 will change to accommodate these networks, but there are rumblings about creating another protocol that allows a third party to observe what’s going on— and perhaps more — for these use cases. Whether that gets traction remains to be seen.

QUIC

During work on HTTP/2, it became evident that TCP has similar inefficiencies. Because TCP is an in-order delivery protocol, the loss of one packet can prevent those in the buffers behind it from being delivered to the application. For a multiplexed protocol, this can make a big difference in performance.

QUIC is an attempt to address that by effectively rebuilding TCP semantics (along with some of HTTP/2’s stream model) on top of UDP. Like HTTP/2, it started as a Google effort and is now in the IETF, with an initial use case of HTTP-over-UDP and a goal of becoming a standard in late 2018. However, since Google has already deployed QUIC in the Chrome browser and on its sites, it already accounts for more than 7% of Internet traffic.

Besides the shift from TCP to UDP for such a sizable amount of traffic (and all of the adjustments in networks that might imply), both Google QUIC (gQUIC) and IETF QUIC (iQUIC) require encryption to operate at all; there is no unencrypted QUIC.

iQUIC uses TLS 1.3 to establish keys for a session and then uses them to encrypt each packet. However, since it’s UDP-based, a lot of the session information and metadata that’s exposed in TCP gets encrypted in QUIC.

In fact, iQUIC’s current ‘short header’ — used for all packets except the handshake — only exposes a packet number, an optional connection identifier, and a byte of state for things like the encryption key rotation schedule and the packet type (which might end up encrypted as well).

Everything else is encrypted — including ACKs, to raise the bar for traffic analysis attacks.

However, this means that passively estimating RTT and packet loss by observing connections is no longer possible; there isn’t enough information. This lack of observability has caused a significant amount of concern by some in the operator community, who say that passive measurements like this are critical for debugging and understanding their networks.

One proposal to meet this need is the ‘Spin Bit‘ — a bit in the header that flips once a round trip, so that observers can estimate RTT. Since it’s decoupled from the application’s state, it doesn’t appear to leak any information about the endpoints, beyond a rough estimate of location on the network.

Circumventing this kind of control with encryption has been discussed for a while, but it has a disadvantage (at least from some standpoints) — it is possible to discriminate it from other traffic; for example, by using its port number to block access.

DOH addresses that by piggybacking DNS traffic onto an existing HTTP connection, thereby removing any discriminators. A network that wishes to block access to that DNS resolver can only do so by blocking access to the website as well.

For example, if Google was to deploy its public DNS service over DOH on www.google.com and a user configures their browser to use it, a network that wants (or is required) to stop it would have to effectively block all of Google (thanks to how they host their services).

DOH has just started its work, but there’s already a fair amount of interest in it, and some rumblings of deployment. How the networks (and governments) that use DNS to impose policy will react remains to be seen.

Ossification and grease

To return to motivations, one theme throughout this work is how protocol designers are increasingly encountering problems where networks make assumptions about traffic.

For example, TLS 1.3 has had a number of last-minute issues with middleboxes that assume it’s an older version of the protocol. gQUIC blacklists several networks that throttle UDP traffic, because they think that it’s harmful or low-priority traffic.

When a protocol can’t evolve because deployments ‘freeze’ its extensibility points, we say it has ossified. TCP itself is a severe example of ossification; so many middleboxes do so many things to TCP — whether it’s blocking packets with TCP options that aren’t recognized, or ‘optimizing’ congestion control.

It’s necessary to prevent ossification, to ensure that protocols can evolve to meet the needs of the Internet in the future; otherwise, it would be a ‘tragedy of the commons’ where the actions of some individual networks — although well-intended — would affect the health of the Internet overall.

There are many ways to prevent ossification; if the data in question is encrypted, it cannot be accessed by any party but those that hold the keys, preventing interference. If an extension point is unencrypted but commonly used in a way that would break applications visibly (for example, HTTP headers), it’s less likely to be interfered with.

Where protocol designers can’t use encryption and an extension point isn’t used often, artificially exercising the extension point can help; we call this greasing it.

For example, QUIC encourages endpoints to use a range of decoy values in its version negotiation, to avoid implementations assuming that it will never change (as was often encountered in TLS implementations, leading to significant problems).

The network and the user

Beyond the desire to avoid ossification, these changes also reflect the evolving relationship between networks and their users. While for a long time people assumed that networks were always benevolent — or at least disinterested — parties, this is no longer the case, thanks not only to pervasive monitoring but also attacks like Firesheep.

As a result, there is growing tension between the needs of Internet users overall and those of the networks who want to have access to some amount of the data flowing over them. Particularly affected will be networks that want to impose policy upon those users; for example, enterprise networks.

In some cases, they might be able to meet their goals by installing software (or a CA certificate, or a browser extension) on their users’ machines. However, this isn’t as easy in cases where the network doesn’t own or have access to the computer; for example, BYOD has become common, and IoT devices seldom have the appropriate control interfaces.

As a result, a lot of discussion surrounding protocol development in the IETF is touching on the sometimes competing needs of enterprises and other ‘leaf’ networks and the good of the Internet overall.

Get involved

For the Internet to work well in the long run, it needs to provide value to end users, avoid ossification, and allow networks to operate. The changes taking place now need to meet all three goals, but we need more input from network operators.

If these changes affect your network — or won’t— please leave comments below, or better yet, get involved in the IETF by attending a meeting, joining a mailing list, or providing feedback on a draft.

The large majority of “network operators” are parents trying to protect their kids from Internet scum and filth. Google in particular has made it incredibly hard to do this. Now what Google does to prevent parents from blocking the garbage they host on youtube, images and other sites will be made available to all website operators? What Is the implications to children? Doesn’t this shift control inappropriately from one side to the other? Where is the middle ground?

One has to take a step back and look at the supply-demand clearing of networks today, which are the digital engines upon which everything else now rides. The reality is that none of these discussions include incentives and disincentives or inter-workings or value equilibration. Time to start bringing in some economists and financial types who understand what the protocols are trying to do.