Ilya Grigorik
Google Dev Evangelist on HTTP 2.0

August 27th, 2013 | 65 Min

Ilya Grigorik is a web performance engineer at Google, where he spends his days and nights optimizing the web stack and driving adoption of performance best practices. He has an extensive technology background, including experience running his own internet hosting company.


All right, you guys, so today I'm going to talk about HTTP 2.0, and specifically why do we need it, and what's the problem that we're trying to solve, more so than anything. Because as technologists we like to solve things. We like to write cool things. We like to make things fast. But what's the actual problem that we're solving?

So I work at Google, as the slide says. I actually work quite closely with the Chrome team and the Make the Web Faster team at Google. It's kind of a cross-company initiative for both how do we make our services fast and how do we make the Internet fast as a whole.

Part of HTTP 2.0 actually has some roots at Google, because we started our work on SPDY, which we will touch on in a second, and now of course it's something much bigger and something I'm very excited about. I think it will literally have a huge impact on web performance. It will help make our clients faster. It will help make servers more efficient, and actually reduce latency for a lot of users. So it's a big deal.

Right off the bat, what are we trying to solve?

The point of HTTP 2.0 is basically all about latency. So there's latency and bandwidth. They are the two components of speed.

Specifically, HTTP 2.0, if nothing else, is focused on how do we minimize latency in the client. Today we do a lot of interesting tricks, hacks if you want to call them that, in the browser to try and kind of game the system and figure out which requests we send, because we have a limited number of requests, or connections rather. So HTTP 2.0 tries to address all of that. So these are the high level goals for the protocol.

"A protocol designed for low-latency transport of content over the World Wide Web"

I think almost the first half of this talk is me trying to convince you that this is in fact a big problem and something that we need to address. And then in the second part we will actually look at what does HTTP 2.0 do, how does it look, and how does it look on the wire, even.

So first of all, one of the big challenges that we have on the web today is making stuff faster. So a lot of user traffic is migrating to mobile phones, something that we see in spades at Google. A lot of our searches are migrating towards phones, and every single team at Google is trying to figure out, how do we make our product fast on mobile? This is a big, general problem across the web.

Specifically, what we're trying to figure out is, like, well, we know that based on all the user studies out there, despite the fact that everything seems to be getting faster in our day to day life, there are some pretty good psychological constants. No matter when you do the study, and these UX studies were done in the early '90s, you can do them today, you'll basically find that the user reaction time is within 100 milliseconds.

Now, if you're a hardcore gamer and you're sitting there like, "Hey, dude, I can tell the difference between a 30 millisecond ping time and a 50 millisecond ping time," it's, like, fine, but that's a slightly different use case. Right?

Here, basically, we're saying: if you click on a button, we want to respond to you within 100 milliseconds. It feels instant. If it goes above that, significantly above that, above one second, you basically lose the context. All of a sudden you do a mental context switch of, like, "I pressed the button. It's not responding. I start thinking about something else. I've got to send an email," and you may have lost the user. So that's kind of the target, right? We want to have everything fast.

To keep the user engaged, the task must complete within 1000 milliseconds.

And this turns out to be a huge challenge on mobile, where just sending one request can take somewhere in the order of a second. And our pages are not one request. Our pages are much, much bigger than that. So that's the framework.

In parallel to that, we have the challenge of the ever increasing and ever more ambitious web that we're building. Compared to what we were building five or even ten years ago, the web today looks completely different.

We're not just building documents. We're building entire applications.

So this is data from the HTTP Archive (httparchive.org). What it does is, twice a month we crawl about 300,000 sites or more for desktop and about 10,000 sites on mobile. We don't record what's on the page. We record how the page is constructed, like how many JavaScript files, CSS files, etc. So these are the stats. Basically, over the course of the last two years we have not quite doubled the average size of a page, but we're getting there. You can see there's a very strong correlation. Basically, we're just increasing the size of our pages.

If you actually break it out you'll find that a lot of that is images. So we're building more media-heavy sites. The second biggest one is JavaScript, which I guess is no surprise, because we're relying so much on JavaScript to deliver smarter applications now. Then there is CSS and HTML. So the good news is it is a little bit smaller on mobile, so we are optimizing for mobile, but nonetheless.

It's growing.

It's not going to stop.

We're building more ambitious apps.

This is justsomething we have to deal with.

And the interesting number here is, of course, this: 86 and 57. So 86 requests on desktop, 57 on mobile. That's what it takes to compose a web page.

So when you're thinking about mobile, and I tell you that an average page takes 5 seconds to load on mobile, and you assume that an average request takes 500 milliseconds or more on mobile, at 57 requests, to some degree it's a miracle that it even loads in 5 seconds. So that's kind of the problem that we're trying to fight.

So at Google we have been tracking this pretty closely, and there's some good news. We actually use Google Analytics for this. Google Analytics collects navigation timing data, which is basically the real user timing data from the clients when they access your page. We anonymize all that data, and once a year we basically run an analysis of what is the average or the median page load time on mobile versus desktop.

So the cool part about this, these graphs are actually comparing 2012 to 2013. You can see that in 2013, the latency, especially on mobile, has decreased significantly.

"It's great to see access from mobile is around 30% faster compared to last year."

It's kind of hard to draw conclusions from an aggregate as big as all of the Google data, but our theory, and I think we have good reasons to believe this is why it's true, is that this is dominated by North America, and specifically by the fact that we have a very strong rollout of 4G networks across North America. So most of the shift in this latency is not represented across the world. It's very heavy in North America, which is good news for us here. But nonetheless, it's still definitely a problem.

Before, mobile was like 2x compared to desktop. Now we're getting closer. So that's good, right? Stuff is getting faster. I'm just going to sit back, and just like back in the old days when the CPU speeds just got faster and I didn't have to do a thing, this is not a problem anymore. The network will save us, the 4G. There are banners plastered everywhere: the fastest, latest, most reliable, whatever, AT&T network. Done. Web performance, solved deal. That's partially true.

If you actually look at the data, for example for bandwidth, that is in fact what's happening. The speeds are increasing.

So, for example, Akamai has a nice site, Akamai IO, where you can basically go in and type any country and look at average bandwidth, at least as seen by Akamai. So this is data from 2007 to basically the beginning of 2013, and you can see that there is a strong trend towards increasing throughput, or bandwidth, across the world.

Average connection speed in Q4 2012: 5,000 kbps+

We have Japan leading the pack. But basically most of the countries are near or well above the five megabit per second limit. Not limit, just a five megabit per second threshold here. And we'll see why that's important in a second.

So another component that is often forgotten is latency. When was the last time that you saw your ISP, whether that's mobile or whatever ISP, advertise latency? Like, "Our last mile latency is x milliseconds." Right? Never, ever would they advertise such a thing. Mostly because it's actually terrible.

So the FCC, for the past couple of years, has actually been doing a report, a yearly study, which has finally started to capture some of this data. Which is great. There are two parts to this. One is that now we actually have visibility into this data, and two, the reports over the last three years basically haven't changed. They're static.

This tells you a couple of different things, but before we get there, basically what they found out is that across all the different providers in North America you have 18 milliseconds -- that's your last mile latency. That's basically from your router at home to the ISP, basically the POP box at the ISP. This is not even to your actual server. This is just your last mile latency.

18 milliseconds for fiber -- that's surprising to some people, that it's that large.

Cable: 26 milliseconds.

DSL: 43 milliseconds.

So just for context,

43 milliseconds is like me going, "I can send a packet from here to New York and back in 43 milliseconds." And this is just your last mile latency with DSL.

So that's significant. There's definitely room for improvement, and this is also a metric that we track quite closely at Google.

So worldwide we see that the RTT to Google is about 100 milliseconds, and unfortunately this number hasn't budged over the last couple of years. It has just been stable, which is not good. We would like to see it improve. In the U.S. the average latency is 50 to 60 milliseconds. So that kind of makes sense, right? If you have 43 milliseconds of latency on your DSL, then there's another 10 or 15 milliseconds to actually get to the Google servers. And Google servers, we try to position them as close as we can to all the ISPs for this exact reason. So this is kind of an optimistic scenario.

So the good news is, at least, compared to 2011, in the U.S. the RTT has decreased by 10 milliseconds, which is significant. But the rest of the world is basically flat.

So all of this is to say we are going to continue to see improvements in bandwidth. And bandwidth matters for things like YouTube videos and HD videos, if you like to watch Netflix, what have you. Bandwidth really matters there, and the good news is we can actually get more bandwidth. If we saturate all of our links, we can just dig another tunnel and put in more fiber; we can just bond the different links and get more throughput. That's expensive, but it's doable.

With latency it's kind of hard because we have this thing called the speedof light, and we have not figured out how to go faster than that, yet.

So there are some interesting examples of people innovating in this space. It's like, well, you know what? Latency matters. We know that, and we know that latency matters for traders, where nanoseconds count.

So a cool project that has actually been started, and I think right now it's paused, I'm not sure of the actual final outcome, but I know they have invested a lot of money into this, is Hibernia Express. Basically, this company has figured out that, hey, there are traders in New York and there are traders in London that care about latency. So if we build a shorter cable, literally, there are a bunch of cables there, but if we take a slightly more direct route between these cities, specifically 300 miles shorter, then we can save about five milliseconds of latency. Which is significant if you have a trading algorithm.

So this project costs about half a billion dollars. You do the math and you're like, well, I'm not sure if this is a proper unit, but $80 million per millisecond. That's what it costs per millisecond saved, and there are plenty of examples like this. We are terraforming earth between New York and Chicago to build faster links. There's crazy stuff going on. Basically, that's the only thing we can do. We can just do a shorter cable between two endpoints. But even with that we're fixed.

So that sucks, and that sucks because we are already within a small constant factor of the maximum speed. The current latencies are about 1.4 to 1.5 times the speed-of-light minimum. So that tells you that even if we do everything to build a direct route and have a perfect link between these two endpoints, we're going to get an improvement of 30%, which would be great, but it's not going to transform the world of web performance.

This is the graph, the key graph, that got Google to start thinking about this seriously and actually start the work on SPDY back in 2009, I guess, or 2008 even. So this is a very simple experiment that we set up. Basically, we picked a bunch of representative web pages on the web and we said,

"Look, there's two components. There's bandwidth and there's latency.So let us vary these two things independently. We will just keep oneconstant and we will just increase bandwidth and see how that affects thepage load time."

So page load time here is in milliseconds. You start with one megabit of throughput, and the page loads in about three seconds. We increase that to two megabits per second; it's not quite a 2x improvement, but it's close, which is what you'd like to see. You continue increasing that, and you find that after about five megabits per second, you're into single digit percent improvements.

If you go from five megabits to ten megabits per second, you're going to get your pages loading faster by 4%, which is bad news, right? Because we can increase bandwidth, and we continue to increase bandwidth, but it's not helping us build a faster web. It helps us stream video better, because that's where this stuff matters, but it's not loading pages faster. So that sucks.

But then you look at latency. This is the exact graph that you would like to see: the lower the latency, you have this linear function that basically says, "You saved one millisecond; you're going to get this improvement in your page load time." So that's pretty awesome, except that, as we found out, it's very hard to change latency.

And as you saw previously in the Akamai slides that I showed you earlier, most people, on average in the US, are over five megabits per second. So that tells you that

if you want to run out and upgrade your connection and buy into the advertising of the newest, fastest whatever offered by your local provider, your page is not going to load faster.

Your video will be streaming better. You may get an upgrade in your quality, but your page is not going to load faster. Which I think is a surprise to many people, engineers included. This was definitely a surprise even to us when we ran these experiments.

So based on this we basically said, "Look, so why is this a problem? How do we solve it at the protocol level?" Because it turns out we need to tweak our protocols to work around this problem.

Mobile Latency

So everything I've said so far about latency is just two or three or four times worse for mobile. This is an entire talk on its own: how do the mobile networks work? I think everybody here would agree with the general statement of, like, "Oh, mobile networks are so unpredictable. Latency is so variable." It's just very hard to design fast apps that leverage the mobile network.

Let me walk you through this. This is literally a talk on its own, but just stay with me. Let's say we want to send a packet from the external network, like you have a push notification or what have you, and you want to notify the client on the phone that there is such a thing. So you send the packet from the external network. It comes in to the mobile carrier, and the mobile carrier basically has one global router, which is the packet gateway; it has a couple of these kind of big ones, so that's the PGW. It's the same thing as your router at home. It just terminates the connection. So it terminates your TCP connection right there at the PGW, and the PGW actually looks at a bunch of rules, like, should I be forwarding this type of traffic, etc.

So it's a pseudo firewall, a pseudo basic router. It doesn't do much more than that. It sends the packet to the serving gateway, and the role of the serving gateway is to figure out where you are on the mobile network. Because one of the nice properties of the mobile network is, I'm currently, or I was, in Mountain View earlier today. I hopped in a car. Now I'm driving and now I'm in San Francisco, and the local tower has no idea that I've changed. Basically there needs to be a mechanism to figure out where you currently are and which tower is currently servicing you. The serving gateway has no idea.

So what it does is it says, "Look, I'm going to talk to this MME instance," which is basically like a user database. The user database stores basic things like, you have an account, you've paid your bills, I should actually forward you this packet, and it also stores where you are currently within the network. Except sometimes it doesn't actually know. It just knows that generally this person seems to be in the San Francisco area. So my phone is sleeping right now. It's not notifying. It's not talking to the tower. It just knows that I'm in this general area, and there may be multiple towers.

Okay, so we've gotten to the serving gateway. We've talked to the MME. The MME says, "Look, I think he's in San Francisco. Let me flood all the towers in San Francisco and get them to send out a broadcast that basically says, 'Hey, user blah blah blah, please wake up, because I have a packet for you'." The towers broadcast the signal. My phone wakes up every once in a while. It detects that there's a message waiting for it, and then sends a message back to the local tower. It basically starts a negotiation with the local tower.

Once this negotiation is complete, the local tower says, "Hey, I've registered this user." It gets updated in the user database, the user database gets back to the serving gateway, the serving gateway can then forward the packet to the actual tower, and the tower delivers it to your phone.

If you followed all of that, that has to happen within, well, ideally milliseconds. Clearly that's very hard.

So this table right here, these are actually numbers straight from the AT&T FAQ. If you dig deep, they will show you these numbers in there. And HSPA+ is the current thing: when mobile providers today advertise 4G, they're actually advertising HSPA+. There's a standard called LTE-Advanced, which is the true 4G, if you will.

So for HSPA+, this is the latency just within the core network, basically doing this kind of flow, and this is the most complicated flow; the flow outbound from your device is a little bit simpler, but it's also part of this. In any case, this is just to illustrate that if 43 milliseconds on DSL seems high, the mobile numbers are hundreds of milliseconds. If you happen to get into an EDGE zone, which every once in a while still pops up on my phone, and it's scary when it does, you're basically looking at a second just to get a packet from your phone out to the external network. Then you actually have to route it on the external network.

So this is a big problem on mobile. We're trying to figure out how to make it go faster. Thankfully, as I mentioned, with 4G and LTE deployment, for once, North America is leading. We're at the leading edge of this stuff. We have the best performance. We're getting down into sub-hundred milliseconds, but nonetheless, big problem.

And all of this is just to send a single TCP packet. We didn't talk about sending a webpage. This is just one TCP packet to send a notification.

So hopefully by now I have convinced you that latency is, in fact, a problem. We can continue increasing bandwidth, but latency is a problem. So why does this affect HTTP in particular?

There are actually multiple problems at multiple layers. One, we need to talk about how TCP works. First of all, we have TCP congestion control and avoidance, and specifically we have this feature called TCP Slow Start.

How many people here are familiar with Slow Start? All right. Maybe half.

TCP Slow Start is a feature, not a bug.

So, the basic idea behind Slow Start is, we don't know what the capacity of the link between your node and the destination node is. There could be an intermediate node that is saturated, like the ISP is currently servicing a lot of traffic for whatever reason and it can't handle more load. So we don't want to overwhelm the network.

If everybody just woke up and started sending megabytes of data, we would saturate the network and we would just get into a state of congestion collapse, which is exactly what happened in the mid '80s.

Basically, the network just collapsed, and you couldn't get out of it. You had to reboot the whole thing. There were instances, reports at the time when this congestion collapse was reached, of some packets literally taking a day to get to the person on the other end.

So Slow Start is that fix, if you will, and the idea of Slow Start is that when we start a new TCP connection, we're not going to use all of the available bandwidth. We're going to send you a little bit of data; you acknowledge that data if it's delivered successfully, and then, if that is successful, we will increase the window size of how much data we send.

So how do we pick this number? Very simple. The original specs actually said you send one packet. So you send roughly 1,400 bytes. You acknowledge that. I'll send you 2,800 bytes. So this is the CWND, the congestion window, and that's basically what I'm showing you here. So this is the exponential growth.

That number has been updated. Most recently it has actually been updated, just in the last year, to 10 packets. So we can send up to 15 kilobytes of data, which is a significant improvement over the previous value, which was three or four packets.

So we can send you 15 kilobytes of data, then we have to pause. I don't care if you're on fiber or what have you, 15 kilobytes is all you get. We're going to wait a full RTT, and then we're going to increase that to 30 kilobytes and then to 60 and so forth. At some point, packet loss will happen, at which point we will restart this algorithm. It's a slightly different algorithm, congestion avoidance, but that's TCP Slow Start in a nutshell.

So this is surprising to a lot of people, because basically what this tells you is

no matter what the speed is, whether you're on a 4G network or a 3G network, latency is actually the bigger problem of the two at the beginning of that connection.

TCP is optimized for bulk, long transfers of data, whereas a lot of our actual traffic is short and bursty. So here's an example:

Let's say we want to transfer a 20-kilobyte file over a low latency link, or a relatively low latency link. In this case I'm saying we're going to transfer from New York City to London. I'm going to assume that's 56 milliseconds of round trip time. There's going to be 40 milliseconds of server processing time, which is very fast, and let's just say that we have five megabits per second of bandwidth, which is actually irrelevant, but good to have.

So what happens? First we have to open the TCP connection, which is the SYN and SYN-ACK. So that's one round trip. We haven't sent any data. We're just opening a connection. That's already 56 milliseconds. Then we send the request. We incur the server processing time. But we can't send the whole response. We want to send 20 kilobytes, but we can only send, in this case, four, which is the previous value for this congestion window. So we send four kilobytes. We wait. We get an acknowledgement. We send eight kilobytes, and then we send the rest.

So you do the whole flow and you figure out that to send 20 kilobytes of data on a fairly low latency link, it's going to take us 264 milliseconds, which sucks, frankly. This does not take into account DNS, and the fact that if you had to do a TLS handshake, that's another two round trips, or more, even. So it kind of sucks.
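
To make that arithmetic concrete, here's a rough model of that flow. The simplifications (the request goes out after the full handshake round trip, a 4-kilobyte initial window that doubles per round trip, one round trip per ACK cycle) are assumptions for illustration, not how any particular TCP stack schedules packets:

    # Rough model of fetching 20 KB over a fresh TCP connection.
    RTT = 56.0          # New York <-> London round trip, ms
    SERVER_TIME = 40.0  # server processing time, ms
    INIT_CWND_KB = 4    # pre-IW10 initial congestion window, ~4 KB

    def fetch_time_ms(size_kb):
        t = RTT            # TCP handshake: SYN, SYN-ACK
        t += RTT / 2       # request travels to the server
        t += SERVER_TIME   # server generates the response
        cwnd, sent = INIT_CWND_KB, 0
        while True:
            t += RTT / 2   # current window of data travels to the client
            sent += cwnd
            if sent >= size_kb:
                return t
            t += RTT / 2   # client ACK returns before the next window
            cwnd *= 2      # slow start: the window doubles per round trip

    print(fetch_time_ms(20))  # -> 264.0, matching the figure above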

With HTTP specifically, we have HTTP 1.0 and HTTP 1.1. One of the things in HTTP 1.1 was that we focused a lot on performance. At the least, we clarified a lot of the caching behavior, and we actually added this feature called pipelining. Some of those things have worked out and have been great, and some of them have not. Unfortunately, one of the things that has not worked out is HTTP pipelining.

So the basic idea with HTTP pipelining is, by default, HTTP provides no multiplexing, in the sense that you send a request and you must block and wait until you get the response.

So this is this graph right here. Let's say that I have one connection and I want to request three files; it's completely sequential. I send you one. I wait. Then I get it back. I send you the next request. Which kind of sucks, right?

Pipelining said, "Hey, this sucks," especially in the case where you have server processing time and the like; this just creates more and more latency.

So what if I could send you all three requests at once? You could do whatever you need to do to generate those three responses, and then you just send us the data for the three responses back.

So there's a little bit of a gotcha here, in the sense that, for example, let's say I send requests one, two, three, and you generate the response for request three first, but the first one is not finished. You can't send the answer for request three before you finish response one. So that's head of line blocking. So basically, it's limited, but it helps you address some of the problems and limitations.

In practice, this is what ends up happening very frequently. You have that first request, which for whatever reason, like it's a dynamic file, takes a while to generate, and then the following two requests are static assets, which, in theory, could be served very fast. That first request will block for a long time, and the other two are blocked too. So in part because of this, HTTP pipelining hasn't seen much adoption.
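
For the curious, pipelining is easy to demonstrate on a raw socket. A minimal sketch, with a placeholder host, might look like this; keep in mind that many real servers and proxies mishandle exactly this, which is part of why adoption failed:

    # HTTP/1.1 pipelining over a raw socket (illustration only).
    import socket

    sock = socket.create_connection(("example.com", 80))
    req = ("GET /{path} HTTP/1.1\r\n"
           "Host: example.com\r\n\r\n")
    # Fire both requests back to back, without waiting for a response.
    sock.sendall(req.format(path="a.css").encode() +
                 req.format(path="b.js").encode())
    # Responses must come back in request order: if /a.css is slow to
    # generate, the bytes for /b.js sit stuck behind it (head of line
    # blocking).
    print(sock.recv(65536)[:200])
    sock.close()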

There's also the problem that a lot of the intermediate proxies just completely messed up how they implemented it, or they didn't implement it at all, so they would break, which sucked. Then the browser vendors actually had a tough time with this, because put yourself in the shoes of the person who is designing this algorithm:

Let's say you have a JavaScript file, which is critical, because we need it to render the page, and we have a couple of image assets, which are not critical. It would be nice if we could get them fast, but I should be able to display text first.

Am I willing to gamble that if I send the JavaScript request first it's not going to block the others, or vice versa? If I send the image requests first, will I get them back quickly? Because you can have a pathological case, or you can have an image file that takes a long time to generate, like a minute, and then all of your requests are piled up behind it.

Basically, all of that is to say that pipelining just hasn't worked out.

Today the web is basically built on this model right here, which is sequential, and our only workaround is to just open multiple connections. So this has been our hack of the decade. We just said, look, you can't do much with HTTP 1.1 pipelining.

So basically all of the browsers over the past decade kind of tried different variables, and we've all more or less converged on up to six parallel TCP connections. So when you connect to a server we will open up to six parallel connections, which means that we can transport, at most, six requests in parallel, or get six responses in parallel.

Now, that's only partially true, because us web developers, we're an inventive bunch. We're like, "Hey, six requests in parallel. That's not enough. I have an image gallery of six images. Let me just shard that across ten different domains."

So that's the whole premise of domain sharding. The only reason it exists is to work around this limitation, which is imposed intentionally by the browser vendors to say that too many connections actually hurt you, because they cause congestion. So that's what we do today.
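
Mechanically, sharding is nothing more than deterministically spreading asset URLs across a few hostnames. A sketch, with placeholder shard hosts, could be:

    # Classic domain sharding: hash each asset path to a shard hostname
    # so the browser opens more than six parallel connections. Hashing
    # keeps a given asset on the same shard, which preserves caching.
    import zlib

    SHARDS = ["assets1.example.com", "assets2.example.com"]

    def shard_url(path):
        host = SHARDS[zlib.crc32(path.encode()) % len(SHARDS)]
        return "http://%s%s" % (host, path)

    print(shard_url("/img/logo.png"))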

Going end to end, we have the DNS lookup. We have to do the socket connect. We have to do the HTTP request. Then there is the actual content download. Even here I'm not showing the TLS time, which takes another couple of round trips.

So all of this takes a lot of time, and what typically ends up happening is, if you do the math for the HTTP Archive, coming back to that original thing that we looked at: an average page, let's say, is about 1,200 kilobytes across 86 requests. It turns out that an average page ends up talking to about 15 distinct hosts on the web today, which is quite large. That tells you approximately the number of connections that we open.

And if you do the math here, you will figure out that an average request is about 14 kilobytes. There are definitely bigger requests for things like images, but most of the other assets that we download are very small, and we download them across many connections. So what ends up happening is we end up opening a lot of these TCP connections, and frequently, I should say, not never, but frequently, we don't end up using the full throughput of the link. We just end up doing a couple of round trips, which are very expensive. We get up to a window of, like, 45 kilobytes, or 60 kilobytes, and we stop there. Then we abort the connection, which sucks.

So, we want to be here. We want to have one TCP connection, which has the pipe wide open, and we can just push as much data as we can. Instead we're stuck in this bottom triangle right here.

To put this in the context of mobile, because I think it's very relevant to a lot of companies and everybody, if you add up the latency just for a single HTTP request... Right?

I didn't talk about the control plane. So the control plane, the idea here is, before you can send anything from your mobile phone, you actually have to talk to a tower to get permission to send data. That communication with the tower actually takes anywhere from hundreds of milliseconds up to seconds. On the old generation 3G networks, it literally takes seconds to do that.

So this is just a one time startup cost when your phone has been idle. It turns out it actually, in large part, explains the high latency variability that a lot of people experience. But it's definitely a big problem.

So, you do that. You have the DNS lookup, that's a round trip. TCP connection, that's a round trip. We have the TLS handshake, which is optional, and the HTTP request, times four, for that 20-kilobyte file, and all of a sudden, on a 3G network, you're already one second in. We wanted to render our page in one second. For 4G it starts to get a little bit better, but nonetheless it's a problem.

I've already mentioned this, but one good thing that has happened recently is that the latest Linux kernels have updated their CWND to start with 10 packets. This is true as of 2.6.33+, so if you're not running 2.6.33+, you should upgrade, hopefully immediately, because that will literally just double the startup performance of HTTP connections to the client.

But for best performance, there have actually been a lot of other TCP performance optimizations since then. So really, you want 3.2+. A lot of this research has been done at Google. This specific paper is the one that argues for the increase in the CWND. There are other things, like proportional rate reduction, that are now part of 3.2+.

So if you're not running 3.2+, you should definitely look into upgrading that.
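
As a rough pointer, on Linux the initial congestion window is a per-route setting that you can inspect and raise with iproute2; the gateway and device below are placeholders for your own:

    ip route show
    sudo ip route change default via 192.168.1.1 dev eth0 initcwnd 10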

So these are the current limitations. We have constraints at the TCP layer. We know that we have problems in the HTTP layer. What have we done over the last decade?

We've come up with a number of glorious hacks to address all of this. We said, look, HTTP requests are expensive, especially small ones. So what we're going to do is concatenate files, because that means that I can deliver multiple files without you making the requests. Grab all the JavaScript, stuff it into app.js or all.js, and here you go. This is good performance-wise; it reduces the number of downloads. But, as we're finding now, this actually has a lot of negative implications too.

First of all, it's very expensive for caches, because, for example, the Gmail team has this big problem where we concatenate all of our files, and then every day we rev our code and there are like 15 bytes that have changed, and now you have to download a megabyte of JavaScript, which kind of seems silly.

So there's a trade-off there. If you're concatenating all of your JavaScript files today, even the static things like jQuery, well, you're probably not revving jQuery in your app. Keep that separate, because that will help you.

But the other one, which I think is also surprising to a lot of people, is slower execution.

It turns out that for JavaScript and CSS, we can't incrementally parse those things. We have to wait until the entire file arrives, and only then can we start executing that file.

Just by splitting a big 1-megabyte chunk of JavaScript into smaller, let's say, 100-kilobyte increments, we can actually get an improvement in the execution speed, in the initial startup speed of the app, because we don't have to wait for the 1-megabyte file to download before we can process it. So it's just incremental execution, very basic stuff.

Another one that's very similar is spriting images. Same idea: images are expensive, or the requests are expensive, so let's sprite especially the small images.

Problem number one is it's gloriously painful. Thankfully now we have some automation for a lot of that, but I still know people who do this by hand, which is kind of sad.

Two, it actually has negative implications for memory use. When you want to display a 16 by 16 icon on your page, we have to decode the entire bitmap of the sprite, which is width x height x 4 bytes, and that's your memory use. So these sprites are actually occupying quite a bit of memory, which is a problem for a lot of mobile devices. Especially for image heavy apps, you'll actually find people talking about exceeding memory thresholds.
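
The arithmetic is easy to sanity check; for example, with a hypothetical 1024x1024 sprite sheet:

    # Decoded bitmap cost: width x height x 4 bytes (RGBA), regardless of
    # how small the icon you actually display from the sprite is.
    width, height = 1024, 1024            # a hypothetical large sprite sheet
    print(width * height * 4 / 1024.0)    # -> 4096.0 KB, i.e. 4 MB of RAM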

I already talked about this, but domain sharding. We are limited to 6 connections. What the hell? Let's just shard everything N ways and we're good to go! Turns out, and we've actually done studies on this as well, in many cases you're actually hurting the performance of mobile applications.

Sharding more ways helps clients that have more bandwidth, which are your desktop clients, but it hurts people on slower connections, like mobile phones, because it causes congestion and it causes more retransmissions.

Basically you're making it even worse. It already sucks for those people because it's so slow. You're making it even worse.

And there's no perfect number for the right number of shards for your site. It's based on your app. It's based on your specific page, even, because different pages have different numbers of assets, like if you have an image gallery or something. So that sucks.

Then the last one is, of course, the crowd favorite, resource inlining. It's like, what the hell, I'm just going to put this image directly into the file, or maybe into a CSS or JavaScript file. That also has negative costs.

Once again, it eliminates the request, which is good, but that resource can't be cached, because now you have to inline it into every single page, which sucks. Two, of course, there's the overhead of base64 encoding. So you're inflating the file, like an image file, by 30%.
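
That overhead is inherent to the encoding; base64 maps every 3 bytes to 4 characters, which you can verify directly:

    # base64 inflates payloads by 4/3, roughly the 30% mentioned above.
    import base64, os

    blob = os.urandom(30000)            # stand-in for a 30 KB image
    encoded = base64.b64encode(blob)
    print(len(encoded) / len(blob))     # -> 1.333...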

These are kind of the core, the best practices that we preach that every site should do: concatenate your files, domain shard, inline your images. But then there's a sea of red.

When I look at any given site, I'm like, okay, you've over-sharded, you've embedded too many images into your HTML page, and you should really split your JavaScript bundles, right? Which is kind of sad.

That's what HTTP 2.0 is, in fact, all about. Once again, let's come back to this. We want to improve the end user perceived latency. So this is basically, how do we make HTTP work better with TCP?

Two, we want to address head of line blocking. So this is the problem with pipelining, where even if we could send multiple requests, we couldn't get multiple responses interleaved. So that's number two.

Not require multiple connections: we want to eliminate the need for domain sharding. We just want to use one TCP connection, because that is, in fact, the best way to get the best throughput.

We're not here to change HTTP 1.1 in a fundamental way. We're not going to change, I don't know, squiggly brackets or angle brackets or something. We want to preserve what we have. We want to preserve the ecosystem and make it as seamless as possible, ideally, to migrate.

Here's where we are today. This effort actually predates January 2012. Basically what happened was in 2008, 2009, at Google, we looked at this latency and bandwidth study. We realized that there was a problem and we started working on SPDY, which basically became the precursor to HTTP 2.0.

By January 2012, we had Chrome, we had Firefox supporting it. We had a lot of big sites. Google, of course, has been using SPDY for years now, but then also Facebook, Twitter and others started picking it up. So basically at that point it was becoming a de facto standard, and we said, look, there should be a more formal spec around this.

At the beginning, we basically picked the latest SPDY draft and used that as a base. But to clarify,

HTTP 2.0 is not SPDY. We've changed it in a number of different ways and we've made it better. It's based on SPDY, but it's not SPDY.

So today, actually as of July 2013, we actually have the first implementation draft. Earlier this month, we actually had an interop testing session in Hamburg.

So we have a Chrome implementation of HTTP 2.0. We have a Firefox implementation of HTTP 2.0. There are a bunch of server implementations. Actually, Microsoft has built a server implementation, so Chrome and Firefox were testing against the Microsoft server.

There's a lot of work going on, and it's pretty exciting to see that something of this significance is moving as fast as it is, because I think everybody feels like this is a big problem and it's something we need to address.

Generally speaking, when you see a working group put out a timeline that says, "In two years, we will solve a ginormous problem on the web," you have to take that with a grain of salt. But so far, we're on track and on schedule, and I think we may actually hit it. And if we don't hit it, I think we'll be very close, which is impressive, given the size of the undertaking.

So, there's a growing list of clients and servers as well. There are node implementations, etc., so you can actually use this stuff today, especially if you control both the client and server. Obviously you can't rely on, say, Chrome advertising it today; Chrome has a branch where we implement HTTP 2.0, but we don't have it in the stable branch today. For that we have SPDY. But if you control both the client and server, go for it. This is going to work.

First of all, one TCP connection. We want to get the best performance out of a single TCP connection. We shouldn't need more than that.

We're introducing a new term into the lexicon of HTTP, which is "stream". Whenever I say a "stream", just think of a request.

Multiple streams can flow over a single TCP connection, which is the same thing as saying multiple requests can flow over a TCP connection. So streams are multiplexed and streams can be prioritized, and we're going to talk about that.

All of the magic basically happens here, and the most important thing about HTTP 2.0 is this: previously, you could open a Telnet session and just type in a bunch of text to say, "Get this page." Now we're using binary framing, and binary framing allows us to split messages into different binary frames and multiplex them across the same connection, so I'll show you some examples of that.

The binary framing is the core change, and that core change basically trickles out into a whole number of different features within HTTP 2.0. So as long as you understand that, that's kind of the core of it. It enables prioritization, flow control and server push, which we'll talk about.

This is just a note from Mark, who's the chair of the working group. We're not replacing HTTP. We're just redefining how it's laid out on the wire, if you will.

One of the questions is, "Is 2.0 really warranted? Is this such a big change? Why not 1.2?"

The answer is, well, because we're changing the wire format, it is such a big difference, as far as we're concerned, that it is a 2.0.

Because you can't just talk to a 1.1 server anymore. So that'sthe reason for 2.0.

What you need to know about the actual implementation: every frame in HTTP 2.0 has a consistent header. There are 8 bytes of header. All of the frames are length prefixed. So if you're a parser guy, you'll be very happy. The first thing that you read is the length of the frame, and at that point, you know exactly what you need to do to parse this. So efficiency is actually a big optimization concern here.

Once you know the length, you can figure out the type. The type basically tells you what type of frame is being communicated here. Is it a headers frame or a data frame or something else? I listed a couple of them here.

For example, priority. Each frame can have a number of custom flags that each frame type defines. Then there is a stream identifier. Whenever the client and server create a stream, they declare an ID for it, like 1, 3, 5, 7, 9, etc. Whenever we split data into these packets, we always embed that ID, such that on the other end we can figure out, "Oh, this thing that I received belongs to that stream." That's how multiplexing works. Really, that's all there is to it. That's a consistent header.

If you can implement this, and this is very simple to write in any sort of parser, you can basically process the basics of HTTP 2.0. You read the first 8 bytes and you're good to go. After that, we actually just extend it.
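
As a sketch of how little work that is, here's a parser for the 8-byte header as laid out in the 2013-era drafts (16-bit length, 8-bit type, 8-bit flags, a reserved bit plus a 31-bit stream ID); field widths changed in later revisions, so treat this as illustrative:

    # Parse the 8-byte frame header from the 2013-era HTTP 2.0 drafts.
    import struct

    def parse_frame_header(buf):
        length, ftype, flags, stream_id = struct.unpack("!HBBI", buf[:8])
        stream_id &= 0x7FFFFFFF      # the high bit is reserved
        return length, ftype, flags, stream_id

    # A 16-byte DATA frame (type 0x0) on stream 5, with no flags set:
    header = struct.pack("!HBBI", 16, 0x0, 0x0, 5)
    print(parse_frame_header(header))   # -> (16, 0, 0, 5)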

Here's an example of a headers frame. A headers frame is something that you send to open a new request. This is just like me sending a "GET" string with the path of the requested page. I will send a headers frame which identifies the stream ID, and I can embed an optional priority.

So what this allows you to say is, "Hey, this is a JavaScript file. This is very important to me, so serve this with a higher priority than the image, which I sent you previously." We have 31 bits of priority space reserved for that, so you can be very sophisticated, if you want, in how you prioritize this kind of thing.

Q: Do routers listen to this or just the servers?

A: That's a good question: do routers listen to this, or just the servers? It can be both. The server definitely needs to listen to it, because it's the one providing the bytes. But if you have an intermediary, then it can be smart about it too. Part of this is also flow control, which we'll get to in a second.

Let's see. One of the cool things about HTTP 2.0, and we're going to talk about server push in a little bit, is that you can actually open streams from both ends. So, a client can send a request, like, "I want to get this page" or "I want to get this image," but the server can also open streams back at the client.

You make a request, for example, for a page, and the server says, "Oh, you'll also need this favicon, because you always ask for it." The server can do that, and the question is, how do you negotiate the stream IDs? It's very simple. One keeps the even numbers. One keeps the odd numbers. So we just increment those, and there's never a race between them. You don't have to coordinate, rather.
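
A sketch of that allocation scheme; the class here is hypothetical, but the odd/even split is the whole trick:

    # Client-initiated streams take odd IDs, server-initiated take even,
    # so both sides can create streams without ever colliding.
    class StreamIds:
        def __init__(self, is_client):
            self.next_id = 1 if is_client else 2

        def allocate(self):
            sid, self.next_id = self.next_id, self.next_id + 2
            return sid

    client = StreamIds(is_client=True)
    print(client.allocate(), client.allocate())   # -> 1 3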

Actually, let me go back. So that's the priority field. Then, finally, you embed the headers. The headers are like, here's the content length, here's my user agent, what have you, all these other things.

One of the things that we've learned with HTTP is, originally we started with a very simple protocol. It was just literally one line. It was like, get this resource, version number. And that's all we needed.

Later, we added a whole lot of stuff, like, here's the user agent, here are the accept types or content types that I support, here's some other metadata. And basically, when you run the analysis, you'll find out that an average request and response adds about 800 bytes of overhead. This is just in terms of HTTP headers, which is significant.

Because for a lot of payloads, for example a lot of the apps that we're building today, we're sending a JavaScript response back that's a couple hundred bytes, very frequently. Then on top of that we're adding these 800 bytes of headers, which is very expensive. This is a common complaint and one of the reasons why people are so excited about WebSockets, because it's very low overhead in terms of the actual protocol.

With HTTP 2.0, we actually looked at that and said, look, we need to address this problem. It is a problem. So there's actually a new algorithm for doing header compression. Originally in SPDY, we started with just straight-up gzip. We said look, just gzip the damn thing; we know that this thing works. But then there were a couple of attacks discovered against it, basically security problems.

So we had to throw that out, and there's a new algorithm. The way to think about it is, both sides of the connection keep header tables, which are basically key-value pairs of things that have been sent before. On each request, you just toggle those bits. So you can say, "Hey, you don't have this value in your table, so please add it to your table." Then in the next request you can just say, "Toggle that bit. I'm sending that request again."

Here's an example. You send request one. This is just a fresh request. It's going to be a GET request for example.com, a resource file, it's a JPEG file, here's my user agent string. So you actually send all these key-value pairs to the server.

On the second request, you're requesting just the resource. But you already know that the server, in this case, already has all these values from the previous request. And the way the algorithm works is, basically you toggle out the things that you don't want, and you can send new values.

Because I'm sending a new request for a resource file... And you know what? That's actually a bug. It should say "resource 2". So this is the only value that has changed, so in this case, I would only send this one key-value pair, which is great.

What this tells you is, for example, if I'm sitting in a loop and I'm just polling the server for an update, the overhead of that request in terms of headers is zero, because it's the exact same set of headers. I don't have to do anything. And over the lifetime of the connection, you build up this header space, and you can basically be very efficient in how you encode and decode this kind of thing.
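
A toy sketch of the idea, with plain dictionaries standing in for the header tables (the real algorithm, with its bit toggles and eviction rules, is more involved than this):

    # Delta header encoding: only transmit what changed since the last
    # request on this connection; both sides keep the table in sync.
    def encode(headers, table):
        delta = {k: v for k, v in headers.items() if table.get(k) != v}
        removed = [k for k in table if k not in headers]
        table.clear()
        table.update(headers)
        return delta, removed

    table = {}
    req1 = {"method": "GET", "path": "/resource", "user-agent": "Demo/1.0"}
    print(encode(req1, table))   # first request: every pair is sent
    req2 = dict(req1, path="/resource2")
    print(encode(req2, table))   # only the changed path is sent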

This is actually another interesting opportunity for servers, and intermediaries as well, to do a smarter job in terms of what the right algorithms are for doing the eviction of these headers, and so on and so forth. So this is a very important part of HTTP 2.0. It makes it very, very efficient to transfer this metadata.

So we've sent the headers, the header block, I should say, and after that you actually send the data, or the payload. The headers frame just carries the metadata about the request. Then you split the payload up and put it into separate data frames, each of which consists of just the consistent header, that's the 8 bytes, followed by the actual payload. There's nothing more to it. It's as simple as that.

The one interesting gotcha that has been added recently to the spec is, if you look at the length field of this frame, it's 16 bits, so in theory you could send 64 kilobytes of data in a frame, but to reduce head-of-line blocking, we're actually limiting that.

In the spec, we're saying no frame should be bigger than 16 kilobytes, because otherwise you just create more and more contention, or more blocking. So if you have data that is larger than 16 kilobytes, you just split it across multiple data frames. Then in the last data frame you toggle a flag that says this is the last frame of that sequence, and that's how you communicate larger payloads.
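
In code, the splitting rule is a few lines; the flag value here is made up, but the shape is what the spec describes:

    # Split a payload into data frames of at most 16 KB, flagging the last.
    MAX_FRAME = 16 * 1024
    FLAG_FINAL = 0x1            # hypothetical "last frame" flag value

    def data_frames(payload):
        for off in range(0, len(payload), MAX_FRAME):
            chunk = payload[off:off + MAX_FRAME]
            last = off + MAX_FRAME >= len(payload)
            yield (FLAG_FINAL if last else 0x0, chunk)

    frames = list(data_frames(b"x" * 40000))
    print([(flags, len(chunk)) for flags, chunk in frames])
    # -> [(0, 16384), (0, 16384), (1, 7232)]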

So, all of that in a simple picture here: the client opens multiple requests. It has a single connection, and it can split all of those requests and responses into individual frames.

The server can interleave the frames, and it can use one TCP connection to deliver them in parallel. So if you sent three requests, one for JavaScript and a couple for images, and the images are just right there and they're available, the server can start streaming them immediately. They can then be preempted, because the server can say, "Hey, previously I was servicing the image requests, but the JavaScript request has higher priority, and now that I have those bytes, I'm going to give it higher priority and stream it down this pipe." So you can optimize; you can get the best bandwidth and you can also get the best throughput.

But of course, one of the gotchas here is the server needs to be smart about this. Previously we've had a lot of logic on the client for when we schedule requests. One of the things that we're doing now, and we committed this into Chrome maybe a month or so ago, is that when we're using SPDY, we remove all that logic.

We just say, "Hey, there's a request that needs to be made. Send it to the server." So we're implicitly relying on the server to do the right thing, or the smart thing. It won't saturate the link with JPEGs or large image files when we're asking for JavaScript and CSS and images. In some instances we've discovered that doesn't actually work out.

For example, the current nginx implementation does not respect priorities, so we've actually seen a degradation of performance in those cases. These are the kinds of things that need to be fixed at the server layer. Servers need to get smarter.

Flow control is kind of interesting. One of the interesting properties of this, and you'll discover this any time you layer a protocol within a protocol, is that you'll end up running into this exact same problem.

Now that we're interleaving multiple streams, or flows, within a single TCP connection, how do you rate limit or control the allocation of resources between those flows?

This is especially important for proxies and intermediaries, where you may say, "Hey, I have a video stream and I have this other stream. The video stream can easily saturate my link, but I want to limit it to this amount of throughput." So flow control allows you to do that.

If you've ever studied TCP flow control, one thing you'll know is that this is a problem that's been solved many times, in the sense that there are new solutions coming out for TCP. There are new proposals, new congestion control mechanisms being proposed to this day.

We've been working on this for 20+ years. So we're not introducing a new flow control mechanism in HTTP 2.0. We're basically providing the building blocks: every connection is going to start with a 64 kilobyte window. Every time you send a data frame, we're going to decrement that window by the size of that frame. Then you have a special frame called the "window update frame", which will increment the size of that window. And how your server implements the logic of when to increment that window is completely up to you.

We're basically building you the shovels to build flow control. We're not providing an algorithm within HTTP 2.0. It's an interesting opportunity for, I think, innovation in the space. That's intentional. We know this is a complicated space; we can't solve it in HTTP 2.0, but we're going to provide the tools for you to do that.
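
The accounting itself is trivial; a minimal sketch of those building blocks, leaving the actual policy (when and by how much to update) entirely to you:

    # Per-stream flow control accounting: a 64 KB starting window, data
    # frames decrement it, window update frames replenish it.
    class FlowWindow:
        def __init__(self, initial=64 * 1024):
            self.window = initial

        def can_send(self, n):
            return n <= self.window

        def on_data_sent(self, n):
            assert self.can_send(n)
            self.window -= n

        def on_window_update(self, increment):
            self.window += increment   # when to send this is your policy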

The last thing that I'll mention is HTTP 2.0 push. I already talked about this briefly, but the idea here is that, very frequently, you request a page, like a static homepage, what have you, and we give you the HTML, and then you come back to us for all the resources that we've told you, in that HTML, that you're going to need. So why shouldn't we be able to send you multiple responses to one request?

You say you want the page, and we say, "Here's the HTML file. Here's a CSS file. Here's a JavaScript file, and here's a bunch of other assets that you may need."

Q: How does push work with CDNs and intermediate caches?

A: That's a good question. I don't think we have a good answer to that.

Part of the thinking, right now at least, is that CDNs can actually be the ones providing this push. One of the things you can do, so let's take an actual scenario: I make a request to your server, and your server says, "Here's the index HTML file and here are three other assets."

What if I have those three assets in a cache? Well, then you can actually cancel the stream. You can send a frame called "reset stream" and say, "No, no, I want to refuse this," or, "I don't want it." If you have an intermediary in there, it can actually drop those streams and just reset them back if it doesn't want to accept them.

So if your CDN doesn't accept it, it should respond with a reset stream, and that should be the end of it. You may still end up transferring a little bit of data, but hopefully that's not such a big deal. So there's a bit of a race condition there.

In the worst case, if your intermediary refuses these push streams, we have the HTML parser, which is going to discover those resources and request them anyway, so this is an optimization.

The interesting thing about this is, it's useful in the context of a browser, but it may be even more useful in the context of a general RPC layer. Don't think of HTTP 2.0 just as a browser client. Now what you can do is, you can send a naked request. Let's say you have your Java client; you send the request to the server, the server can push multiple responses back to your client, and you can do smart things with it. So that's push.

One of the problems with deploying something like HTTP 2.0 is, well, we have a lot of existing infrastructure that can't be upgraded overnight. We have clients that we can rev. Something like Chrome, we can certainly release a new version and that'll be nice and smooth. But there are lots of old IE clients, lots of old servers that won't be updated, etc...

So, how do we make the switch as seamless as possible?

There are two standards, or two ways, proposed in the spec today. There's the typical HTTP upgrade flow, if you guys are familiar with WebSockets. We send the request. We want to request the page, and we also send a connection upgrade header, and we say, "Hey, we would like you to upgrade to HTTP 2.0, if you support it." And if it does, the server can say, "Okay, fine," and do a 101 Switching Protocols. It responds basically with those headers, and right after that it's sending HTTP 2.0 data on that TCP connection. So there are no additional round trips, which is good. If it does not support HTTP 2.0, it can just respond with a 1.1 response. So there's no penalty in this case, which is nice.
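
On the wire, that exchange looks roughly like this; the exact upgrade token has varied from draft to draft, so treat this as a sketch:

    GET /index.html HTTP/1.1
    Host: example.com
    Connection: Upgrade
    Upgrade: HTTP/2.0

    HTTP/1.1 101 Switching Protocols
    Connection: Upgrade
    Upgrade: HTTP/2.0

    ... HTTP 2.0 frames flow on this same TCP connection from here on ...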

The more preferred way to actually do it, for a variety of reasons, is via TLS and ALPN.

ALPN adds a mechanism to the TLS negotiation where you can actually negotiate the application protocol that you want to use at the time of the handshake.

Basically, the way it works is, when you open the actual handshake for TLS, you say, "Here's my private key," or "Here's my public key." Hopefully you're not sending your private key. "Here's my public key and, by the way, I support these protocols."

The server then says, "Okay, fine. I'll sign this request, and I like the protocol that you're advertising, in this case HTTP 2.0, so I will run HTTP 2.0," and it responds back with a protocol field called "ProtocolName" that says, "Okay, fine. We'll talk HTTP 2.0."

So by the time the handshake is complete, we haven't added any more round trips to do this kind of thing, but we know right at the end of the TLS handshake that we can use HTTP 2.0.
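
For a sense of what this looks like from application code, here's a sketch using Python's ssl module (3.5+); the "h2" token is what implementations eventually settled on, while the drafts current at the time of this talk used draft-specific strings:

    # Negotiate the application protocol during the TLS handshake via ALPN.
    import socket, ssl

    ctx = ssl.create_default_context()
    ctx.set_alpn_protocols(["h2", "http/1.1"])
    with socket.create_connection(("example.com", 443)) as raw:
        with ctx.wrap_socket(raw, server_hostname="example.com") as tls:
            # Falls back to HTTP/1.1 if the server didn't pick h2.
            print(tls.selected_alpn_protocol())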

The reason this is better, at least today, is that in practice there are a lot of intermediary caches, proxies, what have you, on the web, even antivirus software running on clients, that sniff traffic on port 80 and break in spectacular ways. For example, there's actually antivirus software out there that we've discovered, with SPDY, that would sniff port 80 if we ran unencrypted, and it would say, "Look, this doesn't look like HTTP traffic. This looks malicious. Oh, this user is under attack. Let me close the connection."

Or we have intermediate proxies which don't even parse HTTP properly. They just look for strings in the byte stream and swap them out. That breaks the protocol in spectacular ways.

Basically, there's all this infrastructure in the way. If you deploy HTTP 2.0 like this, or when we deployed SPDY in the wild, we found that in 20% of cases our connections would just fail randomly, and we could never figure out why, because it was some software running on the client, some intermediary, or something else.

If you're running a site like Google.com, having 20% of your users not being able to reach your site is kind of a problem. The way to address that is basically to run it over SSL, because then we bypass all those intermediaries and we're talking end to end. So encryption is not the point here; the point is that we have a clean tunnel between the two ends.

In practice, that's what you're probably going to end up using for HTTP 2.0. This is exactly why WebSockets work over TLS on mobile and elsewhere, but break for a lot of clients, especially on mobile, when running over vanilla HTTP. So if you're having trouble with WebSockets, run them over TLS and you'll be fine. That's where things stand today.

If you're interested in this kind of stuff, I did write a book about it. It's not in print yet, but it hopefully will be by the end of next month, and it's all online and it's free. So if you're interested in HTTP 2.0, or TCP and TLS and all that kind of stuff, I go into all of it in depth, so check that out. I'll have a link at the end.

With all the stuff going on with HTTP 2.0, the TCP performance part becomes even more important in many regards. So you should definitely upgrade your Linux kernels and make sure that you have the latest TCP initial congestion window and congestion control settings in place. All of the previous optimizations, like positioning your data closer to the user, still apply.

The fundamental limitation today is still latency, so the closer you can get your data to the user, the better the performance is going to be. You want to compress the data, and so on.

So TCP performance is very, very important. In fact, there's a little bit of a caveat with all the HTTP 2.0 work, which is that TCP packet loss happens. That's how TCP works; packet loss has to happen for TCP to work properly.

But the problem with packet loss is that when it does happen, we reduce the size of the congestion window, in many cases fairly significantly. So when you had 6 or 10 connections open and you hit packet loss on one of them, only the throughput of that one connection out of, let's say, 10 would decrease.

If you're running all the data through one SPDY connection and packet loss happens, it affects you in a much more significant way. Roughly speaking, halving the window on one of ten connections costs you about 5% of your aggregate throughput, while halving the window on your only connection costs you 50%. We know that's a limitation. It turns out that when we ran the studies, even despite that problem, HTTP 2.0 still delivers better performance.

It's something that we can address at the TCP layer, which is basically why I'm saying you should upgrade to Linux 3.2, because one of the things in 3.2 is proportional rate reduction, which improves this kind of fairness problem when running HTTP 2.0 over a single connection.

Because you're likely going to be deploying TLS, TLS optimization is critical, because a TLS handshake is actually very costly. It takes multiple roundtrips, so you have to pay attention to your certificate size and optimize your record sizes. This is an entire frontier of performance that I think not a lot of people are paying attention to today. We're going to have to get very, very good at optimizing TLS. So there are a lot of things we can do in this space.
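One concrete example of that kind of tuning, sketched in Go and assuming a server that supports session resumption: enable a client-side session cache so repeat connections skip the full handshake roundtrips. The host and cache size are placeholders.

```go
package main

import (
	"crypto/tls"
	"fmt"
)

func main() {
	cfg := &tls.Config{
		// Remember session state so later handshakes can be abbreviated.
		ClientSessionCache: tls.NewLRUClientSessionCache(1024),
	}
	for i := 0; i < 2; i++ {
		conn, err := tls.Dial("tcp", "example.com:443", cfg)
		if err != nil {
			panic(err)
		}
		// The second dial should resume, if the server cooperates.
		fmt.Println("resumed:", conn.ConnectionState().DidResume)
		conn.Close()
	}
}
```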

One of the nice things about migrating to HTTP 2.0 is we can undo a lot of the glorious hacks that we've been doing in our applications. And this is great because it'll make our applications simpler. I don't have to tell you to concatenate your files, you don't have to sprite your images; you can just do the right things.

We can keep our code modular. We can keep it nice. We can just make these requests, the server should do the right thing, and you shouldn't have to inline assets. All of this should be handled in HTTP, which is how it should have been to begin with.

So if you are looking at deploying HTTP 2.0, the number one thing you need to do for performance is unshard your assets.

This is step number one, regardless of anything else, because running HTTP 2.0 over multiple TCP connections will hurt your performance.

Basically, you're not going to get the performance benefits. You shouldn't be any worse off, but you're not going to get the benefits; that's just empirically what we've seen. Then, after you've done that, you can start undoing the other glorious hacks in your code. Hopefully, now that we've convinced everybody to concatenate all their files, we can tell them to undo all of that.

The end result of all of this is, well, simpler applications. It should result in faster delivery. It will deliver better caching, because we don't have to invalidate these giant blobs just because we changed five bytes of data, and it'll significantly reduce server resources.

Your servers will have to maintain fewer TCP connections, which is a big deal for people running servers that handle a lot of TCP connections. Each of those connections has a memory buffer, which is quite costly in many cases. Even in simple simulations with a proxy server, we can show significant improvements in the overall throughput of the system in terms of the number of clients served, the latency of the system, and so on. So this is a big win for everybody, clients and servers, and something to be excited about.

I think I've talked a lot about the benefits. As for the opportunities and the work that's still ongoing and needs to be done: smarter servers. This is a big thing.

We can write the spec, but the servers need to get smarter. Servers need to respect priorities. What we've done is we've basically moved all of the scheduling logic away from the browser and into the server. We're placing our faith in the server. The server needs to be smart: if it just saturates the pipe with images, it's going to deliver poor performance. This is something we need to spend a lot of effort optimizing.

Server push: there are a lot of opportunities there. What resources should you push? How do you determine that? One cool strategy that the Jetty guys have implemented with SPDY is they listen to inbound traffic and look at the referrer headers, and after some amount of time they build a map that says, "Hey, you've requested index.html, and the referrer headers say that after you received that HTML you also requested these three image files, the CSS file, and other things." So they construct that map and then start pushing those assets to future clients.

So this is completely automated, which is the nice part about it. You just let it run; it listens to traffic and adapts to it, which is pretty cool. You could also implement a manual strategy, where maybe you really hand-tune your application for a specific case and say, "Always send this file," or, "Always send it to this client." There's a variety of different strategies for how to implement server push; a rough sketch of the automated one follows.
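For the flavor of it, here's a toy version of that referrer-learning strategy in Go, using the net/http Pusher interface that later shipped for HTTP/2 servers. The paths and the learning heuristic are simplified assumptions, not Jetty's actual implementation.

```go
package main

import (
	"net/http"
	"net/url"
	"sync"
)

var (
	mu      sync.Mutex
	pushMap = map[string]map[string]bool{} // page path -> assets seen via Referer
)

func handler(w http.ResponseWriter, r *http.Request) {
	// Learn: an asset request carrying a Referer tells us which page pulls it in.
	if ref, err := url.Parse(r.Referer()); err == nil && ref.Path != "" && ref.Path != r.URL.Path {
		mu.Lock()
		if pushMap[ref.Path] == nil {
			pushMap[ref.Path] = map[string]bool{}
		}
		pushMap[ref.Path][r.URL.Path] = true
		mu.Unlock()
	}

	// Apply: push the learned asset set to future visitors of this page.
	if p, ok := w.(http.Pusher); ok {
		mu.Lock()
		for asset := range pushMap[r.URL.Path] {
			p.Push(asset, nil) // best effort; the client can RST_STREAM it away
		}
		mu.Unlock()
	}
	http.ServeFile(w, r, "./static"+r.URL.Path)
}

func main() {
	// Push only works over HTTP/2, which in net/http means serving TLS.
	http.HandleFunc("/", handler)
	http.ListenAndServeTLS(":443", "cert.pem", "key.pem", nil)
}
```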

So once again, this is something that both web developers and server developers need to think about really carefully: how do we leverage this new capability that we just never had in HTTP before?

Clients - same thing. One thing I'm familiar with, having built a couple of HTTP clients in the past: the default HTTP clients in most languages are terrible. They don't support connection reuse, so all of that needs to be replaced with something smarter.

If we just build HTTP 2.0 clients that do the same thing as before, make a request and throw away the TCP connection, this is all for nothing; it's completely useless. So all of that needs to be replaced.
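Here's what "smarter" looks like in practice, sketched in Go, where net/http multiplexes over a single connection automatically once the server speaks HTTP/2; the URLs are placeholders. One shared client, many concurrent requests, one TCP connection.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

func main() {
	client := &http.Client{} // share one client; never build one per request

	urls := []string{
		"https://example.com/a",
		"https://example.com/b",
		"https://example.com/c",
	}

	var wg sync.WaitGroup
	for _, u := range urls {
		wg.Add(1)
		go func(u string) {
			defer wg.Done()
			resp, err := client.Get(u)
			if err != nil {
				fmt.Println(err)
				return
			}
			defer resp.Body.Close()    // closing the body lets the connection be reused
			fmt.Println(u, resp.Proto) // "HTTP/2.0" against an HTTP/2 server
		}(u)
	}
	wg.Wait()
}
```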

One of the cool things about HTTP 2.0 relates to RPC. Over time, we've invented a lot of new RPC layers: at Google we use Stubby with protobufs, Facebook has their own, MessagePack has its own RPC layer, and there's a whole number of others. A lot of companies use these layers internally because HTTP is not fast enough; we can't multiplex connections and all the rest.

Now that this is here with HTTP 2.0, you can rip out all of that code and replace it with this, because it's actually better in the sense that it's been battle-tested and a lot of people have thought hard about this problem.

This infrastructure is going to be built. There's going to be commercial support for this kind of stuff. You're going to get routers supporting it and all the rest, so you don't have to roll your own RPC layer in-house.

So if I were building a backend system today, for whatever company, I would start with HTTP 2.0, because that's just the right long-term bet, as opposed to using Stubby or something else internal to one company. I think that's a big change that needs to happen.

Finally, there are actually a lot of questions about how we migrate everybody off HTTP 1.0. You've optimized your site and concatenated all your files, and now I'm telling you to undo all of that. But the switch is not going to happen overnight. We're going to have a lot of old HTTP 1.0 clients, so how do we manage that transition?

One answer is to use a dynamic optimization service. Almost every CDN today has something like this, where they'll rewrite your assets, and they can actually be smart about it.

For example, with the PageSpeed product that we have at Google, we look at the incoming headers and say, "Look, this client has HTTP 2.0, so we won't do concatenation," whereas, "This client has HTTP 1.0, so we'll serve it the bundled asset." That decision happens dynamically.
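A hypothetical sketch of that decision in Go, with made-up paths: inspect the negotiated protocol version on each request and serve either the modular files or the concatenated bundle.

```go
package main

import "net/http"

func serveScripts(w http.ResponseWriter, r *http.Request) {
	if r.ProtoMajor >= 2 {
		// Multiplexed connection: individual files cache and invalidate better.
		http.ServeFile(w, r, "./assets"+r.URL.Path)
		return
	}
	// HTTP/1.x client: one request, one big concatenated blob.
	http.ServeFile(w, r, "./assets/bundle.min.js")
}

func main() {
	http.HandleFunc("/js/", serveScripts)
	http.ListenAndServeTLS(":443", "cert.pem", "key.pem", nil)
}
```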

This kind of thing needs to happen at the routing layer and all the other layers within the system. That also requires quite a bit of plumbing and architecture in terms of how you deliver that to the client.