Hello Slashdot Reader!

Due to the huge amount of load placed on my server, I'm redirecting
the story linked from slashdot here, so that I don't take a db/perl
hit with each page request. Thanks, and enjoy! --Brian Tiemann

(By the way, nothing crashed; however, we were not informed of the
Slashdot link, and the massive traffic pegged the server's processor
until one of Brian's minions of the dark noticed and fixed it.
--Anonymous Minion of the Dark)

Friday, December 13, 2002

18:07 - What makes IE so fast?

Internet Explorer on Windows always seems either to run impossibly fast
(page requests are fulfilled almost before the mouse button has returned to
its original unclicked position), or ridiculously slow (as with the weird
stalling-on-connect problem that many people, including myself, have
noticed).

One possible explanation is something that my team and I
noticed a couple of years ago, in analyzing packet traces of IE's connection
setup procedure. Microsoft might have fixed this since then; I'm not sure.
But it's a possible culprit.

First of all, for those rusty on their TCP/IP, here's how a normal HTTP
request over TCP should work:

     Client                  Server
1.   SYN ->
2.                           <- SYN/ACK
3.   ACK ->
4.   Request ->

This is how the client and server synchronize their sequence
numbers, which is how a connection gets established. The client sends a
synchronization request, the server acknowledges it and sends a
synchronization request of its own, and the client acknowledges that.
Only then can the HTTP request proceed reliably.
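
For anyone who wants to watch the sequence numbers move, here is a minimal
sketch in Python. It is a toy model, not a real stack: the Segment class,
the three_way_handshake function, and the initial sequence numbers 1000 and
5000 are all made up for illustration. The one real TCP rule it relies on is
that a SYN consumes one sequence number, so each side acknowledges the
other's initial number plus one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    """One TCP segment in the toy model (hypothetical helper type)."""
    sender: str
    flags: str
    seq: int
    ack: Optional[int] = None


def three_way_handshake(client_isn, server_isn):
    """Return the three segments that synchronize both sequence numbers."""
    syn = Segment("client", "SYN", seq=client_isn)
    # A SYN consumes one sequence number, so the server ACKs client_isn + 1
    # and announces its own initial sequence number in the same segment.
    syn_ack = Segment("server", "SYN/ACK", seq=server_isn, ack=syn.seq + 1)
    # The client acknowledges the server's SYN the same way.
    ack = Segment("client", "ACK", seq=syn.seq + 1, ack=syn_ack.seq + 1)
    return [syn, syn_ack, ack]


for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(segment)
```

Once the third segment is out, both sides know the other's starting
sequence number, and the request can follow.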

The server's SYN (synchronize) and ACK (acknowledgement) packets are
combined for speed; there's no reason to send two separate packets when
you're trying to get a connection established as quickly as possible.
Another speed enhancement that Mac OS 9's stack uses, by the way, is to
combine the client's ACK and the HTTP request into a single packet; this is
legal, but not frequently done. The idea is that within the structure of
TCP/IP, you want to minimize the number of packets exchanged in the
three-way handshake that has to complete before you can send the HTTP
request.

When tearing down a connection, it looks like this:

     Client                  Server
1.                           <- FIN
2.   ACK ->
3.   FIN ->
4.                           <- ACK

This generally takes four steps, and the FIN/ACK packets are usually
not consolidated because connection teardown is nowhere near as
speed-sensitive as startup is. (The FIN sequence can be initiated either by
the client or the server.)

Many very stupid companies have tried to
come up with overly clever ways to speed up TCP/IP. TCP, by its nature, is a
stateful and bidirectional protocol that requires all data packets to be
acknowledged; this makes the data flow reliable, by providing a mechanism
for dropped packets to be retransmitted; but this also makes for a more
strictly regimented flow structure involving more packets transmitted over
the wire than in simpler, non-reliable protocols like UDP-- and therefore
it's slower. One company that thought itself a lot smarter than it really
was, called RunTCP, came up with the idea of "pre-acking" TCP packets; it
would send out the acknowledgments for a whole pile of data packets in
advance, thus freeing them from the onerous necessity of double-checking
that each packet actually got there properly. And it worked great, speeding
up TCP flows by a significant margin-- in the lab, under ideal test
conditions. The minute you put RunTCP's products out onto the real Internet,
everything stopped working. Which stands to reason-- their "solution" was to
tear out all the infrastructure that made TCP work reliably, under
competing load and in adverse conditions, in the first place.
Dumbasses.
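
To see why pre-acking only holds up under ideal conditions, here is a toy
model in Python. To be clear about the assumptions: this is not RunTCP's
actual implementation, which I can't reproduce here. The deliver function,
its parameters, and the rule that a retransmission always arrives are all
simplifications made up for illustration.

```python
def deliver(num_packets, lost_on_first_try, pre_ack):
    """Toy model of a one-way transfer.

    lost_on_first_try: packet numbers the network drops the first time.
    pre_ack: if True, the sender treats every packet as already
             acknowledged and therefore never retransmits.
    Returns (set of packet numbers received, total transmissions).
    """
    received = set()
    transmissions = 0
    for seq in range(num_packets):
        transmissions += 1
        if seq not in lost_on_first_try:
            received.add(seq)
        elif not pre_ack:
            # No ACK came back, so an ordinary sender retransmits.
            # Assumption: the retransmission always gets through.
            transmissions += 1
            received.add(seq)
    return received, transmissions


# Lab conditions (no loss): pre-acking loses nothing.
print(deliver(10, set(), pre_ack=True))
# Lossy network: the ordinary sender pays two extra transmissions...
print(deliver(10, {3, 7}, pre_ack=False))
# ...while the pre-acking sender silently loses packets 3 and 7.
print(deliver(10, {3, 7}, pre_ack=True))
```

With no loss the two senders deliver the same data, which is why it looks
great in the lab. With loss, the pre-acking sender has no way to find out
that anything went missing.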

So then there's this thing we discovered in the lab. We
noticed that when you entered a URL in Internet Explorer 5, its sequence of
startup packets didn't look like the one shown above. Instead, it looked
like this:

     Client                        Server
1.   Request ->                    Uh... what? Dunno what the hell this
                                   is. I'll ignore it, or RST.
2.   Oh, you're a standard
     server. Okay: SYN ->
3.                                 <- SYN/ACK
4.   ACK ->
5.   Request ->

In other words, instead of sending a SYN packet like every other
TCP/IP application in the world, IE would send out the request packet
first of all. Just to check. Just in case the HTTP server was, oh, say, a
Microsoft IIS server. Because IIS' HTTP teardown sequence looked like
this:

     Client                  Server
1.                           <- FIN
2.   ACK ->

...And that's it. The client doesn't FIN, and the server doesn't ACK. In
other words, the connection is kept "half-open" on the server end. The
reason for this? Why, to make subsequent connections from IE clients
faster. If the connection isn't torn down all the way, all IE has to do
is send an HTTP request, with no preamble-- and the server will immediately
respond. Ingenious!

(I may be remembering this incorrectly; it might be that the client does
FIN, and the server simply keeps the connection around after it ACKs it.
Instead of shutting down the connection entirely, it just waits to see if
that client will come back, so it can open the connection back up
immediately instead of having to go through that whole onerous SYN-SYN/ACK
procedure. Damn rules!)

Now,
what does this mean for non-IIS servers? It means that if you use IE
to connect to them, it first tries to send that initial request packet,
without any SYNs-- and then it only proceeds with the standard TCP
connection setup procedure if the request packet gets a RST or no
response (either of which is a valid way for a legal stack to deal with
an unsynchronized packet). But IIS, playing by its own rules, would respond
to that packet with an HTTP response right away, without bothering to
complete the handshake. So connections from IE to IIS servers will be nice
and snappy, especially on subsequent connections after the first one. But
connections from IE to non-IIS servers waste a packet at the beginning of
each request-- and depending on
how the server handles that illegal request, it might immediately RST it, or
it might just time out... which would make the browser seem infuriatingly
slow to connect to new websites.
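
The cost of that wasted packet can be put in rough numbers. The sketch
below is back-of-the-envelope arithmetic, not a measurement: the function
name, the 100 ms round trip, and the 3000 ms timeout are all assumptions
picked for illustration, and it assumes the client sends its ACK and its
request back to back as soon as the SYN/ACK arrives.

```python
def delay_before_useful_request_ms(rtt_ms, speculative,
                                   server_reaction="rst",
                                   timeout_ms=3000):
    """Milliseconds until the client sends a request the server will answer.

    speculative: True if the client first sends a request with no SYN.
    server_reaction: "responds" (server answers the bare request),
                     "rst" (server resets it), or
                     "silent" (server ignores it and the client times out).
    """
    # SYN out, SYN/ACK back; the ACK and the request then leave together.
    handshake_ms = rtt_ms
    if not speculative:
        return handshake_ms
    if server_reaction == "responds":
        return 0
    if server_reaction == "rst":
        # One wasted round trip to learn the server wants a real handshake.
        return rtt_ms + handshake_ms
    if server_reaction == "silent":
        # Nothing comes back, so the client waits out its timer first.
        return timeout_ms + handshake_ms
    raise ValueError("unknown server_reaction: %r" % (server_reaction,))


print(delay_before_useful_request_ms(100, speculative=False))
print(delay_before_useful_request_ms(100, True, "responds"))
print(delay_before_useful_request_ms(100, True, "rst"))
print(delay_before_useful_request_ms(100, True, "silent"))
```

Under these assumptions the RST case doubles the setup delay, and the
silent case is dominated entirely by whatever the client's timeout happens
to be.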

This is only marginally less stupid
than RunTCP's "solution"-- and I say "marginally" only because in the grand
scheme of things, this probably makes sense to Microsoft's network
engineers. After all, eventually all clients will be Windows
platforms running IE, and all servers will be Windows platforms
running IIS. And then we can break all kinds of rules! Rules are only
there to hold us back and force us to play nice with other vendors. Well,
once the other vendors are all gone, who cares about some stupid
RFC?

I have to admire their arrogance and their confidence. But it'll
be some time before I can bring myself to admire their technical
integrity.

UPDATE: Since this post got Slashdotted, I've been getting a pretty fair
amount of e-mail, suggesting that the behavior we observed here might be
anything from T/TCP to HTTP/1.1 pipelining to delirium tremens. Well, I
should point out that this phenomenon was something we observed in 1997,
before HTTP/1.1 was in wide use; both the client and server were using
vanilla HTTP/1.0. As it turned out, it was actually the NT stack
that was causing this to happen-- it didn't matter what client or server
software you used. It even happened with our home-grown network test
tools.

It's entirely possible that Microsoft has changed the NT stack in recent
iterations so that this doesn't happen anymore. But if you're trying to
reproduce the behavior, use NT 4.0 machines for worst results.