Thursday, 3 June 2010

Websocket gets an update, and it breaks stuff.

Scroll to the bottom of this post for a cheat sheet of what has changed.

Unfortunately the change is not backward compatible. From the blog post:

"These changes make it incompatible with draft-hixie-thewebsocketprotocol-75; a client implementation of -75 can’t talk with a server implementation of -76, and vice versa."

So, let's take a look at the changes, and try to make sense of them.

The specification document is just not readable unless you want to go completely insane. Here are a few lovely bits from the document.

26. Let /key3/ be a string consisting of eight random bytes (or equivalently, a random 64 bit integer encoded in big-endian order).

EXAMPLE: For example, 0x47 0x30 0x22 0x2D 0x5A 0x3F 0x47 0x58.

What??? Wait. Let me read that again: "a random 64 bit integer encoded in big-endian order". Sure. Make sure you don't use a random 64 bit integer encoded in LITTLE-endian order, that would completely mess up the protocol.

32. Let /fields/ be a list of name-value pairs, initially empty.

33. _Field_: Let /name/ and /value/ be empty byte arrays.

34. Read a byte from the server.

If the connection closes before this byte is received, then fail the WebSocket connection and abort these steps.

Otherwise, handle the byte as described in the appropriate entry below:

-> If the byte is 0x0D (ASCII CR)

If the /name/ byte array is empty, then jump to the fields processing step. Otherwise, fail the WebSocket connection and abort these steps.

-> If the byte is 0x0A (ASCII LF)

Fail the WebSocket connection and abort these steps.

-> If the byte is 0x3A (ASCII :)

Move on to the next step.

-> If the byte is in the range 0x41 to 0x5A (ASCII A-Z)

Append a byte whose value is the byte's value plus 0x20 to the /name/ byte array and redo this step for the next byte.

-> Otherwise

Append the byte to the /name/ byte array and redo this step for the next byte.

NOTE: This reads a field name, terminated by a colon, converting upper-case ASCII letters to lowercase, and aborting if a stray CR or LF is found.

35. Let /count/ equal 0.

NOTE: This is used in the next step to skip past a space character after the colon, if necessary.

36. Read a byte from the server and increment /count/ by 1.

If the connection closes before this byte is received, then fail the WebSocket connection and abort these steps.

Otherwise, handle the byte as described in the appropriate entry below:

-> If the byte is 0x20 (ASCII space) and /count/ equals 1

Ignore the byte and redo this step for the next byte.

-> If the byte is 0x0D (ASCII CR)

Move on to the next step.

-> If the byte is 0x0A (ASCII LF)

Fail the WebSocket connection and abort these steps.

-> Otherwise

Append the byte to the /value/ byte array and redo this step for the next byte.

NOTE: This reads a field value, terminated by a CRLF, skipping past a single space after the colon if there is one.

37. Read a byte from the server.

If the connection closes before this byte is received, or if the byte is not a 0x0A byte (ASCII LF), then fail the WebSocket connection and abort these steps.

NOTE: This skips past the LF byte of the CRLF after the field.

38. Append an entry to the /fields/ list that has the name given by the string obtained by interpreting the /name/ byte array as a UTF-8 byte stream and the value given by the string obtained by interpreting the /value/ byte array as a UTF-8 byte stream.

39. Return to the "Field" step above.

Do you enjoy seeing basic HTTP header parsing code rewritten in English??? For me, it's beyond excruciating. It's verging on obfuscation. Where is the actual spec? Can we please just see an example packet dump conversation from client to server? You know, the 10 lines or so we actually need?
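For comparison, here is roughly what steps 32 to 39 come down to in code. This is a minimal Python sketch, not the spec's algorithm verbatim: the function name parse_fields is made up for illustration, and it works on a buffer that has already been read rather than byte by byte from the socket.

```python
def parse_fields(data: bytes) -> list[tuple[str, str]]:
    """Parse header fields up to the blank line, as (lowercased name, value)."""
    fields = []
    pos = 0
    while True:
        end = data.find(b"\r\n", pos)
        if end == -1:
            raise ValueError("input ended before the blank line")
        line, pos = data[pos:end], end + 2
        if line == b"":
            return fields                    # bare CRLF: no more fields
        if b"\r" in line or b"\n" in line:
            raise ValueError("stray CR or LF inside a field")
        name, colon, value = line.partition(b":")
        if not colon:
            raise ValueError("field without a colon")
        if value.startswith(b" "):
            value = value[1:]                # skip one space after the colon
        # bytes.lower() only touches ASCII A-Z, which is what step 34 asks for.
        fields.append((name.lower().decode("utf-8", "replace"),
                       value.decode("utf-8", "replace")))
```

Called on the bytes that follow the status line, it returns the fields in order and raises ValueError wherever the spec says "fail the WebSocket connection".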

So, in the initial spec, things were reasonably sane: the client sent over "Hey can I be your friend and play websocket?", and the server sent back "hehe sure lets play bro". Then the two conversed using a reasonably sane binary protocol.

It seems this was judged to have potential for misuse. Presumably, if there were an insecure server out there (not HTTP, something else) that you could get to say "hehe sure lets play bro", you could establish a connection to it and send fairly arbitrary binary data to it from then on (it would have packet headers, but you might still be able to do damage).

Note that the issue here doesn't seem to be anything within the expected usage, but rather forcing a browser to connect to some other server, say a mail or IRC server, and getting it to do bad stuff.

So in this new version of the spec, they've added a simple challenge / response. I say simple, but I mean needlessly complex.

Firstly, there are 2 new headers in the request: sec-websocket-key1 and sec-websocket-key2. These contain 2 integer keys. But for some reason, those keys are interspersed with random characters!

16. Let /spaces_1/ be a random integer from 1 to 12 inclusive.

Hickson Expires November 24, 2010 [Page 21]

Internet-Draft The WebSocket protocol May 2010

Let /spaces_2/ be a random integer from 1 to 12 inclusive.

EXAMPLE: For example, 5 and 9.

17. Let /max_1/ be the largest integer not greater than 4,294,967,295 divided by /spaces_1/.

Let /max_2/ be the largest integer not greater than 4,294,967,295 divided by /spaces_2/.

EXAMPLE: Continuing the example, 858,993,459 and 477,218,588.

18. Let /number_1/ be a random integer from 0 to /max_1/ inclusive.

Let /number_2/ be a random integer from 0 to /max_2/ inclusive.

EXAMPLE: For example, 777,007,543 and 114,997,259.

19. Let /product_1/ be the result of multiplying /number_1/ and /spaces_1/ together.

Let /product_2/ be the result of multiplying /number_2/ and /spaces_2/ together.

EXAMPLE: Continuing the example, 3,885,037,715 and 1,034,975,331.

20. Let /key_1/ be a string consisting of /product_1/, expressed in base ten using the numerals in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9).

Let /key_2/ be a string consisting of /product_2/, expressed in base ten using the numerals in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9).

EXAMPLE: Continuing the example, "3885037715" and "1034975331".

21. Insert between one and twelve random characters from the ranges U+0021 to U+002F and U+003A to U+007E into /key_1/ at random positions.

Insert between one and twelve random characters from the ranges U+0021 to U+002F and U+003A to U+007E into /key_2/ at random positions.

NOTE: This corresponds to random printable ASCII characters other than the digits and the U+0020 SPACE character.

EXAMPLE: Continuing the example, this could lead to "P388O503D&ul7{K%gX(%715" and "1N?|kUT0or3o4I97N5-S3O31".

22. Insert /spaces_1/ U+0020 SPACE characters into /key_1/ at random positions other than the start or end of the string.

Insert /spaces_2/ U+0020 SPACE characters into /key_2/ at random positions other than the start or end of the string.

EXAMPLE: Continuing the example, this could lead to "P388 O503D&

ul7 {K%gX( %7 15" and "1 N ?|k UT0or 3o 4 I97N 5-S3O 31".

23. Add the string consisting of the concatenation of the string "Sec-WebSocket-Key1:", a U+0020 SPACE character, and the /key_1/ value, to /fields/.

Add the string consisting of the concatenation of the string "Sec-WebSocket-Key2:", a U+0020 SPACE character, and the /key_2/ value, to /fields/.

24. For each string in /fields/, in a random order: send the string, encoded as UTF-8, followed by a UTF-8-encoded U+000D CARRIAGE RETURN U+000A LINE FEED character pair (CRLF). It is important that the fields be output in a random order so that servers not depend on the particular order used by any particular client.

Oh sweet Jesus what were you smoking? So instead of just sending over a challenge, we're sending 2 challenges, interspersed with some random characters. OK, whatever. But wait, you're saying that the client should send the headers in a *random* order? That's just crazy. That's not a good solution to "Dumb ass server expects headers in specific order".
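For what it's worth, steps 16 to 22 quoted above boil down to a handful of lines. Here is a rough Python sketch of generating one key; the names make_key and NOISE are made up for illustration, and the caller passes in its own random.Random instance.

```python
import random

# Printable ASCII other than the digits and space (U+0021-U+002F, U+003A-U+007E).
NOISE = [chr(c) for c in range(0x21, 0x30)] + [chr(c) for c in range(0x3A, 0x7F)]

def make_key(rng: random.Random) -> tuple[str, int]:
    """Return (header value, hidden number) for one Sec-WebSocket-Key field."""
    spaces = rng.randint(1, 12)                        # step 16
    number = rng.randint(0, 4294967295 // spaces)      # steps 17 and 18
    chars = list(str(number * spaces))                 # steps 19 and 20
    # Step 21: one to twelve noise characters at random positions.
    for _ in range(rng.randint(1, 12)):
        chars.insert(rng.randint(0, len(chars)), rng.choice(NOISE))
    # Step 22: the spaces go anywhere except the very start or end.
    for _ in range(spaces):
        chars.insert(rng.randint(1, len(chars) - 1), " ")
    return "".join(chars), number
```

Step 24's random header order would then just be a rng.shuffle() over the list of header lines before sending them.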

Then there's the third key, the 8 random bytes (/key3/) quoted near the top of this post. It is sent after the initial headers. Don't forget, big-endian or it won't work ;)

OK, so we have the new request from the client, with THREE challenge keys. 2 of them as headers, and 1 of them after the headers.

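To make our server work with this, all we now need to do is, firstly, send back the headers Sec-WebSocket-Origin and Sec-WebSocket-Location (they were WebSocket-Origin and WebSocket-Location in the previous version of the protocol), and then, after the headers have been sent, send the response to the challenge, which is md5(BIG_ENDIAN_4byte(key1) + BIG_ENDIAN_4byte(key2) + key3). Here key1 and key2 are the numbers recovered from the two headers (take the digits in the header value, read them as one base ten integer, and divide by the number of spaces in the value), key3 is the raw 8 bytes, and the response is the raw 16 byte MD5 digest, not a hex string.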
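As a sketch of the server side of the challenge in Python: the function names key_number and challenge_response are invented for this example, and the key strings in the usage lines are only sample values in the style of the draft's examples.

```python
import hashlib
import struct

def key_number(key: str) -> int:
    """Recover the number hidden in a Sec-WebSocket-Key1/Key2 header value."""
    digits = int("".join(ch for ch in key if "0" <= ch <= "9"))
    spaces = key.count(" ")
    if spaces == 0 or digits % spaces != 0:
        raise ValueError("not a valid key")
    return digits // spaces

def challenge_response(key1: str, key2: str, key3: bytes) -> bytes:
    """md5 over two big-endian 32 bit numbers followed by the raw 8 bytes."""
    packed = struct.pack(">II", key_number(key1), key_number(key2))
    return hashlib.md5(packed + key3).digest()   # 16 raw bytes, not hex

# Sample values for illustration only.
response = challenge_response("4 @1  46546xW%0l 1 5",
                              "12998 5 Y3 1  .P00",
                              b"^n:ds[4U")
```

The 16 bytes in response are written to the socket as-is, straight after the blank line that ends the response headers. Hex-encoding the digest, or concatenating the numbers as decimal strings before hashing, gives a different answer.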

Here's the cheat sheet for people who have better things to do than read endless English descriptions of code:

It's fairly simple to set up your server to support both WebSocket versions. If the new key1/key2 headers are present, proceed with the new version. Else use the old.
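A minimal sketch of that check in Python, assuming the request headers have already been parsed into a dict. The function name handshake_version and the return values 75 and 76 are just labels made up for this example.

```python
def handshake_version(headers: dict[str, str]) -> int:
    """Pick the draft to speak, based on which request headers are present."""
    names = {name.lower() for name in headers}   # header names are case-insensitive
    if "sec-websocket-key1" in names and "sec-websocket-key2" in names:
        return 76   # new handshake: read 8 more bytes, answer the challenge
    return 75       # old handshake: no challenge at all
```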

Why couldn't the spec just include the 'meat'? It's a simple protocol which can be summed up in a page or two. The current spec runs to 55 pages! I'd bet that's far longer than any implementation of the spec.

Are 3 challenge keys more secure than 1? Is adding random characters into the middle of the keys more secure than not doing that? Will it work if we use little-endian for the random number instead of big-endian?

Sometimes these things seem to be ridiculously over-engineered to me. It took me far more time to read the spec than it did to update the Mibbit server to support it.

I completely agree that the language is painful. They need a nice summary that people can wrap their heads around, with links to the excruciating detail if wanted or needed.

However I'm not sure why you keep going on about big-endian. If both sides use the value in calculations it matters if you send it BE (ie. network order), otherwise you're going to be using different values. It's not about what's in memory or how the RNG works. If you get a random value and swap it that's fine as long as both sides agree which end is the big end. That's all that they mean by that language, no matter how you store it in memory when it hits the wire it has to be BE.
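To make the byte order point concrete, here is a small Python illustration. The number is an arbitrary value picked for the example, not one from the spec. Byte order matters for the two 32 bit numbers because they get packed into bytes before hashing, while the 8 byte key is only ever handled as raw bytes, so no conversion (and no byte order choice) is involved for it.

```python
import hashlib
import struct

number = 0x01020304                 # illustrative value, not from the spec
be = struct.pack(">I", number)      # bytes 01 02 03 04
le = struct.pack("<I", number)      # bytes 04 03 02 01
key3 = bytes(range(8))              # stand-in for the 8 random bytes

# Same number, different bytes on the wire, so different digests.
digest_be = hashlib.md5(be + key3).digest()
digest_le = hashlib.md5(le + key3).digest()
```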

And you have not noticed that it breaks reverse proxies, because it expects the client to send 8 bytes of data that are not advertised in a Content-Length header, and which will not be forwarded until the handshake completes (which it won't). This is a shame because most target users of this protocol will use reverse proxies and load balancers!

One thing. In reading the spec I could parse their security keys and get the various numbers they were telling me I should, but when I md5'ed the concatenation of the three numbers (sec1 + sec2 + sec3) I do not get the ascii result they suggest I should. I've tried this in Ruby & Javascript paying attention to endianness and still have yet to get it all sussed out. Any suggestions?