Sunday, September 2, 2007

Example please...

I've been working hard recently on a native Cocoa XMPP library. (If you don't know what XMPP is, you can read the Wikipedia article. It's the protocol behind Jabber and Google Talk.) One of the difficulties I ran into was implementing SASL digest authentication. The XMPP RFC gives an example, but doesn't tell you how the proper response was created. After some googling, I stumbled upon the RFC for Digest Authentication as a SASL Mechanism. There I found a rather cryptic (yet detailed) algorithm for creating the proper responses. Of course, I couldn't apply the algorithm to the XMPP example in the RFC, since the author didn't bother to tell us what password he used. Fortunately the SASL document had an example. The only problem was I couldn't get my code to match it. Where was I going wrong?

That's one of the problems with writing code. It's basically like a long math problem. And in the end if your answer doesn't match what's in the back of the book, the only thing you know is that there's a mistake somewhere between step 1 and step 80. Happy hunting!

So for the benefit of the community, I'm going to break down an example step-by-step.

First an overview of the stream communication, with data coming from the server in blue, data being sent from the client in orange, and my comments in gray. The authentication will be for user "test" and password "secret".

Where did the realm come from? Sometimes it's supplied by the server inside the challenge. In this particular example it wasn't, so we used the domain identifier of the server. Our server is an ejabberd server running on the local machine, which is why it has a funny name. But in your case it would be something like "deusty.com", "gmail.com", "jabber.org", etc. That is, if your JID is "johnDoe@deusty.com" then your realm (if not otherwise stated in the challenge) is "deusty.com".

The nonce and qop values were supplied by the server in the challenge. The nonce will be different for each challenge the server sends. In this case it looks like a random value.

cnonce is a random string we provide. To make one yourself you could simply use a random number generator. Mine is a UUID (universally unique identifier), and I chose UUIDs because they're so simple to generate using Core Foundation's CFUUID type.
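For readers not on the Mac, here is a minimal Python sketch of the same idea. The helper name make_cnonce is made up for illustration, and it uses Python's uuid module rather than CFUUID; any sufficiently random string would do.

```python
import uuid


def make_cnonce() -> str:
    """Return a fresh client nonce as an uppercase UUID string."""
    # uuid4() produces a random UUID; upper() only mirrors the look of the
    # cnonce used in this post and is not required by the protocol.
    return str(uuid.uuid4()).upper()


cnonce = make_cnonce()
print(cnonce)
```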

digest-uri is easy to make: xmpp/[domainID]. So if you were johnDoe@deusty.com, then your digest-uri would be "xmpp/deusty.com".
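The realm and digest-uri rules can be sketched in a few lines of Python. The function name realm_and_digest_uri and its parameters are hypothetical, and the sketch assumes a JID of the form user@domain with an optional /resource suffix.

```python
from typing import Optional, Tuple


def realm_and_digest_uri(jid: str, challenge_realm: Optional[str] = None) -> Tuple[str, str]:
    """Derive (realm, digest-uri) from a JID, preferring the server's realm."""
    # Drop an optional "/resource" suffix, then keep the part after "@".
    bare_jid = jid.split("/", 1)[0]
    domain = bare_jid.rsplit("@", 1)[-1]
    # Use the realm from the challenge when the server supplied one.
    realm = challenge_realm if challenge_realm is not None else domain
    return realm, "xmpp/" + domain


print(realm_and_digest_uri("johnDoe@deusty.com"))
```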

And nc is the nonce count. It's the count of how many responses you've sent to the server using this particular nonce, written as eight hex digits. But since we're only going to be sending one, we really don't have to worry about it. Just make it 00000001 like I did.
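If you ever do need a count other than 1, the value is just the number zero-padded to eight lowercase hex digits. A tiny Python sketch, with format_nc as a made-up helper name:

```python
def format_nc(count: int) -> str:
    """Format a nonce count as eight lowercase hex digits."""
    return format(count, "08x")


print(format_nc(1))  # 00000001
```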

Step 1: Combine username:realm:password and MD5 hash them. So we hash "test:osXstream.local:secret" (without the quotes) and store the result in a variable called HA1data.

Here's the trick - normally when you hash stuff you get a result in hex values. But we don't want this result as a string of hex values! We need to keep the result as raw data! If you were to do a hex dump of this data you'd find it to be "3a4f5725a748ca945e506e30acd906f0". But remember, we need to operate on its raw data, so don't convert it to a string.
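In Python terms (a sketch, not the Cocoa code from my library), the difference is between digest(), which returns the 16 raw bytes, and hexdigest(), which returns a 32-character string. Step 1 wants the former:

```python
import hashlib

username, realm, password = "test", "osXstream.local", "secret"

# Step 1: hash "username:realm:password" and keep the raw digest bytes.
ha1_data = hashlib.md5(f"{username}:{realm}:{password}".encode("utf-8")).digest()

print(type(ha1_data).__name__, len(ha1_data))  # bytes 16
# Hex dump for inspection only; the post reports 3a4f5725... for this input.
print(ha1_data.hex())
```

Printing the hex dump is fine for debugging, as long as the variable you carry into step 2 is still the bytes object.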

Step 2: We need to combine the result from step 1 with the nonce and cnonce. So the data we're going to hash is [HA1data]:nonce:cnonce

But wait, the result from step 1 is in raw data format, and the new stuff is a string. So we convert our string ":392616736:05E0A6E7-0B7B-4430-9549-0FE1C244ABAB" into raw UTF-8 data, and append this to the end of HA1data.

Step 3: Hash the data from step 2, and store its hex value in a string HA1. The value of the string will be "b9709c3cdb60c5fab0a33ebebdd267c4".
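Steps 2 and 3 can be sketched in Python like this, using the values from this example. The variable names are only for illustration; the point is that the raw bytes from step 1 are concatenated with the UTF-8 bytes of the string, and only the second hash is turned into hex.

```python
import hashlib

nonce = "392616736"
cnonce = "05E0A6E7-0B7B-4430-9549-0FE1C244ABAB"

# Step 1 (repeated so this sketch runs on its own): raw digest bytes.
ha1_data = hashlib.md5(b"test:osXstream.local:secret").digest()

# Step 2: append the UTF-8 bytes of ":nonce:cnonce" to the raw digest.
a1 = ha1_data + f":{nonce}:{cnonce}".encode("utf-8")

# Step 3: hash the combined bytes and keep the result as a hex string.
ha1 = hashlib.md5(a1).hexdigest()
print(ha1)  # the post reports b9709c3c... for these inputs

# The common mistake: feeding the hex string of step 1 into step 2 instead.
wrong = hashlib.md5((ha1_data.hex() + f":{nonce}:{cnonce}").encode("utf-8")).hexdigest()
print(wrong == ha1)  # False
```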

Step 4: Hash the string AUTHENTICATE:[digest-uri]. So we'll be hashing "AUTHENTICATE:xmpp/osXstream.local", and we store its hex value in a string HA2. The value of the string will be "2b09ce6dd013d861f2cb21cc8797a64d".

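Step 5: Hash the string HA1:nonce:nc:cnonce:qop:HA2, which is the final combination the Digest SASL RFC specifies. Store its hex value as the result. It should be: 37991b870e0f6cc757ec74c47877472b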
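Here is the whole computation as one Python sketch, ending with the final hash of HA1:nonce:nc:cnonce:qop:HA2 as laid out in the Digest SASL RFC. The function and parameter names are made up for illustration. It assumes qop is "auth" and that no authzid is sent, neither of which is spelled out in the text above, so check the challenge from your own server.

```python
import hashlib


def md5_raw(data: bytes) -> bytes:
    """MD5 digest as 16 raw bytes."""
    return hashlib.md5(data).digest()


def md5_hex(data: bytes) -> str:
    """MD5 digest as a 32-character lowercase hex string."""
    return hashlib.md5(data).hexdigest()


def digest_md5_response(username, realm, password, nonce, cnonce,
                        digest_uri, nc="00000001", qop="auth"):
    # Step 1: raw digest of username:realm:password.
    ha1_data = md5_raw(f"{username}:{realm}:{password}".encode("utf-8"))
    # Steps 2-3: append ":nonce:cnonce" as bytes, hash, keep the hex string.
    ha1 = md5_hex(ha1_data + f":{nonce}:{cnonce}".encode("utf-8"))
    # Step 4: hex digest of AUTHENTICATE:digest-uri.
    ha2 = md5_hex(f"AUTHENTICATE:{digest_uri}".encode("utf-8"))
    # Step 5: hex digest of HA1:nonce:nc:cnonce:qop:HA2.
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}".encode("utf-8"))


response = digest_md5_response(
    "test", "osXstream.local", "secret",
    "392616736", "05E0A6E7-0B7B-4430-9549-0FE1C244ABAB",
    "xmpp/osXstream.local",
)
print(response)  # compare against the value given in this post
```

If the printed value doesn't match, print the intermediate HA1data, HA1 and HA2 values and compare them against steps 1, 3 and 4 to see where things diverge.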

And we're done! That's the hard part. Now you just package up the response value along with all the other stuff, encode it as base64, and send it across the wire.
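A rough Python sketch of the packaging step follows. The field order, the quoting of each field, the charset=utf-8 field and the qop value "auth" are my assumptions based on the usual DIGEST-MD5 response layout, and package_response is a made-up name; compare against what your own client and server actually exchange. The sketch does not escape quotes or backslashes inside values.

```python
import base64


def package_response(username, realm, nonce, cnonce, nc, qop, digest_uri, response):
    """Build the base64-encoded <response/> element for the digest reply."""
    payload = (
        f'username="{username}",realm="{realm}",nonce="{nonce}",'
        f'cnonce="{cnonce}",nc={nc},qop={qop},'
        f'digest-uri="{digest_uri}",response={response},charset=utf-8'
    )
    encoded = base64.b64encode(payload.encode("utf-8")).decode("ascii")
    return f"<response xmlns='urn:ietf:params:xml:ns:xmpp-sasl'>{encoded}</response>"


element = package_response(
    "test", "osXstream.local", "392616736",
    "05E0A6E7-0B7B-4430-9549-0FE1C244ABAB", "00000001", "auth",
    "xmpp/osXstream.local", "37991b870e0f6cc757ec74c47877472b",
)
print(element)
```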

A few other things I should mention:

Sometimes the server sends back the rspauth in another challenge element. Servers do this because this is how the example is given in the original RFC. The client then has to send an empty response to this prior to receiving the success element. Hopefully, in the future, servers will put an end to this insanity and implement it like my example, since this is how it probably should be.

After you've authenticated, you'll still need to bind your resource.

After binding, some servers require you to initiate a session before they'll communicate with you.

19 comments:

Anonymous
said...

I have the same trouble building the XMPP client's response. I fail at the final step of the SASL challenge-response: the XMPP server sends "not authorized". I have one question. I can get the same result as your step 2 when building the response, but I cannot get the same data in step 3 ("b970..."). Is the UTF-8 encoded data the same for pure ASCII, for example "3a4f5..."? In other words, is the "3a4f572..." raw data string the same as the "3a4f572..." UTF-8 encoded data, since we only use ASCII characters? http://en.wikipedia.org/wiki/UTF-8

Could you explain how to encode the raw data as UTF-8 data? Why do we use UTF-8 encoding for ASCII? Are the contents of the data the same? Is the UTF-8 encoded data different from the pure raw data? Or could you show us a good UTF-8 encoding sample?

Here is a better explanation of the "raw data" vs "hex string" difference.

First, note that hashing (such as an md5 hash) works on the raw binary data. Think 1's and 0's...

In step 1, the result (in raw data, printed in hex) from the hash is this: 3a4f5725a748ca945e506e30acd906f0

However, if you convert this to a string, then you convert it to an array of characters "3a4f..." But wait! This is a problem. Because what is the raw data of a character? It's very different! The raw data, printed in hex, of the UTF-8 character 3 is 33. And a = 61. So if you convert it to a string, then its raw data becomes this: 33613466 35373235 61373438 63613934 35653530 36653330 61636439 30366630

(spaces added for readability)

This is the difference between "raw" and string values. Why did the SASL Digest RFC people decide to make it so complicated? I don't know. Digest access authentication in HTTP doesn't do any of this "raw" data stuff...
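To see the difference for yourself, here is a short Python sketch that starts from the hex dump quoted in this post and shows both forms side by side. The variable names are only for illustration.

```python
# The 16 raw bytes whose hex dump is quoted in step 1.
raw = bytes.fromhex("3a4f5725a748ca945e506e30acd906f0")

# The same digest turned into a hex string and then into UTF-8 bytes.
as_text = raw.hex().encode("utf-8")

print(len(raw))       # 16
print(len(as_text))   # 32
print(as_text.hex())  # 33613466... (each hex character became its own byte)
```

The first value is what step 2 should append to; the second is what you get if you accidentally append to the hex string instead.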

Oh, that is a very clear explanation. The string "3" is actually the binary data 0x33 (its ASCII code).

I will try it. Thank you very much.

For example, I saved the string "3a4f" in a text file vv with an editor and converted it with "nkf -w80 vv", and I got the same answer, "3a4f". This means the contents of the file are 0x33 0x61 0x34 0x66. If we hold on to this concept, we get into trouble. I have been stuck in this loop for 2 months.

Great post, the only one I could find on the internet which explains SASL auth response for XMPP.

I know it's been a long time since you posted this, but..

I don't understand the critical part with the raw data. If you're not allowed to save the MD5 hash output as a String because it has to stay raw data, then how do you put it in a variable anyway? What is the data type then, so you can work with it?

I'm using this for an actionscript project and I'm saving it in a byteArray, but that doesn't seem to work for me. I always get the wrong hash output.

For the 'tricky part', could you please post a php version? I am still not coming up with the same values you are using and it breaks down when I combine the $HA1data with the nonce and cnonce strings. I come up with a completely different value. I am beginning to wonder if this can be done in php.

2011? Better late than never. Thank you for the nice example. It was still quite some trouble for me to figure out how to get it done with C++... no abstraction whatsoever. Anybody attempting this with C++ should just store nonce+cnonce as unsigned char*, same thing goes for the intermediate HA1 result. Concatenate and hash.