What do some programming languages use?

PHP

PHP has two options: urlencode or rawurlencode. The difference between the two is how a space is escaped: urlencode turns it into “+”, rawurlencode turns it into “%20”. A literal “+” becomes “%2B” in both.
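To make the difference concrete, here is a small sketch that runs both built-in functions on the same input. The sample string is only an illustration; it deliberately contains no “~”.

```php
<?php
// Sample input with a space and a literal plus sign (illustrative only).
$value = 'a b+c';

// urlencode: the space becomes "+", the literal "+" becomes "%2B".
echo urlencode($value), "\n";     // a+b%2Bc

// rawurlencode: the space becomes "%20", the literal "+" becomes "%2B".
echo rawurlencode($value), "\n";  // a%20b%2Bc
```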

rawurlencode's claim to fame is to be compatible with RFC1738. In fact it is not. It encodes all characters as in RFC3986, and then also the “~”. So rawurlencode comes close. It escapes everything except:

A-Za-z0-9\-_.

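So the only difference here is the “~” character. The correct function is therefore rawurlencode with the “~” put back afterwards, that is, with every “%7E” in its output replaced by “~”.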
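A minimal sketch of that function, assuming the name oauth_urlencode (my own placeholder, not a PHP built-in): let rawurlencode do the work and then restore the “~”. On a PHP version where rawurlencode already leaves the “~” alone, the replacement simply does nothing.

```php
<?php
// Hypothetical helper name; leaves only A-Za-z0-9 - _ . ~ unescaped.
function oauth_urlencode($s)
{
    // rawurlencode may turn "~" into "%7E"; put the "~" back afterwards.
    // "%7E" can only come from an encoded "~", so nothing else is touched.
    return str_replace('%7E', '~', rawurlencode($s));
}

echo oauth_urlencode('a b+c~d'), "\n"; // a%20b%2Bc~d
```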

It is a mess, what to do?

I assume that programmers are lazy and will re-use their good old trusty encoding functions. They should, it is how we train programmers. This will give some interesting problems with OAuth though, as OAuth insists on encoding, and sometimes double encoding, parameters, and then uses those encoded values to calculate the signature of a message.

This insistence on using a very specific implementation of encoding is, in my opinion, the Achilles heel of OAuth.

When checking the signature we have basically two options:

Do not decode whatever comes in before recalculating the signature. That way we won't make any false assumptions about parameters being encoded one way or the other.

Recode all incoming parameter values and names. That way we make sure that we use the correct encoding (according to the spec) for our signature calculation.
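The second option could look roughly like the sketch below. Both function names are hypothetical, and I am assuming that a bare “+” in the incoming string stands for a space, as urlencode would have produced it. The sketch only recodes; it does not sort or otherwise normalize the parameters.

```php
<?php
// Hypothetical helper (not a PHP built-in): RFC3986-style encoding.
// Repeated here so that the snippet runs on its own.
function oauth_urlencode($s)
{
    return str_replace('%7E', '~', rawurlencode($s));
}

// Hypothetical helper: decode every name and value of a raw query string,
// then encode it again the way the spec wants it.
function oauth_recode_query($query)
{
    $recoded = array();
    foreach (explode('&', $query) as $pair) {
        if ($pair === '') {
            continue; // skip empty segments such as a trailing "&"
        }
        // Split on the first "=" only; a missing "=" means an empty value.
        $parts = explode('=', $pair, 2);
        // Assumption: a bare "+" means a space, so urldecode is used here.
        $name  = urldecode($parts[0]);
        $value = isset($parts[1]) ? urldecode($parts[1]) : '';
        $recoded[] = oauth_urlencode($name) . '=' . oauth_urlencode($value);
    }
    return implode('&', $recoded);
}

echo oauth_recode_query('q=a+b%7ec&x=%2b'), "\n"; // q=a%20b~c&x=%2B
```

Note that the raw query string is used on purpose: values that PHP has already decoded for us no longer tell how the consumer encoded them.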

However, we must still assume that the encoding used for the signature base string, and for the key built from the consumer secret and the token secret, is correct according to RFC3986.
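To show where the encoding enters the signature, here is a rough sketch for the HMAC-SHA1 method as I read the OAuth Core 1.0 spec. The function names, the example URL and the secrets are made up for illustration, and the parameter string is assumed to be sorted and encoded already.

```php
<?php
// Hypothetical helper (not a PHP built-in): RFC3986-style encoding.
function oauth_urlencode($s)
{
    return str_replace('%7E', '~', rawurlencode($s));
}

// Base string: method, URL and parameter string, each encoded, joined by "&".
// The parameter string already holds encoded names and values, so those
// end up encoded twice.
function oauth_base_string($method, $url, $normalized_params)
{
    return implode('&', array(
        oauth_urlencode(strtoupper($method)),
        oauth_urlencode($url),
        oauth_urlencode($normalized_params),
    ));
}

// Key: encoded consumer secret and encoded token secret, joined by "&".
// The signature is the base64 of the raw HMAC-SHA1 digest.
function oauth_hmac_sha1($base_string, $consumer_secret, $token_secret)
{
    $key = oauth_urlencode($consumer_secret) . '&' . oauth_urlencode($token_secret);
    return base64_encode(hash_hmac('sha1', $base_string, $key, true));
}

// Example values are placeholders, not real credentials.
$base = oauth_base_string('get', 'http://example.com/r', 'a=x%20y');
echo $base, "\n"; // GET&http%3A%2F%2Fexample.com%2Fr&a%3Dx%2520y
echo oauth_hmac_sha1($base, 'cs', 'ts'), "\n";
```

Any deviation in the encoding function, on either side, changes the base string or the key and therefore the signature.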

Conclusion

We, as a community using OAuth, need to create test sets that will test all edge cases, otherwise we are in for a rough ride. And, to help other programmers, there is an immediate need for correct implementations of the RFC3986 parameter encoding.

Update

The OAuth community has a nice set of test cases online at wiki.oauth.net/TestCases. Let's all use them and make sure that our implementations are correct!