Generating secure cross site request forgery (CSRF) tokens

I don’t talk much about security. This is mostly because it’s such a moving target. I’m also horrified that I might give bad advice and someone will be hacked because of me.

But in researching the second edition of the IBM i Programmer’s Guide to PHP, Jeff and I decided to include a chapter on security, since we really didn’t talk much about it in the first edition. I’m writing about cross site request forgeries right now, and I wanted to make sure that what I was going to suggest would not break the internet in some way.

I did some Google searching to see what other people were recommending. Almost all of the pages I found for generating a CSRF token use code like this:

$token = md5(uniqid(rand(), true));

The manual pages for rand() and uniqid(), as well as the C code behind them, specifically state that these functions should not be used for generating secure tokens. They tend to generate predictable values. And the documentation for md5() states that it should not be used for password hashing. Granted, we’re not hashing passwords when creating a CSRF token, but with the tooling available, shouldn’t we be using functions that are more cryptographically secure? Like this?
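A minimal sketch of what such a token generator might look like, assuming hash_hmac() applied to 32 bytes from openssl_random_pseudo_bytes(). The sha256 algorithm and the throwaway random key are illustrative assumptions, not a recommendation:

```php
<?php
// Sketch only: 'sha256' and the 16-byte throwaway key are assumptions
// made for illustration.
$randomBytes = openssl_random_pseudo_bytes(32); // 256 bits from the OpenSSL PRNG
$key         = openssl_random_pseudo_bytes(16); // random key, used once and discarded
$token       = hash_hmac('sha256', $randomBytes, $key); // 64 hex characters

echo $token, PHP_EOL;
```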

Am I missing something or wouldn’t something like this be a whole lot better?

[UPDATE]

padraicb validated my thought on the matter. The goal here is the random value. As such, hashing with hash_hmac() does not buy you a whole lot extra. The number of possible values in a 32 byte random string is 1.1579208923731619542357098500869e+77 (2^256). That alone would seem to be enough for a CSRF prevention token. mt_rand() returns an integer between 0 and mt_getrandmax(), which gives you about 2 billion possible numbers. While that will probably protect you, the 32 byte value will offer you better protection. There’s no sense in gambling with a smaller value if you have the ability to generate a larger value with virtually no additional cost.
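To put the two ranges side by side, here is a small sketch that just does the arithmetic. It assumes a 64-bit PHP build, where mt_getrandmax() is 2^31 - 1:

```php
<?php
// Size of the space an attacker has to search in each case.
$randomBytesSpace = pow(2, 8 * 32);      // 32 bytes = 256 bits; PHP returns a float here
$mtRandSpace      = mt_getrandmax() + 1; // mt_rand() yields 0..mt_getrandmax()

printf("32 random bytes: %.4e possible values\n", $randomBytesSpace);
printf("mt_rand():       %d possible values\n", $mtRandSpace);
```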

So it would seem that, for generating a proper token, the code that you would really need is this:

$token = base64_encode(openssl_random_pseudo_bytes(32));

The only reason for the base64_encode() call is to make sure that the value provided will not break your HTML layout.
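As a rough sketch of why the encoding matters, here is the token dropped into a hidden form field. The field name csrf_token is an assumed, illustrative name:

```php
<?php
// Sketch only: the field name "csrf_token" is hypothetical.
$token = base64_encode(openssl_random_pseudo_bytes(32)); // 44 ASCII characters

// base64 output uses only A-Z, a-z, 0-9, "+", "/" and "=", none of which can
// close an HTML attribute; htmlspecialchars() is kept as a general habit.
$field = '<input type="hidden" name="csrf_token" value="'
       . htmlspecialchars($token, ENT_QUOTES, 'UTF-8') . '">';

echo $field, PHP_EOL;
```

The raw 32 bytes, by contrast, could contain quotes, angle brackets or invalid UTF-8, any of which could break the markup.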

The real security problem in generating a secure CSRF token is the randomness of the seed. MD5 or SHA512 are not so different in this case from a security point of view. openssl_random_pseudo_bytes() is the most secure way to generate good random numbers in PHP. For instance, in ZF2 we used that function to generate the CSRF token in Zend\Form.

Thanks. That’s a good point. In other words, using md5() or sha512 is not as important as getting the actual random bits. The hashing, itself, is really only there to make sure that the bits that come out do not break the format. One could almost say that when using openssl_random_pseudo_bytes() you could use md5(), hash_hmac() or base64_encode() without a loss of security, something that would not be possible to say about uniqid().
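A small sketch of that point: the same random bytes pushed through three different ways of making them printable. bin2hex() is not mentioned above and is included only for comparison:

```php
<?php
$raw = openssl_random_pseudo_bytes(32); // the part that actually matters

$asBase64 = base64_encode($raw); // 44 characters, keeps all 256 bits
$asHex    = bin2hex($raw);       // 64 characters, keeps all 256 bits
$asMd5    = md5($raw);           // 32 hex characters, folds the input down to 128 bits

echo $asBase64, PHP_EOL, $asHex, PHP_EOL, $asMd5, PHP_EOL;
```

All three are safe to put in a form. The md5() version carries fewer bits than the other two, though 128 unpredictable bits is still far beyond what can be brute forced.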

@kschroeder The primary goal of the CSRF token is to be an unpredictable random string of sufficient length to defeat brute force attacks. So literally the OpenSSL PRNG is sufficient, 32 being a nice length (anything less than 8 being severely weak). Hashing or obscuring the token is unnecessary since the random number itself is not a secret; what is sent to the user is. If that’s a hash then the attacker only needs the hash. Base64 encoding is merely to ensure the token is a simple ASCII compatible string.
Note: Tokens are generated securely as a standard practice. Also note the “pseudo” in the function name if concerned about entropy consumption ;).

I’d like to add (as I posted to pmjones’ blog) that it is a bit misleading to say that openssl_random_pseudo_bytes() is “better” (security-wise speaking) than any other method that relies on /dev/urandom (or the Windows equivalent on Windows). Reading straight from /dev/urandom, or fetching bytes some other way (which uses /dev/urandom), are all practically equal.
Care should be taken to avoid the quirks of each method when fetching random bytes. For example, openssl_random_pseudo_bytes() blocking on certain versions, /dev/urandom not being available on Windows, and security issues with mcrypt_create_iv() (using DEV_URANDOM) on certain versions on Windows.

@timoh you are right, but we compared mt_rand() or rand() with openssl_random_pseudo_bytes(), and the latter is better from a security point of view because it uses a pseudo random source like /dev/urandom. Moreover, openssl_random_pseudo_bytes() is also supported on Windows, where /dev/urandom is not available.

Yes, it would be predictable – presumably that’s why that code was removed. I’m just saying that is why you see that code all over the Internet (and in various open source projects) – it is because everyone originally copied it from the PHP manual.

The problem with uniqid(mt_rand(), true); is that mt_rand() is not cryptographically secure. A more secure way to generate a random token is to use md5(openssl_random_pseudo_bytes(32)); or hash($algo, openssl_random_pseudo_bytes(128)); where $algo is sha-*. If you don’t have the OpenSSL extension enabled you can use mcrypt_create_iv($length, MCRYPT_DEV_URANDOM); where $length is the size of the random bytes. We implemented a random generator in ZF2 based on these considerations: https://github.com/zendframework/zf2/blob/master/library/Zend/Math/Rand.php#L25

I am not a security expert, so please be gentle.
What does the extra cryptographic security buy us? For long-lived hashes that get used over and over, I can see the point, but for what are short-lived tokens, it seems a bit of overkill.
Additionally, it seems like it would deplete the entropy available to the system more rapidly. Too many CSRF tokens that get used and thrown away means you don’t have the entropy when you need it for real security.

Cryptos (κρυπτός) and graphein (γράφειν) together just mean “secret writing”. When we’re generating a token what we want to do is give a secret to the person on the web page that will be extremely difficult to predict. The examples that I’ve found tend to rely on uniqid() which is based off of the time and, thus, predictable. So when you’re thinking about cryptography you are probably thinking about the actual act of encryption, which is not what we’re talking about. We are using the tool from one of the first steps in the chain for creating an “unpredictable” value.
The 32 bytes (256 bits) of data give us 1.1579208923731619542357098500869e+77 values, which is a pretty big set of values for you to use and so I doubt that you would deplete entropy.
However, mt_rand() returns an integer, not a series of bytes. That means that you have only 2 billion or so numbers to choose from. Given the choice between that and the huge number above, I would choose the larger one.

You are making the assumption, though, that a CSRF token falls in the realm of “cryptography.” (Perhaps it is.)
Is not a random shared value, sent along with the form, enough to defeat CSRF attacks? You say the random value is predictable and this may be true, but I’d like to see a demonstration of it. How much time and effort is required to predict it?

There are parts of token generation that, on a basic level, do fall into the realm of cryptography since cryptography is about “writing secrets”. Beyond that the link to crypto is simply that the cryptographic tooling does a better job of providing more, better, pseudo-random values.
When we’re talking about predictability it will depend on which function we’re talking about. If you have a timestamp, uniqid() is actually pretty easy to guess. It was designed to be unique, not unpredictable. And mt_rand() isn’t so much predictable as it has a significantly smaller pool of values to choose from. In other words, mt_rand() is good, but openssl_random_pseudo_bytes() is better.
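To illustrate the uniqid() point, here is a short sketch that decodes a uniqid() value back into the time it was generated. It assumes the default 13-character format (8 hex digits of seconds followed by 5 hex digits of microseconds) and no prefix:

```php
<?php
// uniqid() with no arguments is essentially the current time written in hex.
$id = uniqid();

$seconds      = hexdec(substr($id, 0, 8)); // Unix timestamp
$microseconds = hexdec(substr($id, 8, 5)); // microsecond part

echo $id, ' was generated at ', date('c', $seconds),
     ' +', $microseconds, " microseconds\n";
```

Anyone who knows roughly when the token was issued only has to search a small window of microsecond values.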

When using cryptographically strong random bytes, you don’t have to worry about possible edge cases and attack vectors that may appear when using weak randomness, e.g. when the system is under an active attack. I’d make sure CSRF tokens are also generated using strong randomness (that way it is easy to make sure the system does not become vulnerable in any situation, edge cases included, because of weak randomness). If strong randomness is not available, just exit with an error.
About “deplete the entropy available”, this is actually not the case with /dev/urandom and alike. System random number generators (like /dev/urandom) do not run out of entropy. Urandom _might_ be low on entropy immediately after a fresh OS install, but this is insignificant when talking about web apps.

I just sent an email to the author of PHP_CSRF_Guard suggesting the use of openssl_random_pseudo_bytes() instead of mt_rand(). I agree with @padraicb: the random number provided by OpenSSL is enough for a CSRF token, and you can just use it without a hash function.

The MD5 hashes of all outputs from mt_rand() are online. SHA256 hashes can be brute forced at incredible speeds on a GPU, making it fairly pointless for minimal entropy inputs: it’s only a number between 0 and 2^31 (mt_getrandmax()). SHA512 is much, much slower than SHA256, but I can’t help but wonder if it’s so slow as to take TOO long running only 2.147B comparisons; most hashing tools have GPU support these days and the last GPU generation was marvellous for this task. It wouldn’t surprise me if it took

Based on comments elsewhere, I see the point. Looks like I have to modify https://github.com/auraphp/Aura.Session to use SSL when available, and only fall back to mt_rand when SSL is not available. Thanks, gentlemen.

In ZF2 we used a chain of tests for random generation:
1) if OpenSSL is installed we used the openssl_random_pseudo_bytes();
2) if Mcrypt is installed we used mcrypt_create_iv($length, MCRYPT_DEV_URANDOM); that uses ‘/dev/urandom’ source.
3) mt_rand() as a fallback, but only for non-cryptographic purposes.
More details here: https://github.com/zendframework/zf2/blob/master/library/Zend/Math/Rand.php#L25
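The chain above could be sketched roughly like this. This is not the ZF2 code: secureRandomBytes() is a hypothetical helper name, and because a CSRF token is a security-sensitive value, the sketch throws instead of falling back to mt_rand():

```php
<?php
// Hypothetical helper, not taken from ZF2.
function secureRandomBytes($length)
{
    // 1) OpenSSL, if the extension is loaded.
    if (function_exists('openssl_random_pseudo_bytes')) {
        $strong = false;
        $bytes  = openssl_random_pseudo_bytes($length, $strong);
        if ($bytes !== false && $strong) {
            return $bytes;
        }
    }
    // 2) Mcrypt reading from /dev/urandom, on installs that still have it.
    if (function_exists('mcrypt_create_iv')) {
        $bytes = mcrypt_create_iv($length, MCRYPT_DEV_URANDOM);
        if ($bytes !== false) {
            return $bytes;
        }
    }
    // 3) No mt_rand() fallback here: fail loudly rather than issue a weak token.
    throw new RuntimeException('No strong random source available');
}

$token = base64_encode(secureRandomBytes(32));
echo $token, PHP_EOL;
```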

I don’t like the reliance on random numbers.
I actually think your first suggestion of a HMAC is on the right path, but again not hashing random bytes.
The $data argument to hash_hmac should be made up from serialised data. This should include the full uri to where the form is to be posted, session id, and any hidden values in the form.
This provides not only CSRF protection, but also another layer of validation to parts of the form.
The $key parameter for the CSRF could be a site wide secret, and do away with needing to use $_SESSION at all.

If it’s 100% deterministic for the server (has no random per-session data), then it’s 100% deterministic for the client. And that means it’s 100% deterministic for an attacker as well. Which basically means that the protection is useless at stopping CSRF style attacks…

Well, you do disclose the derivative of it (via the HMAC), so if they know what goes into the left side, they can attempt to brute force the right side. Not a huge issue, but something to think about.
But in the end, what does this gain you? Nonce is a proven technique that does not require storing cryptographic secrets (which is what your key really is), and has good forward security (breaching today implies nothing towards breaching tomorrow). Your method requires a cryptographic secret, and has poor forward security (a breach today means a breach tomorrow). The rest of the security industry recommends using a random nonce, typically per-request (but at least per session). So what major benefit does this add to that paradigm that it’s worth going against the rest of the industry?

Additionally, 70% of all successful attacks come from inside an organization. Having a configurable value a) requires you to manage the key, and b) is something that an internal attacker may have knowledge of. Using a large pseudo-random number requires no configuration management and is not known by an internal individual. Defense in Depth, baby!

If an attacker can access the HMAC secret key on your server, you have more worrying concerns. Like credentials to access databases directly.
I wouldn’t say it was going against the rest of industry. The wider security field has created Message Authentication Codes as means to provide assurances about messages. The message in this case is a HTTP POST request.
Benefits:
It’s stateless.
Having multiple forms on the same page, or the user having multiple pages with multiple forms open, will work, and each form would have a different token.
It’s trivial to combine an expiration time within the token, [expires.hmac(expires + data)] so you can shorten the time that a token remains valid. Closing the window on replay attacks.
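A rough sketch of the [expires.hmac(expires + data)] layout described above. The function names, the "|" separator, the sha256 algorithm and the placeholder key are all assumptions made for illustration:

```php
<?php
// Hypothetical helpers illustrating a stateless, expiring HMAC token.
function makeFormToken($key, $data, $ttl, $now = null)
{
    $expires = ($now === null ? time() : $now) + $ttl;
    return $expires . '.' . hash_hmac('sha256', $expires . '|' . $data, $key);
}

function checkFormToken($key, $data, $token, $now = null)
{
    $now   = $now === null ? time() : $now;
    $parts = explode('.', $token, 2);
    if (count($parts) !== 2 || !ctype_digit($parts[0])) {
        return false; // malformed token
    }
    list($expires, $mac) = $parts;
    $expected = hash_hmac('sha256', $expires . '|' . $data, $key);
    // hash_equals() (PHP 5.6+) compares in constant time.
    return hash_equals($expected, $mac) && (int) $expires >= $now;
}

$key   = 'site-wide secret loaded from configuration'; // placeholder value
$data  = 'https://example.com/profile/update' . '|' . 'session-id-123';
$token = makeFormToken($key, $data, 600); // valid for ten minutes

var_dump(checkFormToken($key, $data, $token));
```

Because the session id is part of $data, the token is still tied to one user; everything then rests on the key staying secret, which is the trade-off the replies above are arguing about.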

I have a question. Imagine I have three functions: createToken(), deleteToken() and validationToken().

createToken() stores a unique code in $_SESSION[‘token’].
deleteToken() removes $_SESSION[‘token’] with unset() and then calls createToken(), so as soon as the old token is removed, a new $_SESSION[‘token’] is generated automatically.
validationToken() compares the token sent via AJAX in $_POST[‘token’] with $_SESSION[‘token’]. If they match, deleteToken() runs, which means a new $_SESSION[‘token’] is created.

My question: how do I get the new $_SESSION[‘token’] value into index.php without refreshing the page?
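One possible approach, sketched under assumptions: let the AJAX endpoint send the freshly generated token back in its response, so the script on the page can put it into the form for the next request. The helper names below are hypothetical, and $session stands in for $_SESSION so the sketch runs outside a web request:

```php
<?php
// Hypothetical helpers; pass $_SESSION as $session in a real request.
function createToken(array &$session)
{
    $session['token'] = base64_encode(openssl_random_pseudo_bytes(32));
    return $session['token'];
}

function validateToken(array &$session, $submitted)
{
    $valid = isset($session['token'])
        && is_string($submitted)
        && hash_equals($session['token'], $submitted); // constant-time compare
    if ($valid) {
        // Rotate: the used token is replaced by a fresh one.
        createToken($session);
    }
    return $valid;
}

// Simulated AJAX round trip: the response carries the new token.
$session = array();
$old     = createToken($session);
$ok      = validateToken($session, $old);

echo json_encode(array('ok' => $ok, 'token' => $session['token'])), PHP_EOL;
```

On the client side, the AJAX success handler would read token from the JSON response and write it into the hidden field, so no page refresh is needed.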