I would like to understand something about security. Say two parties have a private piece of data (e.g. a password or license key) and the receiver needs to know that the sender has the same key and they can only exchange information over a public connection.

I can think of two ways this can be accomplished which seem secure to me:

The receiver sends the sender a random text string. The sender concatenates this string with his private data and sends the SHA1 back. The receiver then does the same and compares the results;

The sender creates a random text string, concatenates it with the private data and sends both the random text string and the SHA1 of the concatenation to the receiver.

The only thing wrong with the second one is that the receiver doesn't have control over how random the salt is and would need to implement a system to ensure that it's random.

I know security is very tricky, so I would like to understand what is wrong with the above scenarios.

5 Answers

The first thing to define is the context. Usually, proving knowledge of a shared password is called authentication. It really makes sense only if linked to some other data. Imagine that the two involved parties (let's call them Andrew and Betty) run the protocol over a TCP connection, or something HTTP-based, or whatever. Andrew wants to prove to Betty that he knows the password; assume that he just did; then what? At that point, Betty is possibly convinced that Andrew has been involved. But if some data comes just after that exchange over the same connection, then Betty cannot be sure that the data is really from Andrew. An evil attacker (Carter) could hijack the connection in some way, just after the exchange: that's possible, since we assumed that the protocol was done over an unprotected TCP connection, or any similarly insecure transport medium. There are several flavours of such hijacking, e.g. Man-in-the-Middle.

So the authentication makes sense only if it can somehow be tied to subsequently exchanged data, in a cryptographically strong way. There are two main ways to do that:

Play the authentication protocol within a strong tunnel which guarantees overall integrity, and has already authenticated Betty (i.e. Andrew is sure to be already talking to Betty). That's basically what happens with HTTPS Web sites. Andrew (the client) establishes a secure tunnel with Betty (the server), and Andrew knows that he is talking to the genuine Betty because he validated Betty's certificate. Note that in that scenario, it is not difficult to have a tunnel which also guarantees confidentiality (by encrypting exchanged data), at which point Andrew can simply send the password as-is (it is simpler, and no weaker).

Turn the authentication protocol into a key exchange protocol, in which Andrew and Betty derive from their knowledge of the shared secret P (the password) a common key K, which is then used to build a secure tunnel which guarantees data integrity (and, there again, the tunnel could also provide confidentiality).

Without such a binding between authentication and data, the authentication protocol can achieve anything worthwhile only if it is played in a context where Carter cannot alter the ongoing exchange between Andrew and Betty. We then assume that Carter can spy on the messages between Andrew and Betty, and we want to prevent Carter from learning the password, because it would allow him to come back later, in a new protocol instance, under the guise of Andrew. Please note that this is a restrictive scenario, which applies to practical computer-over-network situations only insofar as attackers are lazy and do not build full-scale active attacks (i.e. the attackers are not really motivated, so whatever hides behind the password must not be something valuable like, say, your bank account).

In the passive-only attack scenario, we can say the following about your protocols:

Protocol 1 is a classic challenge-response. The general principle is sound, but there are details in the realization to be taken care of. Namely, if you just use a hash function like SHA-1, then it relies on the hash function to be indistinguishable from a random oracle, a mythical theoretical object. Unfortunately, it is known that SHA-1 is not indistinguishable from a random oracle. The right way to make a challenge-response protocol is to have Andrew compute a Message Authentication Code over the challenge, using the shared secret (password) as key. A standard MAC construction is HMAC, which is itself based upon a hash function (you can use SHA-1 if you wish), but HMAC invokes the underlying hash function twice, in a smart way, so as to avoid the shortcomings of the hash function not being a random oracle.

Protocol 2 is weak because it is susceptible to a replay attack. Carter could record a message sent by Andrew and replay it as is later on. The message is a hash of a combination of a random string chosen by Andrew and the shared secret (it could also be a MAC, as explained above -- it does not change things with regard to replay attacks). Betty accepts that message as a proof. If Carter sent the very same message again, why would Betty reject it? To make this protocol resistant to replay attacks, Betty must remember past messages, and reject any authentication message that she has already received. Since this may lead to unbounded memory requirements for Betty, you would have to include a time stamp in the random string: Betty would reject any message where the "random string" is "too old". This is tricky to do in practice: keeping two distant computers with synchronized clocks is a hard requirement. Also, one must take into account reboots (if Betty reboots, she loses her RAM contents) and multi-frontend servers (if Betty is two servers, both must share the same memory of already seen authentication messages).
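To make the replay countermeasure above more concrete, here is a minimal Python sketch of protocol 2 with a time stamp embedded in the "random string" and a cache of already-seen strings on Betty's side. Everything specific in it is an assumption for illustration: the 30-second window (`MAX_AGE_SECONDS`), the 8-byte time stamp plus 16-byte nonce layout, the use of HMAC-SHA256 instead of a bare hash, and the names `make_message` and `ReplayGuard`. It keeps state in RAM only, so it does not address the reboot or multi-frontend problems.

```python
import hashlib
import hmac
import secrets
import time
from typing import Dict, Optional, Tuple

MAX_AGE_SECONDS = 30  # assumed freshness window; not from the original answer


def make_message(shared_secret: bytes, now: Optional[float] = None) -> Tuple[bytes, bytes]:
    """Andrew: build (string, tag); the string is an 8-byte time stamp plus a random nonce."""
    timestamp = int(time.time() if now is None else now)
    string = timestamp.to_bytes(8, "big") + secrets.token_bytes(16)
    tag = hmac.new(shared_secret, string, hashlib.sha256).digest()
    return string, tag


class ReplayGuard:
    """Betty: accept each valid, fresh message at most once."""

    def __init__(self, shared_secret: bytes) -> None:
        self._secret = shared_secret
        self._seen: Dict[bytes, int] = {}  # string -> time stamp it carried

    def accept(self, string: bytes, tag: bytes, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        expected = hmac.new(self._secret, string, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, tag):
            return False  # sender does not know the shared secret
        timestamp = int.from_bytes(string[:8], "big")
        if abs(now - timestamp) > MAX_AGE_SECONDS:
            return False  # too old (or too far in the future)
        # Expired entries would fail the age check anyway, so they can be
        # forgotten; this is what keeps the memory requirement bounded.
        self._seen = {s: t for s, t in self._seen.items()
                      if abs(now - t) <= MAX_AGE_SECONDS}
        if string in self._seen:
            return False  # replay of a message already accepted
        self._seen[string] = timestamp
        return True
```

The `now` parameter exists only so the behaviour can be exercised without waiting on a real clock; in real use both sides would rely on their (hopefully synchronized) system clocks.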

When the shared secret is a password, additional problems arise. A password is the specific kind of shared secret that a human will be able to remember, and be willing to type on a keyboard. As such, it cannot be too long or complex, and it is possible for Carter to set up a very long, but not intractably huge, list of potential passwords, and try them all. This is called a dictionary attack. The scariest dictionary attacks are those where Carter can "try" a password by merely computing things on his machines, without interacting any further with Andrew and/or Betty: these are offline dictionary attacks and are deadly because Carter could be a bored student who temporarily "borrows" the computing power of 1000 machines at his university. With your protocols, spying one challenge, and the hash-with-password computed over it, is enough for Carter to do an offline dictionary attack (he just tries potential passwords until he finds one which is consistent with the messages he eavesdropped). Offline dictionary attacks can be partially hindered by using salted, slow hashing like bcrypt or PBKDF2. A more thorough solution is the use of a Password-Authenticated Key Exchange protocol.
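A small Python sketch may help show why one eavesdropped exchange is enough for an offline dictionary attack, and what "slow hashing" changes. The concatenation order (challenge first, then password), the candidate list, the salt and the iteration count are all assumptions made for the example; `hashlib.pbkdf2_hmac` is the standard-library PBKDF2 implementation.

```python
import hashlib
from typing import Iterable, Optional


def protocol1_response(password: str, challenge: bytes) -> bytes:
    """SHA-1 over challenge || password (the order is an assumption)."""
    return hashlib.sha1(challenge + password.encode("utf-8")).digest()


def offline_dictionary_attack(challenge: bytes, response: bytes,
                              candidates: Iterable[str]) -> Optional[str]:
    """Carter: test guesses locally, with no further contact with Andrew or Betty."""
    for candidate in candidates:
        if protocol1_response(candidate, challenge) == response:
            return candidate
    return None


def slow_key(password: str, salt: bytes, iterations: int = 100_000) -> bytes:
    """Derive a key with PBKDF2-HMAC-SHA256 so that each guess costs many hash calls."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
```

If both parties used `slow_key(password, salt)` as the secret instead of the raw password, each of Carter's guesses would cost `iterations` hash computations rather than one. That only raises the price of the attack; it does not remove it, which is why the answer points to PAKE protocols as the more thorough fix.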

Therefore, practically, for authentication, you should either:

Use a secure tunnel (SSL) with an appropriate certificate by one party (the server, aka Betty), and have Andrew send the password within the tunnel "as is". That's how the Web works today. Code is available.

Or use a secure tunnel established over the output of a PAKE protocol: Andrew and Betty run the PAKE protocol, using their shared password, to establish a shared session key, with which the tunnel is integrity-protected and encrypted. There again, this is SSL/TLS, specifically TLS with SRP (GnuTLS is a software library which supports that).
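For the first option (password sent "as is" inside a certificate-authenticated tunnel), a client-side sketch using Python's standard `ssl` module could look like the following. The function names and the wire format (password followed by a newline) are hypothetical; a real service would define its own application protocol on top of the tunnel. The second option (TLS-SRP) is not covered here, since the Python standard library does not offer it.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a context that validates the server's certificate chain and hostname."""
    # create_default_context() enables CERT_REQUIRED and hostname checking.
    return ssl.create_default_context()


def send_password(host: str, port: int, password: str) -> None:
    """Andrew: open the tunnel to Betty, then send the password inside it."""
    context = make_client_context()
    with socket.create_connection((host, port)) as raw:
        # The handshake fails here if Betty's certificate does not validate,
        # so the password is never sent to an impostor.
        with context.wrap_socket(raw, server_hostname=host) as tunnel:
            tunnel.sendall(password.encode("utf-8") + b"\n")  # hypothetical wire format
```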

Otherwise, if you are in one of these extremely rare situations where active attackers are not to be feared, or if your hardware is so cheap that it cannot cope with strong protocols anyway (e.g. you are designing a protocol for RFID tags), then you might get away with a challenge-response protocol like your "protocol 1", provided that you use a proper MAC (i.e. HMAC) and take care to use a strong enough password (i.e. a sequence of 12 random characters at least, and not at all a four-digit PIN).
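As a rough illustration of that last fallback, here is what a challenge-response exchange with HMAC might look like in Python, using only the standard `hmac`, `hashlib` and `secrets` modules. The function names, the 16-byte challenge size and the choice of SHA-256 as the underlying hash are assumptions for the sketch (HMAC with SHA-1 would be written the same way). It protects against passive eavesdroppers only, as discussed above.

```python
import hashlib
import hmac
import secrets


def make_challenge(num_bytes: int = 16) -> bytes:
    """Betty: generate a fresh, unpredictable challenge."""
    return secrets.token_bytes(num_bytes)


def respond(shared_secret: bytes, challenge: bytes) -> bytes:
    """Andrew: compute a MAC over the challenge, keyed with the shared secret."""
    return hmac.new(shared_secret, challenge, hashlib.sha256).digest()


def verify(shared_secret: bytes, challenge: bytes, response: bytes) -> bool:
    """Betty: recompute the MAC and compare in constant time."""
    expected = hmac.new(shared_secret, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)
```

Betty must use each challenge for a single attempt only; reusing a challenge would let a recorded response be replayed.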

Your question is unclear about whether the two parties are actually exchanging the secret data or merely want to know if the other party HAS the secret data. If the former, then your question is the very reason why public/private key cryptography was invented.

A public key can encrypt the secret key such that only the recipient can decrypt it with their private key. The private key is never transmitted over the public communication network.

Given that you only want a "yes" or "no" answer, you can use a salted hash as you state in your original question. The license-key lifetime will have to be set to the half-life of however long it will take to brute force the hash if the hash and salt are intercepted.

Alternatively, there is an interesting cryptographic problem called the "Millionaire's Problem" which can be used to verify values a and b such that a >= b. You basically need to check that a >= b and b >= a.

Do you mean that what I describe here is public/private key cryptography?
– Pieter van Ginkel, Jan 17 '12 at 16:01

No. What you (potentially) describe depending on the interpretation of your question is the motivation for why public/private key crypto is necessary.
– logicalscope, Jan 17 '12 at 16:03

No, I only want to know whether the other party has the same key. The data itself can be publicly transmitted and I just need to verify a license code.
– Pieter van Ginkel, Jan 17 '12 at 16:53

I'm confused. What is private if "the data itself can be publicly transmitted"?
– logicalscope, Jan 17 '12 at 16:58

Sorry for the confusion. No, no data has to be secretly transmitted. I only need a "Yes" or "No" whether the two parties have the private data I refer to in the question.
– Pieter van Ginkel, Jan 17 '12 at 17:02

1. The receiver sends the sender a random text string. The sender concatenates this string with his private data and sends the SHA1 back. The receiver then does the same and compares the results;

This is vulnerable to a man-in-the-middle attack: when the receiver sends the sender a random string N1, Mallory (the attacker) intercepts this request and forwards it to the sender. The sender replies SHA1(N1||K), where || denotes concatenation and K is the shared secret, which Mallory also intercepts and forwards to the receiver. The receiver now thinks Mallory knows the secret.

If the receiver and the sender are both willing to initiate the protocol (from their point of view) with anyone (which is presumably the case, since we're assuming the receiver trusts nothing about the sender initially), then Mallory does not need to intercept any communication. Mallory merely needs to ask the receiver to send a random string (pretending to be a sender), then forward that string to the sender (pretending to be a receiver) and continue being a man-in-the-middle without needing to have any control over the network.

If the participants are symmetric in this setting, specifically if the receiver can also act as a sender, then Mallory may forward the receiver's request to itself, so Mallory may be able to pretend to know the secret even though only the receiver is present.
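The relay described above can be simulated in a few lines of Python. The classes `Sender` and `Receiver` and the function `mallory_relay` are hypothetical names for this sketch; the point is only that Mallory's code never touches the key, yet the receiver's check succeeds.

```python
import hashlib
import secrets
from typing import Optional


class Sender:
    """Honest party that answers any challenge with SHA1(N1 || K)."""

    def __init__(self, key: bytes) -> None:
        self._key = key

    def respond(self, challenge: bytes) -> bytes:
        return hashlib.sha1(challenge + self._key).digest()


class Receiver:
    """Honest party that issues a challenge and checks the answer."""

    def __init__(self, key: bytes) -> None:
        self._key = key
        self._challenge: Optional[bytes] = None

    def challenge(self) -> bytes:
        self._challenge = secrets.token_bytes(16)
        return self._challenge

    def check(self, response: bytes) -> bool:
        if self._challenge is None:
            return False  # no challenge outstanding
        expected = hashlib.sha1(self._challenge + self._key).digest()
        return expected == response


def mallory_relay(receiver: Receiver, sender: Sender) -> bool:
    """Mallory knows no key; she only passes messages along."""
    n1 = receiver.challenge()      # Mallory poses as a sender to the receiver
    answer = sender.respond(n1)    # Mallory poses as a receiver to the sender
    return receiver.check(answer)  # the receiver now believes Mallory knows K
```

Replacing SHA-1 with HMAC inside `respond` and `check` would not change the outcome: the weakness lies in the protocol's lack of binding to the parties, not in the hash.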

2. The sender creates a random text string, concatenates it with the private data and sends both the random text string and the SHA1 of the concatenation to the receiver.

This protocol is as vulnerable to a man-in-the-middle attack as the previous one.

Additionally, this protocol is vulnerable to a replay attack. Eve (the attacker) eavesdrops on one message sent by a sender. Eve later sends that message again, and the receiver will accept it. The first protocol is not vulnerable to such an attack, provided the receiver never sends the same random text twice.

You can protect against a passive replay attack by having the receiver never accept the same random text twice. The protocol would still be vulnerable to a man-in-the-middle attack; an attacker who is not capable of suppressing packets could nonetheless exploit an accidental packet loss.

You are looking to prove that the sender knows a shared secret that is known to a receiver. This is an authentication problem, not a key exchange problem. Your protocol 1 is at heart a challenge-response protocol. What you're missing is a way to tie the challenger and the responder to their respective roles: if your protocol 1 completes successfully, all the challenger knows is that a responder is present. You are presumably performing this protocol in order to exchange data between the two parties; how are you going to do this?

You can salvage this approach if you realize that it doesn't matter if Mallory is relaying packets, provided that you do not consider “who am I connected to” to be trustworthy. In other words, let Mallory participate all she wants, but ensure that she can only relay the actual conversation, not change messages. Then Mallory is no more than a router.

You can do that if you authenticate each and every message in the conversation; including the HMAC of each message (using the shared secret as the key) is a good way to do that. Authenticating each message is not sufficient: all it proves is that a participant knowing the shared secret once sent such a message. Each message must also contain a reliable indication of its place in the conversation. In broad terms, a good way to do that is to have each side include a random string as part of their first message (to protect against replay attacks), and include a digest of message N in message N+1.
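One way to picture that chaining is the Python sketch below. The message layout (32-byte digest of the previous message, then the payload, then a 32-byte HMAC-SHA256 tag), the all-zero `GENESIS` value for the very first message, and the function names are all assumptions made for illustration; this is not a vetted protocol.

```python
import hashlib
import hmac

GENESIS = bytes(32)  # assumed "previous digest" for the very first message


def digest(message: bytes) -> bytes:
    """Digest of a whole wire message, to be embedded in the next one."""
    return hashlib.sha256(message).digest()


def seal(key: bytes, prev_digest: bytes, payload: bytes) -> bytes:
    """Build the wire message: prev_digest || payload || HMAC(key, prev_digest || payload)."""
    body = prev_digest + payload
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return body + tag


def open_sealed(key: bytes, expected_prev: bytes, message: bytes) -> bytes:
    """Check the MAC and the position in the conversation; return the payload."""
    if len(message) < 64:
        raise ValueError("message too short")
    body, tag = message[:-32], message[-32:]
    expected_tag = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(expected_tag, tag):
        raise ValueError("bad MAC")
    if not hmac.compare_digest(body[:32], expected_prev):
        raise ValueError("message out of sequence")
    return body[32:]
```

In use, message 1 would be `seal(key, GENESIS, nonce_a)`, message 2 `seal(key, digest(m1), nonce_b)`, message 3 `seal(key, digest(m2), data)`, and so on, where `nonce_a` and `nonce_b` are the fresh random strings each side contributes. A message recorded in one conversation then fails the sequence check in any other, because the digests it chains to depend on those random strings.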

(Final warning: as usual, this message is for discussion only. I do not condone writing your own protocols. As a practical matter, use a well-known, generally-approved protocol such as SSL/TLS or PGP/GPG-signed email.)

Your first scenario is actually used in some protocols (e.g. APOP) to verify passwords. The problem with using it for passwords is that it requires the server to store the password in plaintext. But in your scenario, both parties already have the same secret, so I don't see any problem with using that method. (Note: I am not a security expert, so there may be other factors I'm overlooking.)

Basically, the way to have two parties verify they have the same piece of data without transmitting it over an insecure channel is to use hashing.

If the secret text is Pieter, both parties hash it with SHA1 to ec0a7fb5e54734c9bccf9f827892f5e14fddd953 and then compare that string. If they agree, both parties have the same secret text.

This prevents a straightforward eavesdropping attack. If I listen in to the conversation, I only get the hash, from which it is hard to go back to Pieter.

That's the heart of this problem.
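In Python, that basic comparison might be sketched as follows. The UTF-8 encoding of the secret and the function names are assumptions; the digest is deliberately not printed here, since its exact value depends on the encoding.

```python
import hashlib
import hmac


def fingerprint(secret: str) -> str:
    """Hex SHA-1 digest of the secret text (UTF-8 encoding is an assumption)."""
    return hashlib.sha1(secret.encode("utf-8")).hexdigest()


def same_secret(local_secret: str, received_fingerprint: str) -> bool:
    """Compare the received digest with the digest of our own copy of the secret."""
    return hmac.compare_digest(fingerprint(local_secret), received_fingerprint)
```

Because the digest of a fixed secret never changes, whoever overhears it once can send it again later, which is exactly the kind of "more sophisticated attack" the rest of this answer goes on to mention.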

However, there are more sophisticated attacks people can try, so you have to add more complexity. In your example, you're salting - adding an agreed non-secret text to the secret text before hashing it. This helps prevent certain types of more sophisticated attacks (e.g. rainbow tables). It doesn't help with other types of attack (e.g. replay attack).

So, and here comes security professional stock answer #1, it all depends on what attacks you expect.

Security professional stock answer #2 might come in useful too: NEVER attempt to write your own security algorithm. Find a license key checking library that is widely used and tested and implement that. Or, establish a secure transmission channel and send the secret text over that.