Are there any security implications for hashing and storing sensitive data like this?

Is it more or less secure than using the full SHA1 hash?

Is there an increased risk of hash collision when using the truncated version?

This is sparked by a debate over somebody doing this, where their reason was that it would be better to use the SHA512 version because it is a more secure algorithm, and by truncating it a potential hacker wouldn't know which algorithm was used in the first place.

My understanding was that SHA1 would always produce a unique value, whereas there is a chance that the first 40 characters of a SHA512 output could appear many times.

Just a note regarding terminology: hash output is in bits, not 'characters'. Your example above is hex-encoded (i.e., 2 hexadecimal characters per byte), but that's just a representation... the underlying data is always in bits. You might consider editing your question to something like "truncating sha512 to the first 20 bytes...".
–
hunter, Jul 26 '13 at 12:31

@hunter A hex string is a series of characters, representing binary values in a human-readable, non-binary format. So, when he says "character", he's not that incorrect. But even if that were wrong, I think you should be gentle on him and remember he says he has a web-dev background. I mean, let's be realistic: only people like us pro coders and crypto-ninjas really think in bits and bytes. For the rest of the world, whatever looks like an ASCII character is a "character". ;)
–
e-sushi, Jul 26 '13 at 16:06

@e-sushi ... thanks for explaining hex to me - I'm actually a web-dev too O_o. My point was that the OP's question changes entirely if the hash were to be Base64 encoded (as an example). As such, it's helpful if people phrase crypto questions in terms of bits and bytes, to ensure that everyone is on the same page (so to speak). I know all those 1s & 0s are confusing for us simpleton web-devs, but I think it's important for anyone implementing crypto to understand basic concepts and terminology.
–
hunter, Jul 26 '13 at 16:56

Could you indicate if you are OK with changing the title to 160 bits? Hash output is normally given in bits... Don't forget to hit accept on the answer as well, I think it cannot be explained too much better than Reid already did.
–
Maarten Bodewes, Aug 6 '13 at 23:47

1 Answer

As a general rule, you should avoid SHA1 for new applications and instead go with one of the hash functions from the SHA-2 family.

As far as truncating a hash goes, that's fine. It's explicitly endorsed by NIST, and there are hash functions in the SHA-2 family that are simple truncated variants of their full brethren: SHA-256/224, SHA-512/224, SHA-512/256, and SHA-512/384, where SHA-$x$/$y$ denotes a full-length SHA-$x$ truncated to $y$ bits.
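To make the truncation concrete, here is a minimal Python sketch of "SHA-512 truncated to 160 bits" using the standard `hashlib` module. The helper name `sha512_truncated` and the sample input are made up for illustration; they are not from the question.

```python
import hashlib

def sha512_truncated(data: bytes, n_bytes: int = 20) -> bytes:
    """Return the first n_bytes of the SHA-512 digest of data (hypothetical helper)."""
    if not 1 <= n_bytes <= hashlib.sha512().digest_size:
        raise ValueError("n_bytes must be between 1 and 64")
    return hashlib.sha512(data).digest()[:n_bytes]

digest = sha512_truncated(b"example sensitive value")
print(len(digest) * 8)     # 160 bits
print(len(digest.hex()))   # 40 hex characters
```

Truncating the raw bytes rather than the hex string keeps the length unambiguous: 20 bytes is 160 bits however the result is later encoded.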

Are there any security implications for hashing and storing sensitive data like this?

As far as determining the sensitive material from the digest itself, you're safe. All secure modern-day cryptographic hashes have what is called preimage resistance, which essentially means that it is computationally infeasible to "reverse" the hash, if you will. So, your sensitive data's confidentiality won't be compromised by storing the digest.

Now, the real question is: why are you wanting to store the hash in the first place? Hopefully you are not using it to detect if the data is maliciously modified; that generally is the purview of a MAC, such as HMAC or CBC-MAC.
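If tamper detection is the goal, a sketch of the HMAC route with Python's standard `hmac` module might look like the following. The function names, the 32-byte key, and the sample messages are assumptions for illustration only.

```python
import hashlib
import hmac
import secrets

def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over message (hypothetical helper)."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)

key = secrets.token_bytes(32)  # secret key, stored separately from the data
tag = make_tag(key, b"account=42;balance=100")

print(verify_tag(key, b"account=42;balance=100", tag))  # True
print(verify_tag(key, b"account=42;balance=999", tag))  # False
```

Unlike a bare hash, the tag cannot be recomputed by someone who modifies the data but does not hold the key.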

Is it more or less secure than using the full SHA1 hash?

Much more secure, actually, if you care about collision resistance. There is a (theoretical) attack on SHA1 that finds collisions in $2^{60}$ time, whereas truncating SHA-512 to 160 bits requires $2^{80}$ time to find collisions (see the birthday attack). So, truncating one of the SHA-2 functions to 160 bits is around $2^{20}$ times stronger when it comes to collision resistance.
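For reference, the arithmetic behind those figures, assuming the generic birthday bound of roughly $2^{n/2}$ work for an ideal $n$-bit hash and taking the cited SHA1 attack cost at face value:

$$
2^{160/2} = 2^{80}, \qquad \frac{2^{80}}{2^{60}} = 2^{20} \approx 10^{6}
$$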

Is there an increased risk of hash collision when using the truncated version?

Increased risk over SHA1? No. Increased risk over using the full SHA-512 output? Yes.

Truncating the output of a hash function always decreases its (theoretical) collision-resistance. In practice, it usually doesn't matter too much; for instance, $2^{80}$ time is still pretty big. Still, if you used the full output of SHA-256, the same birthday attack would take $2^{128}$ time, which is totally out of reach.
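The effect of truncation is easy to observe at toy sizes. The sketch below truncates SHA-512 to only 24 bits (an illustrative choice, far too short for real use) and hashes successive counter strings until two of them collide; the birthday bound suggests this should take on the order of $2^{12}$ attempts. The function names are hypothetical.

```python
import hashlib
from itertools import count

def truncated_sha512(data: bytes, n_bits: int) -> int:
    """First n_bits of SHA-512(data), returned as an integer."""
    full = int.from_bytes(hashlib.sha512(data).digest(), "big")
    return full >> (512 - n_bits)

def find_collision(n_bits: int):
    """Hash "0", "1", "2", ... until two inputs share a truncated digest."""
    seen = {}
    for i in count():
        msg = str(i).encode()
        h = truncated_sha512(msg, n_bits)
        if h in seen:
            return seen[h], msg, i + 1  # the two inputs and the number of hashes
        seen[h] = msg

a, b, tries = find_collision(24)
print(a, b, tries)  # tries is expected to be in the thousands, near 2**12
```

Each extra bit kept multiplies the expected work by about $\sqrt{2}$, which is why the same search against 160 bits is impractical.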

by truncating it a potential hacker wouldn't know which algorithm was used in the first place

Always assume the attacker knows everything about your algorithm/cryptosystem except for the secret keys. This is known as Kerckhoffs's principle. I could talk for hours about why this principle is important, but let's just leave it at: you should follow it.

My understanding was that SHA1 would always produce a unique value, whereas there is a chance that the first 40 characters of a SHA512 output could appear many times.

SHA1 doesn't produce unique values. There are infinitely many possible inputs to SHA1 (it takes a bitstring of any length), yet there are only 160 bits of output. By the pigeonhole principle, there have to be infinitely many values that map to the same 160 bits of output. But despite that, no one has ever found a collision for SHA1. (Or for that matter, SHA-2.) But, again, there still exists a theoretical attack on SHA1, which is why cryptographers recommend against it.

I should note, if you are going to use the full output of SHA1, SHA-256, or SHA-512, you should be aware of length extension attacks.

A caveat to the "security implications" section: if the plaintext values are guessable (or chosen from a limited set), an attacker will be able to guess-and-test possible plaintexts. For example, if you hash the State field of an address database, there'll only be 50 distinct hash values, and an attacker won't have much trouble figuring out which is which. Similarly, if you hash the Name field, and the attacker wants to find out if "John Q. Smith" is in your DB, they can hash that and look for a match.
–
Gordon Davisson, Jul 27 '13 at 14:49
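A rough sketch of that guess-and-test attack, assuming the database stores the first 40 hex characters of an unsalted SHA-512 digest of the State field. The `digest` helper and the short candidate list (standing in for all 50 states) are made up for illustration.

```python
import hashlib

def digest(value: str) -> str:
    """First 40 hex characters of SHA-512(value), as assumed for the stored column."""
    return hashlib.sha512(value.encode()).hexdigest()[:40]

# The attacker hashes every plausible plaintext once...
CANDIDATES = ["Alabama", "Alaska", "Arizona", "California", "Texas"]
lookup = {digest(state): state for state in CANDIDATES}

# ...and then reverses any stored digest with a dictionary lookup.
stored = digest("Texas")   # what the database would hold
print(lookup.get(stored))  # Texas
```

Preimage resistance does not help here, because the attacker never inverts the hash; they only hash guesses forward.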

@GordonDavisson: Very true. To tell whether or not it's really secure we'd have to know the application.
–
Reid, Jul 27 '13 at 15:04

Note that SHA-224 is not a simple truncated version of SHA-256: the IV is different. SHA-224 is computed like SHA-256 and then truncated, but as the IV is different, the intermediate 256 bit result of SHA-224 is totally different (in general) than the SHA-256 computation. The same goes for SHA-384 vs SHA-512.
–
Henno Brandsma, Aug 13 '13 at 18:07
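This is quick to check with Python's `hashlib`: the sketch below compares SHA-224 against the first 224 bits of SHA-256, and SHA-384 against the first 384 bits of SHA-512, for the sample message `b"abc"` (an arbitrary choice).

```python
import hashlib

msg = b"abc"

sha224 = hashlib.sha224(msg).digest()
sha256_cut = hashlib.sha256(msg).digest()[:28]   # first 224 bits of SHA-256

sha384 = hashlib.sha384(msg).digest()
sha512_cut = hashlib.sha512(msg).digest()[:48]   # first 384 bits of SHA-512

print(sha224 == sha256_cut)  # False
print(sha384 == sha512_cut)  # False
```

So a value stored as "SHA-512 cut to 384 bits" is not interchangeable with a SHA-384 digest, even though both have the same length.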

In some tests I did, truncated hash functions resulted in far more collisions than hash functions of the same length (although the untruncated version didn't collide, of course). I don't think I used sha512 back then, but for instance aggressively truncating MD5 to 4 bytes resulted in more collisions than using the CRC32 of the original value.
–
Ángel, Sep 10 '14 at 19:51