When I need to generate unique IDs based on some information, hashing is the typical choice. However, sometimes that ID needs to be of a particular size. I've seen several schemes (HMAC-MD5-96 in SSH, CGA in SeND for IPv6) that use only a portion of a hash, so I'm thinking it might be alright to use it that way.
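Something like this minimal Python sketch is what I have in mind (`make_id` is just an illustrative name):

    import hashlib

    def make_id(data: bytes, num_bytes: int = 12) -> str:
        """Derive a fixed-size ID by truncating a SHA-256 digest."""
        digest = hashlib.sha256(data).digest()   # full 32-byte digest
        return digest[:num_bytes].hex()          # keep only the first num_bytes bytes

    print(make_id(b"some information"))  # 24 hex chars = 96 bits, like HMAC-MD5-96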

Which properties of a hash are preserved, and which no longer apply?

Obviously, with fewer bits, the chance of collision goes up, but can I still rely on the partial hash for a uniform distribution of values? What about the avalanche property? I'm guessing that if a hash function changes its entire output upon the smallest change to the input, then any portion of that output would be equally well changed. Am I thinking about this correctly?

Also, would these behaviors be different for different hashing algorithms?

4 Answers

Yes, starting from an appropriately good hash or MAC, removing output bits demonstrably does not weaken it beyond the limit imposed by the number of remaining bits, with regard to collision-resistance, preimage-resistance, and other useful properties (such as the impossibility of guessing anything about the output without knowledge of the input). Put simply, if a standard hash is entirely unbroken, its restriction is unbroken.

More precisely, one sufficient condition for "appropriately good" is being secure in the Random Oracle Model. For a MAC, it is also enough that, without the key, no attacker could harness enough computing power to distinguish the function from a random function with odds better than 50%, given access to an unlimited supply of $(input,output)$ pairs with $input$ adaptively chosen. For a Merkle–Damgård hash like $SHA256$ there are additional technicalities [we replace the key with the constants in the definition of the hash, and posit that they have been chosen at random; also, we must make an exception for the length-extension property: from the length of a message $M$, it is trivial to construct extensions $E$ such that $H(M||E)$ is efficiently computed from $H(M)$ without knowledge of $M$].
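To make the length-extension caveat concrete: for a Merkle–Damgård hash $H$, knowing $H(M)$ and the length of $M$ lets an attacker compute $H(M||P||E)$ for the padding $P$ that the hash internally appends to $M$ and any suffix $E$ of the attacker's choosing, all without knowing $M$. Incidentally, truncation actually helps against this particular attack, since the truncated output no longer reveals the full internal chaining state (this is one reason SHA-512/256 is immune to length extension while SHA-256 is not).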

Proof sketch: if a function is secure in the Random Oracle Model and/or computationally indistinguishable from a random function, then a restriction with fewer output bits is also secure in the Random Oracle Model and/or computationally indistinguishable, because removing bits deprives an attacker of information and can only make distinguishing harder. Being secure in the Random Oracle Model and/or computationally indistinguishable implies collision-resistance and preimage-resistance within the limits of the number of bits in the output, both for the original function and for its restriction.
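Concretely, "within the limits of the number of bits in the output" means the generic birthday/brute-force bounds: for an $n$-bit (truncated) output, finding a collision costs about $2^{n/2}$ hash evaluations and finding a preimage about $2^n$. So a 96-bit truncation, as in HMAC-MD5-96, still forces roughly $2^{48}$ work for a collision and $2^{96}$ for a preimage.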

It is not enough for "appropriately good" that breaking collision-resistance and/or preimage-resistance of the original hash or MAC is computationally hard. For example, consider $H(M) = C||SHA256(M)$ for some 256-bit constant $C$. This 512-bit hash is at least as collision-resistant and preimage-resistant as $SHA256$, which is rock solid. Yet its restriction to its left 256 bits is the constant $C$, and offers no security at all.
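A quick Python sketch of that counterexample (the constant and split are just for illustration):

    import hashlib

    C = b"\x42" * 32  # some fixed 256-bit constant

    def h(m: bytes) -> bytes:
        """H(M) = C || SHA256(M): collision/preimage-resistant, yet a bad truncation target."""
        return C + hashlib.sha256(m).digest()

    left  = h(b"message one")[:32]   # left 256 bits: always equal to C
    right = h(b"message one")[32:]   # right 256 bits: the actual SHA-256 digest

    assert left == h(b"message two")[:32]  # trivial "collision" on the left half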

Yes, it is just fine to do what you describe. There are other ways to do it (e.g., using the digest of the object as a seed to a random number generator).
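For instance, a sketch of the seeded-RNG variant (note that Python's `random` module is not a cryptographic PRNG, so this is only for illustration):

    import hashlib
    import random

    def id_from_digest(data: bytes, num_bits: int = 96) -> int:
        """Seed a PRNG with the object's digest, then draw a fixed-size ID from it."""
        digest = hashlib.sha256(data).digest()
        rng = random.Random(digest)          # deterministic: same input -> same ID
        return rng.getrandbits(num_bits)

    print(hex(id_from_digest(b"some object")))

In practice this is a roundabout form of truncation; slicing the digest directly gives the same guarantees with less machinery.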

A good hash function should be designed such that even if you know the first $n-1$ bits of the output, you should not be able to predict the last bit with probability better than 0.5. This is due to the avalanche effect. Another way of saying this is that no portion of the output of the hash function should be "better" than any other portion. That is, using the even bits has the same strength as using the odd bits, which has the same strength as using the $n/2$ least significant bits, which has the same strength as using the $n/2$ most significant bits.
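A rough empirical check of that claim (a Python sketch, using SHA-256 as an example): flip a single input bit and count how many bits change in a truncated slice of the digest; a good hash should flip about half of them on average.

    import hashlib

    def bit_diff(a: bytes, b: bytes) -> int:
        """Hamming distance between two equal-length byte strings."""
        return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

    msg = bytearray(b"some information to hash")
    base = hashlib.sha256(bytes(msg)).digest()[:12]   # keep 96 bits

    msg[0] ^= 0x01                                    # flip one input bit
    flipped = hashlib.sha256(bytes(msg)).digest()[:12]

    print(bit_diff(base, flipped), "of 96 bits changed")  # expect ~48 on average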

Now, as you point out, the chance of collision goes up, but if you are okay with that, then the method you are describing is just fine. If it were not, that would indicate a serious problem with the hash function itself.

output bits j and k should change independently when any single input bit i is inverted, for all i, j and k.

So my understanding is that since this should apply to all output bits $j$, $k$, it should apply uniformly across the output, so any segment of the output should exhibit the same (statistical) characteristics.
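A quick statistical sketch along those lines (Python; SHA-256 and the half/half split are arbitrary choices): over many single-bit input flips, the average number of changed bits in the first half of the digest should match the second half.

    import hashlib

    def bit_diff(a: bytes, b: bytes) -> int:
        return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

    first_half, second_half = 0, 0
    for i in range(1000):
        m1 = i.to_bytes(4, "big")
        m2 = (i ^ 1).to_bytes(4, "big")      # same input with one bit flipped
        d1 = hashlib.sha256(m1).digest()
        d2 = hashlib.sha256(m2).digest()
        first_half  += bit_diff(d1[:16], d2[:16])
        second_half += bit_diff(d1[16:], d2[16:])

    # both averages should be ~64, i.e., half of each 128-bit segment
    print(first_half / 1000, second_half / 1000)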

I couldn't find much about the evaluation of hash functions against this criterion, but I imagine any algorithm that fails to satisfy it sufficiently would be rather weak.

Your solution is fine, for the reasons you stated, assuming you are using a good hash that has the avalanche effect and is not continuous (something like SHA-1, SHA-2, or MD5).