When you try to connect via SSH, you see a signature which is short, but I heard it is even stronger than SHA-256. It is perhaps stronger because it uses more rounds. Is there a hash function, or a method applied to current hash functions, that uses more rounds or something like that to prevent collision attacks and provide a very strong hash that is not crackable, yet with a hash value short enough to be easily read? I ask because a 32 or 64 character hash is too long, and people don't like to read and compare something like that, for example when we publish the hash value of our software for download ...

If such a thing exists, please provide a JavaScript implementation of it.

Due to the birthday attack you won't find any short hash that is collision resistant. But probably 2nd preimage resistance is enough for you, so take a look at the article about cryptographic hash functions (especially the section called "properties").
– j.p. Feb 16 '13 at 19:11

@jug: Without breaking the birthday bound of course, the technique that I describe builds a publicly computable condensate of 16 characters (80 bits of information content) that is collision-resistant against an adversary able to perform $2^{62}$ hash operations, perhaps $2^{78}$; and even safer under some very realistic assumptions (apologies for multiple edits).
– fgrieu Feb 17 '13 at 18:03

@fgrieu: Does your proposed algorithm have any advantages over iterating the following $2^{32}$ (respectively $2^{40}$) times and then truncating to 64 (respectively 80) bits: append SHA256(m) to the message m, hash the result and append it to m, hash the result and append it to m, ...?
– j.p. Feb 18 '13 at 16:30

@fgrieu: mary wrote "for example when we put hash value of our software to download", which made me think that this is mary's use case, and that SSH is just used as an example where short signatures are OK. So collision resistance is probably overkill here.
– j.p. Feb 18 '13 at 16:34

@jug: That would be vastly inferior, because an attacker using custom hardware on an algorithm that requires only a small amount of space and no decision-making can build that hardware such that it is many orders of magnitude faster than the CPU we presume legitimate users are going to use. This seriously reduces the security margin. It's much harder to make memory or general-purpose computing devices orders of magnitude faster than commodity memory and CPUs. (If we could, we already would.)
– David Schwartz Feb 18 '13 at 16:35

2 Answers

explaining its conventional solution, using a hash published in hexadecimal, usually of 32 to 64 characters;

explaining a better (perhaps novel) solution, combining two established cryptographic primitives (a hash and a function originally designed for Password Based Key Derivation), plus a slightly better formatting than hexadecimal, giving a 16-character cryptogram with security comparable to existing practice.

I take the question as: how short can a cryptogram be when used to verify integrity of some data, in the common situation where:

1. we trust the party that computes the cryptogram from the original data;

2. an adversary knows everything we or the trusted party knows;

3. we trust the channel by which we obtain the cryptogram;

4. an adversary is assumed to be able to alter the data in transit;

5. perhaps: we trust that the adversary could not inject anything in the original data in order to facilitate later alteration.

The traditional approach to that problem is to have the trusted party hash the data, and publish the hash in hexadecimal. For 128-bit security (that is, resisting an adversary able to perform about $2^{128}$ hash operations), if we do NOT assume 5, a 256-bit (64 hex characters) collision-resistant and (second-)preimage-resistant hash is used.

One temptation is to just assume 5. Then we no longer need a collision-resistant hash, and can use a 128-bit hash (or a 256-bit hash truncated to 128 bits). However, that would be unwise in some circumstances. For example, in the distribution of some software, the adversary could have infiltrated the development team enough to plant an innocent-looking file, say aardvark.c. With a 128-bit Merkle–Damgård hash, and assuming the adversary knows every bit of data before aardvark.c in the archive of the distribution, she can craft that aardvark.c and a completely different version thereof such that substituting one for the other leaves the overall hash of the archive unchanged, with effort comparable to $2^{64}$ hashes; that is a lot of effort, but feasible. Notice that using a 256-bit hash truncated to 128 bits foils that aardvark attack, but not one where the adversary knows all the data except the final zymurgy.c that she injects. This is why standard (and good) security advice is to use a full 256-bit hash.

Can we have a shorter cryptogram without giving up security? YES!

I propose to:

1. hash the data using a traditional hash;

2. transform that hash into a shorter cryptogram using a random-like function that is purposely slow and costly to compute, similar in design to those used for the purpose of key stretching;

3. express that cryptogram in a form slightly more human-friendly than hexadecimal.

We could use SHA-256 as the hash; Scrypt with the hash as the P/key/password input, a version identifier "1" as the S/salt input, a 10-byte output, and other parameters (to be determined) chosen so as to require 4 seconds of computation on a Raspberry Pi model A, while allowing parallelization on a system with 16 cores and 16 times as much available memory; and format the 80-bit output as a 16-character cryptogram using base-32 encoding with the 23456789ABCDEFGHJKLMNPQRSTUVWXYZ characters.
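A minimal Node.js sketch of that pipeline follows, since the question asked for JavaScript. It is only an illustration under stated assumptions: the function names `condensate` and `encodeBase32` are made up for this example, and the scrypt cost parameters (`N`, `r`, `p`) are placeholders that run in a fraction of a second on a desktop, far below the 4-second Raspberry Pi target, whose real parameters the answer leaves to be determined.

```javascript
const crypto = require('crypto');

// The 32 symbols named in the answer: digits 2-9, then A-Z without I and O.
const ALPHABET = '23456789ABCDEFGHJKLMNPQRSTUVWXYZ';

// Encode bytes as base-32, 5 bits per character, most significant bits first.
// 10 bytes (80 bits) give exactly 16 characters, so no padding is needed there.
function encodeBase32(bytes) {
  let out = '';
  let acc = 0;   // bit accumulator
  let nbits = 0; // number of valid bits currently held in acc
  for (const byte of bytes) {
    acc = (acc << 8) | byte;
    nbits += 8;
    while (nbits >= 5) {
      nbits -= 5;
      out += ALPHABET[(acc >>> nbits) & 31];
    }
    acc &= (1 << nbits) - 1; // keep only the bits not yet emitted
  }
  if (nbits > 0) out += ALPHABET[(acc << (5 - nbits)) & 31];
  return out;
}

// Hypothetical helper: SHA-256, then scrypt (salt "1", 10-byte output), then base-32.
// ASSUMPTION: N, r, p below are placeholder costs, not the tuned values the answer calls for.
function condensate(data, cost = { N: 2 ** 14, r: 8, p: 1 }) {
  const hash = crypto.createHash('sha256').update(data).digest();
  const short = crypto.scryptSync(hash, '1', 10, cost);
  return encodeBase32(short);
}

console.log(condensate(Buffer.from('example release archive')));
```

Anyone verifying a download would recompute the condensate with the same parameters and compare the 16 characters; raising `N` makes both honest verification and attacks proportionally more expensive, and Node's `maxmem` option must be raised along with it.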

Assuming 5, an attack is expected to require computing power equivalent to $2^{81}$ seconds of the CPU time of our Raspberry Pi, which is perhaps $2^{100}$ hashes on that hardware (assuming 0.5 million hashes/second), and cost equivalent to $2^{116}$ hashes on dedicated hardware (assumed to beat the BCM2835 CPU of the Raspberry Pi by a factor of $2^{16}$ in price/performance ratio when hashing, thanks to massive parallelization and not having to pay the price of a fair fraction of 256 MB of RAM per instance). Thus we have arguably 101 and perhaps up to 117 bits of security for the size of 80, at the cost of 4 seconds of extra computation per use (or less on more capable hardware).
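The conversion from seconds to hash operations can be checked by hand. This merely restates the assumptions already made above (0.5 million hashes per second on the Raspberry Pi, and a $2^{16}$ price/performance advantage for dedicated hardware):

$$2^{81}\,\text{s} \times 5\cdot 10^{5}\,\text{hashes/s} \approx 2^{81} \times 2^{18.9} \approx 2^{100}\ \text{hashes}, \qquad 2^{100} \times 2^{16} = 2^{116}\ \text{hashes}.$$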

Without assuming 5, a practical attack is possible with in-advance knowledge of the original data (except for the malleable portion), and computing power equivalent to about $2^{43}$ seconds of the CPU time of our Raspberry Pi (which allows computing cryptograms for $2^{40}$ acceptable and $2^{40}$ rogue versions of the data, among which an exploitable collision has a fair probability); with the same hypothesis as above we have arguably 62 and perhaps up to 78 bits of security. Notice that in-advance knowledge of the original data is a very hard requirement for an adversary, and each unknown bit makes the attack twice as unlikely to succeed.
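As a rough check of these figures, assuming 4 seconds per cryptogram and that the 80-bit condensate behaves like a random function:

$$\left(2^{40} + 2^{40}\right) \times 4\,\text{s} = 2^{43}\,\text{s}, \qquad P(\text{collision}) \approx 1 - e^{-2^{40}\cdot 2^{40}/2^{80}} = 1 - e^{-1} \approx 0.63,$$

and with the same conversion factors as before, $2^{43} \times 2^{18.9} \approx 2^{62}$ hashes on the Raspberry Pi, or $2^{62} \times 2^{16} = 2^{78}$ hashes on dedicated hardware.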

Addition: A quick search did not locate an earlier suggestion of using a deliberately slow transformation in order to shorten a hash while keeping a good fraction of its security. If I am in a position to name the technique, let that be a condensate, obtained by hash condensation.

@Ricky Demer: you have an awk's eye.
– fgrieu Aug 22 '13 at 6:48

You should also use the randomized version of SHA-256 so that there won't be in-advance knowledge of the original data.
– user991 Aug 22 '13 at 7:44

@Ricky Demer: I see a problem with using a randomized hash in this (and some other) contexts: the added randomness needs to be known to the verifier, and thus moved alongside the data; that eats into our 16-char allowance. Although the random value does not need to be trusted by the verifier to the same degree as the hash/condensate, this is still an issue. Of course the random value could be replaced by some function of the data, but then we are really making a two-pass hash in the hope of improving its collision-resistance.
– fgrieu Aug 22 '13 at 18:20

As far as we know, there is no weakness in SHA-256 (which outputs $256$ bits, i.e. 32 bytes). This means the best known attack is brute force, i.e. about $2^{255}$ tries on average to find a preimage of a given hash, and about $2^{128}$ tries to find two different messages with matching hashes (i.e. a collision).

An SSH fingerprint consists of 16 bytes (written as 32 hex digits with : separators), i.e. $128$ bits. However good the underlying hash function might be, we can still find a collision in about $2^{64}$ tries.
Having a slower hash function here would cost the attacker proportionally more work (e.g. by a factor of $2^{15}$, if the user is still willing to accept the delay), which means finding a collision would be at the limit of what is doable nowadays.
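Spelled out, with the $2^{15}$ slowdown taken as the example factor from the text:

$$2^{128/2} = 2^{64}\ \text{tries}, \qquad 2^{64} \times 2^{15} = 2^{79}\ \text{fast-hash equivalents}.$$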

The good thing is that a fingerprint collision is not what you have to worry about as an SSH user: if an attacker succeeds in setting up two servers with identical fingerprints, nobody will care.

The more interesting attack would be a second preimage attack (i.e. given the fingerprint (and the public key) of a server to be impersonated, find a second public key which maps to the same fingerprint).
Actually, even this would not be that dangerous: as long as the signature scheme in use is secure, the attacker wouldn't find the corresponding private key.

So what is needed for a successful impersonation attack is to find a private key so that the matching public key would have the same (or at least a superficially similar) fingerprint as the real server (or "any real server", if we have multiple targets we can attack). Not knowing what hash is used by SSH for its key fingerprints, I still suppose that the fastest way to do this would be to randomly generate key pairs until we hit a fitting one, which would need around $2^{127}$ tries.

I suppose just about any cryptographic hash would do, including simply truncating the output of SHA-256 to 128 bits.

For your software hash, about the same things are needed. A collision attack is only a problem if the creator of the software creates two versions, one of them "good" and the other one "bad", lets someone else check and approve the good one, and then (maybe selectively) offers the bad one for download.

If all you fear is a forgery of your software (or even transfer errors) then you only have to fear a second preimage attack, and 128 bits of a good hash (not MD5) should be enough.
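For illustration, truncating SHA-256 to 128 bits takes only a few lines with Node.js's built-in `crypto` module. The helper name `shortHashHex` is made up for this sketch; it yields 32 hex characters instead of 64.

```javascript
const crypto = require('crypto');

// Hypothetical helper: SHA-256 of the data, truncated to its first 16 bytes (128 bits),
// written as 32 lowercase hex characters.
function shortHashHex(data) {
  return crypto
    .createHash('sha256')
    .update(data)
    .digest()        // 32-byte Buffer
    .subarray(0, 16) // keep the first 128 bits
    .toString('hex');
}

console.log(shortHashHex('abc')); // ba7816bf8f01cfea414140de5dae2223
```

As argued above, this kind of truncation is only adequate when second-preimage resistance is all you need.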

If you want to calculate this in JavaScript (which usually means "in a browser"), please keep in mind that your JavaScript checker itself, as well as the "right" hash, might be forged. So only do this when you deliver the page and the script over SSL/TLS, and when there can't be any malicious script in the same context.