For a project, I wonder whether there exists some kind of fixed-size checksumming/fingerprinting function for which, given the fingerprint of data block 1, it is easy to generate more data blocks that share the same fingerprint/hash key.

This is therefore unlike an MD5 sum (in that I don't know of an easy way to go from an MD5 sum back to a new matching file).

Basically, I am looking to generate the set of data blocks that hash to the same fingerprint as data block 1. Data blocks 2, 3, 4, etc. may be the same size as block 1 or even smaller (ideally, the less information entropy the better), but the series must be deterministic and finite, and computationally easy to find.

Given a 32-byte key, for instance, the indices should be consecutive integers, as you would expect in an array, and a fixed index should map to a fixed data block (for that key). Within that set of data blocks, each block should hash to the same key, but when enumerated the blocks should maintain their relative index positions (e.g., datablockYYY should always, and only, be found 15 index positions higher than datablockBBB).

The tricky part might be that the index range should not have to be as vast as the data blocks themselves, i.e. it should need substantially fewer bits of entropy: say, limited to a 31-bit unsigned integer for simpler testing. In fact, when I come to calculating key1 from my starting datablock1, I hope to find a datablockN whose index is not too far from datablock1's index. The data blocks need not all be the same size, nor must all possible blocks of a given size be reachable from a particular key (the pigeonhole principle in reverse). A surjective ("onto") function? I am not sure about the terminology.
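To illustrate the interface I have in mind, here is a toy sketch (all names are illustrative, and the fingerprint is shrunk to a single byte so the invariants are easy to check; the real key would be 32 bytes):

```python
def fingerprint(block: bytes) -> int:
    """Toy 1-byte fingerprint: XOR of all bytes (stand-in for a 32-byte key)."""
    k = 0
    for b in block:
        k ^= b
    return k

def block_at(key: int, index: int) -> bytes:
    """Deterministic (key, index) -> data block map.  Every block produced for
    a given key fingerprints back to that key, and a fixed index always yields
    the same block at the same relative position."""
    body = index.to_bytes(4, "big")       # encode the index itself
    pad = fingerprint(body) ^ key         # one adjustment byte fixes the XOR
    return body + bytes([pad])
```

Here `fingerprint(block_at(key, i)) == key` holds for every index, distinct indices give distinct blocks, and the enumeration is deterministic, which are the properties described above.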

This question came from our site for scientists using computers to solve scientific problems.


There are lots of functions that have this property, but what other properties do you want it to have? E.g., if you wanted a 1-byte fingerprint for an n-byte block, you could just take the first byte as the fingerprint. It's probably pretty useless otherwise, but it does meet your stated criteria. Can you update the question to explain what uses you want to put the fingerprint to and what properties it needs to have?
– Bill Barth, Mar 6 '12 at 13:07

The crypto mods have welcomed this question with open arms (and seem to have some better ideas than I do about how to answer this question). This question is a decent fit here, but a better one at crypto, so we're going to move it. Thanks for asking the question here, and please feel free to bring other numeric/scientific computing questions here in the future.
– Aron Ahmadia, Mar 6 '12 at 13:09

Welcome to Cryptography Stack Exchange. What are you actually trying to do here, i.e. for what do you need this pair of functions?
– Paŭlo Ebermann♦, Mar 6 '12 at 18:32

I guess SciComp was too small, so they moved it here. Encryption is not my goal, but it could get used that way. This is just experimental, one component for theory testing. I wanted to find out about already-known hashing recipes that are practically two-way, letting one somewhat interchangeably swap data blocks for others; implementing that seems to need the properties above.
– Marcos, Mar 6 '12 at 21:46

2 Answers

For cryptographic hash functions, we usually want to avoid collisions as much as possible (and, even more, we want to avoid any way to get from the output back to a preimage).

So what you want certainly is not a cryptographic hash function, but something else.

At first look, something like a CRC (cyclic redundancy check) could fit your bill. CRCs reduce arbitrary-length messages to fixed-length checksums ($n$ bits for a CRC-$n$), and, given an $n$-bit checksum and a prefix of a string with only $n$ bits missing, it takes just some linear algebra to compute the single missing postfix. (The missing bits can be elsewhere instead, but then the computation is a bit more complicated.)
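To make this concrete, here is a sketch in Python using `zlib.crc32` (function names are mine). It exploits a standard fact: over messages of a fixed length, CRC-32 is an affine map over GF(2), so `crc32(m ^ d) == crc32(m) ^ L(d)` for a linear map $L$. Finding XOR-deltas $d$ confined to the last few bytes with $L(d) = 0$ takes only Gaussian elimination, and XOR-ing combinations of them into a block enumerates an indexed collision set of the kind the question asks for:

```python
import zlib

def linear_part(delta: bytes) -> int:
    """Purely linear component of CRC-32 over GF(2).
    For equal-length m and d: crc32(m ^ d) == crc32(m) ^ linear_part(d)."""
    return zlib.crc32(delta) ^ zlib.crc32(b"\x00" * len(delta))

def kernel_basis(length: int, window: int = 5) -> list[bytes]:
    """Basis of nonzero XOR-deltas, confined to the last `window` bytes of a
    `length`-byte message, whose linear_part is 0.  Since window*8 > 32, the
    kernel has dimension >= window*8 - 32."""
    nbits = window * 8
    rows = []
    for i in range(nbits):
        d = bytearray(length)
        d[length - window + i // 8] ^= 1 << (i % 8)
        rows.append((linear_part(bytes(d)), 1 << i))  # (image, bits flipped)
    pivots = {}  # leading image bit -> (image, bit combination)
    combos = []  # bit combinations whose image reduced to zero (the kernel)
    for img, combo in rows:
        while img:  # Gaussian elimination over GF(2)
            p = img.bit_length() - 1
            if p not in pivots:
                pivots[p] = (img, combo)
                break
            pimg, pcombo = pivots[p]
            img ^= pimg
            combo ^= pcombo
        if img == 0:
            combos.append(combo)
    # Turn each surviving bit combination back into a concrete byte delta.
    basis = []
    for combo in combos:
        d = bytearray(length)
        for i in range(nbits):
            if (combo >> i) & 1:
                d[length - window + i // 8] ^= 1 << (i % 8)
        basis.append(bytes(d))
    return basis

def colliding_block(m: bytes, index: int, basis: list[bytes]) -> bytes:
    """Deterministic (message, index) -> block map: index 0 is m itself, and
    the bits of `index` select which kernel deltas to XOR into m.  All results
    share crc32(m); indices below 2**len(basis) give distinct blocks."""
    out = bytearray(m)
    for bit, delta in enumerate(basis):
        if (index >> bit) & 1:
            for j, dj in enumerate(delta):
                out[j] ^= dj
    return bytes(out)
```

With the default 5-byte window the kernel has dimension at least $40 - 32 = 8$, so indices 0 through 255 are guaranteed to enumerate 256 distinct equal-length blocks sharing one CRC-32 checksum, and a fixed index always yields the same block.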