I'm trying to compare two graphs using hash value ( i.e, at the time of comparison, try to avoid traversing the graph )
Is there a way to make a function such that the hash values compared can also lead to determining at which height the graphs differ?
The comparisons between two graphs are to be made by comparing children at a certain level.
One way to compare the graphs is have a final hash value for the root node and compare them, but that wouldn't directly reflect at which level the graphs differ, since their immediate children might be the same ( or any other case ).

I'm guessing that what you're after is an efficient way to compare two graphs: if so then you might be better off asking a question about that rather than the specifics of one potential approach to the problem...
–
vaughandroidJun 27 '12 at 15:56

2 Answers
2

You could probably cobble something together where each level of the graph affected a different portion of the hash value. What you'd essentially be doing here is creating a separate hash value for each level and appending them in sequence.

If the number of levels in your graphs isn't fixed/known, then you'll have to wrap this around. This wouldn't tell you exactly which level was different, but it would narrow down your options.

e.g. For a 32-bit hash and graphs with an unknown number of levels, each level could still hash to 4 bits: bits 1-4 would be affected by the 1st, 9th, 17th, ... layers; bits 5-8 would be affected by the 2nd, 10th, 18th, ... layers; and so on.

Why it's probably a bad idea:

The hash function I've described is a rubbish hashing function. I expect it would produce a lot of clashes, and poor distribution.

Using hash values to detect differences is only quicker if there is a difference. hash(A) != hash(B) tells you that A and B are different, but hash(A) == hash(B) does not tell you that they are the same. In the second case, you'll have to do a full comparison to be sure.

Some other possible approaches:

Take another approach. If the comparisons are too slow, run a profiler and see if there are other ways you can speed up the comparison: try using a different data structure for storing your graphs, etc.

Try a compromise:

Create a separate hash value for each level as well as one for the entire graph. Comparing hash values for each level is still going to be loads quicker than comparing every single node.

Instead of using a hash value to speed up your comparisons, create two objects with metadata about the graphs (e.g. # of levels, # of nodes on each level, etc.) and compare those.

No there is no way to do what you are asking, the purpose of a hash value is to take something and make it something else with no apparent relation to what it was. This means hash values are only useful to tell you if to things are the same, not what is different between them.

It is theoretically possible to reverse engineer the source of a hash, and you could use the source to find differences, but this is both computationally expensive and entirely pointless when you have the source available with a much faster way to generate it.