Approximate Matching

Approximate matching is the term used in computer forensics for identifying objects whose contents are similar but not identical. It replaced the previously used terms "similarity hashing" and "fuzzy hashing".

The following two paragraphs are similar but not identical; they differ only in the spelling of "defence" and "defense":

We the People of the United States, in Order to form a more perfect Union, establish Justice, insure domestic Tranquility, provide for the common defence, promote the general Welfare, and secure the Blessings of Liberty to ourselves and our Posterity, do ordain and establish this Constitution for the United States of America.

We the People of the United States, in Order to form a more perfect Union, establish Justice, insure domestic Tranquility, provide for the common defense, promote the general Welfare, and secure the Blessings of Liberty to ourselves and our Posterity, do ordain and establish this Constitution for the United States of America.

Kinds of Similarity

In forensics there are several kinds of similarity that are of interest:

Binary Similarity

Textual Similarity

Visual Similarity

Audible Similarity

Algorithmic (code) Similarity

Binary Similarity

Binary similarity between a master object and a target object can be rigorously defined as the number of substrings that the two objects have in common divided by the total number of substrings in the master object. Because the denominator depends only on the master object, the similarity function is not commutative. That is, BS(a,b) may not equal BS(b,a).
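
The definition above can be illustrated with a minimal Python sketch. It is not a reference implementation. It assumes that "substrings" means distinct, non-empty, contiguous byte sequences, which the definition does not spell out, and the function names substrings and binary_similarity are hypothetical. Enumerating every substring takes quadratic time and memory, so this is only practical for very small inputs.

```python
from typing import Set


def substrings(data: bytes) -> Set[bytes]:
    """Return the distinct non-empty contiguous substrings of data.

    Assumption: substrings are counted once each (no multiplicity).
    """
    return {
        data[i:j]
        for i in range(len(data))
        for j in range(i + 1, len(data) + 1)
    }


def binary_similarity(master: bytes, target: bytes) -> float:
    """Fraction of the master's substrings that also occur in the target."""
    master_subs = substrings(master)
    if not master_subs:
        raise ValueError("master object must not be empty")
    common = master_subs & substrings(target)
    return len(common) / len(master_subs)


obj_a = b"abc"
obj_b = b"abcd"

# All 6 substrings of obj_a also occur in obj_b.
print(binary_similarity(obj_a, obj_b))  # 1.0

# Only 6 of the 10 substrings of obj_b occur in obj_a.
print(binary_similarity(obj_b, obj_a))  # 0.6
```

The two calls return different values because the divisor is the substring count of whichever object is treated as the master, which is why BS(a,b) and BS(b,a) need not agree.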