Discussion of Similarity Metrics

Jaccard / Tanimoto Coefficient

Analysis

In some cases, each attribute is binary, such that each bit represents the absence or presence of a characteristic. It is then better to determine similarity via the overlap, or intersection, of the sets.

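Simply put, the Tanimoto Coefficient uses the ratio of the size of the intersecting set to the size of the union set as the measure of similarity. Represented as a mathematical equation:

$$T(a, b) = \frac{N_c}{N_a + N_b - N_c}$$

In this equation, N represents the number of attributes in each object (a, b): N_a and N_b are the numbers of attributes present in a and b respectively. C in this case is the intersection set, so N_c is the number of attributes the two objects share. The denominator is the size of the union, since attributes in the intersection would otherwise be counted twice.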
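As a minimal sketch of the ratio described above, the following Python assumes each object is represented as a set of the attributes it has. The function name `tanimoto`, the sample attribute names, and the choice to treat two empty sets as identical are illustrative assumptions, not part of the original discussion.

```python
def tanimoto(a: set, b: set) -> float:
    """Ratio of the intersection size to the union size of two attribute sets."""
    n_a, n_b = len(a), len(b)   # number of attributes in each object
    n_c = len(a & b)            # number of attributes shared by both
    denom = n_a + n_b - n_c     # size of the union
    if denom == 0:
        # Assumption: two objects with no attributes are treated as identical.
        return 1.0
    return n_c / denom


# Hypothetical example objects.
a = {"red", "round", "sweet"}
b = {"red", "round", "sour", "small"}
print(tanimoto(a, b))  # 2 / (3 + 4 - 2) = 0.4
```

The result ranges from 0 (no shared attributes) to 1 (identical attribute sets).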