JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 61(10). p. 2151-2160

Abstract:

Similarity measures, such as the ones of Jaccard, Dice, or Cosine, measure the similarity between two vectors. A good property for similarity measures would be that, if we add a constant vector to both vectors, then the similarity must increase. We show that Dice and Jaccard satisfy this property while Cosine and both overlap measures do not. Adding a constant vector is called, in Lorenz concentration theory, "nominal increase" and we show that the stronger "transfer principle" is not a required good property for similarity measures. Another good property is that, when we have two vectors and if we add one of these vectors to both vectors, then the similarity must increase. Now Dice, Jaccard, Cosine, and one of the overlap measures satisfy this property, while the other overlap measure does not. Also a variant of this latter property is studied.