Multimedia:File fingerprint

MediaWiki currently uses a SHA-1 hash to characterize the content of an uploaded file. Such a hash is, for practical purposes, unique to each file, which makes it possible to identify duplicates. However, this only identifies exact duplicates; it cannot identify similar or derivative files.
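To illustrate why a cryptographic hash only catches exact duplicates, here is a minimal Python sketch. It is not MediaWiki's code; the helper name `sha1_hex` and the sample byte strings are made up for the example, and the hex encoding is chosen only for readability.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of the raw file bytes as a hex string."""
    return hashlib.sha1(data).hexdigest()


# Hypothetical file contents standing in for uploaded images.
original = b"example image bytes"
exact_copy = bytes(original)
modified = original + b"\x00"  # one extra byte, e.g. after re-saving

# Identical bytes give identical hashes, so exact duplicates are found.
same = sha1_hex(original) == sha1_hex(exact_copy)

# Any change to the bytes gives an unrelated hash, so a near-identical
# file is not recognized as related to the original.
different = sha1_hex(original) != sha1_hex(modified)
```

Because the hash is computed over the bytes rather than the picture, a cropped, resized, or re-encoded version of an image produces a completely different value.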

Some applications use identifiers such as "fingerprints" or "signatures", based on image identification techniques such as Haar-like features, to find and track similar pictures. For example, digiKam uses a "lengthy number using a special technique (Haar algorithm) that make it possible to compare images by comparing this calculated signature. The less numerical difference there is between any two image signatures, the more they resemble each other."[1]

A similar feature for MediaWiki (probably as an extension) would greatly benefit Wikimedia Commons by providing the ability to:

identify derivative works (e.g. an original image and a cropped version)

identify similar pictures (e.g. different pictures of the same object, specifying a threshold of similarity)
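The idea of comparing signatures against a similarity threshold can be sketched as follows. This is a simplified illustration, not digiKam's algorithm and not a proposed MediaWiki implementation. It assumes small square grayscale images whose side is a power of two, uses a plain Haar wavelet decomposition as the signature, and measures the difference as a sum of absolute coefficient differences. The function names and the threshold value are hypothetical.

```python
from typing import List, Sequence

Image = List[List[float]]  # grayscale pixels; square, side a power of two


def haar_1d(values: Sequence[float]) -> List[float]:
    """Full 1-D Haar decomposition: repeated pairwise averages and differences."""
    out = list(values)
    n = len(out)
    while n > 1:
        half = n // 2
        averages = [(out[2 * i] + out[2 * i + 1]) / 2 for i in range(half)]
        details = [(out[2 * i] - out[2 * i + 1]) / 2 for i in range(half)]
        out[:n] = averages + details
        n = half
    return out


def haar_signature(image: Image) -> List[float]:
    """Transform every row, then every column, and flatten the coefficients."""
    rows = [haar_1d(row) for row in image]
    cols = [haar_1d(col) for col in zip(*rows)]
    return [coefficient for col in cols for coefficient in col]


def signature_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences; smaller means more alike."""
    return sum(abs(x - y) for x, y in zip(a, b))


def is_similar(a: Image, b: Image, threshold: float) -> bool:
    """Hypothetical check: signatures closer than `threshold` count as similar."""
    return signature_distance(haar_signature(a), haar_signature(b)) <= threshold


# Made-up 4x4 test images: a gradient, a brighter copy of it, and a checkerboard.
base = [[0, 10, 20, 30], [10, 20, 30, 40], [20, 30, 40, 50], [30, 40, 50, 60]]
brighter = [[pixel + 10 for pixel in row] for row in base]
unrelated = [[60, 0, 60, 0], [0, 60, 0, 60], [60, 0, 60, 0], [0, 60, 0, 60]]

d_brighter = signature_distance(haar_signature(base), haar_signature(brighter))
d_unrelated = signature_distance(haar_signature(base), haar_signature(unrelated))
```

In this toy setup the brightened copy ends up much closer to the original than the checkerboard does, so a threshold between the two distances separates them. A practical system would first scale and normalize images, keep only the most significant coefficients, and would need additional handling for crops; none of that is shown here.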