The concept of similarity appears in many areas of computer science and
other disciplines. One of its most visible applications is
similarity-based data retrieval. This search paradigm has become
especially important because of the current data explosion, which is
evident in two dimensions: the volume of data produced today is
increasing rapidly, and the diversity of data types keeps growing. The
talk addresses the problem of efficiently managing very large datasets
on the basis of similarity. We model data and similarity using metric
spaces, a highly general and extensible concept. One of the main
results of our long-term work in this area is a general-purpose,
large-scale distributed system for similarity indexing and searching.
We have a fully functional prototype implementation of this system,
and the talk will include several demonstrations.
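To illustrate why the metric space model suits indexing, the following Python sketch shows the basic pruning principle that metric access methods rely on: the triangle inequality gives the lower bound |d(q,p) - d(o,p)| <= d(q,o), so an index can discard an object o without computing its distance to the query q. This is a minimal sketch under stated assumptions, not the system described in the talk: Euclidean distance stands in for an arbitrary metric, a single pivot p replaces the partitioning used by M-Tree or M-Chord, and the names `build_pivot_table` and `range_query` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Metric = Callable[[Vector, Vector], float]


def euclidean(a: Vector, b: Vector) -> float:
    """L2 distance, used here only as one example of a metric."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_pivot_table(data: List[Vector], pivot: Vector, dist: Metric) -> List[float]:
    """Precompute d(o, p) for every object o against one (hypothetical) pivot p."""
    return [dist(o, pivot) for o in data]


def range_query(data: List[Vector], pivot: Vector, pivot_dists: List[float],
                query: Vector, radius: float,
                dist: Metric) -> Tuple[List[Tuple[int, float]], int]:
    """Return all (index, distance) pairs within `radius` of `query`,
    plus the number of query-object distances actually computed."""
    d_qp = dist(query, pivot)
    result: List[Tuple[int, float]] = []
    computed = 0
    for i, (obj, d_op) in enumerate(zip(data, pivot_dists)):
        # Triangle inequality: |d(q,p) - d(o,p)| is a lower bound on d(q,o).
        if abs(d_qp - d_op) > radius:
            continue  # o cannot be in the answer; skip the expensive distance
        computed += 1
        d = dist(query, obj)
        if d <= radius:
            result.append((i, d))
    return result, computed


# Toy usage with made-up 2-D points.
data = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (10.0, 0.0), (0.5, 0.5)]
pivot = (0.0, 0.0)
table = build_pivot_table(data, pivot, euclidean)
hits, n_computed = range_query(data, pivot, table, (1.0, 1.0), 1.0, euclidean)
print(hits, n_computed)
```

For the query (1.0, 1.0) with radius 1.0, objects 0, 2 and 3 are discarded by the lower bound alone, so only two of the five query distances are computed. Practical metric indexes use many pivots or tree-based partitioning, but the pruning rule they apply is the same.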
The rough outline of the talk is as follows:
- motivation of the research, current state of the art,
- fundamental principles of similarity search based on the metric space
model,
- peer-to-peer networks and their use in the design of distributed data
structures,
- description of our specific approach: M-Chord, M-Tree, and the system
architecture,
- content-based search in digital images: MPEG-7 descriptors and
their combinations,
- demonstration of a prototype system for similarity search in a
collection of 50 million digital images downloaded from Flickr (a
photo-sharing service),
- a face-recognition demo.