1 Near-Neighbor Search
Applications
Matrix Formulation
Minhashing

2 Example Application: Face Recognition
◆ We have a database of (say) 1 million face images.
◆ We want to find the most similar images in the database.
◆ Represent faces by (relatively) invariant values, e.g., ratio of nose width to eye width.

3 Face Recognition – (2)
◆ Each image represented by a large number (say 1000) of numerical features.
◆ Problem: given a face, find those in the DB that are close in at least ¾ (say) of the features.

4 Face Recognition – (3)
◆ Many-one problem: given a new face, see if it is close to any of the 1 million old faces.
◆ Many-many problem: which pairs of the 1 million faces are similar.

5 Simple Solution
◆ Represent each face by a vector of 1000 values and score the comparisons.
◆ Sort-of OK for many-one problem.
◆ Out of the question for the many-many problem (10^6 × 10^6 × 1000/2 numerical comparisons).
◆ We can do better!

6 Multidimensional Indexes Don’t Work
[Figure: Dimension 1 divided into ranges 0-4, 5-9, 10-14, . . . ; new face: [6,14,…]]
◆ Surely we’d better look here.
◆ Maybe look here too, in case of a slight error.
◆ But the first dimension could be one of those that is not close. So we’d better look everywhere!

7 Another Problem: Entity Resolution
◆ Two sets of 1 million name-address-phone records.
◆ Some pairs, one from each set, represent the same person.
◆ Errors of many kinds:
  ◗ Typos, missing middle initial, area-code changes, St./Street, Bob/Robert, etc., etc.

8 Entity Resolution – (2)
◆ Choose a scoring system for how close names are.
  ◗ Deduct so much for edit distance > 0; so much for missing middle initial, etc.
◆ Similarly score differences in addresses, phone numbers.
◆ Sufficiently high total score -> records represent the same entity.

9 Simple Solution
◆ Compare each pair of records, one from each set.
◆ Score the pair.
◆ Call them the same if the score is sufficiently high.
◆ Unfeasible for 1 million records.
◆ We can do better!
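
The simple solution for entity resolution (compare every pair, score it, keep the high scorers) can be sketched as follows. This is only an illustration, not the slides' own scoring system: the starting score, the per-edit penalty, the threshold, and the function names are all hypothetical, and records are assumed to be (name, address, phone) tuples of strings.

```python
from itertools import product

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]

# Hypothetical scoring constants; the slides only say "deduct so much".
START_SCORE = 100
PENALTY_PER_EDIT = 5
THRESHOLD = 80

def score(r1, r2):
    """Start from a perfect score and deduct for every edit in every field."""
    edits = sum(edit_distance(x, y) for x, y in zip(r1, r2))
    return START_SCORE - PENALTY_PER_EDIT * edits

def match_all_pairs(set_a, set_b):
    """Brute force: score every pair, one record from each set."""
    return [(r1, r2) for r1, r2 in product(set_a, set_b)
            if score(r1, r2) >= THRESHOLD]
```

The loop in `match_all_pairs` scores len(set_a) × len(set_b) pairs, which for two sets of 1 million records is 10^12 scored pairs. That is why the slide rules this approach out.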

10 Example: Similar Customers
◆ Common pattern: looking for sets with a relatively large intersection.
◆ Represent a customer, e.g., of Netflix, by the set of movies they rented.
◆ Similar customers have a relatively large fraction of their choices in common.

11 Example: Similar Products
◆ Dual view of product-customer relationship....
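
Returning to the Similar Customers slide: "a relatively large fraction of their choices in common" can be made concrete in several ways. The sketch below assumes Jaccard similarity (intersection size divided by union size), which the slides shown here do not name. The function names, the threshold, and the sample data are hypothetical.

```python
from itertools import combinations

def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)

def similar_customers(rentals: dict, threshold: float):
    """Return customer pairs whose rented-movie sets overlap enough.

    rentals maps a customer name to the set of movies they rented.
    This still compares every pair, so it only illustrates the measure,
    not an efficient search.
    """
    names = sorted(rentals)
    return [(x, y) for x, y in combinations(names, 2)
            if jaccard(rentals[x], rentals[y]) >= threshold]

# Made-up data for illustration only.
rentals = {
    "ann": {"A", "B", "C", "D"},
    "bob": {"A", "B", "C", "E"},
    "cat": {"X", "Y"},
}
pairs = similar_customers(rentals, threshold=0.5)
```

Here "ann" and "bob" share 3 of 5 distinct movies, so their similarity is 0.6 and they are the only pair returned at a threshold of 0.5.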