
The application processes a song and takes 100 samples per second, so a complete song yields around 15000 samples. These sample values are stored in the database, one row per sample, as {HASHKEY, NOTE_ID, TIMEOFFSET}. The fingerprint of a complete song may therefore occupy around 15000 rows in the fp_core table. I am planning to store the fingerprints of 50000 songs, so the fp_core table will hold around 750 million rows.
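The sizing above is easy to sanity-check, and it shows why the table needs an index on HASHKEY. Below is a minimal sketch that uses Python's stdlib `sqlite3` as a stand-in, because the question does not name the RDBMS. Several details are assumptions: the ~150-second song length (the length implied by 100 samples/s giving ~15000 samples), INTEGER column types, and the index name `ix_fp_core_hash`.

```python
import sqlite3

# Back-of-envelope sizing from the figures in the question.
SAMPLES_PER_SECOND = 100
SONG_SECONDS = 150      # assumption: ~2.5-minute song -> ~15000 samples
SONGS = 50_000

rows_per_song = SAMPLES_PER_SECOND * SONG_SECONDS   # 15000
total_rows = rows_per_song * SONGS                   # 750 million


def create_schema(conn: sqlite3.Connection) -> None:
    # One row per sample: {HASHKEY, NOTE_ID, TIMEOFFSET}.
    # Column types are assumed; the question does not give them.
    conn.execute(
        "CREATE TABLE fp_core ("
        " hashkey INTEGER NOT NULL,"
        " note_id INTEGER NOT NULL,"
        " timeoffset INTEGER NOT NULL)"
    )
    # Every lookup is by hashkey. Putting all three columns in the index
    # lets a hashkey probe be answered from the index alone (covering index).
    conn.execute(
        "CREATE INDEX ix_fp_core_hash ON fp_core (hashkey, note_id, timeoffset)"
    )


conn = sqlite3.connect(":memory:")
create_schema(conn)
print(f"{rows_per_song} rows per song, {total_rows:,} rows in total")
# prints: 15000 rows per song, 750,000,000 rows in total
```

At 750 million rows, whether the hashkey lookup is an index seek or a table scan dominates query time. That is why the index is declared up front here.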

I have another application that processes recordings and detects the songs played in them. The process is: create a set of HASHKEYs from the recording audio, the same way as when creating the fingerprint of an original song. A recording generates around 20000-30000 HASHKEYs. The application then retrieves the rows from the fp_core table for all HASHKEYs that match those generated from the recording.
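Independent of any database, the retrieval step is an inverted-index lookup: for each distinct recording hashkey, fetch every stored row with that hashkey. The sketch below shows only that logic, on hypothetical toy data. A real system would push the lookup into the database rather than hold 750 million rows in memory.

```python
from collections import defaultdict

# Hypothetical toy fingerprint rows: (hashkey, note_id, timeoffset).
fp_rows = [
    (101, 1, 0), (202, 1, 10), (303, 1, 20),
    (101, 2, 5), (404, 2, 15),
]

# Inverted index: hashkey -> every (note_id, timeoffset) stored under it.
index = defaultdict(list)
for hashkey, note_id, offset in fp_rows:
    index[hashkey].append((note_id, offset))


def lookup(recording_hashes):
    """Return every fingerprint row whose hashkey occurs in the recording."""
    matches = []
    for h in set(recording_hashes):          # deduplicate recording hashkeys
        for note_id, offset in index.get(h, []):
            matches.append((h, note_id, offset))
    return sorted(matches)


print(lookup([101, 303, 999]))
# prints: [(101, 1, 0), (101, 2, 5), (303, 1, 20)]
```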

To retrieve the data from the fp_core table while processing a recording, I first insert all of the recording's HASHKEYs into another table, defined as:
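The question's actual staging-table definition is not shown, so the table `recording_hash` and the function `match_via_staging` below are hypothetical names. This minimal sketch, with SQLite standing in for the real RDBMS, shows the general pattern: load the recording's hashkeys into a temporary table, then join it to fp_core instead of sending tens of thousands of literal values in one statement.

```python
import sqlite3


def match_via_staging(conn, recording_hashes):
    # Hypothetical staging table; the PRIMARY KEY also deduplicates keys.
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS recording_hash (hashkey INTEGER PRIMARY KEY)"
    )
    conn.execute("DELETE FROM recording_hash")   # clear the previous recording
    conn.executemany(
        "INSERT OR IGNORE INTO recording_hash (hashkey) VALUES (?)",
        ((h,) for h in recording_hashes),
    )
    # A join lets the engine probe fp_core's hashkey index once per staged key.
    return conn.execute(
        "SELECT f.hashkey, f.note_id, f.timeoffset "
        "FROM recording_hash r JOIN fp_core f ON f.hashkey = r.hashkey "
        "ORDER BY f.hashkey, f.note_id, f.timeoffset"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fp_core (hashkey INTEGER, note_id INTEGER, timeoffset INTEGER)")
conn.execute("CREATE INDEX ix_fp_core_hash ON fp_core (hashkey)")
conn.executemany("INSERT INTO fp_core VALUES (?, ?, ?)",
                 [(101, 1, 0), (202, 1, 10), (101, 2, 5)])

rows = match_via_staging(conn, [101, 101, 999])
print(rows)  # prints: [(101, 1, 0), (101, 2, 5)]
```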

@billinkc do you mean to use WHERE with an IN clause?
– UDPLover, Sep 15 '13 at 18:01
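A single `WHERE hashkey IN (...)` with 20000-30000 values can exceed the bound-parameter limits of many engines and drivers. For example, SQL Server allows at most 2100 parameters per request, and older SQLite builds default to 999. The usual workaround is to chunk the key list, as in this sketch. It is written against SQLite; the chunk size of 900 and the function name `match_via_in` are illustrative choices, not anything from the thread.

```python
import sqlite3

CHUNK = 900  # stays under the 999-parameter default of older SQLite builds


def match_via_in(conn, recording_hashes):
    """Look up fp_core rows with parameterized IN lists, one chunk at a time."""
    keys = sorted(set(recording_hashes))
    rows = []
    for start in range(0, len(keys), CHUNK):
        chunk = keys[start:start + CHUNK]
        placeholders = ",".join("?" * len(chunk))   # e.g. "?,?,?"
        rows.extend(conn.execute(
            "SELECT hashkey, note_id, timeoffset FROM fp_core "
            f"WHERE hashkey IN ({placeholders})",
            chunk,
        ).fetchall())
    return sorted(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fp_core (hashkey INTEGER, note_id INTEGER, timeoffset INTEGER)")
# Toy data: only even hashkeys are stored.
conn.executemany("INSERT INTO fp_core VALUES (?, ?, ?)",
                 [(h, h % 7, h) for h in range(0, 5000, 2)])

rows = match_via_in(conn, range(2000))   # 2000 keys -> 3 chunks
print(len(rows))  # prints: 1000
```

Compared with the staging-table join, this issues one query per chunk. That is simple, but it means many round trips when a recording has 30000 hashkeys.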

So you want to compare the 20-30k hashkeys of a sample with the 15k hashkeys of a song and, if there are no (or only a few) matches, discard that song. If there are many matches, the sample is identified as this song (and you repeat this against every group of 15k until you identify the sample).
– ypercube, Sep 15 '13 at 18:05
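The comparison described in that comment can be done in one grouped query rather than song by song. This sketch makes several assumptions: a hypothetical `song_id` column that identifies the song each fp_core row belongs to (the question's NOTE_ID may or may not mean that), a hypothetical threshold `MIN_MATCHES`, and SQLite as a stand-in engine.

```python
import sqlite3

MIN_MATCHES = 3  # hypothetical threshold; would need tuning on real data

conn = sqlite3.connect(":memory:")
# song_id is a hypothetical column naming the song each row belongs to.
conn.execute("CREATE TABLE fp_core (hashkey INTEGER, song_id INTEGER, timeoffset INTEGER)")
conn.execute("CREATE TEMP TABLE recording_hash (hashkey INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO fp_core VALUES (?, ?, ?)", [
    (1, 10, 0), (2, 10, 1), (3, 10, 2), (4, 10, 3),   # song 10
    (1, 20, 0), (9, 20, 1),                            # song 20 shares one hashkey
])
conn.executemany("INSERT INTO recording_hash VALUES (?)", [(1,), (2,), (3,), (7,)])


def candidate_songs(conn, min_matches):
    """Count matching hashkeys per song; keep songs with enough hits."""
    return conn.execute(
        "SELECT f.song_id, COUNT(*) AS hits "
        "FROM recording_hash r JOIN fp_core f ON f.hashkey = r.hashkey "
        "GROUP BY f.song_id HAVING COUNT(*) >= ? "
        "ORDER BY hits DESC, f.song_id",
        (min_matches,),
    ).fetchall()


print(candidate_songs(conn, MIN_MATCHES))  # prints: [(10, 3)]
```

This counts raw hits only and ignores TIMEOFFSET. Landmark-style matchers commonly also require the matched offsets to line up consistently, which this sketch does not attempt.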

You mention that the hashkeys can be different for the same song, but it sounds like you have some mechanism for detecting similar songs. If possible, add an additional column called 'songid' to your fp_core table and then query on this single value using a WHERE clause.
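A minimal sketch of that suggestion, again using SQLite as a stand-in. It adds the `songid` column, back-fills it, indexes it, and then filters on that single value. The back-fill mapping `note_to_song`, the index name, and the function `fingerprint_for_song` are hypothetical. In practice the mapping would come from whatever similar-song detection the asker already has.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fp_core (hashkey INTEGER, note_id INTEGER, timeoffset INTEGER)")
conn.executemany("INSERT INTO fp_core VALUES (?, ?, ?)",
                 [(1, 100, 0), (2, 100, 1), (1, 200, 0)])

# The suggested extra column; existing rows hold NULL until back-filled.
conn.execute("ALTER TABLE fp_core ADD COLUMN songid INTEGER")

# Hypothetical back-fill: this dict stands in for the existing mechanism
# that decides which song each NOTE_ID belongs to.
note_to_song = {100: 1, 200: 2}
conn.executemany("UPDATE fp_core SET songid = ? WHERE note_id = ?",
                 [(song, note) for note, song in note_to_song.items()])

# Index so that WHERE songid = ? can avoid a full table scan.
conn.execute("CREATE INDEX ix_fp_core_song_hash ON fp_core (songid, hashkey)")


def fingerprint_for_song(conn, songid):
    """All (hashkey, timeoffset) rows for one song, via a single-value WHERE."""
    return conn.execute(
        "SELECT hashkey, timeoffset FROM fp_core WHERE songid = ? "
        "ORDER BY timeoffset, hashkey",
        (songid,),
    ).fetchall()


print(fingerprint_for_song(conn, 1))  # prints: [(1, 0), (2, 1)]
```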