Sample retrieval based on similarity

Hello! I’ve got Max/MSP and Max for Live 9. I want to build a standalone or plugin that lets the user select/import a sample and then has the software detect specific qualities of that sample (BPM, amplitude, noise, etc.). The software would then select similar samples from a directory of hundreds of samples; that is, it would retrieve samples that are somewhat similar to the selected one.

Not sure if it would be easier to have the software scan all the samples ahead of time and then just perform the weighting of the qualities based on the metadata extracted from the scans (that is, the software would scan once, build a database or an XML file, and work from that).
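The "scan ahead of time" idea can be sketched in a few lines: walk the sample folder once, extract a couple of cheap descriptors per file, and cache them in SQLite. Everything below is an assumption for illustration; the table name (`sounds`) and the two descriptors (duration, RMS amplitude) are placeholders for whatever analysis ends up being used (MPEG-7, MFCCs, ...), and the WAV reader assumes 16-bit PCM.

```python
# Hypothetical offline indexer: scan a folder of WAVs once and cache
# per-file descriptors in a SQLite database for later similarity queries.
import os
import sqlite3
import struct
import wave

def analyze_wav(path):
    """Return (duration_seconds, rms_amplitude) for a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as w:
        n = w.getnframes()
        rate = w.getframerate()
        raw = w.readframes(n)
    # Assumes 2-byte samples; stereo frames are simply pooled together.
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    rms = (sum(s * s for s in samples) / max(len(samples), 1)) ** 0.5
    return n / float(rate), rms

def build_index(sample_dir, db_path):
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS sounds "
                "(path TEXT PRIMARY KEY, duration REAL, rms REAL)")
    for root, _dirs, files in os.walk(sample_dir):
        for name in files:
            if name.lower().endswith(".wav"):
                path = os.path.join(root, name)
                dur, rms = analyze_wav(path)
                con.execute("INSERT OR REPLACE INTO sounds VALUES (?, ?, ?)",
                            (path, dur, rms))
    con.commit()
    con.close()
```

Because the scan runs once and only the small database is queried afterwards, the weighting/comparison step stays fast no matter how slow the analysis itself is.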

More precisely: I tried to use the mpeg7 encoder mxj for complete analysis but stumbled upon some strangeness/bugs and couldn’t sort it out yet. I’m also very interested in this, so if someone knows other ways to get in-depth, non-realtime, analyzed meta descriptions of audio files, please post. O.

Making sample picking more intelligent: having thousands of samples on your disk indexed in a database, with a special browser to sort samples into predefined or custom categories, sort by ANY attribute or similarity, or compare with a given sample. Such a sound browser would have been a nice surprise in Live 9. But they still want you to type "bd" into the search field to maybe find some bass drum samples on your hard drive ;)

Anyways, I’ve been thinking about something like this, too. I just think I have too much data to go through and analyze it like this – even in a much sped up environment it’d still take ages. [ninja-edit: huh… this mpeg7 thing might be the ticket, actually yeah]

Good information, everyone! That echonest Max Object looks like it could do the trick!

MPEG-7 is a possibility too. However, query by humming sounds like it would be best suited to musical samples (which could be handy). I’m trying to locate found sounds and field recordings, or at the very least to analyze them. Echonest is new to me, so this is certainly a good place to start.

I think Wetterberg’s onto something – using MPEG7 for searching music sample databases. That could be very useful…

I’ve just been doing a similar thing the ‘old fashioned’ way – in Max – for a current project. More specifically, by modifying one of the few Max-based SQLite database examples/tutorials (MovieBase, by Andrew Benson), then expanding on that and learning just enough SQL(ite) on the way. With this setup it is now possible to build compound searches for multiple ‘keywords’/categories etc. and quickly recall matches for playback or whatever, and it works quite well. In my case, the sounds (largely ‘non-musical’) have been tagged and categorised manually using my own arbitrary set of (non-analytical) perceptual keywords, but it would of course be possible to at least partly automate the identification of sounds by analysis, so perhaps MPEG-7 could be useful there… hmmm…

So let’s take some steps together. I did some more testing with the mpeg7 mxj; some descriptors do not affect the SQL output, but I think they are saved to the XML.
I can get a maximum of 15 values out of the SQL output. I know nothing about the quality of these values, or whether that is enough to create a good fingerprint of the sound for comparison. O.


Not much time for this right now, but here are some random queries/observations:

I sniffed about briefly but didn’t find a description of the mp7 descriptors and/or how they are derived – Any pointers/links? Maybe some descriptors will be much more useful than others?

With regard to additional fields and tags for the database that may be useful, a lot will depend on what you want to achieve and how you intend to seek them, but for text based fields there is a (very) extensive listing/breakdown in this document on the Soundminer site: http://www.soundminer.com/current/MI_Whitepaper.pdf

If you do go down the language-based descriptor path: in order to avoid dealing with lots of distinct fields, I used a single keyword/tag field in my .db and then made use of the SQLite LIKE operator, which makes it possible to search for multiple keywords in one field…
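The single-tag-field approach above can be sketched as follows. This is a minimal illustration, not the poster's actual database: the table, file names, and tags are invented, and each keyword is ANDed together as a separate LIKE clause against the one tag column.

```python
# Compound keyword search against a single tag field using SQLite LIKE.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sounds (path TEXT, tags TEXT)")
con.executemany("INSERT INTO sounds VALUES (?, ?)", [
    ("field/rain_roof.wav",   "rain metallic steady"),
    ("field/rain_leaves.wav", "rain organic soft"),
    ("found/door_slam.wav",   "impact wood short"),
])

def search(con, *keywords):
    """Return paths whose tag field contains every given keyword."""
    where = " AND ".join(["tags LIKE ?"] * len(keywords))
    params = ["%" + k + "%" for k in keywords]
    return [r[0] for r in
            con.execute("SELECT path FROM sounds WHERE " + where, params)]

print(search(con, "rain", "soft"))   # -> ['field/rain_leaves.wav']
```

One caveat of the LIKE approach: `%rain%` also matches tags like "grainy", so for strict matching the keywords would need delimiters or a separate tag table.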

I also downloaded some recommended PDFs, but did not find clear definitions of the single descriptors. I’m making good progress: I imported 20,000 files (samples < 10 seconds). It took about 2 hours, and the database file is not even 5 MB. Just by ordering the columns (most values are floats), I can now get a picture of what each descriptor does by listening to the top entries ordered ASC/DESC.

LIKE can only be used with character strings. I have rows of floats and a name column. I keep thinking about how similarity between rows could be calculated,
or how to find the closest match in a column to begin with.

row 1: aa 1. 2. 2.
row 2: bb 5. 2. 6.
row 3: cc 2. 3. 3.
row 4: aa 2. 5. 3.

I have something like this, and row 3 is a closer match to row 1 than row 2, even though row 2 has one exact match. And row 4 is maybe a closer match to row 1 than row 3, because the character field has a higher priority in the similarity calculation.
Does that require advanced SQL?
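One possible answer to the "advanced SQL" question, sketched under assumed table and column names that mirror the four example rows: plain SQL can already do this. Rank rows by the sum of per-column absolute differences, and give the text column extra weight via an exact-match bonus. The bonus value (5.0) is an arbitrary choice to demonstrate the priority idea.

```python
# Rank rows by similarity to a reference row using only basic SQL:
# sum of ABS() differences per float column, minus a bonus when the
# name column matches exactly, so the text field dominates the ranking.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sounds (name TEXT, a REAL, b REAL, c REAL)")
con.executemany("INSERT INTO sounds VALUES (?, ?, ?, ?)", [
    ("aa", 1.0, 2.0, 2.0),   # row 1 (the reference)
    ("bb", 5.0, 2.0, 6.0),   # row 2
    ("cc", 2.0, 3.0, 3.0),   # row 3
    ("aa", 2.0, 5.0, 3.0),   # row 4
])

query = """
SELECT name, a, b, c,
       ABS(a - ?) + ABS(b - ?) + ABS(c - ?)
       - CASE WHEN name = ? THEN 5.0 ELSE 0.0 END AS dist
FROM sounds
ORDER BY dist ASC
"""
for row in con.execute(query, (1.0, 2.0, 2.0, "aa")):
    print(row)
```

With the reference (aa, 1., 2., 2.) this ranks row 1 first, then row 4 (name bonus), then row 3, then row 2, matching the ordering described above.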
Another big gap on the way to a useful patch is the missing drag-and-drop ability from Max for Live into Live’s slots, to make use of the results instantly. On Windows I can call up an Explorer window with the selected file in focus, but that’s a poor consolation. O.
Edit: it seems like this PDF contains some in-depth info, at least for the mathematical side of things: ftp://sumin.in.ua/books/DVD-021/Kim_H.,_Moreau_N.,_Sikora_T._MPEG-7_Audio_and_Beyond%5Bc%5D_Audio_Content_Indexing_and_Retrieval_%282005%29%28en%29%28285s%29.pdf

After more research, I can now sort by similarity (distance) to an input float with this:
exec "SELECT * FROM sounds ORDER BY ABS(spectralCentroid - 50.) LIMIT 100"
This, for example, gives the 100 closest matches to a spectral centroid of 50 Hz. I can now also expand this to sort complete rows by similarity to input values.
It’s exciting for me, because this sound indexing thing was one of my first visions after discovering Max 4 or 5 years ago :) O.
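Expanding the one-column ORDER BY ABS() trick to complete rows raises a scaling issue: raw ABS() sums let large-range descriptors (e.g. spectral centroid in Hz) drown out small-range ones (e.g. RMS between 0 and 1). A hedged refinement, with invented column names and values, is to normalize each term by the column's observed range before summing:

```python
# Multi-column similarity with per-column range normalization, so each
# descriptor contributes comparably to the total distance.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sounds (name TEXT, spectralCentroid REAL, rms REAL)")
con.executemany("INSERT INTO sounds VALUES (?, ?, ?)", [
    ("kick.wav",  60.0,   0.9),
    ("snare.wav", 900.0,  0.7),
    ("hat.wav",   6000.0, 0.2),
])

query = """
SELECT name,
       ABS(spectralCentroid - :sc)
         / (SELECT MAX(spectralCentroid) - MIN(spectralCentroid) FROM sounds)
     + ABS(rms - :rms)
         / (SELECT MAX(rms) - MIN(rms) FROM sounds) AS dist
FROM sounds
ORDER BY dist ASC
LIMIT 100
"""
for row in con.execute(query, {"sc": 50.0, "rms": 0.8}):
    print(row)
```

Weights per descriptor could be added by multiplying each normalized term by a constant, which would give a tunable version of the "priority" idea from the earlier posts.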

And what I found now is even more exciting (and should make this thread more complete for people using the forum search):
there’s Alexander J. Harker’s descriptors~ objects, realtime and buffer versions, for Mac and Windows (beta). The entire external package is just great, with awesome help patches.
Thank you, Alexander. http://alexanderjharker.co.uk/
Edit: the Windows version is not public yet, but he was kind enough to give access after I contacted him :) O.

Nice work 11olsen – looks like you’ve solved a lot of your challenges, and thanks to your persistence with this thread there are also more links to useful resources now (i.e. that PDF and the AJHarker externals – the latter of which I had on my system already!!). As for the Live browser issue you mentioned earlier, perhaps the Live 9 API will enable scripting of the ‘enhanced’ browser to list your search results(?). In any case, I definitely see the potential of this stuff for my purposes in the future…

Didn’t know that. A really useful feature extraction set. The "mean MFCC" alone, for example, seems to output a good "fingerprint" value for the sound. The Max integration you’re talking about is only a way to evaluate the output description files of MEAP; it’s not enough to integrate this into an automated process scanning a number of files. I wish all of that would live in an mxj or C external instead of this ##### graphical interface. But it seems not far away; it’s open source, with commented source files. O.
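Once each file's analysis is boiled down to a fixed-length vector (e.g. a mean MFCC per file, as mentioned above), similarity search is just nearest-neighbour ranking over those vectors. A stdlib-only sketch using cosine similarity; the three-element vectors and file names here are invented stand-ins for real MFCC output.

```python
# Rank a library of per-file feature vectors by cosine similarity
# to a query vector (best match first).
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def most_similar(query, library):
    """library: dict of name -> feature vector. Returns names, best first."""
    return sorted(library, key=lambda n: cosine(query, library[n]),
                  reverse=True)

library = {
    "rain_a.wav": [0.9, 0.1, 0.3],
    "rain_b.wav": [0.8, 0.2, 0.35],
    "impact.wav": [0.1, 0.9, 0.05],
}
print(most_similar([0.85, 0.15, 0.3], library))
# -> ['rain_a.wav', 'rain_b.wav', 'impact.wav']
```

The same ranking could of course be pushed into the SQLite layer discussed earlier by storing one column per vector component.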