Wednesday, 27 August 2008

I'm extremely impressed. The ISMIR proceedings are online. Whoever wants a printed copy can organize it themselves (it couldn't be much easier). Some might also want to only print the papers they are interested in. And some might be happy to have only an electronic version.

It's always been a pain to drag the heavy ISMIR proceedings home. And it always felt like a huge waste of paper.

I heard Juan and Youngmoo talk about this idea a year ago in Vienna (at last year's ISMIR). I'm very happy to see that they found a solution that should make everyone happy.

Juan writes in his email:We hope that you will like this new approach to printing the proceedings which we intend to be more cost effective, more convenient, and, with luck, more environmentally friendly than mass printing of proceedings for all attendees who may not wish to carry a printed copy around.

Paul (who is already distributing a large chunk of Last.fm tags) and I are planing to include a few slides in our ISMIR tutorial on how to obtain tag data.

Below is some Python code that basically takes an MP3 file as input and outputs a list of Last.fm tags (for both artist and track). The MP3s don't need correct ID3 tags, but they need to be full length (clips won't work).

The Python code uses Norman's command line finger printing client to find the correct artist and track name. The path to the executable needs to be set in the code. Norman supports Win32, OSX Intel, Linux - 32.

The output is written to a file. For each MP3 file passed as argument there are up to two rows in the output file: one for the artist tags, and one for the track tags. Each row has the format: "<mp3filename> <encoded artist or artist/track name> <tag> <score> [<tag> <score> ...]". Tabs are used as delimiters.

Btw, special thanks to Eric Casteleijn for various Python recommendations (lxml etc). (Which reminds me that I still need to fix the other Python code I posted.) As usual any feedback is much appreciated.

[...] "tags are often ambiguous, overly personalised and inexact" [...] "The result is an uncontrolled and chaotic set of tagging terms that do not support searching as effectively as more controlled vocabularies do." [...]

This was published in the D-Lib magazine in early 2006. I wouldn't be surprised if by now the authors realized they were wrong.

But why would anyone ever want to control the vocabulary people use when describing something so extremely multifaceted and something that evolves so fast like the content on the web (delicious), or snapshots of life (flickr), or music (Last.fm)? I guess I'd need to think more like an old-skool librarian to understand that.

Seems like the interesting talk by Miles Osborne on "Using Nutch and Hadoop for Natural Language Processing" is still missing on skills matter's website. I'll update this blog post when the talk is added.

Saturday, 2 August 2008

I've added 5 dissertations to the incomplete list of MIR related PhDs. I'm particularly happy that Markus, Tomoyasu, and Kazuyoshi (all of them are former colleagues of mine) finished their thesis so successfully.