Saturday, 19 December 2009

Automatic matching is difficult, but I've made a number of changes to improve matching in the latest release of Jaikoz.

Jaikoz searches Musicbrainz for possible matches, then rescores them using additional information to find the best match. It does this because the original Musicbrainz score only takes the search terms into account, but we need to consider more values. For example, we do not specify a duration in a search, because some songs have no duration in Musicbrainz and so would never be returned by such a search. Once we have some potential results, though, we want to give a higher score to those whose duration matches the original song. Musicbrainz uses Lucene for searching, with its own custom analyzer for deciding which songs are returned by a search, and this latest release of Jaikoz uses exactly the same analyzer to ensure scoring is compatible. This is one advantage of working on both Musicbrainz and Jaikoz!
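To make the rescoring idea concrete, here is a minimal Python sketch. It is not the Jaikoz code: the `Candidate` class, the 5 second tolerance and the 10 point bonus are all made up for illustration. The point is only that a candidate with no duration is still kept, while one whose duration agrees with the local file is pushed up the list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """A hypothetical search result; field names are illustrative only."""
    title: str
    search_score: float          # score returned by the search itself
    duration_ms: Optional[int]   # None when the database has no duration


def rescore(candidate: Candidate, local_duration_ms: Optional[int],
            tolerance_ms: int = 5000, bonus: float = 10.0) -> float:
    """Add a bonus when durations agree; never penalise a missing duration."""
    if candidate.duration_ms is None or local_duration_ms is None:
        return candidate.search_score
    if abs(candidate.duration_ms - local_duration_ms) <= tolerance_ms:
        return candidate.search_score + bonus
    return candidate.search_score


def best_match(candidates: List[Candidate],
               local_duration_ms: Optional[int]) -> Optional[Candidate]:
    """Return the candidate with the highest rescored value, or None."""
    return max(candidates,
               key=lambda c: rescore(c, local_duration_ms),
               default=None)
```

Because duration is applied only after the search, a recording with no duration can still be returned and scored on its search terms alone.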

When searching for a track we now consider more variations of the name, because songs entered into Musicbrainz are normalized. For example, "We Have Explosive (Pt. 5)" should be entered into Musicbrainz as "We Have Explosive, Part 5", but it might not have been. This normalization is detailed in the Style Guidelines, and in Jaikoz we now check for the title as it appears in your metadata and also, as far as possible, for a normalized version.
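As a rough sketch of what "checking both forms" means, the Python below generates the title as tagged plus one normalized variant. It only handles the part-number case from the example above; the function name and the regular expression are my own illustration, not the rules Jaikoz or the Style Guidelines actually implement.

```python
import re
from typing import List

# Matches a trailing-style part marker such as "(Pt. 5)", "(pt 5)" or "(Part 5)".
_PART_PATTERN = re.compile(r"\s*\((?:pt|part)\.?\s*(\d+)\)", re.IGNORECASE)


def title_variants(title: str) -> List[str]:
    """Return the title as tagged, plus a normalized form if it differs."""
    normalized = _PART_PATTERN.sub(r", Part \1", title)
    variants = [title]
    if normalized != title:
        variants.append(normalized)
    return variants
```

Calling `title_variants("We Have Explosive (Pt. 5)")` gives both the original title and "We Have Explosive, Part 5", so a search can try each in turn.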

We also work around common data entry errors. For example, Musicbrainz Issue #5538 shows that users usually enter song titles as 'No. 1', but in a large minority of cases enter 'No.1'. Jaikoz works around this issue.
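One simple way to cope with the 'No. 1' versus 'No.1' problem is to try both spellings. The Python sketch below is only an assumption about how such a workaround could look (the function name is hypothetical, and Jaikoz may well handle it differently, for instance inside the analyzer).

```python
import re
from typing import Set

# "No." followed by optional whitespace and a digit, e.g. "No.1" or "No. 1".
_NUMBER_PATTERN = re.compile(r"\bNo\.\s*(\d)")


def number_spacing_variants(title: str) -> Set[str]:
    """Return the title with both 'No. 1' and 'No.1' style spacing."""
    spaced = _NUMBER_PATTERN.sub(r"No. \1", title)
    compact = _NUMBER_PATTERN.sub(r"No.\1", title)
    return {title, spaced, compact}
```

A title without a "No." marker comes back unchanged as a single-element set, so the function is safe to apply to every title.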

Cluster Albums finds albums by artists with the same name but a different Release Id and tries to move the songs so that they are all on the same Release Id. Note that this is different to what 'Cluster' means in Musicbrainz Picard, and perhaps I should have called it something different. Previously it did this by matching title against title for each Release Id being used, and picked the Release Id which had the most matches, but this has now been improved. Firstly, we use fuzzy matching on the title, allowing for normalization as explained earlier. Secondly, if all but a couple of tracks are successfully matched to one Release Id, we allow matches on Acoustic Id and song length to shoehorn the remaining tracks into a potential release. This is really useful when the same song exists on two albums but is radically renamed between the two.
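The two steps can be sketched in Python as follows. This is a simplified illustration, not the Jaikoz implementation: the `Track` class, the 0.85 similarity threshold, the limit of two leftover tracks and the 3 second length tolerance are all assumptions, and `difflib.SequenceMatcher` stands in for whatever fuzzy matching Jaikoz really uses. I have also assumed that the fallback needs both the Acoustic Id and the length to agree, and the sketch does not check whether a release track has already been claimed by another song.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional


@dataclass
class Track:
    """Hypothetical track record; field names are illustrative only."""
    title: str
    acoustic_id: Optional[str] = None
    length_s: Optional[int] = None


def titles_similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """Fuzzy, case-insensitive title comparison."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def unmatched_by_title(songs: List[Track], release: List[Track]) -> List[Track]:
    """Songs that have no fuzzy title match on the release."""
    return [s for s in songs
            if not any(titles_similar(s.title, t.title) for t in release)]


def fits_release(songs: List[Track], release: List[Track],
                 max_leftover: int = 2, length_tolerance_s: int = 3) -> bool:
    """True if every song fits the release by title, or (for at most
    max_leftover songs) by Acoustic Id plus a similar length."""
    leftover = unmatched_by_title(songs, release)
    if len(leftover) > max_leftover:
        return False
    for song in leftover:
        fallback = any(
            song.acoustic_id is not None
            and song.acoustic_id == t.acoustic_id
            and song.length_s is not None
            and t.length_s is not None
            and abs(song.length_s - t.length_s) <= length_tolerance_s
            for t in release
        )
        if not fallback:
            return False
    return True
```

The fallback is only tried once nearly everything already matches by title, which keeps a shared Acoustic Id on its own from dragging unrelated songs onto a release.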

Tuesday, 15 December 2009

I went to the yearly Musicbrainz summit in Nuremberg a couple of weeks ago. A good time was had by all, and the old town is certainly a lovely place.

We discussed future developments, and there is plenty of good news for Jaikoz users.

Musicbrainz were very happy with the work I'd done rewriting the existing search system, and I am the main developer of the search for the new NGS release being developed, so I'm able to understand how search works in Musicbrainz and improve it to suit Jaikoz better. This should ensure you get better results from Jaikoz than from any other tagger.

NGS has lots of new features, including recognising the same recording on different albums and handling multi-disc releases better - this will help with getting better matching.

Work is going to be done on creating a genre system from the existing folksonomy cloud, which will help with a field that is currently quite poorly served by Musicbrainz. Knowing the attention to detail of Musicbrainz editors, I'm hopeful that in time Musicbrainz may create the definitive genre list.

Historically Musicbrainz have been very cautious about having APIs that allow data to be added to Musicbrainz. Sensibly, they do not want it to end up a mess like freecddb, but they have warmed to the idea now, and I think this is a necessary move to keep on top of the explosion in music being recorded. So in the New Year I'm going to have a think about taking advantage of this loosening up of data entry.

And Musicbrainz have just hired their first full-time developer, Kuno Woudt, a well-established Musicbrainz editor and developer, so this should speed up Musicbrainz development.