I've made quite a few changes, fixed some bugs, and added some new functionality, which I will outline below. More details will be filled in when I remember them. This is an early release and I'm looking for testing/feedback. There are some caveats and warnings listed below - read them!

Major Changes:
Updated to take advantage of the NGS schema.
Removed the deprecated MusicIP functionality and replaced it with AcoustId.
Added a matching algorithm (Munkres) that is significantly more robust than the default SmartUpdateTracks - taken from beets ( http://beets.radbox.org/ )
Added the ability to read and write custom tags (mbid de facto standard tags, puid tags, acoustid tags, and a limited list of my personal custom tags)

There are surely some things I'm forgetting so the list might change. Originally this had much more functionality but the AMG and AAE parts had to be stripped out because neither website likes their data used in this way.

WARNING:
writetag.exe is a helper application (used only for the custom tags listed below) written in Python that uses the Mutagen and beets/mediafile.py libraries. These libraries do not write ID3v2.3 tags - they write ID3v2.4 tags, which Windows Explorer cannot read. This means that whenever the script writes to its subset of custom tags, the track is converted from ID3v2.3 (Mediamonkey) -> ID3v2.4 (script/writetag.exe) -> ID3v2.3 (with a Mediamonkey re-write). I have not noticed any data loss in this process but I do not use every possible ID3v2.3 tag. Your mileage may vary. The tagger and the fingerprinter should both work with nearly every audio format - though I have not tested writing tags to .wma files.

If you use this script you agree not to hold me responsible for any data loss that may occur. I suggest you test the script on a small subset of files and go slowly while it is still new.

Any tag that MM recognizes is written by MM itself - you can disable writetag if you don't want the Musicbrainz identifiers written to their appropriate "custom" tags. Those of you who have MM set not to update tags when the MM database changes should note that this script ignores that setting - after it finishes writing custom tags, MM will come back and write whatever tags you have in the MM db to the file.

Usage Notes: AcoustId
AcoustId is a new open source fingerprinting framework developed by Lukáš Lalinský, one of the lead developers of MusicBrainz's Picard application. fpcalc.exe is the Win32 binary that generates the fingerprints. If the bundled version doesn't work, check here for the "chromaprint binary" that will work for you. There is detailed information on how it works on his blog/website.

There is one key thing to note: you can help the database grow and become more accurate. This script will automatically submit any unknown metadata to the server if you enter a "user API key" into the settings page. The ability to fingerprint a file does not depend on the user API key, but the submission of unknown tracks to the server *does*. Lukáš has taken pains to provide an anonymous way to make these submissions. If you sign up for an API key using an "OpenID Account" (things like Google, Yahoo, etc. all qualify) then he receives an anonymous hash which is "associated" with your data, but your username etc. are not. This lets him weed out any malicious submissions while offering privacy to people who submit data.

It takes 30 seconds to get an API key. Go to this link and sign in with any OpenID (or a MusicBrainz account - but those are not anonymous). Once signed in, click on "Get API Key" - enter this key into the settings in the script and you are good to go for submission, and the world will be a better place.

AcoustId can be turned off at any time from the settings page. The data is written to SQL tables for use in other applications (scripters can look them up) AND to media tags provided "writetag" is also enabled (which it is by default). By default Acoustid will attempt to fingerprint every selected file. You can "limit" the fingerprinting in the options and it will function like the old script with MusicIP (it will still fingerprint 2 tracks if available to cut down on the number of result albums).
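
For scripters who want to call fpcalc.exe themselves: it prints its result as simple KEY=VALUE lines, including DURATION and FINGERPRINT. The following is only a rough Python sketch of reading that output, not the code this script uses. The function name and the sample text are made up for illustration, and a real fingerprint is a much longer string.

```python
def parse_fpcalc_output(text):
    """Turn fpcalc's KEY=VALUE output lines into a dict."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # ignore blank lines or anything without an '='
            result[key.strip()] = value.strip()
    if "DURATION" in result:
        # Duration is reported in seconds; keep it as a whole number.
        result["DURATION"] = int(float(result["DURATION"]))
    return result


# Made-up sample output for illustration only.
sample = "FILE=track01.mp3\nDURATION=215\nFINGERPRINT=AQAAT0mUaEkSRZEGAA\n"
info = parse_fpcalc_output(sample)
print(info["DURATION"], info["FINGERPRINT"])
```

The duration and fingerprint pair is what gets looked up at AcoustId; the API key only matters for submissions.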

Usage Notes: Writetag
If enabled writetag can save the following information to the following "tags" using the community de facto standard tag mappings:

It does this using a simple XML interface and it will only read/write the list above plus a few others I use having to do with rovi/amg. Because it uses XML to go back and forth there is a certain disk access penalty and the reading/writing is generally slower than it should be. Unfortunately there is no way that I can see to disable the "Close" button in the AutoTag window - but the only way to run writetag.exe and save the extra information is in an event that must fire after you push the AutoTag button. The long and the short of it is this: do not close the AutoTag window until you see that MM no longer has an active process in the status bar - otherwise you will be looking at an error and some tags will not be written. Annoying, but it's the best that can be done with an application that doesn't do this natively.

The format to read is: "writetag.exe read <mediafile>" - it will spit out xml that corresponds to the tags it recognizes. "writetag.exe write <mediafile> <xmlfile>" will write the recognized tags in the xml file to the mediafile if possible. Remember that the resulting file will be in ID3v2.4, so if you are using writetag.exe on your own you might need to find a way to get it back to ID3v2.3 if you want Windows Explorer to show the tags. If you're just using this script, it updates the tags back to ID3v2.3 automatically.
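
If you want to drive writetag.exe from your own Python script, the two command forms above could be wrapped roughly like this. This is a sketch under assumptions: the helper names are hypothetical, writetag.exe is assumed to be on the path, and the layout of the XML it prints is not described here, so the output is returned as plain text.

```python
import subprocess


def writetag_command(action, mediafile, xmlfile=None, exe="writetag.exe"):
    """Build the argument list for the two documented writetag calls."""
    if action == "read":
        return [exe, "read", mediafile]
    if action == "write":
        if xmlfile is None:
            raise ValueError("write needs an xml file")
        return [exe, "write", mediafile, xmlfile]
    raise ValueError("unknown action: %s" % action)


def read_tags_xml(mediafile):
    """Run 'writetag.exe read <mediafile>' and return the XML it prints."""
    done = subprocess.run(writetag_command("read", mediafile),
                          capture_output=True, text=True, check=True)
    return done.stdout


print(writetag_command("write", "song.mp3", "tags.xml"))
```

Passing the arguments as a list avoids quoting problems with file names that contain spaces.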

Usage Notes: Munkres
The default matching in Mediamonkey is terrible. I've shamelessly stolen a better solution from beets that takes into account Title, Artist, Track#, and Duration, and uses a weighting system and a string distance scoring system combined with the Munkres algorithm to find the best possible match given 2 sets of data (set 1 is the MusicBrainz response, set 2 is the list of tracks you highlighted to tag). This is the other executable bundled and it works similarly, with an xml interface. By default Munkres is enabled and it should be relatively low key, except that there is a new value and checkbox in the results window. If the master checkbox is on in the settings page you can toggle the Munkres matching on and off from each result screen to see what happens. If you aren't into numbers you can ignore the score and just pay attention to the "Confidence" value. If you are into numbers you can play around with the weights at the top of the script file to suit your liking.

The Munkres algorithm takes a huge possible map of data and tries to find the best solution. If you search for 20 tracks and get a response from musicbrainz that is 20 tracks, there are 20! (that's 20 factorial) possible ways to pair them up. That is 2,432,902,008,176,640,000 pairings to check by brute force. That is a very large number. The Munkres algorithm cuts that down significantly (to O(n^3), for those of you who care) - BUT THERE IS STILL A COST. 20-30 tracks should be no problem on most machines, but if you select hundreds or thousands of tracks with Munkres on you will be in for some trouble. Anyway: just don't be stupid and select hundreds of tracks and you will be fine.
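
To put numbers on that, here is a small Python sketch comparing the brute-force pairing count with a rough n^3 step count. The function names are made up, and the n^3 figure ignores constant factors, so treat it as an order of magnitude only.

```python
import math


def brute_force_pairings(n):
    """Ways to pair n selected tracks with n release tracks (n factorial)."""
    return math.factorial(n)


def munkres_steps(n):
    """Rough O(n^3) step count for Munkres; constant factors are ignored."""
    return n ** 3


for n in (5, 10, 20):
    print(n, "tracks:", brute_force_pairings(n), "pairings vs roughly",
          munkres_steps(n), "steps")
```

For 20 tracks that is 2,432,902,008,176,640,000 pairings against roughly 8,000 steps. For 300 tracks the n^3 figure is already 27,000,000, which is why selecting hundreds of tracks at once gets slow.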

Usage Notes: General Search
In general the script is best used on albums - and, insofar as possible, complete ones. The Musicbrainz tags are written when you press AutoTag - and these are assumed to be canonical - which means that the next time you search, if there is an embedded tag it will be used as the basis for the search. This is, in my view, very desirable behavior. But it does mean that you want to be sure when you hit AutoTag. The script is "release-centered" and while it can probably find one-off tracks and partially complete albums, you will find that the matching algorithms etc. work less well in those situations. They still work - but not as well.

Apology:
The code is a mess. Commented blocks all over the place, etc. This is an early release and I used someone else's already messy code as a framework so it's far from clean. Nor is it particularly optimized. It has also had a few thousand lines stripped out so I can make it public without being sued by AMG etc. I don't use this version at all so motivation is low, but when I'm bored enough I might clean it up.
---
v1.25
-Various bug fixes
-more options to sort results list
--
v1.23
-Various bug fixes
-Additional string distance algorithm
--
v1.1 - 2012-01-26
-Fixed Title checkbox to properly maintain original title - this makes Munkres matching mandatory - no idea what it will do if you uncheck munkres (it should just skip over any track-specific tagging options unless the titles are equivalent except for case)
-Added Work Relationships (Artists only) - These appear in "Original Lyricist"
-Removed option to place "lyricist" in Original Lyricist tag

Last edited by booblers on Sun Jan 29, 2012 8:16 am, edited 23 times in total.

Question: Where is the script? Where is the settings page?
Answer: Ctrl-L. In the upper right hit "Options." Select "Musicbrainz NGS." When a result comes up you will see a smaller "settings" button in the upper right of the result window. I tried to comment the options well.

Question: Why did you include multiple python .dll and other supporting files? (with both Munkres and Writetag)
Answer: I'm lazy. Also: these are useful to have around and you can re-use these easily in other scripts. But mostly: I'm lazy.

Question: Why is it slow? How can I make it faster?
Answer:
1: First make sure logging is disabled. It should be by default. The script currently generates a *lot* of log messages and this can slow things down significantly.
2: If you want to cut down on the fingerprinting time you can "limit" the fingerprints generated in the settings page. I find the acoustid lookups to be comparatively fast. Once tagged (with either mb_album_id or acoustid_id), lookups will skip this step and be very quick.
3: You can turn off Munkres matching for large requests or if you have a very slow cpu.
4: Turn off writetags and SQL tables.
If you disable: Acoustid, writetags, sql, and munkres what you should be left with is a very speedy MusicBrainz NGS tagger. Less useful in my view, but very quick.

Question: Is there a standalone fingerprinter like the PUID generator?
Answer: No but if people really want one it can be done quite easily based on what you see here.

Question: Why not use Munkres to compare all the releases resulting from a search to pre-select the best one?
Answer: This is entirely possible and is how my private version works. This can be costly in terms of time (depending on the number of releases returned) and might take up to 10-15 seconds when there are quite a few releases to compare. If people don't find it too slow in the current form I will add this functionality to this version. What one wouldn't want to do is allow 100+ returns from MB for comparison through the matching engine.


I'm terribly sorry for being too stupid to understand what this does, but since I'm an avid Musicbrainz user, I simply MUST understand!

Can you explain? It seems as though this will remove the need to pass all new music through Musicbrainz before importing into MM?

Basically, my current procedure is to find and tag all new music with MusicBrainz. I have several MusicBrainz scripts that save albums with multiple discs as one album, remove nonsense words and caps, set The Beatles to Beatles, The, and also get mood, genre etc from Last.fm (something missing in MM) etc. I run my new music through MusicBrainz and have it save to a !Quarantine folder.

then!

I have MM monitor my audio/!Quarantine folder and have a category set up where "path/filename contains '!Quarantine'" and then when the new music is imported, I use MM to find lyrics (lyricator, before it stopped working. Dead in the water atm) and find updated Album Art if desired (Batch Art Finder before it stopped working. Dead in the water atm), I normalize volume, set my grouping tags, then when all is said and done I "auto organize" files to save to audio/grouping/artist/year-album/track#-title.mp3

which removes the music from my "Music in Quarantine collection" as the file no longer has "!quarantine" in the path.

Does your script remove that first step? Does it do something else? Reading through 4 times, I just don't get it, and I'm ashamed to say that.

This is a mediamonkey-style auto tagger. MM has built-in auto-tagger capabilities that can be customized. You can see the list that comes with MM by hitting ctrl-l.

Basically the script will take a set of selected tracks (typically an album) and, by using acoustid fingerprinting, or by guessing from the metadata, it will make a request to Musicbrainz for canonical metadata for that album. It will then use a matching algorithm to check the result and present you with what it thinks is the best way to tag the tracks given the data.

Whether or not it removes your first step depends on what you mean when you say that you "pass all new music through Musicbrainz." Do you mean Picard? Musicbrainz is just a database really with a front end on the website. Picard is the Musicbrainz tagger. If you mean Picard then, most likely, yes, this will function much like Picard does (without having to open a new application). There are some differences but once you get used to them you should find them similar.

Style things like moving "the" to the end are not part of this tagger. I prefer the canonical musicbrainz names. It could be added relatively painlessly by you or someone else. If you track down the musicbrainz sort order field (I believe there is one - let me know) and want me to add that as an option then I will consider it.

Mood and genre information is in my personal version of this and uses last.fm & amg/rovi. I might consider adding last.fm back in since it is public but it would take some work as the three are intertwined.

EDIT: If what you mean by "run through Musicbrainz" is that you use the original version of this script - the one I linked to - then you will find that very little has changed in your work-flow. You will get better results, faster, with this version if you enable Acoustid, Munkres, and Limit Acoustid.

It *is* in principle possible to make a completely automated version of this same script that runs in the background using the Munkres confidence information - (that is what beets is designed to do and some code comes from beets). Maybe I will do that in the future.


Perhaps I should also note that while in its current form this script will not save you the first step - beets will. It's very powerful once you get it set up and it would save you several of the steps that you describe and (depending on your music sources) could completely automate all the steps.

beets (with plugins) + headphones is a very powerful combination... far more powerful than anything you will ever get out of MM. They also happen to be free and open source. headphones has a ui for the acquisition but beets is command line which some people find daunting - but if you're serious about saving time and energy and curating a large collection it can't be beet (lulz).

There are 3 other lines that look similar - change those as well. In my current source they are lines 1385, 1392, 1397, 1402.

There is a question about those 4 lines I have for you. In my version of this script I was using "Titles" to mean something other than what it sounds like. It was actually a toggle for "Per-Track Info based on Title" - which meant to me that if I unchecked Title that I would update the *album* information (things like AlbumArtist, Album Title, etc) but none of the information specific to a track/recording - if that makes sense.

Obviously that is not intuitive to other users and needs to be changed. My questions are these: are you more likely to use a toggle for per-recording data vs album data or individual toggles for each field? And: would you ever use both?

Edit: Actually the fix above will work if you don't toggle Title on and off again - if you do that fix will not work - it will take some other things for me to fix that completely but let me know how you're more likely to use the toggles first.

I always disable base fields for all taggers like Artist, Album, Genre, Date, Art, Comment, Album Artist. I'm only interested in tagging fields like Composer, Conductor, Involved People, Lyricist, Original Title, Original Date, Original Lyricist, BPM, ISRC, Publisher, Mood and Tempo. For these fields a per Album trigger works fine for me.

Do you break out those fields into particular MM fields, or do you like the functionality of the old musicbrainz tagger? In the old version (and this one) there is no separate field for Lyricist, Original Lyricist, etc - they are all lumped together in Involved People etc.

I believe that is a result of you (presumably) unchecking "Title" - which as I described above will remove all track-specific data at the moment. I'm changing that tonight and will double check that the extended information is available.

When it comes to "Original Lyricist" and "Original Artist" there are new options available in Musicbrainz NGS that are not yet in the script. There is a logical object called a "work", which is the composition behind the current recording. It is possible to get artist-credits for the work and save them there. The original script just puts in a copy of the Artist/Writer/etc.

Which of this information is valuable to you? For instance: Work->"writer" could go into the Original Lyricist tag if that is desirable. So too with Work->"title" into "Original Title". What to do with original instrumentation etc. is up in the air. I don't believe there is a way to access "Original Date" in musicbrainz. If you know of one let me know. There might technically be a way to infer this by comparing the date on every recording related to the work, but that could involve a very large number of requests. I'm also not sure what should go in "Original Artist" - unless it's the artist of the first "recording" - which, like the date, is likely going to take a large number of requests to infer.

Your script works (without any modification) for MM4 too (installed in MM4, started with Administrator rights).
And so far: It matches my tracks REALLY well!

I got a result line like:
Munkres Match - Cost: 0 Confidence: HIGH

What exactly do these two values 'Cost' and 'Confidence' mean?
Could you please give us a link to the possible content?

I have entered my own AcoustID in the Settings of your script - what does that help with?
Do you send generated fingerprints back to Lukas with my own AcoustID?

What exactly does the setting 'AcoustID SQL Tables' mean?
Are these tables inside MM4? What is the purpose of these SQL tables?

One more question (not sure if this is the right place for it):
How can I add a column in the preview/to change-window (lower part of search result window) with the content 'Cover exists in track' and 'Cover size'?
This would help a lot to decide, if I need the new cover or not.

And another one:
If I use this search string inside your script: 'Book of Reflections Chapter II: Unfold the Future (2006) (FI)' I get exactly the album I'm looking for. But there is no cover from MusicBrainz. BUT, your fantastic script offers a 'cover art link' to another source.
Question: Could you embed this cover if MusicBrainz/Amazon doesn't show a cover?
This would be one more step that we don't have to do manually.

And one more:
You write about Writetag: "If enabled writetag can save the following information to the following "tags" using the community de facto standard tag mappings". Where can I adjust the tags I want to be written in my tracks? For example, I don't want these tags:
- MUSICBRAINZ ALBUM ARTIST ID
- MUSICBRAINZ ALBUM STATUS
- MUSICBRAINZ TYPE
- MUSICBRAINZ RELEASE GROUP ID
and two others:
- REPLAYGAIN_TRACK_GAIN
- REPLAYGAIN_TRACK_PEAK

MMuser2011 wrote: What exactly do these two values 'Cost' and 'Confidence' mean?
Could you please give us a link to the possible content?

"Cost" is a reflection of the Munkres algorithm, which is designed to find the lowest-cost combination among many possible combinations. In the case of matching albums to canonical metadata, what I (which is to say: the author of beets) did was calculate a "cost" for each track in your library compared to every track in the musicbrainz data. This "cost" is actually a "string distance value" combined with some other information, like the duration of the track in your library compared to the duration at musicbrainz. These values are calculated for every possible pairing of the tracks you selected with the tracks available at musicbrainz. Then the best combination of all of these things (ie: the lowest total "cost") is found and presented to you. You can basically just ignore the "cost" unless you want to dig into the code and see how the algorithm is working (or adjust it).

"Confidence" is based on the cost and reflects pre-set thresholds for the cost that are reasonable. When the script tells you that the confidence is "High" you can be pretty damned sure it's the correct match. "Medium" generally speaking means it's an acceptable match but something weird is going on with your original metadata or the duration of your tracks (maybe they are silence padded, maybe musicbrainz has the wrong track duration data, etc). Medium is generally speaking a pretty good match. "Low" means all bets are off for a variety of reasons. It can't find what it recognizes as a good match with the selected Musicbrainz data. Now - it's possible "Low" is still the correct match and as a human you might be able to recognize it as such - but the script cannot.

I should maybe also note that Munkres will turn itself off, and it will read "Off", if the match is so poor that it has 0 confidence the data is correct. Again - it is technically possible for it to be the correct data but there's no way for the script to tell. It will also turn itself "Off" if you have more tracks selected than musicbrainz returned for the release. So for instance if you selected 20 tracks and the Musicbrainz release you are looking at only shows 15 tracks, the matcher will shut off, believing that you've selected the wrong release. You, of course, might be right and can do what you want, but it isn't going to help you.
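
To make the cost and confidence ideas above a bit more concrete, here is a minimal Python sketch. It is not the beets code or the bundled executable: difflib stands in for the real string distance, the weights and thresholds are made-up numbers, and the best pairing is found by brute force rather than Munkres (same answer, but only practical for a handful of tracks). All names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import permutations

# Made-up weights and thresholds, for illustration only.
TITLE_WEIGHT = 3.0
DURATION_WEIGHT = 1.0
MAX_DURATION_DIFF = 30.0   # seconds; bigger gaps all count as the worst case
HIGH_MAX = 0.10            # average cost per track at or below this -> HIGH
MEDIUM_MAX = 0.30          # at or below this -> MEDIUM
OFF_MIN = 0.60             # at or above this the matcher gives up


def pair_cost(local, remote):
    """Cost between 0 and 1 of matching one local track to one release track."""
    similarity = SequenceMatcher(None, local["title"].lower(),
                                 remote["title"].lower()).ratio()
    title_dist = 1.0 - similarity
    gap = abs(local["secs"] - remote["secs"])
    duration_dist = min(gap, MAX_DURATION_DIFF) / MAX_DURATION_DIFF
    weighted = TITLE_WEIGHT * title_dist + DURATION_WEIGHT * duration_dist
    return weighted / (TITLE_WEIGHT + DURATION_WEIGHT)


def match(local_tracks, release_tracks):
    """Return (confidence, average cost, mapping) for one release."""
    if not local_tracks or len(local_tracks) > len(release_tracks):
        # More tracks selected than the release has: the matcher switches off.
        return "Off", None, []
    best_cost, best_map = None, None
    for perm in permutations(range(len(release_tracks)), len(local_tracks)):
        cost = sum(pair_cost(local_tracks[i], release_tracks[j])
                   for i, j in enumerate(perm))
        if best_cost is None or cost < best_cost:
            best_cost, best_map = cost, list(perm)
    average = best_cost / len(local_tracks)
    if average >= OFF_MIN:
        label = "Off"
    elif average <= HIGH_MAX:
        label = "HIGH"
    elif average <= MEDIUM_MAX:
        label = "MEDIUM"
    else:
        label = "LOW"
    return label, average, best_map


mine = [{"title": "help", "secs": 139}, {"title": "Yesterday", "secs": 125}]
release = [{"title": "Yesterday", "secs": 125}, {"title": "Help!", "secs": 139}]
label, average, mapping = match(mine, release)
print(label, mapping)
```

In this toy example the two selected tracks are listed in the opposite order to the release, and the mapping pairs each one with the right release track anyway, which is the whole point of matching on cost rather than on position.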

MMuser2011 wrote: I have entered my own AcoustID in the Settings of your script - what does that help with?
Do you send generated fingerprints back to Lukas with my own AcoustID?

Yes - Lukáš will appreciate that, and I thank you on his behalf. What it does is submit any unknown *metadata* to the AcoustID servers. So let's say you generate a fingerprint and get an acoustid back from the server, but the server, when it sent you the acoustid, said that it doesn't know *what* the song is (title, artist, musicbrainz id, etc). What putting in the API key will do is send whatever data you have about the track to AcoustId for possible addition to the metadata database there. That aspect is all handled by his servers, which use their own matching algorithms to match unknown metadata to musicbrainz etc. Note that this means that you can get an AcoustId ID from the server but get 0 metadata back. If that happens the script moves on to various ways of searching Musicbrainz itself without help from acoustid (but it does save the acoustid, since later it will be linked to musicbrainz on Lukáš' side).

MMuser2011 wrote: What exactly does the setting 'AcoustID SQL Tables' mean?
Are these tables inside MM4? What is the purpose of these SQL tables?

There are 2 tables added to the internal MM database. 1 is holding the fingerprint/duration information for acoustid. These are large unique strings generated by fpcalc.exe. The other table is holding the actual AcoustId IDs. This information is written to the files themselves with writetag.exe if you leave that on. The tables exist so that certain other scripts have the option to interact with the data without opening up every media file you own to see if they exist. For instance I have a vague plan to make an auto-acoustid-tagger available that will be able to check the internal database to see if a file has already been tagged. Over time the fingerprint table might grow very large. One has the option to turn off additions and checks to this table as a result. You might see minimal performance loss with it on. If you start to experience slowdowns you can try turning it off. There's a more technical answer to your question but I'll leave it there for now.
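
For scripters, the idea of the two tables can be sketched with Python's sqlite3 module (MM's database is SQLite). The table and column names below are invented for the example and are not the names the script actually uses, and an in-memory database stands in for the real MM database file.

```python
import sqlite3

# In-memory stand-in for the MM database; all names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE demo_fingerprints (
    song_id INTEGER PRIMARY KEY,
    duration INTEGER,
    fingerprint TEXT
);
CREATE TABLE demo_acoustids (
    song_id INTEGER PRIMARY KEY,
    acoustid TEXT
);
""")


def already_tagged(db, song_id):
    """True if an AcoustId ID is stored for this song, without opening the file."""
    row = db.execute(
        "SELECT acoustid FROM demo_acoustids WHERE song_id = ?", (song_id,)
    ).fetchone()
    return row is not None


# Made-up values for illustration only.
conn.execute("INSERT INTO demo_fingerprints VALUES (?, ?, ?)",
             (42, 215, "AQAAT0mUaEkSRZEGAA"))
conn.execute("INSERT INTO demo_acoustids VALUES (?, ?)",
             (42, "made-up-acoustid"))
print(already_tagged(conn, 42), already_tagged(conn, 7))
```

A lookup like this is what lets another script skip files that have already been fingerprinted instead of reading the tags of every media file.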

MMuser2011 wrote: One more question (not sure if this is the right place for it):
How can I add a column in the preview/to change-window (lower part of search result window) with the content 'Cover exists in track' and 'Cover size'?
This would help a lot to decide, if I need the new cover or not.

This is, as far as I know, one of the huge number of flaws with MM. There is no way that I know of to add those fields down there. Maybe Lowlander knows a way. Or maybe it's in MM4, I don't know. What might be possible is for me to add the current image into the results below the suggested image. This may or may not be simple depending on how MM stores the images and how it makes them available to us. I will look into it.

MMuser2011 wrote: And another one:
If I use this search string inside your script: 'Book of Reflections Chapter II: Unfold the Future (2006) (FI)' I get exactly the album I'm looking for. But there is no cover from MusicBrainz. BUT, your fantastic script offers a 'cover art link' to another source.
Question: Could you embed this cover if MusicBrainz/Amazon doesn't show a cover?
This would be one more step that we don't have to do manually.

Musicbrainz stores user-created links to various sites related to a release. The "cover art link" is user-input and is not necessarily a direct link to cover art (as it is not in your example). The short answer to your question is: no. Such a thing might be possible in principle but it would take more work than it's worth to parse that page for the image you're looking for. That said - there are options for cover art that will exist soon. There's a private option I use called AlbumArtExchange. Sadly they frown on this type of use and so I can't include that (in fact it's not simple to make the script work with AAE - it has to impersonate all sorts of things in sketchy ways to work with AAE). However there is a new option on the scene that could be pretty good very soon. You can read about it here: http://wiki.musicbrainz.org/Cover_Art_Archive

I will look into adding options for that in the future.


MMuser2011 wrote: And one more:
You write about Writetag: "If enabled writetag can save the following information to the following "tags" using the community de facto standard tag mappings". Where can I adjust the tags I want to be written in my tracks? For example, I don't want these tags:
- MUSICBRAINZ ALBUM ARTIST ID
- MUSICBRAINZ ALBUM STATUS
- MUSICBRAINZ TYPE
- MUSICBRAINZ RELEASE GROUP ID
and two others:
- REPLAYGAIN_TRACK_GAIN
- REPLAYGAIN_TRACK_PEAK

Well... writetag is currently an all or nothing affair. It doesn't, in principle, have to be, and maybe I will add options to the UI to turn on and off what it writes, but I wouldn't hope for it too soon. This script is mostly just a port of my private version and I like those tags in mine.

Just to be clear - writetag is only involved in writing tags that MM cannot write itself (because the developers are... myopic... to put it nicely).

This means that writetag isn't doing anything with REPLAYGAIN_TRACK_GAIN or anything else that I didn't list in the writetag section. That is all MM and you have to deal with them on that one.

As for the musicbrainz stuff: that is all information that you may not be currently using but that shouldn't cause any problems. Like I said, when I get bored enough I might make options for each tag in the UI, but I'll have to be pretty bored since that's boring work.