
Import from FreeDB

The process of importing a release from FreeDB has been rewritten.
The improvements include:

Previously it was easy to miss the “CD has multiple artists” tick box, and hence get the import wrong. Now the user is quite obviously asked to make a clear choice (with no default) as to whether a single artist or multiple artists are involved.

All artists can be searched for, using the same technique as is used for entering a “change track artist” moderation.

All fields have a “Guess Case” button; also, for single artist releases, there is a “Guess All” button. “Smart Quotes” are fixed (replaced by apostrophes) by default.

At any stage it’s easy to go back and change your earlier answers, without losing any later answers.

Also, releases imported from FreeDB (both automatically and manually) should
now have their track lengths set.

Other Changes

There is now an “Edit Artist Alias” moderation type, so you don’t have to
delete and re-add an alias to change it. See the “Edit” links on
/showaliases.html.

Previously, the links for adding a note to a moderation required Javascript to
work. This has now been fixed.

For “Add non-album track” moderations, the (meaningless) track number is
no longer displayed.

A couple of security flaws relating to moderation notes have been fixed.

There is now a script to fix “smart quotes” in bulk (activated by the
server administrators).

The nightly database dump archives (mbdump*.tar.bz2) now each include a
TIMESTAMP file. This will make it easier to identify exactly when each dump
was taken, and to check that a set of dump files belong together.
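
As a rough illustration of how the TIMESTAMP file could be used to check that a set of dump files belong together, here is a minimal Python sketch. It assumes each archive holds a plain-text member named TIMESTAMP; the exact location and contents of that file are not described here, so treat those details (and the helper names) as assumptions. The make_demo_archive helper exists only to build an in-memory archive for demonstration.

```python
import io
import tarfile


def read_timestamp(archive, member_name="TIMESTAMP"):
    """Return the stripped text of the TIMESTAMP member of one archive.

    `archive` is a path or a binary file object. The member name is an
    assumption; any directory prefix inside the archive is ignored.
    """
    if isinstance(archive, str):
        tar = tarfile.open(archive, mode="r:*")
    else:
        tar = tarfile.open(fileobj=archive, mode="r:*")
    with tar:
        for member in tar:
            if member.isfile() and member.name.split("/")[-1] == member_name:
                return tar.extractfile(member).read().decode("utf-8").strip()
    raise KeyError(f"no {member_name} file in archive")


def dumps_belong_together(archives):
    """True if every archive carries identical TIMESTAMP contents."""
    return len({read_timestamp(a) for a in archives}) == 1


def make_demo_archive(stamp):
    """Demonstration only: build an in-memory .tar.bz2 with a TIMESTAMP."""
    buf = io.BytesIO()
    data = stamp.encode("utf-8")
    with tarfile.open(fileobj=buf, mode="w:bz2") as tar:
        info = tarfile.TarInfo("TIMESTAMP")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf
```

With real files you would pass the archive paths, e.g. dumps_belong_together(["mbdump.tar.bz2", ...]) with whichever other mbdump archives you downloaded.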

Artist Subscriptions

On each artist page (showartist.html or artistinfo.html) you’ll see a
“Subscribe” link; use this link to subscribe to this artist. To see your
list of subscriptions, click the “Subscriptions” link on your profile page
(moderator.html). From there you can unsubscribe from any artist you no longer
want in your list.

Once a day, the system will look for any moderations added for your subscribed
artists, and e-mail you a summary list.

The list tells you how many open and applied moderations have been added to
each of your subscribed artists. Artists with no new moderations are not
included. If there are no new moderations for your artists, no e-mail is
sent.

You must be a registered user, and have a validated e-mail address, to use
this feature. Please note that, even though it’s called “Subscriptions”,
it’s free 🙂 Finally, please be assured that, like your saved preferences,
your list of subscribed artists is not made publicly available.

New preferences: mods per page; show “add album” inline. Filter the VA album browser by release type / status.

Change Log for mb_server

Added email verification support: When a returning user logs in, the user will be prompted to verify or blank out their email address. New users can specify an email address when creating their account. The moderator profile now supports changing the email and sending out confirmation emails.

Moderators can now send mail via the ‘send email’ links in the profiles page or the moderation page.

Moderators also get mailed when a new note is attached to one of their moderations.

The support for looking up various artists albums in the FileLookup interface has been improved, and should make the tagger behave a little better. However, the tracks for VA albums are not yet returned via the web service; that is scheduled for the next update, since libtunepimp needs it.

Added an explanation of automoderators and which moderations qualify for automoderation to the moderation intro. This page now also lists which moderators are automoderators.

There is now a “List Failed Moderations” page (for you, or any other moderator). (Patch by Duncan Findlay)

As well as showing mods by artist, you can also show them by album. This functionality is quite limited though; it’s primarily intended to find the original “Add Album” mod and any “Edit Album Name” mods, and it *may* also find other mods (e.g. move album, edit album attributes, remove album). It *won’t* find mods for each of the tracks on the album (e.g. edit track name).

The voting page now performs stronger validation, including checking that the user is logged in.

Previously if you did a “move disc id” moderation, then the album meta-information (including the number of disc IDs each album has) was not updated – thus, the disc ID icon would remain on the old album, but the disc ID itself would move to the new one. This is now fixed.

The “clientversion” table has now been moved into the “core” dump file (mbdump.tar.bz2).

These changes were also made recently:

A preference was added to control how many moderations are shown on each voting page. The allowed range is 1-25; the default is 10.

Another preference was added to enable “Add Album” moderations to show the whole album within the voting page.

The “browse various artists” page now allows you to filter by release type and release status (e.g. you can browse only soundtracks).
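
To make the “mods per page” preference concrete, here is a small Python sketch of reading such a setting. The range (1-25) and default (10) come from the note above; whether the server clamps or rejects out-of-range values is not stated, so clamping is an assumption here, as is the function name.

```python
def mods_per_page(raw, default=10, lowest=1, highest=25):
    """Parse a stored preference value into a usable page size.

    Non-numeric input falls back to the default; numeric input outside
    the allowed range is clamped (an assumption, see the text above).
    """
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return default
    return max(lowest, min(highest, value))
```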

Reworked the moderation system behind the scenes. List moderations by moderator. Much improved “artist filter” system.

Change Log for mb_server

The bulk of the work this time was a series of changes which are actually invisible to end-users of the system:

Each of the 27 moderation types has been re-implemented using a dedicated “handler” module, conforming to a single moderation handler API.

Moderations are inserted using a more Perl-like “named parameters” style of calling. Instead of passing cryptic parameter combinations off to a POST to /bare/enter_*mod.html, each web page for editing the data (e.g. “editartist.html”, “remdiscid.html”) now calls the “insert moderation” API itself.
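
The real handlers are Perl modules, and their API is not reproduced here. Purely to illustrate the shape of the idea (one handler per moderation type, all conforming to one interface, inserted via named parameters), here is a Python sketch. Every name in it (HANDLERS, pre_insert, insert_moderation, the "EDIT_ARTIST_NAME" type) is hypothetical.

```python
# Hypothetical names throughout; this only illustrates the pattern.
HANDLERS = {}


def handler(mod_type):
    """Class decorator registering one handler per moderation type."""
    def register(cls):
        HANDLERS[mod_type] = cls
        return cls
    return register


class ModerationHandler:
    """The single interface that every handler conforms to."""

    def pre_insert(self, **params):
        """Validate the named parameters; return (old_value, new_value)."""
        raise NotImplementedError


@handler("EDIT_ARTIST_NAME")
class EditArtistName(ModerationHandler):
    def pre_insert(self, *, artist_name, new_name):
        if not new_name.strip():
            raise ValueError("new_name must not be empty")
        return artist_name, new_name.strip()


def insert_moderation(mod_type, moderator, **params):
    """Entry point that an editing page would call directly."""
    old, new = HANDLERS[mod_type]().pre_insert(**params)
    return {"type": mod_type, "moderator": moderator,
            "old": old, "new": new, "status": "open"}
```

A page would then call something like insert_moderation("EDIT_ARTIST_NAME", moderator=1, artist_name="beatles", new_name="The Beatles"). Adding a new moderation type means adding one more registered class, with no change to the entry point.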

The main effects of those changes are:

It will now be much easier to add new types of moderations, which in turn means that it will be much easier to extend MusicBrainz to handle new types of data.

It will also be much easier to adjust the behaviour of the existing moderation types, e.g. to change what is shown as “old” and “new” values for a given mod type, perhaps to make voting easier.

Although all of those changes should be mostly invisible to the end-user, a number of other changes relating to the moderation system were also introduced:

It’s now possible to see a list of moderations entered by any given moderator. Previously it was possible to see “My Mods”; now “My Mods” is just a special case of “View Mods by Moderator”. To see a moderator’s list of moderations, go to the moderator’s profile page (/moderator.html) and follow the “List Moderations” link.

The “artist filter” system used in some moderations (e.g. move album, merge artist) is now much improved. Instead of entering the artist sortname first, then perhaps having to enter the artist name on the next page (where you can’t edit the sortname), you can now just search for the artist however you like, then enter both the artist name and sortname together if required. You can also “search again”. Anyway, try it and see. It’s much nicer.

The “change track name and artist” page (changetrack.html) is improved. Bug #736075 has been fixed. You’re now given buttons to flick between the current track/artist names and the guessed (split) names, in case it guessed wrong. Also, the order of track/artist for this album (i.e. does this album go “track – artist” or “artist – track”?) is remembered, so you should have to click “Swap” at most once per album.

It’s easier to enter “non-album tracks” than before; there’s also a link for entering a non-album track for the artist when you do a tag lookup.

The “moderation pending” flag is now correctly set in a few places where it never was before (e.g. for “merge albums”, all the albums in question are marked as “mod pending”).

When viewing moderations, the “old” and “new” values have changed in minor ways for some moderation types. For example, the tagger version is shown for “Add TRMs” mods. Bug #738587 has been fixed.

Previously, certain “edit” moderations were only auto-moderated if the only change was upper/lower case. Now, more things can be auto-moderated, including changing the amount of space (e.g. changing two spaces to one), and changing types of quotes (e.g. single quotes (apostrophes), double quotes, and so-called “smart quotes”).
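
One way to picture the broader auto-moderation rule is as a comparison of normalised strings: if the old and new values differ only in case, amount of whitespace, or type of quote, the edit counts as trivial. The Python sketch below assumes exactly that; the set of quote characters treated as equivalent and the function names are assumptions, not the server's actual rules.

```python
import re

# Assumed set of quote characters that are treated as interchangeable.
_QUOTE_TABLE = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # single "smart quotes"
    "\u201c": "'", "\u201d": "'",   # double "smart quotes"
    '"': "'", "`": "'",
})


def normalise(text):
    """Fold quotes, collapse runs of whitespace, and ignore case."""
    text = text.translate(_QUOTE_TABLE)
    text = re.sub(r"\s+", " ", text).strip()
    return text.lower()


def is_trivial_edit(old, new):
    """True if the edit changes only case, spacing or quote style."""
    return old != new and normalise(old) == normalise(new)
```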

WEB SITE

Search Facility

The search facility has been rewritten. Previously, searching was quite slow, especially for tracks, and especially if your query contained several (say, five or more) words. For example, if you did a track search for “The house of the rising sun”, that could easily have taken ten minutes to run – but your web browser would time out long before that, of course.

Accents, apostrophes and other punctuation are now handled much better. You can search for numbers too.

You can now deliberately search for a word more than once within a query – for example, an artist search for “The The” will only match artists containing at least two “The”s.
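
The repeated-word rule can be sketched as a comparison of word counts: a candidate matches only if it contains each query word at least as many times as the query does. This is a simplified Python illustration, not the search engine's implementation; the tokenising rule is an assumption, and accent handling is left out.

```python
import re
from collections import Counter


def word_counts(text):
    """Count lower-cased words; digits count as words too."""
    return Counter(re.findall(r"\w+", text.lower()))


def matches_all_words(query, candidate):
    """True if candidate has every query word, as often as the query."""
    need = word_counts(query)
    have = word_counts(candidate)
    return all(have[word] >= count for word, count in need.items())
```

So a query of “The The” matches a name containing two “The”s, but not “The Beatles”, which contains only one.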

The ranking of search results is now much better, and is based on a
“similarity” algorithm. Try a track search for “love to hate you” and you’ll see what I mean.

If your search takes longer than 30 seconds, your query will be aborted and the web server will return a “Your search has been cancelled” message. Hence you get told what’s going on, and the server doesn’t get bogged down running a query which you’re not going to see the results of anyway.

Finally, support for an “any of these words” search has been removed, at least for now. You can now only search for “all of these words” (and because that’s the only option, that field has been removed from the search form).

Moderation Enhancements

The “artist filter” pages now hide certain artists where appropriate – for example, if you’re merging artist A into something, then artist A won’t be shown as a possible “target” to merge into.

The “change track” page (changetrack.html) now uses the right default values in the form, and includes a “use guesses” as well as a “use current” button.

When entering album data (via /cdi/enter.html etc), some extra checks are made:

If all (or most) of the track names seem to start with the track number, then you are warned about this, and given the option to automatically fix the problem, go back and manually fix it, or continue regardless.

If you’re entering a single-artist album, but it looks (according to the track names) like you’re entering a multiple-artists album – or vice versa – then you are warned about the possible discrepancy, and given the choice between going back and fixing it, or continuing regardless.
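
The track-number check described above could look roughly like the following Python sketch. The pattern for what counts as a leading track number, and the 75% threshold standing in for “all (or most)”, are assumptions chosen for illustration.

```python
import re

# Leading number, optional punctuation, then the rest of the name.
_PREFIX = re.compile(r"^\s*0*(\d+)\s*[-.:)]*\s+(.*\S)\s*$")


def split_numbered(name, position):
    """Return the name without its prefix if the prefix equals position."""
    match = _PREFIX.match(name)
    if match and int(match.group(1)) == position:
        return match.group(2)
    return None


def looks_numbered(track_names, threshold=0.75):
    """True if most names start with their own track number."""
    if not track_names:
        return False
    hits = sum(split_numbered(name, pos) is not None
               for pos, name in enumerate(track_names, 1))
    return hits / len(track_names) >= threshold


def strip_numbers(track_names):
    """The automatic fix: drop the prefix wherever it matches."""
    return [split_numbered(name, pos) or name
            for pos, name in enumerate(track_names, 1)]
```

Comparing the number against the track's position keeps a title such as “99 Luftballons” from being treated as numbered unless it really is track 99.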

More “moderation suggestion” reports: possibly duplicate artists, albums which need converting to multiple artists, and tracks named with their own sequence (track) numbers. There’s also a pair of reports of TRMs with many tracks, and tracks with many TRMs.

Other Miscellaneous Changes

Many pages now perform much better validation of their inputs, e.g. data provided on the “query string”. For example, numbers must be numeric, etc. For “artist id” inputs, we reject the id of the “deleted artist” on most occasions.

Many web pages have had minor usability enhancements, e.g. moving the input focus using Javascript (preferences permitting). Some of the visual layout has been reworked, the aim being a clearer, simpler appearance.

For the first time, many of the pages are now (finally!) valid HTML 4.0! There have been many minor HTML and CSS improvements.

There is now an artistinfo.html page (like showtrack.html and
albumdetail.html) which shows the database internal ID and the MusicBrainz UUID/GID values.

The various download links now redirect you via a “pick a mirror” page. Your preferred mirror is remembered via a cookie, if possible.

The “bio” page (bio.html) has been expanded to include many more people.

Some links which used to only work with Javascript enabled now also work without it.

BACK END COMPONENTS

We’ve moved to Postgres 7.3! (specifically, 7.3.2 at the moment). This affects a few things, such as the “create” SQL scripts.

Support has been added for having the database on a remote host. Did we mention that we’ve got a new, dedicated database box now? 🙂

Database Export / Import

The export / import facility has been re-written. Well, the export bit has anyway. The principal change, and a really important one, is that the nightly database dumps that appear on our download sites are now a consistent point-in-time snapshot. (Previously the generated dumps could be, and sometimes were, inconsistent, and therefore could not be re-imported.)

As part of that rewrite, a couple of other changes came about too:

The export format is now Postgres “copy” format, which is basically a tab-separated text format. Since this is much easier to manipulate than the old Postgres “dump” format we had before, it’s now much easier to munge the exported data into non-Postgres formats. Hence, because the exported data is now much more database-neutral, there is no longer any explicit support for dumping in MySQL or ExcelCSV format.

The handling of the “moderator_sanitised” pseudo-table is now much cleaner; it’s all done via a simple temporary table, with no need for the “fill_moderator” function we used to have.

Also, the “SetSequences” script forgot to update the “wordlist” sequence – this is now fixed.
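
To show why the tab-separated “copy” format mentioned above is easy to munge, here is a Python sketch that converts it to CSV. It follows the COPY text format as documented for current PostgreSQL (tab delimiter, \N for NULL, backslash escapes); details in 7.3 may differ, octal and hex escapes are not handled, and the function names are made up for this example.

```python
import csv
import io

# Single-character backslash escapes used by the COPY text format.
_ESCAPES = {"b": "\b", "f": "\f", "n": "\n", "r": "\r",
            "t": "\t", "v": "\v", "\\": "\\"}


def unescape(field):
    """Decode one field; the NULL marker \\N becomes None."""
    if field == "\\N":
        return None
    out = []
    chars = iter(field)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "")
            out.append(_ESCAPES.get(nxt, nxt))
        else:
            out.append(ch)
    return "".join(out)


def copy_to_csv(copy_text):
    """Convert COPY-format text to CSV; NULLs become empty fields."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for line in copy_text.split("\n"):
        if line == "\\.":      # end-of-data marker
            break
        if line == "":
            continue           # simplification: skip blank lines
        writer.writerow(["" if value is None else value
                         for value in map(unescape, line.split("\t"))])
    return buf.getvalue()
```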

When loading data using “InitDB.pl --createdb --import”, foreign keys are now added /after/ the data has been loaded. This should make for a faster import, and it certainly makes it a whole load easier to diagnose consistency problems if you’re loading from an inconsistent dump for some reason.

Finally, the script to rebuild the search engine metadata (build_words.pl) is now *much* faster – something of the order of 40 times faster according to my tests. It does have the very minor drawback that you can now only use it on an “offline” system, because the metadata will be unusable during the rebuild.

Other Miscellaneous Changes

Deferred Updates

Two types of update are now done using the “DeferredUpdate” module – which simply means that some data describing the update to be done later is appended to a text log file, and something else will come along later and actually do the update. The updates in question are TRM lookup count and artist alias usage count.

Clearly this means that the main request now goes significantly faster. There are two caveats to this, however:

The process to subsequently apply the updates in the log file isn’t yet written.

The other day we managed to lose most of the log file (we’re not sure why this happened exactly), so that’s about two weeks’ worth of TRM lookup counts / artist alias usage counts lost. Well, it’s only usage data.
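
For readers unfamiliar with the pattern, here is a minimal Python sketch of deferred updates: the request appends one line to a log, and a later job totals the lines up. The log format, the function names and the collecting step are all hypothetical; in particular, this is not the (as yet unwritten) process that will apply the real log.

```python
import os
import tempfile
from collections import Counter


def defer_update(log_path, kind, key):
    """Append one pending update; cheap enough to do during a request."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"{kind}\t{key}\n")


def collect_updates(log_path):
    """Later, offline: total up how often each (kind, key) was logged."""
    totals = Counter()
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            kind, key = line.rstrip("\n").split("\t", 1)
            totals[(kind, key)] += 1
    return totals


def demo():
    """Log three updates to a temporary file and total them."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "deferred.log")
        defer_update(path, "trm_lookup", "trm-1")
        defer_update(path, "trm_lookup", "trm-1")
        defer_update(path, "alias_used", "42")
        return collect_updates(path)
```

Batching this way turns many small database writes into one increment per key, at the cost that a lost log file means lost counts.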

Other changes

$artist->LoadByName has had a simple but effective speed-up added. The old behaviour was just to do a case-insensitive search straight away, but that doesn’t use an index; so now we try a case-sensitive search on some obvious variations (e.g. all lower case, all upper case, Title Case etc) – which does use the index – before resorting to a full table scan.
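
The same idea can be sketched in Python, using an in-memory SQLite database as a stand-in for Postgres. The table layout, index name and function names are assumptions for the example; only the strategy (exact matches on obvious variations first, case-insensitive scan last) comes from the description above.

```python
import sqlite3


def name_variations(name):
    """Obvious spellings to try with an exact, index-friendly match."""
    seen, out = set(), []
    for variant in (name, name.lower(), name.upper(), name.title()):
        if variant not in seen:
            seen.add(variant)
            out.append(variant)
    return out


def load_by_name(conn, name):
    """Return (id, name) or None; indexed lookups first, scan last."""
    for variant in name_variations(name):
        row = conn.execute(
            "SELECT id, name FROM artist WHERE name = ?", (variant,)
        ).fetchone()
        if row:
            return row
    # Fallback: case-insensitive comparison, which cannot use the
    # plain index on name.
    return conn.execute(
        "SELECT id, name FROM artist WHERE lower(name) = lower(?)", (name,)
    ).fetchone()


def demo_connection():
    """Demonstration only: a tiny artist table with an index on name."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE INDEX artist_name ON artist (name)")
    conn.executemany("INSERT INTO artist (name) VALUES (?)",
                     [("The Beatles",), ("dEUS",), ("AC/DC",)])
    return conn
```

A name stored with unusual capitalisation, such as “dEUS”, still has to go through the slow path when searched for as “deus”; the common cases do not.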

RDFDump.pl used to be /really/ verbose. It’s now only that verbose if standard input is a terminal.

Moves have been made towards using separate HTTP vhosts for web pages / RDF requests.

MISCELLANY

On Saturday, 29th March 2003 the new database server (hostname ‘bender’)
was put into production. This server was purchased with your donations
– thanks! It’s great. We’re very happy about this 🙂

At about 1155hrs UTC, the system was put into “read-only” mode (which
means that logins were disabled, and logged-in users were logged out).
Then the database was exported to text files, transferred to bender,
then imported into a fresh database. At about 1220hrs UTC, the web
server was configured to start using the newly-created database, and
the old database was shut down.

We’ve tried to make sure that no data could have been written to the
system during the upgrade process – such data would be lost when the
final switchover occurred. Please let us know if you think we missed
anything.

Once again, thanks for your donations, which made this possible! We hope
to be able to bring you a faster, more reliable, easier-to-use service
as a result.