
Time to check the weather forecast for hell, because it appears to have frozen over! We have finally released a new Virtual Machine that contains all of the MusicBrainz server software and fixed all of the currently outstanding bugs (for the VM).

The new VM now uses a 64-bit architecture and has 80GB of disk space, so it should be much easier to get along with. I tried to ship one VM that has the search indexes built in, but after 3 hours (and counting) of trying to export that VM I killed it. If someone has better luck exporting a VM after building search indexes, please let me know. Also, VirtualBox seems to have improved in stability on Mac OS, so we are not going to build a VMware version of the VM at this time.

MusicBrainz has linked to Wikipedia for many years and we now have links to Wikidata as well. Wikidata, however, acts as a central repository for Wikipedia links, so it does not make sense for MusicBrainz to maintain its own separate set of Wikipedia links, especially since Wikipedia URLs are not very stable (because of page moves and deletions) and require a lot of maintenance. Most of our data with Wikipedia links is now also linked to Wikidata, so we plan to start removing Wikipedia links wherever we also have a Wikidata link that points to the same Wikipedia page.

We plan to start removing the links after the schema change this month, starting with the less common languages and entity types. It will take a while to work through the existing links, so we don’t expect to start removing English links from artists until after the Autumn schema change.

We recognise that some people may have code which depends on these links – if you’re using these links and the above sounds problematic, please let us know how you’re using the data (which languages and entity types) and how much time you would need to support Wikidata.
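For anyone wondering what supporting Wikidata might look like in code, here is a minimal sketch in Python. It assumes you already have the Wikidata ID (such as Q42) taken from the Wikidata link on a MusicBrainz entity, and that you ask Wikidata's wbgetentities API for that item's sitelinks. The function names and the sample response are made up for illustration, and building the site key as the language code plus "wiki" is a simplification that will not hold for every Wikipedia edition.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def sitelinks_request_url(wikidata_id):
    """Build a wbgetentities request URL asking for an item's sitelinks with URLs."""
    params = {
        "action": "wbgetentities",
        "ids": wikidata_id,
        "props": "sitelinks/urls",
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)


def wikipedia_url(response, wikidata_id, language="en"):
    """Pick the Wikipedia URL for one language out of a parsed JSON response.

    Returns None when the item has no page in that language.
    Assumption: the site key is the language code followed by "wiki".
    """
    site = language + "wiki"
    entity = response.get("entities", {}).get(wikidata_id, {})
    link = entity.get("sitelinks", {}).get(site)
    return link.get("url") if link else None


# Hand-written sample shaped like a wbgetentities response (illustration only,
# not fetched from the live API).
sample_response = {
    "entities": {
        "Q42": {
            "sitelinks": {
                "enwiki": {
                    "site": "enwiki",
                    "title": "Douglas Adams",
                    "url": "https://en.wikipedia.org/wiki/Douglas_Adams",
                }
            }
        }
    }
}

english_link = wikipedia_url(sample_response, "Q42")
german_link = wikipedia_url(sample_response, "Q42", language="de")
```

In real use you would fetch the URL returned by sitelinks_request_url, parse the JSON body, and pass the result to wikipedia_url. Because the lookup goes through the Wikidata item, a page move on Wikipedia should not require any change on the MusicBrainz side.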

Over the past few weeks I’ve received a number of emails from people who are concerned about some editors who are losing sight of some basic principles behind editing data in MusicBrainz. I wanted to chime in and remind people of some of the principles that should guide how we all get along when we edit data in MusicBrainz.

First and foremost is:

Be polite and give people the benefit of the doubt that they are doing the right thing.

I don’t have to explain being polite. Yes, we all have our bad days — that is a given. But if you’re having a bad day, stop editing MusicBrainz and step away from your computer. Go outside! When you do edit, please be kind to your fellow editors.

Giving people the benefit of the doubt that they are doing the right thing is also important. The vast majority of people who edit MusicBrainz have good intentions and you should assume that to be the case.

Second, edit to make the database better. Vote yes if an edit makes the data better.

This one is a lot more vague, since “better” is a subjective term. We should accept edits that are “good enough” and avoid asking people to make “perfect” edits.

Edits fit into four categories:

Edits that make things better (perfect or not)

Edits that make things different (but neither version is better)

Edits that contain some correct things and some incorrect things

Edits that are outright wrong (existing data is better)

The first type should clearly get a yes vote. For the second, if it doesn’t make things worse, abstain and leave a comment. The third is a judgement call and I would suggest applying this heuristic:

Unless it takes more time to fix the edit than to make a new one, vote yes.

Clearly, the fourth type deserves a no vote.

That brings me to the final topic for now: No votes. A no vote is a very strong expression that has potentially chilling effects that may prevent people from editing again. A no vote should be considered the last resort. Use a no vote if you can’t find another way to resolve an edit.

Finally, some tips for auto editors: If you see an edit that is not perfect, approve it and fix it.

Auto editors are supposed to set the tone for the project and auto editors should practically never vote no on something. You have more powers than fellow editors, so please use your powers for good!

What is AcousticBrainz?
The AcousticBrainz project aims to crowd source acoustic information for all of the music in the world and make it available to the public. The goal of AcousticBrainz is to provide music technology researchers and open source hackers with a massive database of information about music.

AcousticBrainz uses a state-of-the-art research project called Essentia (http://essentia.upf.edu/), developed over the last 10 years at the Music Technology Group.

Data generated from processing audio files with Essentia is collected by the AcousticBrainz project and made available to the public under the CC0 license (public domain). In the 6 weeks since its inception, AcousticBrainz contributors have already submitted data for 650,000 audio tracks using pre-release software.

Today we are releasing client programs to submit data to the AcousticBrainz server and our first public release containing audio features for over 650,000 audio files.

What data does it have?
AcousticBrainz contains information called audio features. This acoustic information describes the acoustic characteristics of music and includes low-level spectral information such as tempo, and additional high-level descriptors for genres, moods, keys, scales and much more. These features are explained in more detail at http://acousticbrainz.org/sample-data

What can I do with it?
We hope that this database will spur the development of new music technology research and allow music hackers to create new and interesting recommendation and music discovery engines. Here are some ideas of things we would like to see:

Music discovery

Playlist generation

Improving the state of the art in genre recognition

Analytics on the musical structure of popular music

and more!

This is one of the largest datasets of this kind available for research, and the only one of this size that we know of which contains both freely available data as well as the reference source code used to compute the data.

How can I contribute?
If you are a music researcher, you can help us by contributing to the essentia project. Go to the essentia homepage to see how you can do this. If you do something cool with the data let us know. We’d like to start a “made with AcousticBrainz” page where we will showcase interesting projects.

If you have any audio files, we would love for you to contribute audio features to our project. You can do this by downloading our submission clients from http://acousticbrainz.org/download. We provide clients for Windows, Mac, and Linux.

There have been no real changes to the setup of the VM images: we still publish a VirtualBox and a VMware image. The instructions for using the VMs haven’t changed; the latest data and the May 26th version of the software are loaded on these images.

Barry Norton has been a star and has created and hosted RDF dumps of the MusicBrainz data and also established a permanent SPARQL endpoint for our data on linkedbrainz.org.

The timing of this is perfect, because our next release will remove the RDFa from our pages. Proper RDF data and a SPARQL endpoint are the best ways to move forward with MusicBrainz data in the context of linked open data.

This gives the MusicBrainz development team the freedom to focus on making MusicBrainz better while leaving the nitty gritty parts of making our data friendly to the linked open data hackers to experts like Barry.