Entity Linking for the Music Domain

ELMD: Entity Linking for the Music Domain Dataset

ELMD is a corpus of named entities from the music domain, annotated in a collection of about 13k Last.fm artist biographies. Entities are linked to DBpedia through ELVIS, a voting system among several state-of-the-art Entity Linking systems, with a precision of at least 0.94. In addition, setting a higher confidence threshold yields a subset of ELMD that favors higher precision at the expense of recall.
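The exact ELVIS voting scheme is not described here. The following is a minimal sketch, assuming a simple agreement ratio: each system proposes at most one DBpedia URI per mention, and a link is kept only when the fraction of systems agreeing on the most common URI reaches a threshold. The function name `vote` and the input format are hypothetical, not ELVIS's actual implementation.

```python
from collections import Counter
from typing import List, Optional


def vote(candidates: List[Optional[str]], threshold: float) -> Optional[str]:
    """Keep the most-proposed URI if enough systems agree on it, else return None.

    `candidates` has one entry per EL system; None means that system
    produced no link for the mention (hypothetical input format).
    """
    proposed = [c for c in candidates if c is not None]
    if not proposed:
        return None
    # most_common(1) returns the URI with the highest vote count
    uri, count = Counter(proposed).most_common(1)[0]
    # Agreement is measured over all systems, including those that abstained
    agreement = count / len(candidates)
    return uri if agreement >= threshold else None


systems = [
    "http://dbpedia.org/resource/Radiohead",
    "http://dbpedia.org/resource/Radiohead",
    None,
]
print(vote(systems, 0.5))   # 2/3 of systems agree -> the Radiohead URI
print(vote(systems, 0.75))  # 2/3 < 0.75 -> None
```

Raising `threshold` keeps fewer links, but only those with broader agreement, which mirrors the precision/recall trade-off described above.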

ELMD 2.0

Over the past months we have reviewed and extended ELMD as follows:

Most entities are now also linked to MusicBrainz (the mapping was retrieved through the Last.fm API).

More annotations have been added by propagating existing annotations throughout the document in which they were found, under the one-sense-per-discourse assumption (a given surface form refers to the same entity throughout a biography).

New output formats have been added: NIF and GATE.
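The one-sense-per-discourse propagation can be sketched as follows. This is a minimal illustration, not the procedure actually used to build ELMD 2.0: the function name `propagate` is hypothetical, `endChar` is assumed to be exclusive, and when two annotations share a surface form, the one processed first wins.

```python
import re
from typing import Dict, List, Tuple


def propagate(text: str, annotations: List[Dict]) -> List[Dict]:
    """Copy each annotation's link to unannotated repeats of the same surface form.

    Assumes one sense per discourse and that endChar is exclusive
    (Python slice convention); both are assumptions of this sketch.
    """
    result = list(annotations)
    taken: List[Tuple[int, int]] = [(a["startChar"], a["endChar"]) for a in annotations]

    def overlaps(start: int, end: int) -> bool:
        return any(start < e and s < end for s, e in taken)

    for ann in annotations:
        surface = text[ann["startChar"]:ann["endChar"]]
        # Lookarounds stop matches inside longer words, e.g. "Muse" in "Museum"
        pattern = r"(?<!\w)" + re.escape(surface) + r"(?!\w)"
        for m in re.finditer(pattern, text):
            if not overlaps(m.start(), m.end()):
                # Same link and category, new offsets
                result.append(dict(ann, startChar=m.start(), endChar=m.end()))
                taken.append((m.start(), m.end()))
    return sorted(result, key=lambda a: a["startChar"])


text = "Muse formed in Teignmouth. Later, Muse played at the Museum."
ann = {"startChar": 0, "endChar": 4, "category": "Artist",
       "uri": "http://dbpedia.org/resource/Muse_(band)"}
out = propagate(text, [ann])
for a in out:
    print(a["startChar"], text[a["startChar"]:a["endChar"]])  # 0 Muse, then 34 Muse
```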

We provide updated statistics on the new dataset: the number of annotations and unique entities by category, and the percentage of annotations and unique entities successfully linked to each reference KB. Note that every entity is classified into one of the four categories (Artist, Album, Track, Label) and linked to Last.fm; from there, it may be linked to both DBpedia and MusicBrainz, to only one of them, or to neither.

Annotations and unique entities by category:

Category   Annotations   Entities
All            144,593     63,902
Artist         112,524     39,131
Album           18,701     15,064
Track            9,203      7,832
Label            4,165      1,875

Percentage of annotations and unique entities linked to each reference KB:

Linked to   Annotations   Entities
DBpedia           58.6%      49.1%
MusicBrainz       93.6%      91.1%
Both              57.2%        47%
None                 5%       9.2%
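The two columns differ because annotation-level figures count every mention, while entity-level figures count each distinct entity once. A minimal sketch of both computations, assuming each annotation is a dict carrying the `uri`, `mbid` and `lastfm_url` fields of the JSON format and that unique entities can be identified by their Last.fm URL (a hypothetical choice of key); the function names `pct` and `link_stats` are illustrative only:

```python
from typing import Dict, Iterable, List


def pct(items: Iterable[Dict]) -> Dict[str, float]:
    """Percentage of items linked to DBpedia, MusicBrainz, both, or neither."""
    items = list(items)
    n = len(items)
    dbp = [bool(a.get("uri")) for a in items]
    mb = [bool(a.get("mbid")) for a in items]
    return {
        "DBpedia": 100 * sum(dbp) / n,
        "MusicBrainz": 100 * sum(mb) / n,
        "Both": 100 * sum(d and m for d, m in zip(dbp, mb)) / n,
        "None": 100 * sum(not d and not m for d, m in zip(dbp, mb)) / n,
    }


def link_stats(annotations: List[Dict]) -> Dict[str, Dict[str, float]]:
    # Hypothetical identity key: one unique entity per Last.fm URL
    entities = {a["lastfm_url"]: a for a in annotations}
    return {"annotations": pct(annotations), "entities": pct(entities.values())}


radiohead = {"lastfm_url": "https://www.last.fm/music/Radiohead",
             "uri": "http://dbpedia.org/resource/Radiohead",
             "mbid": "hypothetical-mbid"}
unlinked = {"lastfm_url": "https://www.last.fm/music/Hypothetical+Label",
            "uri": None, "mbid": None}
stats = link_stats([radiohead, radiohead, radiohead, unlinked])
print(stats)
```

Here one frequently mentioned, fully linked entity pushes the annotation-level percentages up (75% linked) relative to the entity-level ones (50% linked).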

ELMD 2.0 is available in the following formats:

In the JSON version, every biography is stored in a separate document and split into sentences. For every sentence, annotations are stored as a list of entities with the following fields: startChar, endChar, uri (DBpedia URI), mbid (MusicBrainz ID), category (Artist/Album/Track/Label), and lastfm_url (Last.fm URL). Track and Album entities may have an additional mbid_artist field containing the artist's MusicBrainz ID.
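A minimal sketch of reading one JSON document. Only the per-entity field names above come from the ELMD description; the top-level `sentences` key, the per-sentence `text` and `entities` keys, and the assumption that offsets are sentence-relative with an exclusive `endChar` are all hypothetical and should be checked against the actual files.

```python
import json
from typing import Iterator, Optional, Tuple


def iter_mentions(doc_json: str) -> Iterator[Tuple[str, str, Optional[str]]]:
    """Yield (surface form, category, DBpedia URI) for each annotation in one document.

    The "sentences", "text" and "entities" keys are hypothetical; only the
    per-entity field names come from the ELMD description. Offsets are
    assumed to be sentence-relative with an exclusive endChar.
    """
    doc = json.loads(doc_json)
    for sentence in doc["sentences"]:
        for ent in sentence["entities"]:
            surface = sentence["text"][ent["startChar"]:ent["endChar"]]
            # uri may be missing when the entity has no DBpedia link
            yield surface, ent["category"], ent.get("uri")


sample = json.dumps({
    "sentences": [{
        "text": "Radiohead released OK Computer in 1997.",
        "entities": [
            {"startChar": 0, "endChar": 9, "category": "Artist",
             "uri": "http://dbpedia.org/resource/Radiohead"},
            {"startChar": 19, "endChar": 30, "category": "Album",
             "uri": "http://dbpedia.org/resource/OK_Computer"},
        ],
    }]
})
for mention in iter_mentions(sample):
    print(mention)
```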

In the XML version, entities are annotated inline in the text, using the entity's category as the XML tag and three attributes: dbp (DBpedia URI), mb (MusicBrainz ID), and lfm (Last.fm URL).
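A minimal sketch of extracting the inline annotations with Python's standard `xml.etree.ElementTree`. It assumes each biography is wrapped in a single root element (the `<biography>` tag here is hypothetical) and that the category tags are spelled exactly Artist, Album, Track and Label; missing attributes come back as None.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# Tag names assumed to match the category labels exactly
CATEGORIES = {"Artist", "Album", "Track", "Label"}


def extract_entities(xml_text: str) -> List[Dict[str, Optional[str]]]:
    """Collect inline entity annotations from one XML biography."""
    root = ET.fromstring(xml_text)
    found = []
    for el in root.iter():
        if el.tag in CATEGORIES:
            found.append({
                "surface": "".join(el.itertext()),
                "category": el.tag,
                # .get returns None when an attribute (e.g. no DBpedia link) is absent
                "dbp": el.get("dbp"),
                "mb": el.get("mb"),
                "lfm": el.get("lfm"),
            })
    return found


xml_text = (
    '<biography><Artist dbp="http://dbpedia.org/resource/Radiohead" '
    'lfm="https://www.last.fm/music/Radiohead">Radiohead</Artist> released '
    '<Album dbp="http://dbpedia.org/resource/OK_Computer">OK Computer</Album>'
    ' in 1997.</biography>'
)
for ent in extract_entities(xml_text):
    print(ent["category"], ent["surface"], ent["dbp"])
```

Because annotations are inline, `"".join(ET.fromstring(xml_text).itertext())` recovers the plain biography text.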