Featured Dataset: Bricklink

Featured dataset posts usually end with an emphasis on what you can build on top of them, so I’m irresistibly drawn to feature the Bricklink set today: a set you can build on top of, and one that features data built for building… er… with Lego.

Bricklink is a Lego marketplace. Essentially it is the eBay for Lego, where you can buy or sell anything to do with Lego. The Lego community maintain a number of fantastic resources that catalogue all of the Lego products, often down to the detail of the number, colour and type of each part in a Lego set.

See what I mean about Lego? This dataset is licensed as Creative Commons CC-BY, so it is free to reuse with attribution (more on that below). So let’s first dive into the toybox of developers’ documentation, and see what the publisher (Leigh) has given us to play with:

Currently the dataset consists of descriptions of over 9,000 Lego sets and over 23,000 Lego parts. A number of the Lego sets have detailed inventories including number, type and colour of parts. There are also links to sets, instructions and part images.

The documentation also leads with a quick list of some potential uses for the dataset:

Potential Uses

Building a Lego set browser

Building a Lego inventory manager

Building a Lego project or set recommender based around identifying sets that someone may be able to create with existing parts

As the core of a larger dataset that aggregates additional information, e.g. inventories of custom Lego models produced by the community.

I think I prefer it when developers’ docs begin with a list like this. Firstly, it sets a good context for me as I read on and imagine examples fitting into applications. It also helps me to better understand why the data has been structured the way it has. I think this works well, and it would be nice to see on new datasets too.

A quick look through the data model shows us how many different topics are covered in this set, and personally I didn’t know this many topics could be covered by Lego! An example resource (http://data.kasabi.com/dataset/bricklink/colour/67.html) is a colour that Lego can be. We get a very comprehensive picture of this colour, with a name, description, ID and even an RGB hex value of the colour: C0C0C0 (which is called Silver in my browser). It turns out that the metallic silver bricks were first introduced in 1957, which also gives us a hint at how much detail Lego geeks can bring to a brick!

Indeed:

The following properties are associated with Colour:
Property Name | Property URI | Notes
Colour Type | brick:colourType | Type/classification of the colour
Identifier | dct:identifier |
Label | rdfs:label |
Rgb | brick:rgb | RGB value for the colour
Year Ended | brick:yearEnded | Year the colour was discontinued
Year Introduced | brick:yearIntroduced | Year the colour was introduced
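
If you want to actually draw a brick in that colour, the brick:rgb value needs turning into numbers first. Here is a minimal sketch, assuming the value arrives as six hex digits like the C0C0C0 above; hex_to_rgb is a hypothetical helper of my own, not something the dataset or its APIs provide.

```python
def hex_to_rgb(hex_value: str) -> tuple[int, int, int]:
    """Convert a six-digit hex colour such as 'C0C0C0' to an (r, g, b) tuple."""
    # Tolerate surrounding whitespace and an optional leading '#'
    cleaned = hex_value.strip().lstrip("#")
    if len(cleaned) != 6:
        raise ValueError(f"expected six hex digits, got {hex_value!r}")
    # Read the string two characters at a time: red, green, blue
    return tuple(int(cleaned[i:i + 2], 16) for i in (0, 2, 4))


silver = hex_to_rgb("C0C0C0")
print(silver)  # (192, 192, 192)
```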

When searching for other terms, I used the Search API to simply look for other things that were ‘silver’, and to my amazement discovered one of these: <http://data.kasabi.com/dataset/bricklink/minifig/sw229>. If you use the lookup API, or simply cut and paste that into your browser, you can share my amazement :)

Some interesting properties in Bricklink point to the relationships between bricks and complete Lego sets. Between topics such as inventory, set, and part we can see how particular configurations of bricks are meant to go together. This provides the potential to find out which sets a particular group of bricks should belong to, get a picture of the dimensions of the individual parts (i.e. bricks), and all sorts of other combinations. I’m guessing there must be lots of people out there who have incomplete sets, and want to know which parts are missing.
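
The missing-parts idea can be sketched in a few lines once an inventory has been fetched. This is only an illustration under assumptions: I am treating an inventory as a mapping from (part ID, colour ID) to quantity, the IDs and quantities below are placeholders rather than values read from the dataset, and missing_parts is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical data: (part_id, colour_id) -> quantity.
# These are illustrative placeholders, not values read from the dataset.
set_inventory = Counter({("3001", "67"): 4, ("3002", "1"): 2, ("3003", "5"): 1})
owned_parts = Counter({("3001", "67"): 3, ("3002", "1"): 2})


def missing_parts(inventory: Counter, owned: Counter) -> Counter:
    """Return the parts (with quantities) still needed to complete a set."""
    # Counter subtraction keeps only the positive differences,
    # so parts you already have enough of simply drop out.
    return inventory - owned


for (part_id, colour_id), qty in missing_parts(set_inventory, owned_parts).items():
    print(f"need {qty} x part {part_id} in colour {colour_id}")
```

Using a Counter keeps the comparison to a single subtraction; a set is complete when the result is empty.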

As mentioned above, Bricklink is freely available to use and reuse, under the Creative Commons Attribution license. While it’s possible to use some datasets technically without any attribution, it seems like a good idea to me to point back at the dataset, and in this case, it’s part of the license agreement. In Kasabi, that’s easy to do through the Attribution API. Under the Attribution tab on Bricklink’s set page, there is an automatically generated embeddable script, which provides attribution for this particular dataset wherever it’s used.

So, there you have it, a dataset built out of Lego for helping developers construct Lego-based apps. It seems to me that the possibilities listed at the beginning are all on the minds of Lego-folk to help them achieve their multi-coloured, plastic construction dreams and put together master sets. One thing I’d like to add to the possibilities here is the idea of using the dimension data of bricks to put together an app which visualises bricks and sets by the size and shapes of the bricks… oh, and colour too!

I’m throwing out an idea here that maybe others have already had:
I’d like to know the minimum number of sets I should buy that would allow me to build, by remixing their parts, the highest number of sets I don’t own yet.
uhmm… It seems kinda like a set-covering problem… :)
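
For a handful of sets, the question above can simply be brute-forced. The sketch below is one possible reading of it, not an established algorithm for this dataset: inventories are assumed to be part-to-quantity mappings, all names and data are made up, and a target counts as buildable if the pooled parts cover it on its own (you take one model apart before building the next). Trying every combination grows exponentially, so anything beyond toy sizes would need a greedy or solver-based approach instead.

```python
from collections import Counter
from itertools import combinations


def can_build(target: Counter, pool: Counter) -> bool:
    """True if the pool holds at least the required quantity of every part."""
    return all(pool[part] >= qty for part, qty in target.items())


def best_purchase(candidates: dict[str, Counter],
                  targets: dict[str, Counter],
                  max_sets: int) -> tuple[tuple[str, ...], list[str]]:
    """Try every purchase of up to max_sets sets; return (sets to buy, buildable targets)."""
    best: tuple[tuple[str, ...], list[str]] = ((), [])
    # Smaller purchases are tried first and only a strictly better
    # result replaces the best, so ties go to the smaller purchase.
    for k in range(1, max_sets + 1):
        for combo in combinations(sorted(candidates), k):
            pool = sum((candidates[name] for name in combo), Counter())
            buildable = [t for t in sorted(targets) if can_build(targets[t], pool)]
            if len(buildable) > len(best[1]):
                best = (combo, buildable)
    return best


# Made-up example data: sets I could buy, and sets I would like to build.
candidates = {"A": Counter(red=2), "B": Counter(blue=2), "C": Counter(red=1, blue=1)}
targets = {"X": Counter(red=2, blue=1), "Y": Counter(blue=2), "Z": Counter(red=1)}

print(best_purchase(candidates, targets, max_sets=2))
```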