This post was originally published in 2009
It may contain stale & outdated information. Or it may have grown more awesome with age, like the author.

I’m in the middle of a project involving Wikipedia and ResearchCyc, and I needed to get the data contained in DBpedia’s Infobox RDF Triple file into a MySQL database so I could use it in conjunction with the database created by the excellent Wikipedia Miner.

This script parses out the Wikipedia page, DBpedia Infobox Predicate and Infobox subject, and inserts them into a MySQL table. I thought I’d share it with The Internet in case someone else wanted to work with DBpedia infobox data in the same way.
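As a rough illustration of the parsing step (this is not the original script), here's a minimal Python sketch. It assumes the file is in N-Triples format, one `<subject> <predicate> object .` statement per line. The sample lines, the `infobox` table and its column names are all made up for the example, and `sqlite3` stands in for MySQL so the sketch runs without a database server.

```python
import re
import sqlite3

# One N-Triples statement: <subject> <predicate> object .
# The object is either a <URI> or a "literal" with an optional
# @lang or ^^<datatype> suffix.
TRIPLE_RE = re.compile(
    r'^<(?P<s>[^>]*)>\s+<(?P<p>[^>]*)>\s+'
    r'(?:<(?P<uri>[^>]*)>|"(?P<lit>(?:[^"\\]|\\.)*)"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)'
    r'\s*\.\s*$'
)


def local_name(uri):
    """Return the part of a URI after the last '/'."""
    return uri.rsplit("/", 1)[-1]


def parse_triple(line):
    """Return (page, predicate, value), or None if the line is not a triple."""
    m = TRIPLE_RE.match(line.strip())
    if m is None:
        return None
    if m.group("uri") is not None:
        value = local_name(m.group("uri"))  # object is another resource
    else:
        value = m.group("lit")              # object is a plain or typed literal
    return local_name(m.group("s")), local_name(m.group("p")), value


def load_triples(lines, conn):
    """Insert every parseable line into a hypothetical `infobox` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS infobox "
        "(page TEXT, predicate TEXT, value TEXT)"
    )
    rows = [t for t in map(parse_triple, lines) if t is not None]
    conn.executemany("INSERT INTO infobox VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# Invented sample data; the real file's URIs may look different.
SAMPLE = [
    '<http://dbpedia.org/resource/Example_Person> '
    '<http://dbpedia.org/ontology/Person#birthdate> '
    '"1912-06-23"^^<http://www.w3.org/2001/XMLSchema#date> .',
    '<http://dbpedia.org/resource/Example_Person> '
    '<http://dbpedia.org/ontology/Person#birthplace> '
    '<http://dbpedia.org/resource/Example_Town> .',
    'not a triple',
]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(load_triples(SAMPLE, conn), "rows inserted")
    for row in conn.execute("SELECT page, predicate, value FROM infobox"):
        print(row)
```

With a MySQL driver such as PyMySQL, the same parameterised INSERT would use `%s` placeholders instead of `?`. Lines that don't match the pattern are skipped silently here; logging them would be a sensible addition for a real import.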

I used the DBpedia infobox data from this post: “DBpedia – Rethinking Wikipedia infobox extraction”, kindly provided by Georgi Kobilarov. The data linked to in that post is much more suitable for my requirements than the previously available DBpedia infobox data: instead of multiple predicates for birth date (birthDate, dateBirth, dateOfBirth, birth…), they’ve all been mapped to dbpedia:Person#birthdate. Wonderful!

The script could be improved: though the Wikipedia Pages and DBpedia Infobox Predicates are fine, some of the subjects are rather … interesting. As I’m currently only interested in the 200 most frequently occurring predicates, I haven’t put more time into smoothing out some of the more interesting subject data. If anyone makes changes to the script, please let me know and I’ll update this post.
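For what it's worth, once the triples are in a table, finding the most frequently used predicates is a single GROUP BY query. This is only a sketch under assumptions: the `infobox` table and its `predicate` column are hypothetical names, the rows are invented, and `sqlite3` again stands in for MySQL.

```python
import sqlite3

# Count how often each predicate occurs and keep the most common ones.
# Ties are broken alphabetically so the result order is stable.
TOP_PREDICATES_SQL = (
    "SELECT predicate, COUNT(*) AS uses "
    "FROM infobox GROUP BY predicate "
    "ORDER BY uses DESC, predicate ASC LIMIT ?"
)


def top_predicates(conn, limit=200):
    """Return a list of (predicate, count) rows, most frequent first."""
    return conn.execute(TOP_PREDICATES_SQL, (limit,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE infobox (page TEXT, predicate TEXT, value TEXT)")
    # Invented rows, just enough to give the query something to count.
    conn.executemany(
        "INSERT INTO infobox VALUES (?, ?, ?)",
        [
            ("Page_A", "Person#birthdate", "1912-06-23"),
            ("Page_B", "Person#birthdate", "1950-01-01"),
            ("Page_A", "Person#birthplace", "Example_Town"),
        ],
    )
    for predicate, uses in top_predicates(conn, limit=2):
        print(predicate, uses)
```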
