My good friend Richard Brent has often complained that my blog has very little Shakespeare content. Despite the domain name, I don’t think I’ve ever blogged about The Big S. For shame! Fear not, my Brentish-Boy, this post is all about Shakespeare. And MySQL….

Ahem…

When I first started shkspr.mobi it was intended to be an easy way to get Shakespeare on your phone. At that time, there were no mobile-formatted texts of his plays and sonnets, so I had to create them. Finding Shakespeare’s works in a suitable format for conversion wasn’t too hard – but it meant lots of crufty code to read text files line by line. Yuck.

I’ve stripped out a lot of the extraneous stuff from the original version – word counts, etc. So it should be a fairly lean database which is easy to use. I’m not a database professional, so I would be grateful if you could suggest any improvements, either using this blog’s comment form or on GitHub.

There are four tables:

Paragraphs

This is where the main body of text is. A typical row will look like this:

WorkID: hamlet

ParagraphID: 639015

ParagraphNum: 3427

CharID: hamlet

PlainText: Has this fellow no feeling of his business, that he sings at\ngrave-making?

Act: 5

Scene: 1

Works

This is what translates the “WorkID” into something human-readable, plus some extra metadata:

WorkID: hamlet

Title: Hamlet

LongTitle: Tragedy of Hamlet, Prince of Denmark, The

Date: 1600

GenreType: Tragedy

Characters

This is what translates the CharID into a human-readable name and description:

CharID: hamlet

CharName: Hamlet

Abbrev: Ham

Works: Tragedy of Hamlet, Prince of Denmark, The

Description: son of the former king and nephew to the present king

Chapters

This gives the setting for each Act and Scene.

WorkID: hamlet

ChapterID: 18893

Act: 5

Scene: 1

Description: Elsinore. A churchyard.
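To show how the four tables relate, here is a minimal sketch in Python. It uses the standard library's sqlite3 module with an in-memory database purely as a stand-in for MySQL. The table names, column types and keys are guesses based on the sample rows above, not the project's real definitions, and it assumes the line break inside PlainText is stored as a newline character.

```python
import sqlite3

# In-memory SQLite stands in for MySQL. Table names, column types and keys
# are assumptions inferred from the sample rows, not the real schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Works (WorkID TEXT PRIMARY KEY, Title TEXT, LongTitle TEXT,
                    Date INTEGER, GenreType TEXT);
CREATE TABLE Characters (CharID TEXT PRIMARY KEY, CharName TEXT, Abbrev TEXT,
                         Works TEXT, Description TEXT);
CREATE TABLE Chapters (WorkID TEXT, ChapterID INTEGER PRIMARY KEY,
                       Act INTEGER, Scene INTEGER, Description TEXT);
CREATE TABLE Paragraphs (WorkID TEXT, ParagraphID INTEGER PRIMARY KEY,
                         ParagraphNum INTEGER, CharID TEXT, PlainText TEXT,
                         Act INTEGER, Scene INTEGER);
""")

# The sample rows shown above, one per table.
conn.execute("INSERT INTO Works VALUES (?, ?, ?, ?, ?)",
             ("hamlet", "Hamlet", "Tragedy of Hamlet, Prince of Denmark, The",
              1600, "Tragedy"))
conn.execute("INSERT INTO Characters VALUES (?, ?, ?, ?, ?)",
             ("hamlet", "Hamlet", "Ham",
              "Tragedy of Hamlet, Prince of Denmark, The",
              "son of the former king and nephew to the present king"))
conn.execute("INSERT INTO Chapters VALUES (?, ?, ?, ?, ?)",
             ("hamlet", 18893, 5, 1, "Elsinore. A churchyard."))
conn.execute("INSERT INTO Paragraphs VALUES (?, ?, ?, ?, ?, ?, ?)",
             ("hamlet", 639015, 3427, "hamlet",
              "Has this fellow no feeling of his business, that he sings at\n"
              "grave-making?", 5, 1))

# Fetch every paragraph of one scene, with its title, setting and speaker.
query = """
SELECT w.Title, p.Act, p.Scene, ch.Description, c.CharName, p.PlainText
FROM Paragraphs AS p
JOIN Works AS w ON w.WorkID = p.WorkID
JOIN Characters AS c ON c.CharID = p.CharID
JOIN Chapters AS ch
  ON ch.WorkID = p.WorkID AND ch.Act = p.Act AND ch.Scene = p.Scene
WHERE p.WorkID = ? AND p.Act = ? AND p.Scene = ?
ORDER BY p.ParagraphNum
"""
rows = conn.execute(query, ("hamlet", 5, 1)).fetchall()

for title, act, scene, setting, speaker, text in rows:
    print(f"{title} {act}.{scene} ({setting})")
    print(f"{speaker}: {text}")
```

Against the real MySQL database the same SELECT should work largely unchanged, though the placeholder style depends on the driver you use.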

What’s Next?

The next steps for the project are fairly obvious:

Write some high level example code to show people how to use the database.

11 thoughts on “Open Source Shakespeare (in MySQL)”

Terence, the OSS site itself runs on MySQL, and has since 2003, when I launched the beta version of the site. The download page provides Access and CSV files because those are the most easily-consumed versions of the database. For whatever reason, I’ve actually never been asked for a mysqldump version of the site — probably because whenever someone has downloaded the db, they want to use the database in their own personal project, so they’d rather import the data into their own table structure, rather than replicating OSS’s.

I’m glad you’re finding the database useful, and I’m also glad to see it up on github.

Has this fellow no feeling of his business, that he sings at\ngrave-making?

I think that this may be a question for both Terence and Eric then. The text has a new-line character \n in it, which really really annoys me far more than is reasonable. Does this text really need formatting in it? Can’t I decide how to word wrap the text?

Shakespeare is traditionally broken down into lines. This allows the reader to see rhyming couplets, get a sense of rhythm, etc. It’s also useful for long soliloquies to be able to reference specific lines.

The original DB used “[p]” to show new lines. I wasn’t aware of any parsing tools which could easily strip that out and replace it with, e.g., <br/>.

What one could do is create a separate table which lists where the line breaks should be – then remove them from the text. To be honest, I think it’s probably easier for the user to strip out the \n if they’re not needed.
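For anyone who wants to handle the line breaks themselves, here is a rough Python sketch. It assumes the breaks are stored as a newline character in PlainText; the function names are made up for illustration and are not part of the project.

```python
import html


def from_original_markers(text):
    """Turn the original DB's "[p]" markers into newline characters."""
    return text.replace("[p]", "\n")


def to_lines(plain_text):
    """Split a paragraph into its individual verse lines."""
    return plain_text.split("\n")


def to_html(plain_text):
    """Escape each line for HTML and join the lines with <br/>."""
    return "<br/>".join(html.escape(line) for line in to_lines(plain_text))


def unwrap(plain_text):
    """Drop the line breaks so the reader's device can wrap the text."""
    return " ".join(plain_text.split())


sample = "Has this fellow no feeling of his business, that he sings at\ngrave-making?"
print(to_lines(sample))
print(to_html(sample))
print(unwrap(sample))
```

Keeping the breaks in the data and removing them at display time means the line structure is still there for anyone who wants to cite a specific line.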

Thanks for this, which I found via Bill Thompson on Twitter (@billt). One thing I notice immediately is that the sql file doesn’t have the table definitions. I can guess more or less how the tables should be created but it would be useful if these could be included. A ‘mysqldump’ should produce a file including all you need to recreate the database elsewhere.

You should avoid multi-valued columns such as the ‘Works’ column of the ‘Characters’ table. Instead, have a separate table with two columns: CharId and WorkId. This will make it much easier to extract the data.
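A minimal sketch of that two-column table, again in Python with an in-memory SQLite database standing in for MySQL. The table name CharacterWorks and the helper works_for are hypothetical, and the other tables are cut down to the columns needed here. How the existing ‘Works’ values would be split to fill the new table depends on their delimiter, which isn't shown in the post, so the row is inserted by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Works (WorkID TEXT PRIMARY KEY, Title TEXT);
CREATE TABLE Characters (CharID TEXT PRIMARY KEY, CharName TEXT);
-- Hypothetical junction table: one row per (character, work) pair.
CREATE TABLE CharacterWorks (
    CharID TEXT NOT NULL REFERENCES Characters(CharID),
    WorkID TEXT NOT NULL REFERENCES Works(WorkID),
    PRIMARY KEY (CharID, WorkID)
);
""")

conn.execute("INSERT INTO Works VALUES ('hamlet', 'Hamlet')")
conn.execute("INSERT INTO Characters VALUES ('hamlet', 'Hamlet')")
conn.execute("INSERT INTO CharacterWorks VALUES ('hamlet', 'hamlet')")


def works_for(char_id):
    """Return the titles of every work a character appears in."""
    rows = conn.execute(
        """SELECT w.Title
           FROM CharacterWorks AS cw
           JOIN Works AS w ON w.WorkID = cw.WorkID
           WHERE cw.CharID = ?
           ORDER BY w.Title""",
        (char_id,),
    )
    return [title for (title,) in rows]


print(works_for("hamlet"))
```

A character who appears in several plays would simply get one row per play, so no string splitting is needed at query time.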

I am a database professional (of sorts) but not a great expert on Shakespeare.

Sir Tim Berners-Lee proposes “5 Stars of Linked Open Data”, the last of which is “link your data to other data to provide context”. Accordingly, I’d suggest you add a line (or lines) to your “Works” table, with the URIs of, say, the equivalent English Wikipedia articles, and/or, their DBPedia (data) equivalents.
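One way this could look, sketched in Python with an in-memory SQLite database standing in for MySQL. The column names WikipediaURI and DBpediaURI are invented for illustration, and the Works table is cut down to two columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Works (WorkID TEXT PRIMARY KEY, Title TEXT)")
conn.execute("INSERT INTO Works VALUES ('hamlet', 'Hamlet')")

# Hypothetical column names; nothing in the post defines them.
conn.execute("ALTER TABLE Works ADD COLUMN WikipediaURI TEXT")
conn.execute("ALTER TABLE Works ADD COLUMN DBpediaURI TEXT")

conn.execute(
    "UPDATE Works SET WikipediaURI = ?, DBpediaURI = ? WHERE WorkID = ?",
    (
        "https://en.wikipedia.org/wiki/Hamlet",
        "http://dbpedia.org/resource/Hamlet",
        "hamlet",
    ),
)

links = conn.execute(
    "SELECT WikipediaURI, DBpediaURI FROM Works WHERE WorkID = ?", ("hamlet",)
).fetchone()
print(links)
```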
