Thesis Complete

This morning I submitted a near-final draft of my thesis 'Multiple Versions and Overlap in Digital Text' to my two supervisors. The last chapter describes some new work on aligning multi-version texts automatically. Here's a table taken from the thesis which summarises its performance on a variety of multi-version texts.

The SZ column is the average version size in kilobytes, NV is the number of versions, TT is the total time taken to merge all versions, and AT is the average time to merge one version after the first; both times are in seconds. The test machine had a 1.66GHz Core Duo processor, of which only one core was used. The Romulo doesn't merge properly at the moment because the versions have almost nothing in common, so its merge times don't mean much.

The key figure is the AT column, which shows how long it takes to 'save' an edited version back into the document. As you can see, it's pretty fast, considering that this is a hard problem. As far as quality goes, I can't see any bad alignments or false transpositions, except in the Malvezzi case. Once I can coerce the Malvezzi input into a sensible format, that case should work too.

Balisage

It looks as if I will be going to Balisage this year. I will be presenting a boiled-down version of Chapter 5 of the thesis, which is all new work. I'll be very interested to hear their reactions, especially as I can now demonstrate the theory. (Their motto is 'There is nothing so practical as a good theory'.)

About Me

I have a BA (1980) in Classical Greek language and Ancient History, and a PhD in Classical Greek papyrology from the University of Cambridge, UK. I have recently been awarded a second PhD in the ITEE School at UQ for my thesis on 'Multiple Versions and Overlap in Digital Text'. From 1989 until 2005 I worked for the Cambridge Wittgenstein Archive making an edition of Wittgenstein. I spent three years as a Computer Associate running the IT support group at the Wellcome/CRC Institute in Cambridge. I have also spent three years developing a commercial license management system for Mac OS X. I then worked for three years on the Leximancer text mining application (www.leximancer.com), which strives to extract meaning from natural text. I currently work at the Queensland University of Technology Information Security Institute. Since 2002 I have worked with Digital Variants, since 2010 with the HRIT group at the University of Loyola, Chicago, and since 2012 on the AustESE project at the University of Queensland.