Thursday, 12 March 2009

At the NZETC, the base format for all our texts is the Text Encoding Initiative (TEI) format, an XML format that we use to store every text. We've just gone live with an updated version of the format, called 'P5'. As an end user, you're unlikely to notice the difference between the previous version and P5, but it enables us to do interesting things which you hopefully will notice. These include:

Representation of documents which have large additions (such as books with newspaper clippings pasted into the covers)

Representation of non-Unicode glyphs

The ability to add extra functionality piece by piece, so that we can encode new features as we take on new projects