Introduction

Microsoft's official specification for the OpenType font file format is a somewhat dry and, of course, very technical document. Reading through it is not a task for the faint-hearted! I wanted to understand some parts of it, so I recently purchased a copy of DTL OTMaster, which has proved to be absolutely invaluable. At the time of writing DTL OTMaster costs about 250 euros, but the time it can save you makes it worth every penny. This post is not intended as an "advert" for the software, just a quick demo of a really great tool that you may not have heard of. In the screenshots below, OTMaster is displaying the open source OpenType (TrueType) font Scheherazade.

Screenshots

Here are some screenshots showing the internals of Scheherazade. Programmers will note that you are provided with information on the data types of various entries – the same data types referenced in Microsoft's specification. Very useful indeed! It's worth noting that OTMaster has many other features in addition to displaying the technical data – including some features present in Microsoft's VOLT – and in some areas they are better implemented than in VOLT, particularly the ability to preview multiple glyphs with mark-to-base positioning.
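To give a flavour of what those data types look like in practice, here is a minimal C sketch (mine, not anything taken from OTMaster) that reads the table directory at the start of a single-font OpenType file held in memory. It assumes the layout given in the specification: a 12-byte header whose numTables field is a big-endian uint16 at byte offset 4, followed by 16-byte table records (Tag, checksum, offset, length). The names read_u16, read_u32, TableRecord and find_table are my own, and font collections (.ttc) are not handled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* OpenType stores multi-byte integers big-endian (most significant byte first). */
static uint16_t read_u16(const uint8_t *p) {
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t read_u32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* One table directory entry: Tag, uint32 checksum, Offset32, uint32 length. */
typedef struct {
    char     tag[5];    /* 4-byte tag plus a terminating NUL for printing */
    uint32_t checksum;
    uint32_t offset;    /* from the start of the font file */
    uint32_t length;
} TableRecord;

/*
 * Look up a table (e.g. "cmap") by its 4-character tag.
 * Returns 1 and fills *out on success; returns 0 if the table is not
 * present or the buffer is too short to hold the directory.
 */
int find_table(const uint8_t *font, size_t size, const char *tag,
               TableRecord *out) {
    if (size < 12) return 0;                    /* header is 12 bytes */
    uint16_t num_tables = read_u16(font + 4);   /* numTables */
    for (uint16_t i = 0; i < num_tables; i++) {
        size_t rec = 12 + (size_t)i * 16;       /* each record is 16 bytes */
        if (rec + 16 > size) return 0;
        if (memcmp(font + rec, tag, 4) == 0) {
            memcpy(out->tag, font + rec, 4);
            out->tag[4] = '\0';
            out->checksum = read_u32(font + rec + 4);
            out->offset   = read_u32(font + rec + 8);
            out->length   = read_u32(font + rec + 12);
            return 1;
        }
    }
    return 0;
}
```

Calling find_table(data, size, "cmap", &rec) on a loaded font should give you the offset and length of the cmap table, which you can then compare with what OTMaster displays.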

The "root"

On the left is the internal font structure: at the top is the "root" entry where you can see the glyphs in the font.

Summary information

Summary of key data contained at the start of the font.

cmap table

The following screenshot shows the font cmap table(s) – the font's mechanism to map from character codes (e.g., Unicode) to the internal, and font-specific, glyph identifiers (indices).
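As a rough illustration of that mapping, here is a small C sketch modelled on the "sequential map group" idea used by cmap subtable format 12: each group maps a run of consecutive character codes to a run of consecutive glyph IDs. It works on groups already loaded into memory rather than parsing a real font, and the names MapGroup and glyph_for_char, as well as any glyph IDs you feed it, are purely illustrative and not taken from Scheherazade.

```c
#include <stddef.h>
#include <stdint.h>

/* A run of consecutive code points mapped to consecutive glyph IDs. */
typedef struct {
    uint32_t start_char;
    uint32_t end_char;     /* inclusive */
    uint32_t start_glyph;  /* glyph ID for start_char */
} MapGroup;

/*
 * Binary search over groups sorted by start_char (and not overlapping).
 * Returns the glyph ID, or 0 (the .notdef glyph) if ch is not mapped.
 */
uint32_t glyph_for_char(const MapGroup *groups, size_t count, uint32_t ch) {
    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ch < groups[mid].start_char) {
            hi = mid;
        } else if (ch > groups[mid].end_char) {
            lo = mid + 1;
        } else {
            return groups[mid].start_glyph + (ch - groups[mid].start_char);
        }
    }
    return 0;
}
```

A real font may carry several cmap subtables in different formats (format 4 is common for the Basic Multilingual Plane), so treat this only as a picture of the idea, not as a cmap parser.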

Introduction

Using an external pre-processor (built using HarfBuzz) you can achieve effects that are not possible (or, at least, not easy) directly with XeTeX. Here's a simple example of colouring Arabic vowels. This particular effect is probably possible with XeTeX alone, but it's just a quick demo and many other interesting possibilities come to mind. At the moment the Arabic string is hardcoded into the pre-processor, just for testing, but I plan to make it read from files output by XeTeX; for now it's just a proof of concept. The vowel positioning was achieved by putting the vowel glyphs in boxes and shifting them according to the anchor point data provided by HarfBuzz.
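The box-shifting idea can be sketched in C roughly as follows. This is not the pre-processor's actual code: it assumes the shaper reports each glyph's x and y offsets in font design units (as HarfBuzz does when the font scale equals the units-per-em value), converts them to points for a given font size, and writes a TeX fragment using \kern, \raise and \hbox. The function names and the glyph_tex argument (whatever TeX you use to typeset the glyph) are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Convert a length in font design units to points at a given font size. */
double units_to_pt(int units, int units_per_em, double font_size_pt) {
    return (double)units * font_size_pt / (double)units_per_em;
}

/*
 * Write TeX that shifts one boxed glyph by (x_offset, y_offset), both given
 * in font design units. Returns the number of characters snprintf would
 * have written (so a value >= out_size means the output was truncated).
 */
int format_shifted_box(char *out, size_t out_size, int x_offset, int y_offset,
                       int units_per_em, double font_size_pt,
                       const char *glyph_tex) {
    return snprintf(out, out_size, "\\kern%.3fpt\\raise%.3fpt\\hbox{%s}",
                    units_to_pt(x_offset, units_per_em, font_size_pt),
                    units_to_pt(y_offset, units_per_em, font_size_pt),
                    glyph_tex);
}
```

Note that this only covers the shift itself. A real implementation would also have to compensate for the width of the box and the kern so that the following glyphs are not displaced, and would wrap the box in whatever colour command is wanted.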

Introduction: A very brief post

This is an extremely short post to note one way of building the superb HarfBuzz OpenType shaping library as a static library on Windows (i.e., a .lib) – using an elderly version of Visual Studio (2008)! The screenshot below shows the source files I included in my VS2008 project and the files I excluded from the build (the excluded files have a little red minus sign next to them). In short, I did not build HarfBuzz for use with ICU, Graphite or Uniscribe, and I excluded a few other source files that were not necessary for (my version of) a successful build. I've tested the .lib and, so far, it works well for what I need – but, of course, be sure to run your own tests! You will also need the FreeType library, which I built as a static library too. HarfBuzz also compiles nicely using MinGW to give you a DLL, but I personally prefer to build a native Windows .lib if I can get one built (without too much pain...).

Here are the preprocessor definitions that I needed to set for the project:

WIN32
_DEBUG
_LIB
_CRT_SECURE_NO_WARNINGS
HAVE_OT
HAVE_UCDN

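A tip, of sorts, or at least something that worked for me. When using the HarfBuzz library's UTF-16 buffer functions in your own code, you may need to ensure that the wchar_t type is not treated as a built-in type. For example, suppose you use wide characters like this: const wchar_t* text = L"هَمْزَة وَصْل آ"; and then call, say, hb_buffer_add_utf16( buffer, text, wcslen(text), 0, wcslen(text) );. The function expects a pointer to 16-bit unsigned integers (const uint16_t*), and, as far as I can tell, a built-in wchar_t is a distinct type that will not convert to that without a cast, whereas with the setting turned off wchar_t is just a typedef for unsigned short. Within the project property pages, set C/C++ -> Language -> Treat wchar_t as Built-in Type = No.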

A simple example to get you started

Based on code generated by the superb RegexBuddy software (the price is great value!), here's a simple example of using the PCRE regular expression library to search a UTF-8 text buffer for strings of Arabic text. The actual regular expression is very simple: ([\\x{600}-\\x{6FF}]+) (the backslashes are doubled because it is written as a C string literal). It just looks for sequences of Unicode code points from 600 (hex) to 6FF (hex). The function is not particularly efficient (for example, it should calculate the buffer length once rather than repeatedly), but it works.
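For comparison, and so that there is something here you can compile without PCRE, this is a hand-written C scan that finds the same kind of runs in well-formed UTF-8. It is a stand-in for the regular expression, not the PCRE code itself. It relies on the fact that U+0600 to U+06FF are exactly the two-byte UTF-8 sequences whose lead byte is 0xD8 to 0xDB. The function names are my own.

```c
#include <stddef.h>

/* True if a two-byte sequence encoding U+0600..U+06FF starts at s[i]:
   lead byte 0xD8..0xDB followed by one continuation byte 0x80..0xBF. */
static int is_arabic_pair(const unsigned char *s, size_t i, size_t len) {
    return i + 1 < len && s[i] >= 0xD8 && s[i] <= 0xDB &&
           (s[i + 1] & 0xC0) == 0x80;
}

/*
 * Find the next run of Arabic code points at or after byte offset `from`.
 * On success returns 1 and stores the run's byte offset and byte length;
 * returns 0 when there are no more runs. Assumes well-formed UTF-8.
 */
int next_arabic_run(const char *text, size_t len, size_t from,
                    size_t *start, size_t *run_len) {
    const unsigned char *s = (const unsigned char *)text;
    size_t i = from;
    while (i < len && !is_arabic_pair(s, i, len)) i++;   /* skip non-Arabic */
    if (i >= len) return 0;
    *start = i;
    while (is_arabic_pair(s, i, len)) i += 2;            /* consume the run */
    *run_len = i - *start;
    return 1;
}
```

To walk every run in a buffer, call next_arabic_run repeatedly, passing start + run_len from the previous match as the next `from`. As with the regular expression, spaces and punctuation outside the range end a run, so a phrase of several words comes back as several matches.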

I used code like this in an Arabic text pre-processor I wrote for working with XeTeX: saving Arabic strings to a file (from XeTeX), processing the text and reading it back in via \input{...}. Special effects not directly possible in XeTeX can be achieved by a pre-processing step. Yep, it involves lots of \write18{...} calls. For sure LuaTeX offers many other possibilities, but XeTeX's font handling (and its use of HarfBuzz) is very convenient indeed!