Standards for Complex Scripts

We are living in the Information Age. All of us are looking for useful information, and networked computers are good amplifiers of it.

We know that most computers are great number crunchers. Apart from processing numerical quantities, general-purpose computers also need to handle text (written words).

Coding Text

Computers store letters, digits, signs and symbols using numerical codes, which are simply numbers. For example, the capital letter 'A' is generally stored as the number 65. In the past, different types of computers used different code tables for the Roman alphabet. Even different programs on the same computer stored text using their own private code tables! In such a situation, when we use one program to read and display text stored earlier by another program, we get garbage.
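The effect of mismatched code tables can be reproduced in a few lines of Python. This is only an illustrative sketch: the sample word and the two code tables (cp1252 and cp437, both shipped with Python) are choices made for this example and are not taken from any particular program.

```python
# Each character is stored as a number (its code).
code_of_A = ord("A")      # the number stored for capital 'A'
char_of_97 = chr(97)      # the character stored as number 97

# Assumption for illustration: "program 1" saves text with the cp1252
# code table, while "program 2" reads the same bytes with cp437.
original = "café"
stored = original.encode("cp1252")   # bytes written by program 1
misread = stored.decode("cp437")     # same bytes, wrong code table

print(code_of_A, char_of_97)
print(original, "->", misread)       # the accented letter comes out wrong
```

The plain letters survive because both tables agree on the basic Roman range; only the codes above 127, where the tables differ, turn into garbage.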

The usefulness of our text increases only if different types of computer systems and programs can read it. We can ensure this by adhering to a universally accepted encoding standard for text.

Unicode

In the nineties, computer scientists, vendors and savvy users felt an urgent need for a universal encoding scheme covering all the written languages of the world. This need was addressed first by ISO and then by a non-profit organisation, the Unicode Consortium.

At present, it seems that Unicode will remain the de facto standard for decades. So it is very important that operating systems and programs (such as word processors) handle Unicode text properly.

Before Unicode, Indian users placed Indic akṣhars in the font positions meant for Latin letters, because there was no support for Indic scripts at the operating-system level.

Text Processing

Text processing means analysing keystrokes (entered through input devices such as a keyboard), storing (saving) text and rendering (displaying) it. Storing and transferring international text is easy if a standard coding, i.e. Unicode, is followed. So most operating systems can store text in any script.
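As a small illustration of why storing and transferring is the easy part, the sketch below saves a Devanagari word as bytes and reads it back unchanged. UTF-8 is assumed here as the byte encoding of Unicode; the text above does not name a specific one.

```python
# A word in Devanagari script, held as a Unicode string.
text = "देवनागरी"

# Storing or transferring: turn the Unicode characters into bytes.
data = text.encode("utf-8")

# Any program that also follows the standard gets the same text back.
restored = data.decode("utf-8")

print(len(text), "characters stored as", len(data), "bytes")
print(restored == text)
```

No knowledge of the script is needed for this round trip. The difficulty with complex scripts lies in displaying the text, not in storing it.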

Rendering Roman (Latin) text is also simple, because --
a) letters, digits, signs and symbols are placed linearly (sequentially),
b) there is one and only one letter-form (shape/glyph) for every letter (character).

The uppercase letter 'A' and the lowercase letter 'a' are simply treated as different characters (A is 65, a is 97). So all operating systems can render Roman text.

Displaying Complex Text

Rendering Indic and Arabic scripts requires complex text layout operations, such as --
a) substitution: consonants may join to produce one conjunct,
b) positioning: a letter-form (mainly a vowel sign) may need to be positioned relative to another (base) letter-form (a consonant),
c) rearrangement: the order of glyphs may change,
d) alternate forms: a character can have different forms depending upon the surrounding characters (mainly in Arabic).
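The gap between what is stored and what is drawn can be seen by listing the stored characters of two short Devanagari samples. This is a rough sketch using Python's standard unicodedata module; the helper name describe and the two samples are chosen here for illustration only.

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name) pairs for each stored character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# Substitution: KA + VIRAMA + SSA are three stored characters, but a
# shaping engine with a suitable font usually draws them as one conjunct.
conjunct = "\u0915\u094D\u0937"

# Rearrangement: the vowel sign I is stored after KA, yet it is drawn
# to the left of the consonant.
reordered = "\u0915\u093F"

for label, sample in (("conjunct", conjunct), ("reordered", reordered)):
    print(label, sample)
    for code, name in describe(sample):
        print("   ", code, name)
```

The stored order never changes. Joining and reordering happen only at display time, which is why the text looks wrong when the shaping software is missing.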

Fonts alone are not enough for displaying these 'complex scripts'; the operating system must also have the corresponding software components. Most operating systems do not have software to render all the complex scripts.

A font format is a specification (description), or a kind of standard, that brings order to how fonts should be designed and created. The basic TrueType font format supports the simple typography required to render Roman text. Some computer vendors and organisations have extended TrueType to support 'rich typography'. OpenType, AAT and Graphite are some popular extensions of TrueType.

In order to conform to Unicode and support complex scripts like Devanagari, operating systems should have various software components, such as --
a) Text rendering library/services e.g. OTLS, ATSUI
b) Complex-script shaping engine e.g. Uniscribe, MLTE

In Windows 2000, XP and Vista these software components are present and the standards are followed. Similar components also exist for Mac OS X and are being developed for Linux (e.g. Pango for shaping and layout, with FreeType for font rasterisation).

Though these components are available in these operating systems, we still need to enable the language we want to use. Only when a language is enabled and selected does the OS interpret characters coming from the keyboard as belonging to the script associated with that language. Also, some components (mainly better fonts) from other vendors should be installed to typeset and render typographically correct text in Indian scripts.

Software for Complex Scripts

We generally use word processors to enter, store and organise text. Many word processors depend on the operating system's services to render text. But there are a few powerful word processors that can perform complex text processing on their own, even if the operating system does not provide such facilities. Then there are many dumb word processors that do not provide complex text processing even if the underlying operating system has the software components required to render complex text.

In addition to a standards-based operating system and modern OT/AAT fonts, a standards-based word processor such as OpenOffice should be used.