5.3 Keystrokes and codepoints

Input methods represent the third of three basic components for working with text data on a computer that were introduced earlier. In general, input methods can include voice or handwriting recognition. Keyboards are the most common form of input, however, and also the only one that is easily extended or modified. This discussion will therefore focus on keyboard input.[1]

5.3.1 From keystrokes to codepoints

Just as codepoints and glyphs are the counterparts to characters in the encoding and rendering components, keystrokes are the counterpart to characters in the keyboarding component. Whereas characters (or codepoints) get transformed into glyphs in the rendering process, keystrokes are transformed into codepoints in the input process.

All computer operating systems include software to handle keyboard input, and many provide more than one keyboard layout; that is, more than one set of mappings from keys to characters. Many keyboard input methods use a strictly one-to-one mapping from keystrokes to characters: for each keystroke, there is one particular character that is generated.[2] But some keyboards provide alternative mappings based on a previously entered “dead key” (a key that does not enter a character, but rather changes the character entered by the following key). For example, typing “`” followed by “a” produces “à”, while “`” followed by “o” produces “ò”.
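The dead-key behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tables DEAD_KEYS and COMPOSE and the function type_with_dead_keys are hypothetical names invented here, not part of any real keyboard driver, and the fallback for an unknown pair (emit both keys literally) is one possible choice among several.

```python
# Hypothetical layout tables, for illustration only.
DEAD_KEYS = {"`"}
COMPOSE = {
    ("`", "a"): "\u00e0",  # à
    ("`", "o"): "\u00f2",  # ò
}

def type_with_dead_keys(keystrokes):
    """Turn a sequence of keystrokes into text, honouring dead keys."""
    output = []
    pending = None  # a dead key waiting for the key that follows it
    for key in keystrokes:
        if pending is not None:
            # Known pair: emit the composed character.
            # Unknown pair (assumed policy): emit both keys literally.
            output.append(COMPOSE.get((pending, key), pending + key))
            pending = None
        elif key in DEAD_KEYS:
            pending = key  # nothing is entered yet
        else:
            output.append(key)  # plain one-to-one mapping
    if pending is not None:
        output.append(pending)  # dead key was the last key pressed
    return "".join(output)
```

Here the same second keystroke yields a different character depending on whether a dead key preceded it, which is what distinguishes this layout from a strictly one-to-one mapping.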

Just as the mapping from characters to glyphs might involve complex, many-to-many mappings, the same is potentially true for keyboard input. For example, it would be possible to have a single keystroke that generated a sequence of several characters, such as <n, g, b>, or <GREEK SMALL ALPHA, COMBINING ROUGH BREATHING MARK, COMBINING ACUTE, COMBINING IOTA SUBSCRIPT>. Similarly, it would be possible for a sequence of keystrokes to generate a single character, perhaps with each keystroke in the sequence changing the previous output.
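Both directions of this many-to-many relationship can be illustrated with a small Python sketch. Everything in it is an assumption made for the example: the key name "F5", the tables EXPANSIONS and REWRITES, and the choice of â and ấ as intermediate and final output are hypothetical and do not describe any particular keyboard.

```python
# Hypothetical tables, for illustration only.
# One keystroke that generates several characters.
EXPANSIONS = {"F5": "ngb"}

# Keystrokes that rewrite the previously generated character:
# (last character of the output, key pressed) -> replacement character.
REWRITES = {
    ("a", "^"): "\u00e2",       # a then ^ gives â
    ("\u00e2", "'"): "\u1ea5",  # â then ' gives ấ
}

def apply_keystroke(text, key):
    """Return the text as it stands after one more keystroke."""
    if key in EXPANSIONS:
        return text + EXPANSIONS[key]
    if text and (text[-1], key) in REWRITES:
        # Replace the previous output instead of appending.
        return text[:-1] + REWRITES[(text[-1], key)]
    return text + key

def type_with_rewrites(keystrokes):
    text = ""
    for key in keystrokes:
        text = apply_keystroke(text, key)
    return text
```

With these tables, the single keystroke "F5" yields three characters, while the three keystrokes a, ^, ' leave a single character in the text, each keystroke having changed the output of the one before it.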

Different input methods might generate exactly the same characters, though in different ways. For example, one may use a single keystroke to generate a given character, while another uses two or more keystrokes (the first, perhaps, being a dead key) to generate that character.

Figure 11: Two input methods: same character, different number of keystrokes
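As a rough sketch of the same idea in Python, the two hypothetical layouts below reach the same character by different routes. A keystroke is modelled as a basic key plus a set of modifier keys; the helper keystroke, the tables DIRECT and DEAD_KEY_SEQUENCES, and the specific key assignments are all assumptions made for this example.

```python
def keystroke(key, *modifiers):
    """A keystroke: a basic key combined with zero or more modifier keys."""
    return (frozenset(modifiers), key)

# Method A (hypothetical): one keystroke per character.
DIRECT = {keystroke("e", "AltGr"): "\u00e9"}  # é

# Method B (hypothetical): a dead key followed by a base key.
DEAD_KEY_SEQUENCES = {(keystroke("'"), keystroke("e")): "\u00e9"}  # é

def method_a(keystrokes):
    # Each keystroke maps directly to one character.
    return "".join(DIRECT[k] for k in keystrokes)

def method_b(keystrokes):
    # The whole two-keystroke sequence maps to one character.
    return DEAD_KEY_SEQUENCES[tuple(keystrokes)]
```

Calling method_a with one keystroke and method_b with two keystrokes returns the same string, so text produced by either layout is indistinguishable once it has been entered.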

Complex input methods can map a single keystroke to different characters, depending on context. For example, in the case of Greek sigma, an input method may use a single key, such as the s key, to enter all forms of sigma, with the input method generating either a FINAL SIGMA or NON-FINAL SIGMA according to the context.

Note that, when the s key is pressed, the sigma is at that point word-final. It is the next character that is entered that determines whether or not the sigma will remain word-final. If another word-forming character is entered, then the FINAL SIGMA is changed to NON-FINAL SIGMA.[3]
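The sigma behaviour just described can be sketched as follows. This is a minimal illustration, not a real input method: the function names are hypothetical, every key other than s is assumed to arrive already mapped to its character, and "word-forming" is approximated with Python's str.isalpha(), which is itself an assumption.

```python
FINAL_SIGMA = "\u03c2"      # ς
NON_FINAL_SIGMA = "\u03c3"  # σ

def press(text, key):
    """Return the text after one keystroke; the s key enters sigma."""
    # The s key always enters a sigma that is word-final for the moment.
    char = FINAL_SIGMA if key == "s" else key
    # If a word-forming character follows a final sigma, that sigma is
    # no longer word-final, so it is rewritten to the non-final form.
    if char.isalpha() and text.endswith(FINAL_SIGMA):
        text = text[:-1] + NON_FINAL_SIGMA
    return text + char

def type_greek(keys):
    text = ""
    for key in keys:
        text = press(text, key)
    return text
```

Typing s alone leaves a final sigma in the text; typing s followed by a letter rewrites it to the non-final form, while a following space leaves it unchanged.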

The key point of this discussion is that mapping keystrokes to characters is potentially a complex process involving many-to-many mappings, just as in the rendering process. Users familiar with systems designed to support English will be accustomed to keyboards that use only one-to-one mappings. But keyboard processing need not be limited in this way, and many keyboard implementations are not.

Copyright notice

(c) Copyright 2003 UNESCO and SIL International Inc.


We will use the term keystroke to refer to the pressing of any basic (non-modifier) key in combination with zero or more modifier keys. By modifier keys, we mean keys such as Alt, Control, Shift, Alt-Graph, Option, and Command.