Sorry for the length of the following; if you're not interested,
skip it. The intention is to bring like-minded parties out of the woodwork;
if you are one, please contact me and we can continue the topic offline.

THIS IS A PREFORMATTED PLAIN-TEXT ASCII DOCUMENT. IT IS DESIGNED TO BE
VIEWED AS-IS IN A FIXED-PITCH FONT. ITS WIDEST LINE IS 79 COLUMNS. IT
CONTAINS NO TABS. IF IT LOOKS MESSY TO YOU, PLEASE FEEL FREE TO PICK UP
A CLEAN COPY AT:

A selection of terminal graphics characters is proposed for Unicode [24]
and ISO 10646 [19] to allow Unicode-based terminal emulation software to
(a) display glyphs that are found on popular types of terminals but
currently are not available in Unicode, and (b) interoperate with other
Unicode applications.

Terminal-host communication was the dominant form of interaction between
human and computer from about 1974 (when CRTs became affordable) to about
1994 (when the Web and Windows took over the mass market). It is still
widespread, especially in large organizations with central computing
facilities, such as universities, hospitals, government agencies, and
corporations, and is expected to remain so for decades to come. There it
plays an important part in applications ranging from software development
and system/network administration, to email and text-based Web access, to
data entry and inquiry, to transaction processing.

A terminal, for purposes of this document, is a device for entry and display
of text in a fixed-pitch font on a screen (or on paper) in which characters
are displayed in rows and columns of fixed size "cells". Terminals
generally display the characters of ASCII [1] or EBCDIC [13], and sometimes
also accented or non-Roman letters (or ideograms), and often also "graphic"
(non-alphabetic, non-digit, non-punctuation) characters for purposes of
line- and box-drawing, mathematics, or other special effects.

In recent years, physical terminals have largely disappeared from the scene,
their functions subsumed into PCs running terminal-emulation software
alongside other applications. Unicode has effectively met the need for
encoding the earth's writing systems, but it is not well suited to terminal
emulation since it lacks some of the required graphics characters.

Without a standard encoding for the missing glyphs, each maker of terminal
emulation software must create or contract for custom fonts with private
encodings. Such fonts are not compatible with other (otherwise compatible)
fonts on the same platform (e.g. when copying and pasting between
applications), nor with each other. Furthermore, should Unicode printers
become standard equipment on PCs, terminal graphics characters will not
print correctly on them.

This document proposes a modest repertoire of terminal graphics characters
to be added to Unicode and ISO 10646, with specific encoding to be decided
by the UTC or other appropriate body, that all makers of fonts, code pages,
and printers can refer to in designing their products, and upon which all
makers of terminal emulation software can base their screen displays.

For best results, this project should be a cooperative effort among those
who care about both terminal emulation (and emulation of particular
terminals) and the Universal Character Set. Unfortunately, in many cases
the actual owners or creators of the original terminal character sets in
question are no longer available for consultation.

which is the basis for numerous PC-oriented so-called ANSI emulations.

Even within this fairly narrow scope, the task of settling on a set of
character-cell terminal graphics for Unicode is complicated by the
well-known problems that affect other preexisting character sets to varying
degrees:

1. Lack of official names for the characters.
2. Lack of definitive, high-quality pictures of the glyphs.
3. Lack of descriptions of the purpose and intended use of the glyphs.
4. Lack of a current registration authority or owner.
5. Questions of unification of glyphs from different terminal makers.
6. End-user demand for specific characters or sets.

The issue of unification is complicated by the fact that many of the
terminal graphics characters are designed to join at cell boundaries to form
"pictures" (such as boxes or forms to be filled out) or large characters
(such as big math symbols) spanning multiple rows and/or columns. The
relationship of similar-looking glyphs for different terminals is difficult
to determine -- e.g. exactly where does a line touch an edge, and at what
angle, and does it make a difference? In linguistic terms, which glyphs may
be considered allographs, and which are distinct graphemes?

This proposal does not require any action for well-known terminal
presentation forms such as double-high and/or double-wide characters, bold,
blinking, inverse, underlining, color, etc, since these are not encoding
issues. In particular, no special code points are needed for double-high or
double-wide characters, such as those seen on the DEC VT100 family of
terminals, nor for compressed characters as seen on Data General and DEC
terminals.

This proposal also does not cover true graphics terminals, such as Tektronix
vector graphics units, DEC ReGIS or Sixel graphics, etc, since these
graphics regimes are not character-cell based.

Note that the graphic characters listed in this proposal rarely, if ever,
appear on keyboard key labels. In general, these characters are never
typed, not even on real terminals, but are displayed when the terminal is
commanded into a special mode; for example, with ISO 2022 [17] character-set
designation and invocation escape sequences.

3. ORGANIZATION

This proposal groups terminal graphic characters into four major categories.
Some categories are complete by definition (e.g. the 2-nibble hex codes, of
which there can be only 256), but others should include space for expansion
as new glyphs are discovered or needed. The categories are:

Math Symbols
Although most math symbols found on terminals are already in Unicode,
certain terminal-based applications rely on the ability to construct large
symbols (integral and summation signs, braces, brackets) from smaller
character-cell-sized pieces.

Line and Box Drawing
Used for data entry, transaction processing, forms-filling, etc, in
markets ranging from car rental and airline reservations, to medical
information systems, to online library catalogs. Although Unicode does
include a basic set (mainly those in the Box Drawing block at U+2500),
some others are missing.

Other Miscellaneous Character-Cell Graphics
Padlocks, stick-figure people, etc, e.g. to indicate the state of the
keyboard and/or host application, as well as mosaic graphics cells,
and assorted pictures and dingbats.

This document lists the terminal graphics characters for the terminals in
Section 2, suggests unifications, and assigns preliminary, temporary
Unicode values from the Private Use Area, for a total of 512 positions,
not fully populated. Obviously the final counts, code values, and block
allocations, including reserved positions, are likely to change as this
proposal evolves.

All new characters proposed in this document should be precomposed, since no
terminals (with the exception of certain APL and ALA terminals) are capable
of composing characters on the fly from nonspacing diacritics or by
overstriking.

4. GRAPHIC REPRESENTATION OF CONTROL CHARACTERS

Several methods are available for "printing" control characters. First,
there is the de facto standard collection of dingbats in the 0x00-0x1F range
of IBM PC Code Page 437 [14]. As shown in Table 4.1 (in which "Code" is the
Unicode value and "IBM" is the IBM code page value, both hexadecimal), this
collection is already adequately covered by Unicode.

(Note that "black" and "white" are used in accordance with Unicode
terminology, where they denote the presence or absence of (black) ink on the
page; however, any colors at all can appear on a terminal screen.)
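To make Table 4.1 concrete, here is a minimal Python sketch of how an
emulator might show bytes 0x00-0x1F as the Code Page 437 dingbats. The
glyph list is the commonly used Unicode mapping for the PC's screen glyphs
(0x00 is assumed to display as a blank), the helper name is hypothetical,
and bytes from 0x20 up are simply decoded with Python's built-in "cp437"
codec.

```python
# Commonly used Unicode equivalents of the CP437 screen glyphs for bytes
# 0x00-0x1F. Byte 0x00 has no visible glyph, so it is shown as a space.
CP437_C0_GLYPHS = (
    "\u0020\u263a\u263b\u2665\u2666\u2663\u2660\u2022"  # 00-07
    "\u25d8\u25cb\u25d9\u2642\u2640\u266a\u266b\u263c"  # 08-0F
    "\u25ba\u25c4\u2195\u203c\u00b6\u00a7\u25ac\u21a8"  # 10-17
    "\u2191\u2193\u2192\u2190\u221f\u2194\u25b2\u25bc"  # 18-1F
)


def show_cp437(data: bytes) -> str:
    """Render bytes the way a PC screen in Code Page 437 would.

    Hypothetical helper: bytes below 0x20 become their dingbats; every
    other byte is decoded with Python's standard "cp437" codec.
    """
    out = []
    for b in data:
        if b < 0x20:
            out.append(CP437_C0_GLYPHS[b])
        else:
            out.append(bytes([b]).decode("cp437"))
    return "".join(out)


# ascii() keeps the output printable on any console.
print(ascii(show_cp437(b"\x03 OK \x1a")))  # '\u2665 OK \u2192'
```

Note that this shows the dingbats in place of the controls, so a heart
and an ETX byte become indistinguishable on the screen, which is exactly
why the abbreviation-based pictures discussed next are more useful.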

More useful in a terminal emulator, however, is the ability to display the
official abbreviation [1,18], or "name", of the control character in a
single cell, as is done by numerous terminals, as well as by data analyzers
and line monitors, which are themselves increasingly implemented in
software on PCs.

Some control characters have two-character abbreviations (such as CR, LF,
HT, FF), while others are three characters (NUL, SOH, DC1, DLE). Some
terminals compress three-letter abbreviations to the two-character forms
shown in Table 4.2. All such terminals, however, display the abbreviations
diagonally in the character cell, as shown in Figure 4.1.

Unicode already has a block of Control Pictures beginning at U+2400, but
(except for "NL" at U+2424) the pictures at U+2400 through U+2421 go
horizontally across the character cell rather than diagonally, which makes
them difficult to distinguish from normal alphanumeric text. A new, parallel
block of C0 control pictures is
needed in which the abbreviations are displayed diagonally. These are
listed in Table 4.2, in which "Code" is the temporary Unicode value, "Name"
is the official (ASCII) abbreviation (and the one used in the Display
Controls character set of the VT220 family [5]), and "2X" is the 2-character
abbreviation (used in the Display Controls font of Televideo [22,23], HP [11],
Perkin Elmer [20], and other terminals).

There is little to gain by defining separate 2- and 3-character glyphs for
control characters that have 3-character names; therefore it is suggested
that the full abbreviation (from the Name column) be used, with the
characters arranged diagonally within each cell (rather than horizontally as
in the U+2400 block), and that the 2X column be ignored.
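The C0 half of this block amounts to a simple lookup. The Python sketch
below lists the official ASCII abbreviations for 0x00-0x1F and assumes,
as the C1 positions cited later in this section suggest, that the C0
control pictures occupy U+E000 through U+E01F in code order. That layout
and the helper name are assumptions, not entries copied from Table 4.2.

```python
# Official ASCII abbreviations for the C0 controls, in code order.
C0_NAMES = (
    "NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI "
    "DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US"
).split()

C0_PICTURE_BASE = 0xE000  # assumed temporary (Private Use) start


def c0_picture(code: int) -> tuple[str, str]:
    """Return (abbreviation, temporary code point) for a C0 control.

    Assumes the diagonal C0 control pictures are laid out at
    U+E000 + code, in the same order as the controls themselves.
    """
    if not 0x00 <= code <= 0x1F:
        raise ValueError(f"not a C0 control: {code:#04x}")
    return C0_NAMES[code], chr(C0_PICTURE_BASE + code)


name, pic = c0_picture(0x1B)
print(name, f"U+{ord(pic):04X}")  # ESC U+E01B
```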

C1 control characters are specified in ISO 6429 and used in the VT220
family of terminals [5] and the Wyse 370 [26], where they are represented
in the right half of the "display controls" font as shown in Table 4.3 (DEC
terminals use the full name, Wyse terminals use the 2X name). As with C0
controls, the "name" is displayed diagonally within the character cell.
Unicode presently includes no C1 control pictures.

Note that three of the C1 control characters have no names (the ones marked
by "(1)", whose pictures would be at U+E020, U+E021, and U+E039). These
positions should be left vacant in case names are assigned to these
characters in a future revision of ISO 6429.
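The C1 positions appear to follow the same pattern: the three vacancies
U+E020, U+E021, and U+E039 correspond to bytes 0x80, 0x81, and 0x99,
which implies a layout of U+E020 + (code - 0x80). A minimal Python sketch
of that inferred mapping, leaving the unnamed positions empty, might look
like this (the helper name is hypothetical):

```python
from typing import Optional

C1_PICTURE_BASE = 0xE020  # inferred from the U+E020/E021/E039 vacancies

# C1 codes without names in ISO 6429, per the note above.
C1_UNASSIGNED = {0x80, 0x81, 0x99}


def c1_picture(code: int) -> Optional[str]:
    """Return the temporary code point for a C1 control picture.

    Returns None for the vacant positions, which are reserved in
    case ISO 6429 later assigns names to those controls.
    """
    if not 0x80 <= code <= 0x9F:
        raise ValueError(f"not a C1 control: {code:#04x}")
    if code in C1_UNASSIGNED:
        return None
    return chr(C1_PICTURE_BASE + (code - 0x80))


print(f"U+{ord(c1_picture(0x9B)):04X}")  # CSI -> U+E03B
```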

As with C0 controls, it is presumed acceptable to encode the full
abbreviation, without the 2-character alternatives for 3-character forms.

Table 4.4 shows the names of control characters unique to EBCDIC (that is,
the ones it does not share with ASCII).

Names for IBM 3270 terminal Orders, LU 1 SCS Control Codes, and Format
Control Orders, which are not already listed as ASCII or EBCDIC control
codes, are shown in Table 4.5, to be used in debugging 3270 data streams.

Notes:
(1) Used for DEL on Televideo, HP. Similar to U+25A9, but without border.
(2) Already in Unicode.

Summary:

115 new characters required for graphic representation of
control characters. Range: U+E000 through U+E09F, 160 positions with
45 vacant for expansion.

5. HEX BYTES

Hexadecimal byte values, 2 hex digits each. These are like display
controls, but cover all 256 8-bit byte values and show the byte code in
hexadecimal rather than the (context-dependent) name. They are intended for
hex debugging (in terminal emulators, line monitors, protocol analyzers,
etc), and the two digits should be arranged diagonally within the character
cell as shown in Figure 5.1.

One glyph is required for each hex byte code 00 through FF, or 256 glyphs
in all. Suggested temporary codes: U+E100 through U+E1FF.
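Because the suggested hex-byte glyphs run in byte order from U+E100,
converting a byte to its glyph is a single addition. The Python sketch
below, with hypothetical helper names, builds a line-monitor style dump
from the temporary codes. Without a font that covers these Private Use
positions, the characters will of course not appear as hex digits.

```python
HEX_BYTE_BASE = 0xE100  # suggested temporary code for byte 0x00


def hex_glyph(b: int) -> str:
    """Map one byte value (0-255) to its temporary hex-byte glyph."""
    if not 0 <= b <= 0xFF:
        raise ValueError(f"not a byte value: {b}")
    return chr(HEX_BYTE_BASE + b)


def hex_line(data: bytes) -> str:
    """Show every byte as a one-cell hex glyph, e.g. for a line monitor."""
    return "".join(hex_glyph(b) for b in data)


line = hex_line(b"\x1b[A")
print([f"U+{ord(c):04X}" for c in line])  # ['U+E11B', 'U+E15B', 'U+E141']
```

The point of one glyph per byte is that a dump stays column-aligned with
the data it describes, which two ordinary hex digits per byte cannot do.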

Note that the SNI "IBM" character set contains glyphs for 01 through 1F,
which are shown sideways. I see no reason to encode these separately, but
others might disagree.

Summary: 256 new characters, U+E100 through U+E1FF.

6. MATH SYMBOLS

Unicode has a generous supply of math symbols, and no doubt more are in the
works. And of course it also includes the Latin, Greek, Fraktur, Hebrew,
and other letters used in mathematical notation.

However, terminal emulators also need special glyphs designed to be joined
together in adjacent character cells, vertically or horizontally, to form
large math symbols such as integrals, summation signs, braces, or brackets.
Some of these pieces, such as the integral top and bottom at U+2320 and
U+2321, already exist. Several other single-cell characters are also
missing, including the small radical sign from the DEC Technical character
set. Table 6.1 lists the needed characters, along with suggested temporary
codes for them. At least one real terminal reference is shown for each
character, in column/row notation or as an IBM Graphic Character Global
Identifier (GCGID) [14]. Note: SB stands for Square Bracket.

Notes:
(1) Also GCGID SS280000 and SS290000.
(2) I'm not too sure about some of the SNI symbols. I'm only guessing at
what the pictures (in the SNI 97801 manual) are supposed to mean; there
are no accompanying character names or text.
(3) These look like variations of lowercase Latin letter n with hook
(small eng), in various sizes, with or without a vertical accent mark
on top. It's not clear to me whether these can be unified with any
existing Unicode characters.

As far as I can tell, none of the SNI letterforms listed above are in
Unicode 2.0.
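As an illustration of how such pieces combine, the Python sketch below
stacks the existing integral top (U+2320) and bottom (U+2321) in
consecutive rows of the same column. The optional middle piece stands
for an extension segment of the kind this section calls for; no code
point is assumed for it, so the caller must supply one. The helper name
is hypothetical.

```python
TOP_HALF_INTEGRAL = "\u2320"
BOTTOM_HALF_INTEGRAL = "\u2321"


def tall_integral(rows: int, extender: str = "") -> list[str]:
    """Build a large integral sign, one character cell per row.

    rows: total height in rows (at least 2).
    extender: glyph for the middle rows; required when rows > 2,
    because no extension piece is assumed here.
    """
    if rows < 2:
        raise ValueError("a built-up integral needs at least 2 rows")
    if rows > 2 and not extender:
        raise ValueError("an extender glyph is needed for rows > 2")
    middle = [extender] * (rows - 2)
    return [TOP_HALF_INTEGRAL, *middle, BOTTOM_HALF_INTEGRAL]


print([f"U+{ord(c):04X}" for c in tall_integral(2)])  # ['U+2320', 'U+2321']
```

Anything taller than two rows depends on a joinable middle piece, which
is the gap this section's additions are meant to fill.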

A particular need addressed by this proposal is the continued ability to
support (sometimes mission-critical) terminal-based forms-filling
applications that also require entry and display of international
characters, as terminals are replaced by PCs. So far, Unicode has provided
the international characters, but not necessarily all the needed
character-cell based forms-drawing capabilities.
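For the simplest forms, the existing Box Drawing block is already enough.
This Python sketch (the helper name is hypothetical) draws a single-line
box around one line of text using U+2500, U+2502, and the four light
corners. It assumes every character occupies exactly one cell, which holds
for these characters in a fixed-pitch terminal font but not for wide
(e.g. ideographic) field text.

```python
H, V = "\u2500", "\u2502"    # light horizontal, light vertical
TL, TR = "\u250c", "\u2510"  # light down-and-right, down-and-left
BL, BR = "\u2514", "\u2518"  # light up-and-right, up-and-left


def boxed(text: str) -> list[str]:
    """Return three rows drawing a light box around one line of text."""
    width = len(text) + 2  # one blank cell of padding on each side
    return [
        TL + H * width + TR,
        V + " " + text + " " + V,
        BL + H * width + BR,
    ]


for row in boxed("Name:"):
    print(ascii(row))  # ascii() keeps the output printable anywhere
```

Boxes with off-center lines, or the shapes described below, are exactly
where this approach runs out of existing characters.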

Some terminals have vertical and horizontal lines that are not centered
within the character cell and are not currently found in Unicode. Others
have black rectangles or other shapes not found in the U+2580 block.

Quadrant
A black rectangle filling one quarter of a cell, with one corner in the
center and the opposite corner at a corner of the cell. So "Quadrant UL"
is the upper left quadrant; "Quadrant UL and UR" is the top half of the
cell (which happens to be coincident with U+2580 and so is not included
here).

Line
Refers to a line that extends all the way to opposite edge(s) of a cell,
designed to be joined to (a) line(s) in the adjacent cell(s).

Bar
Refers to a horizontal line that does not touch any cell edges.

Wedge
Refers to a character cell with a diagonal line connecting opposite
corners, dividing it into two triangles, one black and the other white. Thus
a UL Wedge is similar to U+25E9, except it fills the entire character
cell.

Framus
(Pick a better word!) is a shape composed of two triangles with their
points meeting at the center of the cell to form an X with bars across the
top and bottom, closing the open ends. A black framus has the two
triangles filled in; a white one is in outline form. A framus with center
bar has a horizontal line through the center of the cell.
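One way to see which quadrant combinations are already covered is to treat
a cell as a 4-bit mask of black quadrants. In the Python sketch below, the
bit assignments and names are illustrative assumptions. It returns the
existing U+2580-block character (the half blocks and the full block) or a
space for masks that coincide with one, and None for the combinations that
would need new code points.

```python
from typing import Optional

# Illustrative bit assignments for the four quadrants of a cell.
UL, UR, LL, LR = 1, 2, 4, 8

# Quadrant combinations that coincide with characters already in the
# U+2580 block (or with a plain space, for an all-white cell).
EXISTING = {
    0: " ",
    UL | UR: "\u2580",            # upper half block
    LL | LR: "\u2584",            # lower half block
    UL | LL: "\u258c",            # left half block
    UR | LR: "\u2590",            # right half block
    UL | UR | LL | LR: "\u2588",  # full block
}


def existing_quadrant_char(mask: int) -> Optional[str]:
    """Return an existing character for a quadrant mask, or None.

    None means the combination (e.g. a single quadrant, or two
    diagonally opposite quadrants) has no existing code point.
    """
    if not 0 <= mask <= 15:
        raise ValueError(f"not a quadrant mask: {mask}")
    return EXISTING.get(mask)


missing = [m for m in range(16) if existing_quadrant_char(m) is None]
print(len(missing))  # 10
```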

Notes:
(1) The vertical box lines are near, but not touching, the left and right
edges of the cell, respectively, and are two pixels thick on the H19
screen. Similar to IBM GCGID SF640000 and SF650000, respectively.
(2) The center horizontal scan line is already in Unicode at U+2500.
(3) Only on Zenith models, not original Heathkits.
(4) Full black diamond, with points touching center of each cell wall.
(5) Similar to U+2504 but double rather than triple.

Notes:
(1) The reverse question mark is essential in VT terminal emulation, where
it indicates that an invalid code was received, or a parity or other
error was detected. It also stands for SUB and/or RS in Wyse display
controls mode, and is the glyph for 0xFF in the Televideo Multinational
Character Set [23]. It is also a glyph in the DG Special Graphics
Character Set [2].
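In a software emulator, one way to show this glyph for bad input is to
substitute it at decode time. The Python sketch below registers a codec
error handler that replaces each undecodable byte with one placeholder
character. REVERSE_QUESTION is a hypothetical stand-in: this draft does
not say which temporary code point the glyph would receive.

```python
import codecs

# Hypothetical stand-in for the reverse question mark glyph; the draft
# does not say which temporary code point it would receive.
REVERSE_QUESTION = "\ue0f0"


def reverse_question_handler(err: UnicodeDecodeError):
    """Replace each undecodable byte with one reverse question mark."""
    bad = err.end - err.start
    return REVERSE_QUESTION * bad, err.end


codecs.register_error("reverse_question", reverse_question_handler)

text = b"ok\xffok".decode("utf-8", errors="reverse_question")
print(ascii(text))  # 'ok\ue0f0ok'
```

Emitting one mark per bad byte keeps the display aligned cell for cell
with what was received, much as a terminal shows one mark per bad code.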

Summary:
7 new glyphs, U+E0F0 through U+E0FF, 9 vacant.

9. UNFINISHED BUSINESS

The selection of characters presented in this draft is far from
comprehensive. Hundreds of other terminals from the past 30+ years are
likely to have glyphs or entire character sets covered neither here nor
in Unicode, and these might or might not be important in some application
somewhere. Readers are invited, therefore, to propose any needed
additions, bearing in mind that Unicode code space is not unlimited.

No attempt was made to account for the many Viewdata, Videotex, Minitel,
NAPLPS, or other mosaic graphics character sets. These should be tackled,
if appropriate, by someone who knows something about them.

Several character sets found in the references consulted are ignored here,
fully or in part, due to lack of motivation (nobody has ever asked us to
support them). Obviously these, and any other missing sets, can be
considered if there is a demand.

Siemens Nixdorf Facet
A set of 95 mosaic graphics, but not resembling any of the ISO Videotex
mosaic sets; difficult to describe.

Siemens Nixdorf Klammern
A set of 95 assorted blobs, bracket and brace pieces, clocks, arrows,
hourglasses, and Greek letters, some of which are unique; others can be
unified with existing Unicode characters or characters in this proposal.

Hewlett Packard Line Drawing
Mostly coincident with the Unicode box-drawing set at U+2500, but with a
handful of unique characters, such as single-to-triple box intersections,
single-to-double intersections with wide spacing, etc. These should be
mappable to existing U+25xx glyphs without causing riots in the streets.

Hewlett Packard Big Character Pieces
Thick line segments for drawing large characters, used on the HP-2648.

And no doubt many more...

10. SUMMARY OF PROPOSED ADDITIONAL CHARACTERS

If all the proposed new characters are added to the UCS, this will enable
terminal emulators to fully handle at least the following terminal character
sets, which were not previously covered in full: