Conversion to C++ of the Andrew User Interface System

Abstract: The Andrew User Interface System--formerly called
the Andrew Toolkit--has now been converted to C++. This report describes
the advantages of AUIS in C++, the conversion process, and what we
have learned about C++ in the process. The conversion was possible
because AUIS was written in "Class-C," a set of conventions for object-oriented
coding in C. Our major problems were in resolving name conflicts,
providing for initialization, and supporting dynamic loading.

1. Introduction

The Andrew User Interface System (AUIS) [Palay, 1988] has been one
of the leading graphical user interface systems in the Unix (a trademark
of Unix Systems Laboratory, Inc.) and X Windows world since its introduction
at USENIX in 1988. However, acceptance has been limited somewhat because
it was coded in "Class-C," a special-purpose set of conventions for object-oriented
programming in C. Consequently, at the behest of its members, the
Andrew Consortium spent 1993 converting AUIS to C++. The conversion
effort skirted many traps, but succeeded and may even prove worth the
effort.

Some readers may recognize AUIS by the name Andrew Toolkit or ATK.
The Andrew Toolkit remains a component of AUIS and provides a compound-document
architecture and tools for building applications and new objects. [Borenstein,
1990; Sherman, 1991; Palay, 1992] However, AUIS is considerably more;
beyond ATK it includes complete editor applications for word processing,
source editing, drawings, equations, spreadsheets, fonts, preferences,
and more. The most elaborate application is the Andrew Message System,
a full-featured system for reading, composing, and managing mail and
bulletin boards; it is MIME compatible. AUIS is an open system
with the source code distributed under the X tape license.

A hallmark of AUIS is its architecture for recursive embedding of objects,
which means that the object for one variety of information may be included
within that of another. Figure 1, for example, shows several types
of object embedded in a spreadsheet which is embedded in turn in a
document.

Figure 1. Recursively embedded objects. This screen dump shows
part of a document containing a spreadsheet whose cells have been combined
in various ways and contain (clockwise from upper left) text, equations,
animation, spreadsheet formulas, and a raster image.

Other toolkits are available for Unix and X, but none offer the scope
of Andrew. Motif, OpenLook, the Athena widget set, and other widget
sets provide "interactors" which each manage a small rectangle and provide
callbacks to the application when the user operates the interactor.
The newest and most publicized integrated user interface system is
the Object Linking and Embedding (OLE) component of Microsoft's Windows
version 3.1 [Microsoft, 1992]. This system supports recursive embedding
and even provides that embedded objects may execute in a separate process.
One user interface system that is already in Unix/X/C++ is the Fresco
project [Linton, 1993], which is based on InterViews [Linton, 1989]
and also on ideas from AUIS. Fresco is still far from complete. The
plan is that applications will not be part of Fresco, but will be created
by interested vendors.

2. Objects in the Andrew User Interface System

AUIS utilizes objects in several ways. Some objects in AUIS are "substrates"
in that they can contain other nested objects. The principal examples
are text, spreadsheets and drawings. Into these substrates it is possible
to embed any of these objects as well as the non-substrate objects:
rasters, images, equations, line animations, and a host of others.
In fact, most AUIS "applications" are themselves objects and can be
incorporated in a document or the screen image of another application.

Each visible AUIS object is implemented internally as two objects,
one derived from the "dataobject" class and the other from the "view" class.
Dataobjects retain the information to be displayed and are responsible
for reading/writing the information from/to a datastream. Views are
responsible for displaying the information from a dataobject within
a rectangle on the screen. They also handle interaction with the user
and printing. Splitting visible objects into two internal objects
has the advantage that there can be multiple views on a single data
object, as can happen when a document is viewed in two windows or when,
for instance, a spreadsheet is viewed as both a table and a pie chart.
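
The split can be illustrated with a minimal sketch. The class names
TableData, TableView, and SumView and the Render method are hypothetical
and are not AUIS declarations; the point is only that two views share
one data object and both reflect a change to it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical data object: holds the information, knows nothing about display.
class TableData {
public:
    void Append(int v) { cells.push_back(v); }
    const std::vector<int> &Cells() const { return cells; }
private:
    std::vector<int> cells;
};

// Hypothetical view base: displays a data object that it does not own.
class View {
public:
    explicit View(const TableData *d) : data(d) {}
    virtual ~View() {}
    virtual std::string Render() const = 0;
protected:
    const TableData *data;
};

// Shows every cell, as a table would.
class TableView : public View {
public:
    explicit TableView(const TableData *d) : View(d) {}
    std::string Render() const override {
        std::string out;
        for (int v : data->Cells()) out += "[" + std::to_string(v) + "]";
        return out;
    }
};

// Shows a summary of the same cells, standing in for a chart.
class SumView : public View {
public:
    explicit SumView(const TableData *d) : View(d) {}
    std::string Render() const override {
        long total = 0;
        for (int v : data->Cells()) total += v;
        return "sum=" + std::to_string(total);
    }
};
```

Because neither view copies the cells, an Append on the data object is
visible through both views the next time they render.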

The heart of ATK is its architecture for recursive embedding of objects.
In practice, the embedding is represented by a tree of view objects
where the window is the root and each object is the parent of those
it contains. The architecture defines methods on views that a parent
calls on the child to pass events and other methods which a child calls
on a parent to request future events. With these methods the parent
and child negotiate the sharing of resources such as screen space,
keyboard, mouse, menu, other user input devices, data stream space,
printed page space, execution time and memory, extension language interfaces,
and so on.
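
A much reduced sketch of this negotiation follows. It assumes only two
hypothetical methods, WantFocus (child to parent) and Key (parent to
child), and routes keys directly from the root to the focused view,
which is simpler than what the real architecture does.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical view node in the tree of views.
class View {
public:
    View() : parent(nullptr) {}
    virtual ~View() {}
    void AddChild(View *c) { c->parent = this; children.push_back(c); }
    // Child-to-parent request: by default, pass the request up the tree.
    virtual void WantFocus(View *requestor) {
        if (parent) parent->WantFocus(requestor);
    }
    // Parent-to-child event: by default, record the key that was typed.
    virtual void Key(char c) { typed += c; }
    std::string typed;
protected:
    View *parent;
    std::vector<View *> children;
};

// Hypothetical root of the tree: grants focus requests and delivers keys.
class Window : public View {
public:
    Window() : focus(nullptr) {}
    void WantFocus(View *requestor) override { focus = requestor; }
    void Key(char c) override { if (focus) focus->Key(c); }
private:
    View *focus;
};
```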

The HelloWorld application typically just displays HelloWorld and quits.
The simplest version in AUIS, as shown in Figure 2, does much more:
it displays on screen, prints, can be edited, and is available for cut/copy/paste.
Only a few more lines would be required to change "Hello" to bold-italic.

Figure 2. The helloworld application in the Andrew Toolkit.
The operations labeled "initialize objects" establish a text inset containing
"Hello, world!" Subsequently, other methods of the base class 'application'
will display the image and manage interaction.

Unlike HelloWorld, typical applications display a variety of objects
scattered through a substrate widget such as a text, table, or drawing.
Figure 3 shows the code to place a single button within an Andrew
'layout' object; each additional object requires similar code.

Figure 3. Code to insert a pushbutton into a layout. The text
displayed in the button will be "Next Page". W is the layout width,
so the button will be in the upper right corner and occupy 91x40 units.

One facility of AUIS is an application called createinset which generates
the source code and help files for a new object with a given name.
Although this new object is functional, its real role is to be modified
to provide some new service. Such modification is usually easier than
creation of a new object from scratch. As part of the conversion to
C++, createinset was modified so it now produces code for C++ objects.

3. Advantages of C++

The primary advantage of C++ for AUIS is support of object-oriented
programming, which is the major programming paradigm for AUIS. In
addition, C++ offers a few facilities that seem likely to benefit
AUIS. We are already using inline procedures and we may be able to
utilize multiple inheritance in the near future.

Inline procedures are the appropriate conversion for many operations
that were done with preprocessor macros in C. In general, creating
a function from a macro may not be possible. However, many macros
in AUIS were simple functions to access or assign to object components;
the converter automatically transforms these to the appropriate in-line
code. The macros

Note that the types were automatically derived from the declaration
of name in the object declaration.
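
As an illustration of the kind of transformation described, the sketch
below shows a pair of accessor macros and inline member functions that
could replace them. The class label, its member length, and the macro
names are hypothetical; they are not taken from the AUIS sources.

```cpp
#include <cassert>

// Class-C style (hypothetical): accessors written as preprocessor macros.
//   #define label_GetLength(self)     ((self)->length)
//   #define label_SetLength(self, n)  ((self)->length = (n))

// C++ style: the same accessors as inline member functions. The return and
// parameter types follow from the declaration of the data member.
class label {
public:
    label() : length(0) {}
    inline long GetLength() const { return length; }
    inline void SetLength(long n) { length = n; }
private:
    long length;
};
```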

Multiple inheritance would have aided in several cases in the original
Andrew development. One obvious case is that of views and graphics.
Views are derived from a storage management hierarchy while graphics
are derived from a hierarchy intended to allow the system to utilize
multiple window systems. It was not practical to have either base class
derived from the other, and yet we wanted the convenience of doing
graphics to view objects. With multiple inheritance, we could have
defined a class that inherited from both bases; however, lacking it
we devised a macro-kludge wherein each method of a graphic is defined
as a macro function offered by views. One consequence is the need
to modify both classes for any changes to the graphics method. It
may be too late to revise the system to exploit multiple inheritance,
but other opportunities to do so will arise.
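
The following sketch shows what the multiple-inheritance alternative
might look like. The classes observable and graphic are simplified
stand-ins for the two hierarchies, and their members are assumptions
made for the example.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the storage management hierarchy.
class observable {
public:
    observable() : observers(0) {}
    virtual ~observable() {}
    void AddObserver() { ++observers; }
    int observers;
};

// Hypothetical stand-in for the window-system-independent graphics hierarchy.
// Drawing is recorded in a string so that its effect can be inspected.
class graphic {
public:
    virtual ~graphic() {}
    void MoveTo(long x, long y) {
        log += "M" + std::to_string(x) + "," + std::to_string(y) + " ";
    }
    void DrawLineTo(long x, long y) {
        log += "L" + std::to_string(x) + "," + std::to_string(y) + " ";
    }
    std::string log;
};

// With multiple inheritance a view is both at once, so no forwarding macro
// is needed for each graphics method.
class view : public observable, public graphic {};
```

A new method added to graphic would then be available on views with no
change to the view class.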

4. Converting to C++

In Class-C, header files were preprocessed from a syntax that permitted
declarations of methods and classprocedures, corresponding to C++'s
virtual member functions and static member functions. The conversion
to C++ was accomplished with two scripts written in Ness, the Andrew
extension and string processing language. [Hansen, 1990; Hansen, 1992]

In order to determine which function calls in the C code needed to
be converted to method calls in C++, the converter must determine which
methods are defined in each class in the original system. This information
is extracted from the original header files by one script, called C++Index.
The main script--called C++Conv--is then invoked with the names of
a collection of .c files. These and their corresponding header files
are converted to C++ by local syntactic transformations.

Processing of .c files affected primarily function declarations and
method calls. It was the convention in the old code that functions
were declared with names of the form class__name, that is, the class
name followed by two underlines and then the name of the specific function.
Such declarations were usually converted by changing the double underline
to a double colon: class::name. In addition, the declaration was changed
from old C with parameters declared after the right parenthesis to
ANSI C with declarations in-line within the header:

class::name (int length, ATK *altobj) .

Method calls were formerly disguised as function calls with the affected
object as the first argument:

class_name(obj, 3, NULL)

These were converted to the C++ form with the affected object preceding
the operator:

(obj)->name(3, NULL) .

Note that parentheses are always installed around the leading argument.
They are usually unnecessary, but having them there lets C++Conv avoid
checking to see if they are needed.
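
The actual converter was written in Ness. Purely as an illustration of
a local syntactic transformation of this kind, here is a sketch in C++
using std::regex. The function ConvertCalls and the set of known
method names (standing in for the index built from the header files)
are hypothetical, and the pattern handles only a leading argument that
is a simple identifier.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <set>
#include <string>

// Rewrite class_name(obj, args...) as (obj)->name(args...), but only when
// "class_name" is listed in `methods`. Other calls are copied unchanged.
std::string ConvertCalls(const std::string &src,
                         const std::set<std::string> &methods) {
    // Groups: 1 = class, 2 = method name, 3 = first argument, 4 = comma.
    static const std::regex call(R"(([A-Za-z0-9]+)_(\w+)\(\s*(\w+)\s*(,\s*)?)");
    std::string out;
    std::size_t last = 0;
    std::sregex_iterator it(src.begin(), src.end(), call), end;
    for (; it != end; ++it) {
        const std::smatch &m = *it;
        std::size_t pos = static_cast<std::size_t>(m.position(0));
        out.append(src, last, pos - last);          // text before the match
        if (methods.count(m[1].str() + "_" + m[2].str()))
            out += "(" + m[3].str() + ")->" + m[2].str() + "(";
        else
            out += m.str(0);                        // not a known method
        last = pos + static_cast<std::size_t>(m.length(0));
    }
    out.append(src, last, std::string::npos);       // remaining text
    return out;
}
```

As in the text, the leading argument is always parenthesized, so the
sketch never has to decide whether parentheses are needed.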

A crucial trick in the conversion simplified parsing: a first pass
converted every comment and string to a fixed length value containing
an index into a table. This meant that pattern match searches during
processing would find no spurious matches within comments or strings.
After processing, the fixed length values were re-expanded to their
original values.
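
The masking pass can be sketched as follows. The token format (a
backquote, six digits, a backquote), the function names Mask and
Unmask, and the decision to ignore character literals are assumptions
of this sketch rather than details of the original scripts.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

const std::size_t kTokenLen = 8;  // backquote + 6 digits + backquote

// Replace each /* ... */ comment and "..." string by a fixed-length token
// holding its index in `table`. Character literals are not handled.
std::string Mask(const std::string &src, std::vector<std::string> &table) {
    std::string out;
    std::size_t i = 0;
    while (i < src.size()) {
        std::size_t end = std::string::npos;
        if (src.compare(i, 2, "/*") == 0) {
            std::size_t close = src.find("*/", i + 2);
            end = (close == std::string::npos) ? src.size() : close + 2;
        } else if (src[i] == '"') {
            std::size_t j = i + 1;
            while (j < src.size() && src[j] != '"')
                j += (src[j] == '\\') ? 2 : 1;     // skip escaped characters
            end = (j < src.size()) ? j + 1 : src.size();
        }
        if (end == std::string::npos) { out += src[i++]; continue; }
        char token[16];
        std::snprintf(token, sizeof token, "`%06u`",
                      static_cast<unsigned>(table.size()));
        table.push_back(src.substr(i, end - i));
        out += token;
        i = end;
    }
    return out;
}

// Expand every token back to the text it replaced.
std::string Unmask(const std::string &src, const std::vector<std::string> &table) {
    std::string out;
    std::size_t i = 0;
    while (i < src.size()) {
        if (src[i] == '`' && i + kTokenLen <= src.size() &&
            src[i + kTokenLen - 1] == '`') {
            out += table.at(std::stoul(src.substr(i + 1, 6)));
            i += kTokenLen;
        } else {
            out += src[i++];
        }
    }
    return out;
}
```

Between Mask and Unmask, a pattern search for a call such as a_b(c)
cannot match inside a comment or string, because none of that text is
present.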

The converter did not attempt global parsing of the C code because
this would only have been necessary to determine precise type information.
Instead, the compiler was employed as an adjunct to the converter
to find all the type errors. These were then corrected manually.
In many cases the correction was a revision of the code that went well
beyond what any converter could have done.

5. Problems posed by C++

The Class-C conventions provided an object oriented environment whose
features are largely a subset of those of C++, so conversion was more
straightforward than would be conversion of arbitrary C code. Nonetheless,
we faced numerous problems. Some of these were exacerbated by our
desire to continue utilizing dynamic loading as we have done in Class-C.
Dynamic loading is essential during system development since it drastically
reduces the time for the compile-link-test cycle. It is valuable in
production use because of the large number of objects implemented within
the system. If all objects were always linked with the entire system,
the system would be an enormous file and would take considerably longer
to load; moreover, installation of new objects would be more difficult.
As it is, users can have their own libraries of objects without requiring
a complete copy of the base system.

Where C++ lacked services, they have been implemented in a base class,
ATK, from which all Andrew Toolkit objects must derive. (Objects derived
from other bases or none at all can certainly be used, but they will
not have these services.)

Class initialization. In monolithic systems, the main program
can call on an initialization function in each class. In AUIS, the
main program is not necessarily aware of all classes that will be utilized
during an execution; it could be considerably wasteful to initialize
several hundred unused classes. (And it may be impossible for the
main program to initialize all dynamically loadable classes.) In Class-C,
each class had a class procedure InitializeClass that was automatically
called before any execution of code in the class. To emulate this
mechanism in C++, the convention is that a class is initialized the
first time one of its functions is called. Each constructor and static
member function must include as its first operation the statement:

ATKinit ;

Methods do not need this statement because a constructor must have
been called in order for a method to be applicable. The initialization
routine called by ATKinit may cause initialization of other classes,
so all will be initialized before they are used. There is currently
no detection of circularity in initialization; a class is marked
as having been initialized before its routine possibly initializes other
classes. (Perhaps something more stringent would be useful.)
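
A sketch of the convention follows. The expansion given for ATKinit is
an assumption made for this example (the actual macro is not shown in
this report), and the classes fontdesc and style with their members
are placeholders. The two class initializers deliberately refer to each
other to show that marking a class before running its initializer
keeps a cycle from recursing forever.

```cpp
#include <cassert>

// Assumed expansion for this sketch only: mark the class first, then run
// its class initialization routine exactly once.
#define ATKinit \
    do { if (!initialized) { initialized = true; InitializeClass(); } } while (0)

class fontdesc {
public:
    fontdesc() { ATKinit; }
    static int Count() { ATKinit; return inits; }  // static member function
private:
    static bool InitializeClass();
    static bool initialized;
    static int inits;   // how many times InitializeClass has run
};

class style {
public:
    style() { ATKinit; }
    static int Count() { ATKinit; return inits; }
private:
    static bool InitializeClass();
    static bool initialized;
    static int inits;
};

bool fontdesc::initialized = false;
int fontdesc::inits = 0;
bool style::initialized = false;
int style::inits = 0;

// Each class initializer uses the other class, forming a cycle.
bool fontdesc::InitializeClass() { ++inits; style::Count(); return true; }
bool style::InitializeClass() { ++inits; fontdesc::Count(); return true; }
```

Nothing runs before main in this scheme; a class that is never used is
never initialized.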

We considered introducing file scope objects whose constructors would
implement the initialization for a class. However, several current
C++ implementations construct all file scope objects before main is
called. Since ATK is a very large system, we believed that lazy execution
of class initialization code was required.

Creation by type name. When an AUIS data stream is read in,
the type of each subordinate object is denoted with a character string
giving the name of the object class. The ATK class provides a static
member function ATK::NewObject whose argument is a character string
and whose value is an object of the named class.

In order to implement ATK::NewObject, each class of objects must be
registered. A class definition prepares for this by including in the
.C file a call on the ATKdefineRegistry macro. This macro creates a
table entry which is installed in a central table when the main program
or loader calls ATKregister for the class.
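
The mechanism can be sketched as a linked table of entries, each with
a name and a creation function. The names Base, RegistryEntry,
DEFINE_REGISTRY, Register, RegisterAll, and NewObject below are
hypothetical simplifications; they are not the signatures of the ATK
macros and functions named in the text.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical root class for the sketch.
class Base {
public:
    virtual ~Base() {}
    virtual const char *TypeName() const = 0;
};

// One table entry per class: its name and a function that creates it.
struct RegistryEntry {
    const char *name;
    Base *(*create)();
    RegistryEntry *next;
};

static RegistryEntry *registry = nullptr;   // head of the central table

// Install an entry in the central table; repeated calls are ignored.
void Register(RegistryEntry *entry) {
    for (RegistryEntry *e = registry; e; e = e->next)
        if (e == entry) return;
    entry->next = registry;
    registry = entry;
}

// Create an object given the name of its class, or return null.
Base *NewObject(const char *name) {
    for (RegistryEntry *e = registry; e; e = e->next)
        if (std::strcmp(e->name, name) == 0) return e->create();
    return nullptr;
}

// Placed in the source file of a class: builds the entry, does not install it.
#define DEFINE_REGISTRY(cls) \
    static Base *cls##_create() { return new cls; } \
    static RegistryEntry cls##_entry = { #cls, cls##_create, nullptr }

class raster : public Base {
public:
    const char *TypeName() const override { return "raster"; }
};
DEFINE_REGISTRY(raster);

class text : public Base {
public:
    const char *TypeName() const override { return "text"; }
};
DEFINE_REGISTRY(text);

// The kind of function a main program or loader would call.
void RegisterAll() {
    Register(&raster_entry);
    Register(&text_entry);
}
```

A data stream reader would call NewObject with each class name it
encounters and treat a null result as an unknown class.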

Object initialization. The Class-C method of initializing objects
was for each class to provide an initialization function with the following
signature:

boolean classname_InitializeObject(struct classname *self)

Obviously the return of a boolean indicating success or failure would
not map directly to the use of constructors in C++, since C++ constructors
have no return type. The expected way for constructors to fail is
to throw an exception. However, exceptions are not widely implemented
yet. Indeed, initially none of the C++ compilers available to us had
working exceptions. For the time being, InitializeObject methods are
converted to constructors, and the return statements are converted
to a macro which simply prints an error message if the value passed
indicates failure.
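
A sketch of such a conversion is shown below. The macro name
CONSTRUCTOR_RESULT, the failure counter, and the class buffer are
hypothetical; the report does not give the name or text of the real
macro.

```cpp
#include <cassert>
#include <iostream>

static int failures = 0;  // counts reported failures so the sketch can be checked

// Stands in for "return TRUE" / "return FALSE" in a converted constructor:
// report a failure, then leave the constructor.
#define CONSTRUCTOR_RESULT(ok)                                 \
    do {                                                       \
        if (!(ok)) {                                           \
            ++failures;                                        \
            std::cerr << "object initialization failed\n";     \
        }                                                      \
        return;                                                \
    } while (0)

class buffer {
public:
    explicit buffer(long size) : data(nullptr) {
        if (size <= 0) CONSTRUCTOR_RESULT(false);   // was: return FALSE;
        data = new char[size];
        CONSTRUCTOR_RESULT(true);                   // was: return TRUE;
    }
    ~buffer() { delete[] data; }
    buffer(const buffer &) = delete;
    buffer &operator=(const buffer &) = delete;
    bool Valid() const { return data != nullptr; }
private:
    char *data;
};
```

The caller still receives an object after a failure, which is the
weakness that exceptions would remove.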

Header file incompatibilities. Header files in various compilers
and operating systems are incompatible with C++. For example, the
IBM AIX 3.1.5 header files used the keyword "new" as a parameter name.
The Cfront 2.1 and GNU C++ implementations we had available did not
remedy this situation. Moreover, some functions which aren't specified
by any standard simply had no prototype. One result was that we factored
out network socket code into C source files. We also adopted a coding
standard requiring inclusion of andrewos.h as the first included file.
This header file includes a number of standard system header files,
doing whatever is necessary to address any failings of the header files
with respect to C++ compatibility. The Class-C to C++ converter imposed
this standard on all converted source files.

Nested types. An attempt was made to use nested types. In particular
several classes used function pointers for callbacks, and it seemed
sensible to provide typedefs for these pointer types within the class
declaration. This approach was abandoned when we discovered the GNU
C++ compiler had several bugs in this area and the Cfront 2.1 compiler
didn't implement nested types at all. We settled for manual name scoping
of the typedefs by prepending classname_ to the typedefs.
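
For example, a callback type for a hypothetical class menulist would be
declared at file scope with the class name prepended, as in this
sketch (the class, its methods, and the function Twice are invented
for the illustration).

```cpp
#include <cassert>

// Adopted form: the callback type lives at file scope with the class name
// prepended by hand. The abandoned form would have been a typedef nested
// inside the class.
typedef long (*menulist_callback)(long rock);

class menulist {
public:
    menulist() : handler(nullptr) {}
    void SetHandler(menulist_callback h) { handler = h; }
    // Call the handler if one is set; -1 means no handler was installed.
    long Fire(long rock) { return handler ? handler(rock) : -1; }
private:
    menulist_callback handler;
};

// An example callback with the required signature.
long Twice(long rock) { return 2 * rock; }
```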

Inheritance. In C++ all names in the scope of a class are inherited
by derived classes. In Class-C, however, only ordinary methods were
inherited; access to data members of base classes was via a "header"
member as in self->header.dataobject.id. Conversion led to some silent
name clashes between base and derived classes. Clashes with class
procedure names were harmless, since our original code would never
try to access the wrong version of the function. Data conflicts were
hazardous, however, since initially the converter lost the information
about which instance of the variable name was desired. Constructs
like self->header.dataobject.id were converted to self->id, so a
derived class version of id might be silently substituted for the base
class version. To resolve the problem, the converter was fixed to
retain the information; the example converts to this->dataobject::id.
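
The hazard and its resolution can be seen in a small sketch. The
derived class text and the initial values are invented for the
example; only the qualified form this->dataobject::id comes from the
discussion above.

```cpp
#include <cassert>

class dataobject {
public:
    dataobject() : id(1) {}
    long id;
};

class text : public dataobject {
public:
    text() : id(2) {}
    long id;  // derived-class member that hides dataobject::id

    // What the early converter produced: silently selects text::id.
    long UnqualifiedId() { return this->id; }
    // What the fixed converter produces: names the base class member.
    long BaseId() { return this->dataobject::id; }
};
```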

Name scope. The introduction of nested types in some compilers
broke some code where structures, enums, and unions were defined inside
structures or classes, or where the first reference to a type was
within a struct or class. These nested types were now placed in the
scope of the struct or class, instead of in the global scope as before.
To avoid this problem the converter was modified to warn of type definitions
within structs and classes. Where these occurred we manually either
provided a forward declaration or moved the definition of the type
outside the class.

In Class-C each class has separate name spaces for member functions
(methods, macros and class procedures) and data. This led to situations
where a function and a data member of a class had the same name. The
converter was extended to warn about these name conflicts and they
were resolved manually.

In C++ the names of data members are in the scope of member functions
effectively between file scope variables and arguments. This means
that an unqualified reference to a global variable could be silently
overridden by a class member of the same name. The converter resolved
this by utilizing :: to ensure that any potential conflicts would be
resolved automatically or result in a compiler error. In Class-C all
class data references were to structure members, so the converter added
:: before any name of a class member which was not preceded by '.'
or '->' within the body of member functions. Unfortunately the case
of local variables shadowing class members also triggered the addition
of ::, so a compiler-detectable syntax error resulted in this case.
Avoiding this problem would have required full type information to
distinguish between local variable declarations and statements.
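
The scoping rule and the effect of the added :: are illustrated below.
The names depth and tree are hypothetical.

```cpp
#include <cassert>

int depth = 100;  // file scope variable

class tree {
public:
    tree() : depth(3) {}
    int depth;    // data member with the same name as the global

    // An unqualified name inside a member function finds the member.
    int Unqualified() { return depth; }
    // The :: added by the converter restores the reference to the global.
    int Global() { return ::depth; }
    // A local variable shadows the member. Adding :: to this declaration
    // would give "int ::depth = 7;", which a compiler rejects.
    int Local() { int depth = 7; return depth; }
};
```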

Conflicts also arose from the use of structs and functions of the
same name, since the compiler thought the function call was a call
to the constructor. This problem was avoided by manually renaming
the structure or function where possible, and by making sure that a
prototype for the function was seen before the call.

Multiple inheritance. ATK is a "single root class" toolkit.
In Class-C the root class didn't exist as such, but there was a struct
basicobject, to which any object could be cast and still provide type
information. This proved adequate since Class-C supported only single
inheritance. When the ATK runtime system for C++ was designed, an explicit
root class seemed the best solution. Now that we have started to look
at making it safe for client code to use multiple inheritance with
ATK classes, we have discovered some problems. If the root class ATK
is derived non-virtually, multiple instances will be included in each
derived class. In order to cast a derived class pointer up to an
ATK pointer in this case an explicit series of casts will be needed
to pick one of the ATK instances. It would then be impossible to cast
the pointer back to the original type without knowing the exact sequence
of casts used to create the ATK pointer. Not only is this dangerous,
but the current C++ definition makes it impossible if the derivation
from ATK is virtual. (We hope that RTTI will allow down casts from
a virtual base.)
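
The two situations can be sketched with a hypothetical root class Root
in place of ATK. The first half shows the two distinct root instances
under non-virtual derivation. The second half uses dynamic_cast as
later defined in standard C++, which does permit the down cast from a
virtual base when the base is polymorphic; whether ATK would adopt it
is not something this sketch assumes.

```cpp
#include <cassert>

class Root {
public:
    virtual ~Root() {}
};

// Non-virtual derivation: AB contains two separate Root instances.
class A : public Root {};
class B : public Root {};
class AB : public A, public B {};

// "Root *r = &ab;" would be ambiguous, so a path must be chosen explicitly.
Root *ViaA(AB *p) { return static_cast<A *>(p); }
Root *ViaB(AB *p) { return static_cast<B *>(p); }

// Virtual derivation: VAB contains a single shared Root instance.
class VA : public virtual Root {};
class VB : public virtual Root {};
class VAB : public VA, public VB {};

// A static_cast from Root* to VAB* is not allowed for a virtual base;
// a run-time checked cast is needed instead.
VAB *Down(Root *r) { return dynamic_cast<VAB *>(r); }
```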

Run-time systems. Class-C provides run time type information,
virtual constructors, and dynamic object construction by class name.
A separate section of this paper will address our design considerations
for dynamic loading in C++.

Run time type information for ATK in C++ (of the same sort as the RTTI
proposal before the C++ ANSI committee) is provided via a common base
class (ATK) and a class registry.

The root class ATK provides static methods to display a message or
throw an exception on failure of a constructor, create a new object
given a string representing the class name, compare two classes for
a base/derived class relationship by name, query by name whether a
given class has been registered, load a class by name, or register
a class. A single virtual function ATKregistry returns a pointer to
the ATKregistryEntry for the object's class. This function is implemented
by the ATKdefineRegistry macro described below. Other methods of
the ATK class provide for accessing the class name of an object, creating
a new object of the same class, and testing an object for a base/derived
relationship with another object or class.

An ATKregistryEntry structure for each class contains the class name,
a pointer to a function to create a new instance of the class, a pointer
to the class initialization function, a list of the parent classes,
and a pointer to the next class in the registry. Currently the run time
system is limited to single inheritance. This is because the function
to create a new instance returns an ATK *, thus without compiler support
casting it down to the appropriate type would be impossible if the
class is derived from ATK multiply or virtually.

The ATKregistryEntry for a class is defined with the macro ATKdefineRegistry(classname,
baseclassname, classinitfunction) in the top level of the source file
implementing the class classname. The ATKregister(classname) macro
is used to enter the class in the class registry. Generally a source
file with a function containing the ATKregister calls is generated
automatically by a program, given a list of the desired classes and/or
libraries. The generated function is then called from the main() function
of the program.

Future work will probably include phasing out the use of the C++ ATK
run time type information system in favor of the ANSI standard support.
One particular feature the C++ ATK system lacks is the safety of the
proposed checked cast.

Dynamic loading. Class-C provides very flexible, on-demand
dynamic loading. The header file for a class is sufficient to compile
and run code utilizing it. During execution the class is dynamically
loaded when the code first executes a "class procedure" (constructor
or static member function) of the class. Another facility offered
is to create an object of a class from a C string giving the class's
name. With C++, dynamic loading is more difficult and less portable;
the tricks used in Class-C involved preprocessor definition of function
names, but static member functions are usually called with "::" qualifiers
and there is no good way to replace them with preprocessor magic.
In consequence, dynamic loading in the C++ version will be restricted
to loading a class given its string name. Methods can be applied to
objects of loaded classes only if they are virtual methods of a base
class linked with the system.

Weak vs. strong typing. AUIS code in Class-C is primarily based
on traditional, non-ANSI C without function prototypes. The C++ converter
automatically added prototypes and the C++ source was modified by hand
where compilation using these revealed type errors. Many such problems
occurred with function pointers because the original code assumed that
function pointer values are interchangeable. For instance, the "proctable"
stores pairs: name and function pointer. The canonical prototype
for these functions is (void foo(struct basicobject *self, long rock)).
However, sometimes these functions return values and often the rock
was a pointer. Casts were necessary in many places to force the code
to compile. Almost always the actual function was defined to take
a pointer to a derived class of view, for instance textview, figview,
rasterview, and so on. However, passing a derived object in this fashion
will break with multiple inheritance: it may not be possible for the
recipient function to access the passed object as a view if it derives
from other classes as well. We believe that the only truly type-safe
solution requires templates.
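
One possible shape for a template-based solution is sketched below; it
is an assumption of this example, not a design the report commits to.
The adapter Adapt, the table layout, and the function
textview_ForwardChar are hypothetical. The adapter gives every stored
function the canonical signature and performs a checked conversion to
the derived class, so no function pointer casts are needed.

```cpp
#include <cassert>

class view {
public:
    virtual ~view() {}
};

class textview : public view {
public:
    long dot = 0;   // hypothetical cursor position
};

// Canonical signature stored in the table.
typedef void (*proc)(view *self, long rock);

struct procentry {
    const char *name;
    proc fn;
};

// A function written against the derived class, as most were.
void textview_ForwardChar(textview *self, long rock) { self->dot += rock; }

// Adapter with the canonical signature. It converts the view to T with a
// run-time check and does nothing if the object is not a T.
template <class T, void (*F)(T *, long)>
void Adapt(view *self, long rock) {
    if (T *t = dynamic_cast<T *>(self)) F(t, rock);
}

procentry proctable[] = {
    { "textview-forward-char", Adapt<textview, textview_ForwardChar> },
};
```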

Memory management. Both of the main classes derived from class ATK
use reference counting. In consequence, such objects cannot be terminated
with the normal C++ delete operator. Instead they must apply the inherited
method Destroy. For the same reason, objects should not be declared
automatic. (Pointers to objects can be automatic.)
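
A minimal sketch of reference counting with a Destroy method follows.
The class counted, the Reference method, and the live counter are
hypothetical, and making the destructor protected is one possible way
to have the compiler enforce the rules just stated; the report does
not say that AUIS does so.

```cpp
#include <cassert>

class counted {
public:
    counted() : refs(1) { ++live; }      // the creator holds one reference
    void Reference() { ++refs; }         // another holder appears
    void Destroy() {                     // a holder lets go
        if (--refs == 0) delete this;
    }
    static int live;                     // objects currently allocated
protected:
    // Protected, so client code can neither write "delete p" nor declare
    // an automatic object of this class.
    virtual ~counted() { --live; }
private:
    int refs;
};

int counted::live = 0;
```

The object is freed only when the last holder calls Destroy.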

6. Performance

There should be little difference in performance between the C++ and
C versions of AUIS because much of the processing is straight C code
without method calls; and even where there are method calls, the code
generated should be similar in both cases. Differences did occur,
however, because two different compilers were used, a native compiler
for the C version and g++ for the C++ version. We coded three tests
in Ness, with test cases selected to exercise different aspects of
object oriented programming.

Test 1 - Count. This test counted down an integer variable
from twenty-thousand to zero. It made no method calls and thus measured
the quality of code created by the C or C++ compiler.

Test 2 - New. Three thousand objects were created. This measured
primarily the object creation mechanism, but some methods were called
during object initialization.

Test 3 - Dup(n). A string containing styled text was concatenated
with itself n times in the form

s := s ~ s

This tested many method calls and object creations as the
styles were copied.

Results are reported for two platforms in Table 1, where all measurements
are in seconds. Several runs were made and the lowest value is reported
as being the least likely to have been affected by other processes.
In most cases the other values were within three percent of the lowest.

Platform        Test     C       C++

PMAX/Ultrix     Count    1.08    1.26
                New      1.04    0.80
                Dup-6    1.26    1.41
                Dup-8    4.57    30.32

RS6000/AIX3.2   Count    0.73    0.52
                New      0.83    0.47
                Dup-8    3.01    3.25

Table 1. Execution times for three tests (seconds).

The Count test shows that g++ produced faster code for the RS6000 and
slower code on the PMAX. The New test, however, shows that creating
objects is faster with C++. Possibly it is using a faster malloc package.
In both cases, the Dup test showed the C code faster. On the PMAX,
the parameter was reduced from eight to six since it seemed possible
that page thrashing accounted for the discrepancy. Even the lower
parameter, however, showed that the C code was faster.

7. Availability

As of February, 1994, there are two versions of the Andrew User Interface
System. Version 6.2 is the old AUIS version in C as it was released
to Andrew Consortium members in January, 1993 (although then numbered
5.2). It is freely available for exploitation both private and commercial.
The newer version, Version 7.1 in C++, is being distributed to members
of the Consortium for their use and will be released for general use
at a later date. To try AUIS from any internet workstation, give the
command

finger @atk.itc.cmu.edu

If you do acquire the Andrew User Interface System, you will find yourself
with an excellent environment for word processing, editing program
source text, and many other realms. You will also have the capability
to extend this environment in new and imaginative ways. If you do
the latter, we would be delighted to have you submit your work for
incorporation into the AUIS distribution so it can be enjoyed by all.