This paper proposes a possible design for the run-time object and class
layouts for the language described in Object-Oriented Extensions to
Pascal [1]. This language has a brief description in [2]. It is defined
to be an extension to the languages Pascal [3] and Extended Pascal [4]. A brief
and helpful description of the latter is found in [5]. In short, the Object
Extensions to Pascal (here abbreviated OOPascal) describes an object oriented
language with reference semantics for objects. The type system includes an
asymmetric form of multiple inheritance. In this form of inheritance, all
instantiated objects have a built in class Root as one of their
ancestors. Other classes (Property classes) from independently rooted trees
may be "mixed in" to the main (concrete) hierarchy to add functionality. The
author describes a simple design for object and class layout in the run-time
system of such a language. It should be emphasized at the beginning that this
is in no way to be interpreted as a prescribed method, and that implementors
are free to implement the abstract notion of a "reference variable" in any
appropriate way. It is assumed that the reader is familiar with the Object
Extensions to Pascal and is generally familiar with the main features of
Extended Pascal. Papers [2] and [5] form an elementary introduction and
reports [1] and [4] are definitive.

The Object-Oriented Extensions to Pascal is a hybrid object-oriented language
exhibiting multiple inheritance, reference semantics for objects, run-time type
inquiry, safe coercion, and a novel type view system. In contrast to C++ and
Eiffel, which have symmetric multiple inheritance systems, in OOPascal there
are two different kinds of classes, concrete and property. All instantiated
objects have as one of their types the class Root, which forms the
single root of the concrete part of the class structure. A class may be formed
using inheritance from many other classes but only one of them may be a
concrete class, and if the new class is to be a property class then none of the
parents may be concrete.

Classes defining properties of objects, but not objects themselves may be
defined and "mixed in" to the concrete hierarchy to provide services in a
uniform way. The property classes mixed in must all come from independently
rooted hierarchies. Thus, one may define a class List, derived from
Root, as a typical linked list structure and may define a property
class Sortable, providing element comparison methods. Then the class
Sortable_List may be defined using multiple inheritance from the
concrete class List and the Property class Sortable. Since
the expectation is that the features of a class defined in the concrete
hierarchy will be accessed more frequently than those from the property parts,
it is desirable to devise an implementation methodology that makes access to
the former as fast as possible.

Objects are referenced via "reference variables," which are similar in some
ways to pointers in that they imply indirect semantics and a "heap-like"
implementation. The implementation is not revealed to the programmer however,
and the notions of pointers and references are kept completely distinct. The
semantics of a reference variable is that it references the value "as a whole",
rather than a sub-object as is the case in C++ in some situations.

The language also provides an operator IS to query an object
as to its actual (polymorphic) type. We may thus inquire if anObject IS
List, and get a Boolean reply. We may also inquire if anObject IS
Sortable. In general these tests may only be performed at runtime as a
variable declared to be of a class type may hold a reference to a value of any
derived type.

We may also coerce a reference variable to a derived type. This is necessary
if, for example, our List class was declared to hold Root
objects and we have been inserting Expression objects into a value of
type List. When these objects are retrieved from the list the only
type known to the compiler is Root, but the programmer knows them to
be of type Expression. To regain the full protocol of
Expression for these objects the programmer may coerce them to type
Expression. The compiler will insert run-time checks as necessary to
guard against errors.

Finally, implementations may be complicated by the fact that we may define
variables to be of a property class type and objects having these properties
may be referred to by these variables. We may have a Sortable_List
object, myList, and a variable sortableItem of type
Sortable. The assignment

sortableItem := myList;

is
legal.

The above features detail the main features complicating the implementation.
On the other hand, the builder is aided by the fact that the notion of a
reference variable, which is the only means of accessing a value of a class
type, is completely abstract. The implementation of reference is not revealed
to the programmer, unlike the case with pointer variables, which are otherwise
similar. Thus the implementor is free to use whatever means are required to
represent the features of objects.

One of the major goals in the run-time design of object-oriented languages is
to minimize the time required to access object features, both the data elements
and the methods. If objects utilize reference semantics then the best that can
be expected in general is a fixed (compile-time) offset from a dereferenced
pointer. Many systems employ more elaborate, albeit slower, mechanisms. In
particular, multiple-inheritance complicates the design of the run-time system
as it is difficult to place an object feature at a particular offset and
guarantee that the offset will be unchanged in objects instantiated from
subclasses.

In seeking a simple implementation strategy, we would like to use fixed,
compile-time offsets as much as possible. This is the typical mechanism with
ordinary records, though the presence of schemata in Extended Pascal
complicates this to some extent. Since schemata permit structures to have a
size that is known only at run-time, some implementations do not use fixed
offsets for all record structures, but employ run-time computations via
something like a dope vector. It is, however, possible to put the dope vectors
at fixed offsets from some base, which might, in extreme cases, need to be
computed at runtime.

In any case, we would like to provide a solution that does not employ searches
of run-time records with associated comparisons made to guide further steps of
the search. While it may be necessary to traverse run-time links, we would
like to do so only when the number of steps along the chain is known to the
compiler. While we must, of course, make simple comparisons to do so, we need
not fetch information on which to base the comparisons.

As mentioned in Section 2, it is also desirable to optimize access to features
defined in the concrete part of the hierarchy, perhaps at the expense of
slowing down access to features defined in property classes. The paper [6] by
Brown describes a scheme for implementation of symmetric multiple inheritance,
in which the language does not guide the implementor in which of the parents of
a class is more important. Similarly, [7] by Stroustrup, gives implementation
details for multiple inheritance in C++. Here we seek something more
specialized.

I will make the following assumptions concerning the implementation as well as
the probable style of programming in OOPascal. First, it is assumed that, in
general, a given object will be referenced by several distinct variables
(aliases), but most of these variables will be declared to have concrete class
types with relatively few property class variables. Second, the general style
of programming will put most of the features of most classes into the concrete
part of the hierarchy, rather than the property parts. Third, I assume that
the dynamic storage used to hold object values is such that values will not
physically move during a single run of the program. This last assumption is,
of course, not met for generational garbage collectors and some heap management
systems, but relatively simple modifications of the following may be made in
these cases. Note that garbage collection is not mandated by the language
extensions, though it may be implemented if desired.

Consider the Example 1, which is a simple, abstract, case that will be used
and extended to illustrate the implementation strategy. We assume that
property class P and concrete class A have been previously declared. Note that
we have not listed instance variable types nor method parameters.

Also, while we have used only procedures, methods may be functions as
well. Note that the class AA inherits a constructor create
from Root. We also assume that it inherits a method method_a
from its concrete parent A and that P declares
method_p. Finally, we assume that class A has been constructed
without use of any property classes anywhere in its hierarchy.

The solution here employs two types of run-time records. The first
(object-records) represents the objects themselves. There is one or several
such records for each object, one for each independent hierarchy of which its
class is composed. If an object is created from a class that employs multiple
inheritance, then the object's run time record is composed of several segments,
linked together. The first segment represents all instance variables, but not
the methods, derived from the concrete part of the hierarchy and the others
represent the various property classes.

The second type of run time record (class-records) represents run-time
information associated with classes themselves. These class records hold
pointers to the code implementing methods (including constructors and
destructors) of the class as well as a pointer to the record representing one
of the parents of the class. If the class represented is a concrete class this
pointer is to the unique concrete parent. If it is a property class then this
may be to any of its parents. The layout of each record is fixed by the
compiler and does not vary at runtime. Likewise the links between the records
do not vary at runtime for any given object. The class-records logically exist
for the life of a run-unit. Object-records are created when a concrete class
is sent a constructor message as in

myList := List.create;

They
become undefined when the object is sent a destructor message as:

myList.destroy;

The first segment of the object-record also contains a pointer to the concrete
class of which the object is a member (its own-class), and a pointer to one of
the property classes which was mixed-in to form the own-class.

Each other segment of the object-record chain contains a pointer to a property
class associated with that segment as well as a pointer to the next segment of
the object-record.

Note that in this scheme it is not necessary to link the class-records together
except for the parent links. It is also unnecessary to duplicate most
class-records. Thus, in general there will be only one record for each class.
This is complicated, however, by the ability to override property class
methods, as will be shown below in Section 4.3.

See Figure 1 for a possible layout for Example 1. Note that the fact that
class A and property class PP were mixed to from class
AA is not directly represented in the class records. It is, however
represented in the pointers from the object records of the object
anObject. In this scheme, the inherited method_a occupies a
slot at the same offset in the class records for both A and
AA. This is similar for method_p in the class records.

In OOPascal reference variables may be declared to have either a concrete
class type or a property (or abstract) class type. We require different
implementations of the references in the case of a concrete and abstract
reference from that of a property reference. In the latter case a reference is
more than a simple pointer.

The (concrete) reference variable anObject is represented by a
pointer variable to the first object-record. Since the compiler has knowledge
of the layout, it knows that field_a is at a certain offset in that
first segment and that field_p, which was inherited from property
class P is in some fixed offset in the second segment. Likewise the
compiler knows that a pointer to code for method_a may be found by
following a pointer to the class record and then applying a fixed offset and
that for method_q may be found by a similar computation from the
second object-record.

Note that field_a will always occupy the same slot in its
object-record, as will method_q. Also, in any class derived from
class AA, the segment for class PP will always occupy the
second place.

Variables referencing property classe values are somewhat more complex. Here we
represent them with a pair of pointers, with the first (object-pointer)
pointing to the first object-record of the object as a whole. The other
pointer (property-pointer) points to that object-record segment that represents
the property class in question: namely the property class whose type is the
declared type of the reference variable. We note that the actual type may have
a more refined type than this declared type, so that if aProperty was
declared to have type P, the second pointer would still point to the
same location shown here. In some circumstances, in which the property class
is formed from multiple inheritance of property classes, the picture is
somewhat more complex as the class in question will require several segments,
with the first representing the property class as a whole, and the other
representing various of the properties mixed in to form it. This is discussed
later.

It is important to note a few things about this layout. First, it is possible
that a smart linker could decide, having seen all of the components of a
run-unit, that some classes are never instantiated, and strip their
class-records from the executable. It is also possible to decide in certain
cases that the links to property records can be reduced to offsets and so avoid
the chasing of links implied by the above. This is hard for the compiler,
however, because of the possibility of cross-module references. In some cases,
as when a class is not exported, the compiler itself may make such
optimizations.

From Figure 1 many of the properties of inheritance may be derived. First,
when fields are added in a new class declaration, they are appended to the
object-records of the parent class. When methods are added, pointers to the
code are appended to the class-records of the parent class. When a new class
is mixed in, a new object-segment is appended to the far end of the chain of
object segments required for objects of the new type. Since all methods are
polymorphic (or "virtual") when a method is overridden, we must replace the
pointer to code in the corresponding slot of the parent class with a new
pointer in the class of the derived class record. There are some subtleties,
however, as shall be shown.

Example 2 and the corresponding Figure 2 show the effect of such simple
inheritance, in which a new class is derived from AA and a new
property is mixed in. Also shown here is an override of a method of the
concrete hierarchy. Overriding property class methods is somewhat more
complex. In this example, the property class R is defined starting a
new property hierarchy, and AAA mixes it in with AA.

As we see, the layouts implied by this have new fields in the first
segment, a new method in the class-record, and a replacement (indicated by *)
in the method_a slot in that class-record. Likewise, a new property
segment is required in the object-record chain. This segment contains the
instance variables declared in the new property class, as well as a pointer to
the property class itself so that methods may be accessed. Note that since
property PP is still an ancestor of this new class, the object-record
representing PP is still second in the chain. Thus, a reference
variable of type AA may refer to an object of this new type without
modification of the access protocols established above for class AA.

As mentioned above, the situation is more complex when property class methods
are overridden. In particular, a class-record representing a property class
may have to be replicated to hold the pointer to the new method body. This is
illustrated in Example 3 and Figure 3. Here the situation is as in Example 2,
except that we also wish to override method_p which was inherited
(indirectly) from PP.

In figure 3 we show both a reference to an object of type AA and to
one of type AAA. Note that the class-record for class PP has
to be duplicated in the run-time system to be able to hold a pointer to the new
method body. Again, the knowledge that this must be done resides in the
compiler, and is discovered when the override of a property method appears in
the class declaration of AAA. This new class-record does not
represent a new property class, but only a segment of a new concrete class.
However, the duplicated PP record points to the original PP record as
if it were its parent. This is required for type inquiry, when the program
tests if anAAAObject IS PP.

Notice importantly, that we may rely on the declared type of a reference
variable to define the access to all features accessible via that reference.
We need not know the run-time value to which the variable refers, relying on
the fact that if it refers to a value of a type other than the declared type of
the reference, it must be a subtype of it, and therefore uses the same layouts
as far as defined in all directions by the class of this reference.

Next we note that although we have been describing the above in which one of
the classes mixed in is a concrete class the same layouts are implied if all of
the classes are property classes. The only difference is that the resulting
structure does not describe any object as a whole, since property classes are
not instantiable, but only describes a sequence of consecutive segments of
object-records in objects of classes into which this property class structure
is mixed. In either case, there is exactly one object-record per hierarchy
used to form the own-class of the reference.

It should also be obvious to implementors, that there is no need to allocate
the various object-records making up an object in separate operations. In fact
it is desirable to do so all at once. Thus, Figure 4, describes a more typical
physical layout. Note that we still need pointers to the various segments
(unless an optimizer can determine that they are to the same offsets in all
instantiated objects) because the same property class may be mixed in to
several different concrete classes, and the several property classes mixed in
to different parts may be in different orders. Therefore, it is not possible
in general to associate a fixed offset in this structure with a given property
class variable, except relative to its segment.

Within a method of a class, the object in control of the process is
represented by a protected (implicit) parameter self. Protection of

the parameter implies that the reference may not be changed to another object.
If the class of the method is a concrete class, then the parameter may be
represented by a pointer to the first segment of the object-record of the
object that receives this message at run time. For a property class, however,
the reference to self is a pair of pointers as is the case for other property
class references. The property-pointer of this pair must be set by the caller
of the method, to point to the segment representing the property class in which
this method was originally declared rather than the declared type of the
reference variable mentioned in the message statement or expression. If this
is a property class variable, then an adjustment may need be made. If it is a
concrete class variable, then a property-pointer will need to be generated.

If we assume that objects do not move in the run-time system, type inquiry
may be implemented by pointer comparison between some fixed addresses and the
class pointers in each object. The same may be done with inquiry as to
property classes. Again, the layouts permit the declared types of the
variables to be used, rather than the run-time values to which they may refer.
Note that since an object is a member of the ancestor types of its own-class,
it is necessary to search parent chains when seeking a match.

If objects must be permitted to move during a computation, a slight refinement
maintains a pointer to each class-record, which must be updated when the
class-records move. Thus, again, we can implement type inquiry with simple
pointer comparisons.

Type coercion is also quite simple for references of concrete class type,
relying on the declared type of the reference variable being coerced and the
type to which it is being coerced. If they both represent concrete classes,
then nothing need be done except insert a run-time check that the class of the
reference is indeed the class of the coercion. If one of the references is
concrete and the other not, then we either throw-away the property-pointer, or
generate one from knowledge of the types of the two references.

When both references are property classes the situation is a bit more complex
in that the property-pointer must be adjusted. This adjustment could cause a
property-pointer to move either way in its chain. When we assign a reference
of a derived property class to one of one of its ancestors, we must move the
property pointer down the chain a fixed number of steps, depending on the two
types. Coercion in the opposite direction is only legal when one property
class has been derived from others. References of the ancestor classes may be
coerced to the derived class(but not to one another, as they may come from
different hierarchies). If the coercion is legal the compiler knows from its
defined layouts how many steps along the object-record chain must be stepped to
obtain a pointer to the object-record of the coerced type. Unfortunately these
steps are upward along the chain and as we have not doubly linked the chain, we
need to traverse it from the origin, for which we have the object-pointer.
These steps and the result type must be guarded with run-time checks, however,
since a downward reference may always be made in error.

Run-time sized schemata complicate things only a bit. If we assume that dope
vectors may be used to describe the actual types, and that these dope vectors
may have fixed size, then these may be placed where we have indicated instance
variables. The data in the dope vectors may then be used to allocate storage
for the indicated fields, either at the end of the object-segments indicated
above, or elsewhere in the heap or stack.

Here we have outlined a simple object layout model for the Object Extensions
to Pascal [1]. While not optimal, it is relatively simple to implement and
makes access to the main features of a class (the concrete features) as fast as
is typical in single inheritance systems. This model may be modified slightly
or drastically to meet the needs of a specific computing environment.

I would like to thank Tony Hetherington and Tom Turba for reading and
commenting on an earlier draft of this paper. Tony points out that two
implementations of Extended Pascal (his own at Prospero Software Ltd. for PC's,
and that of Digital Equipment Corp. for workstation and larger systems) use
slightly different layouts for schemata than suggested here, which may
complicate this design slightly.