Yet another Forth objects package

After criticizing the Neon model in the last
issue, here I present (and expose to criticism) a model that I
find better, and its implementation. Its properties (most are
advantages IMO) are:

It is straightforward to pass objects on the stack. Passing
selectors on the stack is a little less convenient, but possible.

Objects are just data structures in memory, and are referenced by
their address. You can create words for objects with normal defining
words like constant. Likewise, there is no difference
between instance variables that contain objects and those
that contain other data.

Late binding is efficient and easy to use.

It avoids parsing, and thus avoids problems with state-smartness
and reduced extensibility; for convenience there are a few parsing
words, but they have non-parsing counterparts. There are also a few
defining words that parse. This is hard to avoid, because all standard
defining words parse (except :noname); however, such
words are not as bad as many other parsing words, because they are not
state-smart.

It does not try to incorporate everything. It does a few things
and does them well (IMO). In particular, I did not intend to support
information hiding with this model (although it has features that may
help); you can use a separate package for achieving this.

It is layered; you don't have to learn and use all features to use
this model. In particular, the features discussed after the Section
Programming Style are are optional and independent of each
other.

An implementation in ANS Forth is available.

I have used the technique, on which this model is based, for
implementing Gray [ertl89][ertl97]; we have also used this technique in
Gforth.

Why object-oriented programming?

Often we have to deal with several data structures (objects),
that have to be treated similarly in some respects, but differ in
others. Graphical objects are the textbook example: circles,
triangles, dinosaurs, icons, and others, and we may want to add more
during program development. We want to apply some operations to any
graphical object, e.g., draw for displaying it on the
screen. However, draw has to do something different for
every kind of object.

We could implement draw as a big CASE
control structure that executes the appropriate code depending on the
kind of object to be drawn. This would be not be very elegant, and,
moreover, we would have to change draw every time we add
a new kind of graphical object (say, a spaceship).

What we would rather do is: When defining spaceships, we would tell
the system: "Here's how you draw a spaceship; you figure
out the rest."

This is the problem that all systems solve that (rightfully) call
themselves object-oriented, and the object-oriented package I present
here also solves this problem (and not much else).

Terminology

This section is mainly for reference, so you don't have to understand
all of it right away. I (mostly) use the same Smalltalk-inspired
terminology as [mckewan97]. In short:

class

a data structure definition with some extras.

object

an instance of the data structure described by the class definition.

instance variables

fields of the data structure.

selector

(or method selector) a word (e.g.,
draw) for performing an operation on a variety of data
structures (classes). A selector describes what operation to
perform. In C++ terminology: a (pure) virtual function.

method

the concrete definition that performs the operation
described by the selector for a specific class. A method specifies
how the operation is performed for a specific class.

selector invocation

a call of a selector. One argument of the
call (the TOS (top-of-stack)) is used for determining which method is used. In Smalltalk
terminology: a message (consisting of the selector and the other
arguments) is sent to the object.

receiving object

the object used for determining the method
executed by a selector invocation. In our model it is the object that
is on the TOS when the selector is invoked. (Receiving comes
from Smalltalks message terminology.)

child class

a class that has (inherits) all
properties (instance variables, selectors, methods) from a parent
class. In Smalltalk terminology: The subclass inherits from the
superclass. In C++ terminology: The derived class inherits from the
base class.

Basic Usage

This code defines a class graphical with an
operation draw. We can perform the operation
draw on any graphical object, e.g.:

100 100 t-rex draw

where t-rex is a word (say, a constant) that produces a
graphical object.

How do we create a graphical object? With the present definitions,
we cannot create a useful graphical object. The class
graphical describes graphical objects in general, but not
any concrete graphical object type (C++ users would call it an
abstract class); e.g., there is no method for the selector
draw in the class graphical.

For concrete graphical objects, we define child classes of the
class graphical, e.g.:

Here we define a class circle as a child of
graphical, with a field circle-radius (which
behaves just like a field in the structure
package); it defines new methods for the selectors
draw and construct (construct
is defined in object, the parent class of
graphical).

Now we can create a circle on the heap (i.e.,
allocated memory) with

50 circle heap-new constant my-circle

heap-new invokes construct, thus
initializing the field circle-radius with 50. We can draw
this new circle at (100,100) with

100 100 my-circle draw

Note: You can invoke a selector only if the object on the TOS
(the receiving object) belongs to the class where the selector was
defined or one of its descendents; e.g., you can invoke
draw only for objects belonging to graphical
or its descendents (e.g., circle). Immediately before
end-class, the search order has to be the same as
immediately after class.

The object class

When you define a class, you have to specify a parent class. So how
do you start defining classes? There is one class available from the
start: object. You can use it as ancestor for all
classes. It is the only class that has no parent. It has two
selectors: construct and print.

Creating objects

You can create and initialize an object of a class on the heap with
heap-new ( ... class -- object ) and in the dictionary
(allocation with allot) with dict-new (
... class -- object ). Both words invoke construct, which
consumes the stack items indicated by "..." above.

If you want to allocate memory for an object yourself, you can get its
alignment and size with class-inst-size 2@ ( class --
align size ). Once you have memory for an object, you can initialize
it with init-object ( ... class object -- );
construct does only a part of the necessary work.

Programming style

This section is not exhaustive.

In general, it is a good idea to ensure that all methods for the
same selector have the same stack effect: when you invoke a selector,
you often have no idea which method will be invoked, so, unless all
methods have the same stack effect, you will not know the stack effect
of the selector invocation.

One exception to this rule is methods for the selector
construct. We know which method is invoked, because we
specify the class to be constructed at the same place. Actually, I
defined construct as a selector only to give the users a
convenient way to specify initialization. The way it is used, a
mechanism different from selector invocation would be more natural
(but probably would take more code and more space to explain).

Class Binding

Normal selector invocations determine the method at run-time depending
on the class of the receiving object (late binding).

Sometimes we want to invoke a different method. E.g., assume that
you want to use the simple method for printing
objects instead of the possibly long-winded
print method of the receiver class. You can achieve this
by replacing the invocation of print with

[bind] object print

in compiled code or

bind object print

in interpreted code. Alternatively, you can define the method with a
name (e.g., print-object), and then invoke it through the
name. Class binding is just a (often more convenient) way to achieve
the same effect; it avoids name clutter and allows you to invoke
methods directly without naming them first.

A frequent use of class binding is this: When we define a method
for a selector, we often want the method to do what the selector does
in the parent class, and a little more. There is a special word for
this purpose: [parent]; [parent]
selector is equivalent to [bind] parent
selector, where parent is the parent
class of the current class. E.g., a method definition might look like:

:noname
dup [parent] foo \ do parent's foo on the receiving object
... \ do some more
; overrides foo

[mckewan97] presents class binding as an optimization technique. I
recommend not using it for this purpose unless you are in an
emergency. Late binding is pretty fast with this model anyway, so the
benefit of using class binding is small; the cost of using class
binding where it is not appropriate is reduced maintainability.

While we are at programming style questions: You should bind
selectors only to ancestor classes of the receiving object. E.g., say,
you know that the receiving object is of class foo or its
descendents; then you should bind only to foo and its
ancestors.

Method conveniences

In a method you usually access the receiving object pretty often. If
you define the method as a plain colon definition (e.g., with
:noname), you may have to do a lot of stack
gymnastics. To avoid this, you can define the method with m:
... ;m. E.g., you could define the method for
drawing a circle with

m: ( x y circle -- )
( x y ) this circle-radius @ draw-circle ;m

When this method is executed, the receiver object is removed from the
stack; you can access it with this (admittedly, in this
example the use of m: ... ;m offers no advantage). Note
that I specify the stack effect for the whole method (i.e. including
the receiver object), not just for the code between m:
and ;m. You cannot use exit in
m:...;m; instead, use
exitm. Moreover, for any word that calls
catch and was defined before loading
objects.fs, you have to redefine it like I redefined
catch:

: catch this >r catch r> to-this ;

You will frequently use sequences of the form this
field (in the example above: this
circle-radius). If you use the field only in this way, you can
define it with inst-var and eliminate the
this before the field name. E.g., the circle
class above could also be defined with:

radius can only be used in circle and its
descendent classes and inside m:...;m.

You can also define fields with inst-value, which is
to inst-var what value is to
variable. You can change the value of such a field with
[to-inst]. E.g., we could also define the class
circle like this:

Names and Scoping

Inheritance is frequent, unlike structure extension. This exacerbates
the problem with the field name convention: One always has to remember
in which class the field was originally defined; changing a part of
the class structure would require changes for renaming in otherwise
unaffected code.

To solve this problem, I added a scoping mechanism (which was not
in my original charter): A field defined with inst-var is
visible only in the class where it is defined and in the descendent
classes of this class. Using such fields only makes sense in
m:-defined methods in these classes anyway.

This scoping mechanism allows us to use the unadorned field name,
because name clashes with unrelated words become much less likely.

Once we have this mechanism, we can also use it for controlling the
visibility of other words: All words defined after
protected are visible only in the current class and its
descendents. public restores the compilation
(i.e. current) wordlist that was in effect before. If you
have several protecteds without an intervening
public or set-current, public
will restore the compilation wordlist in effect before the first of
these protecteds.

Interfaces

In this model you can only call selectors defined in the class of the
receiving objects or in one of its ancestors. If you call a selector
with a receiving object that is not in one of these classes, the
result is undefined; if you are lucky, the program crashes
immediately.

Now consider the case when you want to have a selector (or several)
available in two classes: You would have to add the selector to a
common ancestor class, in the worst case to object. You
may not want to do this, e.g., because someone else is responsible for
this ancestor class.

The solution for this problem is interfaces. An interface is a
collection of selectors. If a class implements an interface, the
selectors become available to the class and its descendents. A class
can implement an unlimited number of interfaces. For the problem
discussed above, we would define an interface for the selector(s), and
both classes would implement the interface.

As an example, consider an interface storage for
writing objects to disk and getting them back, and a class
foo foo that implements it. The code for this would look
like this:

Note that you cannot use protected in an interface; and
of course you cannot define fields.

In the Neon model, all selectors are available for all classes;
therefore it does not need interfaces. The price you pay in this model
is slower late binding, and therefore, added complexity to avoid late
binding.

Persistence

The internal data of classes and interfaces are stored in the
dictionary, so they survive the usual system-saving and reloading
mechanisms of Forth systems. For objects, it's up to you to save them
in the dictionary if you want them to survive.

Implementation

An object is a piece of memory, like one of the data structures
described with struct...end-struct. It has a field
object-map that points to the method map for the object's
class.

The method map (this is Self
[chambers&ungar89] terminology; in C++ terminology: virtual function
table) is an array that contains the execution tokens (XTs) of the
methods for the object's class. Each selector contains an offset into
the method maps.

selector is a defining word that uses
create and does>. The body of the
selector contains the offset; the does> action for a
class selector is, basically:

( object addr ) @ over object-map @ + @ execute

Since object-map is the first field of the object, it
does not generate any code. As you can see, calling a selector has a
small, constant cost.

A class is basically a struct combined with a method
map. During the class definition the alignment and size of the class
are passed on the stack, just as with structs, so
field can also be used for defining class
fields. However, passing more items on the stack would be
inconvenient, so class builds a data structure in memory,
which is accessed through the variable
current-interface. After its definition is complete, the
class is represented on the stack by a pointer (e.g., as parameter for
a child class definition).

At the start, a new class has the alignment and size of its parent,
and a copy of the parent's method map. Defining new fields extends the
size and alignment; likewise, defining new selectors extends the
method map. overrides just stores a new XT in the method
map at the offset given by the selector.

Class binding just gets the XT at the offset given by the selector
from the class's method map and compile,s (in the case of
[bind]) it.

I implemented this as a value. At the
start of an m:...;m method the old this is
stored to the return stack and restored at the end; and the object on
the TOS is stored TO this. This technique has one
disadvantage: If the user does not leave the method via
;m, but via throw or exit,
this is not restored (and exit may
crash). To deal with the throw problem, I have redefined
catch to save and restore this; the same
should be done with any word that can catch an exception. As for
exit, I simply forbid it (as a replacement, there is
exitm).

inst-var is just the same as field, with
a different does> action:

@ this +

Similar for inst-value.

Each class also has a wordlist that contains the words defined with
inst-var and inst-value, and its protected
words. It also has a pointer to its parent. class pushes
the wordlists of the class an all its ancestors on the search order,
and end-class drops them.

An interface is like a class without fields, parent and protected
words; i.e., it just has a method map. If a class implements an
interface, its method map contains a pointer to the method map of the
interface. The positive offsets in the map are reserved for class
methods, therefore interface map pointers have negative
offsets. Interfaces have offsets that are unique throughout the
system, unlike class selectors, whose offsets are only unique for the
classes where the selector is available (invokable).

This structure means that interface selectors have to perform one
indirection more than class selectors to find their method. Their body
contains the interface map pointer offset in the class method map, and
the method offset in the interface method map. The
does> action for an interface selector is, basically:

The data structure created by this code (including the data structure
for object) is shown in the figure, assuming a cell size of
4.

Related Work

For a comparison with the Neon model you just have to compare the
properties of the Neon model presented in the
last issue with the properties presented here.

Another well-known publication is [pountain87]. However, it is not
really about object-oriented programming, because it hardly deals with
late binding. Instead, it focuses on features like information hiding
and overloading that are characteristic of modular languages like Ada
(83).

There are also many other papers on object-oriented Forth
extensions; E.g., [rodriguez&poehlman96] lists 17 and [mckewan97] lists
6. In the rest of this section I will discuss two systems that have
the implementation using method maps in common with the package
discussed here.

The model of [zsoter96] makes heavy use of an active object (like
this in my model): The active object is not only used for
accessing all fields, but also specifies the receiving object of every
selector invocation; you have to change the active object explicitly
with { ... }, whereas in my model it changes more or less
implicitly at m: ... ;m. Such a change at the method
entry point is unnecessary with the [zsoter96] model, because the
receiving object is the active object already; OTOH, the explicit
change is absolutely necessary in that model, because otherwise no one
could ever change the active object.

The model of [paysan94] combines
information hiding and overloading resolution (by keeping names in
various wordlists) with object-oriented programming. It sets the
active object implicitly on method entry, but also allows explicit
changing (with >o...o> or with
with...endwith). It uses parsing and state-smart objects
and classes for resolving overloading and for early binding: the
object or class parses the selector and determines the method from
this. If the selector is not parsed by an object or class, it performs
a call to the selector for the active object (late binding), like [zsoter96]. Fields are always accessed through
the active object. The big disadvantage of this model is the parsing
and the state-smartness, which reduces extensibility and increases the
opportunities for subtle bugs; essentially, you are only safe if you
never tick or postpone an object or class.

Acknowledgments

Marcel Hendrix provided helpful comments on the paper. Andras Zsoter
and Bernd Paysan helped me with the related works section.

Glossary

bind ( ... "class" "selector" -- ... )

execute the method for selector in class.

<bind> ( class selector-xt -- xt )

xt is the method for the selector
selector-xt in class.

bind' ( "class" "selector" -- xt )

xt is the method for selector in class.

[bind] ( compile-time: "class" "selector" -- ; run-time: ... -- ... )

compile the method for selector in class.

class ( parent-class -- align offset )

start a
new class definition as a child of parent-class.
align offset are for use by field etc.

class->map ( class -- map )

map
is the pointer to class's method map; it points to the
place in the map to which the selector offsets refer (i.e., where
object-maps point to).

class-inst-size ( class -- addr )

used as class-inst-size 2@ ( class -- align size ),
gives the size specification for an instance (i.e. an object) of
class.

class-override! ( xt sel-xt class-map -- )

xt is the new method for the selector
sel-xt in class-map.

construct ( ... object -- )

initializes the data
fields of object. The method for the class
object just does nothing ( object -- ).

current' ( "selector" -- xt )

xt is the method for selector in the current class.

[current] ( compile-time: "selector" -- ; run-time: ... -- ... )

compile the method for selector in the current class.

current-interface ( -- addr )

this variable contains the class or interface currently being
defined.

dict-new ( ... class -- object )

allot and initialize an
object of class class in the dictionary.

drop-order ( class -- )

drops class's wordlists from the search
order. No checking is made whether class's wordlists are actually
on the search order.

end-class ( align offset "name" -- )

name execution: -- class

ends a class definition. The resulting class is class.

end-class-noname ( align offset -- class )

ends a class definition. The resulting class is class.

end-interface ( "name" -- )

name execution: -- interface

ends an interface definition. The resulting interface is
interface.

end-interface-noname ( -- interface )

ends an
interface definition. The resulting interface is
interface.

exitm ( -- )

exit from a method; restore old this.

heap-new ( ... class -- object )

allocate and initialize an object of class class.

implementation ( interface -- )

the current class
implements interface. I.e., you can use all selectors of the
interface in the current class and its descendents.

init-object ( ... class object -- )

initializes a
chunk of memory (object) to an object of class
class; then performs construct.

inst-value ( align1 offset1 "name" -- align2 offset2 )

name execution: -- w

w is the value of the field name in
this object.

inst-var ( align1 offset1 align size "name" -- align2 offset2 )

name execution: -- addr

addr is the address of the field name in
this object.

interface ( -- )

starts an interface definition.

;m ( colon-sys -- ) ( run-time: -- )

end a method definition; restore old this.

m: ( -- xt colon-sys ) ( run-time: object -- )

start a method definition; object becomes new this.

method ( xt "name" -- )

name execution: ... object -- ...

creates selector name and makes xt its method in the current class.

object ( -- class )

the ancestor of all classes.

overrides ( xt "selector" -- )

replace default
method for selector in the current class with
xt. overrides must not be used during an
interface definition.