C With (Object) Prototypes

Introduction

When talking about prototypes, especially in C, the first thing that comes to mind is function prototypes,
which are those little function declarations found in header files or at the beginning of a file.
They act as a “forward declaration” for the function, leaving the definition for later in the file
or even for a separate file (it's how shared libraries work, placing prototypes in header files and
the implementations in the shared object files.)

There is another type of prototype, though: object prototypes. This kind of prototype is actually very far from C,
since it belongs to the object-oriented paradigm. Actually, even among object-oriented languages, prototypes don't seem to be well known
(or, at least, they used to be fairly unknown. I didn't really research if things have changed.)

In this article I'll briefly go over what object prototypes are, and then provide an implementation for the C language
(thus giving C an object-oriented interface.)

Object Prototypes

What is an object prototype? It's an object in the “object-oriented” sense; however,
the object is not created from a class, but from an object. In practice, it means that instead of
defining a class and creating instances of that class (the objects), an already existing object is
cloned and then the various methods and properties changed accordingly.

To illustrate this with an example, using classes one would write something like (it's pseudocode):

The Example class defines some methods to get and set a value;
the Example2 class subclasses the Example class and overrides the
getA method to return the same value incremented by 2. Example2's constructor calls super
to initialize the a variable.

Using object prototypes, one would write something like this instead (again pseudocode):

A more functional-like approach was taken, but it's only to be brief. As I'll show later,
it's possible to define methods using C's standard features.

The example clearly shows the difference: classes are static definitions and objects are created from them.
Prototypes are dynamic entities used to create other objects, which can then be used as prototypes
for other objects, and so on.

Searching the web, there are definitely better and more detailed explanations, but this is enough
to understand the rest of this article.

Object-Oriented C

In the years since C's inception, there have been many attempts at adding an object-oriented interface to the language.

Of course, there's C++; Objective-C; the less known ooc
(not to be confused with the other ooc!);
the GTK version, GObject; and even that one cousin no one wants to talk about, Xt.

All of these share one common trait: their objects are defined through the use of classes.

This article will define CWP, an object-oriented interface based on prototypes.

The Interface

Looking at the example above, we can see that we need the following features: a way to define properties
(Example.a = 0); a way to define methods (Example.getA = function ...);
a way to access properties (this.a); and a way to call methods (Object.clone()).

Some prototype-based languages allow methods to be defined as properties of the object,
but C isn't as permissive for reasons explained later on, so “properties”
and “methods” are treated as separate entities.

Given this, the following C functions are required:

cwp_set for setting a property value

cwp_get to get a property value

cwp_method to define a method

cwp_call to call a method

For completeness, two other functions, cwp_unset and cwp_unmethod,
are defined to permanently remove a property or a method, respectively.

These are the functions needed to access the object-oriented interface, but something important is still missing:
as explained earlier, when using prototypes objects are created from other objects;
what we need is a starting point, i.e. an object that exists the moment the code is compiled and is never deleted.

We accomplish this by creating the cwpObject prototype statically.

cwpObject

The static cwpObject prototype requires at least a method to create other objects,
which will be called “create” (in the example it's called “clone”), and a method
to delete the object when it's not needed anymore (this is C, memory management is not done automatically),
which will be called “destroy”.

Regarding properties, it's common for objects to override a method, but then call the superclass's
implementation somewhere inside it (the example shows it clearly in the overridden getA method). As such,
the cwpObject prototype will include a “super” property containing the object acting as a superclass.

These are the strict minimum, but there are some more methods which are pretty useful to have.
One of these is the “to string” method, used to generate a representation of the object that can be used
with e.g. printf. Another one is the “equals” method: for those familiar with Java, it's
the same thing. For those not familiar with it… it's a more detailed version of ==.

How do we represent this in C? Of course, it's a struct:

struct object {
};

Naturally, this means that if we want to pass objects to functions or return objects from them, we need to make the actual type a pointer:

typedef struct object *cwp_object_t;

For the sake of information hiding, the typedef will be placed in the header file (e.g. cwp.h),
while the structure definition in the implementation file (e.g. cwp.c), so that users will not know the structure layout,
using the cwp_object_t type instead.

The object Structure

As explained earlier, an object contains any number of properties and methods, so struct object will
contain vectors for properties and methods:
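A minimal sketch of such a layout follows. The field names props and meths are assumptions made for illustration; nprops and nmeths are the element counts of the two vectors.

```c
#include <stddef.h>

/* Entry types; only pointers to them are needed at this point. */
struct property;
struct method;

/* Hypothetical layout: two vectors plus their lengths. */
struct object {
    struct property *props;  /* vector of properties */
    size_t nprops;           /* number of entries in props */
    struct method *meths;    /* vector of methods */
    size_t nmeths;           /* number of entries in meths */
};
```

An object with no properties and no methods is then just a zero-initialized structure.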

If we just use the raw vectors, users will have to know the index of a particular entry (property or method) before they can use it.
This makes code unnecessarily unreadable: what is that 0? Is it “create”? Is it “destroy”? As such,
it's better to search properties and methods using names (i.e. strings) rather than indices:

struct property {
    const char *name;
    cwp_object_t value;
};

Having properties store only objects will make things a bit more verbose, but simplifies some operations as shown later on.

This takes care of properties, but what about methods? How do we associate a function with a string?
Thankfully, C has function pointers, which allow us to assign a function name to a value, as long as
the function signature is the same.

struct method {
    const char *name;
    cwp_method_t value;
};

cwp_method_t is a typedef, which will be shown after introducing some additional concepts.

Indeed, struct property and struct method are essentially the same. Do we really need them both? Well,
we don't really need both of them, but using only one data type (e.g. by placing the two value entries in
a union) would require additional casting or some indicator telling us if we are dealing with a property or a method,
which is generally error-prone and doesn't really have any practical advantage. On the other hand, using two different types,
we don't need to pay that much attention to assignments and we can also leverage the compiler's type checking abilities.

Properties

Properties are values associated with a name. They can't be executed like methods, but can be used by methods to perform their operations.
They are essentially the “instance variables” of class-based objects. Unfortunately, unlike instance variables,
properties can't be made private, i.e. accessible only by methods. There is, however, a trick to “hide” them,
explained near the end of the article.

Our struct property associates a name to a cwp_object_t value. We could extend the structure to
also hold types like int or double, but then we'd have to have a way to pass the value with the
appropriate type when setting a property (C has static typing). This means having multiple “set” functions, one for each accepted type.

While it isn't particularly complex, it raises at least one important problem: what happens when we remove the property?
As long as the value is an integer, there isn't much of an issue, but what happens when it is,
for example, a string? Do we need to free it? What if the pointer to that string is still being used outside the
object? It's the same when considering generic pointers.

There isn't a definite answer here; it depends entirely on what the implementor thinks is best. Since whatever decision we take is
most likely to be the wrong one in some use cases, we're going to delegate all these issues to cwp_object_t's
“destroy” method. After all, if we need int or float values, we can simply
box them like Java does (and we're going to do exactly that later on).

Even though properties only have cwp_object_t values, there is still the problem of what to do when we
destroy an object. After all, a single object can be a property of multiple objects. Consider this example
(pseudocode using the functions explained earlier):

The o3 object is a property of both o1 and o2, so when we destroy
o1, the o3 object must still be alive. Similarly, when o3 is destroyed,
o2 still has it as a property, meaning o3 must be alive until o2 is destroyed.

We solve this problem by adding a form of reference counting. Reference counting is usually associated with garbage
collection, but it solves our problem easily even though memory management is manual.

Now, setting a property will increment the object's nref field,
while unsetting a property or destroying the container object will decrement the nref field.
Of course, if there are no more references to the object, that means it can be safely destroyed.
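The counting rule can be sketched on a stripped-down object that carries nothing but the nref field. The function names and the choice of starting a new object at one reference (held by its creator) are assumptions of this sketch, and live_objects exists only to make the effect observable.

```c
#include <stdlib.h>

/* Simplified stand-in for struct object: only the counter matters here. */
struct object {
    unsigned nref;   /* how many holders currently reference the object */
};

int live_objects = 0;   /* bookkeeping for the example only */

/* Allocate an object held by its creator (nref starts at 1). */
struct object *object_new(void)
{
    struct object *o = malloc(sizeof *o);
    if (o == NULL)
        return NULL;
    o->nref = 1;
    live_objects++;
    return o;
}

/* Called when the object is stored as a property of another object. */
void object_retain(struct object *o)
{
    o->nref++;
}

/* Called when a holder lets go; frees the object once nobody holds it. */
void object_release(struct object *o)
{
    if (--o->nref == 0) {
        free(o);
        live_objects--;
    }
}
```

With o3 retained once for o1 and once for o2, releasing it on behalf of o1 and of its creator leaves it alive; only the final release on behalf of o2 frees it.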

Function Signatures

Before talking about defining methods, there is an important matter to take care of: as shown by
struct method the actual value of a method is a function pointer. This means every function assigned to
that pointer must have the same signature. The structure of this signature is decided entirely by the implementor,
so users will have to put up with seemingly nonsensical choices.

Defining a good signature, especially for functions exposed to the users, is as hard as finding good names for variables.
For CWP, I strived for signatures that are as uniform as possible, though they turned out to be a little bit unintuitive.

Some methods happen to return a value, while some others don't, so the first thing to do is decide how to handle these returned values.

While it's true that it isn't too different in practice from the case with the specified return value,
it makes the code as a whole look more harmonious, since every function looks the same.

We've taken care of return values, but what about errors? Sure, we can use the returned value to indicate an error,
but some methods might return values of a type that isn't really suited for errors (e.g. integers; sometimes -1 is an error while
other times it's a valid value).

We could use the now free returned value of the method to report an error using a dedicated type (like an enum),
but then it would be easy for users to forget about it. Consider this example:

cwp_call(cwpObject, "create", &a);
cwp_call(a, "method", NULL);

What happens if the first cwp_call fails (e.g. because there is no memory available for the object)? When the program is run,
it will result in a crash, but that might happen after several days of execution, and debugging it would likely be hard.

To reduce this kind of problem, the error is yet another function parameter. In particular, it's the first parameter.
That way, users can't say they forgot about it. Either they explicitly pass NULL, or they at least recognize that an error can occur any time.

Lastly, methods need other parameters to work, which depend entirely on the method itself. This is solved easily by
using variadic arguments (i.e. the <stdarg.h> header file.)

The methods follow the same structure so that it's easier to define them (less cognitive dissonance).

The reason is that certain methods can't return an object no matter what, but rather values of other types, so void * is the
only way to solve that problem.
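Putting these decisions together, a method signature could look like the sketch below. The exact parameter order after the error, the use of va_list for the extra arguments, and the names add_method and invoke are assumptions made for illustration; only the ideas (error first, result through a void * parameter, variadic arguments) come from the discussion above.

```c
#include <stdarg.h>
#include <stddef.h>

typedef struct object *cwp_object_t;
typedef enum { cwpNONE, cwpARG, cwpNOMEM } cwp_error_t;

/* Assumed shape: error first, then the receiver, then where to store the
 * result, then the method-specific arguments. Nothing is returned. */
typedef void (*cwp_method_t)(cwp_error_t *err, cwp_object_t self,
                             void *returned, va_list args);

/* Example method: adds two int arguments and stores the sum in *returned. */
void add_method(cwp_error_t *err, cwp_object_t self,
                void *returned, va_list args)
{
    (void)self;                     /* not needed by this method */
    int a = va_arg(args, int);
    int b = va_arg(args, int);
    if (returned != NULL)
        *(int *)returned = a + b;
    if (err != NULL)
        *err = cwpNONE;
}

/* Variadic front end that packs its arguments and forwards them. */
void invoke(cwp_method_t m, cwp_error_t *err, cwp_object_t self,
            void *returned, ...)
{
    va_list args;
    va_start(args, returned);
    m(err, self, returned, args);
    va_end(args);
}
```

Because returned is a void *, the same signature serves a method that produces an int, as here, and one that produces an object.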

Methods

After the discussion about function signatures, defining methods is actually fairly straightforward: first, we define a function
with the cwp_method_t signature, then use it as the method value. Since this is one of the most performed actions,
we can define a macro to save us the time to write down the signature (which also helps if the signature changes for one reason or another):
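One possible shape for such a macro is sketched here. The name CWP_METHOD, the signature it expands to, and the helper run are all hypothetical.

```c
#include <stdarg.h>
#include <stddef.h>

typedef struct object *cwp_object_t;
typedef enum { cwpNONE, cwpARG, cwpNOMEM } cwp_error_t;

/* Hypothetical macro: expands to the full method signature, so a change to
 * the signature only has to be made in one place. */
#define CWP_METHOD(name) \
    void name(cwp_error_t *err, cwp_object_t self, void *returned, va_list args)

/* A method that ignores its arguments and reports success. */
CWP_METHOD(do_nothing)
{
    (void)self;
    (void)returned;
    (void)args;
    if (err != NULL)
        *err = cwpNONE;
}

/* Small variadic wrapper so the method can be called directly. */
void run(cwp_error_t *err, cwp_object_t self, void *returned, ...)
{
    va_list args;
    va_start(args, returned);
    do_nothing(err, self, returned, args);
    va_end(args);
}
```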

Among the possible ways a method can fail, the most common are wrong arguments and no more memory available
(the latter usually in the “create” method); thus, a minimal definition for cwp_error_t would be this:

typedef enum {
    cwpNONE,
    cwpARG,
    cwpNOMEM,
} cwp_error_t;

cwpNONE means there were no errors and it should always be used when a method terminates correctly,
otherwise the user might get spurious errors caused by recycling a variable.

About the returned value, what happens if we want to return nothing? A straightforward implementation might simply return NULL,
but that's not optimal. NULL can cause issues when it's not expected and in general is a headache (Hoare even regretted thinking about it).
It's much better to return a special cwp_object_t value, so that other functions working with objects can operate on
a valid object without worries. This special object will be called cwpEmpty.

Defining the cwpObject prototype

Now that we have a framework to define properties and methods, we can finally define the cwpObject prototype.

Since it has to be used in cwp_call, its type must be cwp_object_t, i.e. a pointer.
However, it must already exist before the program is executed (we can't rely on users calling functions like init_cwp
and doing that automatically is a hack.)

We solve this problem by defining a variable of type struct object (not a pointer), then assigning
its address to cwpObject:

struct object object = {
};
cwp_object_t cwpObject = &object;

What's inside object? Of course, the first value is the value of the nref field.
Since cwpObject always exists on its own, its value will be 1.

After the number of references, we have the properties and the methods. These vectors must be statically initialized too,
so we apply the same trick and define two more variables:

The &empty value is the value of the cwpEmpty object introduced previously.
It will be explained after these variables.

The choice of using cwpEmpty as cwpObject's superclass is purely an implementation decision.
Other implementors might decide to make it something else.

The C functions used as methods are of course defined using the framework fleshed out earlier. The “destroy” method is special:
since cwpObject can't be destroyed, we can't assign a method to it, but at the same time that method must be
available to objects created from this prototype. Using a NULL pointer is the only choice, as long as
we properly check for it inside cwp_call, in which case nothing is executed and cwpNONE is returned as error code.

The “to string” method is defined using a space in its name, rather than using the more familiar “toString”.
This is merely to show that we are not bound to any particular limitation with names, as long as it's a valid string.

cwpEmpty isn't supposed to be used as a prototype, so we'll give it no properties and no
“create” method (it naturally follows that there is no need for a “destroy” method).
However, the “equals” and “to string” methods are pretty useful (especially “equals”),
so they are made available to the object.
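Taken together, the static definitions described in this section might look like the following sketch. The field layout, the method signature, the vector names and the placeholder method bodies are assumptions; the property and method names, the NULL “destroy” entry, and nref starting at 1 come from the text.

```c
#include <stdarg.h>
#include <stddef.h>

typedef struct object *cwp_object_t;
typedef enum { cwpNONE, cwpARG, cwpNOMEM } cwp_error_t;
typedef void (*cwp_method_t)(cwp_error_t *, cwp_object_t, void *, va_list);

struct property { const char *name; cwp_object_t value; };
struct method   { const char *name; cwp_method_t value; };
struct object {
    unsigned nref;
    struct property *props; size_t nprops;
    struct method *meths;   size_t nmeths;
};

/* Placeholder bodies: they only report success. */
void object_create(cwp_error_t *e, cwp_object_t s, void *r, va_list a)
{ (void)s; (void)r; (void)a; if (e != NULL) *e = cwpNONE; }
void object_equals(cwp_error_t *e, cwp_object_t s, void *r, va_list a)
{ (void)s; (void)r; (void)a; if (e != NULL) *e = cwpNONE; }
void object_to_string(cwp_error_t *e, cwp_object_t s, void *r, va_list a)
{ (void)s; (void)r; (void)a; if (e != NULL) *e = cwpNONE; }

/* cwpEmpty: no properties, no "create", no "destroy". */
static struct method empty_methods[] = {
    { "equals",    object_equals },
    { "to string", object_to_string },
};
static struct object empty = {
    .nref = 1,
    .props = NULL, .nprops = 0,
    .meths = empty_methods,
    .nmeths = sizeof empty_methods / sizeof empty_methods[0],
};
cwp_object_t cwpEmpty = &empty;

/* cwpObject: "super" points at cwpEmpty, "destroy" is deliberately NULL. */
static struct property object_properties[] = {
    { "super", &empty },
};
static struct method object_methods[] = {
    { "create",    object_create },
    { "destroy",   NULL },
    { "equals",    object_equals },
    { "to string", object_to_string },
};
static struct object object = {
    .nref = 1,
    .props = object_properties,
    .nprops = sizeof object_properties / sizeof object_properties[0],
    .meths = object_methods,
    .nmeths = sizeof object_methods / sizeof object_methods[0],
};
cwp_object_t cwpObject = &object;
```

Everything here is an address constant or a constant expression, so both prototypes exist before main starts and no initialization call is needed.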

Creating Objects

Now that our starting point is ready, we can finally define our generic “create” method. It's “generic” because
this method will simply clone an already existing object as-is, without modifying anything. Specific changes are left to users
when they want to “subclass” (or rather, specialize) an object.

There are many ways in which an object can be cloned. Some languages make it so that changing
the prototype also changes the already cloned objects, while others keep the two entities separate, i.e. once an object is cloned,
changes to the prototype don't affect the clones.

There are practical advantages and disadvantages to both choices and one isn't better than the other.
However, the second option is easier and less error-prone to implement, so that's how CWP objects will be created.
On the other hand, it will use a lot more memory.

Even though the method isn't supposed to modify the new object, it does actually need to make an important change:
the “destroy” method needs to be changed from NULL to the object_destroy function.

The method would then need to follow these steps:

Check if the arguments are valid

Allocate enough memory for the object and the object's vectors

Copy the prototype's property and method vectors

Change the “destroy” method

Increment the properties' nref field

Later on, some special objects will be introduced which follow a similar pattern. For convenience, a function oalloc
implementing the steps 2, 3, and 5 will be defined:

When returned is NULL, we don't create the object at all because the caller isn't interested in the return value
(for whatever reason). When allocating the object, the error code is set by oalloc so in case of errors we can simply return.

When an object is freshly created, before being returned to the caller, the layout of the methods and the properties is known.
As such, the “destroy” method is set to object_destroy immediately using an index in the vector.

The generic “create” doesn't need additional arguments, so only self is checked for validity.
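The steps above can be sketched as follows. This is a simplified illustration, not the actual implementation: methods are reduced to plain void (*)(void) pointers, the variadic arguments are dropped, the position of “destroy” (DESTROY_SLOT) is assumed, a new object is assumed to start with one reference, and name duplication is left out.

```c
#include <stdlib.h>
#include <string.h>

typedef struct object *cwp_object_t;
typedef enum { cwpNONE, cwpARG, cwpNOMEM } cwp_error_t;

struct property { const char *name; cwp_object_t value; };
struct method   { const char *name; void (*value)(void); };  /* simplified */
struct object {
    unsigned nref;
    struct property *props; size_t nprops;
    struct method *meths;   size_t nmeths;
};

enum { DESTROY_SLOT = 1 };        /* assumed index of "destroy" */
void object_destroy(void) { }     /* placeholder body */

/* Duplicate a vector of n elements; always allocates at least one byte so
 * that a NULL result can only mean "out of memory". */
static void *vdup(const void *src, size_t n, size_t size)
{
    void *dst = malloc(n ? n * size : 1);
    if (dst != NULL && n != 0)
        memcpy(dst, src, n * size);
    return dst;
}

/* Steps 2, 3 and 5: allocate, copy both vectors, bump the properties' nref.
 * Sets *err and returns NULL on failure. */
cwp_object_t oalloc(cwp_error_t *err, cwp_object_t proto)
{
    cwp_object_t o = malloc(sizeof *o);
    struct property *props = vdup(proto->props, proto->nprops, sizeof *props);
    struct method *meths = vdup(proto->meths, proto->nmeths, sizeof *meths);

    if (o == NULL || props == NULL || meths == NULL) {
        free(o); free(props); free(meths);
        if (err != NULL) *err = cwpNOMEM;
        return NULL;
    }
    o->nref = 1;
    o->props = props; o->nprops = proto->nprops;
    o->meths = meths; o->nmeths = proto->nmeths;
    for (size_t i = 0; i < o->nprops; i++)
        o->props[i].value->nref++;   /* the clone holds each property too */
    if (err != NULL) *err = cwpNONE;
    return o;
}

/* Generic "create": validate, clone, then install the real "destroy". */
void object_create(cwp_error_t *err, cwp_object_t self, void *returned)
{
    if (self == NULL) { if (err != NULL) *err = cwpARG; return; }
    if (returned == NULL) { if (err != NULL) *err = cwpNONE; return; }
    cwp_object_t o = oalloc(err, self);
    if (o == NULL)
        return;                      /* oalloc already set the error */
    o->meths[DESTROY_SLOT].value = object_destroy;
    *(cwp_object_t *)returned = o;
}
```

The prototype's own “destroy” slot is left untouched; only the copy gets object_destroy.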

Destroying Objects

We have a method to create objects, but now we need a method to destroy them. Intuitively, it's just a matter of reversing the
“create” method:

Since this method isn't expected to return a meaningful value, it returns cwpEmpty.

When examining properties, special care must be taken as an object can contain itself as a property.
Since the “destroy” method is called on the object's properties, if a property is the object itself that would cause
a sort of infinite recursion. When this case is met, we can skip the property, because the value is going to be destroyed anyway.

Now, here comes something interesting: what happens when the allocated memory is freed? Of course, the answer is that it becomes invalid.

When creating a new object, memory is simply copied over using memcpy. This isn't a big deal,
because what is copied is just pointers. However, when an object is destroyed, all the memory allocated for it should be freed.

Since using memcpy all we do is share the same pointers between two different objects, we can't just call free on
the pointers, because otherwise if the same data is being used by a still-living object, that data will become invalid,
causing a crash (if nothing worse).

As such, names are duplicated. Objects take care of deleting themselves thanks to the nref field
(as long as objects are never destroyed with free, but only with the “destroy” method) and
function pointers don't need to be deallocated. This means the burden of managing the strings is entirely on us.
Duplicating the string as a newly allocated piece of memory solves all of our problems, since we can then deallocate it without worrying.

Of course, we don't need to duplicate every possible name. The statically defined names inside object don't need to
be duplicated, since they must be always available and can never be deallocated (otherwise it's impossible to create new objects).

We now have a method to create and a method to destroy objects. “Deep equality”, performed by the “equals”
method, is just a matter of comparing each property and method of the two objects (with “equals”).
Special cases that can optimize execution aside, it's a fairly trivial implementation, and as such it's omitted.
The “to string” method will be explained later, after introducing other concepts.

Setting, Getting and Unsetting

After creating an object, it's only natural that one would want to modify its properties or methods.

Since they are defined as vectors, getting a property or a method is a matter of following these steps:

Check if the arguments are valid

Search the vector for the property or the method with the desired name

If the element is found, return or execute it

Otherwise, return cwpEmpty

Similarly, setting a property or a method would follow these steps:

Check if the arguments are valid

Search the vector for the property or the method with the desired name

If the element is found, change the value field

If it was a property, decrease the old value's nref field

If the old value has no more references, call “destroy” on it

Otherwise, increase the vector's size and add the new property or method to it

For both these actions, we need to first search the appropriate vector to find out if the object already has the property or the method.
There are probably more efficient implementations than plain vectors for this use case, but for the sake of simplicity and
ease of understanding, let's roll with that.

Like when creating an object, it's important to duplicate the string used as a name. However, unlike the “create” method, we can't trust
the input. After all, during creation we can safely assume the strings are properly NUL-terminated, which is something that can't be said for
user-provided strings. While not an end-all solution, using asprintf to duplicate the string should at the very least make the whole thing a
bit more robust.

Similarly, changing the vector's size can introduce some very subtle bugs. Using reallocarray makes the operation a little safer, at the
price of requiring a fairly recent feature.
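The search and set steps for properties can be sketched as below. To stay within standard C, this sketch swaps in malloc plus memcpy for the name copy and realloc with an explicit overflow check for the growth, rather than asprintf and reallocarray. The function names, the reduced struct object and the integer return code are assumptions, and calling “destroy” on an old value that reaches zero references is left out.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct object *cwp_object_t;
struct property { char *name; cwp_object_t value; };
struct object {
    unsigned nref;
    struct property *props; size_t nprops;
};

/* Linear search by name; returns the entry or NULL when it is missing. */
struct property *find_property(cwp_object_t o, const char *name)
{
    for (size_t i = 0; i < o->nprops; i++)
        if (strcmp(o->props[i].name, name) == 0)
            return &o->props[i];
    return NULL;
}

/* Returns 0 on success, -1 on bad arguments or allocation failure. */
int set_property(cwp_object_t o, const char *name, cwp_object_t value)
{
    if (o == NULL || name == NULL || value == NULL)
        return -1;

    struct property *p = find_property(o, name);
    if (p != NULL) {                 /* existing entry: replace the value */
        value->nref++;               /* retain first, in case old == new */
        p->value->nref--;            /* full version: "destroy" at zero */
        p->value = value;
        return 0;
    }

    if (o->nprops >= SIZE_MAX / sizeof *o->props)   /* overflow guard */
        return -1;
    size_t len = strlen(name) + 1;
    char *copy = malloc(len);        /* the object owns its own name copy */
    if (copy == NULL)
        return -1;
    memcpy(copy, name, len);

    struct property *grown = realloc(o->props, (o->nprops + 1) * sizeof *grown);
    if (grown == NULL) {
        free(copy);
        return -1;
    }
    o->props = grown;
    o->props[o->nprops].name = copy;
    o->props[o->nprops].value = value;
    o->nprops++;
    value->nref++;
    return 0;
}
```

The same pattern applies to methods, minus the reference counting.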

Unsetting (or “deleting”) a property or a method is a special case: to begin with, we have to deal with the fact that the
requested method or property might be in the middle of the vector, so we can't just do the reverse of cwp_set by shrinking it.

The solution is actually fairly simple: we swap the position of the requested property or method with the last element of the vector, then perform
the other operations.

However, there is at least one other important issue: some properties or methods are required by other functions or methods to perform their duties.
The clearest example is “destroy” which is called by many functions to make sure objects are destroyed and there are no memory leaks.
As such, if the property or method to delete has an index in the vector which is less than object.nprops or object.nmeths,
the property or method is not deleted.
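A minimal sketch of this removal strategy, on a simplified entry type: struct entry, unset_entry and the nfixed parameter (the number of protected leading entries) are illustrative names, and releasing the removed entry's name and value is left out.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for a property or a method. */
struct entry { const char *name; int value; };

/* Remove the entry called name from vec[0..*n). Entries with an index below
 * nfixed are protected. Returns 1 if an entry was removed, 0 otherwise. */
int unset_entry(struct entry *vec, size_t *n, size_t nfixed, const char *name)
{
    for (size_t i = 0; i < *n; i++) {
        if (strcmp(vec[i].name, name) != 0)
            continue;
        if (i < nfixed)
            return 0;                    /* required entry: refuse */
        struct entry removed = vec[i];   /* swap with the last element */
        vec[i] = vec[*n - 1];
        vec[*n - 1] = removed;
        (*n)--;                          /* then shrink by one */
        return 1;
    }
    return 0;
}
```

The swap costs constant time but does not preserve the order of the remaining entries, which is fine as long as entries are always looked up by name.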

The actual implementation of the cwp_set/cwp_method, cwp_get/cwp_call and
cwp_unset/cwp_unmethod functions is not shown because it's just a matter of following what has been explained already, without
any particularly special considerations.

Boxing: Making cwpObject Actually Useful

Now we can add, change or remove properties and methods from our objects. However, right now it's hardly useful. After all,
properties can only be objects,
and even though methods can operate on arbitrary data, that would miss the point of object-orientation.
What we need is a way to store arbitrary data, e.g. an integer, as a property.

The solution is to create an object that contains the actual value, like a cwpInt that contains the integer value we want to store.
Taking from the Java world, this process is called boxing.

Ideally, we want to store as many built-in types inside properties as possible, so we create the following prototypes:

cwpString for strings (char *)

cwpInt for signed integers (int)

cwpNat for “natural numbers” (unsigned int)

cwpLongInt and cwpLongLongInt for signed integers with more bits (long int, long long int)

cwpLongNat and cwpLongLongNat for natural numbers with more bits (unsigned long int,
unsigned long long int)

cwpFloat and cwpDouble for floating-point numbers (float, double)

How do we store the boxed value? After all, we specifically forbade adding those fields to struct property because of easy-to-introduce errors.
The answer is making another struct:

Unlike struct property, this structure is completely invisible to the user. In particular, we forbid any modification after creating
the object. That way, we reduce the risks of adding a bug caused by mismanagement of these values.

Actually, since this structure is fairly limited in what it can do, we can probably optimize its memory usage by placing it in a
union or something like that.
For the sake of simplicity, though, the implementation explained in this article will keep it like this.
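As an illustration only, a non-union structure of this kind and a boxed integer built on it could look like the sketch below. Apart from string_value, which the text mentions, every field name is an assumption, and struct box is a stand-in that leaves out the property and method vectors of a real object.

```c
#include <stdlib.h>

/* Hypothetical holder for a boxed value: one field per family of types.
 * Only the field matching the kind of box is meaningful. */
struct primitive {
    const char *name;               /* e.g. "value" */
    long long int_value;            /* cwpInt, cwpLongInt, cwpLongLongInt */
    unsigned long long nat_value;   /* cwpNat, cwpLongNat, cwpLongLongNat */
    double float_value;             /* cwpFloat, cwpDouble */
    char *string_value;             /* cwpString */
};

/* Stand-in for a boxed object. */
struct box {
    unsigned nref;
    struct primitive prim;
};

/* Box an int; returns NULL when memory is exhausted. */
struct box *box_int(int value)
{
    struct box *b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;
    /* The compound literal zero-initializes every field not named here. */
    *b = (struct box){
        .nref = 1,
        .prim = { .name = "value", .int_value = value },
    };
    return b;
}

/* Read the boxed value back; the box itself is never modified. */
int unbox_int(const struct box *b)
{
    return (int)b->prim.int_value;
}
```

Since nothing outside these two functions touches prim, the boxed value cannot change after creation.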

Now, we can define the prototypes. Numbers are essentially the same, except for the “create” methods, which take a value of the
appropriate type and assigns it to the appropriate field in the vector of primitive values.

We keep the “destroy” method because, even though it's a boxed integer, it's still an object, so users can add properties and
methods to it.

box_properties is used by cwpString too, while number_primitives is used by the objects boxing a number.

Like object_equals, number_equals is also very simple: it's a matter of comparing each number field of the two objects
(after all, the float number 0.0 is the same as the integer number 0, mathematically).

After defining all the boxed numbers, we can finally store a number inside an object:

The structure of this object is pretty much the same, with a few differences: to begin with, the value primitive is the value of
the string_value field. But a string also has an inherent length property. We define this property as a primitive value,
initially set to 0 (i.e. it's the empty string), then define a method “length” (or “size”, or even both), to get this number.

Like when dealing with property or method names, we must duplicate the string that will be boxed. Not only does it make sure we can safely deallocate
all the memory used by our string object, but it also makes sure (as far as asprintf goes) that our strings are
NUL-terminated. Another advantage is that strings can't be manipulated from outside, making them immutable objects.

String equality is essentially a wrapper around a strcmp of the string_value of the two objects.
Unlike normal objects, when comparing strings, what we care for is whether or not the “value” properties are the same, so we ignore
other properties or methods (but, of course, a subclass might think differently!).

cwpString contains a special method, “to c array”. This method is used to get the string as a char *, so that
it can be used in e.g. printf.

Ultimately, it's just a matter of returning the content of the string_value field, but we can't just return the pointer, as that would break
consistency within the object (that is, someone might change the contents of the string or, worse, deallocate it!).
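A sketch of that idea: hand out a fresh copy and leave the internal buffer untouched. struct boxed_string and the function name to_c_array are stand-ins for the hidden state and the method described above.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for the boxed string's hidden state. */
struct boxed_string {
    char *string_value;
    size_t length;      /* number of characters, excluding the NUL */
};

/* Returns a freshly allocated copy that the caller must free(), or NULL
 * when memory is exhausted. */
char *to_c_array(const struct boxed_string *s)
{
    char *copy = malloc(s->length + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, s->string_value, s->length + 1);   /* includes the NUL */
    return copy;
}
```

Whatever the caller does to the copy, including modifying or freeing it, the boxed string stays intact.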

The “to string” method is just a matter of returning a new cwpString
(or one cached in one of the properties) to the caller. Of course, the returned values (especially the one created by “to c array”)
should be deallocated after being used.

About Generic Pointers

With this implementation, the most essential types can be stored as a property of an object.
However, a particular type is missing: pointers. We do have pointers to characters (“strings”),
but not other types of pointers (i.e. void *, since we can't realistically box every possible pointer).

This is a design choice. Strings are a special kind of pointer: the content being pointed to is supposed to be readable by humans, so it will not contain
strange bytes and (bugs aside) is always terminated by a NUL. These characteristics allow us to treat strings in a certain way, e.g. by
copying them before storing them. The last behaviour in particular means we can deallocate them as we please whenever we're done with them without
affecting the outside world (and vice versa).

We can't do that with generic pointers. The major issue is that in some cases it might be impossible to copy the pointed content. For example,
how do we copy a FILE *? In this case, it's not just a matter of using memcpy (or asprintf, or…).

Since we can't copy a FILE *, storing it as-is inside an object would mean that someone from outside can invalidate that pointer in one
way or another, making our object useless if not dangerous.

To avoid this kind of subtle bug, generic pointers can't be stored inside an object. Users will have to manage them “the old way” (or
by casting some magic by treating them as a normal int, but that's a different kind of problem.)

Private Properties, or Hiding Information Without Writing Code

As introduced earlier, properties can't be made private. An object always exposes all its properties and code can change them any time, which
is something that can cause bugs if some piece of code doesn't behave properly.

Does this mean all we can do is hope for the best? It doesn't. Unlike other prototype-based languages, CWP lacks a “for each”
(or “map”, or…) method, that is, there is no built-in way to enumerate the properties of an object.

Why is this important? Because properties are stored in a vector that is not accessible from outside. Since it's not accessible, users can't
look at its content without examining the memory layout of the whole object (e.g. by casting it to a uint8_t *, then using pointer arithmetic
to get the pointer to the vector), but it isn't a very reliable method, and a faux pas can even corrupt the whole thing.

Thanks to this “missing feature”, the only way a user can know which properties an object has is reading the object's documentation.
Consider this example:

Point
-----
Constructors:
Point(int x, int y)
Create a new Point object at position (X, Y).
Properties:
x (int)
The X coordinate of the Point object.
y (int)
The Y coordinate of the Point object.
Methods:

The “private” property is not part of the documentation; thus, a user who doesn't read the source code of the Point prototype
can't know about it.

As the old saying goes: what the eye doesn't see, the heart doesn't grieve over. If the documentation doesn't mention a property (or even a method),
that property (or method) doesn't exist, and if a user somehow uses it, it's a bug in the user's code.

I agree that this method is not the classic encapsulation we're all used to, but the expected result is preventing users from accessing
certain properties or methods. The details are unimportant.

By relying entirely on the documentation, a “for each” method might look like:

Because the documentation told us that o has two properties called prop1 and prop2.

In addition, it has the added bonus that documentation remains up to date. Sometimes, applications or libraries don't really update their documentation,
or even just leave it unfinished, for one reason or another. It's common with tools like Doxygen that parse comments. Some functions or data types
might lack such a comment or have something very brief, which isn't really helpful. By giving documentation an important role like in CWP,
not keeping it as complete and up to date as possible means users will be unable to use it at all, either with the built-in prototypes or some
other user-defined prototypes.

Performance and Uses

Even with just a cursory examination, it's obvious that CWP is slower and uses more resources than the other class-based
“object-oriented C”.

However, unlike them, it doesn't require anything special before it can be used: since classes are static entities, it can happen that changing
just one of them breaks the whole program (e.g. because the linker can't find a symbol anymore when linking .o files).
CWP just needs to be #included and it will always work. Of course this doesn't mean it's automatically better, but
it's certainly a plus.

Given its performance, is it really good for general usage? Certainly, it doesn't make sense to use it to build common data structures. If you need
a linked list, it's better to use the usual struct list with raw pointers everyone is familiar with. This isn't Java, you don't
need objects for everything.

However, there are some use cases in which objects work better than normal structs. The most obvious example is graphical user interfaces,
since the whole thing is modeled as a set of objects that communicate with each other through events ever since Smalltalk.
Thanks to the dynamic nature of CWP, it's easier to specialize certain generic widgets:
for example, a button instance can change its behaviour according to some state just by calling
cwp_method at some point. With class-based interfaces this operation is often not possible.

Conclusion

A complete working (and commented) implementation is available on my Gitlab page.

It's especially useful since this article omitted the code in some parts.