The Essence of OOP: It's all messages, all the time.

Category Archives: Syntax

A self expression is essentially a (possibly cascaded) message sent to self, except that the pseudo-variable “self” is omitted. The value of the “invisible self” that receives the message is established by configuring the compiler to set the pseudo-variable self to the desired value during execution of the self expression.

A self expression is allowed just onemessage chain having just one message receiver–and the receiver must not be specified syntactically.

Conceptually, the syntactically unspecified–and therefore implied–receiver of the message(s) in the message chain of a self expression is the pseudo-variable self. That’s because the same compiler infrastructure used to set the pseudo-variable self to the correct value during the execution of a method or block is also used to set the value of the implied, unspecified receiver of the message(s) in the message chain of a self expression, and also because a self expression is parsed by the parser simply by adding an actual parse node for the pseudo-variable self as the receiver of the message(s) in the message chain of the self expression. That’s possible because the parser implements self expressions as their own “root parse node” or “grammatical start symbol.”

Any syntactically-valid expression which sends a message to an operand can be converted into a self expression simply be removing whatever syntactical construct is the receiver of the message (a.k.a, the leftmost operand in an expression.) The following examples show two pairs of expressions, where the first member of the pair uses self expression syntax and the second one does not:

Prior Art

The inspiration for self expressions comes from Tirade, a data representation language invented by Göran Krampe. The main reason that Essence# uses self expressions instead of Tirade is simply because, once you have a full parser/compiler, it is significantly simpler to implement self expressions than to implement Tirade. Given a full compiler, implementing self expressions only involves adding a few, relatively small methods to the parser and compiler. And it also technically doesn’t require adding new syntax to the language, since self expressions only use syntactical forms that would have to exist in any case; the only innovation is to permit simpler syntax that omits an otherwise-required syntactical construct. So self expressions were “the simplest thing that could possibly work.”

That said, Tirade would be a much superior solution to the problem of programming-language neutral data interchange. But that’s not the problem that self expressions are intended to solve.

A method declaration is the syntactical construct used to define a method, including its name and its logic. In Essence#, the name of a method is typically referred to as its method selector; or even just its selector.

At the beginning of a method declaration, the parser will recognize “##” (two immediately-adjacent hash characters) as a lexical token called a method header token, and will interpret the token to mean “what follows is a method header.”

Whitespace is permitted, but not required, both preceding and following the method header token.

A method header token may optionally appear as the first token of any method declaration; but it is not required if the method declaration is the root of the parse tree.

However, if a method header token occurs as the first (non-whitespace) token following a “[” token, then the parser will have no choice but to interpret what follows as the remaining tokens of a method literal (which must be terminated by a “]” token eventually.) And in that case, the “##” (method header) tokenmay be required:

If the initial token following a “[” (BlockBegin) token is either a binary message selector or a keyword, then the source code enclosed within the “[” and “]” tokens will be parsed as a method declaration even though there is no leading method header token, and the entire construct (including the “[” and “]” tokens) will be interpreted as a method literal, and not as a block literal. So in either of those cases, no method header token is required.

However, if the first token following a “[” token is a unary message selector (which might instead just be a variable name,) and if there is no leading method header token in between the “[” token and the unary message selector, then the parser will not interpret the construct as a method literal, but will instead interpret it as a block. So, when a method declaration occurs as part of a method literal, andsaid method declaration has a unary method header, the only way to get the parser to interpret the construct as a method literal, and not as a block, is to use a method header token as a prefix to the unary method header.

Note: The method header token will also be required as a prefix to a binary method header, if the binary selector is the “|” (vertical bar) token. That constraint is required in order to avoid syntactical ambiguity, due to the fact that a vertical bar token may also be the initial token of a variable declaration list.

If and only if a method header is preceded by the “##” (method header) token, the name of the class which is to be used as the environment for binding variable references when compiling the method may be specified preceding the method header. But in that case, the token “>>” must then be used as a separator between the class name and the method header.

Whitespace is permitted but not required in between the class name and the “>>” token, and in between the “>>” token and the method header.

Following the method header there must be an executable code construct. An executable code construct defines the method’s logic. Colloquially, an executable code construct is referred to as a method body. A method bodyhas the same exact syntactical structure as a block body.

There are three different types of method header: A unary method header, a binary method header and akeyword method header.

A method declaration with a unary method header must be invoked using a unary message.

A method declaration with a binary method header must be invoked using a binary message.

A method declaration with a keyword method header must be invoked using a keyword message.

Examples of method declarations using all three types of method header are shown below (none of which have a method header token as a prefix):

If a method header token (“##”) precedes it, then a class specification construct may optionally precede any of the three types of method header. If present, the class specified by the class specification will be used by the compiler as the behavioral context in which the method will be compiled. In other words, the instance variables defined by the specified class, the class variables defined by the specified class and the global variables imported by the specified class will be used to bind any variables referenced by the method that aren’t either method parameters or local variables.

Here are the same three method declarations constructed to have an optional method header token and class specification construct as a prefix to the method header:

Method declaration using a unary method header and an optional class specification:

Any method declaration (whether it uses a unary method header, a binary method header or a keyword method header, and whether or not it uses an optional class specification) may optionally begin with a method header token. The reason the method header token is optional is because its purpose is either to separate one method declaration from another in a sequence of method declarations, or else to distinguish a method literal from ablock. Outside of those two cases, it has no purpose, function or meaning. Its presence or absence has no effect on the semantics of the method.

A method literal must be enclosed between a single beginning “[” character and a single ending “]” character, making its syntax rather similar to that of a block. The key difference between the syntax of a block and the syntax of a method literal is that the construct that immediately follows the beginning “[” character must be unambiguously a method header. And that, in fact, is one of the reasons that the “##” (method header) tokenexists, and is required in some cases, but optional in others: When the “##” token is required, it’s because its absence would create syntactical ambiguity, such that it would not be possible for the parser to distinguish ablock from a method literal.

Methods defined in Essence# class libraries declare methods as method literals, instead of as method declarations that are the root of their respective parse trees. Using method literals for that purpose obviates any need to encode method names as filenames; or alternatively, it obviates any need to define a special syntax for dealing with sequences of method declarations, or for syntactically embedding method declarations inside of class declarations. So there’s no need for a special “file in” syntax, nor any need for a special parser that can consume a special “file in format.”.

That said, the ANSI standard does require that a conforming implementation support the Smalltalk Interchange Format. Essence# does not currently support that format, but will do so before it leaves beta.

Using Class Specifications

A method declaration may optionally use a class specification construct–but only if a method header token is also used. That means a method literal may also use a class specification construct, since its syntax is defined as an embedding of a method declaration enclosed in between the tokens “[” and “]”.

The presence or absence of the class specification constructmay change the behavior of the compiler:

If there is no class specification in the method header, then whether or not the compiler will attempt to bind non-local variable references depends upon how the compiler is invoked. If the compiler is not provided with a binding context for non-local variables when it’s invoked, and if there is no class specification in the method header to provide one, then the compiler won’t check whether any references to non-local variables might be undeclared (however, that check is always performed whenever a method is added to a class or trait.)

On the other hand, if the method header includes a class specification, then the compiler will always attempt to bind references to non-local variables, using whichever class is specified by the class specification construct as the binding context. In that case, any undeclared variables will be treated as compilation errors.

In other words, the compiler interprets the presence of a class specification construct in a method header as a command to verify that there would be no undeclared variables referenced by the method it’s compiling, if that method were to be added to the specified class. Conversely, it interprets the absence of a class specification as a command to defer any such checks until the method is actually added to a class or trait.

When compiling either self expressions or “do its” (initializers or scripts,) the compiler is not configured to provide any default binding context for method declarations–and therefore is also not configured to do so formethod literals. That’s because there’s no way to know a priori what the “right” binding context might be in such cases.

Since methods are checked for any references to undeclared variables when they are added to a class or to a trait (which is usually the proper time, because that’s when the right binding context is known absolutely,) there are no system integrity issues raised by this binding paradigm. And that’s why the method literals in “methods.instance” and “methods.class” files don’t use class specification constructs in their method headers. There’s no need, really.

However, there are compilation use cases other than compiling “methods.instance” and “methods.class” files. And some of those use cases do require that the compiler bind all variable references during initial compilation–which is why class specification syntax is present as on option for method headers.

I haven’t discussed or explained much about the metaprogramming capabilities of Essence#. So let me do that now. First, I will present some example code, and then I will explain what it does, and how it does it:

What the example code does

The example code shows how to define instance-specific behavior for an object.

In this specific example, the array #(5 8 13)–which is an array of the three integers 5, 8 and 13–is made to also have the behavior defined by the trait Examples.TDebutant, and it is also bound to a method defined dynamically by the code of the example–one that no other array instance in the whole universe will have.

Because of the new behavior added uniquely to this one particular array object, it (and it alone) will understand the messages #introduceYourself (imported from the trait Examples.TDebutant) and #whatAreYou (dynamically defined and bound solely to the one array instance by the code of the example.)

So when the array instance #(5 8 13) is sent the message #introduceYourself, it responds by writing the text ‘Hello, world! I”m a Debutant!’ to the Console. And when it is sent the message #whatAreYou, it responds by writing the name of the superclass of its class to the Console. However, were either of those messages sent to any other array objects, the result would be a MessageNotUnderstood exception (that’s true even if any of the other arrays also contained the same three elements–only the object identity of a message receiver matters when one engages in this sort of metaprogramming.)

The point of the final statement of the example is to demonstrate that the array instance whose behavior was modified by the code of the example nevertheless continues to behave like a normal array object in all other respects. The final statement evaluates to the first element of the array that, when divided by 2, has a remainder (technically, a residual) of zero. So it evaluates to 8, which is the same result that one would obtain by sending the same message to any other array instance whose first even-numbered element was the integer 8 (assuming all of its elements were numbers.)

How the example code works

The code in the example does its magic by creating a new class (technically, a new Behavior, which you can think of as a “lightweight class” or “proto-class”), sending the new class the message #uses: Examples.TDebutant so that will import the methods defined by the trait Examples.TDebutant, sending the new class the message #addMethod: with a method literal object as the message argument, and sending the new class the message #superclass: with the class of an empty array object as the message argument (which will, of course, be the same class as that of the array #(5 8 13)). But all of that is only the appetizer; by itself, it would not be sufficient to modify the behavior of the array object #(5 8 13).

The master spell that takes us to the next level is performed by the magic message #changeClassToThatOf:, whose receiver can be any Essence# object whose object state architecture is compatible with the instance architecture of the class of the object that is the argument to the message. As you might infer from the name of this spell…uh, message, the effect of uttering (sending) it is to change the class of the receiver. In this case, it changes the class of the array object #(5 8 13) to be the new class we’ve created.

Obviously, the reason the array object #(8 5 13) acquires new behavior in the example code is because its class gets changed. And the reason it–and only it out of all other array objects–gets the new behavior is because we only changed the class of the one array object, and not the class of any other array. And the reason it gets the ability to respond to the messages #introduceYourself and #whatAreYou is because we specifically added methods with those names to the array object’s new class (note that the order in which we add the new behavior to the class and change the class of the array doesn’t matter, provided both of those changes are accomplished before we try to make use of them.)

And finally, the reason the array object continues to have array behavior after we’ve changed its class is because we made the array’s original class be the superclass of the new class we created. Since the array’s original class is the superclass of its new class, any messages sent to it will still end up being handled by the methods of its original class.

The release notes on CodePlex provide two examples of some minor new functionality added by this release. Please read the release notes for the details.

In addition to the new functionality, the bug fixes have enabled features that used to work, stopped working at some unknown point in the past, but are now working again. Permit me to demonstrate some of them in the following example script:

The example uses method literal syntax to define methods. Although those have been working ever since the moment the Essence# compiler was able to compile anything at all–and they are used exclusively in the Essence# class library as the syntax for defining methods–there were limitations on their use (partially due to bugs) that have now been eliminated. Specifically, it used to be the case that the containing scope in which they were defined had to be the same class in which they would be installed as methods. That restriction is now lifted.

However, there are consequences to lifting that restriction, one of which is that the compiler will blithely accept and ignore any undeclared variables in a method literal when the syntax demonstrated by the example above is used to express the method literal. Fortunately, there is an extended syntax for method literals that can be used so that the compiler will not ignore and will not accept references to undeclared variables. Here’s an example:

If the method literal starts with a class name, followed by a “>>” token (which then must be followed by the method’s selector), then the compiler will compile the method in the context of the named class, using the definition of the class at the time of compilation to resolve variable refernces. And the compiler will then also abort the compilation with an error if any of the variable references cannot be resolved. The class name may optionally use qualified name syntax (“dotted notation”) so as to refer to classes in specific namespaces.

Actually, there is much functionality in Essence# which I have not yet discussed. I will be revealing it as time permits.

Essence# has supported both dynamic array literals and dictionary literals from its inception. Those were both working from the moment the system first successfully compiled and executed a “do it” earlier this year (8 March 2014.)