The Essence of OOP: It's all messages, all the time.

Category Archives: Computer science

A self expression is essentially a (possibly cascaded) message sent to self, except that the pseudo-variable “self” is omitted. The value of the “invisible self” that receives the message is established by configuring the compiler to set the pseudo-variable self to the desired value during execution of the self expression.

A self expression is allowed just onemessage chain having just one message receiver–and the receiver must not be specified syntactically.

Conceptually, the syntactically unspecified–and therefore implied–receiver of the message(s) in the message chain of a self expression is the pseudo-variable self. That’s because the same compiler infrastructure used to set the pseudo-variable self to the correct value during the execution of a method or block is also used to set the value of the implied, unspecified receiver of the message(s) in the message chain of a self expression, and also because a self expression is parsed by the parser simply by adding an actual parse node for the pseudo-variable self as the receiver of the message(s) in the message chain of the self expression. That’s possible because the parser implements self expressions as their own “root parse node” or “grammatical start symbol.”

Any syntactically-valid expression which sends a message to an operand can be converted into a self expression simply be removing whatever syntactical construct is the receiver of the message (a.k.a, the leftmost operand in an expression.) The following examples show two pairs of expressions, where the first member of the pair uses self expression syntax and the second one does not:

Prior Art

The inspiration for self expressions comes from Tirade, a data representation language invented by Göran Krampe. The main reason that Essence# uses self expressions instead of Tirade is simply because, once you have a full parser/compiler, it is significantly simpler to implement self expressions than to implement Tirade. Given a full compiler, implementing self expressions only involves adding a few, relatively small methods to the parser and compiler. And it also technically doesn’t require adding new syntax to the language, since self expressions only use syntactical forms that would have to exist in any case; the only innovation is to permit simpler syntax that omits an otherwise-required syntactical construct. So self expressions were “the simplest thing that could possibly work.”

That said, Tirade would be a much superior solution to the problem of programming-language neutral data interchange. But that’s not the problem that self expressions are intended to solve.

A method declaration is the syntactical construct used to define a method, including its name and its logic. In Essence#, the name of a method is typically referred to as its method selector; or even just its selector.

At the beginning of a method declaration, the parser will recognize “##” (two immediately-adjacent hash characters) as a lexical token called a method header token, and will interpret the token to mean “what follows is a method header.”

Whitespace is permitted, but not required, both preceding and following the method header token.

A method header token may optionally appear as the first token of any method declaration; but it is not required if the method declaration is the root of the parse tree.

However, if a method header token occurs as the first (non-whitespace) token following a “[” token, then the parser will have no choice but to interpret what follows as the remaining tokens of a method literal (which must be terminated by a “]” token eventually.) And in that case, the “##” (method header) tokenmay be required:

If the initial token following a “[” (BlockBegin) token is either a binary message selector or a keyword, then the source code enclosed within the “[” and “]” tokens will be parsed as a method declaration even though there is no leading method header token, and the entire construct (including the “[” and “]” tokens) will be interpreted as a method literal, and not as a block literal. So in either of those cases, no method header token is required.

However, if the first token following a “[” token is a unary message selector (which might instead just be a variable name,) and if there is no leading method header token in between the “[” token and the unary message selector, then the parser will not interpret the construct as a method literal, but will instead interpret it as a block. So, when a method declaration occurs as part of a method literal, andsaid method declaration has a unary method header, the only way to get the parser to interpret the construct as a method literal, and not as a block, is to use a method header token as a prefix to the unary method header.

Note: The method header token will also be required as a prefix to a binary method header, if the binary selector is the “|” (vertical bar) token. That constraint is required in order to avoid syntactical ambiguity, due to the fact that a vertical bar token may also be the initial token of a variable declaration list.

If and only if a method header is preceded by the “##” (method header) token, the name of the class which is to be used as the environment for binding variable references when compiling the method may be specified preceding the method header. But in that case, the token “>>” must then be used as a separator between the class name and the method header.

Whitespace is permitted but not required in between the class name and the “>>” token, and in between the “>>” token and the method header.

Following the method header there must be an executable code construct. An executable code construct defines the method’s logic. Colloquially, an executable code construct is referred to as a method body. A method bodyhas the same exact syntactical structure as a block body.

There are three different types of method header: A unary method header, a binary method header and akeyword method header.

A method declaration with a unary method header must be invoked using a unary message.

A method declaration with a binary method header must be invoked using a binary message.

A method declaration with a keyword method header must be invoked using a keyword message.

Examples of method declarations using all three types of method header are shown below (none of which have a method header token as a prefix):

If a method header token (“##”) precedes it, then a class specification construct may optionally precede any of the three types of method header. If present, the class specified by the class specification will be used by the compiler as the behavioral context in which the method will be compiled. In other words, the instance variables defined by the specified class, the class variables defined by the specified class and the global variables imported by the specified class will be used to bind any variables referenced by the method that aren’t either method parameters or local variables.

Here are the same three method declarations constructed to have an optional method header token and class specification construct as a prefix to the method header:

Method declaration using a unary method header and an optional class specification:

Any method declaration (whether it uses a unary method header, a binary method header or a keyword method header, and whether or not it uses an optional class specification) may optionally begin with a method header token. The reason the method header token is optional is because its purpose is either to separate one method declaration from another in a sequence of method declarations, or else to distinguish a method literal from ablock. Outside of those two cases, it has no purpose, function or meaning. Its presence or absence has no effect on the semantics of the method.

A method literal must be enclosed between a single beginning “[” character and a single ending “]” character, making its syntax rather similar to that of a block. The key difference between the syntax of a block and the syntax of a method literal is that the construct that immediately follows the beginning “[” character must be unambiguously a method header. And that, in fact, is one of the reasons that the “##” (method header) tokenexists, and is required in some cases, but optional in others: When the “##” token is required, it’s because its absence would create syntactical ambiguity, such that it would not be possible for the parser to distinguish ablock from a method literal.

Methods defined in Essence# class libraries declare methods as method literals, instead of as method declarations that are the root of their respective parse trees. Using method literals for that purpose obviates any need to encode method names as filenames; or alternatively, it obviates any need to define a special syntax for dealing with sequences of method declarations, or for syntactically embedding method declarations inside of class declarations. So there’s no need for a special “file in” syntax, nor any need for a special parser that can consume a special “file in format.”.

That said, the ANSI standard does require that a conforming implementation support the Smalltalk Interchange Format. Essence# does not currently support that format, but will do so before it leaves beta.

Using Class Specifications

A method declaration may optionally use a class specification construct–but only if a method header token is also used. That means a method literal may also use a class specification construct, since its syntax is defined as an embedding of a method declaration enclosed in between the tokens “[” and “]”.

The presence or absence of the class specification constructmay change the behavior of the compiler:

If there is no class specification in the method header, then whether or not the compiler will attempt to bind non-local variable references depends upon how the compiler is invoked. If the compiler is not provided with a binding context for non-local variables when it’s invoked, and if there is no class specification in the method header to provide one, then the compiler won’t check whether any references to non-local variables might be undeclared (however, that check is always performed whenever a method is added to a class or trait.)

On the other hand, if the method header includes a class specification, then the compiler will always attempt to bind references to non-local variables, using whichever class is specified by the class specification construct as the binding context. In that case, any undeclared variables will be treated as compilation errors.

In other words, the compiler interprets the presence of a class specification construct in a method header as a command to verify that there would be no undeclared variables referenced by the method it’s compiling, if that method were to be added to the specified class. Conversely, it interprets the absence of a class specification as a command to defer any such checks until the method is actually added to a class or trait.

When compiling either self expressions or “do its” (initializers or scripts,) the compiler is not configured to provide any default binding context for method declarations–and therefore is also not configured to do so formethod literals. That’s because there’s no way to know a priori what the “right” binding context might be in such cases.

Since methods are checked for any references to undeclared variables when they are added to a class or to a trait (which is usually the proper time, because that’s when the right binding context is known absolutely,) there are no system integrity issues raised by this binding paradigm. And that’s why the method literals in “methods.instance” and “methods.class” files don’t use class specification constructs in their method headers. There’s no need, really.

However, there are compilation use cases other than compiling “methods.instance” and “methods.class” files. And some of those use cases do require that the compiler bind all variable references during initial compilation–which is why class specification syntax is present as on option for method headers.

What is an Object Space?

An object space is an object that encapsulates the execution context of an Essence# program. It is also responsible for initializing and hosting the Essence# run time system, including the dynamic binding subsystem that animates/reifies the meta-object protocol of Microsoft’s Dynamic Language Runtime (DLR.)

Any number of different object spaces may be active at the same time. Each one creates and encapsulates its own, independent execution context. The compiler and the library loader operate on and in a specific object space.Blocks and methods execute in the context of a specific object space. Essence# classes, traits and namespaces are bound to a specific object space. Even when a class, trait or namespace is defined in the same class library and the same containing namespace, they are independent and separate from any that might have the same qualified names that are bound to a different object space.

In spite of that, it is quite possible for an object bound to one object space to send messages to an object bound to a different object space. One way to do that would be to use the DLR’s hosting protocol. That’s because an Essence# object space is the Essence#-specific object that actually implements the bulk of the behavior required by a DLR language context, which is an architectural object of the DLR’s hosting protocol.

The C# class EssenceSharp.ClientServices.ESLanguageContext subclasses the DLR class Microsoft.Scripting.Runtime.LanguageContext, and thereby is enabled to interoperate with the DLR’s hosting protocol. But an instance of EssenceSharp.ClientServices.ESLanguageContext’s only real job is to serve as a facade over instances of the C# class EssenceSharp.Runtime.ESObjectSpace. And EssenceSharp.Runtime.ESObjectSpace is the class that reifies an Essence# object space.

So, if you are only interested in using Essence#, and have no interest in using other dynamic languages hosted on the DLR, there is no need to use a DLR language context in order to invoke the Essence# compiler and run time system from your own C#, F# or Visual Basic code. You can use instances of EssenceSharp.Runtime.ESObjectSpace directly. The only disadvantage of that would be that using other DLR-hosted languages would then require a completely different API (e.g, using an IronPython library from Essence# code requires using the DLR hosting protocol, and hence requires using a DLR language context).

The advantages of using instances of EssenceSharp.Runtime.ESObjectSpace directly would be a much richer API that is far more specific to Essence#.

You can get the object space for the current execution environment by sending the message #objectSpace to any Essence# class (even to those that represent CLR types.) And the Essence# Standard Library includes a definition for an Essence# class that represents the Essence#-specific behavior of instances of the C# class EssenceSharp.Runtime.ESObjectSpace. It’s in the namespace CLR.EssenceSharp.Runtime, and so can be found at %EssenceSharpPath%\Source\Libraries\Standard.lib\CLR\EssenceSharp\Runtime\ObjectSpace.

There are many ways that Essence# object spaces might be useful. One example would be to use one object space to host programming tools such as as browsers, inspectors and debuggers, but to have the applications on which those tools operate be in their own objects spaces. That architecture would isolate the programming tools from any misbehavior of the applications on which they operate–and vice versa.

Just because there is as yet no Essence# GUI library, and therefore no native Essence# code browsing tools, doesn’t mean that the intrinsic reflecting capabilities of Essence# can’t be used. In fact, scripts are provided in the shared scripts folder that provide at least some of the functionality traditionally provided by code browsers:

ShowAllMethods: The ShowAllMethods.es script can be used to print out the names and declaring class or trait of all the methods of a class or trait. The subject class or trait must be passed in as an argument, as in the following example which will print out the names and declaring class or trait of all the methods of class Array to the Transcript:

es ShowAllMethods -a Array | more

ShowAllMessagesSent: The ShowAllMessagesSent.es script can be used to print out all the messages sent by each method of a class or trait. The output is cross-referenced by the sending methods, and each such method specifies the class or trait that declares it. The subject class or trait must be passed in as an argument, as in the following example which will print out the names of all the messages sent by each method of class Array to the Transcript:

es ShowAllMessagesSent -a Array | more

ShowAllSenders: The ShowAllSenders.es script can be used to print out the names and declaring class or trait of all the methods in the object space that send a specified message. The subject message selector must be passed in as an argument, as in the following example which will print out to the Transcript the names and declaring class or trait of all the methods in the object space that send the message do:

es ShowAllSenders -a #do: | more

ShowAllSendersInHierarchy: The ShowAllSendersInHierarchy.es script can be used to print out the names and declaring class or trait of all the methods of a specified class that send a specified message. The subject message selector and the subject class or trait must both be passed in as arguments, as in the following example which will print out to the Transcript the names and declaring class or trait of all the methods of OrderedCollection that send the message do:

es ShowAllSendersInHierarchy -a #do: -a OrderedCollection | more

ShowUnimplementedMessages: The ShowUnimplementedMessages.es script can be used to print out the names of all the messages sent by the methods of a specified class or trait to the pseudo-variable self for which the specified class or trait has no implementing methods. The subject class or trait must be passed in as an argument, as in the following example which will print out to the Transcript the names of any messages sent to self by the class OrderedCollection that send messages for which OrderedCollection has no implementing methods:

es ShowUnimplementedMessages -a OrderedCollection | more

Note: Classes that represent CLR types typically have virtual Essence# methods that don’t need to be formally declared, because the Essence# dynamic binding system will automatically bind to and invoke the methods of a CLR type, provided those methods have less than two parameters. Messages sent in order to invoke such methods of CLR types will unavoidably show up as “unimplemented messages” when using the ShowUnimplementedMessages.es script.

ShowTraitUsageConflicts: The ShowTraitUsageConflicts.es script can be used to print out the name and declaring trait of all methods which were excluded from a trait usage expression due to the fact that methods with the same selectors were declared by two or more of the traits combined in a trait usage expression. The subject class or trait must be passed in as an argument, as in the following example which will print out to the Transcript methods excluded from the trait usage of ReadStream because two or more of the traits used by ReadStream had the same method selector:

Eventually, you should be able to use Essence# as a bridge between Smalltalk programs that don’t run on .Net and applications and libraries and services that are native to .Net. That’s not currently possible, because Essence# doesn’t yet support network connectivity of any sort. But the plan is to port Craig Latta’s Flow framework to Essence# to solve that problem.

The idea is that, once Essence# has a fully-functional bidirectional object transport capability (such as the one used by Spoon,) it shouldn’t be at all difficult to establish a “stored procedure” architecture by which Smalltalk applications that don’t run on .Net could send code to Essence# for execution in a .Net environment–and vice versa, of course.

Sending code is a more powerful architecture than just sending remote procedure calls. In the latter case, you’re limited by the API provided by the server. In the latter case, you can dynamically create your own API as the system runs. Even better, you can send code to be executed in an environment with better locality of reference to the data it needs, and better locality of reference to the functions that need to be applied to that data. It’s the same concept, and same architecture, that Gemstone has used so successfully for may years.

There are three issues that you will encounter when using Essence# that involve CLR types:

Obtaining a CLR type as a value that can be stored in a variable, passed as a parameter or sent messages;

Converting a value from one CLR type to another; and

Creating instances of CLR generic types.

Obtaining A CLR Type As An Essence# Value

The Essence# class System.Type (in the Essence# namespace CLR.System, which corresponds to the .Net namespace System) has many class messages that answer commonly-used CLR types: For example, the expression Type string evaluates to the .Net object that represents the .Net type System.String and the expression Type timeSpan evaluates to the .Net object that represents the .Net type System.TimeSpan. [Note: Those messages are unique to Essence#; the actual .Net class System.Type doesn’t support them. You can add Essence#-specific instance methods or class methods to any .Net class or struct; they just won’t be available outside of Essence#.]

Of course, if you have an instance of the type, you can just send it the message #getType (which is a message to which all .Net objects will respond.) Another way to get a CLR type is to send the message #instanceType to the Essence# class that represents CLR values having that type.

If the CLR type can’t be obtained using one of the techniques explained above, the fallback is to encode the assembly-qualified name of the type in a string, and then send the string the message #asHostSystemType, as shown in the following example:

In most cases (but not all,) the #asHostSystemType message will fail if the fully-qualified name of the assembly is not provided. Two exceptions to that would be when the type is in the mscorlib assembly or in the EssenceSharp assembly.

CLR Type Conversion

Formally–in other words, in theory–Essence# is dynamically typed: The “type” of an expression is simply the powerset of all messages that can be sent to the expression without causing a method-binding error. And that is also true in practice, in that the compiler permits any message to be sent to any expression, and in the fact that the compiler permits any expression to be assigned to any variable or passed as any argument in any message, and in the fact that typing errors are only discovered and reported at run time.

But that does not mean that any expression can be passed as any argument in any message to any receiver without causing any run-time type errors, even in cases where the only messages that will be sent to the value of an argument won’t result in a method binding error. That constraint does hold in most other Smalltalk implementations, and it does hold when the method that will be invoked by the message is an Essence# method. But it does not hold when the “method” that will be invoked is a CLR method.

In Essence#, a CLR method is just a “user primitive”–although the “primitive” does not always need to be formally defined as a method in the Essence# source code, because the dynamic binding subsystem automatically binds Essence# messages to CLR methods that require less than 2 arguments, to CLR type constructors that require no arguments, and to any non-private properties or fields of a CLR type. And it is generally the case in all Smalltalk implementations that “primitive methods” will fail if the types of their arguments are not what they require–even if the primitive could have performed its intended operation simply by sending (using Smalltalk dynamic message dispatch) the right messages to the argument. Primitives generally don’t send messages to their arguments using Smalltalk dynamic message dispatch. And the methods of CLR types certainly do not.

Fortunately, the Essence# dynamic binding system will–if it can–automatically do the required type conversions for you, such as number type conversions, String/Symbol conversions, and any conversions defined by implicit or explicit conversion operators defined by a CLR type. It will even convert Essence# blocks into CLR functions with the required parameter type signature.

But the Essence# dynamic binding system isn’t magic. It can’t handle all cases. It’s simply not possible to convert any type into any other type. And even when it is, the run time system may not have the information required to do it correctly. And in other cases, the logic to do the required conversion simply hasn’t been implemented, for one reason or another.

CLR Array Types

One common case where a conversion might be possible, but the dynamic binding system doesn’t attempt to do it, involves arrays. From the point of view of the CLR, an Essence# array is an array whose elements have the type System.Object (an instance of the Essence# class Array,) or the type System.Byte (a ByteArray,) or the type System.UInt16 (a HalfWordArray,) or the type System.UInt32 (a WordArray,) or the type System.UInt64 (a LongWordArray,) or the type System.Single (a FloatArray,) or the type System.Double (a DoubleArray,) or the type System.Decimal (a QuadArray) or the type String (a Pathname.)

Unfortunately, there are many methods of CLR types that require an array having elements of some type other than the ones supported by Essence#. If the CLR method parameter requires an array whose element type is not the same as that of the corresponding argument at run time, then the binding to the method will fail.

Fortunately, if the conversion of the array to one with the required type is possible (because all the elements can be converted to the required CLR type,) you can code that conversion yourself in Essence#. The following example shows how (this is not new functionality; it just hasn’t been documented/explained):

CLR Generic Types

The CLR has three types of “generic type”: Open generic types, partially-open generic types and closed generic types. An open generic type is also called a generic type definition. An open generic type is one where all of its generic type parameters remain “open” because none of them have been bound to a specific type argument. A partially-open generic type is one where some, but not all, of the type’s generic type parameters have been bound to specific type arguments. A closed generic type is one where all of the type’s generic type parameters have been bound to specific type arguments. For the most part, a partially-open generic type is essentially the same as an open generic type. The distinction that really matters is between closed generic types and ones that have generic type parameters that aren’t bound to a specific type.

It’s possible to create instances of closed generic types, but it is not possible to create instances of open or partially-open generic types. And that can be a problem, because .Net class libraries typically only provide generic type definitions that define open generic types. So, although you can define an Essence# class that represents a .Net type that is an open generic type definition, you won’t be able to use that Essence# class to create instances of the type by sending it the normal instance creation messages (e.g, #new or #new:). Creating instances of such a type requires providing one or more types that will be used as the type arguments to construct a closed generic type from the open generic type defined by the .Net class library.

Fortunately, you can use Essence# to construct a closed generic type from an open generic type definition, as illustrated in the following example:

One good way to handle the case where the .Net type you want to use is an open generic type definition would be to define a subclass of an Essence# class that represents that type, and then use one of the messages above to construct a closed generic type that will be the instance type of the subclass.

Another good way to do it is simply to use the CLR’s syntax for closed generic types when specifying the instance type of the subclass. The following examples show both approaches: