Build your own scripting language for Java

An introduction to JSR 223

Before Java Specification Request (JSR) 223, Scripting for the Java Platform (and its predecessor, the Bean Scripting Framework, or BSF), many languages were already communicating with Java. Some languages would take textual code as input from a Java program and return the evaluation result. Others would keep references to objects in a Java program, invoke methods on those objects, or create new instances of a Java class. Because each language communicated with Java in its own way, developers had to learn a script engine's proprietary programming interface every time they wanted to use that engine in their Java programs.

To solve this problem, JSR 223 defines a contract that all script engines conforming to the specification must honor. The contract consists of a set of Java interfaces and classes, as well as a mechanism for packaging and deploying a script engine. When you work with script engines conforming to JSR 223, you'll always program to the same set of interfaces defined by the specification. The details specific to the script engine are well encapsulated, and you'll never need to concern yourself with them.

JSR 223 helps not only consumers, but also producers of script engines. If you have designed and implemented a programming language, you can reach out to a broader audience and make your software friendlier to use by wrapping it with a layer that implements JSR 223 interfaces.

Before we look at the JSR 223 interfaces and this article's implementations of them, I'd like to point out that although the name of the JSR and the title of this article both contain the word scripting, that doesn't mean there need to be limitations on the languages that can be integrated with Java the JSR 223 way. You can take any language you fancy and wrap it with a layer that conforms to the contract laid out in JSR 223. The language can be object-oriented, functional, or in any other programming paradigm. It can be strongly typed, weakly typed, or not typed at all. In fact, before writing this article, I had implemented a JSR 223 wrapper for Scheme, a dynamically typed, functional programming language, and put it up on SourceForge. For this article, however, we look at a much simpler language so we can stay focused on the topic of JSR 223 without the details of a complex language overwhelming us.

Don't worry whether you have prior experience constructing a programming language of your own. This article is not about programming languages, but about JSR 223's contract between programming languages and Java.

BoolScript engine

Figure 1 shows all the parties in our example and how they relate to each other. This article's example defines a simple language that I affectionately call BoolScript. I refer to the program that compiles and executes BoolScript code as the BoolScript engine. Besides compiling and executing BoolScript code, to qualify itself as a JSR 223 script engine, the BoolScript engine also implements the contract defined in the specification. As depicted in the figure, all the BoolScript engine's code is packaged into a single jar file called boolscript.jar.

Throughout this article, when I say JSR 223, I mean the specification itself. I refer to a realization of the specification as a JSR 223 framework. The JSR 223 framework used in this article is the one included in Java Standard Edition 6.0 beta (Java SE is Sun's new name for J2SE). Our example also consists of a Java program that uses the BoolScript engine. The program hosts the BoolScript engine, and its code is in BoolScriptHostApp.java. In Figure 1, notice that a host Java program always interacts with a script engine indirectly via a JSR 223 framework.

To run the example, all you need is Java SE 6.0 beta and this article's binaries. The exact version of Java SE 6.0 I used for developing the example is build 77. You can download it from java.net. The Java SE 6.0 beta available at Sun Developer Network should also work.

The article's code example, which can be downloaded from Resources, comes in several files:

BoolScriptEngine-Source.zip contains the source code of the BoolScript engine

BoolScriptHostExample-Source.zip contains the source code of the host Java program

BoolScriptHostExample.zip contains the binary of the BoolScript engine and the host Java program

To run the example, unzip BoolScriptHostExample.zip to a folder of your choice and run the host Java program (BoolScriptHostApp.class). The zip file contains three jar files. You need to include those three jar files in the Java classpath when running the host Java program. You can find an exemplary command line for this in run.bat, also included in BoolScriptHostExample.zip. After running the example, you will see an output like this:

BoolScript language

Before we delve into the details of JSR 223, let's quickly go over the BoolScript language. BoolScript is so simple that all you can do with it is evaluate Boolean expressions. Here's what code written in BoolScript looks like:

(True | False) & True
(True & x) | y

As you can see, BoolScript supports two operators: & (logical AND) and | (logical OR). Besides operators, it supports three kinds of operands: True, False, and variables whose values can be either True or False. That's it for BoolScript.
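To make these semantics concrete, here is a minimal sketch of an evaluator for expressions of this shape. It is not the BoolScript engine's parser. The class name BoolSketch is hypothetical, variable values are supplied in a plain Map, and the sketch assumes that & and | have equal precedence and are applied left to right, which the language description above does not specify.

```java
import java.util.Map;

// Hypothetical sketch, not the BoolScript engine's actual parser.
class BoolSketch {
    private final String src;
    private final Map<String, Boolean> vars;
    private int pos = 0;

    private BoolSketch(String src, Map<String, Boolean> vars) {
        this.src = src;
        this.vars = vars;
    }

    // Evaluates one expression; variable values come from the given map.
    static boolean eval(String code, Map<String, Boolean> vars) {
        BoolSketch parser = new BoolSketch(code, vars);
        boolean result = parser.expression();
        parser.skipSpaces();
        if (parser.pos != parser.src.length()) {
            throw new IllegalArgumentException("Unexpected input at position " + parser.pos);
        }
        return result;
    }

    // expression := term (('&' | '|') term)*
    // Assumption: equal precedence, evaluated left to right.
    private boolean expression() {
        boolean value = term();
        while (true) {
            skipSpaces();
            if (pos >= src.length()) return value;
            char op = src.charAt(pos);
            if (op != '&' && op != '|') return value;
            pos++;
            boolean right = term();
            value = (op == '&') ? (value & right) : (value | right);
        }
    }

    // term := 'True' | 'False' | variable | '(' expression ')'
    private boolean term() {
        skipSpaces();
        if (pos < src.length() && src.charAt(pos) == '(') {
            pos++;
            boolean value = expression();
            skipSpaces();
            if (pos >= src.length() || src.charAt(pos) != ')') {
                throw new IllegalArgumentException("Missing ')' at position " + pos);
            }
            pos++;
            return value;
        }
        int start = pos;
        while (pos < src.length() && Character.isLetterOrDigit(src.charAt(pos))) pos++;
        String word = src.substring(start, pos);
        if (word.isEmpty()) throw new IllegalArgumentException("Operand expected at position " + pos);
        if (word.equals("True")) return true;
        if (word.equals("False")) return false;
        Boolean value = vars.get(word);
        if (value == null) throw new IllegalArgumentException("Undefined variable: " + word);
        return value;
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }
}
```

Calling BoolSketch.eval("(True & x) | y", Map.of("x", false, "y", false)) walks the expression exactly as a reader would: the parenthesized term first, then the | with y.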

Script engine discovery mechanism

To see what a JSR 223 framework does in between a host Java program and a script engine, let's assume you want to use a script engine in your Java program. First, you'll need to create an instance of the script engine. Second, you'll need to pass textual code to the engine and have the engine evaluate it. Alternatively, you might want the engine to compile the code and save the compiled code for later execution. Let's walk through these steps, bearing in mind that whatever we do, we can only use the script engine through the JSR 223 framework.

To create an instance of a script engine, you first create an instance of javax.script.ScriptEngineManager and then use it to query the existence of a script engine. You can query the existence of a script engine by its name, its mime types, or file extensions. If we store BoolScript code in *.bool files, then the file extension in our case would be bool. The code below queries the existence of the BoolScript engine by file extension:
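The original listing is not reproduced here. As a minimal sketch of such a lookup, the code below uses ScriptEngineManager.getEngineByExtension(), which is part of the standard javax.script API. The class name EngineLookupSketch and the helper method are hypothetical, and the lookup returns null when no engine that declares the extension is on the classpath.

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

// Hypothetical helper class; only the javax.script calls are standard API.
class EngineLookupSketch {
    // Returns the engine registered for the file extension, or null if none is found.
    static ScriptEngine findByExtension(String extension) {
        ScriptEngineManager manager = new ScriptEngineManager();
        return manager.getEngineByExtension(extension);
    }

    public static void main(String[] args) {
        ScriptEngine bsEngine = findByExtension("bool");
        if (bsEngine == null) {
            System.out.println("No engine found for *.bool files; is boolscript.jar on the classpath?");
        } else {
            System.out.println("Found engine: " + bsEngine.getFactory().getEngineName());
        }
    }
}
```

The same manager also offers getEngineByName() and getEngineByMimeType() for the other two kinds of query.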

But where do we specify our script engine's name, mime types, and file extensions? We specify them in BoolScriptEngineFactory. The class implements the methods getExtensions(), getMimeTypes(), and getNames() of the javax.script.ScriptEngineFactory interface. And it is in those methods that we declare the name, mime types, and file extensions of the BoolScript engine. The code for the getExtensions() method in BoolScriptEngineFactory looks like this:
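The original listing is not reproduced here. A rough sketch of how a factory might declare this metadata follows. The class is declared abstract only so that the three discovery-related methods can be shown without the rest of the ScriptEngineFactory interface. The class name, the engine name "BoolScript", and the empty MIME type list are assumptions; only the bool extension comes from the text above.

```java
import java.util.List;
import javax.script.ScriptEngineFactory;

// Hypothetical sketch: abstract so that only the discovery-related methods appear.
// A real factory must implement every method of ScriptEngineFactory.
abstract class FactoryMetadataSketch implements ScriptEngineFactory {
    static final List<String> EXTENSIONS = List.of("bool");    // from the article: *.bool files
    static final List<String> MIME_TYPES = List.of();          // assumption: none declared
    static final List<String> NAMES = List.of("BoolScript");   // assumption: illustrative name

    @Override
    public List<String> getExtensions() {
        return EXTENSIONS;
    }

    @Override
    public List<String> getMimeTypes() {
        return MIME_TYPES;
    }

    @Override
    public List<String> getNames() {
        return NAMES;
    }
}
```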

You might wonder why bother using ScriptEngineManager to create an instance of BoolScriptEngine, when we can create it ourselves like this:

ScriptEngine bsEngine = new BoolScriptEngine();

Well, you can certainly do that. In fact, I did that a few times for quick testing when I developed the example code. Creating a script engine directly might be okay for testing a script engine, but in real use it violates the principle that a client Java program should always interact with a script engine indirectly via a JSR 223 framework. It defeats JSR 223's purpose of information hiding. JSR 223 achieves information hiding by using the Factory Method design pattern to decouple script engine creation from a host Java program. Another problem with instantiating a script engine directly is that it bypasses any initialization that ScriptEngineManager might perform on a newly created script engine instance. Is there initialization like that? Read on.

Given the string bool, how does ScriptEngineManager find BoolScriptEngine and create an instance of it? The answer to the question is something called the script engine discovery mechanism in JSR 223. It's the mechanism by which ScriptEngineManager finds BoolScriptEngine. In my subsequent discussion on this mechanism, you will see what initializations ScriptEngineManager will do to a script engine and why.

According to the script engine discovery mechanism, a script engine provider needs to package all the classes that implement a script engine plus one extra file in a jar file. The extra file must have the name javax.script.ScriptEngineFactory. The jar file must have the folder META-INF/services, and the file javax.script.ScriptEngineFactory must reside in that folder. If you look at boolscript.jar's contents, you will see this file and folder structure.

The content of the file META-INF/services/javax.script.ScriptEngineFactory must contain the full names of the classes that implement ScriptEngineFactory in the script engine. In our example, we have only one such class, and the file META-INF/services/javax.script.ScriptEngineFactory looks like this:

net.sf.model4lang.boolscript.engine.BoolScriptEngineFactory
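One way to check whether a provider file like this has been picked up is to ask a ScriptEngineManager for the factories it discovered. The sketch below relies only on the standard getEngineFactories() method; the class name DiscoverySketch is hypothetical, and the list it prints depends entirely on which engine jars are on your classpath.

```java
import java.util.ArrayList;
import java.util.List;
import javax.script.ScriptEngineFactory;
import javax.script.ScriptEngineManager;

// Hypothetical helper class that lists whatever factories the manager discovered.
class DiscoverySketch {
    static List<String> describeFactories() {
        List<String> lines = new ArrayList<>();
        ScriptEngineManager manager = new ScriptEngineManager();
        for (ScriptEngineFactory factory : manager.getEngineFactories()) {
            lines.add(factory.getEngineName()
                    + " names=" + factory.getNames()
                    + " extensions=" + factory.getExtensions());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Prints one line per discovered factory; prints nothing if none are installed.
        describeFactories().forEach(System.out::println);
    }
}
```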

After a script engine provider packages his or her script engine in a jar file and releases it, users of the script engine install the script engine by putting the jar file in the Java classpath. Figure 2 shows the events that take place when a host Java program asks the JSR 223 framework to discover a script engine.

Figure 2. How a host Java program discovers a script engine

When asked to find a particular script engine by name, mime types, or file extensions, a ScriptEngineManager will go over the list of ScriptEngineFactory classes (i.e., classes that implement the ScriptEngineFactory interface) that it finds in the classpath. If it finds a match, it will create an instance of the engine factory and use the engine factory to create an instance of the script engine. A script engine factory creates a script engine in its getScriptEngine() method. It is the script engine provider's responsibility to implement the method. If you look at BoolScriptEngineFactory, you'll see that our implementation for getScriptEngine() looks like this:

The method is very simple. It just creates an instance of our script engine and returns it to ScriptEngineManager (or whoever the caller is). What's interesting is after ScriptEngineManager receives the script engine instance, and before it returns the engine instance back to the client Java program, it initializes the engine instance by calling the engine's setBindings() method. This brings us to one of the core concepts of JSR 223: Java bindings. After I explain the concepts and constructs of bindings, scope, and context, you will know what the setBindings() call does to a script engine.
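The original getScriptEngine() listing is not reproduced here. The following is a self-contained sketch of the same flow, not the BoolScript source. SketchEngine and SketchEngineFactory are hypothetical stand-ins, the engine understands only the literals True and False, and the factory is registered programmatically with registerEngineExtension() so that the example runs without a jar file.

```java
import java.io.Reader;
import java.util.List;
import javax.script.AbstractScriptEngine;
import javax.script.Bindings;
import javax.script.ScriptContext;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineFactory;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;
import javax.script.SimpleBindings;

// Hypothetical stand-in engine: it understands only the literals True and False.
class SketchEngine extends AbstractScriptEngine {
    private final ScriptEngineFactory factory;

    SketchEngine(ScriptEngineFactory factory) {
        this.factory = factory;
    }

    @Override
    public Object eval(String script, ScriptContext context) throws ScriptException {
        String code = script.trim();
        if (code.equals("True")) return Boolean.TRUE;
        if (code.equals("False")) return Boolean.FALSE;
        throw new ScriptException("This sketch only understands True and False");
    }

    @Override
    public Object eval(Reader reader, ScriptContext context) throws ScriptException {
        throw new ScriptException("Reader input is not supported in this sketch");
    }

    @Override public Bindings createBindings() { return new SimpleBindings(); }
    @Override public ScriptEngineFactory getFactory() { return factory; }
}

// Hypothetical factory; all names and version strings are illustrative.
class SketchEngineFactory implements ScriptEngineFactory {
    // The method the manager calls to obtain a new engine instance.
    @Override public ScriptEngine getScriptEngine() { return new SketchEngine(this); }

    @Override public String getEngineName() { return "Sketch engine"; }
    @Override public String getEngineVersion() { return "0.1"; }
    @Override public String getLanguageName() { return "BoolScript (sketch)"; }
    @Override public String getLanguageVersion() { return "0.1"; }
    @Override public List<String> getExtensions() { return List.of("bool"); }
    @Override public List<String> getMimeTypes() { return List.of(); }
    @Override public List<String> getNames() { return List.of("sketch-bool"); }
    @Override public Object getParameter(String key) { return null; }
    @Override public String getMethodCallSyntax(String obj, String m, String... args) { return null; }
    @Override public String getOutputStatement(String toDisplay) { return null; }
    @Override public String getProgram(String... statements) { return String.join("\n", statements); }
}

class FactoryDemo {
    // Registers the factory by hand instead of relying on META-INF/services.
    static ScriptEngineManager managerWithSketchEngine() {
        ScriptEngineManager manager = new ScriptEngineManager();
        manager.registerEngineExtension("bool", new SketchEngineFactory());
        return manager;
    }

    public static void main(String[] args) throws ScriptException {
        ScriptEngineManager manager = managerWithSketchEngine();
        ScriptEngine engine = manager.getEngineByExtension("bool");
        System.out.println(engine.eval("True"));
        // The manager hands its own Bindings to the engine as the global scope.
        System.out.println(engine.getBindings(ScriptContext.GLOBAL_SCOPE) == manager.getBindings());
    }
}
```

Extending AbstractScriptEngine keeps the sketch short because that class already stores the script context and implements setBindings().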

Bindings, scope, and context

Recall that the BoolScript language allows you to write code like this:

(True & x) | y

But it doesn't have any language construct for you to assign values to the variables x and y. I could have designed the language to accept code like this:

x = True
y = False
(True & x) | y

But I purposely left out the assignment operator = and required that BoolScript code must execute in a context where the values of the variables are defined. This means that when a host Java program passes textual code to the BoolScript engine for evaluation, it also needs to pass a context to the script engine or at least tell the script engine which context to use.

You can think of a context as a bag that contains data you want to pass back and forth between a host Java program and a script engine. The construct that JSR 223 defines to model the context is the interface javax.script.ScriptContext. A bag would be messy if we put a lot of things in it without some type of organization. So to be neat and tidy, a script context (i.e., an instance of ScriptContext) partitions data it holds into scopes. The construct that JSR 223 defines to model the concept of scope is the interface javax.script.Bindings. Figure 3 illustrates context, its scopes, and data stored therein.

A script engine manager (i.e., an instance of ScriptEngineManager) can be used to create multiple script engines.

A script engine manager contains a scope called global scope, but it does not contain a context.

Each scope is basically just a collection of name-value pairs. Figure 3 shows that one of the scopes contains a slot whose name is x and a slot whose name is y. And remember that a scope is an instance of javax.script.Bindings.

The context in a script engine contains a global scope, an engine scope, and zero or more other scopes.

A script engine can be used to evaluate multiple scripts (i.e., separated code snippets written in the script language).

But what are the global scope and engine scope in Figure 3? A global scope is a scope shared by multiple script engines. If you want some piece of data to be accessible across multiple script engines, a global scope is the place to put the data. Note that a global scope is not global to all script engines. It's only global to the script engines created by the script engine manager in which the global scope resides.

An engine scope is a scope shared by multiple scripts. If you want some piece of data to be accessible across multiple scripts, an engine scope is the place to put the data. For example, say we have two scripts like this:

(True & x) | y //Script A

(True & x) //Script B

If we want to share the same value for x across the two scripts, we can put that value in the engine scope held by the script engine that we will use to evaluate the two scripts. And suppose we want to keep the value of y only to Script A. To do that, we can create a scope, remembering that this scope is visible only to Script A, and put the value of y in it.
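The two standard scopes can be tried out without any script engine at all. The sketch below uses the SimpleScriptContext and SimpleBindings classes from javax.script. The class name ScopeSketch and the attribute name shared are hypothetical, and the sketch covers only the engine and global scopes, not a script-private scope like the one described for y.

```java
import javax.script.ScriptContext;
import javax.script.SimpleBindings;
import javax.script.SimpleScriptContext;

// Hypothetical helper class; the context and bindings classes are standard API.
class ScopeSketch {
    static ScriptContext buildContext() {
        ScriptContext context = new SimpleScriptContext();
        // Install a global scope explicitly so the GLOBAL_SCOPE attribute below has a home.
        context.setBindings(new SimpleBindings(), ScriptContext.GLOBAL_SCOPE);
        // x lives in the engine scope, shared by scripts run on one engine.
        context.setAttribute("x", Boolean.TRUE, ScriptContext.ENGINE_SCOPE);
        // "shared" lives in the global scope, which several engines can share.
        context.setAttribute("shared", Boolean.FALSE, ScriptContext.GLOBAL_SCOPE);
        return context;
    }

    public static void main(String[] args) {
        ScriptContext context = buildContext();
        // getAttribute(name) searches the engine scope first, then the global scope.
        System.out.println("x = " + context.getAttribute("x"));
        System.out.println("shared = " + context.getAttribute("shared"));
        System.out.println("x is in engine scope: "
                + (context.getAttributesScope("x") == ScriptContext.ENGINE_SCOPE));
    }
}
```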

As an example, the main method in BoolScriptHostApp.java has the following code for evaluating (x & y):

The code puts the values of both x and y in the engine scope. Then it calls the eval() method on the engine to evaluate the BoolScript code. If you look at the ScriptEngine interface, you'll see that the eval() method is overloaded with different parameters. If we call eval() with a string just as we did in the code snippet above, the script engine will evaluate the code in its context. If we don't want to evaluate the code in the script engine's context, then we have to supply the context we'd like to use when we call eval().

Our implementation of the eval() method delegates the job of evaluating BoolScript code all the way down the method invocation chain until the following method in BoolTermEvaluator is called:

This method evaluates BoolScript code by evaluating terms that are True, False, or variables. When it sees that a term is a variable, it gets a reference to the engine scope by calling getBindings() on the context that's passed to it as a parameter. Because more than one scope might be in a context, we indicate that we want the engine scope by passing the constant ScriptContext.ENGINE_SCOPE to getBindings(). After we get the engine scope, we look up the variable's value by the variable's name in the engine scope. If we cannot find a value for the variable, we throw an exception. Otherwise, we have successfully evaluated the variable and we return its value.
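The BoolTermEvaluator listing itself is not reproduced here. As a hedged sketch of the lookup just described, the method below fetches the engine scope from a ScriptContext and resolves a variable by name. The class and method names are hypothetical, and the choice to throw ScriptException is an assumption, since the text says only that an exception is thrown.

```java
import javax.script.Bindings;
import javax.script.ScriptContext;
import javax.script.ScriptException;
import javax.script.SimpleScriptContext;

// Hypothetical sketch of a variable lookup; not the article's BoolTermEvaluator.
class VariableLookupSketch {
    static boolean evaluateVariable(String name, ScriptContext context) throws ScriptException {
        // Ask the context for the engine scope specifically.
        Bindings engineScope = context.getBindings(ScriptContext.ENGINE_SCOPE);
        Object value = engineScope.get(name);
        if (!(value instanceof Boolean)) {
            // Assumption: report missing or non-Boolean values as a ScriptException.
            throw new ScriptException("No Boolean value bound to variable " + name);
        }
        return (Boolean) value;
    }

    public static void main(String[] args) throws ScriptException {
        ScriptContext context = new SimpleScriptContext();
        context.setAttribute("x", Boolean.TRUE, ScriptContext.ENGINE_SCOPE);
        System.out.println("x = " + evaluateVariable("x", context));
    }
}
```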

Finally, I am ready to explain why a script engine manager initializes a script engine by calling the engine's setBindings() method: When a script engine manager calls an engine's setBindings() method, it passes its global scope as a parameter to the method. The engine's implementation of the setBindings() method is expected to store the global scope in the engine's script context.

Before we leave this section, let's look at a few classes in the scripting API. I said that a ScriptEngineManager contains an instance of Bindings that represents a global scope. If you look at the javax.script.ScriptEngineManager class, you'll see that there is a getBindings() method for getting the bindings and a setBindings() method for setting the bindings in a ScriptEngineManager.

Similarly, a ScriptEngine contains an instance of ScriptContext. If you look at the javax.script.ScriptEngine interface, you'll see a method getContext() and a method setContext() for getting and setting the script context in a ScriptEngine.

So nothing prevents you from sharing a global scope among several script engine managers. To do that, you just need to call getBindings() on one script engine manager to get its global scope and then call setBindings() with that global scope on other script engine managers.
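A minimal sketch of that sharing, using only the standard getBindings(), setBindings(), put(), and get() methods of ScriptEngineManager, might look like this. The class name SharedGlobalScopeSketch is hypothetical.

```java
import javax.script.ScriptEngineManager;

// Hypothetical helper class showing two managers sharing one global scope.
class SharedGlobalScopeSketch {
    static ScriptEngineManager[] shareGlobalScope() {
        ScriptEngineManager first = new ScriptEngineManager();
        ScriptEngineManager second = new ScriptEngineManager();
        // Hand the first manager's global scope to the second manager.
        second.setBindings(first.getBindings());
        return new ScriptEngineManager[] { first, second };
    }

    public static void main(String[] args) {
        ScriptEngineManager[] managers = shareGlobalScope();
        // put() and get() on a manager operate on its global scope.
        managers[0].put("x", Boolean.TRUE);
        System.out.println("second manager sees x = " + managers[1].get("x"));
    }
}
```

Note that an engine receives the global scope when its manager creates it, so it makes sense to share the scope before creating any engines.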

If you look at our example script engine class BoolScriptEngine, you won't see it keeping a reference to an instance of ScriptContext explicitly. That is because BoolScriptEngine inherits from AbstractScriptEngine and AbstractScriptEngine already has an instance of ScriptContext as its member. If you ever need to implement a script engine from scratch without inheriting from a class such as AbstractScriptEngine, you will need to keep an instance of ScriptContext in your script engine and implement the getContext() and setContext() methods accordingly.

Compilable and Invocable

By now, we have implemented the minimum for our BoolScript engine to qualify as a JSR 223 script engine. Every time a Java client program wants to use our script engine, it passes in the BoolScript code as a string. Internally, the script engine has a parser that parses the string into a tree of objects commonly called an abstract syntax tree. And then it passes the tree to the BoolTermEvaluator.evaluate() method we saw earlier. This whole process of evaluating BoolScript code is called interpretation, as opposed to compilation. And in this role, the BoolScript engine is called an interpreter, as opposed to a compiler. To be a compiler, the BoolScript engine needs to transform the textual BoolScript code into an intermediate form so that it won't have to parse the code into an abstract syntax tree every time it wants to evaluate it. This section shows how this functionality is achieved.

Java programs are compiled into an intermediate form called Java bytecode and stored in .class files. At runtime, .class files are loaded by classloaders, and the JVM executes the bytecode. Instead of defining our own intermediate form and implementing our own virtual machine, we'll simply stand on the shoulders of Java by compiling BoolScript code into Java bytecode.

The construct JSR 223 defines to model the concept of compilation is javax.script.Compilable, which is the interface BoolScriptEngine needs to implement. The following code in BoolScriptHostApp.java shows how to use a compilable script engine to compile and execute script code:

In the code above, bsEngine is an instance of ScriptEngine that we know also implements the Compilable interface. We cast it to an instance of Compilable and call its compile() method to compile the code x & y. Internally, the compile() method transforms x & y into the following Java code:

The transformation converts BoolScript code into a Java method inside a Java class. The class name and method name are hard coded. Each variable in BoolScript code becomes a parameter in the Java method.
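The generated code itself is not reproduced here. To give a feel for this kind of transformation, the sketch below turns a BoolScript expression and its variable names into Java source text. It is not the engine's real generator: the class name GeneratedScript is a placeholder, making the method static is an assumption, and the literal replacement is deliberately naive. Only the method name eval comes from the article.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical source generator; not the BoolScript engine's implementation.
class JavaSourceSketch {
    static String toJavaSource(String boolScript, List<String> variables) {
        // Each BoolScript variable becomes a boolean parameter of the generated method.
        String params = variables.stream()
                .map(v -> "boolean " + v)
                .collect(Collectors.joining(", "));
        // Naive literal mapping: it would also rewrite a variable whose name contains "True".
        String body = boolScript.replace("True", "true").replace("False", "false");
        return "public class GeneratedScript {\n"
                + "    public static boolean eval(" + params + ") {\n"
                + "        return " + body + ";\n"
                + "    }\n"
                + "}\n";
    }

    public static void main(String[] args) {
        System.out.print(toJavaSource("x & y", List.of("x", "y")));
    }
}
```

Because & and | mean the same thing for Java boolean operands as they do in BoolScript, the expression text can be carried over almost unchanged.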

Transforming BoolScript code to Java code is just half the story. The other half is about compiling the generated Java code into bytecode. I chose to compile the generated Java code in-memory using JSR 199, the Java Compiler API, another new feature in Java SE 6.0. Details of the Java Compiler API reach beyond this article's scope. See Resources for more information.

The Compilable interface dictates that the compile() method must return an instance of CompiledScript. The class CompiledScript is the construct JSR 223 defines to model the result of a compilation. No matter how we compile our script code, after all is said and done, we need to package the compilation result as an instance of CompiledScript. In the example code, we defined a class BoolCompiledScript and derived it from CompiledScript to store the compiled BoolScript code.

Once the script code is compiled, the client Java program can repeatedly execute the compiled code by calling the eval() method on the CompiledScript instance that represents the compilation result. In our case, as shown in the code excerpt from BoolScriptHostApp.java listed above, when we call the eval() method on the CompiledScript instance, we need to pass in a script context that contains the values for variables x and y.
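As a rough illustration of the CompiledScript contract, the sketch below subclasses CompiledScript and evaluates against the engine scope of whatever context it is given. It is not the article's BoolCompiledScript: a lambda stands in for the generated bytecode, the class name is hypothetical, and the engine reference is left null because the sketch runs without an engine.

```java
import java.util.function.Predicate;
import javax.script.Bindings;
import javax.script.CompiledScript;
import javax.script.ScriptContext;
import javax.script.ScriptEngine;
import javax.script.ScriptException;

// Hypothetical sketch; a lambda plays the role of the compiled bytecode.
class CompiledScriptSketch extends CompiledScript {
    private final ScriptEngine engine;              // null in this standalone sketch
    private final Predicate<Bindings> compiledForm;

    CompiledScriptSketch(ScriptEngine engine, Predicate<Bindings> compiledForm) {
        this.engine = engine;
        this.compiledForm = compiledForm;
    }

    @Override
    public Object eval(ScriptContext context) throws ScriptException {
        Bindings engineScope = context.getBindings(ScriptContext.ENGINE_SCOPE);
        try {
            return compiledForm.test(engineScope);
        } catch (RuntimeException e) {
            // For example, a variable that has no value in the engine scope.
            throw new ScriptException(e);
        }
    }

    @Override
    public ScriptEngine getEngine() {
        return engine;
    }

    // Stand-in for the result of compiling the BoolScript code "x & y".
    static CompiledScriptSketch xAndY() {
        return new CompiledScriptSketch(null, scope -> {
            boolean x = (Boolean) scope.get("x");
            boolean y = (Boolean) scope.get("y");
            return x & y;
        });
    }
}
```

A caller would populate a ScriptContext with x and y and then call eval(context) as many times as needed, without any further parsing.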

The eval() method of CompiledScript is not the only way to execute compiled script code. If the script engine implements the Invocable interface, we can call the invoke() method of the Invocable interface to execute compiled script code too. In our simple example, there might not seem to be any difference between using CompiledScript and Invocable for script code execution. However, practically, users of a script engine will use CompiledScript to execute a whole script file and Invocable to execute individual functions (methods, in Java terms) in a script. And if we look at Invocable's invoke() method, distinguishing this difference between CompiledScript and Invocable is not difficult. Unlike CompiledScript's eval() method, which takes an optional script context as a parameter, the invoke() method takes as a parameter the name of the particular function you'd like to invoke in the compiled script.

In the code excerpt from BoolScriptHostApp.java above, bsEngine is an instance of ScriptEngine that we know also implements the Invocable interface. We cast it to an instance of Invocable and call its invoke() method. Invoking a compiled script function is much like invoking a Java method using Java reflection. You must tell the invoke() method the name of the function you want to invoke, and you also need to supply the invoke() method with the parameters required by the function. We know that in our generated Java code, the method name is hard coded as eval. So we pass the string eval as the first parameter to invoke(). We also know that eval() takes two Boolean values as its input parameters. So we pass two Boolean values to invoke() as well.
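The comparison with reflection can be made concrete. The sketch below looks up a static method by name on a stand-in class and calls it with two Boolean arguments, which is roughly what an Invocable implementation might do internally. GeneratedSketch and InvokeSketch are hypothetical names, and this is not the BoolScript engine's code. Also note that in the released javax.script API the Invocable methods are named invokeFunction() and invokeMethod(); the invoke() name used in this article appears to reflect the beta API it targets.

```java
import java.lang.reflect.Method;

// Stand-in for the class an engine might generate; the class name is hypothetical.
class GeneratedSketch {
    public static boolean eval(boolean x, boolean y) {
        return x & y;
    }
}

// Hypothetical helper that mimics a name-based function call via reflection.
class InvokeSketch {
    static Object invoke(Class<?> compiled, String functionName, Object... args)
            throws ReflectiveOperationException {
        for (Method method : compiled.getDeclaredMethods()) {
            if (method.getName().equals(functionName)
                    && method.getParameterCount() == args.length) {
                // Null receiver because the target method is static.
                return method.invoke(null, args);
            }
        }
        throw new NoSuchMethodException(functionName);
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        System.out.println(invoke(GeneratedSketch.class, "eval", true, false));
    }
}
```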

Conclusion

In this article, I've covered several major areas of JSR 223, such as the script engine discovery mechanism, Java bindings, Compilable, and Invocable. One part of JSR 223 not mentioned in this article is Web scripting. If we implement Web scripting in the BoolScript engine, then clients of our script engine will be able to use it to generate Web contents in a servlet container.

Developing a language compiler or interpreter is a huge undertaking, let alone integrating it with Java. Depending on the complexity of the language you want to design, developing the compiler or interpreter itself can remain a daunting task. The integration between your language and Java, however, has never been easier, thanks to JSR 223.

Chaur Wu is a software developer and published author. He has coauthored books on design patterns and software modeling. He is the project administrator of Model4Lang, an open source project dedicated to a model-based approach to language design and construction.