The Roo Language Reference

This reference document describes the syntax and core semantics of the Roo programming language. The semantics of the built-in data types and of the built-in functions and modules are described in The Roo Standard Library. For information on embedding Roo in a Xojo application see Embedding Roo.

Introduction

This reference document describes the Roo programming language. It is not a tutorial. Whilst some implementation details are provided, I have deliberately not gone into too much detail about them, as other implementations of the language (should any appear) may choose to implement a feature differently. For the purposes of this document, you can assume that I'm using the reference implementation of the interpreter, which is written in Xojo.

Scanning

The first step in running a Roo program is lexical analysis or scanning. This is performed by a scanner. The purpose of the scanner is to split the source code into a stream of tokens. Tokens not only contain information about the type of token encountered but also many other pieces of metadata such as the line number that the token occurred on, the script file it is from, etc. Roo expects source code to be UTF-8 encoded. Odd things will happen if you feed it text of a different encoding.

The scanner handles newline characters by standardising them at the beginning of the scanning process to the Line Feed character (Unicode U+000A). This ensures that programs written on different operating systems can run on any platform that has a Roo interpreter.
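As a rough illustration of this normalisation step, here is a minimal sketch in Python (not the Xojo reference implementation). The function name normalise_newlines is hypothetical, and the assumption that the incoming line endings are CRLF or a lone CR is mine; the text above only states that everything ends up as U+000A.

```python
def normalise_newlines(source: str) -> str:
    """Convert CRLF and lone CR line endings to a single Line Feed (U+000A).

    Assumption: CRLF and CR are the only non-LF line endings of interest.
    """
    # Replace CRLF first so that it does not turn into two line feeds.
    return source.replace("\r\n", "\n").replace("\r", "\n")
```

After this pass, the rest of a scanner only ever needs to test for a single newline character.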

Line Structure

A Roo program is divided into one or more logical lines.

Physical lines

A physical line is a single, visibly distinct line of source code as it appears to the human eye. A physical line is never more than one line long.

Logical lines

A logical line can span multiple physical lines. Typically, the end of a logical line is represented by the newline character which, as described above, is considered to be U+000A. However, if the scanner reaches a newline character but has previously encountered unmatched (, [ or { tokens then it ignores the newline character and continues as if it is still on the same line of source code. This permits things like multiline declarations of Hash objects:

var h = {
	"name" => "Garry",
	"age" => 37
}
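The bracket rule can be sketched in Python as follows. This is only an illustration under simplifying assumptions: it tracks the nesting depth of (, [ and { but ignores text literals and comments, which a real scanner would also have to handle. The name logical_lines is hypothetical.

```python
OPENERS = "([{"
CLOSERS = ")]}"


def logical_lines(source: str) -> list[str]:
    """Split source into logical lines, ignoring newlines inside open brackets.

    Simplification: brackets inside text literals or comments are not
    treated specially here.
    """
    lines = []
    current = []
    depth = 0  # Number of currently unmatched opening brackets.
    for ch in source:
        if ch in OPENERS:
            depth += 1
        elif ch in CLOSERS and depth > 0:
            depth -= 1
        if ch == "\n" and depth == 0:
            # A newline outside any brackets ends the logical line.
            lines.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        lines.append("".join(current))
    return lines
```

Applied to the Hash declaration above followed by one more statement, the sketch yields two logical lines: the whole declaration and the trailing statement.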

Text literals may also span more than one physical line. The scanner will continue looking for characters within a text literal if it has encountered unmatched ' or " tokens. For example:

Blank Lines

Comments

A comment starts with a hash character (#) that is not part of a text literal and ends at the end of the physical line. Comments are ignored by the scanner; they are not tokens.
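To make the "not part of a text literal" condition concrete, here is a small Python sketch that removes a comment from one physical line. The function name strip_comment is hypothetical. It assumes a backslash inside a text literal escapes the next character, and it does not handle text literals that span several physical lines.

```python
def strip_comment(line: str) -> str:
    """Return the line with any trailing comment removed.

    Assumptions: a backslash inside a text literal escapes the following
    character, and literals do not continue from a previous line.
    """
    quote = None  # The quote character of the literal we are inside, if any.
    i = 0
    while i < len(line):
        ch = line[i]
        if quote is not None:
            if ch == "\\":
                i += 2  # Skip the escaped character.
                continue
            if ch == quote:
                quote = None  # The literal is closed.
        elif ch in "'\"":
            quote = ch  # A literal is opened.
        elif ch == "#":
            return line[:i]  # The comment runs to the end of the line.
        i += 1
    return line
```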

Indentation

Leading tabs at the beginning of a logical line are used to determine the indentation level of the line which, in turn, is used to determine the grouping of statements and their scope. Tabs and spaces are not interchangeable. Indentation is determined purely by the number of contiguous tabs from the start of a line. In fact, if any whitespace other than a tab is encountered at the start of a line, a scanning error is raised. This decision was made because it seems nuts to mix spaces and tabs: although they look similar to the eye, they confuse the heck out of a scanner.

The indentation levels of consecutive lines are used to generate INDENT and DEDENT tokens using a simple stack-based system. Before the first line of input is read, a single zero is pushed onto the stack; this will never be popped off. The numbers pushed onto the stack will always increase from bottom to top. At the beginning of each logical line, the line's indentation level (as determined by the number of leading tab characters) is compared to the number on the top of the stack. If the numbers are equal then nothing happens. If the current line's indentation level is greater than the number on the top of the stack, the line's indentation level is pushed onto the stack and one INDENT token is generated. If the current line's indentation level is smaller than the number on the top of the stack, it must match one of the numbers already on the stack. All numbers on the stack that are larger than the current line's indentation level are popped off the stack and, for each number popped off, a DEDENT token is generated. At the end of the source code, a DEDENT token is generated for each number remaining on the stack that is greater than zero.
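The stack-based scheme can be sketched in Python like this. It is an illustration only, not the reference implementation: the names indent_level and indentation_tokens are hypothetical, each logical line is represented by a placeholder LINE token, blank lines are not considered, and raising SyntaxError for a dedent that matches no enclosing level is my assumption about how the error surfaces.

```python
def indent_level(line: str) -> int:
    """Count the contiguous tab characters at the start of a line."""
    return len(line) - len(line.lstrip("\t"))


def indentation_tokens(lines):
    """Return the INDENT/DEDENT tokens produced for a list of logical lines."""
    stack = [0]  # The initial zero is never popped.
    tokens = []
    for line in lines:
        level = indent_level(line)
        if level > stack[-1]:
            # Deeper than the current block: open a new one.
            stack.append(level)
            tokens.append("INDENT")
        elif level < stack[-1]:
            # Shallower: the level must match an enclosing block.
            if level not in stack:
                raise SyntaxError(f"inconsistent dedent to level {level}")
            while stack[-1] > level:
                stack.pop()
                tokens.append("DEDENT")
        tokens.append("LINE")  # Stand-in for the line's own tokens.
    # Close any blocks still open at the end of the source.
    while stack[-1] > 0:
        stack.pop()
        tokens.append("DEDENT")
    return tokens
```

For a function body containing a nested block followed by a top-level statement, the sketch emits two INDENT tokens on the way in and two DEDENT tokens before the top-level statement.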

Whitespace Between Tokens

Except at the beginning of logical lines, whitespace (tabs and spaces) can be used interchangeably to separate tokens. Whitespace is only needed between tokens if their concatenation could otherwise be interpreted as a different token. For instance, a=b and a = b are equivalent, whereas foobar is one token but foo bar is two tokens.

Identifiers and Keywords

Identifiers (aka object names) begin with either an upper or lower case letter (a-z or A-Z, ASCII range) or an underscore (_). They may then have zero or more upper or lower case letters, underscores or digits (0-9). Optionally, they may be suffixed with either ? or ! (but not both). Identifiers are unlimited in length and are case sensitive. Only ASCII characters are permitted in identifiers. Emoji are not supported (although they work fine in text literals).
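The identifier rules translate fairly directly into a regular expression. The Python sketch below is one way to express them; the names IDENTIFIER and is_identifier are hypothetical, and the check does not exclude reserved keywords.

```python
import re

# A letter or underscore, then letters, digits or underscores,
# then at most one trailing ? or !.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*[?!]?")


def is_identifier(text: str) -> bool:
    """Return True if text has the shape of a Roo identifier."""
    return IDENTIFIER.fullmatch(text) is not None
```

For instance, empty? and save! both match, whereas 1abc (leading digit) and done?! (both suffixes) do not.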

Literals

Literals are notations for constant values for some of the built-in types.

Text Literals

Text literals are a stream of characters enclosed by matching pairs of either single (') or double (") quotes. Smart quotes (e.g. “ or ”) cannot be used to enclose a text literal but can be included within one. A double quote can be included within a text literal that is declared with enclosing double quotes by escaping it with \". Similarly, a single quote can be included within a text literal that is declared with enclosing single quotes by escaping it with \'. Examples:

Numeric Literals

There are two types of numeric literals: integers and doubles (also known as floating point numbers). Note that numeric literals do not include a sign; a phrase like -5 is actually an expression composed of the unary operator - and the literal 5.

Integer Literals

Integer literals can be of four bases: decimal, hexadecimal, octal and binary. Decimal base integers are simply written as you would expect. Numbers like 10, 3 and 10456 are all decimal integers.

Hexadecimal numbers are prefixed with 0x. This prefix should immediately be followed by any number and combination of contiguous characters in the range a-f, A-F and 0-9. Case is ignored. For example, the decimal number 1234 can be expressed in hexadecimal as 0x4d2.

Octal numbers are prefixed with 0o. This prefix should be immediately followed by any number and combination of the digits 0-7. For example, the decimal number 1234 can be expressed in octal as 0o2322.

Binary numbers are prefixed with 0b. This prefix should immediately be followed by any number of 0 or 1 digits. For example, the decimal number 1234 can be expressed in binary as 0b10011010010.
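The three prefixed forms and the plain decimal form can be checked against each other with a short Python sketch. The helper name integer_literal_value is hypothetical. Python's int() is more permissive than the rules above (it tolerates surrounding whitespace and underscores, for example), so treat this as a way to verify the arithmetic rather than as a validating scanner.

```python
# Maps each documented prefix to its numeric base.
PREFIXES = {"0x": 16, "0o": 8, "0b": 2}


def integer_literal_value(literal: str) -> int:
    """Return the value of a decimal, hexadecimal, octal or binary literal."""
    base = PREFIXES.get(literal[:2], 10)
    # Drop the two-character prefix for the non-decimal bases.
    digits = literal if base == 10 else literal[2:]
    return int(digits, base)
```

All of 1234, 0x4d2, 0o2322 and 0b10011010010 evaluate to the same value under this sketch.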

Double literals

Below are a few examples of valid double literals:

100.5
0.01
3e4 # 30000
20.5e-2 # 0.205

Note that fractional numbers with a magnitude less than one must be prefixed with a zero. For example, 0.5 is a valid number, .5 is not.
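A pattern that accepts the examples above and rejects .5 might look like the following Python sketch. The names DOUBLE and is_double_literal are hypothetical. Two points are assumptions on my part because the text does not settle them: only a lower case e is accepted as the exponent marker, and a trailing dot with no digits (such as 5.) is rejected.

```python
import re

# Digits, an optional fractional part, and an optional exponent.
DOUBLE = re.compile(r"[0-9]+(?:\.[0-9]+)?(?:e[+-]?[0-9]+)?")


def is_double_literal(text: str) -> bool:
    """Return True if text has the shape of a double literal."""
    if DOUBLE.fullmatch(text) is None:
        return False
    # With neither a fractional part nor an exponent it is an integer.
    return "." in text or "e" in text
```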

Operators

The following tokens are considered to be operators by the scanner:

+ - * / %
<< >> & | ^
< > <= >= ==
<>

Delimiters

The following tokens are delimiters in the grammar:

( ) [ ] {
} , . : ;
= => += -= *=
/= %= ?

The following ASCII characters have special meaning to the scanner:

' " # \

Programs With Multiple Source Files

Writing a program in a single file is fine for small tasks, but more complicated programs are better split across multiple files. They are easier to read and maintain, and code can be reused more easily.

To include one file within another, use the require keyword. It accepts a single argument, a Text literal, written without parentheses. The Roo interpreter is smart enough to only require a file once, so subsequent attempts to require the same file are ignored.
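The "only once" behaviour is typically achieved by remembering which files have already been loaded. The Python sketch below shows the idea; it is not taken from the reference implementation. The class name RequireTracker is hypothetical, and identifying a file by its normalised absolute path is an assumption about what counts as "the same file".

```python
import os


class RequireTracker:
    """Remember which files have been required so each loads only once."""

    def __init__(self):
        self.loaded = set()

    def require(self, path: str) -> bool:
        """Return True if the file should be loaded now, False if ignored."""
        # Assumption: two paths name the same file if they normalise
        # to the same absolute path.
        key = os.path.abspath(path)
        if key in self.loaded:
            return False
        self.loaded.add(key)
        return True
```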

The opposite of the equality operator is the "not equal" operator (<>).

100 <> 50 # True

Comparison Operators

You can determine if a Number object is greater than another with the > operator. The < operator determines if one Number object is less than another. DateTime objects can also be compared with the > and < operators. Attempting to compare other types of objects will result in a runtime error.

Variables

Variables must be declared before they are used. This is done with the var keyword. A variable may be assigned a value when it is declared (so-called initialisation). If a value is not assigned at declaration time then it defaults to Nothing.

var age # Nothing
var name = "Spider-Man"

Scope

A scope is a region where a name maps to a certain entity. Multiple scopes enable the same name to refer to different things in different contexts. Variables declared at the top level of the script are global in scope. That is, they are visible to the whole program. Variables declared in functions, classes and modules are scoped to the containing function, class or module. They are encapsulated within their enclosing scope.

Thanks to encapsulation, it is possible to shadow a variable in an enclosing scope with a newly declared variable in an inner scope:
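One common way to model nested scopes is a chain of name tables, each pointing at its enclosing scope. The Python sketch below illustrates shadowing under that model; it is not a description of how the Roo interpreter is built. The class name Scope and its methods are hypothetical, and Python's None stands in for Nothing.

```python
class Scope:
    """A table of names with a link to the enclosing scope."""

    def __init__(self, parent=None):
        self.parent = parent
        self.values = {}

    def declare(self, name, value=None):
        """Declare a variable in this scope (None stands in for Nothing)."""
        self.values[name] = value

    def lookup(self, name):
        """Search this scope first, then each enclosing scope in turn."""
        scope = self
        while scope is not None:
            if name in scope.values:
                return scope.values[name]
            scope = scope.parent
        raise NameError(f"undeclared variable: {name}")


global_scope = Scope()
global_scope.declare("name", "Peter")

inner = Scope(parent=global_scope)
inner.declare("name", "Miles")  # Shadows the outer declaration.
```

A lookup of name from the inner scope finds the inner value, while the global scope still holds its own.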

Simple Statements

A simple statement is contained within a single logical line. Several simple statements may occur on a single line separated by semicolons. Many of the simple statements are described elsewhere in this document where it seems more appropriate (e.g. break in the Control Flow section).

The pass Statement

pass is a null operation. When it's executed nothing happens. It's useful as a placeholder when a statement is required syntactically but no code needs to be executed:

# Use `pass` to stub out functions.
def empty1(): pass
def empty2():
	pass
# `pass` is also useful for stubbing out non-implemented classes.
class Person: pass # A class with no methods (yet).

The return Statement

The return statement may only occur within the body of a function. It is used to return a value from a function. When encountered, execution of the function ends immediately and the value specified by the return statement is returned to the caller. By default, functions return Nothing unless commanded to return something with the return keyword.

def echo(what):
	return what
var a = echo("Hello") # a is now "Hello"
def silence(what): pass
var b = silence("Hello") # b is now Nothing as no return value specified.

Functions

Functions allow you to encapsulate and reuse code. Defining a function is done with the def keyword.

Control Flow

Before we can talk about control flow, we need to understand what truthiness is. A truthy value is a value that is considered true for a logical operation. A falsey value is considered false for a logical operation. The only falsey values in Roo are Nothing and False. Everything else is truthy.
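Expressed as a Python sketch, with None standing in for Nothing, the rule is a two-value check. The function name is_truthy is hypothetical. Note that this differs from Python's own rules, where 0 and empty collections are falsey.

```python
def is_truthy(value) -> bool:
    """Only Nothing (modelled as None) and False are falsey."""
    return value is not None and value is not False
```

Under this rule 0, an empty text literal and an empty collection all count as truthy.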

The exit Statement

The while Loop

A while loop executes its body whilst its condition is truthy.

while some_condition:
	# Do something.

The condition is first tested and if truthy the body is executed. Note that it's therefore possible that the body never executes. If you need to execute the body at least once and then check for a breaking condition, you can do something like this:

while True:
	# Do something.
	break if some_condition

The for Loop

for loops are a powerful looping mechanism. The syntax of a for loop is:

As you can see, there are three for loop expressions, separated by semicolons and enclosed in parentheses. All of the expressions are optional but the semicolons are not. The for loop declaration is terminated by a colon (:).

The first expression is evaluated once, at the beginning of the loop. Typically this is used to set the value of an existing variable or to declare a new variable and assign a value to it. Often this is the counter for the loop.

The second expression is evaluated at the beginning of each iteration of the loop. If the expression is truthy then the body of the loop is executed. If this expression is falsey then the loop exits.

Finally, a third expression can be provided which is evaluated at the end of each iteration of the loop.

# Print the first 10 numbers.
for (var i = 1; i <= 10; i += 1):
	print(i)
# As above but notice that we use an existing variable as our counter.
var j = 1
for (; j <= 10; j += 1):
	print(j)
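The order in which the three expressions are evaluated is the same as in an equivalent while loop. The Python sketch below makes that order explicit by passing each part in as a function; run_for_loop and its parameter names are hypothetical, and break is not modelled.

```python
def run_for_loop(init, condition, step, body):
    """Evaluate the parts of a three-expression for loop in order."""
    init()                # First expression: evaluated once.
    while condition():    # Second: tested before every iteration.
        body()
        step()            # Third: evaluated after every iteration.


state = {}
printed = []
run_for_loop(
    init=lambda: state.update(i=1),
    condition=lambda: state["i"] <= 10,
    step=lambda: state.update(i=state["i"] + 1),
    body=lambda: printed.append(state["i"]),
)
```

Here printed ends up holding the numbers 1 to 10, and the counter is left at 11, the first value for which the test fails.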

An infinite loop can be created by omitting the test (i.e. the second) expression. We can also omit the initial expression and the iteration expression if we like. Note how the semicolons are still required after an absent first and/or second expression:

for (;;):
	print("Wakanda forever") # Bad idea. Prints this forever.

Of course the above infinite loop could just as easily be created with a more readable while loop:

while True:
	print("Wakanda forever")

As with while loops, we can exit a for loop at any point with the break statement.

for (var i = 0; ; i += 1):
	print("Wakanda forever")
	break if i == 10

if Constructs

The if construct evaluates its branch of code only if its condition is truthy. An if clause's condition may or may not be enclosed in parentheses (do whatever you feel makes your code more readable).

var name = "Garry"
if name.length == 5:
	print("5 character name")

If the block of code that makes up an if branch is a single statement or expression then the if construct can be reduced to one line of code. Given the above example:

var name = "Garry"
if name.length == 5: print("5 character name")

If an if clause's condition is falsey then it will evaluate its else branch (optional):

if False:
	print("This is never printed")
else:
	print("This is shown")

Use the or keyword to add additional conditions that will be evaluated (sequentially) for their truthiness if the if condition is falsey. The first or condition that is truthy will have its branch evaluated:

The Object System

In Roo, everything (except for functions) is an object. The following points apply to all objects:

They have a type

They respond to methods

Methods are functions that belong to an object. The Roo object system provides a mechanism for determining an object's type at runtime and whether or not it responds to a particular method. The only way to query the internal state of an object is through that instance's methods:

Classes

A class is a blueprint from which individual objects or instances are created. The below example shows how to create an empty class blueprint for a Person class:

class Person: pass # The `pass` statement is required if the body of a class is empty.

It's convention that class (and module) names start with an uppercase letter although this is not enforced by the interpreter.

A new instance of a class is created by calling the class name like a function:

var peter = Person()

At the moment, the Person class is not very useful so let's add some functionality to it. A person should have a name and an age. These values will be stored internally in properties on the Person instance called name and age. Since every person needs these attributes, we want to make sure we specify them at the time of creation of the Person instance. We do this with a special init() method. The init() method is optional.

As we saw above, properties can also be added to an instance within a class definition, such as the name and age properties. Note that when in a class definition and you want to refer to the particular instance of that class, use the self keyword.

Getters

Getters are methods without parameters. They allow a class to return a computed value. They are invoked with their name without trailing parentheses. Let's add to the Person class the ability to print out how many days old they are:

Static Methods and Getters

Static methods and getters are associated with a particular class rather than a specific instance. To make a method or getter within a class definition static, simply prefix its declaration with the static keyword:

The default text representation of "<Person instance>" is a little unhelpful. Fortunately, Roo allows us to provide a getter in the class definition that Roo will call whenever it needs a text representation of an instance of your custom class. Just define a getter named to_text:

Inheritance

A class can inherit from another class. The class that is inheriting is called the subclass and the class that is inherited from is the superclass. When a class inherits from another it gains all of the methods, getters and properties already defined in the superclass. Roo supports single inheritance. Inheritance is specified in the class declaration with the < operator:

Modules

Modules serve as a namespace for defining other classes, modules, methods and getters. They are an excellent way to reuse code and make creating libraries to share with other Roo users easy. Below is an example of a module as a namespace: