Functions

Functions in Cell have no side effects and are referentially transparent, but can be written in either a functional or a procedural style. We'll start with the functional style. Here are a few self-explanatory examples:

The signature of the function comes first, followed by the = sign, then an expression that forms the body of the function and a semicolon that terminates the declaration. The types of the arguments and the return type are mandatory. To compare two values for equality just use the == operator, and to check if they're different use !=. The if/then/else is a conditional expression, and can have any number of branches (note the use of the elif keyword):

< -1..1> sign(Int x) = if   x > 0 then 1
                       elif x < 0 then -1
                       else 0;

Functions with no arguments in a referentially transparent language are of course just constants. They are declared and referenced without using parentheses:

undefined can be used in any context where an ordinary expression can go, but if it's ever evaluated, it throws an exception. Once an exception is thrown, there's no way to "catch" it inside functions (only automata can recover from that), so if your program consists only of functional code the effect of evaluating undefined is simply to terminate the program.

Line comments begin with either // or ##. The two are equivalent, but, by convention, the former should be used for explanations, the latter for comments that require some sort of action, like a todo item, or a bug warning:

// This is a comment that explains something about a piece of code
// that works fine and does not require any action
## TODO: Handle this case properly
## BUG: This code will fail if xs contains duplicates

Currently there's no support for multiline comments.

When symbols or tagged values are used inside expressions they have to be written with a leading : (save for true and false of course), so as to distinguish them from variables or constants in the case of symbols, and from function calls in the case of tagged values. That can be annoying, especially when you're pasting into your source code large chunks of data that come from somewhere else and are usually written without the :. In that case you can make use of a literal block #{...}. Inside it you don't have to write the leading :, and you cannot access variables, constants, functions or use any other computational feature of the language. It's just data. As an example, the following two definitions of the constant a_large_value are equivalent:

We've already seen the syntax for creating sequences. Their elements can of course be arbitrary expressions. The presence of some elements can also be made conditional. In the following example, the first element is included only if n is greater than 0:

Int* nearby(Nat n) = (n - 1 if n > 0, n, n + 1);

The following expressions all evaluate to true:

nearby(0) == (0, 1)
nearby(1) == (0, 1, 2)
nearby(2) == (1, 2, 3)
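For readers coming from mainstream languages, the effect of conditional inclusion can be approximated in Python. This is only a sketch of the semantics, not Cell code: the function name mirrors nearby(..) above, while the use of tuples to stand in for Cell sequences and the pairing of each element with a guard are assumptions of the sketch.

```python
def nearby(n: int) -> tuple:
    """Python model of the Cell function nearby(n) shown above."""
    # Each candidate element is paired with the condition that guards it.
    # Elements that are always included are guarded by True.
    candidates = [(n - 1, n > 0), (n, True), (n + 1, True)]
    return tuple(value for value, included in candidates if included)


print(nearby(0))  # the first element is dropped because n > 0 is false
print(nearby(2))
```

The three equalities listed above hold for this model as well.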

The conditional inclusion notation applies to all collection types, in all of their syntactic forms: sequences, sets, relations, maps and records. A few examples:

Native collection values operations

Collection values have a number of built-in operators, in addition to the user-defined operators we'll see later. To get the size of a collection value, of any kind, just enclose the expression between pipes:

|any_coll|

If used on a sequence, the above expression will return its length; on a set, the number of (unique) elements it contains; and on a relation the number of unique entries.

and lookups (but note that this syntax is temporary, it will change in a future version of the language):

// If "a_bin_rel" contains one and only one entry
// whose left element is equal to arg0, returns the
// corresponding right element. Fails otherwise.
a_bin_rel(arg0, !!)

// If "a_bin_rel" contains one and only one entry
// whose right element is equal to arg1, returns
// the corresponding left element. Fails otherwise.
a_bin_rel(!!, arg1)

A future version of the compiler will provide a more compact notation for the first of the above two operations:

a_map_or_bin_rel(key) // Same as a_map_or_bin_rel(key, !!)

and this notation is already available for mutable relation variables defined inside relational automata (we'll discuss them in the following chapters).
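The behaviour of the two lookup forms can be modelled in Python as follows. This is a sketch under stated assumptions, not Cell code: a binary relation is represented as a Python set of pairs, the helper names lookup_right and lookup_left are hypothetical, and a failed lookup is modelled by raising LookupError.

```python
def lookup_right(rel: set, left):
    """Model of a_bin_rel(arg0, !!): the unique right element paired with `left`."""
    matches = [r for (l, r) in rel if l == left]
    if len(matches) != 1:
        # Zero or several matching entries: the lookup fails.
        raise LookupError(f"expected exactly one entry with left element {left!r}")
    return matches[0]


def lookup_left(rel: set, right):
    """Model of a_bin_rel(!!, arg1): the unique left element paired with `right`."""
    matches = [l for (l, r) in rel if r == right]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one entry with right element {right!r}")
    return matches[0]


rel = {(1, "a"), (2, "b"), (3, "b")}
print(lookup_right(rel, 1))  # a
print(lookup_left(rel, "a"))  # 1
```

In this model lookup_left(rel, "b") fails, because two entries have "b" as their right element.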

Records support all binary relation and map operations, plus access by field and field membership test:

Note that the above example (and many of those that follow) is just that, an example, and not the recommended way to write that function. The following implementation is logically equivalent, but a lot more efficient (when used properly, it runs in amortized O(1)):

Int* insert_right(Int* seq, Int elt) = (seq | elt);

(Of course, there's no point in defining such a function, since it's just easier to use the (xs | x) notation directly). We'll say more about functional concatenation in the next chapter, when we talk about imperative programming.

Set/relation/map comprehension

Comprehension can be applied to sets and relations as well. It's similar but a bit more complex. You can have multiple source expressions, multiple filters and more. Let's start by creating functions that merge two sets of values:

[T] set_union([T] set1, [T] set2) = [x : x <- set1 | x <- set2];

The source expression x <- set1 | x <- set2 iterates through all the elements of both set1 and set2. The iteration is done in an unspecified order (since sets and relations are unordered collections), and all duplicates in the result are of course eliminated. Any single-letter uppercase symbol like the T that appears in the signature of set_union() is a type variable that can represent any type. Type variables are used for generic programming, and they are needed to preserve type information: if the above function is used, for instance, to merge two sets of integers, the typechecker will be able to figure out that the result, too, will be a set of integers. We'll discuss them in detail in another chapter. Merging relations is very similar:

Maps have a specific form of comprehension that works in exactly the same way as binary relation comprehension, but will refuse to produce values that are not maps: if the resulting relation has duplicate keys, the computation will just fail. The only syntactic difference is that the comma between the two expressions before the : is replaced with a ->. Here's how you define the map version of union:

The function iterates through all the elements in the first set and, for each of them, it iterates through the elements of the second set, and it ends up producing all the combinations.

Set/relation comprehension expressions can iterate through sequences as well. The following function turns a sequence into a set:

[T] set(T* s) = [x : x <~ s];

When the source is a sequence, rather than a set or a relation, you need to use the <~ arrow instead of <-, and the same index variable and tuple destructuring functionalities that we saw for sequence comprehension are available. An example:

Just like with sequences we can have filter clauses, but here we can have more than one and they can be intermixed with generators. The following function produces the Cartesian product of two sets of integers, but it filters out all negative integers from either set:

You can also calculate a new value and assign it to a variable inside the loop. The following expression iterates through all the elements of xs, stores the value of f(x) in y, skips the iteration when p(y) is false, and finally inserts the value of g(x, y) in the output set.

[g(x, y) : x <- xs, y = f(x), p(y)]
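The same generator, assignment and filter pipeline can be written as a Python set comprehension. The sketch below is only a model of the expression above: the names xs, f, p and g come from the text, while the wrapper function comprehend and the concrete lambdas used to exercise it are assumptions made for illustration.

```python
def comprehend(xs, f, p, g):
    """Model of [g(x, y) : x <- xs, y = f(x), p(y)]."""
    # "for y in [f(x)]" binds y exactly once per x, like the y = f(x) clause;
    # "if p(y)" skips the current iteration when the filter is false.
    return {g(x, y) for x in xs for y in [f(x)] if p(y)}


result = comprehend(
    {1, 2, 3, 4},
    f=lambda x: x * x,     # y is the square of x
    p=lambda y: y > 4,     # keep only squares greater than 4
    g=lambda x, y: x + y,  # output value
)
print(sorted(result))  # [12, 20]
```

Only x = 3 and x = 4 pass the filter, which gives 3 + 9 and 4 + 16.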

There's one more type of clause that you can use in set/relation comprehension expressions, but we'll defer its discussion until we've examined pattern matching.

You can also iterate through a projection of a relation, that is, a subset of it obtained by filtering it based on the value of some of the arguments. The following pairs of expressions are all equivalent, but the ones that are not commented out are a lot faster, because they don't need to do a linear scan of the entire relation:

Existential checks

If you need to check whether a set contains an element that satisfies a given predicate, you can do it like this:

// Returns true if and only if there's an
// element x of xs such that p(x) is true
(x <- xs : p(x))

The above expression iterates through the elements of xs, and for each of them it evaluates the boolean expression on the right of the colon (p(x) in this case). If the expression evaluates to true, the loop is terminated and the overall expression evaluates to true. If no element in the set satisfies the predicate on the right, the overall expression evaluates to false.

The clause on the left can contain anything that can appear on the right of the colon in a set/relation comprehension expression. A few examples:

// Evaluates to true if and only if the binary relation
// r contains a pair x, y such that p(x, y) is true
(x, y <- r : p(x, y))

// Evaluates to true if and only if there's an
// element x of xs and an element y of ys
// such that p(x) and q(x, y) are both true
(x <- xs, p(x), y <- ys : q(x, y))

// Evaluates to true if and only if the sequence xs contains
// an element x at index i such that p(x, i) is true
(x @ i <~ xs : p(x, i))

Note that the iteration order is specified only when iterating through a sequence (it's the obvious one, from the first element to the last), while it's implementation-defined when iterating through the elements of an unordered collection (that is, a set or a relation). This may introduce some nondeterminism in the language if the predicate on the right can fail and throw an exception. So even though the evaluation of this expression terminates as soon as the boolean expression on the right evaluates to true, you can rely on this behaviour only when dealing with sequences.
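The short-circuit behaviour over a sequence can be illustrated with a small Python model. It is a sketch rather than Cell code: the helper name exists is hypothetical, Python lists stand in for Cell sequences, and a failing predicate is modelled by a Python exception.

```python
def exists(source, predicate):
    """Model of (x <- xs : p(x)): stops at the first element that satisfies p."""
    for x in source:
        if predicate(x):
            return True  # later elements are never examined
    return False


def pred(x):
    # Fails (raises ZeroDivisionError) when x is 0.
    return 10 // x > 4


# The list is ordered, so 1 is tested first and the 0 is never reached.
print(exists([1, 2, 0], pred))  # True
```

With the elements in a different order, for example [0, 1], the predicate fails on the first element before any match is found. That is the situation an unordered collection can produce unpredictably, since its iteration order is implementation-defined.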

Operators

This is the list of all Cell operators, in order of increasing precedence (and and or bind least tightly, the . operator most tightly):

and or

not

== != ::

< > <= >=

+ & - (binary)

* /

- (unary)

^

[] ()

.

Note that there are two versions of the - operator, the unary and binary ones.

All binary operators associate from left to right, except for those that don't associate at all: ^, ==, != and ::.

The arithmetic and comparison operators +, -, *, /, <, >, <= and >= have the obvious meanings, and are defined for any combination of integers and floating point numbers. The binary - is also used to denote set difference. The exponentiation operator ^ is also defined for any combination of integers and floating point numbers, but it always returns a floating point number.

The :: operator is used to test if a value belongs to a type, and returns a boolean value:

The operator & is defined as concatenation for sequences and strings, union for sets and merge for maps. In the case of maps it fails if the two maps have common keys (unless those duplicate keys map to the same values).
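The map case is the least obvious of the three, so here is a Python model of it. This is a sketch under assumptions: Python dicts stand in for Cell maps, the function name merge_maps is hypothetical, and failure is modelled by raising ValueError.

```python
def merge_maps(m1: dict, m2: dict) -> dict:
    """Model of m1 & m2 for maps: fails if a shared key maps to different values."""
    for key in m1.keys() & m2.keys():
        if m1[key] != m2[key]:
            raise ValueError(f"conflicting values for key {key!r}")
    # All shared keys agree, so the order of the merge does not matter.
    return {**m1, **m2}


print(merge_maps({"a": 1}, {"b": 2}))          # {'a': 1, 'b': 2}
print(merge_maps({"a": 1}, {"a": 1, "c": 3}))  # {'a': 1, 'c': 3}
```

A call such as merge_maps({"a": 1}, {"a": 2}) fails in this model, mirroring the rule that common keys are tolerated only when they map to the same value.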

The following operators can be overloaded: +, -, *, /, ^, <, >, <=, >=, &, []. In order to overload them, they have to be treated like normal functions with the following names:

Operator      Function name
- (unary)     (-_)
+             (_+_)
- (binary)    (_-_)
*             (_*_)
/             (_/_)
^             (_^_)
<             (_<_)
>             (_>_)
<=            (_<=_)
>=            (_>=_)
&             (_&_)
[]            (_[_])

The function names are formed by the operator itself, with underscores where the operands should be, all enclosed in parentheses. As an example, let's define the operators +, * and unary - for booleans, as synonyms for or, and and not respectively:
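For comparison, the same idea can be sketched in Python, where operators are overloaded through special methods (__add__, __mul__, __neg__) rather than through names like (_+_). This is a loose analogy, not Cell code: Python's built-in bool cannot be extended, so the sketch assumes a hypothetical wrapper class named Bool.

```python
class Bool:
    """Hypothetical wrapper around a boolean, used only to host the operators."""

    def __init__(self, value: bool):
        self.value = bool(value)

    def __add__(self, other):  # plays the role of (_+_): logical or
        return Bool(self.value or other.value)

    def __mul__(self, other):  # plays the role of (_*_): logical and
        return Bool(self.value and other.value)

    def __neg__(self):  # plays the role of (-_): logical not
        return Bool(not self.value)

    def __eq__(self, other):
        return isinstance(other, Bool) and self.value == other.value

    def __hash__(self):
        return hash(self.value)

    def __repr__(self):
        return f"Bool({self.value})"


print(Bool(True) + Bool(False))  # Bool(True)
print(Bool(True) * Bool(False))  # Bool(False)
print(-Bool(True))               # Bool(False)
```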

Closures

Cell has support for closures, although it is very limited compared to what truly functional languages like Haskell (or even some object-oriented languages) can offer. The only thing you can do with closures at the moment is pass them as parameters to a function. This is the definition of one staple of every functional programming language, the map() function (note that it's just syntactic sugar for sequence comprehension, so it's not particularly useful in and of itself. It's just an example):

B* map(A* s, (A -> B) f) = (f(x) : x <- s);

The second argument to the map(..) function is a closure that takes an argument of a generic type A and returns a value of another generic type B. That closure is applied to each element of s in turn. The type (A -> B) is a closure type. For a closure with multiple arguments, the general form of its type is:

(A -> R)

(A1 A2 -> R)

(A1 A2 A3 -> R)

(A1 A2 A3 ... -> R)

where A, A1, A2, ... An are the types of the argument(s), and R is the type of the result. The only place where closure types can appear is in the argument list of a function. A function cannot return a closure, and type definitions cannot make use of closure types, since regular data and closures cannot be mixed. Also, a closure cannot take other closures in turn as parameters: all of its arguments have to be regular values.

Here are a few examples of how you can call a function that takes a closure argument:

There are two ways to provide a closure argument in a function call. The first option, if you don't need to capture any local variable, is to just pass the name of an existing function or closure. That's what square_all(..) and map2(..) do. The other option is to simply write the expression that constitutes the body of the closure, like in increment_all(..) and multiply_all(..). Inside that expression you can use local variables (which will be captured by the closure) and you can refer to the argument of the closure with the symbol $, or $a, $b, $c... if the closure has more than one argument.
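The two ways of supplying a closure have close counterparts in Python, sketched below. The names square_all and increment_all are taken from the text, but their bodies here are assumptions, as are the helpers map_seq and square. Tuples stand in for Cell sequences, and the lambda parameter plays the role of $.

```python
def map_seq(s, f):
    """Model of map(s, f): applies the closure f to each element of s, in order."""
    return tuple(f(x) for x in s)


def square(x):
    return x * x


def square_all(xs):
    # First option: pass an existing function by name; nothing is captured.
    return map_seq(xs, square)


def increment_all(xs, delta):
    # Second option: write the body inline. The lambda captures the local
    # variable delta, and its parameter x corresponds to $ in Cell.
    return map_seq(xs, lambda x: x + delta)


print(square_all((1, 2, 3)))         # (1, 4, 9)
print(increment_all((1, 2, 3), 10))  # (11, 12, 13)
```

Unlike Python functions, Cell closures cannot be returned from a function or stored in data, as explained above. The sketch only covers the one thing Cell allows, which is passing them as arguments.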

Pattern matching

When writing a function over a union type, you may want to provide a different implementation for each of the cases in the union. There are two ways to do that: one is to write a set of polymorphic functions, the other is to use pattern matching. We'll postpone a discussion of the former until later, and describe the latter here. One of the simplest examples of a type union is the Maybe type. Let's say we want to write a function that applies a closure to the value contained inside, or does nothing if there's no value at all:

The match expression takes a value and compares it sequentially to any number of patterns. In the above expression, both nothing (the one on the left) and just(x?) are patterns. During evaluation the value being inspected (in this case m) is matched against each pattern in the order in which they appear, until a match is found: when that happens the corresponding expression on the right is evaluated, and its value becomes the value of the whole match expression. If no match is found, the execution fails, just like with undefined. Note that more than one pattern may be able to match a given value, but the search stops at the first one: the other ones are ignored.

Every pattern matches a (possibly infinite) set of values, and may bind new variables. The pattern nothing, for example, matches one and only one value, the symbol nothing. This is true in general: every symbol is also a pattern that matches itself. The second pattern, just(x?), on the other hand matches any value tagged with the symbol just. It matches, for example, the values just(0), just("Hello world!"), just(day: 27, month: 4, year: 2017) or just(point(x: 2.5, y: 0.3)). The x? is a pattern variable: if the match succeeds, the value of x is the value of m without the tag. In the previous examples, x would end up having values 0, "Hello world!", (day: 27, month: 4, year: 2017) and point(x: 2.5, y: 0.3) respectively.
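The first-match-wins evaluation described above can be modelled in Python. This is a sketch under assumptions, not Cell code: the symbol nothing is represented by the string "nothing", a tagged value just(x) by the tuple ("just", x), and the function name apply_maybe is hypothetical. It also assumes that the result of applying the closure is wrapped again in just.

```python
NOTHING = "nothing"


def just(value):
    """Builds the model of the tagged value just(value)."""
    return ("just", value)


def apply_maybe(m, f):
    # Patterns are tried in order, and the first one that matches wins.
    if m == NOTHING:  # pattern: nothing
        return NOTHING
    if isinstance(m, tuple) and len(m) == 2 and m[0] == "just":  # pattern: just(x?)
        x = m[1]  # x is bound to the value of m without its tag
        return just(f(x))
    # No pattern matched: execution fails, like evaluating undefined.
    raise ValueError(f"no pattern matches {m!r}")


print(apply_maybe(just(20), lambda x: x + 1))  # ('just', 21)
print(apply_maybe(NOTHING, lambda x: x + 1))   # nothing
```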

A match expression can match any number of values, not just one: the following function, for example, applies a two-argument function to the content of two Maybe values if neither is nothing and returns the tagged result, or nothing otherwise:

When match expressions are the topmost expression in a function definition, and when the value that is being matched is the first argument of that function (or the first arguments, if the matching involves more than one value) you can omit the match (..) part of the expression. The two functions above can be rewritten as follows:

and tagged values with any tag. The following two functions accept as input any tagged value, and return the untagged value or the tag itself, respectively:

T untag(<+>(T)) =
  t?(v?) = v;

Symbol tag(<+>(Any)) =
  t?(v?) = t;

Union types often include related but different sets of values, each of which is tagged with a different symbol, and in this case pattern matching may be used not to break up a value, but simply to provide a different implementation for each type in the union:

You can also break up a value according to a pattern, and still bind the entire value to a variable, like the following function does: it binds both elements of ps to a variable, p1 and p2 respectively, and then further breaks up each of them in turn, finally binding the values x1, y1, x2, y2.

The above function iterates through all pairs in the set, and for each of them it pattern matches it in order to bind the two variables x and y, and finally produces a result using those variables. If the match fails, then that particular iteration of the loop is cut short. The following radii(..) function, for example, takes a set of Shape values, and for each circle among them it returns its radius, skipping squares and rectangles in the process:
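The skip-on-failed-match behaviour can be approximated in Python with a filtered set comprehension. The representation is an assumption of the sketch: each shape is a tuple whose first element is its tag ("circle", "square" or "rectangle") followed by its dimensions, which is not how the Shape type is defined in Cell.

```python
def radii(shapes):
    """Model of radii(..): collects the radius of every circle in the set."""
    # The condition plays the role of the pattern: shapes that do not match
    # the circle pattern are skipped instead of causing a failure.
    return {shape[1] for shape in shapes if shape[0] == "circle"}


shapes = {
    ("circle", 1.5),
    ("square", 2.0),
    ("rectangle", 1.0, 3.0),
    ("circle", 4.0),
}
print(sorted(radii(shapes)))  # [1.5, 4.0]
```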

The combination of all the above patterns allows us to write universal functions, that is, functions that can work on any value, without any prior knowledge about its structure. The following, for example, is a universal function for computing 32-bit hash codes (it's a pretty lame hash function, but never mind that, it's just an example):

Builtin functions

In the above example, isort(..) is a family of polymorphic functions defined in the standard library that take as argument either a set or a relation and return a sequence containing its entries sorted in an implementation-defined order. Functions whose names start and end with an underscore (like _print_(..) and _bits_(..) above) are builtin functions. Builtin functions provide functionalities that are either impossible to implement directly in Cell, or that cannot be implemented efficiently, or that are just convenient to have as builtins for whatever reason. Some of them have aliases in the standard library. The & operator for sequences, for example, is defined in terms of the _cat_(..) builtin function (doing so will enable an O(1) implementation of sequence concatenation that uses ropes instead of arrays as the underlying physical data structure). Here are the most useful ones among those that don't have aliases:

// Returns a string containing the textual
// representation of any value
String _print_(Any)

// Given the textual representation of a value returns either the
// parsed value or the position of the error if parsing fails
Result[Any, Nat] _parse_(String)

// Converts an integer number into a floating point one
Float _float_(Int);

// Given a 64-bit floating point number, returns the
// same bit pattern reinterpreted as a 64-bit integer
Int _bits_(Float)

Polymorphic functions

Functions in Cell can be polymorphic, that is, you can declare multiple functions with the same name and arity, as long as they differ in the types of their arguments. For instance, you can split up the function area defined earlier into three different ones:

The three specialized area(..) functions just defined are completely equivalent to the previously defined single function that used a match expression. For polymorphic unary functions (i.e. functions that take only one argument) to be compatible, the obvious requirement is that the types of their single argument are disjoint, so that for every possible value the compiler knows which one to dispatch at runtime. In practice though, the current version of the compiler is more restrictive than that, in that it requires the types of the argument to be not just disjoint, but also "different enough". Consider for example the following polymorphic functions:

These three definitions of sign(..) seem reasonable enough. Their arguments are certainly disjoint: the first one only applies to positive integers, the second one to just 0 and the third one to negative integers. But the current version of the compiler won't accept two polymorphic functions if the types of their arguments both contain integers, even if they contain disjoint subsets of the set of all integers. If this limitation only applied to integers, it wouldn't be much of a problem, but unfortunately it applies to other data types as well: sequences, sets, relations of the same arity (including maps and records, which are just binary relations) and values tagged with the same tag. The following polymorphic functions, for example, are rejected by the current version of the compiler, as the types of their arguments both contain sequence values, even though they are disjoint, as the first only accepts non-empty sequences of integers, and the second only non-empty sequences of floating point numbers:

By the way, a sum(..) function that works on both sequences of integer and floating point numbers (and any other type for which a + operator has been defined) can still be written using generic programming and protocols (we'll talk about that in a later chapter), but it would have to be restricted to non-empty sequences, as you would bump into a different problem if you tried to extend it so that it works on empty sequences as well. More on that in a minute.

Square2, Rectangle2 and Circle2 are almost identical to Square, Rectangle and Circle, the only difference being that they are not tagged. Though clearly disjoint, they are all record types, which are just a special case of binary relations, and the compiler will reject polymorphic functions if more than one of them accepts any binary relation value as argument. This is the reason user-defined types have to be tagged if polymorphic behaviour is expected, so as to make them "different enough" to be accepted by the compiler.

For functions that take more than one argument, it's sufficient that at least one of the arguments is "different enough", even if that argument alone is not enough to decide which function will be dispatched at runtime. Consider the following functions:

When covers(..) is invoked with arguments of type Shape (defined before, it's the union of Square, Rectangle and Circle) the value of both arguments has to be inspected in order to decide which of the nine polymorphic functions has to be dispatched. But for the purpose of polymorphic compatibility all that is required is that, for every possible pair of covers(..) functions, the types of either argument are "different enough", in the above-defined sense.
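The compatibility rule described in this section can be captured in a toy Python model. It is only an illustration of the rule as stated here, not the compiler's actual algorithm. Each argument type is reduced to the set of coarse "kinds" of values it contains (integers, sequences, values with a given tag), two signatures are accepted if at least one argument position has no kind in common, and the lowercase tag names are assumptions.

```python
def compatible(sig1, sig2):
    """True if two same-arity signatures are 'different enough' to coexist.

    Each signature is a tuple with one set of coarse kinds per argument.
    """
    if len(sig1) != len(sig2):
        raise ValueError("signatures must have the same arity")
    # One argument position with no kind in common is sufficient.
    return any(k1.isdisjoint(k2) for k1, k2 in zip(sig1, sig2))


INT = frozenset({"int"})
SEQ = frozenset({"seq"})
SQUARE = frozenset({"tag:square"})
RECTANGLE = frozenset({"tag:rectangle"})
CIRCLE = frozenset({"tag:circle"})

# sign(..): positive and negative integers are disjoint, but both are integers.
print(compatible((INT,), (INT,)))  # False
# area(..): differently tagged arguments are different enough.
print(compatible((SQUARE,), (CIRCLE,)))  # True
# covers(..): the first arguments coincide, but the second ones differ.
print(compatible((SQUARE, CIRCLE), (SQUARE, RECTANGLE)))  # True
```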

These stricter-than-necessary restrictions on the signatures of polymorphic functions are in place for two reasons: first, they guarantee that the proper function can be dispatched quickly at runtime, and second, they make the implementation easier. Future versions of the compiler will probably gradually relax them to some extent.

When writing polymorphic functions that operate over collection types, remember that in Cell, the same [] value is used to represent empty relations of any arity, and that includes the empty set, since sets are just relations of arity one. A consequence of that is that the compiler has no choice but to reject code like this, because the types of both arguments overlap:

The & operator is defined as union between sets, and merge between maps. But in an expression like [] & [] which of the two functions should be used, given that [] is both the empty set and the empty binary relation? In order to avoid ambiguities, the definition of the operator & has to be split further. Here's how it is defined in the standard library:

All the possible combinations of empty set/map, non-empty set and non-empty map have to be dealt with individually, in order to get rid of any overlap among the types of the arguments.

There's another distinct but vaguely similar problem that presents itself with both the empty sequence () and the empty set/relation [], which was briefly mentioned when discussing the polymorphic sum(..) functions. In most typed languages, the empty sequence of, say, integers is different from the empty sequence of floating point numbers, or strings. But in Cell there's just a single empty sequence (and a single empty set/relation), and that's a natural consequence of the fact that values in Cell don't have a type (in the usual sense), so it would make no sense to talk of an empty sequence of integers, since an empty sequence, by definition, does not contain anything. So in cases like that of the sum(..) functions, which return different types (Int and Float respectively), that poses a problem: while such a function can be defined over non-empty sequences using protocols, if we were to extend it to the empty sequence (), what should the expression sum(()) return? Summing over an empty sequence of integers should return the integer zero 0, while summing over an empty sequence of floating point numbers should return the floating point zero 0.0, which is a different entity in Cell's data model. But since there's only one empty sequence, we can only have one return value, and neither 0 nor 0.0 is acceptable in both cases. So here it's probably better to just give up polymorphism, and rename one of the functions. Solving this class of issues would require a non-trivial upgrade of the type system, but there are much more pressing issues at the moment, so that's not going to happen anytime soon.
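Python is a convenient place to observe the same difficulty, since its lists are also untyped containers and the built-in sum(..) has to pick a single result for the empty case. The snippet below is an analogy rather than Cell code, and the helper name sum_type is hypothetical.

```python
def sum_type(xs, *start):
    """Returns the name of the type produced by summing xs."""
    return type(sum(xs, *start)).__name__


print(sum_type([1, 2, 3]))   # int
print(sum_type([1.5, 2.5]))  # float
# An empty list carries no element type, so the default start value
# (the integer 0) decides the result.
print(sum_type([]))          # int
# The caller has to state the intended zero explicitly to get a float.
print(sum_type([], 0.0))     # float
```

Python sidesteps the ambiguity by letting the caller pass the zero element. Renaming one of the Cell functions, as suggested above, achieves the same thing by making the choice part of the function name.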