Partcl - a tiny command language

Minimalist interpreters have always attracted me. First of all, they are a good
fit for embedded systems whenever you need a custom domain-specific language.

If one needs a scripting language for a microcontroller, there are not many
choices. Instead, there are lots of constraints, like the size of the compiled
binary, RAM usage and memory management. Of course, the language itself is
important, too.

Lua is a very stable, pragmatic and minimal language. It is easy to learn
and to extend. But the compiled code size would be around 200KB, so it barely
fits even on the most powerful ST ARMs. Also, Lua uses a garbage collector,
which might not be good for real-time systems. Still, Lua is a perfect choice
if you can afford it. Apart from the original interpreter, there’s also a good
implementation for microcontrollers called eLua.

JavaScript. I still can’t take it seriously, and it feels more like modern IoT
hype than a serious alternative to Lua. There are many implementations
(Espruino, duktape, v7, TinyJS, MuJS, JerryScript), but most of them are not
stable enough, the minimal binary size easily grows over 200KB, RAM usage is
high and garbage collection is slow. They might be good for hobbyists, though.

Forth. Well, this is the total opposite. It is very minimal and simple: you
can create your own implementation over a weekend. Memory usage is close to
zero and the code size is a couple of kilobytes. The performance should rock,
too, if you implement it properly. But the language itself quickly becomes
unreadable, and its concatenative nature is far from what people expect to see
in traditional programming languages. Absolutely worth trying, but unlikely to
be practical. I’m afraid the same applies to various Lisps.

Another once-popular scripting language is Tcl, so let’s have a closer look
at it.

Tcl

Tcl, an extendable language born out of frustration with having to build
custom command languages again and again, is in fact very simple.

A script is a sequence of commands, like “turn led on”. Here “turn” is the
command name, while “led” and “on” are just words passed as parameters to the
command.

There are no data types other than strings. Square brackets inside strings are
interpolated, i.e. the command inside the brackets is evaluated and its result
is put into the original string. For example, in puts [add 1 2] the bracketed
part gets substituted by 3, and then puts 3 is executed.

This approach is definitely slow, but it is what makes Tcl interpreters so
simple. All the language does is string substitution. The only place where no
substitution happens is inside braces. That’s why the following is just a
5-word command, and not special language syntax:

if {$x > 0} {
  puts "Positive"
} else {
  puts "Negative"
}

The command is if, the second word is $x > 0 (which is substituted later by
the command internally), the third word is {puts "Positive"}, then else, then
{puts "Negative"}. The if command substitutes and evaluates the second word,
then evaluates either the third or the fifth one depending on the result.
Loops, procedures, etc. are all just commands operating on strings.

Existing interpreters

The most powerful one is tclsh, and it probably comes with your OS. It’s too
big for microcontrollers, but it has a large standard library of commands and
lots of documentation.

There is also LIL, the Little Interpreted Language. I got a chance to use it
on a MIPS controller once, and it was a pleasant experience: the source code is
easy to understand. I hit performance bottlenecks very quickly, but I rewrote
some critical procedures as C commands, so it was not a big problem. Too bad
the author has removed the Git repositories, and his site looks broken now.

Finally, there’s Picol, made by Antirez (the creator of Redis and JimTcl).
It’s worth reading: only 600 lines of amazingly simple code.

For my own interpreter I had a few requirements. The lexer should report when
the end of a command is reached, so that user input can be read byte by byte
and a command executed only once it has been fully read.

All parts of the interpreter should be covered with tests.

It took Antirez three hours to finish Picol; it took me three days to finish my
project. Apparently, I’m not that smart. Still, I’m very satisfied with the
results: I ran my Tcl interpreter on an STM32F051 microcontroller and the
firmware size was only 10KB!

Although partcl is a toy language, you might want to give it a try in your
next project, so let me explain how it works.

Lexer

Any symbol can be part of a partcl word, except for a few special symbols,
which are recognized by the tcl_is_special function.

tcl_is_special behaves differently depending on the quoting mode (the q
parameter), because semicolons, braces and newlines lose their special meaning
and become regular printable characters inside quoted strings.

The tcl_next function finds the next token in the string s. The from and to
pointers are set to the token start and end, and q holds the quoting mode,
which changes when a double quote is met. It’s very efficient because it
doesn’t allocate any memory, and it doesn’t mutate the original string (making
it possible to store scripts in ROM).

A special macro, tcl_each(s, len, skip_error), can be used to iterate over all
the tokens in the string. If skip_error is true, an unexpected EOF is not
considered an error. This makes it possible to validate the input string
without evaluating it, and to detect when a full command has been read.

A good use case is reading commands from the serial port byte by byte. You can
accumulate data in a buffer without executing it until the lexer reports the
end of a command. Then you can execute the command and reset the buffer.

The lexer allocates no memory; it doesn’t even use any libc functions. It
also has 100% test coverage.

Data types

Picol uses char * as the only data type. LIL uses a special structure that
keeps numeric values separately from string values, and yet another data type
for lists. The first approach is very simple; the second one is more
efficient, but takes precious memory space.

Partcl has a special tcl_value_t type and a number of functions to work with
it. By default it’s just a char pointer, but one can rewrite ~100 lines to
optimize it for their needs (e.g. use a pool of strings, or handle lists
separately). In other words, we have an abstract type and the implementation
may vary.

The default implementation uses malloc/free and keeps lists as raw strings with
the items quoted. This is simple, but it may break if you put badly escaped
strings into a list. So it’s probably best to avoid unpaired braces in strings
if you use the default list implementation, or to rewrite it using proper
linked lists.

Environments

The Tcl interpreter uses a stack of environments where variables are stored.
Environments are handled by 3 functions, which can be customized if needed
(e.g. to use hash maps to speed up variable lookup).

Environments have links from child to parent, making it possible to implement
uplevel and other commands.

Commands

The default set of commands includes “subst”, “set”, “while”, “if”, “proc”,
“return”, “break” and “continue”. There is also “puts”, which can be disabled
if your target system has no stdout. There are also some mathematical
operations taken from Picol that can optionally be disabled if you’re going to
use Partcl as a command shell rather than as a programming language.

Each command is a C function that takes an interpreter object, a list of
arguments and an optional pointer to some context. Some commands have a fixed
arity, in which case the interpreter checks the argument count. For zero
arity, the command must check its arguments internally, which makes it
possible to have variadic commands.

What’s next?

It was fun to write a real Tcl interpreter. It was even more fun to actually
use it on a real MCU. Good test coverage and no memory leaks (according to
valgrind) make it a good candidate for low-end scripting.

The whole interpreter is just a single file with no dependencies other than
libc (for <string.h> and malloc/free).

I’m not sure what to do next with it, but here are some ideas:

Run some benchmarks to see how fast the lexer and the interpreter are.

Make an alternative implementation for tcl_value_t that uses real lists and
caches numbers.

Make a library of commands for lists, hash maps, strings, for loops etc.