Stack machine, part 3 - Burst into flames

Halt and catch fire

Having lent its name to a highly enjoyable TV series, the fictitious halt and catch fire instruction is too geeky not to include in our small but growing instruction set. Use it any time you want to make your virtual machine completely unresponsive.

defp _execute([:hcf | _code], _stack) do
  # Ignore whatever bytecode follows and block this process forever.
  :timer.sleep(:infinity)
end

The downside of this approach? It's too efficient. Your computer will not be set ablaze. You won't even hear your cooling fans revving.

We can add instructions all day long by implementing them in Elixir, but if we made it possible to call a subroutine from within itself, we could create an endless loop that would at least consume some CPU cycles. It won't be enough to combust our computer, but at least it will get us closer.

Going back

In our current version, when calling a subroutine we traverse the bytecode until we find a matching label. The bytecode beyond that label is then executed, typically until a return instruction is encountered, at which point the subroutine finishes and bytecode execution resumes from the original call site.

This poses a number of problems, chief among them that we can only invoke subroutines that are ahead of us in the bytecode. Bytecode that calls a label located before the call site, such as a subroutine calling itself, therefore cannot be executed.
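To make the limitation concrete, here is a minimal sketch. The flat-list encoding, the shape of the `:call` instruction and the `ForwardOnly` module are assumptions made for illustration, not the series' actual code.

```elixir
defmodule ForwardOnly do
  # Hypothetical forward-only lookup: scan the bytecode that is still left to
  # execute and return whatever follows `:label, name`.
  def find_label([:label, name | rest], name), do: {:ok, rest}
  def find_label([_ | rest], name), do: find_label(rest, name)
  def find_label([], _name), do: :not_found
end

# Assumed encoding: a flat list of instructions and their operands.
code = [:label, :loop, :call, :loop]

# Searching from the very start of the program finds the label...
{:ok, [:call, :loop]} = ForwardOnly.find_label(code, :loop)

# ...but once execution has reached `:call, :loop`, nothing is left ahead of
# it, so a lookup restricted to the remaining bytecode comes up empty.
:not_found = ForwardOnly.find_label([], :loop)
```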

Searching the original bytecode for the label, rather than only what remains of it, looks nice, but how do we get hold of the original bytecode without passing it through in every function call, thus changing every instruction we've implemented? Well, we could use an agent to hold it for us and wrap the fetching of the original bytecode in an original_code function. This requires expanding the public execute function to start the agent, in addition to adding the function for fetching the original code.
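A rough sketch of what such an agent could look like follows. Only `original_code` is named in the text; the `CodeHolder` module, `start/1` and `subroutine/1` are hypothetical, as is the flat-list bytecode encoding. In the design described above, the agent would be started from inside the public execute function rather than by hand.

```elixir
defmodule CodeHolder do
  # Start an agent that keeps the unmodified bytecode, registered under the
  # module name so that any instruction can reach it without an extra argument.
  def start(code), do: Agent.start_link(fn -> code end, name: __MODULE__)

  # Fetch the bytecode exactly as it was loaded.
  def original_code, do: Agent.get(__MODULE__, fn code -> code end)

  # Look a label up from the start of the original bytecode, so that labels
  # located before the call site can be found as well.
  def subroutine(name), do: find(original_code(), name)

  defp find([:label, name | rest], name), do: {:ok, rest}
  defp find([_ | rest], name), do: find(rest, name)
  defp find([], _name), do: :error
end

{:ok, _pid} = CodeHolder.start([:label, :loop, :call, :loop])

# The label sits behind the call site, yet it can now be resolved.
{:ok, [:call, :loop]} = CodeHolder.subroutine(:loop)
```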

Congratulations! You now have an infinite loop in your iex session. Hit Ctrl-C twice to terminate it. Later in the series we'll look at conditional branching, but for now we'll resort to more basic ways to end our test runs.

Now, while we have solved the main problem of not being able to invoke a subroutine again, we haven't addressed the inefficiency of traversing the bytecode whenever we want to find the subroutine being invoked.

Extracting subroutines on load

What if we not only stored the original bytecode in an agent, but also extracted the various subroutines? That way we need not go through the bytecode list more than once.
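As a sketch of the idea, a single pass over the bytecode can build a map from label name to the code that follows it. The `SubroutineTable` module and the flat-list encoding are assumptions for illustration; the sketch also assumes that no operand is itself the atom `:label`.

```elixir
defmodule SubroutineTable do
  # Walk the bytecode once, mapping each label name to the code after it.
  def extract(code), do: extract(code, %{})

  defp extract([], table), do: table

  defp extract([:label, name | rest], table),
    do: extract(rest, Map.put(table, name, rest))

  defp extract([_ | rest], table), do: extract(rest, table)
end

code = [:push, 1, :call, :double, :halt, :label, :double, :dup, :add, :ret]

# Keep the table in an (unnamed) agent; a lookup is now a map access.
{:ok, store} = Agent.start_link(fn -> SubroutineTable.extract(code) end)
[:dup, :add, :ret] = Agent.get(store, &Map.fetch!(&1, :double))
```

Each stored body is simply a tail of the original list, so building the table does not copy any bytecode.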

With the inefficiency mitigated, we have another problem to deal with, and it is caused by having a single agent registered under a fixed name. Calling execute twice in the same session brings the problem out into the open.
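The underlying behaviour can be reproduced without any virtual machine at all. In this sketch the name `:vm_code` and the bytecode contents are made up; the return values are those of `Agent.start_link/2` when a name is already taken.

```elixir
# A name can be registered by only one live process at a time.
{:ok, pid} = Agent.start_link(fn -> [:push, 1] end, name: :vm_code)

# A second start under the same name is refused. An execute function that
# insists on `{:ok, _pid} = Agent.start_link(...)` would raise a MatchError here.
{:error, {:already_started, ^pid}} =
  Agent.start_link(fn -> [:push, 2] end, name: :vm_code)

# The first agent is untouched and still holds the first program.
[:push, 1] = Agent.get(:vm_code, fn code -> code end)
```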

Great! Now we can't even call the execute function twice without blowing up. By allowing the initialisation of the virtual machine to become entangled with its repeated use, we have failed to make the execute function re-entrant. We could rewrite our execute function to set up a supervision tree handling both the code/function-holding agent and the process actually executing the bytecode. Or we could skip registering the agent's process name and instead pass its PID (process ID) along to all functions needing it. Or we could...
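To give a feel for the second option, here is a minimal sketch in which every run starts its own unnamed agent and threads its PID through the interpreter. The `PidPassing` module, the tiny instruction set and the flat-list encoding are all assumptions for illustration. Note the cost: every instruction clause gains a `store` argument.

```elixir
defmodule PidPassing do
  # Each run gets its own agent, so nothing is shared between runs.
  def execute(code) do
    {:ok, store} = Agent.start_link(fn -> code end)

    try do
      run(code, [], store)
    after
      Agent.stop(store)
    end
  end

  defp run([], stack, _store), do: stack
  defp run([:halt | _], stack, _store), do: stack
  # Returning from a subroutine is modelled by returning from `run/3`.
  defp run([:ret | _], stack, _store), do: stack
  defp run([:push, value | rest], stack, store), do: run(rest, [value | stack], store)
  defp run([:add | rest], [a, b | stack], store), do: run(rest, [a + b | stack], store)
  defp run([:label, _name | rest], stack, store), do: run(rest, stack, store)

  defp run([:call, name | rest], stack, store) do
    body = store |> Agent.get(fn code -> code end) |> find(name)
    run(rest, run(body, stack, store), store)
  end

  # Raises if the label does not exist; good enough for a sketch.
  defp find([:label, name | rest], name), do: rest
  defp find([_ | rest], name), do: find(rest, name)
end

code = [:push, 20, :call, :inc, :halt, :label, :inc, :push, 1, :add, :ret]

# Calling execute twice no longer blows up.
[21] = PidPassing.execute(code)
[21] = PidPassing.execute(code)
```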

Notice that sinking feeling? It's the one telling you that we would've been better off if we had test-driven our design. In fact, we'll get to that in the next episode, but first let's briefly consider the ramifications of transforming the bytecode on load.

Transforming the bytecode on load

As we clearly saw in our last few attempts, blindly typing code is as good at getting us into more and more trouble as it is at exploring our options. So let's stop digging for now and just think a little about the next option.

Imagine that we had the code in a data structure that acted more like an array in other programming languages. Indexing any element would be O(1) rather than O(n). Thus, we could avoid traversing our bytecode when looking for a subroutine and simply set the program counter to the index of the instruction to execute. When loading the bytecode we would remove the :label instructions and replace each reference to them with the index of the instruction to jump to.
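The load-time transformation could be sketched as two passes. To keep the indices simple, this sketch assumes one element per instruction (for example `{:push, 1}` or `{:call, :inc}`) instead of a flat list of atoms and operands; that encoding and the `LabelResolver` module are hypothetical.

```elixir
defmodule LabelResolver do
  def resolve(code) do
    # Pass 1: drop labels, remembering the index of the instruction after each.
    {reversed, labels, _next_index} =
      Enum.reduce(code, {[], %{}, 0}, fn
        {:label, name}, {acc, labels, index} ->
          {acc, Map.put(labels, name, index), index}

        instruction, {acc, labels, index} ->
          {[instruction | acc], labels, index + 1}
      end)

    # Pass 2: replace the label name in every call with the recorded index.
    reversed
    |> Enum.reverse()
    |> Enum.map(fn
      {:call, name} -> {:call, Map.fetch!(labels, name)}
      instruction -> instruction
    end)
  end
end

code = [{:push, 20}, {:call, :inc}, :halt, {:label, :inc}, {:push, 1}, :add, :ret]

[{:push, 20}, {:call, 3}, :halt, {:push, 1}, :add, :ret] = LabelResolver.resolve(code)
```

Because all labels are collected before any call is rewritten, it makes no difference whether a label sits before or after the call site.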

So what data structure can we use for this in Elixir? How about a tuple? It does have the O(1) property we desire, and we will only need to pay the performance cost of creating the tuple once, when loading the bytecode. Well, unless we start to dabble with self-modifying code, which I'm sure we'll get to eventually.
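A minimal sketch of tuple-based execution might look like this. The `TupleVM` module, the one-element-per-instruction encoding and the explicit list of return addresses are assumptions for illustration; the program is written with its call target already resolved to an index.

```elixir
defmodule TupleVM do
  # `program` is a tuple of instructions, `pc` the program counter and
  # `calls` a stack of return addresses.
  def run(program, pc \\ 0, stack \\ [], calls \\ []) do
    # elem/2 fetches by index in constant time, with no list traversal.
    case elem(program, pc) do
      :halt ->
        stack

      {:push, value} ->
        run(program, pc + 1, [value | stack], calls)

      :add ->
        [a, b | rest] = stack
        run(program, pc + 1, [a + b | rest], calls)

      {:call, target} ->
        # Jump straight to the target, remembering where to resume.
        run(program, target, stack, [pc + 1 | calls])

      :ret ->
        [return_to | remaining] = calls
        run(program, return_to, stack, remaining)
    end
  end
end

# The tuple is built once, at load time.
program = List.to_tuple([{:push, 20}, {:call, 3}, :halt, {:push, 1}, :add, :ret])

[21] = TupleVM.run(program)
```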

This involves a bit more complexity than our subroutine-extracting solution, as we need to juggle some state and indices (conceptually pointers), but it does remove the need for keeping a separate process around to figure out what code to execute when invoking a subroutine. In essence, it turns our indirect branching scheme into a direct branching one.

This does look like a reasonable path to explore. Just as the BEAM, the Erlang virtual machine, transforms its bytecode on load, so can we. However, we won't do that without getting some tests in place first.