Progress Report #5

State Restored!

That about sums up Copper right now, for the most part. The Copper interpreter was originally a state machine, and the new version isn’t any different in that regard. But “state restored” here primarily refers to the fact that everything I had in the original version of the interpreter, including all the built-in functions, has been restored in the new version. That means I’m back to where I was about 5 weeks ago, but this time I have what I believe to be a MUCH faster interpreter, though I have yet to benchmark it. (Ok, so the ability to run Copper functions outside of Copper has not been finished, and I’ll talk about that later.)

The new version of the engine uses “opcodes”, or more precisely, a switch statement and an enum for picking out operations. The opcodes are more complex than assembly instructions, of course, but that’s the benefit of using a higher-level language (even if only slightly higher). To give you an idea, here are the opcodes:

Exit
FuncBuild_start
FuncBuild_createRegularParam
FuncBuild_assignToVar
FuncBuild_pointerAssignToVar
FuncBuild_execBody
FuncBuild_end
FuncFound_access
FuncFound_assignment
FuncFound_pointerAssignment
FuncFound_call
FuncFound_setParam
FuncFound_finishCall
Terminal
Goto
ConditionalGoto
Own
Is_owner
Is_pointer
CreateBoolTrue
CreateBoolFalse
CreateNumber
CreateString
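To make the “switch statement and an enum” idea concrete, here is a minimal C++ sketch of that style of dispatch. Only the opcode names come from the list above; the `Instr` and `Value` types, the value stack, and the `run` function are assumptions for illustration, and only the literal-creating opcodes are actually handled.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// Opcode names mirror the list above; everything else here is hypothetical.
enum class Opcode {
    Exit,
    FuncBuild_start, FuncBuild_createRegularParam, FuncBuild_assignToVar,
    FuncBuild_pointerAssignToVar, FuncBuild_execBody, FuncBuild_end,
    FuncFound_access, FuncFound_assignment, FuncFound_pointerAssignment,
    FuncFound_call, FuncFound_setParam, FuncFound_finishCall,
    Terminal, Goto, ConditionalGoto, Own, Is_owner, Is_pointer,
    CreateBoolTrue, CreateBoolFalse, CreateNumber, CreateString
};

// Assumed value model: just enough types for the Create* opcodes.
using Value = std::variant<bool, double, std::string>;

struct Instr {
    Opcode op;
    Value operand{};  // literal payload for CreateNumber / CreateString
};

// One switch per cycle picks the operation; results go onto a value stack.
std::vector<Value> run(const std::vector<Instr>& code) {
    std::vector<Value> stack;
    for (const Instr& in : code) {
        switch (in.op) {
        case Opcode::Exit:
            return stack;
        case Opcode::CreateBoolTrue:
            stack.emplace_back(true);
            break;
        case Opcode::CreateBoolFalse:
            stack.emplace_back(false);
            break;
        case Opcode::CreateNumber:
        case Opcode::CreateString:
            stack.push_back(in.operand);
            break;
        default:
            // Task-based opcodes (function building, calls) are not modeled here.
            throw std::runtime_error("opcode not handled in this sketch");
        }
    }
    return stack;
}
```

The appeal of this layout is that each cycle costs one switch on a small integer, rather than re-examining source text.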

Some of them are self-explanatory, but others work in tandem and need some explanation. Most of the ones that work together share a name prefix (FuncBuild_, FuncFound_), but a few others cooperate as well. They usually involve a starting point, which sets up a kind of context (or “Task”) in which associated data is stored until all the related operations can be completed. Among these are function building (a semi-complicated task, as you can see), function calling, if-structure handling, and loop handling.

Function calling only requires three opcodes:

FuncFound_call – sets up the task
FuncFound_setParam – (optional) sets one of the unset parameters
FuncFound_finishCall – calls the function, passing it the set parameters
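The call task can be pictured as a small stack of open calls. Below is a minimal C++ sketch under stated assumptions: `CallMachine`, `CallTask`, and the idea that functions take and return doubles are all hypothetical, and only the three opcode names come from the list above. Using a stack rather than a single slot is a guess at how a parameter could itself contain another call.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical: functions are plain C++ callables over doubles.
using Func = std::function<double(const std::vector<double>&)>;

// The "task" holding data between FuncFound_call and FuncFound_finishCall.
struct CallTask {
    Func func;
    std::vector<double> params;
};

class CallMachine {
public:
    // FuncFound_call: open a new task for this function.
    void call(Func f) { tasks_.push_back(CallTask{std::move(f), {}}); }

    // FuncFound_setParam: add one parameter to the innermost open task.
    void setParam(double v) {
        if (tasks_.empty()) throw std::logic_error("setParam without an open call");
        tasks_.back().params.push_back(v);
    }

    // FuncFound_finishCall: close the innermost task and invoke it.
    double finishCall() {
        if (tasks_.empty()) throw std::logic_error("finishCall without an open call");
        CallTask task = std::move(tasks_.back());
        tasks_.pop_back();
        return task.func(task.params);
    }

    std::size_t openTasks() const { return tasks_.size(); }

private:
    std::vector<CallTask> tasks_;  // a stack, so calls can nest
};
```

For example, opening a call to a summing function, setting 1, then opening a second call and setting 2 and 3 leaves two tasks open; the first `finishCall()` returns 5, which can then be passed to the outer call as its second parameter.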

If-structures and loop-structures both use the Goto opcode, and if-structures also use ConditionalGoto. Since control flow comes down to plain jumps, I can see how Copper could be turned into a compiled language. It would take a bit of work, but maybe one day I’ll use LLVM and do just that for the fun of it.
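Here is a minimal C++ sketch of how an if/else and a loop can both be flattened into jumps. It assumes ConditionalGoto pops a boolean and jumps when it is false; `Emit` and `Countdown` are made-up helper opcodes that exist only so the example produces observable output, and the instruction indices are hand-assigned rather than produced by a real parser. In the loop, the loop itself only needs Goto (for the back-edge and for leaving), while the if inside it supplies the ConditionalGoto.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Goto and ConditionalGoto as in the list; Emit and Countdown are hypothetical.
enum class Op { CreateBoolTrue, CreateBoolFalse, ConditionalGoto, Goto, Emit, Countdown, Exit };

struct Instr {
    Op op;
    std::size_t target = 0;  // jump destination for Goto / ConditionalGoto
    std::string text{};      // payload for Emit
};

std::vector<std::string> run(const std::vector<Instr>& code, int counter = 0) {
    std::vector<bool> conds;       // pending conditions
    std::vector<std::string> out;  // what Emit produced, in order
    std::size_t pc = 0;            // index of the next instruction
    while (pc < code.size()) {
        const Instr& in = code[pc++];
        switch (in.op) {
        case Op::CreateBoolTrue:  conds.push_back(true);  break;
        case Op::CreateBoolFalse: conds.push_back(false); break;
        case Op::Countdown:  // pushes "counter > 0", decrementing while positive
            conds.push_back(counter > 0);
            if (counter > 0) --counter;
            break;
        case Op::ConditionalGoto: {  // assumed: jump when the condition is false
            bool c = conds.back();
            conds.pop_back();
            if (!c) pc = in.target;
            break;
        }
        case Op::Goto: pc = in.target; break;
        case Op::Emit: out.push_back(in.text); break;
        case Op::Exit: return out;
        }
    }
    return out;
}

// if (false) { Emit "then" } else { Emit "else" }
const std::vector<Instr> kIfElse = {
    {Op::CreateBoolFalse},     // 0: condition
    {Op::ConditionalGoto, 4},  // 1: false -> else branch
    {Op::Emit, 0, "then"},     // 2: then branch
    {Op::Goto, 5},             // 3: skip the else branch
    {Op::Emit, 0, "else"},     // 4: else branch
    {Op::Exit},                // 5
};

// loop { if (Countdown) { Emit "body" } else { leave the loop } }
const std::vector<Instr> kLoop = {
    {Op::Countdown},           // 0: loop top; the if's condition
    {Op::ConditionalGoto, 4},  // 1: if: false -> else branch
    {Op::Emit, 0, "body"},     // 2: then branch
    {Op::Goto, 5},             // 3: if: skip the else branch
    {Op::Goto, 6},             // 4: else branch: leave the loop
    {Op::Goto, 0},             // 5: loop end: jump back to the top
    {Op::Exit},                // 6
};
```

Running `kLoop` with a counter of 3 emits "body" three times before the else branch jumps past the loop.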

Assignment is a single step that carries the address of the variable receiving the assignment. Rather than re-parsing the variable address every cycle, I now build a “variable address” once during parsing and use it right away. It’s not the fastest thing in the world, because it still forces the interpreter to instantiate every variable along the path that doesn’t exist yet, but that’s part of the joy of using Copper.
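A “variable address” computed once at parse time might look something like the following C++ sketch. It assumes dotted member paths such as `a.b.c` and a plain map-of-members object model; `Object`, `VarAddress`, `parseAddress`, and `resolve` are hypothetical names. `resolve` shows the cost mentioned above: every missing member on the path is created on the way to the target.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical object model: named members plus a stand-in value.
struct Object {
    std::map<std::string, std::unique_ptr<Object>> members;
    double value = 0;
};

// The pre-parsed address: path segments, split once.
using VarAddress = std::vector<std::string>;

// Parse "a.b.c" into {"a", "b", "c"} once, at parse time (dot syntax assumed).
VarAddress parseAddress(const std::string& path) {
    VarAddress addr;
    std::stringstream ss(path);
    std::string part;
    while (std::getline(ss, part, '.')) addr.push_back(part);
    return addr;
}

// Walk the address each time it is used, creating any missing member.
Object& resolve(Object& root, const VarAddress& addr) {
    Object* cur = &root;
    for (const std::string& name : addr) {
        std::unique_ptr<Object>& slot = cur->members[name];
        if (!slot) slot = std::make_unique<Object>();
        cur = slot.get();
    }
    return *cur;
}
```

The string splitting happens once, but the walk (and any instantiation) still happens on every assignment.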

Loops and More

As I was working on loops, I started to consider a couple of things. First, the keyword “skip” should probably be replaced by “cycle” or something similar. But this got me thinking about the second thing: What if I had a block of code that could be cycled but by default did not repeat? Copper loops are already somewhat inconvenient in that you can’t specify a bit of code that runs every time the loop cycles (like the iterator step of a C++ for-loop), a design decision made in part because Copper doesn’t have iterators. You are expected to put the “cycle code” (code that should run every cycle) at the beginning of the loop. In any case, it seems as though it wouldn’t be much more tedious to add a structure that allows you to cycle or break at any point, like a loop, but without the repetitious nature of a loop.

Why? One of the issues I often run across in C++ is breaking out of nested loops. Some of these are easy to break with a boolean flag, but that can be rather inconvenient, and triple loops are even worse. Many times, these loops don’t even need to loop, so I end up using a “goto” or some similarly odd structure. Copper has no “goto” command (and while it would be fairly easy to add one, it would probably mess up other things if misused, much like the C++ goto). If, instead, I had one control structure nested within another, each with a different breaking mechanism, I could break out of both of them without a mess. Here’s an example:

This example is a bit contrived for the purposes of demonstrating how this would work. The “block” would also have the keyword “break”, used for escaping from the loop. The word “escape” might be better, but I don’t want the keywords to be too long. It makes typing them more annoying.
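For comparison, the C++ situation that motivates this idea looks roughly like the sketch below. This is ordinary C++, not the proposed Copper structure, and the function names are made up. Both functions search a grid of numbers: the first carries a flag out through each loop level, and the second uses goto to leave both loops at once.

```cpp
#include <cassert>
#include <vector>

// Flag version: every enclosing loop has to re-check the flag.
bool containsWithFlag(const std::vector<std::vector<int>>& grid, int target) {
    bool found = false;
    for (const auto& row : grid) {
        for (int cell : row) {
            if (cell == target) {
                found = true;
                break;  // only leaves the inner loop
            }
        }
        if (found) break;  // second check needed to leave the outer loop
    }
    return found;
}

// goto version: one jump leaves both loops.
bool containsWithGoto(const std::vector<std::vector<int>>& grid, int target) {
    for (const auto& row : grid) {
        for (int cell : row) {
            if (cell == target) goto done;
        }
    }
    return false;
done:
    return true;
}
```

A third common C++ workaround is to move the loops into a function or lambda and `return` from it, which has the same effect as the goto without the label.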

Quirks

The original interpreter was able to pause its state and not care about when the user input arrived, just so long as it came. The current parser is significantly pickier. For example, these would all process correctly in the first interpreter, but only the last one does what you’d expect in the second interpreter:

a = [
p
q
r
]
{ print
( p: " " q: " " r: "\n" ) }
a( 1 2 3 )

a = [ p q r ] { print( p: " " q: " " r: "\n" ) }
a(1 2 3)

What happens in the new interpreter is that the pause in parsing (created by the newline) terminates the function-building parse, so an object is assigned to “a”. A second function build is then started, and it is subsequently lost in space.

Function parsing terminates at a pause so that, if the input stream happens to end there, at least one opcode is generated. Otherwise, we would be left at a strange pause: both [] and []{} are perfectly valid syntax in Copper, so which one do you wait for? A user simply creating an object might expect parsing to terminate when the object closes.
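The rule described above can be modeled with a much-simplified C++ sketch. It only tracks brackets, treats each chunk as input that arrives before a pause (for example, one line), and commits a completed top-level `[...]` as an object at each pause unless a `{` has already been opened. `parseTopLevel` and its output labels are hypothetical; the real parser produces opcodes, not strings.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model: '[' ... ']' is a parameter list / object,
// '{' ... '}' is a body. Each chunk ends with a pause in the input.
std::vector<std::string> parseTopLevel(const std::vector<std::string>& chunks) {
    std::vector<std::string> emitted;
    bool haveParams = false;  // a complete top-level [...] not yet committed
    int squareDepth = 0;
    int curlyDepth = 0;
    for (const std::string& chunk : chunks) {
        for (char c : chunk) {
            switch (c) {
            case '[':
                ++squareDepth;
                break;
            case ']':
                --squareDepth;
                if (squareDepth == 0 && curlyDepth == 0) haveParams = true;
                break;
            case '{':
                ++curlyDepth;
                break;
            case '}':
                --curlyDepth;
                if (curlyDepth == 0) {
                    // In the quirk case this second build has nothing to attach to.
                    emitted.push_back(haveParams ? "function" : "body only");
                    haveParams = false;
                }
                break;
            default:
                break;
            }
        }
        // Pause: commit what is complete so that at least one result exists.
        // Inside an open body (curlyDepth > 0) the parser keeps waiting instead.
        if (haveParams && curlyDepth == 0) {
            emitted.push_back("object");
            haveParams = false;
        }
    }
    return emitted;
}
```

With `{"[ p q r ] { print }"}` the sketch reports a single function, but with the same text split into `{"[ p q r ]", "{ print }"}` it reports an object followed by a separate body, which mirrors the quirk in the first example.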

The quirk only happens outside of function bodies. Inside of function bodies, the syntax is parsed as you would expect, preserving the nature of Copper.

The quirk is tolerable, though admittedly, it isn’t backwards-compatible (assuming I remember the older interpreter correctly).

To-Do List

Alas, there is still more to do.

First, debugging is not done. The overall functionality may be finished, and I’ve done quite a bit of debugging along the way to make sure things are working, but I’m a suspicious chap, and I’m quite sure there are a number of bugs lurking in the shadows. The last thing I want… er, make that the thing I don’t want… is to have an exploit. One of my goals is to have Copper so safe that you can run code from any random source on the internet and not be at risk of it hacking your computer. To have that assurance, the core of the engine needs a thorough review. I have some coding standards and guides at my disposal, and I need to go through those. There is a ton of debug code, some of which I probably don’t need, but most of it will remain, messy as it may look.

Admittedly, not all of the interpreter has been rebuilt: I don’t have a number generator interface yet, though I don’t consider that very important. It’s worth adding again for the sake of convenience, but it’s not even remotely an essential feature.

The new FFI is fun, and now I need to use it to re-create all the integer, float, and double functions I was using, as well as the time functions so I can perform benchmarking. I also need wrappers to make using it even easier, but this time, I may make those wrappers part of the engine itself and accessible via the engine API rather than “exposed” components.
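A wrapper of the kind mentioned above could look roughly like this C++ sketch, which adapts an ordinary typed function into a uniform “arguments in, value out” callable and stores it under a name. The `Native` type, `wrapBinary`, `makeRegistry`, and the registered names are all hypothetical and are not the engine’s actual FFI; a real version would also have to convert between the engine’s own value objects and C++ types.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical uniform shape for a foreign function.
using Native = std::function<double(const std::vector<double>&)>;

// Adapt any T(T, T) function: check the argument count, convert in and out.
template <typename T>
Native wrapBinary(T (*fn)(T, T)) {
    return [fn](const std::vector<double>& args) -> double {
        if (args.size() != 2) throw std::invalid_argument("expected 2 arguments");
        return static_cast<double>(fn(static_cast<T>(args[0]), static_cast<T>(args[1])));
    };
}

int addInt(int a, int b) { return a + b; }
double divDouble(double a, double b) { return a / b; }

// Name -> wrapped function, the way an engine might register built-ins.
std::map<std::string, Native> makeRegistry() {
    return {
        {"int_add", wrapBinary(addInt)},
        {"dbl_div", wrapBinary(divDouble)},
    };
}
```

Note that the integer wrapper truncates its inputs (2.7 becomes 2), which is the kind of conversion rule such wrappers have to decide on explicitly.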

The ability to run Copper functions outside of Copper – used for function callbacks – was easier to implement in the first interpreter. This new interpreter needs a good entry point, and while I think I can do it, I’m slightly nervous at the prospect of someone trying to link it to some asynchronous thread and having it trash the state. It won’t crash, but it may give unexpected results depending on the input sources.
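One common way to protect an entry point like this is to serialize every outside call through a single mutex, so a callback arriving from another thread waits its turn instead of interleaving with the engine’s state. The C++ sketch below is an assumption about how that could be done, not how Copper does it; `Engine`, `runCallback`, and the `std::function<int(int)>` stand-in for a Copper function are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <mutex>

class Engine {
public:
    // Stand-in for "run this Copper function from outside Copper".
    int runCallback(const std::function<int(int)>& fn, int arg) {
        std::lock_guard<std::mutex> lock(mutex_);  // one outside caller at a time
        ++callsInProgress_;
        assert(callsInProgress_ == 1);  // nothing else is inside the engine
        int result = fn(arg);           // engine state would be touched here
        --callsInProgress_;
        ++completed_;
        return result;
    }

    int completed() {
        std::lock_guard<std::mutex> lock(mutex_);
        return completed_;
    }

private:
    std::mutex mutex_;
    int callsInProgress_ = 0;
    int completed_ = 0;
};
```

A `std::mutex` is not re-entrant: if a callback calls back into the engine on the same thread, locking again is undefined behavior (in practice usually a deadlock), so a real entry point would also need a re-entrancy check or a different design.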

I’m looking forward to benchmarking this version. Part of my excitement comes from seeing fewer function calls in my debugging output. Being skeptical, though, I suspect there are areas where the interpreter will perform poorly.

Among the to-dos is writing more documentation, including the API. There are 92 functions in the Engine, but I only need to write about 12 of them. Tee hee.

All that said… ALL that said… I’m very close to releasing the Copper interpreter.