Optimisation

Contents

WARNING: SPOILERS! While no particular solution to any particular puzzle is given here, no guarantee is made that examples aren't very relevant to certain puzzles. You may find it is more fun to discover these strategies for yourself. You are recommended to try optimising as best you can without this guide, and then - once you are stuck for ways to find improvements - come back and see what other tricks you can try.

For each challenge, split it up into tasks, and handle each task individually. Some tasks may require others to be completed first.

Some consideration should be given to how tasks are split up. For example: a task like:

"Find out which values of A between 10 and 29 can have some operation performed on them" could be split into "Can 10 have this operation performed on it?"; "Can 11 have this operation performed on it?" ... and so on.

Execution paths: if you have a task that can be done, but no EXA working on it, you could try adding more EXAs.

Communications: time spent sitting on the M channel to read/write is time spent not doing anything. This can be true even when your EXAs are well synchronised, since a send down M always takes at least two cycles.

Blocking conditions generally: if you find that any of these come up, see if you can figure a way to avoid waiting for stuff to clear. Sometimes this involves carefully synchronising your EXAs to dance a very precise dance. When this is done correctly, the low cycle count is is the second best feeling in the world (right behind seeing the dance you choreographed go correctly).

Excessive bookkeeping: mostly this affects loops and branches.

Too many or unnecessary instructions. Fewer instructions (normally) take less time to execute.

Many of the tricks to reduce size in the Space Optimisation section are useful here. Reducing instruction count as well as saving cycles (e.g. ADDI F 10 X over COPY F X ; ADD X 10 X) will also free up space for when you want to trade space for cycles with other optimisations.

It's always worth examining your COPY commands, to see if the copied data has any work done it. You may be able to combine some of the work with the copy.

Re-ordering instructions

It is worth examining instruction orders, particularly around M read/writes, and REPL instructions.

For example, the earlier an EXA starts work, the earlier it finishes, so if there are instructions before a REPL that could be done after the REPL, it might be worth moving the REPL forward. (Of course, if the new EXA then shows up too early and messes with orderings, or blocks another EXA that's not so good.)

When doing an M read or write, you want to do this as close to the other EXA doing the write or read. If you find an EXA is waiting for a cycle or two before reading M, consider moving the read so it is an instruction or two later. If there are useful things to do in this time even better. It might also be worth looking to see if the write can be done earlier.

This is 3 cycles and 4 instructions. It can be done with two cycles, however the EXA only stores values between -9999 and 9999. Trying to make it larger than 9999 will clamp it at 9999 (and making it smaller than -9999 will clamp at the other end). So you can limit X to 100 by going ADDI X (9999-100) X ; SUBI X (9999-100) X.

If you're passing this value to another EXA, you can even do the clamp for free:

You can also combine this with using SWIZ to multiply, divide, or modulo by 1000, 100 or 10, and so get both in a single instruction.

SWIZ can also be used to pack data, if values are between 0 and 99 it can be used to retrieve two values in one register.

There are lots of tricks with SWIZ.

Loop unrolling

When to use it: when you have loops - particularly small loops - and lots of spare space.

Consider this code, which writes the numbers 0 to 9 into a file:

COPY 10 T
MARK WRITELOOP
SUBI T 1 T
SUBI 9 T F
TJMP WRITELOOP

This takes 31 cycles to run, and spends about 2/3rds of the time doing bookkeeping instead of writing to the file.

We can do this in 10 cycles (the theoretical minimum - the file cannot receive values faster than that), and (as a bonus) this doesn't even clobber T:

@REP 10
COPY @{0,1} F
@END

The main disadvantage here is that the first option is 5 instructions, and the second is 9. In this case it's not such a big difference, but if you were writing 0 to 99 to a file this would probably use too much space (100 instructions). In that case, though, you can always adopt a hybrid solution:

This is 15 instructions altogether, but runs in 141 cycles, as opposed to the 301 it would take using a simple loop (and compared to the 100 cycles of the theoretical best case).

Loop unrolling can also be used when you don't know the full size of the loop, by testing if the loop size is greater than a certain threshold and then using an unrolled loop, and then having a standard loop for the last few items.

Parallelisation

Doing lots of things at once is often better than doing one thing at a time. The main way to parallelise in these puzzles is to form a pipeline. Messengers are good for this as you can have one EXA at every step of transferring the message. It means the message takes longer to get there, but you no longer stall reading from M, so the actual number of digits processed per second increases. If you had a loop where you did this: COPY #INPT M, you might stall waiting for M to be read. However, if instead you did a COPY #INPT X ; REPL MESSAGE, and have MESSAGE transfer the data directly, this would be less of a problem. Of course, it's two cycles to do this, but the other parallelisation strategy is "going wide", where two EXAs do the same thing in sync with each other. This means you could have one EXA reading from #INPT while the other REPLs, and then they switch. (This doesn't work if #INPT is a file.)

When things need to be done in a particular order, parallelisation can be tricky since you need to introduce synchronisation, but they are difficult fun problems to solve. While M channels can be used to synchronise EXAs, it's more fun to make it work without them.

One trick that can work is to save yourself a TEST and TJMP/FJMP by assuming either case could be true, and REPLing an EXA to follow the FALSE path, while following the TRUE path locally. This works if the wrong EXA will crash without causing any damage.