You might want to consider making a just-in-time compiler that translates P1 Spin byte codes into P2 assembly instructions.

An interpreter keeps the bytecode small. With a JIT compiler the bytecodes get expanded into native instructions, which increases memory use and therefore restricts how large a program can be.
Am I following your suggestion properly?

I'm assuming that a JIT compiler does not need to keep the translated code around all the time. It translates small pieces of code as needed, executes them, and keeps the translations in a cache that can be purged if memory gets low.

You might want to consider making a just-in-time compiler that translates P1 Spin byte codes into P2 assembly instructions.

For bytecodes I suspect it'll be hard to beat the performance of XBYTE, which would be the way to go if you really want a high performance bytecode interpreter. For programs that will fit in the JIT's cache the JIT will win, but as soon as you start getting cache misses I think XBYTE will win, since it's effectively using hardware to do the cache fills and instruction decode.

For P2 there's another interesting option. Since the bytecode interpreter is included with the application (rather than in ROM) it can be optimized for the particular program it's running. It'd be kind of interesting to see domain-specific bytecode interpreters for different classes of applications. I don't know if p1spin or Cluso's prospective Spin interpreter would be good starting points for this, but perhaps they might? You'd have a kind of "base" interpreter with the instructions everyone uses, and then replace some specific bytecode instructions with ones to speed up the application it's being linked with.

@Dave Hein I seem to recall that you modified a Spin interpreter to run LMM PASM in a similar way, by replacing one of the lesser used Spin opcodes?

Do you mind, then, if I use your code as a base, since you've already unrolled the code better than I was able to do on the P1?

Feel free to use anything you want from p1spin. It has an MIT license.

@Eric, there is one unused bytecode. I think it is 3F, but I may be wrong. I wrote a Spin object that patched the Spin interpreter in cog memory, and inserted a small LMM interpreter. When the previously unused bytecode was encountered the interpreter would execute LMM code. The LMM code would jump back to the Spin interpreter to continue running Spin bytecodes. The object is in the OBEX, and I believe I called it SpinLMM.

Look at the file p1spin.spin2 in the zip file. Search for the label "loop". That's the main loop for the interpreter. It reads a byte, which is used as an index into the jump table. Each bytecode is implemented by a small piece of PASM code. The PASM code then either jumps back to loop or to pushx1 depending on whether it saves the result on the stack.

You might want to consider making a just-in-time compiler that translates P1 Spin byte codes into P2 assembly instructions.

For bytecodes I suspect it'll be hard to beat the performance of XBYTE, which would be the way to go if you really want a high performance bytecode interpreter. For programs that will fit in the JIT's cache the JIT will win, but as soon as you start getting cache misses I think XBYTE will win, since it's effectively using hardware to do the cache fills and instruction decode.

I may have been premature with this statement. I've been doing more research on JIT compiling and on cache strategies (some of which is discussed in my LMM thread). Some stack-based virtual machines may benefit a lot from a JIT compiler, if it can use its knowledge of basic blocks to elide stack pushes and pops. HUB RAM looks like it may be a bottleneck in a stack-based system, so I think there actually is real potential in a JIT compiler for a stack-based VM (like the Spin1 one).