Suppose I'm writing my own virtual machine interpreter, and I want to target it with a 'real' language like C. I could retarget the backend of gcc, but I hear that work is horrendous. What I would rather do is abuse a gcc backend already written for another processor. Back in school, we had a stripped-down gcc that output a subset of MIPS, and we had to write a MIPS emulator, so it would be similar to that, just much more feature-rich.

I suppose this is more of a question for those who have written a lot of assembly language: which CPU do you think is best suited to being emulated by an interpreter? Keep in mind that I don't necessarily need to run a modern-style OS on the VM, and the memory architecture can be simple. I'm even considering subsets of certain processors. For instance, a 386 in real mode has 32-bit registers, floating-point support (via the 387), and there's a trick to address a flat 4 GB from real mode (called unreal mode). If my interpreter only acted as a 386 in unreal mode, that would be fine by me. But x87 opcodes are complex... I want a simpler processor.

Why do I want to do this? For fun. Several years ago, I wrote a little VM and my own language, and gave it bindings to a C library I wrote that did basic graphics. The VM would call functions like 'load this image file', 'draw this image here', etc., and the C library did all the grunt work. I kind of want to do the same thing again, but with OpenGL 3D graphics. I could invent a new pet language again, and I might, but if I could get plain C compiling to VM bytecode, that would be great. Some of the optimizations gcc does (loop unrolling, hoisting constants out of loops, etc.) would probably benefit bytecode programs quite a bit.

And before someone suggests it as an x86 interpreter: I am considering just adding OpenGL calls to DOSBox. If I reserve certain addresses for communicating with the outside of the VM, there wouldn't be anything stopping me from writing 3D-accelerated apps in QBasic or Turbo C!
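A minimal sketch of the reserved-address idea, written in C and independent of DOSBox's actual source (nothing here is a DOSBox API; the 64 KB layout, the port addresses and the command codes are all hypothetical). The only requirement is that every guest store goes through one function, so a write to the "command" address can trigger a host call instead of just landing in RAM:

```c
#include <stdint.h>

/* Hypothetical layout: 64 KB of guest RAM whose top bytes act as "host ports". */
#define MEM_SIZE      0x10000u
#define PORT_ARG0     0xFFF0u   /* sprite id */
#define PORT_ARG1     0xFFF1u   /* x */
#define PORT_ARG2     0xFFF2u   /* y */
#define PORT_COMMAND  0xFFF3u   /* writing here triggers the host call */

enum { CMD_NONE = 0, CMD_DRAW_SPRITE = 1 };

typedef struct {
    uint8_t ram[MEM_SIZE];
    uint8_t last_command;   /* recorded so the host side is observable */
    uint8_t last_args[3];
} Machine;

/* Host side: a real implementation would issue OpenGL calls here. */
static void handle_port_command(Machine *m, uint8_t cmd) {
    m->last_command = cmd;
    m->last_args[0] = m->ram[PORT_ARG0];
    m->last_args[1] = m->ram[PORT_ARG1];
    m->last_args[2] = m->ram[PORT_ARG2];
}

/* Every guest store is routed through this function. */
void mem_write(Machine *m, uint16_t addr, uint8_t value) {
    m->ram[addr] = value;
    if (addr == PORT_COMMAND)
        handle_port_command(m, value);
}

uint8_t mem_read(const Machine *m, uint16_t addr) {
    return m->ram[addr];
}
```

The guest writes the arguments first and the command byte last, so the host always sees a complete request. Reads from the ports are plain RAM here; a fuller version might return status codes from them.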

If you want to use an existing and functional instruction set in your VM, you need to build a pretty rich VM.

On the other hand, if you want to build a simple VM that you can reasonably finish in a moderate timespan, you'll need to pick an instruction set that isn't terribly complicated.

x86 is probably the worst instruction set to try and emulate, at least as far as popular contemporary instruction sets go. x64 is a little better in terms of cleanliness but it's still huge.

I'd recommend a RISC architecture to begin with, since by design they consist mostly of simple operations; their complexity comes from having lots of registers. Something old is probably good, too, but there are good modern RISCs out there like the ATMega line.
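To make "simple operations" concrete: a classic RISC like MIPS (the one mentioned in the original question) uses fixed 32-bit instruction words whose fields sit at fixed bit positions, so decoding is a few shifts and masks. A minimal sketch covering only four R-type ALU instructions; everything else is reported as unhandled:

```c
#include <stdint.h>

typedef struct { uint32_t reg[32]; uint32_t pc; } MipsCpu;

/* R-type layout: opcode[31:26] rs[25:21] rt[20:16] rd[15:11] shamt[10:6] funct[5:0] */
#define OPCODE(w) ((w) >> 26)
#define RS(w)     (((w) >> 21) & 0x1Fu)
#define RT(w)     (((w) >> 16) & 0x1Fu)
#define RD(w)     (((w) >> 11) & 0x1Fu)
#define FUNCT(w)  ((w) & 0x3Fu)

/* Returns 0 on success, -1 for anything this sketch does not handle. */
int exec_rtype(MipsCpu *c, uint32_t w) {
    if (OPCODE(w) != 0) return -1;          /* not an R-type instruction */
    uint32_t a = c->reg[RS(w)], b = c->reg[RT(w)], r;
    switch (FUNCT(w)) {
    case 0x21: r = a + b; break;            /* addu */
    case 0x23: r = a - b; break;            /* subu */
    case 0x24: r = a & b; break;            /* and  */
    case 0x25: r = a | b; break;            /* or   */
    default:   return -1;
    }
    if (RD(w) != 0) c->reg[RD(w)] = r;      /* register $zero always reads 0 */
    c->pc += 4;
    return 0;
}
```

Compare that with x86, where you must first parse prefixes and a variable-length encoding before you even know which operation you have.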

Actually, come to think of it, your sweet spot might be to do something like an Arduino software emulator. The hardware is well-specified and toolchains exist for it already, so you don't need to roll your own compiler; the instruction set is pretty clean; and the VM implementation can be trivially checked against other emulators or actual Arduino hardware if you really want.
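AVR opcodes are 16 bits wide and similarly regular. As a sketch (not taken from any existing emulator), here is the AVR `ADD Rd,Rr` instruction, encoded as `0000 11rd dddd rrrr`; to keep it short it updates only the C, Z and N flags, whereas the real instruction also sets H, V and S:

```c
#include <stdint.h>

/* Bit positions of the C, Z and N flags in the AVR status register (SREG). */
enum { SREG_C = 0, SREG_Z = 1, SREG_N = 2 };

typedef struct { uint8_t r[32]; uint8_t sreg; uint16_t pc; } Avr;

/* Executes op if it is ADD Rd,Rr; returns -1 for any other opcode. */
int avr_exec_add(Avr *cpu, uint16_t op) {
    if ((op & 0xFC00u) != 0x0C00u) return -1;
    unsigned d  = (op >> 4) & 0x1Fu;                    /* d4 is bit 8, d3..d0 bits 7..4 */
    unsigned rr = ((op >> 5) & 0x10u) | (op & 0x0Fu);   /* r4 is bit 9, r3..r0 bits 3..0 */
    unsigned sum = (unsigned)cpu->r[d] + cpu->r[rr];
    uint8_t res = (uint8_t)sum;

    cpu->sreg &= (uint8_t)~((1u << SREG_C) | (1u << SREG_Z) | (1u << SREG_N));
    if (sum > 0xFFu)  cpu->sreg |= 1u << SREG_C;        /* carry out of bit 7 */
    if (res == 0)     cpu->sreg |= 1u << SREG_Z;
    if (res & 0x80u)  cpu->sreg |= 1u << SREG_N;        /* sign bit of result */

    cpu->r[d] = res;
    cpu->pc += 1;   /* the AVR program counter counts 16-bit words */
    return 0;
}
```

Checking a handful of such instructions against a real Arduino or an existing AVR simulator is exactly the kind of cross-validation mentioned above.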

I'd suggest going for something for which there are existing emulators. For example, the 68000 might be simple enough, and it still has plenty of support around because it was used in the Atari ST and Amiga 500. It also has 32-bit registers, so it's not too limited. It's also well documented: http://www.easy68k.com/paulrsm/doc/68kprm.pdf

Of course you will have the added fun of it being big endian, so there might be better options around.
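The endianness issue is mostly a matter of never reinterpreting guest memory as host integers directly. A sketch of helpers that assemble big-endian values byte by byte, which gives the same result on little- and big-endian hosts (bounds checking is omitted):

```c
#include <stdint.h>

/* Note: a real 68000 raises an address error for word or long accesses
   at odd addresses; this sketch does not model that. */
uint16_t read_be16(const uint8_t *mem, uint32_t addr) {
    return (uint16_t)((mem[addr] << 8) | mem[addr + 1]);
}

uint32_t read_be32(const uint8_t *mem, uint32_t addr) {
    return ((uint32_t)mem[addr]     << 24) |
           ((uint32_t)mem[addr + 1] << 16) |
           ((uint32_t)mem[addr + 2] << 8)  |
            (uint32_t)mem[addr + 3];
}

/* Most significant byte goes to the lowest address. */
void write_be32(uint8_t *mem, uint32_t addr, uint32_t v) {
    mem[addr]     = (uint8_t)(v >> 24);
    mem[addr + 1] = (uint8_t)(v >> 16);
    mem[addr + 2] = (uint8_t)(v >> 8);
    mem[addr + 3] = (uint8_t)v;
}
```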

The 68k is a total CISC beast. I wouldn't recommend trying to emulate it as something "simple."

If you want extreme simplicity, do the 6502. It only has a handful of registers and no more than 256 opcodes.
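With one-byte opcodes, the core of a 6502 interpreter can be a single 256-way switch. A minimal sketch implementing only five instructions; note that it treats BRK as "stop", whereas a real 6502 pushes the program counter and status register and jumps through the IRQ/BRK vector at $FFFE:

```c
#include <stdint.h>

typedef struct {
    uint8_t a, x, y, sp, p;   /* p is the processor status register */
    uint16_t pc;
    uint8_t mem[0x10000];     /* full 64 KB address space */
} Cpu6502;

#define FLAG_Z 0x02
#define FLAG_N 0x80

/* Update the zero and negative flags from a just-loaded value. */
static void set_zn(Cpu6502 *c, uint8_t v) {
    c->p &= (uint8_t)~(FLAG_Z | FLAG_N);
    if (v == 0) c->p |= FLAG_Z;
    c->p |= v & FLAG_N;
}

static uint8_t fetch(Cpu6502 *c) { return c->mem[c->pc++]; }

/* Runs until BRK; returns 0 on BRK, -1 on an opcode this sketch lacks. */
int cpu6502_run(Cpu6502 *c) {
    for (;;) {
        uint8_t op = fetch(c);
        switch (op) {
        case 0xA9: c->a = fetch(c); set_zn(c, c->a); break;   /* LDA #imm */
        case 0xA2: c->x = fetch(c); set_zn(c, c->x); break;   /* LDX #imm */
        case 0xE8: c->x++;          set_zn(c, c->x); break;   /* INX */
        case 0x85: c->mem[fetch(c)] = c->a;          break;   /* STA zero page */
        case 0x00: return 0;                                  /* BRK (halt here) */
        default:   return -1;
        }
    }
}
```

For example, the bytes `A9 42 85 10 A2 FF E8 00` loaded at $0200 leave $42 in A and at address $10, and X wraps from $FF to 0 with the Z flag set.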

Of course, if you don't mind using someone else's emulator core to create your VM, you could use just about anything. You could also create your own VM machine code, but that'd require you to write the assembly back end for any C compiler you used, which is fairly nontrivial.

MMIX is a nice instruction set, but I'd be careful about using it to learn how to build a VM. It's a very idealistic design, and while it's excellent stuff (hell, it's Knuth!) it won't teach you the ugly reality of dealing with actual hardware instruction sets etc.

I stick by ATMega because you can buy an Arduino and verify your VM's behavior. Trying to simulate a fast CPU is going to be extremely hard; trying to simulate a slow but effective CPU gives you room to eat the software emulation overhead without making the VM useless.

In other words, write a program that works nicely on an Arduino or similar ATMega chipset, and then pump it through the VM. If all goes well, your VM should be competitive in performance with the real hardware (which ain't gonna happen if you want to do modern PowerPC, and MMIX in particular has no reference hardware) and you can validate the VM's behavior for free.

0. Read about SPARC. It's a free architecture with full micro-architecture documentation!

1. As ApochPiq said, the Atmel AVR family is quite popular due to the Arduino platform.

2. The ARM architecture is the common industry standard among hardware manufacturers.

3. Also, to whet your appetite: PowerPC is used in the Curiosity rover.

MMIX is a nice instruction set, but I'd be careful about using it to learn how to build a VM. It's a very idealistic design, and while it's excellent stuff (hell, it's Knuth!) it won't teach you the ugly reality of dealing with actual hardware instruction sets etc.

Indeed, I incremented RatTrap's post for reading "The Art of Computer Programming" series!

Something else that no one has mentioned yet is LLVM (Low Level Virtual Machine). It is designed to be a platform for exactly the kind of thing you are thinking of: a platform on which higher-level virtual machines can be implemented. I would highly recommend that you check it out, as it is very cool. Plus, it is licensed under a very liberal open-source license, so if you have commercial aspirations, that shouldn't be a problem.

LLVM is an awesome instruction set, but it's not the easiest thing ever to actually build a VM for. It's designed more for translation down to machine code than for live execution. There are a lot of very "strange" instructions that don't correspond in any way to hardware CPU instructions, and you have to understand them very well in order to execute LLVM bitcode (phi nodes and GEPs come to mind as just a couple of examples).
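To illustrate why phi nodes need care: a phi picks its value based on which block control came from, and all the phis at the top of a block take effect at the same time. A minimal sketch over a hypothetical, hand-rolled IR representation (this is not LLVM's API or bitcode format), resolving the phis when the interpreter enters a block:

```c
#include <stdint.h>

#define MAX_INCOMING 4
#define MAX_PHIS     8

typedef struct {
    int dest;                      /* value slot written by this phi */
    int npreds;
    int pred_block[MAX_INCOMING];  /* id of each incoming predecessor block */
    int src[MAX_INCOMING];         /* value slot to use when coming from it */
} Phi;

typedef struct { int nphis; Phi phis[MAX_PHIS]; } Block;

/* Called when control transfers from block `prev` into `b`.
   Pass 1 reads every phi input; pass 2 writes the results, so phis that
   refer to each other all see the values from before the edge was taken. */
int enter_block(const Block *b, int prev, int64_t *vals) {
    int64_t tmp[MAX_PHIS];
    for (int i = 0; i < b->nphis; i++) {
        const Phi *p = &b->phis[i];
        int k = 0;
        while (k < p->npreds && p->pred_block[k] != prev) k++;
        if (k == p->npreds) return -1;   /* malformed: no entry for prev */
        tmp[i] = vals[p->src[k]];
    }
    for (int i = 0; i < b->nphis; i++)
        vals[b->phis[i].dest] = tmp[i];
    return 0;
}
```

The two-pass structure matters: on a loop back edge, phis that swap two values would otherwise read an input that has already been overwritten.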

Thanks for all your replies. Yes, both x86 and 68k are CISC monsters. The 6502 and Arduino are both fairly clean, but they have 16-bit addresses and 8-bit data words. Access to 32-bit data types would be ideal. But still, I could probably do a lot with a 64K address space, especially since the VM will call out to the host for things like loading images and rendering. If all the VM needs is a handle to an image or VBO that is already in the host's memory, then reading the keyboard/mouse, performing in-game logic, and deciding when and what to render could be handled fairly well in such a small VM. The VM won't need direct access to the framebuffer; it's just going to call out to the host and say 'draw sprite #123 at x,y' or 'draw 3D model #99 at x,y,z'. I'll look at some classic 8-bit designs. Programming it will be a bit like a GTX 560 strapped to a really fast Commodore 64.
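A minimal sketch of that "call out to the host with a handle" interface, assuming a hypothetical trap instruction that passes a call number to the interpreter, with arguments in the first few guest registers. Only small integers (handles and coordinates) cross the boundary; the images and VBOs themselves stay in host memory:

```c
#include <stdint.h>

/* Hypothetical host-call numbers the guest program is compiled against. */
enum { HOST_LOAD_IMAGE = 1, HOST_DRAW_SPRITE = 2 };

#define MAX_HANDLES 256   /* handle 0 is reserved to mean "invalid" */

typedef struct {
    int32_t regs[4];              /* guest registers used for args/results */
    uint8_t in_use[MAX_HANDLES];  /* which host-side objects exist */
    int draw_count;               /* lets the effect of a draw be observed */
} Vm;

/* Called by the interpreter when the guest executes its trap instruction. */
int host_dispatch(Vm *vm, int call) {
    switch (call) {
    case HOST_LOAD_IMAGE:
        /* A real host would load a texture here; this only allocates a handle. */
        for (int h = 1; h < MAX_HANDLES; h++) {
            if (!vm->in_use[h]) {
                vm->in_use[h] = 1;
                vm->regs[0] = h;      /* handle returned to the guest */
                return 0;
            }
        }
        vm->regs[0] = 0;
        return -1;
    case HOST_DRAW_SPRITE: {
        int32_t h = vm->regs[0];      /* regs[1] and regs[2] hold x and y */
        if (h <= 0 || h >= MAX_HANDLES || !vm->in_use[h])
            return -1;                /* guest passed a bad handle */
        /* A real host would issue OpenGL draw calls at (regs[1], regs[2]). */
        vm->draw_count++;
        return 0;
    }
    default:
        return -1;
    }
}
```

Validating handles on the host side means a buggy guest program can at worst fail a call, not corrupt host memory.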

I could retarget the backend of gcc, but I hear that work is horrendous.

Why not retarget something simple instead, like Small-C? Writing a backend for a portable-by-design compiler like that is likely to be a lot simpler than emulating even a fairly simple real-world chip.
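For a sense of how small a custom target can be, here is a hypothetical accumulator-plus-stack bytecode and its interpreter (the opcodes are invented for this sketch and are not Small-C's). A simple one-pass compiler back end can emit it almost directly: load the left operand, push it, load the right operand, then combine:

```c
#include <stdint.h>

/* Hypothetical opcodes; IMM, LOAD and STORE take one operand word. */
enum { OP_HALT, OP_IMM, OP_PUSH, OP_ADD, OP_MUL, OP_LOAD, OP_STORE };

typedef struct {
    int32_t acc;          /* accumulator ("primary register") */
    int32_t stack[256];   /* holds pending left-hand operands */
    int sp;
    int32_t mem[256];     /* word-addressed data memory */
} StackVm;

/* Wrapping 32-bit arithmetic without signed-overflow undefined behavior. */
static int32_t wrap_add(int32_t a, int32_t b) { return (int32_t)((uint32_t)a + (uint32_t)b); }
static int32_t wrap_mul(int32_t a, int32_t b) { return (int32_t)((uint32_t)a * (uint32_t)b); }

/* Runs until HALT; returns 0, or -1 on an unknown opcode.
   No bounds checks on pc, sp or memory addresses in this sketch. */
int vm_run(StackVm *vm, const int32_t *code) {
    int pc = 0;
    for (;;) {
        switch (code[pc++]) {
        case OP_IMM:   vm->acc = code[pc++];                              break;
        case OP_PUSH:  vm->stack[vm->sp++] = vm->acc;                     break;
        case OP_ADD:   vm->acc = wrap_add(vm->stack[--vm->sp], vm->acc);  break;
        case OP_MUL:   vm->acc = wrap_mul(vm->stack[--vm->sp], vm->acc);  break;
        case OP_LOAD:  vm->acc = vm->mem[code[pc++]];                     break;
        case OP_STORE: vm->mem[code[pc++]] = vm->acc;                     break;
        case OP_HALT:  return 0;
        default:       return -1;
        }
    }
}
```

For example, `x = 2 + 3 * 4` with `x` at address 0 could compile to `IMM 2, PUSH, IMM 3, PUSH, IMM 4, MUL, ADD, STORE 0, HALT`, leaving 14 in `mem[0]`.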

But I'm also a little unclear as to your goals here. Usually one writes a VM for one of two reasons: (a) to emulate a real-world piece of hardware for which no emulator exists, or (b) to provide a runtime for a new high-level programming language. Your use case doesn't seem to resemble either one of those - is this just a learning exercise?

Thanks, Small-C looks almost perfect! It seems like it's missing structs? A quick googling shows many variants. I'll probably learn more retargeting a simpler compiler to my own CPU than emulating something else. Thanks a lot!