I had some fun a week ago: I wanted to simulate some more elaborate loop patterns on a Mill-like machine, with multiple ops per instruction, belt-like data flow, NaRs, deferred loads, etc., so I wrote a small simulator.

It is very simplified (it took maybe 2 hours to write and get running on a first simple for loop), but it works. It has a single bundle stream, doesn’t simulate the full memory hierarchy (or L1 cache misses, or fetch / misprediction penalties), implements only about a dozen essential opcodes, and is not super fast. But I do have a few sorting algorithms, a few computational kernels, and some micro-benchmarks implemented by hand in assembly, and it is nice to play with.

It doesn’t read binary files or compile anything; everything is written by hand in “conAsm”. Still, it is very educational and can be used to estimate performance, IPC, and code density, and to play with call stacks, deferred loads, NaRs, belt addressing, and the scratchpad from a programmer’s/compiler’s point of view.
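
For readers who have not seen belt addressing or NaRs before, here is a tiny Python sketch of the two ideas. It is not the simulator's code, just an illustration: the names `Belt`, `NAR`, `add` and `div` are made up for this example, the belt length is arbitrary, and I am assuming that an integer division by zero yields a NaR which then propagates through later ops instead of trapping.

```python
from collections import deque


class NaR:
    """Hypothetical stand-in for a 'Not a Result' marker value."""

    def __repr__(self):
        return "NaR"


NAR = NaR()


class Belt:
    """Fixed-length belt: position 0 (b0) is always the newest value."""

    def __init__(self, size=8):
        # With maxlen set, appendleft on a full deque discards the oldest item.
        self.slots = deque(maxlen=size)

    def drop(self, value):
        # A new result is dropped at the front; older values shift back by one.
        self.slots.appendleft(value)

    def get(self, pos):
        # Operands are named by how long ago they were dropped, not by register.
        return self.slots[pos]


def add(belt, a, b):
    """Add belt positions a and b; a NaR operand gives a NaR result."""
    x, y = belt.get(a), belt.get(b)
    belt.drop(NAR if x is NAR or y is NAR else x + y)


def div(belt, a, b):
    """Integer divide; assumed here to yield NaR on division by zero."""
    x, y = belt.get(a), belt.get(b)
    if x is NAR or y is NAR or y == 0:
        belt.drop(NAR)
    else:
        belt.drop(x // y)
```

For example, after dropping 6 and then 3 on a fresh belt, `add(belt, 0, 1)` drops 9 at b0 and the older values move to b1 and b2. Dividing by a zero operand drops a NaR, and any later `add` that reads it drops another NaR.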

Is anybody interested in using it? I can publish it on GitHub (it is about 200 lines of nice, hacker-friendly code), or improve it if somebody has a specific need for a simulator.