Oh that's interesting, the VM is written in Smalltalk then transpiled to C?
Hmm, I really wish it were open sourced, because then I could recompile it with a different compiler and see if that makes it any better.

Yes, that is still planned. Originally, the idea was to wait until GP had gotten to a 1.0 stable release. But with the loss of funding, progress on GP has become pretty slow.

If you're interested in virtual machines for blocks languages, I suggest that you look at a related project called "MicroBlocks" (http://microblocks.fun). The source code for that virtual machine is available, and it's simpler than GP's. It also includes a "compiler" (written in GP!) that translates blocks into interpreted instructions. (Such instructions are sometimes called "bytecodes". The ones in MicroBlocks are actually 32 bits long, but the idea is the same.)

Oh that's interesting, the VM is written in Smalltalk then transpiled to C?

No, it's directly written in C. But funny you should ask, because I was one of the original creators of Squeak, an open source Smalltalk-80 system. I wrote the Smalltalk-to-C translator that allowed the virtual machine to be written (and debugged!) in Smalltalk, then translated into C. The result was an extremely portable system that had decent performance for an interpreter. (Eliot Miranda later built a "just in time" compiler for Squeak that doubled or tripled performance.)

Anyhow, for anyone curious about virtual machines, the MicroBlocks virtual machine is a really great starting point. The interpreter itself is under 1000 lines of code and the entire virtual machine is under 5000 lines of code.

I really wish it were open sourced, because then I could recompile it with a different compiler and see if that makes it any better

It's currently compiled with the LLVM C compiler (Clang) with -O3 optimization (the highest standard setting). Interestingly, using the GCC compiler does give a bit better performance -- about 10%, I believe. I don't think you'll get much more than a 10% improvement regardless of the compiler or optimization settings.

The reason that Java, JavaScript, WebAssembly, and other modern "interpreted" languages have such good performance is that they typically compile into native code dynamically, often using the actual usage patterns of the program to guide optimization. It's amazing technology, but it takes a big investment. For example, the original V8 JavaScript engine was created by a team of four to six people working for several years, and I suspect the teams currently working on JavaScript at Google, Mozilla, Apple, and Microsoft are much larger than that. Also, because such systems generate low-level machine instructions, targeting a new architecture (e.g. switching from Intel to ARM) is a major undertaking.

GP's interpreter is simple enough for a single person to implement and maintain, is portable across a wide range of systems, and gives decent performance for many applications. Since it's optimized for blocks, it's a great next language for people who started with Scratch. However, GP is definitely not ideal for heavy number crunching. For that, you might want to try Python with the NumPy package or a compiled language like C or Go.

What about using the -Ofast flag during compilation? It seems as if it could be noticeably more helpful due to its fast-math support (which isn't enabled by -O3).
It's always worth trying, and I think the results would be very interesting...

I tried -Ofast but, unfortunately, it gives the same performance as -O3.

BTW, a quick performance check is the "tinyBenchmarks" function. You can turn it into a block by opening a workspace (in dev mode), typing it in, selecting it, and using the "blockify" menu command (ctrl-B):