Still couldn䂟t figure out Unicode.

Mat and I were scanning through GitHub one day when a pretty lengthy, complex piece of code caught our eye (Caution: Do not read if you’re prone to seizures or have a heart condition). This code is one of the many intricacies of Mono’s bytecode interpreter, and it was beautiful, at least to us. Why must it be so complex? How hard can it be? After a lengthy discussion, we decided the best thing to do was to hold a competition to see which one of us could write the fastest VM bytecode interpreter in two hours. A few insults later (“Your mother’s filesystem is so fat etc.”), we set a few ground rules and agreed on a benchmark.

I came up with a pretty simple piece of code that contains addition, multiplication, and conditional branches. I then generated a generic bytecode equivalent of the benchmark for use in our interpreters.

As a control, the C benchmark ran natively in an average time of 0.233 secs. For my first attempt, I wrote a simple C program that reads each instruction and jumps to a corresponding block of code.

This works by creating an array of goto pointers (this is gcc’s “labels as values” extension), a pseudo-stack, and a list of registers, then jumping to the handler for each instruction while advancing the instruction pointer. This simple virtual machine executed the bytecode in 6.421 secs, which was way too slow for my taste, so I had to figure out another approach.
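For the curious, a minimal sketch of this dispatch style might look like the following. It is not the code from the competition: the opcode set, the `vm_run` name, and `demo_program` are all made up for illustration, and it relies on the `&&label` / `goto *ptr` extension supported by gcc and clang.

```c
#include <assert.h>

/* Hypothetical opcodes for illustration; not the bytecode used in the post. */
enum { OP_PUSH, OP_LOAD, OP_STORE, OP_ADD, OP_MUL, OP_JNZ, OP_HALT };

int vm_run(const int *code)
{
    /* One label address per opcode, indexed by opcode number. */
    static void *optable[] = {
        &&op_push, &&op_load, &&op_store, &&op_add, &&op_mul, &&op_jnz, &&op_halt
    };
    int stack[64];          /* pseudo-stack */
    int regs[4] = {0};      /* registers */
    int sp = 0;
    const int *ip = code;   /* instruction pointer */

/* Fetch the next opcode and jump straight to its handler. */
#define DISPATCH() goto *optable[*ip++]

    DISPATCH();

op_push:                    /* push an immediate operand */
    assert(sp < 64);
    stack[sp++] = *ip++;
    DISPATCH();
op_load:                    /* push the value of a register */
    assert(sp < 64);
    stack[sp++] = regs[*ip++];
    DISPATCH();
op_store:                   /* pop into a register */
    regs[*ip++] = stack[--sp];
    DISPATCH();
op_add:
    sp--;
    stack[sp - 1] += stack[sp];
    DISPATCH();
op_mul:
    sp--;
    stack[sp - 1] *= stack[sp];
    DISPATCH();
op_jnz: {                   /* pop; branch to the target if non-zero */
    int target = *ip++;
    if (stack[--sp] != 0)
        ip = code + target;
    DISPATCH();
}
op_halt:
    return stack[sp - 1];
#undef DISPATCH
}

/* Made-up program: r0 counts down from 5, r1 accumulates r0 * 2. */
static const int demo_program[] = {
    OP_PUSH, 5, OP_STORE, 0,            /* r0 = 5 */
    OP_PUSH, 0, OP_STORE, 1,            /* r1 = 0 */
    /* offset 8: loop body */
    OP_LOAD, 1, OP_LOAD, 0, OP_PUSH, 2, OP_MUL, OP_ADD, OP_STORE, 1,
    OP_LOAD, 0, OP_PUSH, -1, OP_ADD, OP_STORE, 0,
    OP_LOAD, 0, OP_JNZ, 8,              /* loop while r0 != 0 */
    OP_LOAD, 1, OP_HALT
};
```

Calling `vm_run(demo_program)` walks the loop five times and returns 30. Every handler ends in its own indirect jump, so there is no central `switch` to bounce back through.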

Why don’t I just compile the bytecode into x86 machine code, like modern JIT VMs? That could easily be my ticket to victory. I had about an hour left in the competition, so I made haste. I began replacing the optable full of goto addresses with x86 instructions, then allocated some executable memory, copied the instructions into it, and jumped to it.
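The allocate, copy, and jump steps can be sketched roughly as below. This is an assumption-laden illustration rather than the competition code: it assumes a POSIX system with `mmap`/`mprotect` and an x86 or x86-64 target, the `jit_return_constant` name is hypothetical, and the "compiled program" is just `mov eax, imm32; ret` instead of a translated benchmark.

```c
#define _DEFAULT_SOURCE     /* for MAP_ANONYMOUS on glibc */
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* Emits a tiny function that returns `value`, runs it, and stores the
   result in *out. Returns 0 on success, or -1 if this is not an x86
   build or the memory mapping fails. */
int jit_return_constant(int32_t value, int32_t *out)
{
#if defined(__x86_64__) || defined(__i386__)
    /* B8 imm32 = mov eax, imm32 ; C3 = ret */
    unsigned char code[6] = { 0xB8, 0, 0, 0, 0, 0xC3 };
    memcpy(&code[1], &value, sizeof value);   /* x86 is little-endian */

    /* Allocate a fresh page we can write to. */
    void *mem = mmap(NULL, sizeof code, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    /* Copy the generated instructions into it. */
    memcpy(mem, code, sizeof code);

    /* Flip the page from writable to executable before jumping to it. */
    if (mprotect(mem, sizeof code, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, sizeof code);
        return -1;
    }

    /* Treat the page as a function and call it. */
    int32_t (*fn)(void) = (int32_t (*)(void))mem;
    *out = fn();

    munmap(mem, sizeof code);
    return 0;
#else
    (void)value;
    (void)out;
    return -1;
#endif
}
```

A real bytecode-to-x86 translator would append one such byte sequence per opcode and patch in the branch offsets, but the memory handling stays the same.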

At runtime this created a small, 122-byte x86 program based on the benchmark bytecode, which clocked in at an average time of 0.518 secs. This was only around twice as slow as the control, so I was fairly confident at this point.

I slickly inquired into what Mat was working on, and he informed me he was writing his bytecode interpreter in Visual Basic.NET. I was a bit skeptical at first, considering he did not know Visual Basic, but was reassured he wasn’t joking. Evidently he taught himself Visual Basic in the span of 2 hours, in what amounts to the ultimate coding troll. He’s not one to lose these competitions, so I assumed he had some trick up his sleeve. He submitted his code for approval:

I wouldn’t have believed it if I hadn’t seen it myself. My code generates native Assembly… Assembly! And his is written in Visual Basic. I’m sure there is some trickery going on, like Mono optimizing the emitted instructions, but I haven’t yet ruled out witchcraft. I was forced to conclude that Visual Basic is faster than Assembly, that I’m a horrible coder, and that Mat wins.

| Program | Benchmark Time |
| --- | --- |
| Control | 0.233 secs |
| Ben’s C/x86 ASM VM | 0.518 secs |
| Mat’s Visual Basic VM | 0.127 secs |

UPDATE: It’s been mentioned that I didn’t compile the control with optimization on. I turned optimization off because gcc is way too damn smart: it almost literally translated the code into ‘printf(“857419840\n”);’. I think a fairer comparison would not hand gcc the answer at compile time, since none of the VMs had that opportunity until they read the instructions at run time. The VMs did not, and could not, know ahead of time the loop amount, or even the general flow of the bytecode for that matter. So by saving the loop amount in a variable declared as volatile, you prevent gcc from optimizing it out:
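A minimal sketch of that idea follows. The loop body here is a hypothetical stand-in with addition, multiplication, and a branch, not the actual benchmark, and the `volatile_bound_sum` name is made up for illustration.

```c
/* Hypothetical stand-in for the benchmark loop. Because `limit` is
   volatile, the compiler must reload it on every comparison and cannot
   assume its value, so it cannot fold the whole loop into a constant. */
unsigned int volatile_bound_sum(unsigned int n)
{
    volatile unsigned int limit = n;   /* loop amount hidden from the optimizer */
    unsigned int acc = 0;

    for (unsigned int i = 0; i < limit; i++) {
        if (i & 1u)
            acc += i * 3u;             /* odd iterations: multiply, then add */
        else
            acc += i;                  /* even iterations: plain add */
    }
    return acc;
}
```

For example, `volatile_bound_sum(10)` returns 95 (odd values contribute 75, even values 20), and gcc still has to emit the actual loop even at `-O2`.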
