23 Oct 2012

Ok, let me just say that I went from "Saulus to Paulus" (as we say in Germany) in the past few days. In my ongoing stealth mission to evaluate all the C++-to-Web technologies currently available (Google Native Client, Adobe's Flash C/C++ compiler, and Mozilla's emscripten) I actually wanted to pick Adobe's solution next, since I didn't really believe that emscripten's approach of compiling C++ to Javascript could possibly work. I had a fixed idea in my mind, how fast a C++-to-Bytecode VM solution would be (that's what Adobe is doing), and how fast Javascript could possibly be, and Javascript would lose by a long shot. No way I thought it possible to run really math heavy code in a language with such a shitty type system (excuse my french).

There are numbers flying around like 25% to 50% of native performance (even up to 80% for Adobe's solution) which I thought to be extremely optimistic even for hand-picked benchmarks. For instance if you look at Epic's famous Unreal Flash demo, there's not a lot of dynamic stuff happening in the 3D world you're moving through. Sure it looks impressive, but it mainly demonstrates a good art style and how fast your GPU is, but doesn't say much about how efficient the "CPU code" is running in the Adobe VM.

Then I started to look closer at emscripten, spent a few days with porting, and imagine my surprise when I first started the optimized version of this:

...and I added dragons and more dragons, and even more dragons, until the frame rate finally started to drop. Of course it's not Native Client performance, but it is much (much!) better then I expected.

Let me explain what you're seeing:

The demo is built from a set of Nebula3 modules consisting of about 120k lines of C++ code cross-compiled to Javascript+WebGL through the emscripten compiler infrastructure. There is a lot (really a LOT) of vector floating point math C++ code running in the animation engine because I must admit that I actually wanted to "break" the JS engine and show how incredibly much faster NaCl would be. Well, that didn't quite work out ;)

Of those 120k lines of code, only a few hundred lines are actually specific to the emscripten platform. So there's less then 0.5% of platform specific code, and about 99.5% of the code is exactly the same as in the NaCl demo, or an actual "native" (OpenGL-based) desktop version of the demo. If you take all of Nebula3 (about half a million lines of C++ code), then the ratio of platform specific code is even more impressive.

Let this sink in for a while:

You can take a really big C++ code base with dozens of man years of engineering effort and a mature asset pipeline attached to it, spend about 2 weekends(!) of tinkering, and can run your code at a really good performance level in a browser without plugins! You still have to be realistic about CPU performance of course. It helps if a game is designed to make relatively little use of the CPU, and move as much work as possible onto the GPU, but these are the normal realities of game development. Of all the target platforms for a project you should choose the weakest as the "lead platform", make the game run well there, and use the extra power of the other platforms for non-essential, but pretty "bells'n'whistles".

And you don't have to burn bridges behind you: You can still use the exact same code base and asset pipeline and create traditional native applications for mobile platforms, desktop apps, or game consoles. And all of this in a programming language and a graphics API which many considered almost on its death bed a few years ago.

I'd say that C++ and OpenGL are on a goddamn come-back tour right now :)

Not all is golden though, each of the C++-to-web solutions has at least one serious weakness:

Native Client: extremely fast and feature rich, but only supported by Google

emscripten (more specifically WebGL): not supported by Microsoft

Adobe flascc: Adobe wants a "speed tax" if you're using certain "premium features" required for high-performance 3D games and earn money with it

Sucks badly, since most of this is purely politics-driven, not for technological reasons.

So... originally I wanted this blog post to be a technical post-mortem of the emscripten port, but on one hand it's already quite long, and on the other hand: there's really not much to write about, since it went so smooth.

I installed the SDK and a few required tools to my MacBook, wrote (yet another) simple CMake toolchain file, and was able to compile most of Nebula3 out of the box within an hour. The most exciting event was that I found a minor bug in the OpenGL wrapper code, which was fixed within the day by the emscripten team (kudos and many thanks to kripken (aka azakai) for being so incredibly fast and helpful).

The only area where I had to spend a bit more time was on threading (or rather the lack thereof). Since emscripten (like NaCl) is running in the browser context, it suffers from many of the same limitations - like not being able to do synchronous IO, and you cannot "own the game loop" but need to run in little per-frame time-slices inside callbacks from the browser. NaCl offers a working pthreads API to workaround these limitations, but emscripten cannot support true threading since the underlying Javascript runtime doesn't allow sharing state between threads. This looked like a real show stopper to me in the beginning, but after a few nights of sleeping over this problem I found a really simple solution by moving to a higher level into Nebula3's asynchronous messaging system. Up there it was relatively easy to replace the multithreaded message handler code with a frame-callback system with minimal code changes.

It sucks a bit right now that everything runs on the main thread (so the current N3 demo cannot take advantage of multiple CPU cores), but a solution for a higher level multithreaded API based on HTML5 webworkers is in the works right now (I really can't believe how fast these guys are!).

I'm tempted to write a lot more of how incredibly clever the C++-to-JS cross-compilation is, and how the generated JS code can be even faster then handwritten code, and how surprisingly compact the generated code is, but if you're getting all exited about this type of stuff it's better if you're reading it first hand:

So where next? I'll polish the emscripten port a bit more and implement support for the new web worker API to load and uncompress asset files in the background, and then take a little detour and port the Dragons demo to Adobe's flascc (perfect timing since they just went into open beta). After that I need to do some cleanup work on all three ports, and on the higher level parts of the render pipeline since the low level rendering code has been replaced with the new "Twiggy" stuff.