{ throw new NoFunnyProverbFoundException(); }

A first look at WebAssembly performance

WebAssembly promises to let us run high-performance code in the browser in a standardized way. Now that a few WebAssembly previews are available, I decided it was time to take a look at their performance. One source for benchmarks is the well-known Computer Language Benchmarks Game, and I decided to pick nbody (it has been almost four years since I last did so…).

After playing a bit with the results I decided to put the code on GitHub. I’m looking forward to your corrections, improvements and feedback. I’m already excited to see what the results will look like in a few months…

The following versions were compared:

webAssembly: A WebAssembly version compiled from the original C version, because this turned out to be faster than the other version I checked.

object: The fastest JavaScript version from the Computer Language Benchmarks Game. It uses a JavaScript object for each body to store the data.

arrayPerObject: Each body’s data is stored in a plain JavaScript array.

floatArrayPerObject: Each body’s data is stored in a typed array.

oneTypedArray: All bodies’ data is stored in a single typed array and the advance function is programmatically unrolled (quite crazy, isn’t it?).

To get a baseline, the fastest Java version and the original C version were added.
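To make the difference between the four JavaScript variants concrete, here is a minimal sketch of the data layouts. The field names, constants and helper functions are assumptions made for illustration; the actual benchmark code may organize things differently, and the unrolled advance function is not shown.

```javascript
// Hypothetical field names; the benchmark's real code may differ.

// object: one JavaScript object per body
const bodyAsObject = { x: 0, y: 0, z: 0, vx: 0, vy: 0, vz: 0, mass: 1 };

// arrayPerObject: one plain array per body, fields addressed by index
const X = 0, Y = 1, Z = 2, VX = 3, VY = 4, VZ = 5, MASS = 6;
const FIELDS = 7;
const bodyAsArray = [0, 0, 0, 0, 0, 0, 1];

// floatArrayPerObject: one Float64Array per body
const bodyAsTypedArray = new Float64Array([0, 0, 0, 0, 0, 0, 1]);

// oneTypedArray: all bodies packed into a single Float64Array
const BODY_COUNT = 5; // nbody simulates the Sun plus four planets
const allBodies = new Float64Array(BODY_COUNT * FIELDS);

// Read or write one field of body i in the packed layout.
function getField(bodies, i, field) {
  return bodies[i * FIELDS + field];
}

function setField(bodies, i, field, value) {
  bodies[i * FIELDS + field] = value;
}

setField(allBodies, 2, MASS, 3.5);
console.log(getField(allBodies, 2, MASS)); // 3.5
```

The packed layout trades readability for a single contiguous block of doubles, which is presumably why it is combined with generated, unrolled code.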

Firefox does pretty well. The WebAssembly implementation is the fastest browser version and close to the Java baseline, but the pure JavaScript implementation isn’t far behind. It seems JavaScript VMs are already pretty good at simple numeric code.
In the other browsers WebAssembly couldn’t beat the JavaScript versions yet. And Safari has a completely different idea of which JavaScript version it can optimize best.

The C version was compiled with gcc -O3 nbody.c -o nbody (where gcc is actually Apple LLVM version 8.0.0 (clang-800.0.42.1)).
This version took 4.4 seconds on my machine and was faster than the fastest C version from the shootout, compiled with gcc -O3 -fomit-frame-pointer -march=native -mfpmath=sse -msse3 nbody_fastest.c -o nbody_fastest, which took 4.9 seconds on my machine.

Infrastructure:

All tests were performed on a 2015 MacBook Pro, 2.5 GHz Intel Core i7, 16 GB 1600 MHz DDR3. For all tests the best of three runs was selected for the result.

I collected three runs for each browser and version and took the best run. Timings vary from run to run, of course, but I believe (without statistical proof) that the best run rounded to one decimal place is acceptable.
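The best-of-three procedure could be sketched roughly as follows. This is not the harness used for the measurements; the function names and the placeholder workload are assumptions for illustration only.

```javascript
// Hypothetical harness, not the one used for the published numbers.

// Millisecond clock: performance.now() where available, Date.now() otherwise.
function now() {
  return typeof performance !== 'undefined' ? performance.now() : Date.now();
}

// Run fn `runs` times and return the fastest wall-clock time in seconds.
function bestOf(runs, fn) {
  let best = Infinity;
  for (let i = 0; i < runs; i++) {
    const start = now();
    fn();
    const seconds = (now() - start) / 1000;
    if (seconds < best) best = seconds;
  }
  return best;
}

// Round to one decimal place, as in the reported results.
function roundToOneDecimal(seconds) {
  return Math.round(seconds * 10) / 10;
}

// Placeholder workload standing in for one nbody run.
function workload() {
  let sum = 0;
  for (let i = 0; i < 1e6; i++) sum += Math.sqrt(i);
  return sum;
}

console.log(roundToOneDecimal(bestOf(3, workload)));
```

Taking the minimum filters out runs disturbed by other activity on the machine, at the cost of hiding the run-to-run variation.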

I would strongly recommend doing considerably more than three runs and reporting the average and standard deviation for each version. Taking only the best result is not really representative of real-world performance.

If possible, I recommend not using pre-release browser engines for performance benchmarks. I don’t know specifically about WebAsm or about Firefox, but Safari Tech Preview and Chrome Canary are generally not as fast as their production-release counterparts.

I second Fancher’s suggestion. Additionally, you want to detail how many iterations per run, which helps to isolate the cache performance properties inherent within a benchmark. Many benchmark platforms use thousands of iterations per run to do this.

WAVM uses LLVM to generate code, so it’s not too surprising that it gets close to native performance. There are still some inefficiencies in how WAVM maps WebAssembly into LLVM IR, but overall it will generate code closer to an offline compiler than a browser JIT. The remaining difference should just be sandboxing overhead. Hopefully browsers can eventually match the performance of the code generated by WAVM without the expensive LLVM codegen!

– I agree, 3 runs is not enough.
– What about the compile time and the size? I quote: “WebAssembly or wasm is a new portable, size- and load-time-efficient format suitable for compilation to the web.”…. those are the reasons why wasm exists. Can you provide your experience on that end?

For your Firefox example that would be 0.168,85
(from Firefox 5.860,16 6.141,81 6.088,67)

If you want to know the number of runs to do, calculate that unbiased standard deviation and increase the number of runs until interesting values differ by at least 2x the largest unbiased standard deviation of the values.
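One possible reading of this rule, sketched in code: compute the unbiased (n − 1) standard deviation for each version and treat two versions as distinguishable only when their means differ by at least twice the larger deviation. The function names and the sample timings below are made up for illustration and are not measurements from the post.

```javascript
// Arithmetic mean of a list of timings.
function mean(values) {
  return values.reduce((acc, v) => acc + v, 0) / values.length;
}

// Unbiased sample standard deviation (divides by n - 1).
function sampleStdDev(values) {
  const m = mean(values);
  const sumSq = values.reduce((acc, v) => acc + (v - m) ** 2, 0);
  return Math.sqrt(sumSq / (values.length - 1));
}

// True when the means differ by at least 2x the larger standard deviation.
function clearlyDifferent(a, b) {
  const gap = Math.abs(mean(a) - mean(b));
  const largest = Math.max(sampleStdDev(a), sampleStdDev(b));
  return gap >= 2 * largest;
}

// Hypothetical timings in seconds, not taken from the benchmark.
const versionA = [4.0, 5.0, 6.0];
const versionB = [8.0, 9.0, 10.0];

console.log(sampleStdDev(versionA)); // 1
console.log(clearlyDifferent(versionA, versionB)); // true
```

If clearlyDifferent returns false for two versions you care about, the advice above amounts to collecting more runs and checking again.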