2013-12-02

Emscripten is a C/C++ -> LLVM -> JavaScript compiler. It's useful
and interesting, but one of its biggest limitations concerns the
sync/async model: JavaScript is a single-threaded, event-driven language,
so you cannot call sleep() and expect events to be received and processed in the meantime.

Demo

First of all, please enjoy this demo, on which I really spent some time.

tl;dr

Motivation: to make `sleep` and other similar functions actually
work in our runtime, without a lot of manual work porting the C/C++ code

I actually made it work

`sync->async` is doable and I think it's not too hard - there are already implementations for JS

I think this is a feature that cannot be supported by merely an external library

Does it fit emscripten? Or is it possible to write a plugin for emscripten?

Demo explained(?)

The demo is basically a stop-time animation:
//draw something
//wait for some while
//draw something else
//wait...

I fed emscripten with the code and blablabla, and
the demo works. But wait! The code is supposed to be identical to the original
code, which is synchronous!
Well, please let me explain a few more things before I reveal the secrets.

Another Demo

Here's another demo, which is... the same as above. So what's the deal?

We may imagine that, to really 'go to sleep', we need to store the whole
context and restore it when we come back. Indeed, I did so in the source code: whenever you see an `ASYNC_` macro, it involves pushing and popping to maintain the async stack.

Well, I'm NOT going to propose a set of APIs or a library. Instead I'm proposing a way of pre-processing the code, which I did manually here.
It's doable and there are patterns: you can see from the comments how a for-loop is broken
down. I'll put the technical details at the end.

The porting experience may not be as smooth as it looks. Actually
`xmas` is rather straightforward, since it has few nested
for-loops or branches. But if you take a look at the other demos, it is a
nightmare to define callbacks and maintain the stack manually. Just
imagine that there are no `call` and `ret` ASM instructions, and you have to do
`push`, `pop` and `jump` manually.

My point is that the sync->async transformation can, and should, be done by the pre-processor/compiler.

The Secrets of the 1st Demo

The secret is streamlinejs: it's not a library, but a parser/compiler. I didn't go too deep into
its mechanism, but judging from the results it generated, the mechanism should
be similar to what I describe below. You may read this article for more details.

To build the first demo, all the placeholders are replaced with underscores,
which streamlinejs recognizes as placeholders for the callback.
Fortunately, the un-optimized JS generated by emscripten can be parsed
without any problem, at least for my demo.

Technical stuff

Imagine that there is a stack dedicated to async
function calls. It differs from a traditional stack in that it
is not cleared when a function exits.

Async function calls are different from (normal) sync function calls: an async
call pushes the context onto the async stack, including the callback
(which plays the role of the return address in the sync case), and then returns. The
central event dispatcher (the JS engine in our case) will call the
callback eventually.
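A minimal sketch of such an async stack, assuming a simulated event queue in place of the real JS event loop. Every name here (`asyncStack`, `eventQueue`, `asyncCall`, `asyncReturn`, `fakeDelay`, `runEvents`) is made up for illustration and is not an Emscripten or streamlinejs API.

```javascript
// All names below are hypothetical; this only illustrates the idea.
const asyncStack = []; // frames stay here after the pushing function has returned
const eventQueue = []; // stands in for the JS engine's event dispatcher

// Async call: push the context (callback + saved locals), then start the callee.
function asyncCall(fn, args, callback, locals) {
  asyncStack.push({ callback, locals });
  fn(...args);
}

// Async "ret": pop the top frame and jump to its callback.
function asyncReturn(value) {
  const frame = asyncStack.pop();
  frame.callback(frame.locals, value);
}

// A natively async function: it only schedules an event and returns at once.
function fakeDelay(ms) {
  eventQueue.push(() => asyncReturn(undefined));
}

// Simulated dispatcher: deliver pending events in order (the delay length is ignored).
function runEvents() {
  while (eventQueue.length > 0) {
    eventQueue.shift()();
  }
}

const log = [];
function drawTwice() {
  log.push("draw 1");
  asyncCall(fakeDelay, [100], (locals) => log.push("draw " + locals.next), { next: 2 });
  log.push("drawTwice returned"); // reached before the second draw
}

drawTwice();
runEvents();
console.log(log.join(", ")); // draw 1, drawTwice returned, draw 2
```

Note that `drawTwice` returns before the second draw happens. The saved frame is what lets the dispatcher pick up where the function left off.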

So the central idea is to identify all the async function calls, which usually arise in two ways:

Calling an async function

`jump` over an async call

The first one should be easy: some functions are
natively async, e.g. `SDL_Delay`. And if a function calls any other
async function, it is async too.
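The rule above amounts to a fixed-point computation over the call graph. Here is a small sketch of it, assuming the call graph is already available as a plain object. The function name `findAsyncFunctions` and the example program are hypothetical.

```javascript
// callGraph maps each function name to the names it calls (hypothetical input format).
function findAsyncFunctions(callGraph, nativeAsync) {
  const asyncSet = new Set(nativeAsync);
  let changed = true;
  // Repeat until no new function gets marked: a caller of an async function is async.
  while (changed) {
    changed = false;
    for (const [caller, callees] of Object.entries(callGraph)) {
      if (!asyncSet.has(caller) && callees.some((callee) => asyncSet.has(callee))) {
        asyncSet.add(caller);
        changed = true;
      }
    }
  }
  return asyncSet;
}

// Made-up example program: only SDL_Delay is natively async.
const exampleGraph = {
  main: ["init", "animate"],
  init: ["loadAssets"],
  loadAssets: [],
  animate: ["drawFrame", "SDL_Delay"],
  drawFrame: [],
};

const asyncFns = findAsyncFunctions(exampleGraph, ["SDL_Delay"]);
console.log([...asyncFns].sort().join(", ")); // SDL_Delay, animate, main
```

Functions such as `init` and `drawFrame` stay synchronous, so only part of the program needs to be transformed.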

The second one usually originates from loops and branches, which will be explained later.

I think that these can be identified by the compiler, in one of the following stages:

- Pre-processing C/C++: I did that manually myself
- LLVM bitcode: I'm not so sure about this one
- JavaScript: streamlinejs itself is an example

There are advantages and disadvantages to each stage. For example, it
might be easier to optimize the code when parsing the C code, while it
may be more lightweight to store the local variables using JavaScript
closures.

Branches (if, switch, etc.) are similar to loops, as long as we treat them as `goto`s.
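As a rough illustration of the `goto` view, the sketch below hand-transforms a three-iteration draw/wait loop into labelled blocks, where the async call records the label to resume at and then returns. The label names, `fakeDelay`, `pendingEvents` and `runPending` are made up for this example, and a real transformer may well pick a different layout.

```javascript
// Simulated event loop (hypothetical helpers, not a real API).
const pendingEvents = [];
function fakeDelay(ms, resume) { pendingEvents.push(resume); }
function runPending() { while (pendingEvents.length > 0) pendingEvents.shift()(); }

// Sync original (C-like):
//   for (i = 0; i < 3; i++) { draw(i); SDL_Delay(100); }
function animate(out, onDone) {
  const ctx = { i: 0, label: "LOOP_TEST" }; // saved locals + resume point

  function step() {
    for (;;) {
      switch (ctx.label) {
        case "LOOP_TEST": // loop condition becomes a conditional goto
          ctx.label = ctx.i < 3 ? "LOOP_BODY" : "EXIT";
          break;
        case "LOOP_BODY":
          out.push("draw " + ctx.i);
          ctx.label = "LOOP_INC"; // where to continue after the delay
          fakeDelay(100, step);   // async call: hand control back to the dispatcher
          return;
        case "LOOP_INC":
          ctx.i++;
          ctx.label = "LOOP_TEST"; // back edge of the loop is a plain goto
          break;
        case "EXIT":
          onDone();
          return;
      }
    }
  }
  step();
}

const drawn = [];
animate(drawn, () => drawn.push("done"));
runPending();
console.log(drawn.join(", ")); // draw 0, draw 1, draw 2, done
```

Only the block containing the async call has to return to the dispatcher. The other blocks run back to back inside the same `step` invocation.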

Local variables and return values

Local variables may be stored and retrieved when we push/pop the async stack,
and so may return values.

Compiler/Preprocessor Integration: Step 1

It should be clear now
that this feature is a kind of transformation, which cannot be supported
by linking to an external library. Of course the precondition is that
the transformation should be (almost) transparent: it should not be
necessary for developers to maintain the stack manually.

The
first step, I'd imagine, is that the async functions are explicitly
marked through some mechanism. In my example, a placeholder is used.

Developers may still write programs in the sync fashion, for two reasons:
the convenience of writing new programs, and the ease of porting existing
ones.

The compiler should detect, split and set up async
functions automatically. The async stack should be managed by the standard
library, while some APIs might be exposed.

There are two ways of managing the local variables. Let me call them the C style and the JavaScript style:

The C style:
Local variables of async functions are stored in a dedicated area of
memory (the HEAP, or a special stack for async functions) instead of the
normal stack. To avoid lots of memcpy's, the variables may be directly
allocated there. Some push/pop operations may be optimized away if the
caller/callee is known (e.g. loops/branches).
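A sketch of what the C style could look like in the generated JavaScript, assuming a typed array stands in for linear memory. The names `heap32`, `asyncTop`, `functionTable`, the frame layout and the helper functions are all hypothetical and do not reflect Emscripten's actual memory layout.

```javascript
// Hypothetical stand-ins: heap32 for linear memory, asyncTop for the async stack pointer.
const heap32 = new Int32Array(256);
let asyncTop = 0;
const functionTable = []; // callbacks are stored in frames as table indices
const timers = [];        // simulated event loop

function registerFunction(fn) { functionTable.push(fn); return functionTable.length - 1; }
function allocFrame(slots) { const base = asyncTop; asyncTop += slots; return base; }
function freeFrame(slots) { asyncTop -= slots; }
function fakeDelay(ms, resumeIndex, frame) { timers.push(() => functionTable[resumeIndex](frame)); }
function runTimers() { while (timers.length > 0) timers.shift()(); }

// Frame layout for sumSlowly: the locals live in the frame, never on the normal stack.
const CALLBACK = 0, I = 1, SUM = 2, N = 3, FRAME_SLOTS = 4;

// Sync original (C-like):
//   int sum = 0; for (int i = 1; i <= n; i++) { sum += i; SDL_Delay(10); } return sum;
function sumSlowly(n, callbackIndex) {
  const frame = allocFrame(FRAME_SLOTS);
  heap32[frame + CALLBACK] = callbackIndex;
  heap32[frame + I] = 1;
  heap32[frame + SUM] = 0;
  heap32[frame + N] = n;
  sumSlowlyResume(frame);
}

function sumSlowlyResume(frame) {
  if (heap32[frame + I] <= heap32[frame + N]) {
    heap32[frame + SUM] += heap32[frame + I];
    heap32[frame + I] += 1;
    fakeDelay(10, SUM_SLOWLY_RESUME, frame); // locals are already in place, no copying
    return;
  }
  const result = heap32[frame + SUM];
  const callbackIndex = heap32[frame + CALLBACK];
  freeFrame(FRAME_SLOTS);                    // pop the frame before "returning"
  functionTable[callbackIndex](result);
}
const SUM_SLOWLY_RESUME = registerFunction(sumSlowlyResume);

let total = -1;
const onTotal = registerFunction((value) => { total = value; });
sumSlowly(4, onTotal);
runTimers();
console.log(total, asyncTop); // 10 0
```

Because the loop re-enters the same function, the frame is allocated once and reused across iterations, which is the kind of push/pop saving mentioned above.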

The JavaScript style:
streamlinejs is a good example. Async functions are broken into a
series of recursive functions, and local variables are stored in
closures.
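A hand-written sketch in this spirit (it is not actual streamlinejs output): the loop becomes an inner function that calls itself from the callback, and the loop counter simply lives in the closure. `fakeDelay`, `callbackQueue` and `runQueue` are hypothetical helpers that simulate the event loop.

```javascript
// Simulated event loop (hypothetical helpers).
const callbackQueue = [];
function fakeDelay(ms, callback) { callbackQueue.push(callback); }
function runQueue() { while (callbackQueue.length > 0) callbackQueue.shift()(); }

// Sync original (C-like):
//   for (i = 0; i < frames; i++) { draw(i); SDL_Delay(100); } return i;
function animate(frames, out, callback) {
  let i = 0; // local variable kept alive by the closure, no explicit stack needed

  function loop() {
    if (i >= frames) {
      callback(i); // "return" becomes a call to the hidden callback parameter
      return;
    }
    out.push("draw " + i);
    fakeDelay(100, function afterDelay() {
      i++;
      loop(); // next iteration re-enters the loop function
    });
  }
  loop();
}

const shown = [];
let returned = null;
animate(3, shown, (value) => { returned = value; });
runQueue();
console.log(shown.join(", "), returned); // draw 0, draw 1, draw 2 3
```

The code stays short and readable, but every iteration allocates a new callback function, which is the sort of hidden cost this style can carry.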

The JavaScript style is easy and intuitive,
but the hidden overhead might not be negligible. It may be too late to
optimize once the LLVM bitcode has been transformed into JavaScript.

Compiler/Preprocessor Integration: Step 2

It might be possible to
further reduce the work of writing/porting, as even marking async
functions and defining the placeholders for every async function declaration and every async function call is boring and error-prone.

My (naive & wild) imagination is that, given a few essential async functions
(such as SDL_Delay), the compiler would
automatically recognize the other async functions and set up the hidden
parameter. It's not perfect, especially when we need to link a number of
libraries, but at least I think a C/C++ transformer would be possible and nice, perhaps based on LLVM?

Limitations

It might not work for multi-threading. Indeed, I've only been thinking about
single-threaded programs, especially terminal programs. But I don't think this
diminishes the importance of the issue.

Lots of overhead might be introduced this way, but I guess performance should not be affected much if the result is well optimized.

C++: constructors, copies and destructors of objects might be a problem, or maybe not, since they can be `flattened` into C style?

1. A few JS libraries for plotting
Ext JS, with rather complicated data models, but worth it. The figures it produces look good.
flot, a smooth coding experience with jQuery, but lacking in features. E.g. by default the only available symbol is a circle. There are plugins for extra symbols, but only in the legend... Also, I have to specify all the ticks...