One of the main advantages of debugging cross-platform Emscripten code is that the same cross-platform source code can be debugged either on the native platform or in the web browser, using the browser's increasingly powerful toolset (including the debugger, profiler, and other tools).

Emscripten provides a lot of functionality and tools to aid debugging:

Compiler debug information flags that allow you to preserve debug information in compiled code and even create source maps so that you can step through the native C++ source when debugging in the browser.

By default, emcc strips out most of the debug information from optimized builds. Optimization levels -O1 and above remove LLVM debug information, and also disable runtime ASSERTIONS checks. From optimization level -O2 the code is minified by the Closure Compiler and becomes virtually unreadable.

The emcc -g flag can be used to preserve debug information in the compiled output. By default, this option preserves whitespace, function names and variable names.

The flag can also be specified with one of five levels: -g0, -g1, -g2, -g3, and -g4. Each level builds on the last to provide progressively more debug information in the compiled output. The -g3 flag provides the same level of debug information as the -g flag.

The -g4 option provides the most debug information: it generates source maps that allow you to view and debug the C/C++ source code in your browser's debugger on Firefox, Chrome or Safari.

Note

Some optimizations may be disabled when used in conjunction with the debug flags. For example, if you compile with -O3 -g4, some of the normal -O3 optimizations will be disabled in order to provide the requested debugging information.

Emscripten has a number of compiler settings that can be useful for debugging. These are set using the emcc -s option, and will override any optimization flags. For example:

./emcc -O1 -s ASSERTIONS=1 tests/hello_world

The most important settings are:

ASSERTIONS=1 is used to enable runtime checks for common memory allocation errors (e.g. writing more memory than was allocated). It also defines how Emscripten should handle errors in program flow. The value can be set to ASSERTIONS=2 in order to run additional tests.

ASSERTIONS=1 is enabled by default. Assertions are turned off for optimized code (-O1 and above).

Passing the STACK_OVERFLOW_CHECK=1 linker flag adds a runtime magic token value at the end of the stack, which is checked in certain locations to verify that the user code does not accidentally write past the end of the stack. While overrunning the Emscripten stack is not a security issue (JavaScript is sandboxed already), writing past the stack causes memory corruption in global data and dynamically allocated memory sections in the Emscripten HEAP, which makes the application fail in unexpected ways. The value STACK_OVERFLOW_CHECK=2 enables slightly more detailed stack guard checks, which can give a more precise callstack at the expense of some performance. The default value is 2 if ASSERTIONS=1 is set; otherwise the check is disabled.
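The following is a conceptual sketch of the canary idea behind STACK_OVERFLOW_CHECK=1, written in plain C so it can run natively. It is not Emscripten's implementation: a static array stands in for the stack region, and STACK_CANARY, fill_stack() and check_canary() are hypothetical names. The magic value sits just past the usable area, and a later check reports when it has been overwritten.

```c
#include <stdint.h>
#include <stdio.h>

/* Conceptual sketch only (hypothetical names, not Emscripten's code):
 * a plain array stands in for the stack region. */
#define STACK_WORDS 16
#define STACK_CANARY 0x2135467Au

/* The last word (index STACK_WORDS) holds the canary. */
static uint32_t fake_stack[STACK_WORDS + 1];

static void install_canary(void) {
  fake_stack[STACK_WORDS] = STACK_CANARY;
}

/* Returns 1 if the canary is intact, 0 if something wrote past the end. */
static int check_canary(void) {
  if (fake_stack[STACK_WORDS] != STACK_CANARY) {
    fprintf(stderr, "stack overflow detected: canary overwritten\n");
    return 0;
  }
  return 1;
}

/* Simulates user code writing `count` words into the stack area.
 * count must be <= STACK_WORDS + 1 so the sketch itself stays in bounds;
 * count == STACK_WORDS + 1 clobbers the canary slot. */
static void fill_stack(size_t count, uint32_t value) {
  for (size_t i = 0; i < count; i++) {
    fake_stack[i] = value;
  }
}
```

The real check runs at specific points chosen by the compiler and runtime; here the caller decides when to call check_canary().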

A number of other useful debug settings are defined in src/settings.js. For more information, search that file for the keywords “check” and “debug”.

You can also manually instrument the source code with printf() statements, then compile and run the code to investigate issues.
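A minimal sketch of this kind of print instrumentation, assuming a hypothetical TRACE macro (not part of Emscripten) that tags each message with its source file and line so it is easy to locate in the console output:

```c
#include <stdio.h>

/* Hypothetical TRACE macro: prints file, line and a formatted message to
 * stderr. It must be called with at least one argument after the format. */
#define TRACE(fmt, ...) \
  fprintf(stderr, "%s:%d: " fmt "\n", __FILE__, __LINE__, __VA_ARGS__)

/* Example function instrumented with TRACE calls inside its loop. */
static int sum_to(int n) {
  int total = 0;
  for (int i = 1; i <= n; i++) {
    total += i;
    TRACE("i=%d total=%d", i, total);
  }
  return total;
}
```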

If you have a good idea of the problem line, you can add print(new Error().stack) to the JavaScript to get a stack trace at that point. Also available is stackTrace(), which emits a stack trace and tries to demangle C++ function names (if you don't want or need C++ demangling, you can call jsStackTrace()).
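From the C side, a comparable effect may be achieved by logging through emscripten_log() with the EM_LOG_C_STACK flag, which (assuming the logging flags declared in emscripten.h) appends a call stack to the message. The sketch guards the Emscripten-specific call with __EMSCRIPTEN__ so it also compiles natively; log_with_stack() and checked_divide() are hypothetical helpers.

```c
#include <stdio.h>
#ifdef __EMSCRIPTEN__
#include <emscripten.h>
#endif

/* Logs a message with a call stack in Emscripten builds (assumes the
 * emscripten_log() API and EM_LOG_* flags); native builds print only
 * the message. */
static void log_with_stack(const char *msg) {
#ifdef __EMSCRIPTEN__
  emscripten_log(EM_LOG_CONSOLE | EM_LOG_C_STACK, "%s", msg);
#else
  fprintf(stderr, "%s (no stack trace in native build)\n", msg);
#endif
}

/* Example caller: logs where a suspicious condition was hit. */
static int checked_divide(int a, int b) {
  if (b == 0) {
    log_with_stack("checked_divide: division by zero");
    return 0;
  }
  return a / b;
}
```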

Generally it is best to avoid unaligned reads and writes, since they often occur as the result of undefined behavior. In some cases, however, they are unavoidable: for example, if the code to be ported reads an int from a packed structure in some pre-existing data format.

Emscripten supports unaligned reads and writes, but they will be much slower, and should be used only when absolutely necessary. To force an unaligned read or write you can:

Manually read individual bytes and reconstruct the full value

Use the emscripten_align* typedefs, which define unaligned versions of the basic types (short, int, float, double). Operations on those types do not assume full alignment (in most cases use the align1 variants, such as emscripten_align1_int, which assume no alignment whatsoever).
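A minimal sketch of the first approach (reading individual bytes), assuming the data is stored in little-endian byte order, as in many packed file formats. Because only single-byte loads and stores are used, the pointer may have any alignment; read_u32_le() and write_u32_le() are illustrative names, not Emscripten APIs.

```c
#include <stdint.h>

/* Reads a 32-bit little-endian value from a possibly unaligned address
 * by assembling it from individual bytes. */
static uint32_t read_u32_le(const uint8_t *p) {
  return (uint32_t)p[0]
       | ((uint32_t)p[1] << 8)
       | ((uint32_t)p[2] << 16)
       | ((uint32_t)p[3] << 24);
}

/* Writes v as four little-endian bytes starting at p (any alignment). */
static void write_u32_le(uint8_t *p, uint32_t v) {
  p[0] = (uint8_t)(v & 0xFFu);
  p[1] = (uint8_t)((v >> 8) & 0xFFu);
  p[2] = (uint8_t)((v >> 16) & 0xFFu);
  p[3] = (uint8_t)((v >> 24) & 0xFFu);
}
```

For example, a field at offset 1 of a packed record can be read with read_u32_le(buf + 1) without ever performing an unaligned 32-bit access.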

If you get an abort() from a function pointer call to nullFunc or b0 or b1 (possibly with an error message saying “incorrect function pointer”), the problem is that the function pointer was not found in the expected function pointer table when called.

Note

nullFunc is the function used to populate empty index entries in the function pointer tables (b0 and b1 are shorter names used for nullFunc in more optimized builds). A function pointer to an invalid index will call this function, which simply calls abort().
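To make this failure mode concrete, here is a conceptual model in plain C of a per-signature function pointer table whose empty entries are filled with an aborting stub. It only illustrates the behavior described above and is not the code Emscripten generates; null_func_ii, table_ii and call_indirect_ii are hypothetical names.

```c
#include <stdio.h>
#include <stdlib.h>

/* Conceptual model only: one table per signature, here int(int). */
typedef int (*fn_ii)(int);

/* Stub placed in empty slots: reports the problem and aborts. */
static int null_func_ii(int x) {
  (void)x;
  fprintf(stderr, "incorrect function pointer called\n");
  abort();
}

static int double_it(int x) { return 2 * x; }
static int square(int x) { return x * x; }

#define TABLE_SIZE 4
static fn_ii table_ii[TABLE_SIZE] = { null_func_ii, double_it, square, null_func_ii };

/* Calls the entry at `index`. Index 0 plays the role of a NULL function
 * pointer; out-of-range indices are routed to the stub as well. */
static int call_indirect_ii(size_t index, int arg) {
  if (index >= TABLE_SIZE) {
    return null_func_ii(arg);
  }
  return table_ii[index](arg);
}
```

Calling call_indirect_ii(0, ...) or call_indirect_ii(3, ...) in this model aborts, just as a call through an invalid index reaches nullFunc.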

There are several possible causes:

Your code is calling a function pointer that has been cast from another type (this is undefined behavior but it does happen in real-world code). In optimized Emscripten output, each function pointer type is stored in a separate table based on its original signature, so you must call a function pointer with that same signature to get the right behavior (see Function Pointer Issues in the code portability section for more information).

Your code is calling a method on a NULL pointer or dereferencing 0. This sort of bug can be caused by any sort of coding error, but manifests as a function pointer error because the function can’t be found in the expected table at runtime.
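A common fix for the first cause is to replace the cast with a small wrapper whose signature matches the expected type exactly. The sketch below assumes a hypothetical library callback type, callback_t; it also adds a NULL check, which guards against the second cause.

```c
#include <stddef.h>

/* Hypothetical callback type expected by some library code. */
typedef void (*callback_t)(void *user_data);

/* A function whose signature does NOT match callback_t. */
static void increment(int *value) { (*value)++; }

/* Undefined behavior (may end in nullFunc in optimized Emscripten builds):
 *   callback_t cb = (callback_t)increment;
 *   cb(&counter);
 * Instead, use a wrapper whose signature matches callback_t exactly. */
static void increment_wrapper(void *user_data) {
  increment((int *)user_data);
}

/* Invokes a callback, refusing to call through a NULL pointer. */
static int run_callback(callback_t cb, void *user_data) {
  if (cb == NULL) {
    return -1; /* calling a NULL function pointer would fail at runtime */
  }
  cb(user_data);
  return 0;
}
```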

In order to debug these sorts of issues:

Compile with -Werror. This turns warnings into errors, which can be useful because some cases of undefined behavior would otherwise only produce warnings that are easy to overlook.

Use -sASSERTIONS=2 to get some useful information about the function pointer being called, and its type.

Look at the browser stack trace to see where the error occurs and which function should have been called.

Build with SAFE_HEAP=1 and function pointer aliasing disabled (ALIASING_FUNCTION_POINTERS=0). This should make it impossible for a function pointer to be called with the wrong type without raising an error: -sSAFE_HEAP=1 -sALIASING_FUNCTION_POINTERS=0

Another function pointer issue is when the wrong function is called. SAFE_HEAP=1 can help with this as it detects some possible errors with function table accesses.

ALIASING_FUNCTION_POINTERS=0 is also useful because it ensures that calls to function pointer addresses in the wrong table result in clear errors. Without this setting such calls just execute whatever function is at the address, which can be much harder to debug.

Infinite loops cause your page to hang. After a period the browser will notify the user that the page is stuck and offer to halt or close it.

If your code hits an infinite loop, one easy way to find the problem code is to use a JavaScript profiler. In the Firefox profiler, if the code enters an infinite loop you will see a block of code doing the same thing repeatedly near the end of the profile.

Note

The Browser main loop may need to be re-coded if your application uses an infinite main loop.
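One way to restructure an infinite loop is to move the loop body into a function and let the browser call it once per frame via emscripten_set_main_loop(). The sketch assumes the emscripten_set_main_loop() and emscripten_cancel_main_loop() functions from emscripten.h and falls back to an ordinary loop in native builds. The three-frame budget (frames_left) is hypothetical and exists only so the example terminates.

```c
#ifdef __EMSCRIPTEN__
#include <emscripten.h>
#endif

/* Hypothetical frame budget so the sketch terminates. */
static int frames_left = 3;
static int frames_run = 0;

/* Body of what used to be `while (1) { ... }`: one frame per call. */
static void one_iter(void) {
  frames_run++;
  /* ... update state and render one frame here ... */
  if (--frames_left == 0) {
#ifdef __EMSCRIPTEN__
    emscripten_cancel_main_loop(); /* stop the browser-driven loop */
#endif
  }
}

static void run_main_loop(void) {
#ifdef __EMSCRIPTEN__
  /* fps = 0: let the browser schedule frames (requestAnimationFrame).
   * simulate_infinite_loop = 1: this call does not return to its caller. */
  emscripten_set_main_loop(one_iter, 0, 1);
#else
  while (frames_left > 0) {
    one_iter();
  }
#endif
}
```

Because the browser version does not return from emscripten_set_main_loop() when the last argument is 1, any code placed after that call will not run there.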

The AutoDebugger is the ‘nuclear option’ for debugging Emscripten code.

Warning

This option is primarily intended for Emscripten core developers.

The AutoDebugger will rewrite the LLVM bitcode so it prints out each store to memory. This is useful because you can compare the output for different compiler settings in order to detect regressions, or compare the output of the JavaScript with that of the LLVM bitcode compiled using the LLVM Nativizer or run in the LLVM interpreter.

The AutoDebugger can potentially find any problem in the generated code, so it is strictly more powerful than the CHECK_* settings and SAFE_HEAP. One use of the AutoDebugger is to quickly emit lots of logging output, which can then be reviewed for odd behavior. The AutoDebugger is also particularly useful for debugging regressions.

The AutoDebugger has some limitations:

It generates a lot of output. Using diff can be very helpful for identifying changes.

It prints out simple numerical values rather than pointer addresses (because pointer addresses change between runs, and hence can't be compared). This is a limitation because sometimes inspection of addresses can show errors where the pointer address is 0 or impossibly large. You can modify tools/autodebugger.py to print out addresses as integers.
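The idea can be approximated by hand on a small piece of code. The sketch below only illustrates the concept and says nothing about how the tool works internally: a hypothetical logged_store_int() helper performs each store and prints the stored value with a sequence number, but not the address, so that logs from two runs can be compared line by line.

```c
#include <stdio.h>

/* Hypothetical helper: performs a store and logs the value (not the
 * address) with a running sequence number. */
static unsigned long store_seq = 0;

static void logged_store_int(int *dst, int value) {
  *dst = value;
  printf("store #%lu: %d\n", ++store_seq, value);
}

/* Example: an iterative Fibonacci whose every store is logged. */
static int fib_logged(int n, int *out) {
  int a = 0, b = 1;
  for (int i = 0; i < n; i++) {
    int next = a + b;
    logged_store_int(&a, b);
    logged_store_int(&b, next);
  }
  logged_store_int(out, a);
  return a;
}
```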

To run the AutoDebugger, compile with the environment variable EMCC_AUTODEBUG=1 set. For example:

Compile the working code with EMCC_AUTODEBUG=1 set in the environment.

Compile the code using EMCC_AUTODEBUG=1 in the environment again, but this time with the settings that cause the regression. Following this step we have one build before the regression and one after.

Run both versions of the compiled code and save their output.

Compare the output using a diff tool.

Any difference between the outputs is likely to be caused by the bug.
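When the logs are very large, it can help to locate just the first point where they diverge, which is often a good place to start looking. The sketch below does this for two in-memory logs; first_diff_line() is a hypothetical helper, meant as a complement to a diff tool rather than a replacement.

```c
#include <stddef.h>

/* Returns the 1-based number of the first line at which two logs differ,
 * or 0 if they are identical. A log that ends early counts as differing
 * on the line where it ends. */
static size_t first_diff_line(const char *a, const char *b) {
  size_t line = 1;
  while (*a != '\0' || *b != '\0') {
    if (*a != *b) {
      return line;
    }
    if (*a == '\n') {
      line++;
    }
    a++;
    b++;
  }
  return 0;
}
```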

Note

False positives can be caused by calls to clock(), which will differ slightly between runs.

You can also make native builds using the LLVM Nativizer tool. This can be run on the autodebugged .ll file, which will be emitted in /tmp/emscripten_temp when EMCC_DEBUG=1 is set.

Note

The native build created using the LLVM Nativizer will use native system libraries. Direct comparisons of output with Emscripten-compiled code can therefore be misleading.

Attempting to interpret code compiled with -g using the LLVM Nativizer or lli may crash, so you may need to build once without -g for these tools, then build again with -g. Another option is to use tools/exec_llvm.py in Emscripten, which will run lli after cleaning out debug info.

The Emscripten Test Suite contains good examples of almost all functionality offered by Emscripten. If you have a problem, it is a good idea to search the suite to determine whether test code with similar behavior is able to run.

If you've tried the ideas here and you need more help, please get in touch.