I'm building a data processing system where users can submit hooks to execute on incoming data.

The hooks are untrusted and should execute in a sandbox with access only to a limited API that I expose – essentially like a DSL. Ideally, users write hooks in ES6 or Python. The code should preferably be executable from most runtimes, but definitely from Python.

From my perspective, I'm looking for this workflow:

Users submit source code for a hook, which I compile and store

I retrieve the compiled code and execute it

Only calls to the predefined APIs have side-effects

What technologies do you recommend to achieve this?

These are the ideas I'm currently exploring:

Users write hooks in TypeScript, which is compiled to WebAssembly with AssemblyScript. I have not found any guidelines on the feasibility, security or overhead of running sandboxed WebAssembly in Python.

I check user-submitted code for imports and system/networking calls and then eval it in the current process. I have not found any guidelines on exactly which checks are necessary to securely eval Python and/or ES6.

Sharing your research helps everyone. Tell us what you've tried and why it didn't meet your needs. This demonstrates that you've taken the time to try to help yourself, it saves us from reiterating obvious answers, and most of all it helps you get a more specific and relevant answer. Also see How to Ask
– gnat Feb 21 at 22:21

2 Answers

Python has no security model that would allow you to safely execute untrusted code. If you want to execute untrusted Python code, you need to apply operating-system-level safety measures, for example by running the code as a separate process in a highly restricted Linux container. That way you can prevent the code from accessing the host file system, or set a CPU quota for the container.
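To make the process-isolation idea concrete, here is a minimal, POSIX-only sketch using just the standard library (`subprocess` and `resource`). The function name and the specific limits are my own choices for illustration; a real deployment would layer a container, seccomp filter, or similar on top of this:

```python
import resource
import subprocess
import sys

def run_untrusted(code: str, timeout: float = 5.0) -> str:
    """Run untrusted Python code in a separate process with OS-level limits.

    This is only a sketch: it caps CPU time and address space, but does
    nothing about file-system or network access on its own.
    """

    def limit_resources():
        # Runs in the child process just before exec (POSIX only).
        # Cap CPU time at 1 second and address space at 256 MiB.
        resource.setrlimit(resource.RLIMIT_CPU, (1, 1))
        resource.setrlimit(resource.RLIMIT_AS, (256 * 1024 * 1024,
                                                256 * 1024 * 1024))

    proc = subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode
        capture_output=True,
        text=True,
        timeout=timeout,          # wall-clock backstop for the CPU limit
        preexec_fn=limit_resources,
    )
    return proc.stdout

print(run_untrusted("print(2 + 2)"))
```

An infinite loop in the submitted code is then killed by the CPU limit rather than hanging the host, and a runaway allocation fails with `MemoryError` inside the child instead of exhausting the machine.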

JavaScript VMs take security much more seriously, because executing untrusted code is what browsers do all the time. Assuming you configure the VM correctly and that the API you expose to the JavaScript code is secure, a JavaScript VM can be a very good approach for executing untrusted hooks. In particular, such VM-based isolation has much lower latency and memory overhead than running a separate Python process in a container. This approach is also a core part of Cloudflare's serverless offerings.

WebAssembly by itself is irrelevant for the purpose of security; what matters is that the VM executing the WebAssembly provides suitable isolation guarantees.

Whatever approach you use, you fundamentally cannot check the code before execution to ensure that it doesn't do anything unsafe. Both JavaScript and Python are far too dynamic for effective static analysis, and such program properties are undecidable in general (see also the Halting Problem). In any case, something like a regex that tries to find illegal imports is easy for attackers to circumvent. Please don't base your security model on a provably impossible approach.
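To see how easily a blocklist check is bypassed, consider a hypothetical filter of the kind the question proposes. The attacker simply reassembles the forbidden name at runtime, so none of the blocked keywords ever appear literally in the source:

```python
import builtins
import re

# A naive blocklist of the kind the question proposes (hypothetical).
BLOCKED = re.compile(r"\b(import|open|exec|eval)\b")

def looks_safe(code: str) -> bool:
    """Return True when none of the blocked keywords appear literally."""
    return BLOCKED.search(code) is None

# The attacker builds the string "__import__" piecewise, so the filter
# never sees any of its blocked keywords:
payload = "getattr(__builtins__, '__imp' + 'ort__')('o' + 's')"

assert looks_safe(payload)        # the static check is fooled...
assert not looks_safe("import os")  # ...even though it catches the naive form

# ...yet evaluating the payload really does load the os module:
module = eval(payload, {"__builtins__": builtins})
print(module.__name__)
```

Countless variants exist (`getattr` chains, string decoding, attribute walks from `().__class__`), which is why no finite blocklist closes the hole.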