Splash can execute custom rendering scripts written in the Lua
programming language. This allows us to use Splash as a browser automation
tool similar to PhantomJS.

To execute a script and get the result back, send it to the execute
endpoint in a lua_source argument.

Note

Most likely you’ll be able to follow Splash scripting examples even
without knowing Lua; nevertheless, the language is worth learning.
With Lua you can, for example, write Redis, Nginx, Apache, and
World of Warcraft scripts, create mobile apps using
Moai or Corona SDK, or use the state-of-the-art Deep Learning
framework Torch7. It is easy to get started, and there are good online
resources available, like the tutorial Learn Lua in 15 minutes and the
book Programming in Lua.

If we submit this script to the execute endpoint in a lua_source
argument, Splash will go to the example.com website, wait until it loads,
wait another half-second, then get the page title (by evaluating a JavaScript
snippet in page context), and then return the result as a JSON-encoded object.
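A minimal sketch of a script that behaves as described above. It is reconstructed from the description rather than quoted from anywhere; the key name title in the returned table is an assumption.

```lua
function main(splash)
  -- Go to the website; this call returns once the page has loaded.
  splash:go("http://example.com")
  -- Wait another half-second.
  splash:wait(0.5)
  -- Evaluate a JavaScript snippet in page context to get the title.
  local title = splash:evaljs("document.title")
  -- The returned table is sent back as a JSON-encoded object.
  return {title = title}
end
```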

Note

The Splash UI provides an easy way to try scripts: there is a code editor
for Lua and a button to submit a script to execute. Visit
http://127.0.0.1:8050/ (or whatever host/port Splash is listening on).

The “main” function receives an object that allows us to control the “browser
tab”. All Splash features are exposed using this object. By convention, this
argument is called “splash”, but you are not required to follow this convention:
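For instance, the following sketch uses a different argument name. The name tab is an arbitrary choice made for illustration; any valid Lua identifier should work the same way.

```lua
function main(tab)
  -- "tab" is the same object that is conventionally called "splash".
  tab:go("http://example.com")
  tab:wait(0.5)
  return {title = tab:evaljs("document.title")}
end
```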

The code looks like standard procedural code; there are no callbacks or fancy
control-flow structures. This doesn’t mean Splash works in a synchronous
way; under the hood it is still async. When you call splash:wait(0.5),
Splash switches from the script to other tasks, and comes back after 0.5s.

It is possible to use loops, conditional statements, and functions as usual
in Splash scripts, which enables more straightforward coding.

The PhantomJS version of the code is (arguably) tricky: the process function
implements a loop by creating a chain of callbacks, and the followers function
doesn’t return a value (it would be more complex to implement); the result is
logged to the console instead.

Some observations about the Splash version:

some Lua knowledge is helpful to be productive in Splash scripts:
ipairs, [[multi-line strings]], or string concatenation via
.. could be unfamiliar;

in the Splash variant, the followers function can return a result
(the number of Twitter followers); also, it doesn’t need a “callback” argument;

instead of a page.open callback that receives a “status” argument,
there is a “blocking” splash:go call that returns an “ok” flag;

error handling is different: in case of an HTTP 4xx or 5xx error,
PhantomJS doesn’t return an error code to the page.open callback, so the
example script will try to get the followers nevertheless because “status”
won’t be “fail”; in Splash this error will be detected and “?” will be returned;

the process function can use a standard Lua for loop without
the need to create a recursive callback chain;

instead of console messages we’ve got a JSON HTTP API;

apparently, PhantomJS allows creating multiple page objects and
running several page.open requests in parallel (?); Splash only provides
a single “browser tab” to a script via the splash parameter of the main
function (but you’re free to send multiple concurrent requests with
Lua scripts to Splash).
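The points above can be illustrated with a rough sketch of a Splash-style script. The base URL, the CSS selector, and the user names are hypothetical placeholders, not values from a real site, and the JavaScript snippet assumes the selected element exists on the page.

```lua
-- Hypothetical base URL; replace it with the site you are scraping.
local BASE_URL = "http://example.com/users/"

-- Returns the follower count text for a user, or "?" if the page fails to load.
local function followers(splash, user)
  -- splash:go "blocks" and returns an ok flag (plus a reason on failure).
  local ok, reason = splash:go(BASE_URL .. user)
  if not ok then
    return "?"
  end
  -- [[...]] is a multi-line string; the selector is a hypothetical placeholder.
  return splash:evaljs([[
    document.querySelector('.followers-count').innerText
  ]])
end

-- A plain for loop over the users; no recursive callback chain is needed.
local function process(splash, users)
  local result = {}
  for _, user in ipairs(users) do
    result[user] = followers(splash, user)
  end
  return result
end

function main(splash)
  -- The returned table is delivered through the JSON HTTP API.
  return process(splash, {"alice", "bob"})
end
```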

There are great PhantomJS wrappers like CasperJS and NightmareJS which
(among other things) bring a sync-looking syntax to PhantomJS scripts by
providing custom control flow mini-languages. However, they all have their
own gotchas and edge cases (loops? moving code to helper functions? error
handling?). Splash scripts are standard Lua code.

Note

PhantomJS itself and its wrappers are great; they deserve lots of
respect, so please don’t take this writeup as an attack on them.
These tools are much more mature and feature-complete than Splash.
Splash tries to look at the problem from a different angle, but
for each unique Splash feature there are ten unique PhantomJS features.

Internally, the “main” function is executed as a coroutine by Splash,
and some of the splash:foo() methods use coroutine.yield.
See http://www.lua.org/pil/9.html for a tutorial on Lua coroutines.
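To make the coroutine mechanism more concrete, here is a plain-Lua sketch that runs without Splash. The functions script and run are hypothetical stand-ins for a script's main function and for the host that drives it; this is not how Splash is actually implemented, only an illustration of how coroutine.yield lets sequential-looking code hand control back to its caller.

```lua
-- Stand-in for a script's "main": it yields wherever a real script would wait.
local function script()
  local log = {}
  log[#log + 1] = "before wait"
  -- Hand control back to the host, describing what we are waiting for.
  local resumed_with = coroutine.yield("wait", 0.5)
  log[#log + 1] = "after wait: " .. tostring(resumed_with)
  return log
end

-- Stand-in for the host: it drives the coroutine step by step.
local function run(fn)
  local co = coroutine.create(fn)
  local ok, a, b = coroutine.resume(co) -- runs until the first yield
  assert(ok, a)
  local requests = {}
  while coroutine.status(co) ~= "dead" do
    requests[#requests + 1] = a .. ":" .. tostring(b)
    -- A real host could do other work here before resuming the script.
    ok, a, b = coroutine.resume(co, "done")
    assert(ok, a)
  end
  -- "a" now holds the value returned by fn.
  return a, requests
end

local log, requests = run(script)
```

From the script's point of view the code reads top to bottom, even though control left the function at the yield and came back later.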

In Splash scripts it is not explicit which calls are async and which calls
are blocking; this is a common criticism of coroutines/greenlets. Check
this article
for a good description of the problem.

However, these negatives have no real impact in Splash scripts: the scripts
are meant to be small, shared state is minimized, and the API is designed to
execute a single command at a time, so in most cases the control flow is linear.

If you want to be safe, think of all splash methods as async;
keep in mind that after you call splash:foo() the webpage being
rendered can change. Often that’s the point of calling a method,
e.g. splash:wait(time) or splash:go(url) only make sense because
the webpage changes after calling them, but it is still worth remembering.

There are async methods like splash:go, splash:wait,
splash:wait_for_resume, etc.; most splash methods are currently
not async, but thinking of them as async will allow your scripts
to work if we ever change that.

Unlike in many languages, methods in Lua are usually separated from an object
using a colon (:); to call the “foo” method of the “splash” object, use the
splash:foo() syntax. See http://www.lua.org/pil/16.html for more details.
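A small plain-Lua sketch of what the colon does. The table tab here is a made-up stand-in, not the real splash object.

```lua
-- A stand-in object with one piece of state.
local tab = {visited = {}}

-- Defining a function with a colon adds an implicit first parameter "self".
function tab:go(url)
  self.visited[#self.visited + 1] = url
  return true
end

-- These two calls are equivalent:
tab:go("http://example.com")      -- colon: "tab" is passed as self
tab.go(tab, "http://example.org") -- dot: self must be passed explicitly
```

Calling tab.go("http://example.com") with a dot and no explicit object would pass the URL string as self, which is a common source of confusing errors.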

There are two main ways to call Lua methods in Splash scripts:
using positional and named arguments. To call a method using positional
arguments, use parentheses: splash:foo(val1, val2). To call it with
named arguments, use curly braces: splash:foo{name1=val1, name2=val2}:

    -- Examples of positional arguments:
    splash:go("http://example.com")
    splash:wait(0.5, false)
    local title = splash:evaljs("document.title")

    -- The same using keyword arguments:
    splash:go{url="http://example.com"}
    splash:wait{time=0.5, cancel_on_redirect=false}
    local title = splash:evaljs{source="document.title"}

    -- Mixed arguments example:
    splash:wait{0.5, cancel_on_redirect=false}

For convenience, all splash methods are designed to support both styles
of calling: positional and named. But since there are no “real” named
arguments in Lua, most Lua functions (including the ones from the
standard library) choose to support just positional arguments.
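As an illustration of why named arguments are not “real” in Lua: a call like f{...} is shorthand for f({...}), i.e. a call with one table argument. The helper below is a hypothetical sketch of how a plain Lua function could accept both styles; it is not Splash's actual implementation, and the default value for cancel_on_redirect is an assumption.

```lua
-- Hypothetical helper that accepts positional, named, or mixed arguments.
local function wait(time, cancel_on_redirect)
  if type(time) == "table" then
    -- Called as wait{...}: unpack the single table argument.
    local args = time
    time = args.time or args[1]
    cancel_on_redirect = args.cancel_on_redirect
  end
  if cancel_on_redirect == nil then
    cancel_on_redirect = false -- assumed default, for illustration only
  end
  return time, cancel_on_redirect
end

local t1, c1 = wait(0.5, true)                             -- positional
local t2, c2 = wait{time = 0.5, cancel_on_redirect = true} -- named
local t3, c3 = wait{0.5, cancel_on_redirect = true}        -- mixed
```

All three calls produce the same pair of values; the function has to do the unpacking itself, which is why most Lua libraries do not bother.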

To convert “status flag” errors to exceptions, the Lua assert function can be
used. For example, if you expect a website to work and don’t want to handle
errors manually, then assert lets you stop processing and return HTTP 400
if the assumption is wrong:

    local ok, msg = splash:go("http://example.com")
    if not ok then
        -- handle error somehow, e.g.
        error(msg)
    end

    -- a shortcut for the code above: use assert
    assert(splash:go("http://example.com"))

By default Splash scripts are executed in a restricted environment:
not all standard Lua modules and functions are available, Lua require
is restricted, and there are resource limits (quite loose though).

To disable the sandbox, start Splash with the --disable-lua-sandbox option.

To set up the path for Lua modules, start Splash with the --lua-package-path
option. The --lua-package-path value should be a semicolon-separated list
of places where Lua looks for modules. Each entry should have a ? in it
that’s replaced with the module name.

If you use Splash installed using Docker, see
Folders Sharing for more info on how to set up
paths.

Note

For the curious: --lua-package-path value is added to Lua
package.path.
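To show how the ? placeholder works, the plain-Lua sketch below expands a package.path-style string for a given module name, roughly mirroring what Lua's standard module searcher does. The function candidate_files and both directories are hypothetical, chosen only for illustration.

```lua
-- Expand each "?" template in a package.path-style string for a module name.
local function candidate_files(path, module_name)
  -- Lua replaces dots in the module name with the directory separator.
  local file_part = module_name:gsub("%.", "/")
  local candidates = {}
  -- Entries are separated by semicolons.
  for template in path:gmatch("[^;]+") do
    -- A function replacement avoids special handling of "%" in the name.
    candidates[#candidates + 1] =
      (template:gsub("%?", function() return file_part end))
  end
  return candidates
end

-- Hypothetical directories, for illustration only.
local files = candidate_files(
  "/etc/splash/lua_modules/?.lua;/home/myuser/splash-modules/?.lua",
  "utils"
)
-- files[1] is "/etc/splash/lua_modules/utils.lua"
-- files[2] is "/home/myuser/splash-modules/utils.lua"
```

With such a path, require("utils") would try these files in order and load the first one that exists.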

When the Lua sandbox is enabled (the default), the Lua require
function is restricted when used in scripts: it can only load
modules from a whitelist. This whitelist is empty by default, i.e. by default
you can require nothing. To make your modules available to scripts, start
Splash with the --lua-sandbox-allowed-modules option. It should contain a
semicolon-separated list of Lua module names allowed in the sandbox.

Another way to write such a module is to add a method to the splash
object. This can be done by adding a method to its Splash
class; the approach is called “open classes” in Ruby or “monkey-patching”
in Python.

    -- wait_for.lua
    -- Sandbox is not enforced in custom modules, so we can import
    -- internal Splash class and change it - add a method.
    local Splash = require("splash")

    function Splash:wait_for(condition)
        while not condition() do
            self:wait(0.05)
        end
    end

    -- no need to return anything

Which style to prefer is up to the developer. Functions are more explicit
and composable, while monkey-patching enables more compact code. Either way,
require is explicit.
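For comparison, here is a sketch of the function style. To keep it self-contained, the helper is defined in the script itself; in practice it would live in a custom module loaded with require. The condition checked in the page is only an example.

```lua
-- Function style: the helper takes the splash object explicitly.
local function wait_for(splash, condition)
  while not condition() do
    splash:wait(0.05)
  end
end

function main(splash)
  splash:go("http://example.com")
  wait_for(splash, function()
    return splash:evaljs("document.body != null")
  end)
  -- With the monkey-patched variant the call would instead read:
  --   splash:wait_for(function() ... end)
  return {html = splash:html()}
end
```

Neither variant has a timeout: if the condition never becomes true, the loop keeps polling until Splash stops the script.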

As seen in a previous example, sandbox restrictions for standard Lua modules
and functions are not applied in custom Lua modules, i.e. you can use
the full power of Lua. This makes it possible to import third-party Lua modules
and implement advanced features, but requires the developer to be careful.
For example, let’s use the os
module:

    -- evil.lua
    local os = require("os")
    local evil = {}

    function evil.sleep()
        -- Don't do this! It blocks the event loop and has a startup cost.
        -- splash:wait is there for a reason.
        os.execute("sleep 2")
    end

    function evil.touch(filename)
        -- another bad idea
        os.execute("touch " .. filename)
    end

    -- todo: rm -rf /

    return evil