Splash is controlled via an HTTP API. For all endpoints below, parameters
may be sent either as GET arguments or encoded as JSON and
POSTed with a Content-Type: application/json header.
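
A minimal sketch of the two ways to pass parameters, using only the Python standard library. The address `http://localhost:8050`, the target URL, and the helper names `as_get` and `as_json_post` are assumptions for illustration; the requests are only constructed here, not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

SPLASH = "http://localhost:8050"  # assumed address of a Splash instance


def as_get(endpoint, **params):
    """Pass parameters as GET arguments in the query string."""
    return Request(f"{SPLASH}/{endpoint}?{urlencode(params)}")


def as_json_post(endpoint, **params):
    """Pass the same parameters as a JSON body with the matching Content-Type."""
    return Request(
        f"{SPLASH}/{endpoint}",
        data=json.dumps(params).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


get_req = as_get("render.html", url="http://example.com", timeout=10)
post_req = as_json_post("render.html", url="http://example.com", timeout=10)
```

Either request object can be passed to `urllib.request.urlopen` once a Splash instance is reachable at the assumed address.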

The most versatile endpoint that provides all Splash features
is execute (WARNING: it is still experimental).
Other endpoints may be easier to use in specific
cases - for example, render.png returns a screenshot in PNG format
that can be used as img src without any further processing, and
render.json is convenient if you don’t need to interact with a page.

Base HTML content will be fetched from the URL given in the url
argument, while resources referenced by relative URLs in the HTML text used to
render the page are fetched using the URL given in the baseurl argument
as the base. See also: render.html result looks broken in a browser.

timeout :float:optional

A timeout (in seconds) for the render (defaults to 30).

By default, the maximum allowed value for the timeout is 60 seconds.
To override it, start Splash with the --max-timeout command line option.
For example, here Splash is configured to allow timeouts up to 2 minutes:

wait :float:optional

Time (in seconds) to wait for updates after the page is loaded
(defaults to 0). Increase this value if you expect pages to contain
setInterval/setTimeout JavaScript calls, because with wait=0 the
callbacks of setInterval/setTimeout won’t be executed. A non-zero
wait is also required for PNG and JPEG rendering when doing
full-page rendering (see render_all). The maximum
allowed value for wait is 10 seconds.

allowed_domains :string:optional

Comma-separated list of allowed domain names.
If present, Splash loads nothing from domains that are
not in this list, nor from subdomains of such domains.

allowed_content_types :string:optional

Comma-separated list of allowed content types.
If present, Splash will abort any request if the response’s content type
doesn’t match any of the content types in this list.
Wildcards are supported using the fnmatch
syntax.

forbidden_content_types :string:optional

Comma-separated list of forbidden content types.
If present, Splash will abort any request if the response’s content type
matches any of the content types in this list.
Wildcards are supported using the fnmatch
syntax.
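
To get a feel for how fnmatch-style wildcards apply to content types, here is a small sketch using Python's `fnmatch` module. The `is_allowed` helper is hypothetical and only mirrors the two rules as described above; how Splash itself normalizes content types (for example, charset suffixes) is not covered here.

```python
from fnmatch import fnmatchcase


def is_allowed(content_type, allowed=None, forbidden=None):
    """Illustrative check that mirrors the allowed/forbidden rules above."""
    if allowed and not any(fnmatchcase(content_type, p) for p in allowed.split(",")):
        return False  # an allowed list is present, but nothing in it matches
    if forbidden and any(fnmatchcase(content_type, p) for p in forbidden.split(",")):
        return False  # the content type matches a forbidden pattern
    return True


# Usage: the arguments are comma-separated strings, as in the request parameters.
html_ok = is_allowed("text/html", allowed="text/*,application/json")
png_blocked = is_allowed("image/png", forbidden="image/*")
```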

Note that cached images may be displayed even if the images parameter is 0.
You can also use Request Filters to strip unwanted contents based on URL.

headers :JSON array or object:optional

HTTP headers to set for the first outgoing request.

This option is only supported for application/json POST requests.
The value can be either a JSON array of (header_name, header_value)
pairs or a JSON object with header names as keys and header values
as values.

The “User-Agent” header is special: it is used for all outgoing requests,
unlike other headers.
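
The two accepted shapes of the headers value can be sketched as JSON request bodies. The header names, header values, and target URL below are illustrative assumptions, not values required by Splash.

```python
import json

# Form 1: JSON array of (header_name, header_value) pairs.
payload_pairs = {
    "url": "http://example.com",
    "headers": [["Accept-Language", "en"], ["User-Agent", "my-crawler/1.0"]],
}

# Form 2: JSON object with header names as keys and header values as values.
payload_object = {
    "url": "http://example.com",
    "headers": {"Accept-Language": "en", "User-Agent": "my-crawler/1.0"},
}

# Either body would be POSTed with Content-Type: application/json.
body_pairs = json.dumps(payload_pairs)
body_object = json.dumps(payload_object)
```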

body :string:optional

Body of HTTP POST request to be sent if method is POST.
Default content-type header for POST requests is application/x-www-form-urlencoded.

save_args :JSON array or a comma-separated string:optional

A list of argument names to put in cache. Splash will store each
argument value in an internal cache and return an X-Splash-Saved-Arguments
HTTP header with a list of SHA1 hashes for each argument
(a semicolon-separated list of name=hash pairs):

The client can then use the load_args parameter
to pass these hashes instead of argument values. This is most useful
when an argument value is large and doesn’t change often
(js_source or lua_source
are often good candidates).

load_args :JSON object or a string:optional

Parameter values to load from cache.
load_args should be either {"name":"<SHA1hash>",...}
JSON object or a raw X-Splash-Saved-Arguments header value
(a semicolon-separated list of name=hash pairs).
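
The relationship between the two accepted forms can be sketched as a small parser that turns a raw header value into the equivalent JSON object. The function name `parse_saved_arguments` is hypothetical, and the hashes below are dummy 40-character placeholders, not real SHA1 values returned by Splash.

```python
def parse_saved_arguments(header_value):
    """Split 'name1=hash1;name2=hash2' into {'name1': 'hash1', 'name2': 'hash2'}."""
    pairs = (item.split("=", 1) for item in header_value.split(";") if item.strip())
    return {name.strip(): digest.strip() for name, digest in pairs}


# Dummy placeholder hashes, for illustration only.
raw_header = "lua_source=" + "a" * 40 + ";js_source=" + "b" * 40
load_args = parse_saved_arguments(raw_header)
```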

For each parameter in load_args Splash tries to fetch the
value from the internal cache using a provided SHA1 hash as a key.
If all values are in cache then Splash uses them as argument values
and then handles the request as usual.

If at least one argument can’t be found, Splash returns an HTTP 498 status
code. In this case the client should repeat the request, but
use save_args and send full argument values.

Splash uses LRU cache to store values; the number of entries is limited,
and cache is cleared after each Splash restart. In other words, storage
is not persistent; client should be ready to re-send the arguments.
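
A client-side fallback along these lines might look like the following sketch. The `send` callable is a hypothetical transport supplied by the caller (it posts a dict of arguments and returns a status code and a response); `fake_send` only simulates a cache miss so that the flow can be followed without a running Splash instance.

```python
def request_with_cache(send, args, cached_names, known_hashes):
    """Try load_args first; on HTTP 498 fall back to save_args with full values."""
    if all(name in known_hashes for name in cached_names):
        light = {k: v for k, v in args.items() if k not in cached_names}
        light["load_args"] = {name: known_hashes[name] for name in cached_names}
        status, response = send(light)
        if status != 498:
            return status, response
    # Cache miss (or no hashes known yet): resend full values and ask to cache them.
    full = dict(args, save_args=list(cached_names))
    return send(full)


calls = []


def fake_send(payload):
    """Stand-in transport: pretends the cache is empty, so load_args yields 498."""
    calls.append(payload)
    return (498, None) if "load_args" in payload else (200, "ok")


status, response = request_with_cache(
    fake_send,
    {"url": "http://example.com", "lua_source": "-- large script here"},
    cached_names=["lua_source"],
    known_hashes={"lua_source": "a" * 40},  # dummy placeholder hash
)
```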

width :integer:optional

Resize the rendered image to the given width (in pixels) keeping the aspect
ratio.

height :integer:optional

Crop the rendered image to the given height (in pixels). Often used in
conjunction with the width argument to generate fixed-size thumbnails.

render_all :int:optional

Possible values are 1 and 0. When render_all=1, extend the
viewport to include the whole webpage (possibly very tall) before rendering.
Default is render_all=0.

Note

render_all=1 requires non-zero wait parameter. This is an
unfortunate restriction, but it seems that this is the only way to make
rendering work reliably with render_all=1.
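
The constraints on these arguments can be checked on the client before a request is made. The helper below is a hypothetical sketch that builds a render.png query string and enforces only the rules stated in this section (render_all is 0 or 1, render_all=1 needs a non-zero wait, wait is at most 10 seconds).

```python
from urllib.parse import urlencode


def thumbnail_query(url, width=None, height=None, render_all=0, wait=0.0):
    """Build a render.png query string, enforcing the constraints described above."""
    if render_all not in (0, 1):
        raise ValueError("render_all must be 0 or 1")
    if render_all == 1 and wait <= 0:
        raise ValueError("render_all=1 requires a non-zero wait")
    if wait > 10:
        raise ValueError("wait exceeds the 10 second maximum")
    params = {"url": url, "render_all": render_all, "wait": wait}
    if width is not None:
        params["width"] = width
    if height is not None:
        params["height"] = height
    return "render.png?" + urlencode(params)


# A fixed-size thumbnail: resize to 320 px wide, then crop to 240 px high.
query = thumbnail_query("http://example.com", width=320, height=240)
```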

scale_method :string:optional

Possible values are raster (default) and vector. If
scale_method=raster, rescaling operation performed via width parameter is pixel-wise. If scale_method=vector, rescaling
is done element-wise during rendering.

Note

Vector-based rescaling is more performant and results in crisper fonts and
sharper element boundaries, however there may be rendering issues, so use
it with caution.

console :integer:optional

Whether to include the executed JavaScript console messages in output.
Possible values are 1 (include) and 0 (exclude). Default is 0.

history :integer:optional

Whether to include the history of requests/responses for webpage main
frame. Possible values are 1 (include) and 0 (exclude).
Default is 0.

Use it to get HTTP status codes and headers.
Only information about “main” requests/responses is returned
(i.e. information about related resources like images and AJAX queries
is not returned). To get information about all requests and responses
use ‘har’ argument.

har :integer:optional

Whether to include HAR in output. Possible values are
1 (include) and 0 (exclude). Default is 0.
If this option is ON the result will contain the same data
as render.har provides under ‘har’ key.

By default, response content is not included. To enable it use
‘response_body’ option.

response_body :int:optional

Possible values are 1 and 0. When response_body=1,
response content is included in HAR records. Default is
response_body=0. This option has no effect when
both ‘har’ and ‘history’ are 0.

Splash supports executing JavaScript code within the context of the page.
The JavaScript code is executed after the page finished loading (including
any delay defined by ‘wait’) but before the page is rendered. This allows the
JavaScript code to modify the page being rendered.

To execute JavaScript code use js_source parameter.
It should contain JavaScript code to be executed.

Note that browsers and proxies limit the amount of data that can be sent using GET,
so it is a good idea to use a content-type: application/json POST request.

Splash supports “javascript profiles” that allow preloading JavaScript files.
Javascript files defined in a profile are executed after the page is loaded
and before any javascript code defined in the request.

The preloaded files can be used in the user’s POST’ed code.

To enable javascript profiles support, run splash server with the
--js-profiles-path=<path to a folder with js profiles> option:

Then create a directory with the name of the profile and place inside it the
javascript files to load (note they must be utf-8 encoded).
The files are loaded in the order they appear in the filesystem.
Directory example:

/etc/splash/js-profiles/
    mywebsite/
        lib1.js

To apply this javascript profile add the parameter
js=mywebsite to the request:

When cross-domain access is enabled, JavaScript code is allowed to access the
content of iframes loaded from a security origin different from that of the
original page (browsers usually disallow that). This feature is useful for
scraping, e.g. to extract the HTML of an iframe page. An example of its usage:

The JavaScript function ‘getContents’ will look for an iframe with
the id ‘external’ and extract its HTML contents.

Note that allowing cross-origin JavaScript calls is a potential
security issue, since secret information (e.g. cookies) may be
exposed when this support is enabled; also, some websites don’t load
when cross-domain security is disabled, so this feature is OFF by default.

The folder --filters-path points to should contain .txt files with
filter rules in Adblock Plus format. You may download easylist.txt
from EasyList and put it there, or create .txt files with your own rules.

For example, let’s create a filter that will prevent custom fonts
in ttf and woff formats from loading (due to Qt bugs they may cause
Splash to segfault on Mac OS X):

! put this to a /etc/splash/filters/nofonts.txt file
! comments start with an exclamation mark
.ttf|
.woff|

To use this filter in a request add filters=nofonts parameter
to the query:

If default.txt file is present in --filters-path folder it is
used by default when filters argument is not specified. Pass
filters=none if you don’t want default filters to be applied.
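
How the filters argument selects a rule file can be sketched as plain URL construction. The address `http://localhost:8050`, the target URL, and the helper name `render_with_filters` are assumptions for illustration; the filter name is taken to be the file name without its .txt extension, as in the nofonts example above.

```python
from urllib.parse import urlencode

SPLASH = "http://localhost:8050"  # assumed address of a Splash instance


def render_with_filters(target_url, filters=None):
    """Build a render.html URL, optionally naming a request filter."""
    params = {"url": target_url}
    if filters is not None:
        # "nofonts" selects nofonts.txt; "none" skips default.txt if it exists.
        params["filters"] = filters
    return f"{SPLASH}/render.html?{urlencode(params)}"


with_nofonts = render_with_filters("http://example.com", "nofonts")
without_defaults = render_with_filters("http://example.com", "none")
```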

Only related resources are filtered out by request filters; ‘main’ page loading
request can’t be blocked this way. If you really want to do that consider
checking URL against Adblock Plus filters before sending it to Splash
(e.g. for Python there is adblockparser library).

Splash doesn’t support the full Adblock Plus filter syntax; there are some
limitations:

element hiding rules are not supported; filters can prevent network
request from happening, but they can’t hide parts of an already loaded page;

only domain option is supported.

Unsupported rules are silently discarded.

Note

If you want to stop downloading images check ‘images’
parameter. It doesn’t require URL-based filters to work, and it can
filter images that are hard to detect using URL-based patterns.

Warning

It is very important to have pyre2
library installed if you are going to use filters with a large number
of rules (this is the case for files downloaded from EasyList).

Without the pyre2 library Splash (via adblockparser) relies on the re module
from stdlib, which can be more than 1000 times slower than re2. It may be
faster to download files than to discard them if you have a large number
of rules and don’t use re2. With re2 matching becomes very fast.

Make sure you are not using re2==0.2.20 installed from PyPI (it is broken);
use the latest version.

whitelist and blacklist are newline-separated lists of regexes.
If a URL matches one of the whitelist patterns and none of the blacklist
patterns, the proxy specified in the [proxy] section is used;
otherwise no proxy is used.
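
The decision rule above can be sketched in a few lines of Python. This is an illustration of the logic only: the function name, the example patterns, and the use of `re.search` (a match anywhere in the URL) are assumptions, and Splash's own matching semantics may differ.

```python
import re


def should_use_proxy(url, whitelist, blacklist):
    """Return True when the URL matches a whitelist regex and no blacklist regex."""
    whitelisted = any(re.search(pattern, url) for pattern in whitelist)
    blacklisted = any(re.search(pattern, url) for pattern in blacklist)
    return whitelisted and not blacklisted


# Hypothetical patterns: proxy one site, but not its scripts and stylesheets.
whitelist = [r"mywebsite\.com"]
blacklist = [r"\.js$", r"\.css$"]

page_proxied = should_use_proxy("http://www.mywebsite.com/page", whitelist, blacklist)
script_proxied = should_use_proxy("http://www.mywebsite.com/app.js", whitelist, blacklist)
```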

Then, to apply proxy rules according to this profile,
add proxy=mywebsite parameter to request:

To get debug information about Splash instance (max RSS used, number of used
file descriptors, active requests, request queue length, counts of alive
objects) send a GET request to the /_debug endpoint:
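
A minimal standard-library sketch of such a request follows. The default address `http://localhost:8050` and the helper names are assumptions; the response body is returned as raw text, without assuming anything about its layout.

```python
from urllib.request import urlopen


def debug_url(base="http://localhost:8050"):
    """Return the /_debug URL for a Splash instance at `base` (assumed address)."""
    return base.rstrip("/") + "/_debug"


def fetch_debug_info(base="http://localhost:8050", timeout=5):
    """Send a GET request to /_debug and return the response body as text."""
    with urlopen(debug_url(base), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_debug_info()` requires a running Splash instance at the given address.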