
Chrome browser as an HTTP service with a Splash-compatible HTTP API

Chromewhip is an easily deployable service that runs a headless Chrome process wrapped with an
HTTP API. Inspired by the `splash <https://github.com/scrapinghub/splash>`__ project, we aim to
provide a drop-in replacement for the Splash service by adhering to its documented API.

It is currently in early alpha and under heavy development. Please use the issue tracker
to follow progress towards beta. For now, the milestone required for beta can be summarised as
implementing the entire Splash API.

Python 3.6 asyncio driver for the Chrome devtools protocol

Chromewhip communicates with the Chrome process through its own asyncio driver.

The driver can bind events to concurrent commands, which is required for providing a robust
HTTP service.
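As a rough illustration of what binding events to concurrent commands involves, the sketch below routes each event to the command awaiting it using plain asyncio futures. It does not use Chromewhip's driver: `EventWaiter`, `fake_browser` and this simplified `send_command` are hypothetical names, and the frame ids are known up front here, whereas the real driver reads the `frameId` from the command's acknowledgement.

```python
import asyncio


class EventWaiter:
    """Hypothetical helper: hands each event to the command waiting for it."""

    def __init__(self):
        # Maps (event name, frame id) -> future resolved when that event arrives.
        self._pending = {}

    def expect(self, event_name, frame_id):
        # Must be called from a coroutine so the running loop owns the future.
        future = asyncio.get_event_loop().create_future()
        self._pending[(event_name, frame_id)] = future
        return future

    def dispatch(self, event_name, frame_id, payload):
        future = self._pending.pop((event_name, frame_id), None)
        if future is not None and not future.done():
            future.set_result(payload)


async def send_command(waiter, frame_id):
    # Simplified stand-in: wait only for the event that matches this frame.
    event = await waiter.expect('frameStoppedLoading', frame_id)
    return {'ack': {'frameId': frame_id}, 'event': event}


async def fake_browser(waiter):
    # Events arrive in a different order from the commands that caused them.
    await asyncio.sleep(0.01)
    waiter.dispatch('frameStoppedLoading', 'frame-B', {'frameId': 'frame-B'})
    await asyncio.sleep(0.01)
    waiter.dispatch('frameStoppedLoading', 'frame-A', {'frameId': 'frame-A'})


async def main():
    waiter = EventWaiter()
    results = await asyncio.gather(
        send_command(waiter, 'frame-A'),
        send_command(waiter, 'frame-B'),
        fake_browser(waiter),
    )
    return results[:2]


loop = asyncio.new_event_loop()
results = loop.run_until_complete(main())
loop.close()

for result in results:
    # Each command received its own event, despite the out-of-order arrival.
    assert result['ack']['frameId'] == result['event']['frameId']
```

Keying the pending futures on both the event name and the frame id is what keeps two in-flight navigations from resolving each other's waits.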

Some example code on how to use it:

```python
import asyncio
import logging

from chromewhip import Chrome
from chromewhip.protocol import page, dom

# see logging from chromewhip
logging.basicConfig(level=logging.DEBUG)

HOST = '127.0.0.1'
PORT = 9222

loop = asyncio.get_event_loop()
c = Chrome(host=HOST, port=PORT)

loop.run_until_complete(c.connect())

tab = c.tabs[0]

loop.run_until_complete(tab.enable_page_events())

cmd = page.Page.navigate(url='http://nzherald.co.nz')

# send_command will return once the frameStoppedLoading event is received THAT matches
# the frameId that it is in the returned command payload.
await_on_event_type = page.FrameStoppedLoadingEvent

result = loop.run_until_complete(tab.send_command(cmd, await_on_event_type=await_on_event_type))

# send_command always returns a dict with keys `ack` and `event`
# `ack` contains the payload on response of a command
# `event` contains the payload of the awaited event if `await_on_event_type` is provided
ack = result['ack']['result']
event = result['event']
assert ack['frameId'] == event.frameId

cmd = page.Page.setDeviceMetricsOverride(width=800,
                                         height=600,
                                         deviceScaleFactor=0.0,
                                         mobile=False,
                                         fitWindow=False)

loop.run_until_complete(tab.send_command(cmd))

result = loop.run_until_complete(tab.send_command(dom.DOM.getDocument()))

dom_obj = result['ack']['result']['root']

# Python types are determined by the `types` fields in the JSON reference for the
# devtools protocol, and `send_command` will convert if possible.
assert isinstance(dom_obj, dom.Node)

print(dom_obj.nodeId)
print(dom_obj.nodeName)
```

The ‘viewport’ parameter matters most for PNG and JPEG rendering, but it is supported for all
rendering endpoints because JavaScript code execution can depend on viewport size.

/render.png

Query params (including those of render.html):

render_all : int : optional

Possible values are 1 and 0. When render_all=1, extend the viewport to include the
whole webpage (possibly very tall) before rendering.
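A request for a full-page screenshot might be assembled as in the sketch below. Several details are assumptions rather than facts from this README: the base address `http://127.0.0.1:8080`, the `url` parameter name and the `<width>x<height>` viewport format (both taken from the Splash API that Chromewhip aims to follow), and the helper name `render_png_url`, which is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: where the service listens; adjust to your deployment.
BASE_URL = 'http://127.0.0.1:8080'


def render_png_url(page_url, render_all=0, viewport=None, base_url=BASE_URL):
    """Build a /render.png request URL (hypothetical client-side helper)."""
    if render_all not in (0, 1):
        raise ValueError('render_all must be 0 or 1')
    # Assumption: `url` names the page to render, as in the Splash API.
    params = {'url': page_url, 'render_all': render_all}
    if viewport is not None:
        # Assumption: Splash-style "<width>x<height>" string.
        params['viewport'] = viewport
    return '{}/render.png?{}'.format(base_url, urlencode(params))


request_url = render_png_url('http://example.com', render_all=1, viewport='1024x768')

# Parse the query string back to confirm what the service would receive.
query = parse_qs(urlsplit(request_url).query)
print(request_url)
print(query)
```

The resulting URL can be fetched with any HTTP client once the service is running.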

Why not just use Selenium?

Chromewhip uses the devtools protocol instead of the JSON wire protocol. The devtools protocol
is more flexible, especially when it comes to subscribing to granular events from the browser.
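To make the event model concrete, the sketch below shows the general shape of devtools protocol messages: a command carries an `id`, its response echoes that `id`, and an event carries a `method` but no `id`. The helper names `make_command` and `classify` are hypothetical, and the sample messages are hand-written for illustration, not captured from a browser.

```python
import json


def make_command(command_id, method, **params):
    """Serialise a devtools protocol command (hypothetical helper)."""
    return json.dumps({'id': command_id, 'method': method, 'params': params})


def classify(raw_message):
    """Tell a command response apart from an unsolicited event."""
    message = json.loads(raw_message)
    return 'response' if 'id' in message else 'event'


# Enabling a domain is what subscribes the client to that domain's events.
command = make_command(1, 'Page.enable')

# Hand-written examples of what the browser could send back.
incoming = [
    '{"id": 1, "result": {}}',
    '{"method": "Page.frameStoppedLoading", "params": {"frameId": "frame-A"}}',
]
kinds = [classify(raw) for raw in incoming]
print(command)
print(kinds)
```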

Bug reports and requests

Please file one using the GitHub issue tracker.

Contributing

Please :)

Implementation

Chromewhip is developed to run on Python 3.6. It uses aiohttp and asyncio to implement
the asynchronous HTTP server that wraps Chrome.