Scrapy comes with a built-in web service for monitoring and controlling a
running crawler. The service exposes most resources using the JSON-RPC 2.0
protocol, but there are also other (read-only) resources which just output JSON
data.
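To make the distinction concrete: a JSON-RPC call is an HTTP POST whose body names the method to invoke, while the read-only resources are plain GETs returning JSON. A minimal sketch of building a JSON-RPC 2.0 request body (the helper name and example method are illustrative, not part of Scrapy's API):

```python
import json

def build_jsonrpc_request(method, params, req_id=1):
    """Build a JSON-RPC 2.0 request body as a JSON string."""
    return json.dumps({
        "jsonrpc": "2.0",   # protocol version marker required by the spec
        "method": method,   # name of the remote method to invoke
        "params": params,   # positional (list) or named (dict) arguments
        "id": req_id,       # correlates the response with this request
    })

# Hypothetical call asking the stats resource for a spider's stats:
body = build_jsonrpc_request("get_stats", ["somespider"])
print(body)
```

The `id` field is what lets a client match responses to requests when several calls are in flight.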

The web service contains several resources, defined in the
WEBSERVICE_RESOURCES setting. Each resource provides different
functionality. See Available JSON-RPC resources for a list of the
resources available by default.

Although you can implement your own resources using any protocol, there are
two kinds of resources bundled with Scrapy: JSON-RPC resources, which provide
direct access to live Scrapy objects, and simple JSON resources, which are
read-only and just output JSON data.

The list of web service resources available by default in Scrapy. You shouldn't
change this setting in your project; change WEBSERVICE_RESOURCES
instead. If you want to disable a resource, set its value to None in
WEBSERVICE_RESOURCES.
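Assuming WEBSERVICE_RESOURCES follows the same dict-with-order convention as other Scrapy component settings, a project's settings.py might look like the sketch below. The custom class path is hypothetical, and the default resource path shown is only an assumption of where a bundled resource might live:

```python
# settings.py -- hypothetical project configuration.
WEBSERVICE_RESOURCES = {
    # Enable a custom resource (the integer is its ordering value);
    # 'myproject.webservice.CustomStatsResource' is an illustrative path.
    'myproject.webservice.CustomStatsResource': 1,
    # Disable a default resource by setting its value to None
    # (this class path is assumed, not verified against your Scrapy version).
    'scrapy.contrib.webservice.stats.StatsResource': None,
}
```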

The name by which the Scrapy web service will know this resource, and
also the path where this resource will listen. For example, assuming the
Scrapy web service is listening on http://localhost:6080/ and the
ws_name is 'resource1', the URL for that resource will be:

http://localhost:6080/resource1
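In other words, the resource URL is simply the service's base address joined with the ws_name, which can be checked with the standard library:

```python
from urllib.parse import urljoin

base = "http://localhost:6080/"  # address the web service listens on
ws_name = "resource1"            # resource name, used as the URL path
url = urljoin(base, ws_name)
print(url)  # http://localhost:6080/resource1
```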

This is a subclass of JsonResource for implementing JSON-RPC
resources. JSON-RPC resources wrap Python (Scrapy) objects around a
JSON-RPC API. The wrapped resource must be returned by the
get_target() method, which, by default, returns the target passed in
the constructor.
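The wrapping idea can be sketched independently of Scrapy's classes: a resource holds a target object, exposes it through get_target(), and dispatches incoming JSON-RPC method names to it. The class below is a hypothetical stand-in for illustration, not Scrapy's actual JsonRpcResource:

```python
import json

class SimpleJsonRpcResource:
    """Hypothetical sketch of a resource wrapping a Python object in JSON-RPC."""

    def __init__(self, target):
        self._target = target

    def get_target(self):
        # By default, return the object passed to the constructor;
        # subclasses may override this to expose a different object.
        return self._target

    def handle(self, request_body):
        """Dispatch a JSON-RPC 2.0 request to a method of the target."""
        req = json.loads(request_body)
        method = getattr(self.get_target(), req["method"])
        result = method(*req.get("params", []))
        return json.dumps({"jsonrpc": "2.0", "result": result, "id": req["id"]})

# Example target object standing in for a Scrapy stats collector:
class Stats:
    def get_stats(self):
        return {"item_scraped_count": 42}

resource = SimpleJsonRpcResource(Stats())
response = resource.handle('{"jsonrpc": "2.0", "method": "get_stats", "id": 1}')
print(response)
```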

#!/usr/bin/env python"""Example script to control a Scrapy server using its JSON-RPC web service.It only provides a reduced functionality as its main purpose is to illustratehow to write a web service client. Feel free to improve or write you own.Also, keep in mind that the JSON-RPC API is not stable. The recommended way forcontrolling a Scrapy server is through the execution queue (see the "queue"command)."""importsys,optparse,urllib,jsonfromurlparseimporturljoinfromscrapy.utils.jsonrpcimportjsonrpc_client_call,JsonRpcErrordefget_commands():return{'help':cmd_help,'stop':cmd_stop,'list-available':cmd_list_available,'list-running':cmd_list_running,'list-resources':cmd_list_resources,'get-global-stats':cmd_get_global_stats,'get-spider-stats':cmd_get_spider_stats,}defcmd_help(args,opts):"""help - list available commands"""print"Available commands:"for_,funcinsorted(get_commands().items()):print" ",func.__doc__defcmd_stop(args,opts):"""stop <spider> - stop a running spider"""jsonrpc_call(opts,'crawler/engine','close_spider',args[0])defcmd_list_running(args,opts):"""list-running - list running spiders"""forxinjson_get(opts,'crawler/engine/open_spiders'):printxdefcmd_list_available(args,opts):"""list-available - list name of available spiders"""forxinjsonrpc_call(opts,'crawler/spiders','list'):printxdefcmd_list_resources(args,opts):"""list-resources - list available web service resources"""forxinjson_get(opts,'')['resources']:printxdefcmd_get_spider_stats(args,opts):"""get-spider-stats <spider> - get stats of a running spider"""stats=jsonrpc_call(opts,'stats','get_stats',args[0])forname,valueinstats.items():print"%-40s%s"%(name,value)defcmd_get_global_stats(args,opts):"""get-global-stats - get global 
stats"""stats=jsonrpc_call(opts,'stats','get_stats')forname,valueinstats.items():print"%-40s%s"%(name,value)defget_wsurl(opts,path):returnurljoin("http://%s:%s/"%(opts.host,opts.port),path)defjsonrpc_call(opts,path,method,*args,**kwargs):url=get_wsurl(opts,path)returnjsonrpc_client_call(url,method,*args,**kwargs)defjson_get(opts,path):url=get_wsurl(opts,path)returnjson.loads(urllib.urlopen(url).read())defparse_opts():usage="%prog [options] <command> [arg] ..."description="Scrapy web service control script. Use '%prog help' " \
"to see the list of available commands."op=optparse.OptionParser(usage=usage,description=description)op.add_option("-H",dest="host",default="localhost", \
help="Scrapy host to connect to")op.add_option("-P",dest="port",type="int",default=6080, \
help="Scrapy port to connect to")opts,args=op.parse_args()ifnotargs:op.print_help()sys.exit(2)cmdname,cmdargs,opts=args[0],args[1:],optscommands=get_commands()ifcmdnamenotincommands:sys.stderr.write("Unknown command: %s\n\n"%cmdname)cmd_help(None,None)sys.exit(1)returncommands[cmdname],cmdargs,optsdefmain():cmd,args,opts=parse_opts()try:cmd(args,opts)exceptIndexError:printcmd.__doc__exceptJsonRpcError,e:printstr(e)ife.data:print"Server Traceback below:"printe.dataif__name__=='__main__':main()