The Scrapy shell is an interactive shell where you can try and debug your
scraping code very quickly, without having to run the spider. It’s meant to be
used for testing data extraction code, but you can actually use it for testing
any kind of code as it is also a regular Python shell.

The shell is used for testing XPath or CSS expressions and seeing how they work
and what data they extract from the web pages you’re trying to scrape. It
allows you to interactively test your expressions while you’re writing your
spider, without having to run the spider to test every change.

Once you are familiar with the Scrapy shell, you’ll see that it’s an
invaluable tool for developing and debugging your spiders.

If you have IPython installed, the Scrapy shell will use it (instead of the
standard Python console). The IPython console is much more powerful and
provides smart auto-completion and colorized output, among other things.

Scrapy also has support for bpython, and will try to use it where IPython
is unavailable.

Through Scrapy’s settings you can configure it to use any one of
IPython, bpython, or the standard Python shell, regardless of which
are installed. This is done by setting the SCRAPY_PYTHON_SHELL environment
variable, or by defining it in your scrapy.cfg:
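For example, the scrapy.cfg entry looks like this (bpython is used here for illustration; ipython or python work the same way):

```ini
[settings]
shell = bpython
```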

shelp() - print a help with the list of available objects and shortcuts

fetch(url[, redirect=True]) - fetch a new response from the given
URL and update all related objects accordingly. You can optionally ask for
HTTP 3xx redirections not to be followed by passing redirect=False

fetch(request) - fetch a new response from the given request and
update all related objects accordingly.
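As a sketch of how these shortcuts are used (the URLs and results here are illustrative, not actual responses), a session might look like:

```python
>>> fetch('https://example.com/some/page')   # GET; redirects are followed by default
>>> response.url                             # the response object is updated in place
>>> fetch('https://example.com/old-page', redirect=False)
>>> response.status                          # a 3xx status if the page redirects,
...                                          # since the redirect was not followed
>>> from scrapy import Request
>>> request = Request('https://example.com', method='POST')
>>> fetch(request)                           # fetch using a custom Request object
```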

view(response) - open the given response in your local web browser, for
inspection. This will add a <base> tag to the response body in order
for external links (such as images and style sheets) to display properly.
Note, however, that this will create a temporary file on your computer,
which won’t be removed automatically.
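A minimal usage sketch, assuming a response has already been fetched in the shell:

```python
>>> view(response)   # opens the downloaded page in your default browser
True
```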

Here’s an example of a typical shell session where we start by scraping the
https://scrapy.org page, and then proceed to scrape the https://reddit.com
page. Finally, we modify the (Reddit) request method to POST and re-fetch it,
getting an error. We end the session by pressing Ctrl-D (on Unix systems) or
Ctrl-Z (on Windows).

Keep in mind that the data extracted here may not be the same when you try it,
as those pages are not static and could have changed by the time you test this.
The only purpose of this example is to get you familiarized with how the Scrapy
shell works.

First, we launch the shell:

scrapy shell 'https://scrapy.org' --nolog

Then, the shell fetches the URL (using the Scrapy downloader) and prints the
list of available objects and useful shortcuts (you’ll notice that these lines
all start with the [s] prefix):
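The exact listing depends on your Scrapy version, but it looks roughly like this (object addresses and descriptions are illustrative):

```
[s] Available Scrapy objects:
[s]   scrapy     scrapy module (contains scrapy.Request, scrapy.Selector, etc)
[s]   crawler    <scrapy.crawler.Crawler object at 0x...>
[s]   item       {}
[s]   request    <GET https://scrapy.org>
[s]   response   <200 https://scrapy.org>
[s]   settings   <scrapy.settings.Settings object at 0x...>
[s]   spider     <DefaultSpider 'default' at 0x...>
[s] Useful shortcuts:
[s]   fetch(url[, redirect=True]) Fetch URL and update local objects
[s]   fetch(req)                  Fetch a scrapy Request and update local objects
[s]   shelp()                     Shell help (print this help)
[s]   view(response)              View response in a browser
```

From there you can start querying the response interactively, for example (the output will vary as the page changes):

```python
>>> response.xpath('//title/text()').get()
```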

Sometimes you want to inspect the responses that are being processed at a
certain point of your spider, if only to check that the response you expect is
getting there.

This can be achieved by using the scrapy.shell.inspect_response function.

Here’s an example of how you would call it from your spider:

import scrapy


class MySpider(scrapy.Spider):
    name = "myspider"
    start_urls = [
        "http://example.com",
        "http://example.org",
        "http://example.net",
    ]

    def parse(self, response):
        # We want to inspect one specific response.
        if ".org" in response.url:
            from scrapy.shell import inspect_response
            inspect_response(response, self)

        # Rest of parsing code.
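When you run this spider (with scrapy crawl myspider), crawling pauses at the matching response and drops you into the shell. A session at that point might look roughly like this (the extraction query is illustrative):

```python
>>> response.url
'http://example.org'
>>> response.xpath('//h1/text()').get()   # try out your extraction code here
```

Press Ctrl-D (or Ctrl-Z on Windows) to exit the shell and resume the crawl. Note that you can’t use the fetch shortcut here, since the shell is blocking the Scrapy engine while the spider is paused.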