Typically, Request objects are generated in the spiders and pass
across the system until they reach the Downloader, which executes the request
and returns a Response object which travels back to the spider that
issued the request.

A Request object represents an HTTP request, which is usually
generated in the Spider and executed by the Downloader, thus generating
a Response.

Parameters:

url (string) – the URL of this request

method (string) – the HTTP method of this request. Defaults to 'GET'.

meta (dict) – the initial values for the Request.meta attribute. If
given, the dict passed in this parameter will be shallow copied.

body (str or unicode) – the request body. If a unicode is passed, then it's encoded to
str using the encoding passed (which defaults to utf-8). If
body is not given, an empty string is stored. Regardless of the
type of this argument, the final value stored will be a str (never
unicode or None).

headers (dict) – the headers of this request. The dict values can be strings
(for single valued headers) or lists (for multi-valued headers).

cookies (dict) – the cookies to send with this request.

When some site returns cookies (in a response) those are stored in the
cookies for that domain and will be sent again in future requests. That's
the typical behaviour of any regular web browser. However, if, for some
reason, you want to avoid merging with existing cookies you can instruct
Scrapy to do so by setting the dont_merge_cookies key in the
Request.meta.

encoding (string) – the encoding of this request (defaults to 'utf-8').
This encoding will be used to percent-encode the URL and to convert the
body to str (if given as unicode).

priority (int) – the priority of this request (defaults to 0).
The priority is used by the scheduler to define the order used to process
requests.

dont_filter (boolean) – indicates that this request should not be filtered by
the scheduler. This is used when you want to perform an identical
request multiple times, to ignore the duplicates filter. Use it with
care, or you will get into crawling loops. Defaults to False.

callback (callable) – the function that will be called with the response of this
request (once it's downloaded) as its first parameter. For more information
see Passing additional data to callback functions below.
If a Request doesn't specify a callback, the spider's
parse() method will be used.

errback (callable) – a function that will be called if any exception was
raised while processing the request. This includes pages that failed
with 404 HTTP errors and such. It receives a Twisted Failure instance
as its first parameter.
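
As a minimal sketch of how these constructor arguments fit together (the
URL, callback and errback names are illustrative, not part of the API), a
Request built inside a spider callback might look like this:

    from scrapy.http import Request

    def parse(self, response):
        # Follow a link with an extra header, meta data and a higher priority;
        # parse_details and handle_error are assumed spider methods.
        return Request("http://www.example.com/details.html",
                       callback=self.parse_details,
                       errback=self.handle_error,
                       headers={'Referer': response.url},
                       meta={'dont_merge_cookies': True},
                       priority=10)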

Request.meta is a dict that contains arbitrary metadata for this request. This dict is
empty for new Requests, and is usually populated by different Scrapy
components (extensions, middlewares, etc.), so the data contained in this
dict depends on the extensions you have enabled.

Request.replace() returns a Request object with the same members, except for those members
given new values by whichever keyword arguments are specified. The
attribute Request.meta is copied by default (unless a new value
is given in the meta argument). See also
Passing additional data to callback functions.
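
For example (a sketch; request stands for any existing Request instance):

    # Same request, but bypassing the duplicates filter this time;
    # headers, meta, callback, etc. are all preserved.
    new_request = request.replace(dont_filter=True)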

The callback of a request is a function that will be called when the response
of that request is downloaded. The callback function will be called with the
downloaded Response object as its first argument.

Example:

    def parse_page1(self, response):
        return Request("http://www.example.com/some_page.html",
                       callback=self.parse_page2)

    def parse_page2(self, response):
        # this would log http://www.example.com/some_page.html
        self.log("Visited %s" % response.url)

In some cases you may be interested in passing arguments to those callback
functions so you can receive the arguments later, in the second callback. You
can use the Request.meta attribute for that.

Here’s an example of how to pass an item using this mechanism, to populate
different fields from different pages:
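
A sketch of that pattern (MyItem and its main_url/other_url fields are
assumed here for illustration only):

    from scrapy.http import Request
    from scrapy.item import Item, Field

    class MyItem(Item):
        # hypothetical item with two illustrative fields
        main_url = Field()
        other_url = Field()

    # Both functions below are meant to be spider callback methods.
    def parse_page1(self, response):
        item = MyItem()
        item['main_url'] = response.url
        request = Request("http://www.example.com/some_page.html",
                          callback=self.parse_page2)
        request.meta['item'] = item   # stash the item for the next callback
        return request

    def parse_page2(self, response):
        item = response.meta['item']  # retrieve the item filled in parse_page1
        item['other_url'] = response.url
        return item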

The FormRequest class extends the base Request with functionality for
dealing with HTML forms. It uses the ClientForm library (bundled with
Scrapy) to pre-populate form fields with form data from Response
objects.

Keep in mind that the from_response() method is implemented using ClientForm, whose
policy is to automatically simulate a click, by default, on any form
control that looks clickable, like an <input type="submit">. Even
though this is quite convenient, and often the desired behaviour,
sometimes it can cause problems which could be hard to debug. For
example, when working with forms that are filled and/or submitted using
javascript, the default from_response() (and ClientForm)
behaviour may not be the most appropriate. To disable this behaviour you
can set the dont_click argument to True. Also, if you want to
change the control clicked (instead of disabling it) you can use
the clickdata argument.
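
For instance (a sketch; the form field and the after_post callback are
illustrative), a spider callback could fill a form without simulating a
click like this:

    from scrapy.http import FormRequest

    def parse_form_page(self, response):
        # Fill the form but don't simulate a click on any submit control
        return FormRequest.from_response(response,
                                         formdata={'comment': 'hello'},
                                         dont_click=True,
                                         callback=self.after_post)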

Parameters:

response (Response object) – the response containing an HTML form which will be used
to pre-populate the form fields

formname (string) – if given, the form with name attribute set to this value
will be used. Otherwise, formnumber will be used for selecting
the form.

formnumber (integer) – the number of the form to use, when the response contains
multiple forms. The first one (and also the default) is 0.

formdata (dict) – fields to override in the form data. If a field was
already present in the response <form> element, its value is
overridden by the one passed in this parameter.

clickdata (dict) – Arguments to be passed directly to the ClientForm
click_request_data() method. See ClientForm homepage for
more info.

dont_click (boolean) – If True, the form data will be submitted without
clicking on any element.

The other parameters of this class method are passed directly to the
FormRequest constructor.

It is usual for web sites to provide pre-populated form fields through <input type="hidden"> elements, such as session-related data or authentication
tokens (for login pages). When scraping, you'll want these fields to be
automatically pre-populated and only override a couple of them, such as the
user name and password. You can use the FormRequest.from_response()
method for this job. Here's an example spider which uses it:
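
The following is a sketch of such a login spider (the spider name, URL,
form field names and the "authentication failed" check are illustrative
assumptions, not prescribed values):

    from scrapy.spider import BaseSpider
    from scrapy.http import FormRequest

    class LoginSpider(BaseSpider):
        name = 'example.com'
        start_urls = ['http://www.example.com/users/login.php']

        def parse(self, response):
            # Hidden fields (session tokens, etc.) are pre-populated from the
            # response; only the user name and password are overridden here.
            return [FormRequest.from_response(response,
                        formdata={'username': 'john', 'password': 'secret'},
                        callback=self.after_login)]

        def after_login(self, response):
            # Check that the login succeeded before going on
            if "authentication failed" in response.body:
                self.log("Login failed")
                return
            # continue scraping with an authenticated session...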

Response.request is the Request object that generated this response. This attribute is
assigned in the Scrapy engine, after the response and the request have passed
through all Downloader Middlewares.
In particular, this means that:

HTTP redirections will cause the original request (to the URL before
redirection) to be assigned to the redirected response (with the final
URL after redirection).

Response.request.url doesn’t always equal Response.url

This attribute is only available in the spider code, and in the
Spider Middlewares, but not in
Downloader Middlewares (although you have the Request available there by
other means) and handlers of the response_downloaded signal.

Response.flags is a list that contains flags for this response. Flags are labels used for
tagging Responses. For example: 'cached', 'redirected', etc. And
they're shown on the string representation of the Response (__str__
method) which is used by the engine for logging.

Response.replace() returns a Response object with the same members, except for those members
given new values by whichever keyword arguments are specified. The
attribute Response.meta is copied by default (unless a new value
is given in the meta argument).

TextResponse objects add encoding capabilities to the base
Response class, which is meant to be used only for binary data,
such as images, sounds or any media file.

TextResponse objects support a new constructor argument, in
addition to those of the base Response class. The remaining functionality
is the same as for the Response class and is not documented here.

Parameters:

encoding (string) – the encoding to use for this
response. If you create a TextResponse object with a unicode
body, it will be encoded using this encoding (remember the body attribute
is always a string). If encoding is None (default value), the
encoding will be looked up in the response headers and body instead.
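
As a minimal sketch (the URL and body are illustrative), a TextResponse
created with a unicode body and an explicit encoding stores the body as an
encoded str:

    from scrapy.http import TextResponse

    response = TextResponse(url="http://www.example.com/",
                            body=u"\u00a1Hola!",   # unicode body
                            encoding='utf-8')
    # the body is stored as a str, encoded with the given encoding
    assert isinstance(response.body, str)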