automatically adjust Scrapy to the optimum crawling speed, so the user
doesn't have to tune the download delays to find the optimum one.
The user only needs to specify the maximum number of concurrent requests
to allow, and the extension does the rest.

The AutoThrottle extension adjusts download delays dynamically to make the
spider send AUTOTHROTTLE_TARGET_CONCURRENCY concurrent requests on average
to each remote website.
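Enabling the extension comes down to a few settings in the project's
settings.py. The setting names below are the real Scrapy ones; the specific
values are only illustrative:

```python
# settings.py -- example values; only AUTOTHROTTLE_ENABLED is required
# to turn the extension on, the rest override its defaults.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 5.0         # initial download delay (seconds)
AUTOTHROTTLE_MAX_DELAY = 60.0          # upper bound for the delay
AUTOTHROTTLE_TARGET_CONCURRENCY = 2.0  # average parallel requests per site
AUTOTHROTTLE_DEBUG = False             # log every throttling decision
```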

It uses download latency to compute the delays. The main idea is the
following: if a server needs latency seconds to respond, a client
should send a request every latency/N seconds to have N requests
processed in parallel.
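That rule can be written down directly. The helper below is a hypothetical
name for illustration, not part of the Scrapy API:

```python
def target_delay(latency: float, n: float) -> float:
    """Delay between consecutive requests so that, with a server taking
    `latency` seconds per response, about `n` requests are in flight
    at any given time."""
    return latency / n
```

For example, a 0.8-second server latency with a target of 2 concurrent
requests yields a 0.4-second delay between requests.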

With a small download delay and a hard concurrency limit, two side effects
follow:

- because the download delay is small, there will be occasional bursts
  of requests;
- non-200 (error) responses are often returned faster than regular
  responses, so the crawler ends up sending requests to the server faster
  exactly when the server starts to return errors. This is the opposite
  of what the crawler should do: in case of errors it makes more sense to
  slow down, as the errors may be caused by the high request rate.
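A simplified sketch of an adjustment rule that avoids speeding up on errors;
the function name, the halfway smoothing step, and the parameter defaults are
illustrative assumptions, not the exact Scrapy implementation:

```python
def adjust_delay(current_delay: float, latency: float, status: int,
                 target_concurrency: float = 1.0,
                 min_delay: float = 0.0, max_delay: float = 60.0) -> float:
    """Return the next download delay after observing one response."""
    # Delay that would yield the target concurrency at this latency.
    target = latency / target_concurrency
    # Move halfway toward the target to smooth out noisy latency samples.
    new_delay = (current_delay + target) / 2.0
    # Error responses tend to come back fast; never let them *shrink*
    # the delay -- errors are a reason to slow down, not speed up.
    if status != 200 and new_delay <= current_delay:
        return current_delay
    # Respect the configured hard bounds.
    return max(min_delay, min(new_delay, max_delay))
```

A fast 503 thus leaves the delay untouched, while a fast 200 is allowed to
lower it gradually.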

In Scrapy, the download latency is measured as the time elapsed between
establishing the TCP connection and receiving the HTTP headers.

Note that these latencies are very hard to measure accurately in a cooperative
multitasking environment because Scrapy may be busy processing a spider
callback, for example, and unable to attend to downloads. However, these latencies
should still give a reasonable estimate of how busy Scrapy (and ultimately, the
server) is, and this extension builds on that premise.

Average number of requests Scrapy should be sending in parallel to remote
websites.

By default, AutoThrottle adjusts the delay to send a single
concurrent request to each of the remote websites. Set this option to
a higher value (e.g. 2.0) to increase the throughput and the load on remote
servers. A lower AUTOTHROTTLE_TARGET_CONCURRENCY value
(e.g. 0.5) makes the crawler more conservative and polite.
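A quick back-of-the-envelope illustration of that trade-off, assuming a
steady server latency of 0.6 seconds:

```python
latency = 0.6  # assumed steady server response time (seconds)
for target in (0.5, 1.0, 2.0):
    # Per the latency/N rule, a higher target concurrency means a
    # shorter delay between requests, i.e. more load on the server.
    delay = latency / target
    print(f"target_concurrency={target}: delay ~ {delay:.2f}s")
```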