Make it possible to cancel getPage and downloadPage

Description

Currently there is no way to cancel the deferreds returned by getPage or downloadPage. The attached patch adds a canceller to these deferreds, so calling d.cancel() will actually close the connection and fail with CancelledError. The implementation is similar to twisted.web.xmlrpc.Proxy.callRemote, with a different way of calling the errback.

I had to change HTTPClientFactory.clientConnectionFailed, because even though the docstring suggests it should be possible to send a result to the deferred before this method is called, it wouldn't fire _disconnectedDeferred in that case and the user-specified callback/errback would never be actually called.

This branch looks pretty good; coding-standard compliant, starting with unit tests. However, it has one significant flaw in its test cases. They use the real reactor and wait for real time. To be fair, they took from a bad example sitting right next to them, but the 10 second delay is noticeably slow when running the tests. Tests should always run quickly; it's important to be able to get through the whole suite as fast as is reasonable, so development doesn't bog down in endless waiting.

Rather than just shorten the delay though, I suggest you follow the example ​set forth in the trial tutorial of how to test code that relies on time passing, with twisted.internet.task.Clock.

Please re-submit for review when you have tuned up the unit tests, and submit a patch against the branch I created rather than trunk.

I'm not 100% sure now, but I think it did not wait the full 10 seconds when I tested it. I wouldn't have submitted the patch with that. It does slow the tests now, so I'll fix it and add the new patch.

Ok, it turns out the patch wasn't correctly merged and the web-specific canceller wasn't called at all. However, the default canceller from Deferred was used, so it still raised the CancelledError exception. See the new patch with the missing change.

This means that the code it not in fact properly tested, but I'm afraid I do not know how to test it. The end result, with or without the specific canceller implementation, is the same -- the deferred raises CancelledError. The only thing that comes to my mind is a time-based test, e.g. if the client timeout is 10 seconds (can be lowered) and it took 10 or more seconds to raise the CancelledError exception, consider it a failure.

Do you have any ideas how to fix the tests?

Another problem is the test_getPageCancelLater which calls callLater. The getPage API doesn't have any kind of hook, where I could detect that I'm in the middle of a request, but it's not finished yet. I used 1ms as an estimate, but it's not an explicit test.

I think the Clock example will not help me, because when using getPage, I do not have access to the low level protocol and I'm not really testing timeouts. I'm testing if I can disconnect from the server in various stages of the connection and I want the disconnection to happen almost immediately after I call d.cancel().

The way to test this is to use a fake reactor (probably subclassing twisted.test.proto_helpers.InMemoryReactor and Clock). That way you can manually push the system through various stages and make sure cancellation happens correctly in each particular phase. For example, to test cancellation after connection was made, you would call buildProtocol on the factory in the reactor, and attach a proto_helpers.StringTransport to the protocol, then call .cancel(). Then you can write a deterministic, fast test for each possible place where cancellation may happen.

As you say, there's no way to get to the reactor right now. Therefore what you'll need to do is add an extra parameter to twisted.web.client.getPage, namely reactor, which defaults to current reactor and used by both getPage and the various classes (factory and protocol) it is integrated with.