Description

Change History (14)

From the end of a recent buildbot run:
twisted.trial.test.test_failure_formatting.TestFailureFormatting.testFormatErroredMethod
...
command timed out: 1200 seconds without output, killing pid 27106
process killed by signal 9
program finished with exit code -1
Suspiciously, the end of the test.log has this traceback:
2005/11/01 11:43 EST [-] Traceback (most recent call last):
File "/usr/lib/python2.2/threading.py", line 402, in run
apply(self.__target, self.__args, self.__kwargs)
File "/home/buildbot/run/full2.2/Twisted/twisted/python/threadpool.py", line
149, in _worker
context.call(ctx, function, *args, **kwargs)
File "/home/buildbot/run/full2.2/Twisted/twisted/python/context.py", line 59,
in callWithContext
return self.currentContext().callWithContext(ctx, func, *args, **kw)
File "/home/buildbot/run/full2.2/Twisted/twisted/python/context.py", line 37,
in callWithContext
return func(*args,**kw)
--- <exception caught here> ---
File "/home/buildbot/run/full2.2/Twisted/twisted/internet/threads.py", line
25, in _putResultInDeferred
result = f(*args, **kwargs)
File "/home/buildbot/run/full2.2/Twisted/twisted/enterprise/adbapi.py", line
380, in _runInteraction
conn.rollback()
File "/home/buildbot/run/full2.2/Twisted/twisted/enterprise/adbapi.py", line
46, in rollback
self._connection.rollback()
File "/usr/lib/python2.2/site-packages/pyPgSQL/PgSQL.py", line 2542, in rollback
if self.__closeCursors():
File "/usr/lib/python2.2/site-packages/pyPgSQL/PgSQL.py", line 2442, in
__closeCursors
i._Cursor__reset()
exceptions.AttributeError: 'NoneType' object has no attribute '_Cursor__reset'
Which seems to suggest a failure to isolate the adbapi tests from others in the
suite, and may have caused the hang. Or it might be unrelated.

Without information on the platform and without details on how trial was
invoked, I can't really debug this. Please let me know if it happens again.
My hunch is that the error occured in threading cleanup code.

Worry and stress and bite our nails.
Seriously: not much. Maybe something will come up that sheds more light on the
issue. If it does, then maybe some progress can be made.
It does look like it is thread-related. Trial does not properly deal with the
thread queue between test methods, so if that bug were to be fixed, I would be
sufficiently satisfied to see this issue resolved as well.

Trial has this code in twisted/trial/util.py:
def do_cleanThreads(cls):
from twisted.internet import reactor
if interfaces.IReactorThreads.providedBy(reactor):
reactor.suggestThreadPoolSize(0)
if hasattr(reactor, 'threadpool') and reactor.threadpool:
reactor.threadpool.stop()
reactor.threadpool = None
This is currently run at the end of each class. I don't think we can
realistically run it at the end of each method, because there may be threads set
up as part of a setUpClass call. (Assuming of course that this code properly
deals with the thread queue)

Cleanup is certainly a difficult problem.
There are at least two problems with the current approach (and this applies to
pretty much all the cleanup trial does):
By doing it at the end of each class, an unnecessary and undesirable measure of
isolation is lost, since it only protects threads created as part of setUpClass,
and this certainly does not account for all possible threads (nor even all
actual threads).
Threads created in a *greater* scope than setUpClass (for example, those
belonging to a reporter) are destroyed here, which is completely wrong.
Fortunately trial has been so hideously broken in the recent past that no code
could possibly have done anything *nearly* so interesting as use threads in a
reporter, but this should not be the perpetual state of things.
I have suggested in the past that the only way to accurately track resources for
cleanup is to wrap the entire reactor, rather than handing it directly to test
methods. This still seems like the only viable course of action in order to
achieve *correctly* working cleanup code, both for threads and otherwise.

"I have suggested in the past that the only way to accurately track resources
for cleanup is to wrap the entire reactor, rather than handing it directly to
test methods."
I'm not sure what this means. Do you mean running all of the tests inside a
single reactor.run()?
Can you please elaborate more on how this helps Trial properly deal with the
thread queue between tests?

Sorry, what I suggested is this:
Trial implements a reactor. All Trial's reactor implementation does is wrap a
real reactor (whichever is specified). All calls into the reactor are noted and
tracked. When a test method is finished, any event sources it created but did
not clean up are marked as errors and cleaned up. When a test class is
finished, any event sources its setUpClass created but did not clean up are
marked as errors and cleaned up. Anything that is trial internal (say, a
connection to a remote observer used by the reporter) can be ignored safely, and
object lifetimes can be discovered correctly.
This is not a silver bullet. It will be tedious to implement. There are cases
it will not automatically handle and will need to be assisted with. I think a
few simple heuristics can cover most circumstances, but trial may also need an
annotation API to explicitly delineate certain edge cases.