Here is what I found just by analyzing the logs. It seems the first
failures appeared after this change:
http://svn.python.org/view/python/branches/release30-maint/Objects/object.c?rev=67888&view=diff&r1=67888&r2=67887&p1=python/branches/release30-maint/Objects/object.c&p2=/python/branches/release30-maint/Objects/object.c
The logs of the failing test runs all show the same error message:
[31481 refs]
* ob
object : <refcnt 0 at 0x3a97728>
type : str
refcount: 0
address : 0x3a97728
* op->_ob_prev->_ob_next
object : <refcnt 0 at 0x3a97728>
type : str
refcount: 0
address : 0x3a97728
* op->_ob_next->_ob_prev
object : [31776 refs]
This is the output of _Py_ForgetReference (which calls _PyObject_Dump)
called either from _PyUnicode_New or unicode_subtype_new. In both
cases, this implies PyObject_MALLOC returned NULL when allocating the
internal array of a str object. However, I have no idea why malloc()
is failing there.
By counting the number of [reftotal] printed in the log, I found that
the failing test could be one of the following: test_invalid_args,
test_invalid_bufsize, test_list2cmdline, test_no_leaking. Looking at
the tests, it seems only test_no_leaking could be problematic:
* test_list2cmdline checks whether the subprocess.list2cmdline function
  works correctly; only Python code is involved here.
* test_invalid_args checks whether using an option unsupported by a
  platform raises an exception; only Python code is involved here.
* test_invalid_bufsize only checks whether Popen rejects a non-integer
  bufsize; only Python code is involved here.
And unsurprisingly, that is the failing test:
test test_subprocess failed -- Traceback (most recent call last):
  File "/home/pybot/buildarea-sid/3.0.klose-debian-sparc/build/Lib/test/test_subprocess.py", line 423, in test_no_leaking
    data = p.communicate(b"lime")[0]
  File "/home/pybot/buildarea-sid/3.0.klose-debian-sparc/build/Lib/subprocess.py", line 671, in communicate
    return self._communicate(input)
  File "/home/pybot/buildarea-sid/3.0.klose-debian-sparc/build/Lib/subprocess.py", line 1171, in _communicate
    bytes_written = os.write(self.stdin.fileno(), chunk)
OSError: [Errno 32] Broken pipe
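As a rough sketch of the [reftotal] counting mentioned earlier, the
snippet below tallies "[N refs]" markers in a piece of log text. It
assumes each marker corresponds to one debug-build interpreter exiting;
the names count_reftotal_markers and sample_log are made up for
illustration, and mapping a count back to a specific test still has to
be done by reading test_subprocess.py.

```python
import re

# Matches markers such as "[31481 refs]" from the log excerpt.
REFTOTAL_MARKER = re.compile(r"\[(\d+) refs\]")


def count_reftotal_markers(log_text):
    """Return how many "[N refs]" markers occur anywhere in log_text."""
    return len(REFTOTAL_MARKER.findall(log_text))


# Lines copied from the failing log quoted in this message. Note that a
# marker can show up in the middle of a line when outputs interleave.
sample_log = """\
[31481 refs]
* ob
object : <refcnt 0 at 0x3a97728>
* op->_ob_next->_ob_prev
object : [31776 refs]
"""

print(count_reftotal_markers(sample_log))  # prints 2
```

The pattern is searched for anywhere in the text rather than at line
starts, because the dump above shows a marker landing after "object : ".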
It seems one of the spawned processes runs out of memory while
allocating a new PyUnicode object. I believe we don't see the usual
MemoryError because the parent process captures the stderr and stdout
of its children, so the child's error output never reaches the log.
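To illustrate why a child's MemoryError would not show up in the
buildbot log, here is a minimal sketch. It does not reproduce the sparc
failure; the child merely raises MemoryError on purpose, and the helper
name run_child_that_fails is hypothetical. With stderr redirected to a
pipe, the child's traceback ends up in the parent's buffer instead of on
the console.

```python
import subprocess
import sys


def run_child_that_fails():
    """Spawn a child that dies with MemoryError and capture its output."""
    # Simulated failure: the real children failed inside the interpreter,
    # this one simply raises the exception from Python code.
    child_code = "raise MemoryError('simulated allocation failure')"
    proc = subprocess.Popen(
        [sys.executable, "-c", child_code],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # communicate() drains both pipes and waits for the child to exit.
    out, err = proc.communicate()
    return proc.returncode, out, err


if __name__ == "__main__":
    returncode, out, err = run_child_that_fails()
    # The traceback is only visible if the parent chooses to print it.
    print("exit status:", returncode)
    print("captured stderr:", err.decode(errors="replace"))
```

Unless the parent prints the captured stderr, as the last line does
here, the only visible symptom is whatever fails on the parent's side,
such as a write to the pipe of a child that has already exited.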
Also, only klose-*-sparc buildbots are failing this way; loewis-sun is
failing too but for a different reason. So, how much memory is
available on this machine (or actually, on this virtual machine)?
Now, I wonder why manipulating the GIL caused the bug to appear in
3.0, but not in 2.x. Maybe it is related to the new I/O library in
Python 3.0.
Regards,
-- Alexandre
On Tue, Dec 30, 2008 at 4:20 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Does anyone have local access to a sparc machine to try to track down
> the ongoing buildbot failures in test_subprocess?
>
> (I think the problem is specific to 3.x builds on sparc machines, but I
> haven't checked the buildbots all that closely - that assessment is just
> based on what I recall of the buildbot failure emails).
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
> ---------------------------------------------------------------