Created attachment 718249[details]
GDB session log that shows stacktrace and some values
During the test run of thunderbird using "make mozmill" test, I have
encountered uninitialized memory usage errors T
where a service is called via XPCOM interface.
It looks to me that JavaScript interpreter tries to import a value,
created *OUTSIDE* the interpreter itself, into the
internal data structure of JavaScript interpreter at runtime. But
unfortunately, despite the data tag telling the interpreter to handle
it as double, the passed value is actually uninitialized garbage.
I wonder how I can debug this further. It seems to occur
at the interfade of XPCOM -> JavaScript.
Background:
I noticed uninitialized memory usage errors when I ran
TB (compiled source from comm-central under 32-bit Debian GNU/Linux).
I ran the test "make mozmill" by running TB under valgrind/memcheck
and encounted the problem(s) reported here.
Since I could not load all the symbol files using --read-var-info due
to the limited 32 bits memory space, and obtained detailed enough
information about the variable or memory buffer, etc. where the
uninitialized memory is located, I inserted valgrind probe into
VALGRIND_CHECK_MEM_IS_DEFINED() (should return 0 if memory is valid
and defined) suspicious areas so that I can abort() under "make
mozmill" test framework if valgrind determines that an uninitialized
value is used. This makes it easier for me to trace exactly what
values are used, etc. "make mozmill" test framework waits for five
minutes before running the next target, and in that 5 minutes, I can
use remote gdb to attach to the aborted process image and look at
stack and the like.
Ok, this is the background.
After many debug sessions, one problem still persist and is for real
while one other mysterious case noticed earlier disappeared during
investigation. Maybe the latest source changes in the last couple of
weeks solved it (or GCC miscompilation is now avoided due to subtle
code changes due to the insertion of valgrind probes. If so, this is a
problem. I will investitage this further and report later on about
this issue.) This disappearing case(s) kept me from reporting the problem here until today.
Problem:
It turns out that Java interpreter tries to convert non-initialized 8
byte area into a double in NativeData2JS routine.
The problem is noticed in the following routine DOUBLE_TO_JSVAL(). This is an inline function in mozilla/js/public/Value.h
static MOZ_ALWAYS_INLINE jsval
DOUBLE_TO_JSVAL(double d)
{
/*
* This is a manually inlined version of:
* d = JS_CANONICALIZE_NAN(d);
* return IMPL_TO_JSVAL(DOUBLE_TO_JSVAL_IMPL(d));
* because GCC from XCode 3.1.4 miscompiles the above code.
*/
jsval_layout l;
*=> MOZ_ASSERT(VALGRIND_CHECK_MEM_IS_DEFINED(&d, sizeof(d)) == 0);
if (MOZ_UNLIKELY(d != d))
l.asBits = 0x7FF8000000000000LL;
else
l.asDouble = d;
return IMPL_TO_JSVAL(l);
}
Note my insertion of MOZ_ASSERT() and VALGRIND_CHECK_MEM_IS_DEFINED()
macros in the quoted code above.
Basically the problem is the passed value d is uninitialized.
[Now that I think about it, can "d != d" check have something to do with the
uninitialized value issue? The way code is written I suspect it must be related to
some sort of NaN handling. Hmm...]
The original comment in the source file makes me wonder if there is
another GCC miscompilation issue, but I think this time the problem
occurs earlier than the call to this function itself and seems to be
related to an uninitialized memory area that is passed by XPCOM
interface (and the data probably can be traced to the remote service
that sits behind the XPCOM interface mechanism.)
The above routine is called in XPCConvert::NatvieDATA2JS() [
mozilla/js/xpconnect/src/XPCConvert.cpp ]
JSBool
XPCConvert::NativeData2JS(XPCLazyCallContext& lccx, jsval* d, const void* s,
const nsXPTType& type, const nsID* iid, nsresult* pErr)
{
NS_PRECONDITION(s, "bad param");
NS_PRECONDITION(d, "bad param");
JSContext* cx = lccx.GetJSContext();
// Allow wrong compartment or unset ScopeForNewObject when the caller knows
// the value is primitive (viz., XPCNativeMember::GetConstantValue).
NS_ABORT_IF_FALSE(type.IsArithmetic() ||
js::IsObjectInContextCompartment(lccx.GetScopeForNewJSObjects(), cx),
"bad scope for new JSObjects");
if (pErr)
*pErr = NS_ERROR_XPC_BAD_CONVERT_NATIVE;
switch (type.TagPart()) {
case nsXPTType::T_I8 : *d = INT_TO_JSVAL(int32_t(*((int8_t*)s))); break;
case nsXPTType::T_I16 : *d = INT_TO_JSVAL(int32_t(*((int16_t*)s))); break;
case nsXPTType::T_I32 : *d = INT_TO_JSVAL(*((int32_t*)s)); break;
Here=> case nsXPTType::T_I64 : *d = DOUBLE_TO_JSVAL(double(*((int64_t*)s))); break;
case nsXPTType::T_U8 : *d = INT_TO_JSVAL(int32_t(*((uint8_t*)s))); break;
case nsXPTType::T_U16 : *d = INT_TO_JSVAL(int32_t(*((uint16_t*)s))); break;
case nsXPTType::T_U32 : *d = UINT_TO_JSVAL(*((uint32_t*)s)); break;
case nsXPTType::T_U64 : *d = DOUBLE_TO_JSVAL(double(*((uint64_t*)s))); break;
case nsXPTType::T_FLOAT : *d = DOUBLE_TO_JSVAL(*((float*)s)); break;
case nsXPTType::T_DOUBLE: *d = DOUBLE_TO_JSVAL(*((double*)s)); break;
case nsXPTType::T_BOOL :
{
bool b = *((bool*)s);
When the problem is noticed that (void*) |s| points to an uninitialized malloc'ed area consisting of 8 octets.
I upload the ellided gdb session record that allowed me to figure out what was happening as attachment.
Up a few call chain, I see the call to a function, CallMethodHelper::GatherAndConvertResults() in
mozilla/js/xpconnect/src/XPCWrappedNative.cpp
and the file has a comment near the beginning
/* Wrapper object for reflecting native xpcom objects into JavaScript. */
So I think the problem is in the interface between XPCOM and JavaScript. Or that the routine on the other side of XPCOM forgets to return a valid value and instead leaves uninitialized memory in the data passed to the XPCOM interface. Ugh, really hard bug to debug.
In my log, I see three such occurrences of the uninitialized
usage, and all these problems originate from
GatherAndConvertResults(). I only checked the first abort() using gdb
so far. But I am farily sure that the conversion routine is the culprit in all cases from the previous session logs. (Except for the mysterious disappearing case I earlier mentioned.)
In my use of valgrind, I instruct valgrind to fill newly malloc'ed area with 0xA5: the options passed valgrind were
"--malloc-fill=0xA5 "
"--free-fill=0xC3 "
(Note 0xA5 = 0245. This value is printed when accessing non-initialized octet.)
# This is the compilation option in my MOZCONFIG which is used for
# compling TB. (There are other options, but basically this is it.)
# I also enabled debug. So TB is debug build version.
ac_add_options --enable-valgrind
ac_add_options --enable-optimize="-g -O -freorder-blocks"
ac_add_options --disable-jemalloc
I set the following before running "make mozmill". I found the tip in mozilla web page for running valgrind/memcheck for mozilla software.
# G_SLICE=always-malloc
export G_SLICE
Version of comm-central I am using:
/home/ishikawa/TB-NEW/TB-3HG/new-src
hg identify
db8918e79907+ tip
/home/ishikawa/TB-NEW/TB-3HG/new-src/mozilla
hg identify
3b0b1092f9b4+ tip
Here is the back trace and values of a few variables for debugging from within gdb.
If anyone has a good idea how to debug this further BEYOND this XPCOM interface, I will appreciate it very much.
I want to know WHAT JS file, what line, etc., the JavaScript interpreter is calling, etc. Maybe this time, we need to learn WHAT SERVICE is invoked using XPCOM interface?
In a follow-up to an earlier news post, it was suggested
that calling DumpJSStack() may give us a clue.
(mozilla.dev.platform <69WdnZv7fKpGqbbMnZ2dnUVZ_uOdnZ2d@mozilla.org>)
As for service, I think the particular instance which is investigated
by the gdb session must be related to
==27358== Uninitialised value was created by a heap allocation
==27358== at 0x40273B8: malloc (vg_replace_malloc.c:270)
==27358== by 0x404BE13: moz_xmalloc (mozalloc.cpp:54)
==27358== by 0xA4563AF: nsNewsDatabaseConstructor(nsISupports*, nsID const&, void**) (mozalloc.h:200)
==27358== by 0xAD86523: mozilla::GenericFactory::CreateInstance(nsISupports*, nsID const&, void**) (GenericFactory.cpp:16)
==27358== by 0xAD7ADC9: nsCreateInstanceByContractID::operator()(nsID const&, void**) const (nsComponentManagerUtils.cpp:178)
==27358== by 0xA461165: nsCOMPtr<nsIMsgDatabase>::assign_from_helper(nsCOMPtr_helper const&, nsID const&) (nsCOMPtr.h:1226)
==27358== by 0xA6DF17F: nsMsgDBService::CreateNewDB(nsIMsgFolder*, nsIMsgDatabase**) (nsCOMPtr.h:605)
I suspect somewhere in the nsMsgDBService, a double is returned, but where?
I will investigate the use of DumpJSStack() and report the
progress later, but if there are other tips in debugging
XPCOM -> JavaScript interface, I will appreciate it very much.
I would like my mailer to be rock solid.
Seeing that it generates uninitialized memory usage warnings is not
very reassuring, is it?
TIA

The cause of this appears to be straightforward.
nsMsgDatabase has an m_lastUseTime member which does not appear to get initialised.
So if someone calls GetLastUseTime before SetLastUseTime or GetMsgHdrForKey then they will get an uninitialised read.
The fix would appear to be to initialise m_lastUseTime, but I'm not sure whether to initialise it to 0 or to gLastUseTime.
Also, thanks to chiaki for finding this bug.

Created attachment 720200[details][diff][review]
A patch to call RemeberLastUseTime in CreateNewDB
Hi,
I have been running "make mozmill" test with the attached patch.
I have figoured out that the m_lastUseTime is not set when a new DB is
created and thus the problem described by Neil.
The patch seems to have cured the particular problem. The test is still running (sinceI am running TB under valgrind, it is slow.)
There were three such un-initialised cases noticed during the previous test run.
One down, two more to go.
So if the remaining two would not show up in the log, I think, we can claim we fixed it.
I would like to thank Neil again for the helpful tips for calling DumpJSStack()
in strategic place.
TIA

Hi, (In reply to ISHIKAWA, chiaki from comment #4)
> Created attachment 720200[details][diff][review]> One down, two more to go.
> So if the remaining two would not show up in the log, I think, we can claim
> we fixed it.
The particular problem about uninitialized value usage is gone.
I have finished running "make mozmill" of TB under valgrind/memcheck.
(I have noticed that another bug is now recognized in check() routine, which
probably was masked by this earlier occurrence of the uninitialised value error. I will file the bugzilla entries for them later).
So who should be asked to do the review for this patch?
TIA

Comment on attachment 720200[details][diff][review]
A patch to call RemeberLastUseTime in CreateNewDB
I am setting David as the reviewer since the particular position seems to be
last modified by him.
The patch fixed the particular issue of undefined value accessed during "make mozmill" run of comm-central TB under valgrind/memgrind.
TIA