Default compartments make our book keeping tricky in a number of ways:
* When we're in the default compartment, enterCompartmentDepth is zero but cx->compartment() is non-null.
* The default compartment on a cx can change at any time, even when we've already got code running on that cx.
* The default compartment is actually tracked by a "default compartment object", which is the outer window in Gecko. But this object gets transplanted, so even if it stays the same, the compartment we compute from it varies with time.
Meanwhile, saveFrameChain slurps up the current compartment and enterCompartmentDepth of a cx and pushes it onto a stack, putting the cx in whatever compartment currently holds the associated default compartment object. restoreFrameChain pops the stack and resets the previous compartment and enterCompartmentDepth.
This whole setup significantly complicates our attempt to mark all the live globals in the system. Currently, we do three things:
(A) (In the JS Engine): Mark the global associated with each frame on the stack.
(B) (In the JS Engine): Iterate over all the compartments, and mark those we've called enter() on.
(C) (In XPConnect): Iterate over all the contexts, and mark the global associated with the default compartment object.
This usually works. But it breaks in the following case:
1 - cx->enterCompartmentDepth is zero, and cx is in the default compartment. There is no code running on cx.
2 - We push cx, causing us to call saveFrameChain, stashing the tuple (0, defaultCompartmentObject->compartment()) onto the SavedFrameChain stack.
3 - We either call JS_SetGlobalObject(cx, someOtherObject), or transplant defaultCompartmentObject to another compartment.
4 - We pop cx, causing us to call restoreFrameChain, setting cx->enterCompartmentDepth to 0, and putting cx in the old compartment, which is no longer equal to cx->defaultCompartmentObject->compartment().
After (3), nothing is keeping the saved compartment alive (See A-C above):
(A) No code was running in (1), we don't have any stack frames to trace.
(B) We never call enter() on default compartments.
(C) The compartment doesn't match that of the defaultCompartmentObject on the cx, so the XPConnect hook misses it.
We can keep the compartment alive between (3) and (4) by calling enter() on all the compartments we push onto the SavedFrameChain stack (luke suggested this on monday). But once (4) happens, we'd have to call leave(), and still be screwed.
I then thought that we could check in saveFrameChain whether we either (a) have manually entered the current compartment or (b) have a non-empty scope-chain in the current compartment. If we have neither of the above, we'd set the compartment of the SavedFrameChain to null, and compute the default compartment when it comes time to actually use it. But if (b) is the case, then we're still in trouble, because as soon as the associated code is finished running, the cx will remain in that compartment, but we'll have no way to mark it (since none of A-C will apply once the last stack frame is popped).
In general, the solution here is to rip out default compartment objects, and I have a plan to do that. But I'd like to find a well-contained fix here that we can backport. Thoughts, luke?

Nice catch with the problem of enter()/leave() on save/restore.
This defaultCompartment stuff is madness. Agreed that the long-term fix is to remove it all, but how about this as an intermediate solution: we have cx->compartment = NULL by default (until a compartment is explicitly via AutoCompartment). This means adding "AutoCompartment ac(cx, GetDefaultGlobalForContext(cx)" in the few remaining places that depend on the defaultCompartmentObject. (So JS_SaveFrameChain would reset cx->compartment = NULL.)
This would also help out bug 722345 which wants to remove the request API to use "cx->compartment == NULL" in place of "cx->outstandingRequests > 0". I started to do this but I didn't know enough to pick the right compartment to enter in all cases; istr you saying this would be much easier now.

(In reply to Luke Wagner [:luke] from comment #1)
> This defaultCompartment stuff is madness. Agreed that the long-term fix is
> to remove it all
Yes.
> but how about this as an intermediate solution: we have
> cx->compartment = NULL by default (until a compartment is explicitly via
> AutoCompartment). This means adding "AutoCompartment ac(cx,
> GetDefaultGlobalForContext(cx)" in the few remaining places that depend on
> the defaultCompartmentObject. (So JS_SaveFrameChain would reset
> cx->compartment = NULL.)
This is tantamount to "removing it all", AFAICT. We'd either need to fix a bunch of consumers to use JSAutoCompartments, or fix a bunch of places in the JS engine to handle cx->compartment being null. I have a plan for doing the former, but I don't think it's easily backportable. Do you have any ideas for a more localized fix?

(In reply to Bobby Holley (:bholley) from comment #2)
> This is tantamount to "removing it all", AFAICT.
Well, except that there is still a GetDefaultGlobalForContext/JS_SetGlobalObject available. I thought that any path into the JS engine already had to funnel through one of a few of this CxPusher paths (which is where I assumed you'd add the AutoCompartment).
> We'd either need to fix a
> bunch of consumers to use JSAutoCompartments, or fix a bunch of places in
> the JS engine to handle cx->compartment being null.
The former is what I intended; you have to enter a compartment to enter the JS engine.
> I have a plan for doing
> the former, but I don't think it's easily backportable. Do you have any
> ideas for a more localized fix?
Nope, this is all pretty gnarly. OTOH, other than these ASan bugs, do you think this is crashing much in the wild? If not, letting the fix ride the trains seems preferable.

(In reply to Luke Wagner [:luke] from comment #3)
> (In reply to Bobby Holley (:bholley) from comment #2)
> > This is tantamount to "removing it all", AFAICT.
>
> Well, except that there is still a
> GetDefaultGlobalForContext/JS_SetGlobalObject available. I thought that any
> path into the JS engine already had to funnel through one of a few of this
> CxPusher paths (which is where I assumed you'd add the AutoCompartment).
As of recently yeah, but that doesn't help us on branches.
> > I have a plan for doing
> > the former, but I don't think it's easily backportable. Do you have any
> > ideas for a more localized fix?
>
> Nope, this is all pretty gnarly. OTOH, other than these ASan bugs, do you
> think this is crashing much in the wild? If not, letting the fix ride the
> trains seems preferable.
I'm concerned about letting a pretty exploitable gc hazard go unfixed for the lifetime union of all our supported branches. It was less detectable before when we were pushing/popping less. But now that it is, it seems pretty likely that somebody is going to found it. Nils found it, and a lot of those tools are open-source.
Is there a choke-point where the last frame pops off the callstack for a given cx, and would doing a little bit of work there be acceptable performance-wise? If so, we could do the following:
* If restoreFramechain pops a SavedFrameChain with enterCompartmentDepth == 0, it throws away the stashed compartment and uses the current defaultCompartmentObject, _unless_ there are frames on the stack.
* In the event of the above, we rely on code in the aforementioned choke point to reset cx->compartment based on the defaultCompartmentObject when the stack becomes empty.
I know the above is kind of gross, but the code complexity will be short-lived, since we'll follow it up with a removal of all this gunk. But if it's feasible from a performance standpoint, I think we should do it for the sake of branch security.

(In reply to Bobby Holley (:bholley) from comment #4)
> Is there a choke-point where the last frame pops off the callstack for a
> given cx, and would doing a little bit of work there be acceptable
> performance-wise? If so, we could do the following:
Yeah, on branches, ContextStack::popSegment() (if !cx->hasfp()), on trunk/aurora, Activation::~Activation() (if !currentlyRunning()).
> * If restoreFramechain pops a SavedFrameChain with enterCompartmentDepth ==
> 0, it throws away the stashed compartment and uses the current
> defaultCompartmentObject, _unless_ there are frames on the stack.
I didn't consider this an option because it would change observable behavior (what if someone assumed that, after the JS_RestoreFrameChain that the compartment was as it was before JS_SaveFrameChain? Maybe you understand more about this, but in the abstract this seems scary for a Beta branch patch.

(In reply to Luke Wagner [:luke] from comment #5)
> (In reply to Bobby Holley (:bholley) from comment #4)
> > Is there a choke-point where the last frame pops off the callstack for a
> > given cx, and would doing a little bit of work there be acceptable
> > performance-wise? If so, we could do the following:
>
> Yeah, on branches, ContextStack::popSegment() (if !cx->hasfp()), on
> trunk/aurora, Activation::~Activation() (if !currentlyRunning()).
Ok, I will give that a try.
> I didn't consider this an option because it would change observable behavior
> (what if someone assumed that, after the JS_RestoreFrameChain that the
> compartment was as it was before JS_SaveFrameChain? Maybe you understand
> more about this, but in the abstract this seems scary for a Beta branch
> patch.
Hm. I guess it might be an issue if someone was in the default compartment on a cx, allocated an object, then did a scoped push/pop of that cx, then used that object. That doesn't seem super likely, but it's hard to say. I'm going to give this some thought and dig into the code a bit.

Talked with luke about this. I think we have no other good option but to remove default compartment objects. We can backport that to aurora24 (and definitely should, given esr24), but probably not anything earlier. dveditz is in the loop on this too.

As an update - I'd run into some nasty GC crashes in IPC xpcshell tests, which I fixed by spending a week gutting XPCShellTestEnvironment and friends to be much simpler and single-cx-ready (bug 889714). The patches in this bug are now green. Uploading and flagging for review.

Created attachment 773637[details][diff][review]
Part 2 - Rejigger the string manipulation in OnJSContextNew to avoid depending on being in a compartment. v1
The current code makes calls that assume (implicitly, via assertions) that |cx|
is in a compartment, which isn't a valid assumption going forward.

Created attachment 773639[details][diff][review]
Part 3 - Null-check compartment() in JS_GetGlobalForScopeChain(). v1
cx->global() assumes a non-null compartment(). When we fix up various bugs
related to being in a compartment when we shouldn't be, we start to crash here.
Fix it.

Created attachment 773640[details][diff][review]
Part 4 - Move faulty JSAutoRequest in initSelfHosting. v1
The call to JS_SetGlobalObject causes cx->compartment_ to be set to the self-
hosting global, which means that the JSAutoCompartment picks up that compartment
as the 'previous' compartment. So despite the attempt to restore things with
JS_SetGlobalObject at the end of the function, the JSAutoCompartment destructor
actually ends up leaving cx in the self-hosting global's compartment at the end
of this function. Moving the JSAutoCompartment construction above the call to
JS_SetGlobalObject fixes the problem.

Created attachment 773645[details][diff][review]
Part 8 - Add a JSAutoCompartment to AutoCxPusher. v1
This should hopefully take care of any cases where consumers expect to be in
the default compartment.

Created attachment 773646[details][diff][review]
Part 9 - Enter a compartment between manual calls to JS_{Save,Restore}FrameChain. v1
The stuff in nsXBLProtoImplMethod is doing the same thing, so let's just have
it call into nsJSUtils.

Created attachment 773653[details][diff][review]
Part 15 - Stop setting the compartment to defaultCompartmentObject_->compartment(). v1
With this change, defaultCompartmentObject_ is just an opaque (though traced)
piece of embedder state on the cx.

Created attachment 773657[details][diff][review]
Part 16 - Require cx->compartment() to be null when destroying a context. v1
If it's non-null, that means we've got a JSAutoCompartment on the stack, which
is going to run into trouble when it tries to restore the old compartment on
the now-dead cx.

Comment on attachment 773651[details][diff][review]
Part 13 - Conditionally enter a compartment in WorkerPrivate::DoRunLoop. v1
Review of attachment 773651[details][diff][review]:
-----------------------------------------------------------------
::: dom/workers/WorkerPrivate.cpp
@@ +1696,5 @@
> + // lazily create a global, in which case we need to be in its compartment
> + // when calling PostRun() below. Maybe<> this time...
> + if (mTarget == WorkerThread && ac.empty() &&
> + js::GetDefaultGlobalForContext(cx))
> + {
Nit: { on previous line
@@ +2869,5 @@
> + // "operate in the compartment of the worker global if it exists"
> + // behavior here. This could probably be improved with some refactoring.
> + Maybe<JSAutoCompartment> maybeAC;
> + if (js::GetDefaultGlobalForContext(aCx))
> + maybeAC.construct(aCx, js::GetDefaultGlobalForContext(aCx));
So I don't know if it's a good idea to do this each time in the loop. Couldn't we just do it the first time and be done?
Also, nit: Use braces

Comment on attachment 773642[details][diff][review]
Part 6 - Don't rethrow in quickstubs if there's already an exception pending. v1
This is in xpc, so your call, but I assume the "we used to" comment will become stale/irrelevant in a few months and so maybe you'd rather remove it (and but it in the cset comment for anyone that notices the change and hg annotates).

(In reply to Luke Wagner [:luke] from comment #34)
> Comment on attachment 773642[details][diff][review]
> Part 6 - Don't rethrow in quickstubs if there's already an exception
> pending. v1
>
> This is in xpc, so your call, but I assume the "we used to" comment will
> become stale/irrelevant in a few months and so maybe you'd rather remove it
> (and but it in the cset comment for anyone that notices the change and hg
> annotates).
Quickstubs will be gone in a few months, so I think it should be ok.

Created attachment 776726[details][diff][review]
Part 7 - Use the new AutoCompartment overload for the atoms compartment and remove AutoEnterAtomsCompartment. v2
We have to do some temporary hackiness to deal with some of the new PJS work.
This patch stays as true to the old world as possible, so that we can more
easily backport it.

Comment on attachment 773658[details][diff][review]
Part 17 - Reorder some bookkeeping and assert in setCompartment that both the old and new compartments are marked as entered. v1
Flagging these patches, along with dependencies, for Firefox 24 uplift, given that this fixes a major otherwise unfixable sec-critical that fuzzers are starting to find (see dependent bug). dveditz is in the loop here, and definitely wants this on 24 (see comment 7).
This request applies to the patches in this bug, as well as those in bug 889714 and bug 889911. Note that the fixes in those bugs are _only_ needed to fix failures in the IPC XPCshell test harness, and they largely only affect testing infrastructure.
[Approval Request Comment]
Bug caused by (feature/regressing bug #): longstanding
User impact if declined: sec-critical
Testing completed (on m-c, etc.): Just landed on m-c
Risk to taking this patch (and alternatives if risky): Moderate risk, but no less-risky alternatives
String or IDL/UUID changes made by this patch: None

Comment on attachment 773658[details][diff][review]
Part 17 - Reorder some bookkeeping and assert in setCompartment that both the old and new compartments are marked as entered. v1
Approving, given we need this on Fx24.
Given, moderate risk here if there is any QA testing/verification needed here that could mitigate that , please loop in QA here(Matt Wobensmith). And an eyeout on possible regression this may have in the next couple of days will be helpful as well, given the bake time on m-c.

(In reply to bhavana bajaj [:bajaj] from comment #61)
> :bholley looks like this will need a backport to mozilla-b2g18 given its
> still affected. Can you please help with a follow-up patch?
This is not backportable. See comment 7.

Note

You need to
log in
before you can comment on or make changes to this bug.