[Async shutdown] Handle C++ code

Categories

(Toolkit :: Async Tooling, defect, minor)

The Mozilla Toolkit is a set of APIs, built on top of Gecko, which provide advanced services to XUL applications. These services include Profile Management, Chrome Registration, Browsing History, Extension and Theme Management, Application Update Service, and Safe Mode. (More info)

At the moment, AsyncShutdown.jsm is available only for JS code. I believe that we should rewrite its core as a C++ component and make it available for both C++ and JS consumers. This will be useful e.g. for places and mozStorage.

This would be very helpful for IPDL channel shutdown. Our current problem with channel shut down, is that destroying shared resources typically generates async messages, so we can't "just" close a channel. Right now we first need to send a sync message from the child to the parent that says "I will stop, destroy all of your resources", then when the sync message returns, we have the guarantee that all of the generated messages that were sent from the parent are already in our event loop, so we schedule a task in our own event loop to close the channel. The problem is that we are late in Gecko's shutdown sequence and we don't always have the guarantee that posting things to an event loop is still a valid thing to do.
So it'd be great if we had a way to do all of this asynchronously and express dependencies to make sure that the part of the xpcom shutdown that kills our ability to use event loops happens after all of the IPC channels are gone.
This (maybe) doesn't apply to b2g because IPDL setup/shutdown is different there than with mac/fennec.

It may be nice to have async shutdown in the video/audio playback code.
Our shutdown is async because we need to wait for our decoders, threadpools and other threads to shutdown, before our shutdown observer can return, and additionally there are some dependencies in the order in which our objects can shutdown.
I recently refactored our shutdown code to go through a single path in MediaShutdownManager.h. This centralizes all shutdown logic, and spins the event loop (http://dxr.mozilla.org/mozilla-central/source/content/media/MediaShutdownManager.cpp#134) waiting for all the threads/pools to shutdown.
Ideally I'd like to not have to join threads, but instead have an event run on the main thread once a thread has been shutdown. It's not immediately obvious how to do that correctly though.
We occasionally get bugs like 966966, but they're often caused by bugs in platform media libraries, or how we use them, causing threads to deadlock and thus not being able to be joined. Async shutdown won't help in those cases. Having async shutdown may make these cases harder to debug, as there won't be a callstack with a shutdown observer blocking on it waiting for a thread to die. So you can't tell if we've actually started the shutdown process or not as easily.
Is it obvious from the callstacks of shutdown hangs how far through the shutdown process we are? I'm thinking specifically in the case of tinderbox runs, where we inject crashes on hangs to get call stacks.

(In reply to Chris Pearce (:cpearce) from comment #3)
> Is it obvious from the callstacks of shutdown hangs how far through the
> shutdown process we are? I'm thinking specifically in the case of tinderbox
> runs, where we inject crashes on hangs to get call stacks.
For the moment, we have been dealing only with JavaScript code, so that's hard to say.

My initial plan was to rewrite AsyncShutdown entirely in C++ with a XPCOM API and a small JS layer on top of it, but I realize that we could just XPCOM-ify the existing (and already tested) JS code to make it usable by C++.
Benjamin, do you think that this makes sense?

Comment on attachment 8458303[details][diff][review]
xpcom.idl
Review of attachment 8458303[details][diff][review]:
-----------------------------------------------------------------
::: toolkit/components/asyncshutdown/nsIAsyncShutdown.idl
@@ +47,5 @@
> + /**
> + * Inform the blocker that the stage of shutdown has started.
> + * Shutdown will NOT proceed until `aOnReady` has been called.
> + */
> + void start(nsIAsyncShutdownCompletionCallback aOnReady);
If this function returned an nsresult, the asynchronous shutdown implementation could be simplified, because it wouldn't need to add code to handle synchronous failure. For example, it's possible that the function needs to post an event, and this could fail synchronously.
Also, if the function in nsIAsyncShutdownCompletionCallback accepted an nsresult argument, the JavaScript side could do a more detailed reporting of what happened in case of errors.

(In reply to :Paolo Amadini from comment #10)
> If this function returned an nsresult, the asynchronous shutdown
> implementation could be simplified, because it wouldn't need to add code to
> handle synchronous failure. For example, it's possible that the function
> needs to post an event, and this could fail synchronously.
Well, right now, the implementation is along the lines of
let promisified = () => new Promise(resolve => blocker.start(resolve));
AsyncShutdown.addBlocker(name, promisified, ...)
which handles errors and is a bit hard to simplify :)
> Also, if the function in nsIAsyncShutdownCompletionCallback accepted an
> nsresult argument, the JavaScript side could do a more detailed reporting of
> what happened in case of errors.
Good point, although `nsresult` would not be sufficient for most errors – especially if we decide to migrate the back-end to C++, we won't want to lose the ability to print out stacks, lineNumber, etc. Perhaps also a `nsIPropertyBag` would be more useful.

Comment on attachment 8458303[details][diff][review]
xpcom.idl
Review of attachment 8458303[details][diff][review]:
-----------------------------------------------------------------
API looks pretty reasonable, just a translation of what we already have today.
I am unfamiliar with crash reporting specifics; do we have to care about a scenario like:
1) Crash.
2) nsIAsyncShutdownService iterates through its barriers, and those through blockers, etc.;
3) We try to fetch the nsIPropertyBag for something;
4) Dynamic allocation of nsIPropertyBag happens, dynamic allocation of things inside it, which doesn't work all that great since we've already crashed.
I guess that's not a concern, because we already do something similar on the JS side with all the descriptions and extra properties and JSON there?
::: toolkit/components/asyncshutdown/nsIAsyncShutdown.idl
@@ +14,5 @@
> +
> +#include "nsISupports.idl"
> +
> +/**
> + * Callback invoked upon completino of an asynchronous operation.
Nit: "completion".
@@ +52,5 @@
> +
> + /**
> + * The current state of the blocker.
> + *
> + * In case of crash, this is converted to JSON and returned to
Incomplete description.
@@ +66,5 @@
> + /**
> + * Add a blocker.
> + */
> + void addBlocker(nsIAsyncShutdownBlocker aBlocker);
> + void removeBlocker(nsIAsyncShutdownBlocker aBlocker);
I am confused about what this method does, as it is lacking the fine documentation that |addBlocker| has. ;)
Document behavior if |aBlocker| has not been added via |addBlocker|?

(In reply to Nathan Froyd (:froydnj) from comment #12)
> Comment on attachment 8458303[details][diff][review]
> xpcom.idl
>
> Review of attachment 8458303[details][diff][review]:
> -----------------------------------------------------------------
>
> API looks pretty reasonable, just a translation of what we already have
> today.
>
> I am unfamiliar with crash reporting specifics; do we have to care about a
> scenario like:
>
> 1) Crash.
> 2) nsIAsyncShutdownService iterates through its barriers, and those through
> blockers, etc.;
> 3) We try to fetch the nsIPropertyBag for something;
> 4) Dynamic allocation of nsIPropertyBag happens, dynamic allocation of
> things inside it, which doesn't work all that great since we've already
> crashed.
>
> I guess that's not a concern, because we already do something similar on the
> JS side with all the descriptions and extra properties and JSON there?
You lost me here. AsyncShutdown doesn't mean to take care of crashes. If the crash has already happened, well, AsyncShutdown has crashed with the rest of the process.
> Document behavior if |aBlocker| has not been added via |addBlocker|?
Will do.

Changes from the previous version:
- there is now only one way to remove a blocker, and this is method `RemoveBlocker` (the callback has been removed);
- renamed `Start()` to `BlockShutdown()`, which should be more readable;
- method `BlockShutdown()` now passes the client and the client now has an attribute with the barrier name, both of which should make it easier to have the same object act as blocker for several barriers;
- instances of `nsIAsyncShutdownBlocker` should now have a *unique* name;
- instances of `nsIAsyncShutdownClient` now have a field `jsvalue` that may be used to get back to the AsyncShutdown.jsm of the client (I expect that this will become quite useful for e.g. Places that have both C++ and JS components that need to share the same barriers);
- added a new barrier `xpcomThreadsShutdown` (necessary for porting mozStorage to AsyncShutdown).

This is the meat of the patch.
A few small changes to AsyncShutdown.jsm & co:
- changed directory;
- as third argument, `addBlocker` now accepts not only a function but also an object which may contain line number, file name, stack information;
- split tests in two files;
- more tests.
nsAsyncShutdown.js is itself a rather straight proxy for AsyncShutdown.jsm into XPCOM-land.

As a proof of concept, port mozStorage to AsyncShutdown. This is a proof of concept, so the code is not entirely cleaned up, and I haven't checked all tests on Earth, but it passes all existing xpcshell tests.
The main advantages of this version are:
- crash reports if a database takes waaaay to long to close;
- in case of crash, the crash report contains the name (and state) of all databases that are still opened.
It would be quite feasible to extend mozIStorageService to provide a shutdown barrier that will let further clients block this specific shutdown, should they need to e.g. open/update/close a database just prior to mozIStorageService shutdown, but this went way beyond the patch.

Comment on attachment 8461110[details][diff][review]
1. XPIDL
I'm not entirely comfortable landing this without feedback from at least one potential user. Nical, could you take a look and tell me whether this API would fulfill your needs?

Comment on attachment 8461110[details][diff][review]
1. XPIDL
Review of attachment 8461110[details][diff][review]:
-----------------------------------------------------------------
::: toolkit/components/asyncshutdown/nsIAsyncShutdown.idl
@@ +26,5 @@
> + * If you wish to use AsyncShutdown, you will need to implement this
> + * interface (and only this interface).
> + */
> +[scriptable, uuid(4ef43f29-6715-4b57-a750-2ff83695ddce)]
> +interface nsIAsyncShutdownBlocker: nsISupports {
Sad that you would have to use XPIDL for a new interface in 2014. Have you considered going down the route of getting any blockers fixed to be able to use WebIDL instead?
Using WebIDL would be particularly beneficial for this interface because:
1) you want it to be useful for both C++ and JS callers; WebIDL gives you that; XPIDL generates clunky C++ interfaces.
2) using WebIDL might help removing some of the nsIPropertyBag/nsIVariant/jsval usage (see below).
If what's holding you back is that some things are not available late enough during shutdown, that should be fixable, and anyway XPCOM stops working too at some late stage during shutdown.
@@ +56,5 @@
> + * This field may be used to provide JSON-style data structures.
> + * For this purpose, use
> + * - nsIPropertyBag to represent objects;
> + * - nsIVariant to represent field values (which may hold nsIPropertyBag
> + * themselves).
Ouch, this problem, of passing JSON-like objects between C++ and JS, really needs to receive a simpler solution than that... Not your fault, but this is propagating unnecessary complexity in many places, like here.
@@ +93,5 @@
> + */
> + void addBlocker(in nsIAsyncShutdownBlocker aBlocker,
> + in AString aFileName,
> + in long aLineNumber,
> + in AString aStack);
Just a suggestion: don't you want to replace aFileName, aLineNumber and aStack by a single free-form string, "aCallSiteInfo"? In this way, you would leave the door open to adding more contextual information as appropriate, such as the value of a particular local variable in the caller.
@@ +112,5 @@
> + * `jsclient` instead of the `nsIAsyncShutdownClient`. See
> + * AsyncShutdown.jsm for more information on the JS version of
> + * this API.
> + */
> + readonly attribute jsval jsclient;
How carefully have you examined alternatives to having a jsval here? It's unfortunate to force users of your API to use the jsapi themselves.

(In reply to Benoit Jacob [:bjacob] from comment #32)
> Sad that you would have to use XPIDL for a new interface in 2014. Have you
> considered going down the route of getting any blockers fixed to be able to
> use WebIDL instead?
>
> Using WebIDL would be particularly beneficial for this interface because:
> 1) you want it to be useful for both C++ and JS callers; WebIDL gives you
> that; XPIDL generates clunky C++ interfaces.
> 2) using WebIDL might help removing some of the
> nsIPropertyBag/nsIVariant/jsval usage (see below).
>
> If what's holding you back is that some things are not available late enough
> during shutdown, that should be fixable, and anyway XPCOM stops working too
> at some late stage during shutdown.
Right, this could certainly have been done in WebIDL. I don't have enough experience with WebIDL to know whether the change would be useful.
Note that I don't want JS callers to use this API (except the jsval), as they a much more idiomatic API is available for them. This API is designed for C++ users. Would WebIDL help here?
> @@ +56,5 @@
> > + * This field may be used to provide JSON-style data structures.
> > + * For this purpose, use
> > + * - nsIPropertyBag to represent objects;
> > + * - nsIVariant to represent field values (which may hold nsIPropertyBag
> > + * themselves).
>
> Ouch, this problem, of passing JSON-like objects between C++ and JS, really
> needs to receive a simpler solution than that... Not your fault, but this is
> propagating unnecessary complexity in many places, like here.
In XPIDL, I believe that this is the only solution. Would anything in WebIDL make it less clumsy for the C++ caller and the JS implementation?
> @@ +93,5 @@
> > + */
> > + void addBlocker(in nsIAsyncShutdownBlocker aBlocker,
> > + in AString aFileName,
> > + in long aLineNumber,
> > + in AString aStack);
>
> Just a suggestion: don't you want to replace aFileName, aLineNumber and
> aStack by a single free-form string, "aCallSiteInfo"? In this way, you would
> leave the door open to adding more contextual information as appropriate,
> such as the value of a particular local variable in the caller.
Well, both the CrashManager and TBPL make use of the file name and line number to sort crash signatures. What you suggest would complicate their task. Since we already have the ability to put anything in the nsIPropertyBag, I don't think that there would be any gain here.
> @@ +112,5 @@
> > + * `jsclient` instead of the `nsIAsyncShutdownClient`. See
> > + * AsyncShutdown.jsm for more information on the JS version of
> > + * this API.
> > + */
> > + readonly attribute jsval jsclient;
>
> How carefully have you examined alternatives to having a jsval here? It's
> unfortunate to force users of your API to use the jsapi themselves.
Native clients of this API do not need to manipulate that jsval, ever. The only clients that need to manipulate this jsval are JS clients of instances of nsIAsyncShutdownClient that wish to access the more idiomatic JS API instead of XPIDL API. Does this make sense to you?

(In reply to David Rajchenbach-Teller [:Yoric] (use "needinfo") from comment #33)
> (In reply to Benoit Jacob [:bjacob] from comment #32)
> > Sad that you would have to use XPIDL for a new interface in 2014. Have you
> > considered going down the route of getting any blockers fixed to be able to
> > use WebIDL instead?
> >
> > Using WebIDL would be particularly beneficial for this interface because:
> > 1) you want it to be useful for both C++ and JS callers; WebIDL gives you
> > that; XPIDL generates clunky C++ interfaces.
> > 2) using WebIDL might help removing some of the
> > nsIPropertyBag/nsIVariant/jsval usage (see below).
> >
> > If what's holding you back is that some things are not available late enough
> > during shutdown, that should be fixable, and anyway XPCOM stops working too
> > at some late stage during shutdown.
>
> Right, this could certainly have been done in WebIDL. I don't have enough
> experience with WebIDL to know whether the change would be useful.
>
> Note that I don't want JS callers to use this API (except the jsval), as
> they a much more idiomatic API is available for them. This API is designed
> for C++ users. Would WebIDL help here?
If most of this API is only meant to be called from C++, then why does it need to be reflected in any kind of IDL? I seem to be missing something basic here.

(In reply to Benoit Jacob [:bjacob] from comment #34)
> > Note that I don't want JS callers to use this API (except the jsval), as
> > they a much more idiomatic API is available for them. This API is designed
> > for C++ users. Would WebIDL help here?
>
> If most of this API is only meant to be called from C++, then why does it
> need to be reflected in any kind of IDL? I seem to be missing something
> basic here.
Because we have an implementation in JS that has landed in bug 941918 ~1 year ago and is now used pretty pervasively in JS-based code. By now, we are pretty sure that it works, so reusing that code instead of reimplementing it from scratch is, I believe, the right thing to do.

I see now, thanks for the explanation. That sounds reasonable, but I am not used at all to thinking of bindings going in that direction --- I only have experience with the other direction :-) So I may not be a great choice of person to give feedback on that. At least I don't see anything wrong at this point :-) I was initially concerned about adding a shutdown dependency on the JS engine, but I see (http://dxr.mozilla.org/mozilla-central/source/xpcom/build/nsXPComInit.cpp?from=shutdownxpcom&case=true#) that it is shut down quite late in ShutdownXPCOM. At the moment, some IPC work takes place even later, but so far we've not being shy of blocking ShutdownXPCOM as needed, so we could do that again.

The approach here is noble: this API allows to build a shutdown sequence "graph" in an almost "declarative" way. That would be a nice improvement over the current situation where on shutdown we run huge functions such as ShutdownXPCOM and depend on the precise ordering of the 100 different things that that function does. Another kind of code that this could usefully supersede is the manual barriers that we have in the shutdown code for ImageBridgeChild and CompositorParent.
So yes, if this actually gets used, this could help make the shutdown code a lot easier to maintain. Kudos for that!
Ways in which this could not help so much:
- details of IPDL/actors/chromium-ipc
- adding a dependency on the JavaScript engine to things that were so far unrelated to it, should be thought of as... a liability :-)
Does that answer your question? It's all I can think of at this point.

Comment on attachment 8461123[details][diff][review]
4. (Proof of Concept) Porting mozStorage to AsyncShutdown
Moved this to bug 1064888. At the moment, the patch still causes some tests to fail, so I'll wait until I have fixed these tests before I ask for feedback/review.

(In reply to David Rajchenbach-Teller [:Yoric] (use "needinfo") from comment #39)
> As for IDL/actors/chromium-ipc, well, what could make your life simpler?
Nothing that would fall in the scope of this bug :-)
(I've rarely found that a hierarchical, object-oriented actor model like offered by IPDL was a good match for the kind of IPC that we needed to do).