On a B2G system, there is a main "chrome" process that communicates with a number of "content" processes. If our tools could use a single connection established with the main chrome process to also communicate with whatever content processes were present, that would make a pleasant developer experience easier.
The main process and child processes should each run their own remote debugging protocol servers. The main process's root actor, in addition to advertising those actors available for inspection in the main process itself, should list "tunnel actors" for the child processes' root actors: the main process forwards messages addressed to a tunnel actor to the given child process, and forwards responses back.
We need to design, implement, and test extensions to the remote debugging protocol for enumerating child processes, attaching to and detaching from tunnel actors, and exchanging messages through the tunnel.

This is how I have imagined this will work, better ideas are more than welcome.
In Fennec and desktop Firefox we have a root actor with a number of tab actors as its children. For consistency we have maintained this duality in b2g as well, even though there was only one "tab" open at any time. We can put this common form to good use now, considering content processes as tabs and creating a tab actor for each new process, so that the remote protocol clients can be oblivious to this complexity.
Since actor pools are associated with connections, and intra-server communication will require a different connection, we will need to maintain a separate 'backbone' connection in the main server with its own actor pool, where the actors of subordinate servers will reside. So the tab actors in the connection to the remote protocol client will act as intermediaries to the root actors in the backbone pool. When the main server receives from the debugger frontend an 'attach' request to a tab actor, it will generate a subsequent 'attach' request to the subordinate server's single tab actor through the backbone connection. The response from the subordinate server's tab actor will be sent to the main server's tab actor and subsequently transmitted to the debugger frontend.
Now DebuggerServer or BrowserRootActor instances will need to be passed a parameter that differentiates between main and subordinate server operation. In the case of the subordinate server, the first order of business will be to open a client connection to the main server and send a registration packet, like this:
{ to: "root", type: "registerServer", url: "...", title: "..." }
URL and title are the essential properties of a tab actor and will be populated by these values. In case the URL of the content window is not available to the client, it can send a windowID and the main server will maintain a map of URLs to IDs.
Upon reception of such a packet, the BrowserRootActor will create both a subordinate actor in the backbone pool that will act as a local proxy for the subordinate server, and a regular tab actor in the client-facing pool.
The same pattern will be applied to all other children of a tab actor: threads/contexts, consoles, etc. A client-facing console acts like a proxy to the subordinate actor that transfers the requests to the console actor in the content process. Even addon-provided tab actors can be used this way, assuming we want to support that in b2g.
Subordinate actors should have not much more than an onRequest/onResponse pair of handlers for the requests coming from the tool frontend and the responses coming from the content process.
How does that sound?

(In reply to Panos Astithas [:past] from comment #4)
> How does that sound?
It does sound like it will use a lot of memory on devices where memory is a scarce resource. It would be better imho to just have the 'attach' request connect the client to the subordinate server more directly: the stream coming from the client would be sent directly to the subordinate server, and vice versa for the responses. IOW, the main server would only act as a tcp proxy (and this would be better handled at the C++ level).

I like Mike's idea of keeping the main process pretty dumb about what's going on in the child process, and just shuttling data back and forth. If we can avoid creating a proxy object in the main process for each actor in the child, that'd be great.
Also, since the main process already has a list of child processes around somewhere, it would be nice if the main process's root actor could list them, but create connections only on demand.
I'll start on a proposed spec for this. Ditch me if I don't have anything up by Friday.

Created attachment 698888[details][diff][review]
Rough draft: communicate with Firefox OS subprocesses via the debugging protocol
Here's a first draft. It's halfway between Panos's and Mike's suggestion, in that subprocesses and the main process do share a single connection, but the main process shuffles data back and forth packet by packet, not as uninterpreted bytes.

Great progress, thanks Jim!
I wish we could keep the client agnostic to the intricacies of subprocesses, but it seems that would be less clean from a protocol POV. I've been thinking about how we could make the client treat desktop, fennec and b2g in a more uniform way, so here are a few ideas that I came up with.
One thing we could easily change is to use a |current| tab index in place of the |selected| index for desktop as well. With a sufficient deprecation period of course.
Subprocesses and tabs seem almost interchangeable, so I wonder if normalizing them to |subordinates|, |children| or somesuch would work, and reduce the need for special-casing b2g in the client.
On a related note, desktop will soon need to deal with WebWorkers, which will need to be presented in a similar way. Moreover, a debugger server running on desktop will return both tabs and workers (possibly nested inside tabs). Maybe we could map this model on b2g as well? A tabs array with a single value that contains subprocesses could be one way to do it.
subprocessListChanged seems similar to the newGlobal notification currently emitted by the debugger, except that it requires an extra step to get the actor reference(s) and it occurs only once (per listSubprocesses call). Could we perhaps combine those two so that the client can treat them in a uniform way?

(In reply to Panos Astithas [:past] from comment #12)
> One thing we could easily change is to use a |current| tab index in place of
> the |selected| index for desktop as well. With a sufficient deprecation
> period of course.
There's no reason to have two different names there. I'll make them both 'selected'. Thanks!
> Subprocesses and tabs seem almost interchangeable, so I wonder if
> normalizing them to |subordinates|, |children| or somesuch would work, and
> reduce the need for special-casing b2g in the client.
Are they really interchangeable, though? A B2G child is truly a separate process, with its own stack; it can behave asynchronously in a way that tabs can't. It would have its own profiler. If we provide access to other sorts of diagnostic facilities through the root actor, each B2G child will have its own instances of those.
Thinking about "has-a" and "is-a" relationships, it seems to me that a subprocess *has* a content tab; since it also has other components that look more like an independent process, it doesn't seem to me that it *is* a content tab.
That's why I imagined attaching to a B2G subprocess should yield a new root actor, not a tab actor or something of that sort.
> On a related note, desktop will soon need to deal with WebWorkers, which
> will need to be presented in a similar way. Moreover, a debugger server
> running on desktop will return both tabs and workers (possibly nested inside
> tabs). Maybe we could map this model on b2g as well? A tabs array with a
> single value that contains subprocesses could be one way to do it.
The way we present web workers, it seems to me, needs to reflect who "owns" them. I don't know what the rules are - when the iframe that created a web worker is closed, does the web worker get cleaned up, even if there are other same-origin iframes that could be using it? In any case, the protocol needs to be clear on when workers have been shut down, whether because they exited themselves, were stopped by their owners, or the page was navigated away from. There are a lot of details here I don't understand, that I think have a big effect on how the protocol presents them.
> subprocessListChanged seems similar to the newGlobal notification currently
> emitted by the debugger, except that it requires an extra step to get the
> actor reference(s) and it occurs only once (per listSubprocesses call).
> Could we perhaps combine those two so that the client can treat them in a
> uniform way?
If we could have some conventions for these sorts of unsolicited update notifications, that would definitely be a good thing.
What I like about one-shot notifications that carry very little information, like the subprocessListChanged packet, is that you don't need machinery to subscribe to or unsubscribe from it: it's cheap enough to always send; and it doesn't waste time when the client is uninterested.

(In reply to Jim Blandy :jimb from comment #13)
> (In reply to Panos Astithas [:past] from comment #12)
> > Subprocesses and tabs seem almost interchangeable, so I wonder if
> > normalizing them to |subordinates|, |children| or somesuch would work, and
> > reduce the need for special-casing b2g in the client.
>
> Are they really interchangeable, though? A B2G child is truly a separate
> process, with its own stack; it can behave asynchronously in a way that tabs
> can't. It would have its own profiler. If we provide access to other sorts
> of diagnostic facilities through the root actor, each B2G child will have
> its own instances of those.
>
> Thinking about "has-a" and "is-a" relationships, it seems to me that a
> subprocess *has* a content tab; since it also has other components that look
> more like an independent process, it doesn't seem to me that it *is* a
> content tab.
>
> That's why I imagined attaching to a B2G subprocess should yield a new root
> actor, not a tab actor or something of that sort.
All good points, agreed.
> > On a related note, desktop will soon need to deal with WebWorkers, which
> > will need to be presented in a similar way. Moreover, a debugger server
> > running on desktop will return both tabs and workers (possibly nested inside
> > tabs). Maybe we could map this model on b2g as well? A tabs array with a
> > single value that contains subprocesses could be one way to do it.
>
> The way we present web workers, it seems to me, needs to reflect who "owns"
> them. I don't know what the rules are - when the iframe that created a web
> worker is closed, does the web worker get cleaned up, even if there are
> other same-origin iframes that could be using it? In any case, the protocol
> needs to be clear on when workers have been shut down, whether because they
> exited themselves, were stopped by their owners, or the page was navigated
> away from. There are a lot of details here I don't understand, that I think
> have a big effect on how the protocol presents them.
OK, let's cross that bridge when we get to it.
> > subprocessListChanged seems similar to the newGlobal notification currently
> > emitted by the debugger, except that it requires an extra step to get the
> > actor reference(s) and it occurs only once (per listSubprocesses call).
> > Could we perhaps combine those two so that the client can treat them in a
> > uniform way?
>
> If we could have some conventions for these sorts of unsolicited update
> notifications, that would definitely be a good thing.
>
> What I like about one-shot notifications that carry very little information,
> like the subprocessListChanged packet, is that you don't need machinery to
> subscribe to or unsubscribe from it: it's cheap enough to always send; and
> it doesn't waste time when the client is uninterested.
Sure, but isn't that common to both "subprocessListChanged" and "newGlobal"? Do you think the data we add on a newGlobal packet makes it much more expensive?

(In reply to Panos Astithas [:past] from comment #15)
> > What I like about one-shot notifications that carry very little information,
> > like the subprocessListChanged packet, is that you don't need machinery to
> > subscribe to or unsubscribe from it: it's cheap enough to always send; and
> > it doesn't waste time when the client is uninterested.
>
> Sure, but isn't that common to both "subprocessListChanged" and "newGlobal"?
> Do you think the data we add on a newGlobal packet makes it much more
> expensive?
You're absolutely right that the two cases are analogous; we could save some complexity by adopting a common model for these things.

I read the proposed changes in the attachment. Some feedback:
0. Instead of mentioning B2G explicitly, comparing B2G vs. what is currently normal today (single process Firefox), we should instead just differentiate between whether there is a sandbox or not, so that the spec can also handle the case of mutli-process Desktop Firefox and multi-process Firefox for Android.
1. Going even farther, it seems bad for the protocol between the debugging client and debugging host to mention "subprocesses" at all. Imagine a debugging client that remotely debugs desktop Firefox, which is not multi-process now. Then, we improve Firefox to have a multi-process design similar to B2G. Existing clients should continue to debug the multi-process Firefox just like they could debug the single-process Firefox before.
2. The proposal says that the host running in the parent process of a multi-process host can act as a dumb router for messages to the content processes. That might be possible now, but hopefully it won't be true for long. There is some information that the debugging client will likely need but which can only be retrieved from the parent process, not the the child processes. For example, child processes should not (but currently do) have access to HttpOnly cookies at all, so if the debugging client wants access to the cookie database, then the parent process will have to process that request and handle it itself. Similarly, access to HTTP auth credentials and the ability to enumerate the HTTP cache, etc. are likely to be restricted to the parent process eventually.

In comment five, Mike suggested simply turning the entire connection into a proxy forwarding to the subprocess. Attaching to a subprocess actor would turn the main "B2G" process into a simple byte-by-byte forwarder for that connection.
That approach means that attaching to a subprocess actor in that way effectively kills all the other actors that might have been mentioned on that connection. One of the general ideas of the protocol's design is that actors are pretty independent of each other; what you do with one doesn't interfere with your interactions with another, unless that's appropriate for their purpose.
In the proposal attached, actors remain independent, and the b2g process can still be a pretty dumb forwarding proxy. The main difference between that and comment 5's suggestion is that it still can serve local actors, and needs to parse packets to forward them.

(In reply to Brian Smith (:bsmith) from comment #17)
> I read the proposed changes in the attachment. Some feedback:
>
> 0. Instead of mentioning B2G explicitly, comparing B2G vs. what is currently
> normal today (single process Firefox), we should instead just differentiate
> between whether there is a sandbox or not, so that the spec can also handle
> the case of mutli-process Desktop Firefox and multi-process Firefox for
> Android.
You're saying that all kinds of servers may potentially want to have subprocesses which can be contacted, not just B2G servers. That makes sense. We can certainly disentangle those two questions.
> 1. Going even farther, it seems bad for the protocol between the debugging
> client and debugging host to mention "subprocesses" at all. Imagine a
> debugging client that remotely debugs desktop Firefox, which is not
> multi-process now. Then, we improve Firefox to have a multi-process design
> similar to B2G. Existing clients should continue to debug the multi-process
> Firefox just like they could debug the single-process Firefox before.
This distinction, however, I suspect is unavoidable. I explained why in comment 13. The fact that sandboxed content gets its own independent call stack, at the very least, poses a significant difference that the debugger simply has to know about.
> 2. The proposal says that the host running in the parent process of a
> multi-process host can act as a dumb router for messages to the content
> processes. That might be possible now, but hopefully it won't be true for
> long. There is some information that the debugging client will likely need
> but which can only be retrieved from the parent process, not the the child
> processes. For example, child processes should not (but currently do) have
> access to HttpOnly cookies at all, so if the debugging client wants access
> to the cookie database, then the parent process will have to process that
> request and handle it itself. Similarly, access to HTTP auth credentials and
> the ability to enumerate the HTTP cache, etc. are likely to be restricted to
> the parent process eventually.
This should not be a problem. The subprocessActor (using the terminology in the proposal) is exactly what you want: an actor in the parent process representing the subprocess. It could answer questions about cookies and credentials.
I think this reinforces the value of having the main process route on a packet-by-packet basis, rather than handing the whole connection over to the subprocess: debugging content in subprocess entails interacting with both the main process and the subprocess.

(In reply to Jim Blandy :jimb from comment #19)
> I think this reinforces the value of having the main process route on a
> packet-by-packet basis, rather than handing the whole connection over to the
> subprocess: debugging content in subprocess entails interacting with both
> the main process and the subprocess.
Jim, thanks for clarifying that stuff. I agree with all the stuff you wrote. I misunderstood what was meant by "In fact, the main process can remain entirely ignorant of what protocol the subprocess speaks, while still shuttling packets back and forth appropriately."
By the way, I am not sure if long-term there are only going to be two levels of subprocess. For example, I think Chromium supports a <browser> element which is like <iframe> but where the content of the <browser> element is in its own process. I do not know if we will support something like that, but I think we'll probably have to for web browsers on B2G, at least. But, it may be useful to punt on that kind of stuff unless/until we've actually implemented that stuff in Gecko.

Reading the proposed changes (btw, not easy to read it in the form of a diff with long lines), and the followup comments, I'm satisfied with the design, but I do worry about the implication of not having a dumb forwarded on memory use in the main process, especially for the profiler actor. Specifically, I worry how this impacts bug 797639.

(In reply to Mike Hommey [:glandium] from comment #21)
> Reading the proposed changes (btw, not easy to read it in the form of a diff
> with long lines),
When I read it with Splinter, it comes out pretty nicely; reading via github is a real pain. How are you going about it? (I want to find a good way to do these doc reviews, because I want to do them often.)
> but I do worry about the implication of not having a dumb forwarded on
> memory use in the main process, especially for the profiler actor.
> Specifically, I worry how this impacts bug 797639.
I think the forwarding should be handled at a layer immediately above the transport; the bulk transport extension described in bug 797639 is designed to ensure that dumb forwarding is possible.

(In reply to Jim Blandy :jimb from comment #25)
> Status: since B2G has disabled in-process content, a fix for this bug has
> become necessary to do any content debugging. It's currently my top priority.
Can I do anything to help on the B2G/Gaia side?

(In reply to Brad Lassey [:blassey] from comment #29)
> Jim, Vivien tells me you two talked on irc. Do you have the information you
> need to move on? And if so, do you have an ETA?
I'm hoping to have some patches up on Tuesday or Wednesday.

Created attachment 719769[details][diff][review]
Work-in-progress IDL for parent/child communication XPCOM component.
I'm expecting this would be implemented with a few additions to the PContent IPDL protocol.
I don't think we should use the nsIMessageManager classes, because they seem oriented towards tabs, whereas debugger servers must be tied to processes: we don't want to have to introduce special cases for in-process tabs, or content processes with multiple tabs.

Note that there are process-level message managers that I'm fairly confident work in-process as well. See uses of @mozilla.org/parentprocessmessagemanager;1 and @mozilla.org/childprocessmessagemanager;1.

(In reply to Josh Matthews [:jdm] from comment #32)
> Note that there are process-level message managers that I'm fairly confident
> work in-process as well. See uses of
> @mozilla.org/parentprocessmessagemanager;1 and
> @mozilla.org/childprocessmessagemanager;1.
Those are the nsIMessageManager classes, which I'm not sure will behave the way I want. Here's a question:
Suppose I register an observer on the parent process manager on some new topic, and have new debuggable child processes notify on that topic. How do I find the message manager object whose sendAsyncMessage sends messages to that specific child? That is, how do I reply?
I could have the child choose a unique ID somehow, include it in the data sent to the parent, and then observe a topic that contains that ID. Then the parent could broadcast on that topic on the parent process manager to send messages to that child. But that's crazy; why should Firefox deliver packets to all children when they're mostly going to ignore them? So I need a way to get the message manager for that specific child.

(In reply to Fabrice Desré [:fabrice] from comment #34)
> When the parent gets a message from the child in receiveMessage(message),
> message.target is the message manager for that child.
Okay --- I see now that, when nsFrameMessageManager::ReceiveMessage recurses on its parent, aTarget isn't changed, so it continues to refer to the message manager that represents only that child process.
This could work; I'll give it a try.

Created attachment 722033[details][diff][review]
Implement nsIContentChild and nsIContentChildService, for enumerating and controlling content child processes.
Work in progress; see the new nsIContentChildService.idl for the gist.
This allows us to enumerate child processes, and to start XPCOM services in them. The idea is that the debugger server can be registered as a service that is implemented in JS.
We can't use nsIFrameScriptLoader to start the debug server, because only element message managers implement that interface, and debug servers really need to be per-process, not per tab.
We can't use nsIMessageBroadcaster::getChildAt on the parentprocessmessagemanager to enumerate the child processes, because that yields the message managers for the individual child processes, not anything that can cause code to be loaded.
This patch makes the actual ContentParent instances something that can be enumerated directly (they were already visible to JS via the observer lists), and something which provides reasonable methods to get fresh chrome code started. nsIContentChild was smaug's suggestion; nsIContentChildService::listChildren was my thought, after talking with smaug and bent; it just fronts the existing ContentParent::GetAll function.

Created attachment 722034[details][diff][review]
Implement nsIContentChild and nsIContentChildService, for enumerating and controlling content child processes.
Update: forgot to include nsContentChildService.cpp in the patch.

Created attachment 725311[details][diff][review]
Implement ChildDebuggerTransport, a debug protocol transport for communicating with content child processes via process message managers.
Thanks to the use of the message managers, the debug protocol transport is extremely simple.
Soon to come, in this bug:
- including the child list in the root actor's initial greeting packet, per spec, as enabled by nsIContentChildService::listChildren and the 'ipc:content-created' notification
- starting servers in children, as enabled by nsIContentChild::startService
These are straightforward.
In follow-up bugs, we'll need to cover:
- Make sure the root actor's 'listTabs' request works properly in content children. (Perhaps what we have will just work, but I suspect it'll need tweaks.)
- Debugger client side changes to include child process tabs in the list. Because in-process and out-of-process tab actors implement the same protocol, once you've enumerated them and gotten the actors' names, all the further interactions should just work.

Alex Poirot has implemented a different approach to debugging content processes on FxOS than the one outlined in this bug, which is implemented and landed, for bug 817580. With that change, Firefox should now support debugging content running out-of-process.
We may still need the mechanisms proposed in this bug, but it's not clear. So I think we're going to WONTFIX this one for the time being, and re-open it if a need arises for more direct contact with the content process itself, as opposed to specific frames within the content process.

Note

You need to
log in
before you can comment on or make changes to this bug.