Sunday, September 11, 2016

Passing Messages from Renderer to Browser in Chromium by example!

Okay, so I'm not saying this is even close to correct, but after a lot of code reading, reverse engineering and reading of documentation I think I'm finally ready to pass a bool value of some sort from the Blink code in the Renderer process all the way up to the Browser process.

I was quite surprised how many steps there were in plumbing even a simple value, but when you think about all of the architectural layers and how open source software is composed it kind of makes sense. Blink has to be wrapped by a content host of some sort to give it access to resources. That means some well defined interfaces and abstract communication channels for those layers. The content host in turn gets wrapped into some form of marshaling abstraction that allows it to serve the multi-process model. Some more objects in the browser in turn wrap these marshaling abstractions and separate the browser and content process. After all, why does a browser know about only a specific piece of content or one content type, maybe I can plug in any content right? Even the browser implementation specifics of UX and platform need to be broken out and that means even more interfaces. The following, very high level, very abstract diagram should help in understanding the layer breakdown.

Quick Note: The diagram was made before converting instances of WebKit to Blink. The two are mostly interchangeable in this article and I've tried to be clear that they effectively mean the same thing.

If you are interested in more specifics then the actual diagram looks something closer to these hand scrawled hieroglyphics I threw up on my whiteboard. The faint of heart should avert their eyes.

In this sample I was hypothesizing how to get information from VRDisplay up to the Browser. However, we will actually use a slightly different and preexisting message to decode this diagram. The message we'll use is FrameHostMsg_DidChangeName. By the time you get to a message like this you are already at the Chrome IPC layer talking from the Renderer to the Browser. So how did I even pick this message? Well, I first started in WebContentsObserver which is the list of things that a browser might want to know about from the content. There are other types of observers but this one has a lot of good stuff on it. That allowed me to track back to the message.

From the message we can then look for someone doing a "new" operation on the type. So FrameHostMsg_DidChangeName isn't just an enum or index, but a structure. You need to create one of these structures to pass to the Browser, so now we have a foot hold in the Renderer to continue our examination.

Starting at the End

Okay, enough foot-hold, lets work this backwards. In fact, my diagram is incomplete, it only discusses up to the Send and then it fudges the work that happens in the Browser itself. We need to understand that though since it shows the differences between where the Shell is and were the Browser side implementation for the hosting model is.

From the diagram, the WebContentsObserver is what a Shell would create in order to spy on the WebContentsImpl which is really a tree of pages. Since this is a tree, if you want to know specifically what "document" something is for you have to pass in the RenderFrameHost object to uniquely identify it. This is precisely what we do with DidChangeName. You can view the implementation of RenderFrameHostImpl::OnDidChangeName to see how it passes itself to the delegate.

To get into the OnDidChangeName method we had to implement an IPC handler. We do this in render_frame_host_imp.cc in our RenderFrameHostImpl::OnMessageReceived method. Its a simple mapping of the message name to a handler function. The other side of this channel will be the renderer process passing us the notifications. If you scan up a few lines you'll find the Send method which we can also use to communicate back down to the renderer side object.

Our implementation of OnDidNameChange shows the difference between the Shell and Browser code. We need some Browser level logic to tidy up the frame node's and structures required to make sure that the HTML 5 spec is adhered to and that our Browser side view is in sync. This is what I'm calling Render Host in the high level diagram.

The Shell doesn't get a crack until we call into the RenderFrameHostDelegate::DidChangeName method. By default the RenderFrameHostDelegate is a no-op object, but we derive our WebContentsImpl from it and pass it into each RenderFrameHost that we wrap. From here things get simple and the Shell is able to access the notification by first creating a WebContentsObserver and attaching this to the WebContentsImpl. You can have as many observers as you want per WebContentsImpl and it will simply mulit-cast the notification. A couple of quick searches on the Chromium Code Search should find you the relevant details if you want to dig in further.

Escaping the Renderer Process

We are now back to those hieroglyphic scrawls from before!

On this side of the pipeline we are looking at RenderFrameImpl::didChangeName. Note how the casing and structure is starting to change as we get into the content process. This is important since it means that understanding the model of how things are structures is more important than understanding just a particular name or naming scheme. When naming schemes change code search becomes less useful since you have to perform many smaller searches to chase a thread of thought to its end.

On this end, the code takes in two strings, names and formats them into a FrameHostMsg_DidChangeName structure that it can then Send to the Browser process. We get notified of these changes because it turns out our RenderFrame/RenderFrameImpl is also a WebFrameClient. If you think about the delegate capabilities before, then this WebFrameClient must be similar. Our RenderFrameHostImpl is actually a delegate that can provide a WebFrame (or WebLocalFrame) with additional services. One of which is listening to a frame name change notification.

This is where I think things get tricky, but the architectural layering is pretty good. Things that start with the Web* prefix are going to be in third_party/WebKit code. All of the public interfaces for crossing from WebKit (Blink) -> Renderer (or whatever Content host you want to implement) will be in third_part/WebKit/public. This includes our WebFrameClient.h where we would add any new capabilities or notifications.

For this next part we stay in WebKit band (Blink) but it starts to cross layering boundaries within Blink itself. At some point we'll cross into a boundary that I called Modules (represented as both core and modules in the source tree) in the above diagram. Think of this simply as any sub-component that is logically being snapped in. Something like the Gamepad API might be a good example. Gamepad doesn't need to know everything about Blink to work. Instead it needs to bind in at key locations and those should be layered to allow proper extensibility. How that binding works is another blog entirely, but suffice to say its pretty cool.

Let's quickly jump through the rest of this since it is mostly plumbing. Now that we've seen how our Renderer code which is hosting Blink can be a client, we simply need to figure out how to poke all the way out of the Modules and WebKit (Blink) layers to invoke the client itself.

Most objects will have access to the Document and so that is how I would normally start. First get the LocalFrame using the frame() method. This returns a local m_frame and in context you should probably be able to guarantee its existence.

Next Steps

Note, don't trust that this just works, dig in, or even just add a simple notification to the code and get it to compile. Understanding the structure of these objects is very important not just now, but for future work. One cool thing that you find is there are Remote* versions of the window, document and frame. This is because of out-of-proc iframes I think. Those features help explain why some of the complexity around remote/local things needs to exist. It also details why many of these operations are only on local interfaces and that a listener for a remote frame, document, etc... would be listening in another process on the local version of the interface. Its representation in this document is only structural.

If anyone has a complete commit, pull request, etc... implementing a new notification that can be shared let me know. I'll add it to the post. The process is no joke for sure and you end up touching many files to have both the header default implementation (making it not quite an interface, but handy since you can add new methods without breaking everyone) and a client implementation.