
a simpler, webbier approach to Web Intents (or Activities)

A few months ago, Mike Hanson and I started meeting with James, Paul, Greg, and others on the Google Chrome team. We had a common goal: how might web developers build applications that talk to each other in a way that the user, not the site, decides which application to use? For example, how might a major news site provide a “share” button that connects to the user’s preferred sharing mechanism? Not everyone uses the same top-three social networks, yet users are constantly forced to search for their preferred service within a set of publisher-chosen buttons. That leads to undue centralization and significantly undercuts innovation and user choice. How incredibly inelegant!

We figured that, with a bit more browser smarts, we could do better.

to the design studio!

While all this was happening, the always amazing Tyler Close, of Web Introducer fame and also from Google, was whispering in our ears “Guys, I think you’re doing it wrong. It’s over-engineered. We can do simpler.” We all ignored him. I think that was a mistake. Tyler was right. Web Activities was over-engineered. And, I fear, Web Intents is too.

the glaring inconsistency

Web applications already have a mechanism for communicating with other web applications loaded within the same browser: postMessage. It isn’t perfect, but it works, and it is flexible enough that much innovation has been built on it. Google, Microsoft, Facebook all use it, oftentimes for embedding widgets within other pages, each in a very different way. At Mozilla, we use postMessage extensively for BrowserID, and we’ve built nice abstractions on top of it, like winchan to consistently build a message channel to a new popup window (including all IE workarounds).

postMessage is very simple, very Webby, and very generative: it’s easy to build new ideas on top of it. It doesn’t care about mime types, dialogs, callbacks, etc. It’s just a simple, authenticated message channel. The only reason postMessage isn’t enough to do what we need is that the sender and receiver are, for the most part, tightly coupled: the sender has to specify its receiver, which means the user can’t easily step in and substitute the endpoint of her choice. We’d like a loose coupling, where the user gets to mix and match senders and receivers.
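To make the coupling concrete, here is a minimal sketch of both ends of a postMessage exchange. The helper names (`shareVia`, `makeReceiver`), the message shape, and the origins are all made up for illustration; only the `postMessage(message, targetOrigin)` call and the `event.origin` / `event.data` fields come from the platform.

```javascript
// Sketch only: shareVia, makeReceiver, and the message shape are
// illustrative assumptions, not part of any proposal.

// Sender side: the publisher has to name the receiver's origin up front.
function shareVia(targetWindow, targetOrigin, article) {
  // postMessage(message, targetOrigin) delivers the message only if the
  // target window currently holds a document from targetOrigin.
  targetWindow.postMessage(
    { type: "share", url: article.url, title: article.title },
    targetOrigin
  );
}

// Receiver side: build a "message" event listener that only accepts
// messages from one trusted origin.
function makeReceiver(trustedOrigin, onShare) {
  return function (event) {
    if (event.origin !== trustedOrigin) return; // ignore unknown senders
    if (!event.data || event.data.type !== "share") return; // not for us
    onShare(event.data);
  };
}
```

In a page, the receiver would be wired up with something like `window.addEventListener("message", makeReceiver("https://news.example", handleShare))`. Note that the `targetOrigin` argument on the sender side is exactly where the publisher, not the user, picks the service.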

So wait, if that’s the only gap, then why are we proposing a completely different approach to cross-application messaging? Why should tight and loose coupling of messaging channels be implemented in completely different ways? Given that the postMessage abstraction has been so successful and useful, the “right” way to move forward is to tweak it, minimally, not to redesign a different stack.

a minimalist way forward

A minimalist way forward is to use postMessage as is, and to provide only the bits necessary to enable loose coupling.

Here, Tyler comes to the rescue again. He proposed, in one of the last chats we had with Google, using custom protocol handlers as the target of postMessage channels. So, when a major news site wants to share an article, rather than postMessage’ing (or linking) to http://twitter.com/, it can use the one-indirection-away URL share://.... The browser can then jump in and substitute the user’s preferred implementation of a sharing provider at that custom protocol handler. Everything else, linking or communicating via postMessage, is then the same. The only difference is, there’s one level of indirection to give the user a chance to step in and say “that service, please.”

What’s even more interesting is that we already have basic mechanisms for sites to register themselves as custom protocol handlers: registerProtocolHandler. The current mechanisms aren’t quite good enough yet, but the tweaks we would need are far simpler than building a whole new messaging stack. Mozilla’s own Austin King has prototyped what some of these tweaks might look like using a JavaScript shim, and the results are surprisingly useful with only minor tweaks.
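As a rough illustration of that indirection, here is a toy model of the bookkeeping a browser might do. Everything in it is an assumption made for the sketch: the `HandlerRegistry` class, its `register` / `choose` / `resolve` methods, and the example URLs are hypothetical. The only thing borrowed from registerProtocolHandler is the convention of a handler URL containing a `%s` placeholder that gets replaced by the escaped original URL. Real browsers also restrict which schemes a site may register, so `share` here is just the scheme used in this post.

```javascript
// Sketch only: a toy model, not a browser API. HandlerRegistry and its
// methods are hypothetical names invented for this example.
class HandlerRegistry {
  constructor() {
    this.handlers = new Map();  // scheme -> [{ template, title }]
    this.preferred = new Map(); // scheme -> template chosen by the user
  }

  // Models a site offering itself as a handler for a scheme.
  register(scheme, template, title) {
    if (!template.includes("%s")) {
      throw new Error("handler URL must contain %s");
    }
    const list = this.handlers.get(scheme) || [];
    list.push({ template, title });
    this.handlers.set(scheme, list);
  }

  // Models the user stepping in and saying "that service, please."
  choose(scheme, title) {
    const match = (this.handlers.get(scheme) || []).find(
      (h) => h.title === title
    );
    if (!match) throw new Error("no such handler: " + title);
    this.preferred.set(scheme, match.template);
  }

  // Maps a custom-scheme URL to the user's preferred handler URL,
  // or returns null if the user has not chosen one.
  resolve(url) {
    const colon = url.indexOf(":");
    if (colon < 0) return null;
    const template = this.preferred.get(url.slice(0, colon));
    if (!template) return null;
    // Function form avoids special "$" handling in String.replace.
    return template.replace("%s", () => encodeURIComponent(url));
  }
}
```

The publisher only ever mentions the `share:` URL. Which provider sits behind it is decided by `choose`, that is, by the user, and the publisher’s code does not change when the user switches providers.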

another minimalist approach

There’s also Ian Hickson’s proposal, which is a little bit different than using protocol handlers and has some nice properties. It’s quite similar to Tyler’s proposal in one key way: do the smallest amount of work to set up a message channel, and get out of the way. Mark Hammond has prototyped Ian’s proposal, and it looks like it can be nicely shimmed in pure JavaScript (with just one tweak to the API that’s probably worth considering even for the native implementation). I like this proposal, too, and I wonder if it could be made to work with custom protocol handlers, which have a nice URL-based architecture.

so now what?

I propose that we stop for a second on the Web Intents discussion and ask ourselves: maybe we’ve been over-engineering this. Maybe we don’t need mime types and new HTML elements and new DOM properties, etc. Maybe there’s a much easier, good-enough solution, based on proven technology, with only minor tweaks to well-understood code paths. It won’t be perfect, we’ll probably need some JS libraries to make things more convenient for developers, but that’s okay. That’s better for the Web. Keep the platform simple, leave the real innovation to the edges.

I believe Web Intents, as currently proposed, are over-engineered. So are Web Activities. But it’s not too late to correct course. Let’s figure out the simplest way to involve the user in choosing an application, set up a message channel, and get out of the way.


I can’t speak for the Intents team here on Chrome, but I can say this: the web didn’t win because it provided the simplest thing (to implement), it won because it provided a simple thing *to use*.

Intents make the classic web tradeoff: they take much implementation effort off of end-developers in return for limited flexibility. Yes, you eventually want that flexibility back, but that’s not a great place to start if you’re looking for adoption of a thing. It’s only ideal if you’re looking for quick/easy implementation. And I don’t think that’s a good deal for web authors.

I think you’re arguing a strawman: I’m not saying that we should make it simpler because the current proposal is too hard to implement by browser vendors. That’s simply not a consideration.

I’m saying that a simpler core web platform API is generally better for the Web, because it can accommodate more use cases that we haven’t anticipated yet. When I first saw postMessage, I didn’t like it. Too simple, not enough features, needs more structure, I thought. And then we saw a flurry of fascinating applications built on top of it. Developers were able to make use of it directly, and when that wasn’t enough, to build awesome abstractions on top of it for their use case. Had we taken those abstractions and made those the actual web platform API earlier on, we almost certainly would have lost out on some of the more innovative uses of postMessage (including, for example, Google’s Belay Project and Mozilla’s BrowserID).

So we have great evidence that postMessage works really well as a baseline browser feature. I’m simply arguing that we should take a similar approach to the loosely-coupled inter-app communication requirement.

I’m not arguing the strawman you think I am. I am, instead, suggesting that there is a siren call in the “oh, we’ll just ship a very small primitive and let everyone build on it” approach. It comes in two flavors.

First, introducing exclusively a low-level API means that higher-order agreements are drastically less likely. For instance, if we had simply provided the ability to handle click events, render text, and navigate to a new page (instead of the anchor tag), we’d be faced with a world where the web *might* have existed but *probably* wouldn’t. To the extent that intents capture much higher level descriptions of what developers (*ahem*) intend, they also contain more descriptive, discoverable, referential value.

Both approaches Give Good Demo (TM), but it’s much simpler to get to a good demo if you go primitive only, which leads to an implicit bias on the part of implementers; often justified on the grounds that “we can do this simple thing and just add what people need later”. In reality, this is an ENORMOUS slog. Which brings me to the second siren:

You won’t iterate. Not really. High level things are easier to de-sugar than add later. The current debates around classes in JS are but one example of this. By creating the expectation that developers *should* be carting their own abstractions around (in un-searchable, undiscoverable imperative code), you create a culture which perpetuates the current group. That group, composed of people who were gnarly enough to get by in the spartan world, naturally spurns such luxurious and wasteful additions. And besides, those additions are probably Doing It Wrong (TM). You show me how Mozilla is going to evolve from WebGL back to something like O3D and you’ll change my mind. But AFAICT, it’s not in anyone’s cards.

There’s a final point about shared language and vocabulary which is related to the first, but I won’t burden your eyeballs with it here as I’m sure you can extrapolate it for yourself.

For all of these reasons, then, it is better to start by *at least* introducing your high-level thing *along with* your low-level thing. Low-level only is a bug which is hard to fix. High-level only is a bug which is much easier to fix. Arguing for “simplicity” without paying attention to adoption economics (i.e., explaining things in terms of “simple primitives”) is to re-commit the XML mistake: nobody cares about schemas and all that junk…at least not up front. They just want the code that’ll get their job done.

The platform can, therefore, afford to do the big low-level reveal over time. It can’t afford to put off the high-level construct.

As a “bottom 95%” web developer I like a postMessage approach because it’s something I have already used and there’s plenty of example code from which I can learn. Extending that existing feature would make it easier for mortal web developers like me, IMO.

The core misunderstanding is that you think I’m arguing this from the point of view of the browser vendor. I know I work for Mozilla, but I actually don’t work on Firefox. So the complexity of shipping this feature is not even on my radar. I’m purely arguing about what I think is a good API for web developers.

The core disagreement has to do with the level of readiness of an API. In my opinion, it’s not clear that Web Intents is a good API for developers. It is quite a bit different than other browser APIs, it introduces new synchronous behavior (window.intent), and it introduces a new HTML element. All of that is very heavy-handed, and that seems like over-design. If we build that into the web platform, we can never take it away.

It sounds like you think the higher-level APIs should be built into the browser. I understand where you’re coming from, but I disagree specifically with that point.

I agree with Alex that the main drawback to this approach is that it doesn’t address the main problem head-on. The main problem isn’t in establishing packet transport, it is in the semantics of the exchanged messages. You can set up inter-page transport with nothing but iframes, /etc/hosts entries, and DOM manipulation, but it’s no great surprise that an ecosystem hasn’t arisen around the brute feasibility of communication.

Some more intuition pumps: the transport problem isn’t simply an inter-page problem. Requirements on web intents are to facilitate communication from pages to the browser, from pages to local applications, from system events to web pages. An RPC-style transport is a lot easier to map in these ways than something built on registerProtocolHandler and postMessage.

In general, I think that is true: if you start at the level of messages, an RPC-style implementation is the simplest mapping of the concept to API. I agree it is tempting to re-use existing transports, but I disagree that RPH-based discovery has the kind of adoption to exert this kind of gravitational pull on other APIs. In addition, the trust semantics of postMessage aren’t particularly amenable to the kinds of demands on web intents. Intuition pump: postMessage is more like a blood transfusion, whereas web intents is more like the exchange of business cards.

Thanks for clearing up the misunderstanding, although I never said that *you* were in love with the idea of a simple implementation. My read was that you were more attracted to the conceptual simplicity of abstractions built on primitives. I instead said that there is an implicit bias on the entire community of people who build browsers for this sort of a solution. Yes, Chrome engineers are included in this.

As for the disagreement, you simply didn’t engage in the points I raised about how you’re cutting high-level agreement out of the picture in all likelihood. Instead, you’re back-stopping your arguments with some appeal to how “heavy handed” it is to use HTML to introduce a fundamentally new semantic into the platform.

HTML is what we expose as a high-level point of agreement for things that we observe to be common and which we want to be searchable. How do intents (and/or whatever you want to enable) not qualify? Surely you’re not simply arguing that because it borders on the declarative (which is in a different standards kingdom) we shouldn’t attempt it?

Again, I’d like to get your response to the meta-points: if you only ship a low-level thing you drastically reduce the set of people who can make effective use of it in the first place, destroy the ability for semantic intent to be captured by search engines, and slow down the web with ever-growing piles of imperative code.

Note, again, I’m not arguing against a low-level description. See my work with Web Components and Shadow DOM to get a flavor for where I’m coming from. I absolutely want layering in the platform…but bottom-up is a dangerous game which we have lost before. And you’re presenting no way out. Why should we take your approach seriously?

This discussion of low level versus high level is a red herring. In both RePo and Web Intents there is an arbitrary message body whose semantics are coordinated using a string identifier. They have the same semantic information. The main difference is that RePo leverages the existing Web Platform; whereas Web Intents is a new stack, transplanted to the Web.
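A minimal sketch of that shared shape, under loud assumptions: the `makeDispatcher` helper, the `{ action, body }` envelope, and the `"share"` identifier are invented for illustration and are not taken from either proposal. The point is only that the receiver picks its behavior from a string identifier and treats the body as opaque.

```javascript
// Sketch only: the envelope shape and makeDispatcher are hypothetical.
// The receiver looks up a handler by string identifier; the body is
// passed through untouched.
function makeDispatcher(handlers) {
  return function dispatch(message) {
    const known = Object.prototype.hasOwnProperty.call(
      handlers,
      message.action
    );
    if (!known) {
      return { ok: false, error: "unsupported action: " + message.action };
    }
    return { ok: true, result: handlers[message.action](message.body) };
  };
}

// Example: a receiver that only understands "share".
const dispatch = makeDispatcher({
  share: (body) => "shared " + body.url,
});
```

Whether that identifier travels inside a postMessage payload or an intent object, both sides still have to agree on what `"share"` means and what its body contains.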

Turning regular web authors into protocol builders without a declarative form is *absolutely* different in kind than providing a declarative form in which those string identifiers can be globally coordinated and understood.