Speechifying
[This is the kind of fuzzy crap that I just have to get out of my system.
It's the opposite of the clarity & conciseness needed for the thesis.
Instead, this is a hypothetical address for the talk I'm giving at the
O'Reilly conference in May.]
I'd like to speak to you today about a new way to integrate software
across the Internet. If we choose to view software modules as hosts on
a network, then we can apply the same principles networking
researchers used to internetwork decentralized LANs to "internetwork"
applications.
Specifically, I want to introduce a new product that makes this style of
integration easy: an application-layer router. I believe that such a
"router" is the missing element that completes the software industry's
current rush to address Internet-scale integration, under the moniker
of "Web Services."
Now, you may ask yourselves, "Don't we already have ways -- too many
ways! -- to build distributed software?" Remote Procedure Calls,
dataflow graphs, mobile code, and so many other architectural styles
come readily to mind as tools to connect software systems across the
Internet.
However, almost all these architectural styles assume that the Internet
is merely an extension of the LAN. After all, an IP packet is an IP
packet, whether inside or outside the firewall, whether across a LAN or
a dialup modem.
But there's a reason why, thirty years later, we are in the happy
position of using IP "all the way down"; it wasn't always the case.
Originally, there were a slew of competing network protocols:
AppleTalk, DECnet, Novell IPX, and so on. IP was invented anew to
address the new challenge of networking networks. No single network
protocol was appropriate for unifying all the others.
Until IP, network protocols were tightly coupled to link-layer hardware
and operating systems. IP, by contrast, had to address three new
challenges: scaling across time, space, and organizational boundaries.
First, IP had to be stable for a long time. To let new computers,
network adapters, and operating systems invented years apart -- now
decades! -- "speak" IP compatibly, a neutral team had to hammer out
simple, concrete specifications that could stand the test of time.
Second, IP had to work across larger spaces: continents, not just
campuses or cities. To accommodate the widely varying latency,
bandwidth, and jitter of all sorts of communications links, IP was
designed as an asynchronous "store-and-forward" network. Signaling
techniques appropriate for a few meters of copper cable running
Ethernet simply wouldn't work across the Atlantic.
Finally, IP had to work across organizational boundaries. Different
organizations' networks had very different ways to identify hosts,
users, terminals, files -- you couldn't even assume everyone used 8-bit
bytes back then! That's why internetworking required yet another new
namespace: IP addresses, and later, DNS names and so on. Conversely,
IP did not include security; it left concepts such as users, passwords,
and encryption to applications running "on top."
And yet, thirty years later, you're likely to get fired if you deploy
anything besides IP. The experiment that was merely intended to
integrate separate LANs took over as the LAN format, too, while so
many of the once-dominant LAN protocols that IP struggled to
accommodate are now nearly-extinct curiosities.
So what does this tale about networks have to do with software
architectures?
I claimed that today's state of the art for developing distributed
software merely treats the Internet as a slower kind of LAN, since "it's
all IP all the way down." We're still vainly trying to provide the illusion
of a single, large-scale von Neumann computer out of all these
distributed parts.
Instead, I claim there are brand-new concerns that arise at Internet-
scale. We need decentralized software that can cope with vastly larger
scales across time, space, and organizations. And to that end, I want
to tell you why traditional middleware doesn't measure up to these
challenges -- and how networking concepts can.
To begin with, I'd like to illustrate the weaknesses of current software
integration technology, and compare it to SOAP. When I say software
components are separated by time, I mean the challenge of
interoperability between components written years apart, by separate
teams. With technologies such as CORBA IIOP, Microsoft
DCOM, Java RMI, or TIBCO's information bus, the messages sent are in
a fragile, binary, and, often, proprietary format. This implies tight
coupling of components in terms of vendor, language, and interface
versions. Using SOAP and WSDL loosens these couplings, since these
standards leverage XML to allow Web Services to be called from any
platform, and to allow interfaces & data formats to evolve gracefully.
When software components are separated by distance, this translates
back into time, or latency. Traditional RPC and event-based
integration systems assume that the network is reliable and low-
latency. For example, TIBCO relies on IP multicast to announce events
to all nodes. Multicast is only efficient at LAN-scale. By contrast, a
nomadic laptop may go days without connecting to the Internet. SOAP
acknowledges this challenge by allowing many different kinds of
transport, such as SMTP (email). While the calling application may still
block, as with a traditional RPC, SOAP at least allows developers to
loosen coupling in time, and hence account for wider geographical
range.
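[Draft note for the slides: a rough Python sketch of what I mean by
loosening coupling in time -- not production code, and the endpoint URL
and mailbox would be made up. The same SOAP envelope can be posted
synchronously over HTTP, or dropped into a mail queue for a service
that may not reconnect for days:]

    import smtplib, urllib.request
    from email.message import EmailMessage

    envelope = """<?xml version="1.0"?>
    <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
      <soap:Body><checkAvailability><city>Irvine</city></checkAvailability></soap:Body>
    </soap:Envelope>"""

    def send_via_http(url):
        # Synchronous: the caller blocks until the service replies.
        req = urllib.request.Request(url, data=envelope.encode("utf-8"),
                                     headers={"Content-Type": "text/xml"})
        return urllib.request.urlopen(req).read()

    def send_via_smtp(mailbox):
        # Asynchronous: the message waits in a mail queue until the
        # (possibly disconnected) service picks it up.
        msg = EmailMessage()
        msg["From"], msg["To"] = "caller@example.com", mailbox
        msg["Subject"] = "SOAP request"
        msg.set_content(envelope)
        with smtplib.SMTP("localhost") as smtp:
            smtp.send_message(msg)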
Finally, consider what happens when software components are
separated by organizational boundaries. In the travel industry, a "day"
means any 24-hour period for car rentals, but only a single evening at
a hotel. A reservation service will have to explicitly handle these
variances; this is the bulk of the multi-billion dollar EAI (enterprise
application integration) industry. The Web Services vision, in contrast,
is for an intelligent actor to look up the relevant schema for both
services in a UDDI directory and at least translate miles into
kilometers, if not also the contractually distinct definitions of "day" in
the car rental and hotel industries.
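[Draft note: a toy Python sketch of that translation step; the field
names and conversion rules here are invented for illustration, not
drawn from any real travel-industry schema:]

    def normalize(offer):
        out = dict(offer)
        if "miles" in out:                       # unit translation
            out["km"] = round(out.pop("miles") * 1.609344, 1)
        if out.get("industry") == "car_rental":  # a "day" = any 24-hour period
            out["days"] = out["hours"] / 24
        elif out.get("industry") == "hotel":     # a "day" = one night's stay
            out["days"] = out["nights"]
        return out

    print(normalize({"industry": "hotel", "nights": 2, "miles": 10}))
    # -> {'industry': 'hotel', 'nights': 2, 'km': 16.1, 'days': 2}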
So what's the problem, then?
It would seem that, yes, there are limitations to using today's
integration technology at Internet-scale, but that SOAP, WSDL, UDDI,
and the rest of the menagerie of Web Services technologies are
sufficiently evolved successors to them that we will be able to
successfully integrate software across the Internet.
I believe these technologies are only one half of the solution. On the
surface, the analogy would seem to hold: the IP packet format and
TCP protocols are all we needed to network networks; shouldn't SOAP
messages be all we need to network software?
But there was a complementary concept implied by the very nature of
IP packets: the IP router. Strictly speaking, the IP specifications don't
define or require a construct called a "router", but the router was the
device
that unleashed the full potential of IP to actually interconnect LANs.
Similarly, I claim SOAP routers are necessary to actually unleash the
full potential of Web Services.
But claiming that we need SOAP routers is skipping steps. Let me
begin by explaining what they are and how they work. Also, in
stepping back from the industry hype and of-the-moment buzzwords,
I'll put aside any mention of SOAP and specific technologies for the
time being.
Theoretically, most integration models abstract software components
as miniature machines. Machines have control levers and input feeds;
we know how to chain them together in sequence, or nest them, as in
the very word, "subroutine." These machines are tightly coupled, in
that the output of one must be directly and immediately fed to the
next. Furthermore, "next" is itself well defined, unfortunately allowing
machines to rely on the exact implementation of the others.
To decouple these factories, "brokers" emerged to buy, warehouse,
and sell intermediate goods. Integration models adopted the same
abstraction. Object Brokers, as a generic category of middleware,
allowed the invoking machine to dynamically bind to the "next." This
deferred a range of choices to run-time, such as directing which actual
computer to invoke the command on; enforcing that the caller had the
appropriate security credentials; and queuing invocations to mask
transient connectivity failures. More capable variants of Object
Brokers incorporated Transaction Monitors, so that distributed
invocations could be modeled as atomic operations.
My colleague Roy Fielding continued in this vein to catalog a wide
range of architectural styles for distributed software integration.
Ultimately, he synthesized a new style he suggests best represents
the power of the Web: REST (Representational State Transfer).
To quote Dr. Fielding: "The central feature that distinguishes the REST
architectural style from other network-based styles is its emphasis on
a uniform interface between components."
His conclusion is where we'll begin.
Just as the HTTP-based Web provides a uniform interface for accessing
and transmitting any hypermedia resource, I claim SOAP-based Web
Services provide a uniform interface for invoking and responding to
any software component. The key addition we're making this time is
that, unlike hypermedia transfer, software components require
asynchronous messaging, since we need to encompass both RPC and
event-based integration styles.
[OK, so I broke my rule about buzzwords. It's a draft! ;-]
Assume I'll come back later and defend why I believe SOAP is REST
applied to software integration. What new powers do we gain by
stipulating this? I'm proposing that even third parties can add "ilities"
-- reliability, availability, scalability, security, extensibility, and
visibility -- to REST services without modifying the services or callers.
One of the most powerful, and underappreciated, implications of
HTTP's proxy support is the potential to compose active proxies to
extend the Web. Content can be tailored to various devices;
advertising can be stripped out (or inserted); identities can be
anonymized; protocols can be gatewayed; and so on. The key is that
third parties can assemble custom proxy chains of fourth party
services, all without modifying the origin server or user-agent.
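[Draft note: the composition idea in miniature, as Python. Each "proxy"
is just a function from response to response, and a chain is applied in
order; the ad-stripping and anonymizing proxies here are illustrative
stand-ins:]

    def strip_ads(response):
        response["body"] = response["body"].replace("<ad/>", "")
        return response

    def anonymize(response):
        response.pop("cookies", None)
        return response

    def apply_chain(response, chain):
        for proxy in chain:
            response = proxy(response)
        return response

    page = {"body": "hello <ad/> world", "cookies": {"id": "42"}}
    print(apply_chain(page, [strip_ads, anonymize]))
    # -> {'body': 'hello  world'}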
Similar implications hold for SOAP intermediary support. To date,
intermediaries have seen little use in the early, RPC- and distributed-
objects-centric phase of Web Services applications. However, the
lessons learned from HTTP have made SOAP's intermediary support
even more powerful, most notably through the mustUnderstand
attribute, which affords forward compatibility for coping with future
actors and headers.
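[Draft note: a toy Python check of what mustUnderstand obliges an
intermediary to do -- fault rather than silently forward a header it
cannot process. The routing header and its namespace are hypothetical:]

    import xml.etree.ElementTree as ET

    SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
    envelope = """<?xml version="1.0"?>
    <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
      <soap:Header>
        <route xmlns="http://example.org/routing" soap:mustUnderstand="1">
          <via>http://intermediary.example.org/</via>
        </route>
      </soap:Header>
      <soap:Body><notify>party tonight!</notify></soap:Body>
    </soap:Envelope>"""

    understood = {"{http://example.org/routing}route"}  # headers this node handles

    header = ET.fromstring(envelope).find(f"{{{SOAP_NS}}}Header")
    for block in header:
        must = block.get(f"{{{SOAP_NS}}}mustUnderstand") == "1"
        if must and block.tag not in understood:
            raise RuntimeError(f"soap:MustUnderstand fault on {block.tag}")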
A subtler lesson is in the social construction of proxies. For the
hypermedia Web, user-agents presumed there was a single,
permanent proxy, typically for caching or content-filtering. We simply
didn't envision dynamically calculating purpose-built chains of proxies
for given transactions. I can testify to that personally, given the
long-running failure of a proposal I made at W3C, called PEP, that
asked for just that.
But for the services Web, SOAP has laid the groundwork for per-
transaction intermediation. Microsoft's Henrik Frystyk Nielsen has
gone so far as to propose Web Services routing and referral standards
along these lines.
Indeed, the new challenge is systematizing our ability to string
together intermediaries at will. If we can call a multi-hop, multi-path
composition of active proxies a route, it sounds reasonable that a
device that automatically calculates and enacts such routes should be
called a router.
So having named it, what does it actually do, and how does it work?
A router is a device that, given a symbolic name, resolves it into the
address(es) of communication paths one layer below for onward
delivery. An IP router maps IP addresses into the MAC address(es) of a
LAN adapter. An application-layer router maps resource names into
application protocol messages.
For example, the document "party tonight!" presented to an
application-layer router at the URL /Rohit/announcements might be
resolved into specific onward URLs such as mailto:adam@knownow.com,
ftp://fred@mit.edu/inbox, and http://roy-s-webserver/logger.cgi, if
there were three such routing rules, or "subscriptions", one for each
listening service.
How would such a router be implemented? Once again, I appeal to
Layer 3 precedents. The key to internetworking many different LAN
protocols is that rather than translating them directly, each one is
mapped to IP as an intermediate form. So a Layer-3 IP router would
have several kinds of LAN adapter cards, and upon receipt of any
packet, it would internally convert it to IP format, store away its copy,
and then indicate to the input LAN that the data had been consumed.
Then, at some later time, if the router hadn't been forced to throw it
away due to memory exhaustion or aging, the destination addresses
for the packet would be calculated, and it would be transmitted
onward, after being translated out to a foreign LAN format if
necessary.
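[Draft note: that store-and-forward cycle, reduced to a Python sketch.
The to_internal / from_internal adapters are placeholders standing in
for real LAN drivers and format translators:]

    from collections import deque
    import time

    MAX_AGE = 60.0    # seconds before an undelivered packet is aged out
    queue = deque()   # the router's internal store, in the common format

    def receive(to_internal, payload):
        # Convert the inbound frame to the internal format, keep a copy,
        # and let the input side consider the data consumed.
        queue.append({"packet": to_internal(payload), "received": time.time()})

    def forward(routes, from_internal):
        # Later: compute destinations and translate back out if necessary.
        while queue:
            entry = queue.popleft()
            if time.time() - entry["received"] > MAX_AGE:
                continue                      # memory exhaustion / aging: drop
            for destination in routes(entry["packet"]):
                from_internal(destination, entry["packet"])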
That is exactly how I propose a Layer-7 router ought to work. Or,
specifically, what I term a SOAP router. Just as IP provided a
metaformat for encoding many different addressing schemes,
signaling messages, and payloads, I posit that a SOAP message is
similarly flexible enough to intermediate all other major Layer-7
application protocols.
Specifically, rather than using IP addresses to identify computers, we
use WebDAV collections (directories) to identify topics. This way, FTP
directories, mailboxes, newsgroups, and SNMP devices can all be
mapped into WebDAV collections. Then, new or modified resources
within the router's topic space can be delivered onward using the
same range of supported protocols.
"OK," you might agree, warily. "So what?"
[... well from here on out, you had to be at the talk!]