Application Protocol Metaformats

Just as data file metaformats have evolved to simplify serialization
for storage, application protocol metaformats have evolved to simplify
serialization for transactions across networks. The tradeoffs are a
little different in this case; because network bandwidth is more expensive
than storage, there is more of a premium on transaction economy.
Still, the transparency and interoperability benefits of textual
formats are sufficiently strong that most designers have resisted
the temptation to optimize for performance at the cost of readability.

The Classical Internet Application Metaprotocol

Marshall Rose's RFC 3117, On the Design of
Application Protocols,[55] provides an excellent overview of the
design issues in Internet application protocols. It makes explicit
several of the tropes in classical Internet application protocols that
we observed in our examination of SMTP, POP, and IMAP, and provides an
instructive taxonomy of such protocols. It is recommended
reading.

The classical Internet metaprotocol is
textual. It uses single-line requests and responses, except for
payloads which may be multiline. Payloads are shipped either with a
preceding length in octets or with a terminator that is the line
".\r\n". In the latter case the payload is
byte-stuffed; all lines that start with a period get
another period prepended, and the receiver side is responsible for
both recognizing the termination and stripping away the stuffing.
Response lines consist of a status code followed by a human-readable
message.
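
As a minimal sketch of the conventions just described, the following Python fragment encodes and decodes a dot-terminated, byte-stuffed payload and splits a response line into its status code and message. The function names are illustrative assumptions rather than part of any standard library, and real protocols add details (length-prefixed payloads, multiline responses) that are omitted here.

```python
def stuff_payload(lines):
    """Encode payload lines (given without line endings) for the wire.

    Any line starting with a period gets one more period prepended,
    and the block is closed by the terminator line ".\r\n".
    """
    out = []
    for line in lines:
        if line.startswith("."):
            line = "." + line
        out.append(line + "\r\n")
    out.append(".\r\n")
    return "".join(out)


def unstuff_payload(wire):
    """Decode a dot-terminated block back into the original lines."""
    lines = []
    for raw in wire.split("\r\n"):
        if raw == ".":
            # A lone period is the terminator, never payload data.
            return lines
        if raw.startswith("."):
            # Strip the stuffing added by the sender.
            raw = raw[1:]
        lines.append(raw)
    raise ValueError("payload terminator not found")


def parse_response(line):
    """Split a response line into (numeric status code, human-readable text)."""
    code, _, message = line.rstrip("\r\n").partition(" ")
    return int(code), message
```

The receiver never confuses data with the terminator, because a payload line consisting of a single period travels as two periods.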

One final advantage of this classical style is that it is
readily extensible. The parsing and state-machine framework doesn't
need to change much to accommodate new requests, and it is easy to
code implementations so that they can parse unknown requests and
return an error or simply ignore them. SMTP, POP3, and IMAP have all
been extended in minor ways fairly often during their lifetimes, with
minimal interoperability problems. Naïvely designed binary
protocols are, by contrast, notoriously brittle.
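
The extensibility argument can be illustrated with a short sketch. The verbs, status codes, and handler table below are hypothetical, chosen only to show that a line-oriented dispatcher can reject an unknown request cleanly and can grow a new request without any change to its parsing logic.

```python
def handle_request(line, handlers):
    """Dispatch one request line to a handler keyed by its verb.

    Unknown verbs yield an error response instead of breaking the
    session, so older servers tolerate newer clients.
    """
    verb, _, argument = line.strip().partition(" ")
    handler = handlers.get(verb.upper())
    if handler is None:
        return "500 Command not recognized"
    return handler(argument)


# Hypothetical request set; extending the protocol means adding an entry.
HANDLERS = {
    "NOOP": lambda argument: "250 OK",
    "ECHO": lambda argument: "250 " + argument,
}
```

Adding a request later is a one-line change, for example `HANDLERS["DATE"] = some_function`, and clients that never send it are unaffected.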

HTTP as a Universal Application Protocol

Ever since the World Wide Web reached critical mass around 1993,
application protocol designers have shown an increasing tendency to
layer their special-purpose protocols on top of HTTP, using web servers
as generic service platforms.

This is a viable option because, at the transaction layer, HTTP
is very simple and general. An HTTP request is a message in an
RFC-822/MIME-like
format; typically, the
headers contain identification and authentication information, and the
first line is a method call on some resource specified by a Uniform
Resource Identifier (URI). The most important methods are GET (fetch
the resource), PUT (modify the resource) and POST (ship data to a form
or back-end process). The most important form of URI is a URL or
Uniform Resource Locator, which identifies the resource by service
type, host name, and a location on the host. An HTTP response is
a status line followed by an RFC-822/MIME-like message, and can contain
arbitrary content to be interpreted by the client.
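
To make the message layout concrete, here is a rough sketch that takes an HTTP request apart using Python's standard mail-header parser, which is possible precisely because the headers follow the RFC-822 style. The function name and the sample request in the usage note are assumptions for illustration; a production server would handle many cases this ignores.

```python
from email.parser import Parser


def parse_http_request(text):
    """Split a raw HTTP request into its request line, headers, and body."""
    # A blank line separates the header section from the body.
    head, _, body = text.partition("\r\n\r\n")
    # The first line is the method call; the rest are RFC-822-style headers.
    request_line, _, header_block = head.partition("\r\n")
    method, target, version = request_line.split(" ", 2)
    headers = Parser().parsestr(header_block, headersonly=True)
    return method, target, version, headers, body
```

Given the text "GET /index.html HTTP/1.1" followed by a "Host: example.org" header and a blank line, the function returns the method GET, the target /index.html, and a header object that can be indexed by header name without regard to case.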

Web servers handle the transport and request-multiplexing layers
of HTTP, as well as standard service types like http and ftp. It is
relatively easy to write web server plugins that will handle custom
service types, and to dispatch on other elements of the URI format.

Besides avoiding a lot of lower-level details, this method means
the application protocol will tunnel through the standard HTTP service
port and not need a
TCP/IP service port of
its own. This can be a distinct advantage; most firewalls leave port 80
open, but trying to punch another hole through can be fraught with
both technical and political difficulties.

With this advantage comes a risk. It means that your web server
and its plugins grow more complex, and cracks in any of that code can
have large security implications. It may become more difficult to
isolate and shut down problem services. The usual tradeoffs between
security and convenience apply.

RFC 3205, On the Use of HTTP As a
Substrate,[56] has good design advice for anyone
considering using HTTP as the underlayer of an application
protocol, including a summary of the tradeoffs and problems involved.

Case Study: The CDDB/freedb.org Database

Audio CDs consist of a sequence of music tracks in a digital
format called CDDA-WAV. They were designed to be played by very simple
consumer-electronics devices a few years before general-purpose
computers developed enough raw speed and sound capability to decode
them on the fly. Because of this, there is no provision in the
format for even simple metainformation such as the album and track
titles. But modern computer-hosted CD players want this information so
the user can assemble and edit play lists.

Enter the Internet. There are at least two repositories that
provide a mapping between a hash code computed from the track-length
table on a CD and artist/album-title/track-title records. The
original was cddb.org, but
another site called freedb.org is probably now more
complete and widely used. Both sites rely on their users for the
enormous task of keeping the database current as new CDs come out;
freedb.org arose from a
developer revolt after CDDB elected to take all that user-contributed
information proprietary.

Queries to these services could have been implemented as a
custom application protocol on top of TCP/IP, but that would have
required steps such as getting a new TCP/IP port number assigned and
fighting to get a hole for it punched through thousands of firewalls.
Instead, the service is implemented over HTTP as a simple CGI query
(as if the CD's hash code had been supplied by a user filling in a Web
form).
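
The CGI-query approach can be sketched in a few lines. The host name, the cmd parameter, and the layout of the command string below are placeholders invented for illustration, not the actual interface of either service; the point is only that a lookup reduces to an ordinary form-style URL that any HTTP library can fetch and any CGI script can decode.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_lookup_url(base_url, disc_id, track_offsets, total_seconds):
    """Client side: pack a disc lookup into a form-style query string.

    The parameter name "cmd" and the field order are hypothetical.
    """
    fields = ["query", disc_id, str(len(track_offsets))]
    fields += [str(offset) for offset in track_offsets]
    fields.append(str(total_seconds))
    return base_url + "?" + urlencode({"cmd": " ".join(fields)})


def parse_lookup_url(url):
    """Server side: recover the command fields, as a CGI script would."""
    query = parse_qs(urlsplit(url).query)
    return query["cmd"][0].split(" ")
```

Because the request is just a URL, it passes through proxies and firewalls exactly as a browser's form submission would.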

This choice makes all the existing infrastructure of HTTP and
Web-access libraries in various programming languages available to
support programs for querying and updating this database. As a
result, adding such support to a software CD player is nearly trivial,
and effectively every software CD player knows how to query these databases.

Case Study: Internet Printing Protocol

Internet Printing Protocol (IPP) is a successful, widely implemented
standard for the control of network-accessible printers. Pointers to
RFCs, implementations, and much other related material are available
at the IETF's Printer Working
Group site.

IPP uses HTTP 1.1 as a transport layer. All IPP requests are
passed via an HTTP POST method call; responses are ordinary HTTP
responses. (Section 4.2 of RFC 2568, Rationale for the Structure
of the Model and Protocol for the Internet Printing
Protocol, does an excellent job of explaining this
choice; it repays study by anyone considering writing a new application
protocol.)
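
As a hedged illustration of what layering on HTTP POST looks like from the client side, the sketch below wraps an already-encoded IPP message in a POST request using Python's standard library. The application/ipp media type is the one registered for IPP; the printer URL in the usage note and the message bytes are placeholders, and the sketch deliberately does not attempt to encode a real IPP operation.

```python
import urllib.request

IPP_MEDIA_TYPE = "application/ipp"


def make_ipp_post(printer_url, ipp_message):
    """Wrap an encoded IPP message (bytes) in an HTTP POST request.

    The message is treated as opaque; HTTP supplies the transport,
    addressing, and the response channel.
    """
    return urllib.request.Request(
        printer_url,
        data=ipp_message,
        headers={"Content-Type": IPP_MEDIA_TYPE},
        method="POST",
    )
```

A caller would pass the resulting object to urllib.request.urlopen() and read the printer's reply from the ordinary HTTP response body, for instance after building it with a hypothetical address such as http://printer.example.org:631/ipp/print.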

From the software side, HTTP 1.1 is widely deployed. It already
solves many of the transport-level problems that would otherwise
distract protocol developers and implementers from concentrating on
the domain semantics of printing. It is cleanly extensible, so there
is room for IPP to grow. The CGI programming model for handling the
POST requests is well understood and development tools are widely
available.

Most network-aware printers already embed a web server, because
that's the natural way to make the status of the printer remotely
queryable by human beings. Thus, the incremental cost of adding IPP
service to the printer firmware is not large. (This is an argument
that could be applied to a remarkably wide range of other
network-aware hardware, including vending machines and coffee makers[57]
and hot tubs!)

About the only serious drawback of layering IPP over HTTP is
that the protocol is completely driven by client requests. Thus there
is no space in the model for printers to ship asynchronous alert
messages back to clients. (However, smarter clients could run a
trivial HTTP server to receive such alerts formatted as HTTP requests
from the printer.)

BEEP: Blocks Extensible Exchange Protocol

BEEP (formerly BXXP) is a generic protocol machine that competes
with HTTP for the role of universal underlayer for application
protocols. There is a niche open because there is not as yet
any other more established metaprotocol that is appropriate for truly
peer-to-peer applications, as opposed to the client-server
applications that HTTP handles well. A project
website provides access to standards and open-source
implementations in several languages.

BEEP has features to support both client-server and peer-to-peer
modes. The authors designed the BEEP protocol and support library so
that picking the right options abstracts away messy issues like data
encoding, flow control, congestion-handling, support of end-to-end
encryption, and assembling a large response composed of multiple
transmissions.

Internally, BEEP peers exchange sequences of self-describing
binary packets not unlike chunk types in
PNG. The design is tuned
more for economy and less for
transparency
than the classical Internet protocols or HTTP, and might be a better
choice when data volumes are large. BEEP also avoids the HTTP problem
that all requests have to be client-initiated; it would be better in
situations in which a server needs to send asynchronous status messages
back to the client.

BEEP is still new technology in mid-2003, and has only a few
demonstration projects. But the BEEP papers are good analytical
surveys of best practice in protocol design; even if BEEP itself fails
to gain widespread adoption, the papers will retain considerable tutorial
value.

XML-RPC, SOAP, and Jabber

There is a developing trend in application protocol design
toward using XML within
MIME to structure
requests and payloads. BEEP peers use this format for channel
negotiations. Three major protocols are going the XML route
throughout: XML-RPC and SOAP (Simple Object Access Protocol) for
remote procedure calls, and Jabber for instant messaging and presence.
All three are XML document types.

XML-RPC is very much in the Unix spirit (its author observes
that he learned how to program in the 1970s by reading the original
source code for Unix). It's deliberately minimalist but nevertheless
quite powerful, offering a way for the vast majority of RPC
applications that can get by on passing around scalar
boolean/integer/float/string datatypes to do their thing in a way that
is lightweight and easy to understand and monitor. XML-RPC's type
ontology is richer than that of a text stream, but still simple and
portable enough to act as a valuable check on interface
complexity. An excellent
XML-RPC home page points to
specifications and multiple open-source implementations.
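
To give a feel for how lightweight the format is, this small sketch uses Python's standard xmlrpc.client module to serialize a call carrying only scalar values and then decode it again. The method name example.add and the wrapper function names are made up for illustration; no server is contacted.

```python
import xmlrpc.client


def encode_call(method_name, *params):
    """Serialize a procedure call to an XML-RPC request document."""
    return xmlrpc.client.dumps(params, methodname=method_name)


def decode_call(xml_text):
    """Recover (method name, parameter tuple) from a request document."""
    params, method_name = xmlrpc.client.loads(xml_text)
    return method_name, params
```

Printing the result of encode_call("example.add", 2, 3.5, "units", True) shows a short, human-readable XML document in which each argument appears inside a typed element, which is what makes XML-RPC traffic easy to monitor.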

SOAP is a more heavyweight RPC protocol with a richer type
ontology that includes arrays and C-like structs. It was inspired by
XML-RPC, but has been plausibly accused of being an overdesigned
victim of the second-system effect. As of mid-2003 the SOAP
standard is still a work in progress, but a trial implementation in
Apache is tracking the
drafts. Open-source client modules in Perl, Python, Tcl, and Java are readily discoverable by a Web
search. The W3C draft specification is available on the Web.

XML-RPC and SOAP, considered as remote procedure call methods,
have some associated risks that we discuss at the end of Chapter 7.

Jabber is a peer-to-peer protocol designed to support instant
messaging and presence. What makes it interesting as an application
protocol is that it supports passing around XML forms and live
documents. Specifications, documentation, and open-source
implementations are available at the Jabber
Software Foundation site.