Application Protocol Design

In Chapter 7, we'll discuss the advantages of breaking complicated
applications up into cooperating processes speaking an
application-specific command set or protocol with each other. All the
good reasons for data file formats to be textual apply to these
application-specific protocols as well.

When your application protocol is textual and easily parsed
by eyeball, many good things become easier. Transaction dumps become
much easier to interpret. Test loads become easier to write.

Server processes are often invoked by harness programs such as
inetd(8) in such a way that the server sees commands on standard input
and ships responses to standard output. We describe this “CLI
server” pattern in more detail in Chapter 11.

A CLI server with a command set that is designed for simplicity has
the valuable property that a human tester will be able to type
commands directly to the server process to probe the software's
behavior.
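
As a minimal sketch of the pattern in Python: the function below reads one command per line and writes one reply per line. The ECHO/UPPER/QUIT command set, the OK/ERR reply words, and the names serve and run_session are all hypothetical, invented only for illustration.

```python
import io


def serve(inp, out):
    """Read one command per line from inp; write one reply line to out.

    The command set (ECHO, UPPER, QUIT) is hypothetical, chosen only
    to illustrate the pattern.
    """
    for line in inp:
        verb, _, arg = line.strip().partition(" ")
        verb = verb.upper()
        if verb == "ECHO":
            reply = "OK " + arg
        elif verb == "UPPER":
            reply = "OK " + arg.upper()
        elif verb == "QUIT":
            reply = "OK bye"
        else:
            reply = "ERR unknown command"
        out.write(reply + "\n")
        out.flush()
        if verb == "QUIT":
            break


def run_session(text):
    """Drive the server with canned input, as a test load would."""
    out = io.StringIO()
    serve(io.StringIO(text), out)
    return out.getvalue().splitlines()


# Under a harness, one would instead call serve(sys.stdin, sys.stdout).
```

Because the server only ever touches two file-like objects, the same code can be driven by a harness, by a tester at a terminal, or by a canned test load such as run_session("ECHO hi\nQUIT\n").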

Another issue to bear in mind is the end-to-end design
principle. Every protocol designer should read the classic
End-to-End Arguments in System Design [Saltzer]. There are often serious questions about which
level of the protocol stack should handle features like security and
authentication; this paper provides some good conceptual tools for
thinking about them. Yet a third issue is designing application
protocols for good performance. We'll cover that issue in more detail
in Chapter 12.

The traditions of Internet application protocol design evolved
separately from Unix before 1980.[54]
But since the 1980s these traditions have become thoroughly
naturalized into Unix practice.

We'll illustrate the Internet style by looking at three application
protocols that are among the most heavily used and are widely
regarded among Internet hackers as paradigmatic: SMTP, POP3, and
IMAP. All three address different aspects of mail transport (one of
the net's two most important applications, along with the World Wide
Web), but the problems they address (passing messages, setting remote
state, indicating error conditions) are generic to non-email
application protocols as well and are normally addressed using
similar techniques.

Case Study: SMTP, the Simple Mail Transfer Protocol

Example 5.7 is an example transaction in SMTP
(Simple Mail Transfer Protocol), which is described by RFC 2821. In
the example, C: lines are sent by a mail
transport agent (MTA) sending mail, and S: lines
are returned by the MTA receiving it. Text emphasized like
this is comments, not part of the actual
transaction.

This is how mail is passed among Internet machines. Note the
following features: the command-argument format of the requests, the
responses consisting of a status code followed by an informational
message, and the fact that the payload of the DATA command is
terminated by a line consisting of a single dot.
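
A rough Python sketch of how a client might handle two of these features: splitting a reply into status code and message, and collecting a payload up to the lone dot. The function names are hypothetical. The removal of a doubled leading dot follows the transparency rule in the SMTP specification, a detail not discussed above.

```python
def parse_reply(line):
    """Split an SMTP-style reply line into (status code, message)."""
    line = line.rstrip("\r\n")
    code, message = line[:3], line[4:]
    if len(code) != 3 or not code.isdigit():
        raise ValueError("malformed reply: %r" % line)
    return int(code), message


def read_payload(lines):
    """Collect payload lines up to a line consisting of a single dot."""
    body = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line == ".":
            return body
        # Senders double a leading dot so that no payload line can be
        # mistaken for the terminator; undo that here.
        if line.startswith("."):
            line = line[1:]
        body.append(line)
    raise ValueError("payload ended without a terminating dot")
```

The first digit of the code is enough for a program to decide whether the command succeeded; the rest of the line is there for the human reading a transaction dump.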

SMTP is one of the two or three oldest application protocols
still in use on the Internet. It is simple, effective, and has
withstood the test of time. The traits we have called out here are
tropes that recur frequently in other Internet protocols. If there is
any single archetype of what a well-designed Internet application
protocol looks like, SMTP is it.

Case Study: POP3, the Post Office Protocol

Another one of the classic Internet protocols is POP3, the Post
Office Protocol. It is also used for mail transport, but where SMTP
is a ‘push’ protocol with transactions initiated by the
mail sender, POP3 is a ‘pull’ protocol with transactions
initiated by the mail receiver. Internet users with intermittent
access (like dial-up connections) can let their mail pile up on
a mail-drop machine, then use a POP3 connection to pull mail
up the wire to their personal machines.

Example 5.8 is an example POP3 session. In the
example, C: lines are sent by the client,
and S: lines by the mail server. Observe the many
similarities with SMTP. This protocol is also textual and
line-oriented, sends payload message sections terminated by a line
consisting of a single dot followed by line terminator, and even uses
the same exit command, QUIT. As in SMTP, each client operation is
acknowledged by a reply line that begins with a status code and
includes an informational message meant for human eyes.

There are a few differences. The most obvious one is that POP3
uses status tokens rather than SMTP's 3-digit status codes. Of course
the requests have different semantics. But the family resemblance
(one we'll have more to say about when we discuss the generic Internet
metaprotocol later in this chapter) is clear.
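
To make the contrast with SMTP's numeric codes concrete, here is a small Python sketch of reply parsing for POP3. The two status tokens, +OK and -ERR, are the ones defined in the POP3 specification (RFC 1939); the function name is hypothetical.

```python
def parse_pop3_reply(line):
    """Split a POP3 reply line into (succeeded, message)."""
    token, _, message = line.rstrip("\r\n").partition(" ")
    if token == "+OK":
        return True, message
    if token == "-ERR":
        return False, message
    raise ValueError("unknown status token: %r" % token)
```

The shape is the same as for SMTP: a machine-readable status first, then text meant for human eyes.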

Case Study: IMAP, the Internet Message Access Protocol

To complete our triptych of Internet application protocol
examples, we'll look at IMAP, another post office protocol designed in
a slightly different style. See Example 5.9; as
before, C: lines are sent by the client, and S: lines by the
mail server. Text emphasized like this is
comments, not part of the actual transaction.

IMAP delimits payloads in a slightly different way. Instead of
ending the payload with a dot, the payload length is sent just before
it. This increases the burden on the server a little bit (messages
have to be composed ahead of time; they can't just be streamed up
after the send initiation) but makes life easier for the client, which
can tell in advance how much storage it will need to allocate to
buffer the message for processing as a whole.
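
A minimal Python sketch of length-prefixed reading, assuming the IMAP convention that a line ending in a byte count in curly braces is followed by exactly that many bytes of payload. The function name and the sample response are illustrative, not taken from a real session.

```python
import io
import re

# A line ending in "{n}" announces an n-byte payload.
LITERAL = re.compile(rb"\{(\d+)\}\r?\n$")


def read_with_literal(stream):
    """Read one line; if it announces a payload, read that too.

    Returns (line text without the length marker, payload or None).
    """
    line = stream.readline()
    match = LITERAL.search(line)
    if match is None:
        return line.rstrip(b"\r\n"), None
    size = int(match.group(1))
    # The size is known up front, so one sized read is enough.
    payload = stream.read(size)
    if len(payload) != size:
        raise ValueError("stream ended inside payload")
    return line[:match.start()].rstrip(), payload


sample = io.BytesIO(
    b"* 1 FETCH (BODY[TEXT] {12}\r\nHello\r\n.\r\nhi)\r\nA0001 OK done\r\n"
)
header, payload = read_with_literal(sample)
```

Note that the sample payload contains a line consisting of a single dot. With a length prefix this needs no special handling, since the client never scans the payload for a terminator.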

Also, notice that each response is tagged with a sequence label
supplied by the request; in this example the labels have the form
A000n, but the client could have generated any token into that slot. This
feature makes it possible for IMAP commands to be streamed to the
server without waiting for the responses; a state machine in the
client can then simply interpret the responses and payloads as
they come back. This technique cuts down on latency.
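
The bookkeeping on the client side can be sketched in Python as follows. The TagTracker class and its method names are hypothetical; the sketch only tracks which request each tagged reply belongs to and does no network I/O.

```python
import itertools


class TagTracker:
    """Pair tagged responses with the requests that produced them."""

    def __init__(self):
        self._counter = itertools.count(1)
        self.pending = {}  # tag -> command still awaiting completion

    def issue(self, command):
        """Return the wire line for command, labelled with a fresh tag."""
        tag = "A%04d" % next(self._counter)
        self.pending[tag] = command
        return "%s %s\r\n" % (tag, command)

    def complete(self, line):
        """Match a response line to its request.

        Returns (command, status text) when the line carries a pending
        tag, or None for any other line (for example, one that starts
        with "*").
        """
        tag, _, rest = line.rstrip("\r\n").partition(" ")
        if tag not in self.pending:
            return None
        return self.pending.pop(tag), rest
```

A client can call issue() several times in a row and write all the resulting lines before reading anything back; complete() then sorts out the replies as they arrive, in whatever order the tags appear.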

IMAP (which was designed to replace POP3) is an excellent
example of a mature and powerful Internet application protocol design,
one well worth study and emulation.

[54] One relic of this pre-Unix history is that Internet
protocols normally use CR-LF as a line terminator rather than
Unix's bare LF.