Web Transport Protocols

Clients and servers use a number of different transport
protocols to exchange information. These protocols, built on
top of TCP/IP, comprise the majority of all Internet traffic
today. The Hypertext Transfer Protocol
(HTTP) is the most common because it was designed specifically
for the Web. A number of legacy protocols, such as the
File Transfer Protocol (FTP) and Gopher,
are still in use today. According to Merit’s measurements from
the NSFNet, HTTP replaced
FTP as the dominant protocol in April of 1995.[2]
Some newer protocols, such as Secure Sockets
Layer (SSL) and the Real-time Transport
Protocol (RTP), are increasing in use.

HTTP

Tim Berners-Lee and others originally designed HTTP to be a
simple and lightweight transfer protocol. Since its
inception, HTTP has undergone three major revisions. The
very first version, retroactively named HTTP/0.9, is
extremely simple and almost trivial to implement. At the
same time, however, it lacks any real features. The second
version, HTTP/1.0 [Berners-Lee, Fielding and Frystyk, 1996], defines
a small set of features and still maintains the original
goals of being simple and lightweight. However, at a time
when the Web was experiencing phenomenal growth, many
developers found that HTTP/1.0 did not provide all the
functionality they required for new services.

The HTTP Working Group of
the Internet Engineering Task Force (IETF)
has worked long and hard on the protocol specification for
HTTP/1.1. New features in this version include persistent
connections, range requests, content negotiation, and
improved cache controls. RFC 2616 is the latest standards track
document describing HTTP/1.1. Unlike the earlier versions,
HTTP/1.1 is a very complicated protocol.
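As a quick illustration of one of these new features, a range request asks for only part of a resource. This hypothetical exchange requests the first 500 bytes of a file (hostname and sizes are made up):

```
GET /archive.tar.gz HTTP/1.1
Host: www.example.com
Range: bytes=0-499

HTTP/1.1 206 Partial Content
Content-Range: bytes 0-499/87213
Content-Length: 500
```

A server that supports range requests answers with status 206 and a Content-Range header describing which bytes, out of how many, it is returning.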

HTTP transactions use a well-defined message structure.
A message, which can be either a request or a response,
has two parts: the headers and
the body. Headers are always
present, but the body is optional. Headers are
represented as ASCII strings terminated by carriage
return and linefeed characters. An empty line
indicates the end of headers and the start of the
body. Message bodies are treated as binary data.
The headers are where we find information and directives
relevant to caching.

An HTTP header consists of a name followed by a colon and
then one or more values separated by commas. Multiword
names are separated with dashes. Header names and reserved
words are case-insensitive. For example, Content-Length,
If-Modified-Since, and Cache-Control are all HTTP header
names.
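A few representative header lines (the values are hypothetical), followed by a rough sketch of how a message might be split into its parts. This is an illustration of the format, not a production parser:

```python
# Minimal sketch: split a raw HTTP message into its start line,
# headers, and body. The empty line (CRLF CRLF) separates the
# ASCII headers from the (optional, binary) body. Header names
# are lowercased here because they are case-insensitive.

def split_message(raw: bytes):
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("ascii").split("\r\n")
    start_line = lines[0]            # request line or status line
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body

raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Length: 5\r\n"
       b"Cache-Control: max-age=3600\r\n"
       b"\r\n"
       b"hello")
start, hdrs, body = split_message(raw)
print(start)                   # HTTP/1.1 200 OK
print(hdrs["content-length"])  # 5
print(body)                    # b'hello'
```

Note that the lookup key `content-length` matches the header written as `Content-Length`; a real implementation must treat the two identically.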

HTTP defines four categories of headers: entity, request,
response, and general. Entity headers describe something
about the data in the message body. For example, Content-length
is an entity header. It describes the length of the message
body. Request headers should appear only in HTTP requests
and are meaningless for responses. Host and
If-modified-since are request headers.
Response headers, obviously, apply only to HTTP responses.
Age is a response header. Finally,
general headers are dual-purpose: they can be found in both
requests and responses. Cache-control is a general header,
one that we’ll talk about often.

The first line of an HTTP message is special.
For requests, it’s called the
request line and
contains the request method,
a URI, and an HTTP version number.
For responses, the first line is called
the status-line, and it
includes an HTTP version number and
a status code that indicates
the success or failure of the request. Note that
most request messages do not have a body, but
most response messages do.
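The first-line formats are simple enough to sketch directly. The helper names below are made up for illustration; the line layouts themselves follow the description above:

```python
# Sketch: compose a request line and parse a status line.

def request_line(method, uri, version="HTTP/1.1"):
    # Request line: method, URI, and HTTP version number.
    return f"{method} {uri} {version}\r\n"

def parse_status_line(line):
    # Status line: HTTP version, numeric status code, reason phrase.
    version, code, reason = line.rstrip("\r\n").split(" ", 2)
    return version, int(code), reason

req = request_line("GET", "/index.html")
status = parse_status_line("HTTP/1.1 304 Not Modified")
print(req)     # GET /index.html HTTP/1.1
print(status)  # ('HTTP/1.1', 304, 'Not Modified')
```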

RFC 2616 defines the request methods listed in Table 1-1. Other RFCs, such as 2518,
define additional methods for HTTP. Applications may even
make up their own extension methods,
although proxies are not required to support them.
A proxy that receives a request with a method it does not
implement should respond with 501 (Not Implemented); a 405
(Method Not Allowed) response is for methods that are
recognized but not permitted for the requested resource.
The descriptions in Table 1-1 are necessarily brief.
Refer to Section 9 of RFC 2616 for full details.

Table 1-1. HTTP Request Methods Defined by RFC 2616

Method    Description
-------   ------------------------------------------------------------
GET       A request for the information identified by the request URI.
HEAD      Identical to GET, except the response does not include a
          message body.
POST      A request for the server to process the data present in the
          message body.
PUT       A request to store the enclosed body in the named URI.
TRACE     A “loopback” method that essentially echoes a request back
          to the client. It is also useful for discovering and testing
          proxies between the client and the server.
DELETE    A request to remove the named URI from the origin server.
OPTIONS   A request for information about a server’s capabilities or
          support for optional features.
CONNECT   Used to tunnel certain protocols, such as SSL, through a
          proxy.

For our purposes, GET, HEAD, and POST are the only
interesting request methods. I won’t say much about the
others in this book. We’ll talk more about HTTP in Chapter 2.
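Because this book is about caching, the GET variant worth previewing is the conditional GET, built from the If-Modified-Since request header mentioned earlier. Here is a hedged sketch of how a cache might construct one (the host, path, and date are hypothetical):

```python
def conditional_get(host, uri, last_modified):
    # If the resource is unchanged since last_modified, the origin
    # server answers 304 (Not Modified) with no message body, and
    # the cache can reuse its stored copy.
    return (f"GET {uri} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            f"If-Modified-Since: {last_modified}\r\n"
            f"\r\n")

msg = conditional_get("www.example.com", "/logo.gif",
                      "Fri, 02 Jun 2000 10:00:00 GMT")
print(msg)
```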

FTP

The File Transfer Protocol (FTP) has been
in use since the early years of the Internet (1971). The
current standard document, RFC 959,
by Postel and Reynolds, is very different from the original
specification, RFC 172.
FTP
consumed more Internet backbone bandwidth than any other
protocol until about March of 1995.

An FTP session is a bit more complicated than an HTTP
transaction. FTP uses a control
channel for commands and responses and a
separate data channel for actual
data transfer. Before data transfer can occur, approximately
six command and reply exchanges take place on the control
channel.
FTP clients must “log in” to a server
with a username and password. Many servers allow
anonymous access to their publicly available files.
Because FTP is primarily intended to give access
to remote filesystems, the protocol supports
commands such as CWD (change working
directory) and LIST (directory listing).
These
differences make FTP somewhat awkward to implement in web
clients. Regardless, FTP remains a popular way of making
certain types of information available to Internet and web
users.
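To make the control-channel dialogue concrete, the opening of a typical anonymous session might look like this (client commands are in capitals, server replies begin with numeric codes; the host and filename are hypothetical):

```
220 ftp.example.com FTP server ready.
USER anonymous
331 Guest login ok, send your email address as password.
PASS user@example.com
230 Guest login ok, access restrictions apply.
CWD /pub
250 CWD command successful.
PASV
227 Entering Passive Mode (192,0,2,1,78,52).
RETR README
150 Opening data connection for README.
226 Transfer complete.
```

Only after the PASV (or PORT) negotiation does the separate data channel open; the file contents never travel over the control channel.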

SSL/TLS

Netscape invented
the Secure Sockets Layer (SSL) protocol
in 1994 to foster electronic
commerce applications on the Internet. SSL provides secure,
end-to-end encryption between clients and servers. Before
SSL, people were justifiably afraid to conduct business
online due to the relative ease of sniffing network traffic. The
development and standardization of SSL has moved into the
IETF, where it is now called Transport Layer
Security (TLS) and documented in RFC 2246.

The TLS protocol is not restricted to HTTP and the Web. It
can be used for other applications, such as email (SMTP) and
newsgroups (NNTP). When talking about HTTP and TLS, the
correct terminology is “HTTP over TLS,” the particulars of
which are described in RFC 2818. Some people refer to it as
HTTPS because HTTP/TLS URLs use “https” as the protocol
identifier:

https://secure.shopping.com/basket

Proxies interact with HTTP/TLS traffic in one of two ways:
either as a connection endpoint or as a device in the
middle. If a proxy is an endpoint, it encrypts and
decrypts the HTTP traffic. In this case, the proxy may be
able to store and reuse responses. If, on the other hand,
the proxy is in the middle, it can only tunnel the traffic
between the two endpoints. Since the communication is
encrypted, the responses cannot be cached and
reused.
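For the tunneling case, the client asks the proxy to open the tunnel with the CONNECT method. A sketch of the exchange, reusing the hypothetical host from the URL above:

```
CONNECT secure.shopping.com:443 HTTP/1.1
Host: secure.shopping.com:443

HTTP/1.1 200 Connection established

(encrypted TLS traffic now flows through the tunnel,
 opaque to the proxy)
```

From this point on, the proxy simply copies bytes in both directions; it cannot see, let alone cache, the requests and responses inside.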

Gopher

The Gopher protocol is all but extinct on the
Web. In principle, Gopher is very similar to
HTTP/0.9. The client sends a single request line, to
which the server replies with some content. The client
knows a priori what type of
content to expect because each request includes an encoded,
Gopher-specific content-type parameter. The extra features
offered by HTTP and HTML made Gopher
obsolete.
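To illustrate the similarity, here are the single-line requests each protocol would send for the same hypothetical document:

```
Gopher:    /docs/readme.txt
HTTP/0.9:  GET /docs/readme.txt
```

In both cases the server simply writes the content back and closes the connection; there are no response headers at all.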