Web Communications: Internet Protocols and HTTP

Internet protocols define the format for all Internet
communications between computers. What this means is that for your computer to
talk to another computer across the Internet, both must be speaking the same
language. For file transfers, FTP (File Transfer Protocol) is used, and for Web
communications, HTTP (HyperText Transfer Protocol) is used.

The next few sections introduce the Internet protocols and discuss
how they facilitate communications on the Web. A good understanding of what's
going on between the browser and the server is essential for PHP programming,
because within the requests and responses flying back and forth from client to
server is a wealth of data you can tap into and use.

TCP/IP

The Internet is designed to provide communications between
its many interconnected nodes. Every computer or device that has an IP address
(that set of four numbers connected by dots, such as 64.71.134.49) is a node on
the Internet. The main protocol (actually a suite of networking protocols) used
to format data for transit is TCP/IP (Transmission Control Protocol over
Internet Protocol). TCP/IP is simply a method of describing information packets
(the packages of bits that are individually transmitted across a network) so
that they can be sent down your telephone, cable, or T1 line from node to node
until they reach their intended destination.

One advantage of the TCP/IP protocol is that it can reroute
information very quickly if a particular node or route is broken or slow. When
the user tells the browser to fetch a Web page, the browser parcels up (turns
into packets) this instruction using TCP. TCP is a transport protocol, which
provides a reliable transmission format for the instruction. It ensures that the
entire message is taken apart and packaged up correctly for transmission (and
also that it is correctly unpacked and put back together after it reaches its
destination).
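
As a rough illustration of the "take apart and put back together" idea, here is a toy sketch in Python. It is not real TCP (which numbers bytes rather than chunks and also handles acknowledgement and retransmission); the function names packetize and reassemble are made up for this example.

```python
import random


def packetize(message: bytes, size: int) -> list[tuple[int, bytes]]:
    """Split a message into (sequence_number, chunk) pairs."""
    return [
        (seq, message[start:start + size])
        for seq, start in enumerate(range(0, len(message), size))
    ]


def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Sort the chunks by sequence number and join them back together."""
    return b"".join(chunk for _, chunk in sorted(packets))


message = b"GET /testpage.htm HTTP/1.1"
packets = packetize(message, size=8)
random.shuffle(packets)  # simulate packets arriving out of order
print(reassemble(packets) == message)  # True
```

The sequence number is what lets the receiving end rebuild the message even when the pieces arrive in a different order from the one in which they were sent.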

Before the packets of data are sent out across the network, they
need to be addressed (they should include a source address and a destination
address in the form of an IP address). This is the job of IP, the other half of
TCP/IP, which puts an address label on each packet so that the network knows
where to direct it. On top of TCP/IP sits a further protocol called HyperText
Transfer Protocol (or HTTP), which defines the format of the messages that
browsers and Web servers exchange. HTTP is the protocol used by the World Wide Web
in the transfer of data from one machine to another; when you see a URL prefixed
with http://, you know that the protocol being used is HTTP. You can think of
TCP/IP as the postal service that does the addressing, routing, and transfer,
whereas HTTP is the agreed layout of the letter (data) inside the envelope, so
that the recipient can make sense of it.
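
To make the pieces of a URL concrete, the following minimal sketch uses Python's standard urllib.parse module to split one apart. The host name www.example.com is a placeholder chosen for illustration; it does not come from the text.

```python
from urllib.parse import urlsplit

# A hypothetical URL; only the /testpage.htm path echoes this chapter.
parts = urlsplit("http://www.example.com/testpage.htm")

print(parts.scheme)    # http  (the protocol to speak)
print(parts.hostname)  # www.example.com  (the machine to contact)
print(parts.path)      # /testpage.htm  (the resource to ask for)
```

The scheme tells the client which protocol to use, the host name is resolved to the IP address that packets are sent to, and the path is what ends up in the HTTP request itself.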

The message passed from the browser to the Web server is
known as an HTTP request. When the Web server receives
this request (which is a request for a Web page or file), it
checks its stores to find the appropriate page. If it finds the page, it formats
the HTML contained within as an HTTP message, parcels it up into packets
addressed to the browser (using TCP/IP), and sends them back across the network.
If the Web server cannot find the requested page, it generates a page containing
an error message (in this case, the dreaded Error 404: Page Not Found), parcels
it up, and dispatches that page to the browser. The message sent from the Web
server to the browser is called the HTTP response.

The HTTP Protocol

There's quite a bit more technical detail to all of this, so
let's look more closely at exactly how HTTP works. When a request for a Web page
is sent to the server, it contains more than just the desired URL. There is a
lot of extra information that is sent as part of the request. This is also true
of the response: the server sends extra information back to the browser. You'll
explore these different types of information shortly.

A lot of the information that's passed within the HTTP message is
generated automatically, so you don't need to worry about creating or
transmitting it yourself. You should still be aware that this extra information
is being passed between machines as part of the HTTP request and HTTP response,
because the PHP scripts that you write can read it and can directly affect its
exact content.

Whether it's a client request or a server response, every HTTP
message has the same format, which breaks down into three sections: the
request/response line, the HTTP header, and the HTTP body. The content of these three sections is dependent on whether the message
is a request or a response, so you'll examine these two cases separately.

The HTTP Request

The HTTP request that the browser sends to the Web server
contains a request line, a header, and a body. Here's an example of the request
line and header:

The Request Line

The first line of every HTTP request is the request line, which contains three pieces of information:

An HTTP command known as a method (such as GET and POST)

The path on the server to the resource that the client is
requesting

The version number of HTTP (such as HTTP/1.1)

Here's an example:

GET /testpage.htm HTTP/1.1
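
The three pieces of a request line are separated by spaces, so pulling them apart takes very little code. This is a minimal sketch; the function name parse_request_line is made up for this example, and a real server would validate each piece far more carefully.

```python
def parse_request_line(line: str) -> tuple[str, str, str]:
    """Split an HTTP request line into (method, path, version)."""
    # Unpacking raises ValueError if the line does not have exactly three parts.
    method, path, version = line.split()
    return method, path, version


print(parse_request_line("GET /testpage.htm HTTP/1.1"))
# ('GET', '/testpage.htm', 'HTTP/1.1')
```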

The method is used to tell the server how to handle the request.
The following list describes three of the most common methods that appear in
this field.

Method: Description

GET: A request for information residing at a particular URL. The
majority of HTTP requests made on the Internet are GET requests (when you click
a link, a GET request is made). The information requested can be
anything from an HTML or PHP page, to the output of a JavaScript or PerlScript
program, or some other executable. You can send some limited data to the
server, in the form of an extension to the URL (a query string).

HEAD: The same as the GET method, except
that it indicates a request for the HTTP header only and no data.

POST: Indicates that data will be sent to the server as part of
the HTTP body (from form fields, for example). This data is then transferred to
a data-handling program on the Web server.

HTTP supports a number of other methods, including PUT, DELETE, TRACE, CONNECT, and OPTIONS. As a rule, you'll find that these are less common;
they are therefore beyond the scope of this discussion. If you want to know more
about these, take a look at RFC 2068, which you can find at www.rfc.net.
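
The difference between GET and POST is easiest to see when the same data is sent both ways. The sketch below builds the two raw requests as strings, assuming a hypothetical host (www.example.com) and made-up form fields; nothing is actually sent over the network.

```python
from urllib.parse import urlencode

form = {"name": "Ada", "lang": "php"}  # hypothetical form fields
query = urlencode(form)                # "name=Ada&lang=php"

# GET: the data rides along as an extension to the URL.
get_request = (
    f"GET /testpage.htm?{query} HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "\r\n"
)

# POST: the same data travels in the body, after the blank line.
post_request = (
    "POST /testpage.htm HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(query)}\r\n"
    "\r\n"
    f"{query}"
)

print(get_request)
print(post_request)
```

In the GET case the body is empty and everything the server needs is in the request line. In the POST case the Content-Length header tells the server how many bytes of body to expect.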

The HTTP Request Header

The next bit of information sent is the HTTP header. This
contains details of what document types the client will accept back from the
server, along with the type of browser that has requested the page, the date, and
general configuration information. The HTTP request's header contains
information that falls into three different categories:

General: Information about either the client or server, but
not specific to one or the other

Entity: Information about the data being sent between the
client and server

Request: Information about the client configuration and
different types of acceptable documents

The HTTP header is composed of a number of lines;
each line contains the name of a piece of HTTP header information, followed by
a colon and its value.

There are many different lines that can make up an HTTP
header, and most of them are optional, so HTTP has to indicate when it has
finished transmitting the header information. To do this, a blank line is
used.

The HTTP Request Body

If the POST method is used in the
HTTP request line, then the HTTP request body contains any data that is being
sent to the server, for example, data that the user typed into an HTML form
(you'll see examples of this later in the book). Otherwise, the HTTP request
body is empty, as it is for a simple GET request such as the one shown earlier.
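
Putting the three sections together, the following sketch splits a raw request into its request line, header, and body, using the blank line as the divider. The function name split_http_message and the sample request are illustrative assumptions; the sketch also ignores details such as repeated header names, which a real parser has to handle.

```python
def split_http_message(raw: str) -> tuple[str, dict[str, str], str]:
    """Split a raw HTTP message into (first line, headers, body)."""
    # The first blank line (CRLF CRLF) separates the header from the body.
    head, _, body = raw.partition("\r\n\r\n")
    first_line, *header_lines = head.split("\r\n")

    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return first_line, headers, body


raw_request = (
    "POST /testpage.htm HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "Content-Length: 8\r\n"
    "\r\n"
    "name=Ada"
)

first_line, headers, body = split_http_message(raw_request)
print(first_line)  # POST /testpage.htm HTTP/1.1
print(headers)     # {'Host': 'www.example.com', 'Content-Length': '8'}
print(body)        # name=Ada
```

For a GET request the same function returns an empty string as the body.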

The HTTP Response

The HTTP response is sent by the server back to the client
browser, and contains a response line, a header, and a body. Here's an example
of the response line and header:

The Response Line

The response line contains only two bits of information:

The HTTP version number

An HTTP status code that reports the success or failure of
the request

The example response line,

HTTP/1.1 200 OK

returns HTTP status code 200, which represents the message OK, denoting the
success of the request and that the
response contains the required page or data from the server. If the response
line contains HTTP status code 404 (mentioned earlier in the chapter), then the
Web server failed to find the requested resource. Status code values are
three-digit numbers, where the first digit indicates the class of the response.
There are five classes of response, as listed below.

Code class: Description

100–199: Informational; indicate that the request is currently being
processed

200–299: Denote success (that the Web server received and carried out
the request successfully)

300–399: Indicate that the request hasn't been performed because the
information required has been moved

400–499: Denote a client error (that the request was incomplete,
incorrect, or impossible)

500–599: Denote a server error (that the request appeared to be
valid, but that the server failed to carry it out)
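
Because the class is determined by the first digit alone, it can be worked out with integer division. The sketch below is a minimal illustration; the names STATUS_CLASSES and status_class are made up here, while HTTPStatus comes from Python's standard http module and supplies the standard reason phrases.

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code: int) -> str:
    """Return the class of a three-digit status code from its first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]


for code in (200, 404, 500):
    print(code, HTTPStatus(code).phrase, "->", status_class(code))
# 200 OK -> success
# 404 Not Found -> client error
# 500 Internal Server Error -> server error
```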

The Response Header

The HTTP response header is similar to the preceding request
header. In the HTTP response, the header information again falls into three
types:

General: contains information about either the client or
server, but not specific to one or the other

Entity: contains information about the data being sent
between the client and the server

Response: contains information about the server sending the
response and how it can deal with the response

Once again, the header consists of a number of lines, and uses a
blank line to indicate that the header information is complete. Here's an
example header, with the name of each line commented at the end:

The first line is self-explanatory. The second line, Server, indicates the type
of software the Web server is
running. Because this example is requesting a file somewhere on the Web server,
the information on the third line refers to the last time the requested page was
modified.
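
To show how a client might read such a header, here is a small sketch that parses a response line followed by Date, Server, and Last-Modified lines. The header values are illustrative stand-ins invented for this sketch, and parse_response_head is a hypothetical helper name; parsedate_to_datetime is part of Python's standard email.utils module.

```python
from email.utils import parsedate_to_datetime

# Illustrative values only; a real server fills these in itself.
raw_head = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Mon, 01 Nov 1999 16:12:23 GMT\r\n"
    "Server: Apache\r\n"
    "Last-Modified: Fri, 29 Oct 1999 12:08:03 GMT\r\n"
    "\r\n"
)


def parse_response_head(raw: str) -> tuple[str, int, str, dict[str, str]]:
    """Return (version, status code, reason, headers) from a response head."""
    lines = raw.split("\r\n")
    version, code, reason = lines[0].split(" ", 2)

    headers = {}
    for line in lines[1:]:
        if line == "":  # the blank line marks the end of the header
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return version, int(code), reason, headers


version, code, reason, headers = parse_response_head(raw_head)
print(code, reason)       # 200 OK
print(headers["server"])  # Apache

last_modified = parsedate_to_datetime(headers["last-modified"])
print(last_modified.year, last_modified.month, last_modified.day)  # 1999 10 29
```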

The header can contain much more information than this, or
different information, depending on what is requested. If you want to know more
about the different types of information, you'll find them listed in RFC 2068
(Sections 4.5, 7.1 and 7.2).

The Response Body

If the request was successful, the HTTP response body
contains the HTML code (together with any script that is to be executed by the
browser), ready for the browser's interpretation. If the request was
unsuccessful, the response line carries a failure code, and the body typically
contains an error page instead.

Running PHP Scripts via an HTTP Request

Actually, any client application (not just a browser) that
can send an HTTP request to a Web server can activate and run a PHP program. In
fact, it's not a requirement that the file display anything to the user (meaning
you don't have to embed your code in a Web page). If a properly formatted HTTP
request is sent to the Web server, asking for a file containing PHP code, and
the file has the appropriate filename extension, the PHP program will
run.
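
The point that the client need not be a browser can be demonstrated with a few lines of Python. The sketch below is self-contained, so it starts its own tiny server on the local machine as a stand-in for a Web server running a script; HelloHandler, fetch, and the path /testpage.php are assumptions made for this example, and no PHP is actually involved.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Stand-in for a Web server that runs a script and returns its output."""

    def do_GET(self):
        body = f"You asked for {self.path}".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress the default request logging


def fetch(path: str) -> tuple[int, str, str]:
    """Send a GET request to a throwaway local server and return the result."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        conn.request("GET", path)  # no browser involved
        response = conn.getresponse()
        result = (response.status, response.reason, response.read().decode("utf-8"))
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


print(fetch("/testpage.php"))
# (200, 'OK', 'You asked for /testpage.php')
```

The server cannot tell whether the request came from a browser or from a script like this one; it sees only a properly formatted HTTP request and responds to it.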

The Web Server

If you (or your system administrator) have properly set up
the Web server software for the OS it's running on and for PHP, you can expect
that HTTP requests for files containing PHP code will be properly handled and
that your PHP programs will run.

The PHP Processing Engine

PHP is actually composed of function modules, a language
core (named the Zend engine, now out as version 2.0), and a Web server
interface. The interface allows PHP to communicate with the Web server
machine-to-machine. The function modules give PHP its many valuable
capabilities, while the Zend engine (the language core) does the hard work of
analyzing, translating, and executing the incoming code (Zend does just a little
bit more than that, but you get the idea). It's important to note that PHP is
compiled at the moment it runs, on the server, therefore making your life much
simpler by avoiding the need to precompile the code specifically for each type
of machine you expect it to run on.