Apache part I

Abstract:

This article about Apache, the most widely used web server,
is divided into two parts. The first part briefly describes
the history of the World Wide Web; the second part is an
introduction to the HTTP protocol.


History

The concept of the HTTP client and server was developed by people
working at CERN (Conseil Européen pour la Recherche
Nucléaire).
Once their research was completed, they handed it over to an American
university (NCSA, at the University of Illinois).
I guess that a number of people would be amazed to learn that the foundations of
the modern World Wide Web were laid by Europeans (and particularly by French
people).

Apache is the name of a free
web server project. The origin of the name is slightly contested:
some say it comes from "a patchy server" because of the
numerous patches applied in the beginning (again a hacker trick :) ), while
others have a much more serious explanation and say that the founders of the
project chose the name in memory of the Apache tribe, a tribe
known for its great adaptability on the land.
It is the most widely used web server on the Internet. It follows the HTTP
protocol (version 1.1), published by the IETF with the participation of the
W3C (World Wide Web Consortium).
A Netcraft survey
made in June 1999 estimates that 60.05% of web servers are Apache
servers.
A web server is the "server" side of the client-server model. It
answers queries from "web clients" such
as the lynx web browser ;-).

The HTTP protocol

Server and client talk to each other using the HTTP protocol (Hypertext Transfer
Protocol). The current version is
HTTP 1.1, as specified in RFC 2616. The protocol is divided into two parts: the client query and the
server answer. It is based on ASCII text.

The query:

It is one line of text divided into 3 parts:

[query type]

[URL]

[Protocol used]

Possible query types include GET, POST, HEAD, PUT, DELETE and TRACE.

The URL is the path to what you want to see and follows the domain
name (for instance, www.linuxfocus.org is the domain name and
/Francais is the URL of the welcome page for French speakers).

The protocol used can be HTTP/1.0 or HTTP/1.1.

This basic line may also be followed by other lines that refine the query, as
we shall see for an HTTP/1.1 query.
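
To make the three-part layout of the query line concrete, here is a minimal Python sketch that splits such a line into its query type, URL and protocol. The function name parse_request_line and the set KNOWN_METHODS are my own inventions for illustration; only the three-part structure comes from the description above.

```python
# Hypothetical helper: the names below are illustrative, not part of any server.
# DELETE is the full method name used by HTTP/1.1.
KNOWN_METHODS = {"GET", "POST", "HEAD", "PUT", "DELETE", "TRACE"}


def parse_request_line(line):
    """Split a query line into (query type, URL, protocol)."""
    parts = line.strip().split()
    if len(parts) != 3:
        raise ValueError("expected 3 parts, got %d" % len(parts))
    method, url, protocol = parts
    if method not in KNOWN_METHODS:
        raise ValueError("unknown query type: %s" % method)
    if not protocol.startswith("HTTP/"):
        raise ValueError("unknown protocol: %s" % protocol)
    return method, url, protocol
```

For example, parse_request_line("GET /Francais HTTP/1.0") yields the three parts separately, and a line with a missing part raises a ValueError.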

The answer:

The answer from the server consists of a header and, depending on the
query type, a body.

What does this answer say?
The first line shows the protocol used and the return
value of the server (a return value of 400 or greater indicates an
error). It is followed by the date, the version of the server, and the
date of the last modification of the URL (this allows the client to
know whether the files in its cache are still valid). Content-Length is the
length of the answer (answers to queries for CGI scripts often do not provide this information) and
Content-Type tells the web client the MIME type of the answer
(text, html, images ...).
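
The way a client reads such a header can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names are hypothetical, and the SAMPLE header is made up for the example, not captured from a real server.

```python
# Made-up header for illustration only (not a real server's output).
SAMPLE = (
    "HTTP/1.1 200 OK\r\n"
    "Server: Apache\r\n"
    "Content-Length: 42\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
)


def parse_response_header(raw):
    """Return (protocol, return value, fields) from the header of an answer."""
    lines = raw.split("\r\n")
    first = lines[0].split(" ", 2)       # protocol, return value, optional text
    protocol, code = first[0], int(first[1])
    fields = {}
    for line in lines[1:]:
        if not line:
            break                        # the empty line ends the header
        name, _, value = line.partition(":")
        fields[name.strip().lower()] = value.strip()
    return protocol, code, fields


def is_error(code):
    """Return values of 400 or greater indicate an error."""
    return code >= 400
```

Field names are stored in lower case here because HTTP header names are not case sensitive, so the Content-Type of the sample is found under fields["content-type"].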

This is not a complete description: some lines are still a mystery
to me ;-)
Let's see what happens when an error occurs:

[the contents of index.html from www.linuxfocus.org are then displayed]

What happens inside the Apache server?
You connected with the telnet command to port 80 of
www.linuxfocus.org (IP address 195.53.25.1); port 80 is the default
port for an HTTP server. The server was waiting for a query and you
typed GET / followed by two carriage returns.
Why two carriage returns? The empty line signals to the server that this is the
end of the query.
The server answered by sending the requested file
(index.html). The TCP/IP connection is closed at the end of the transfer.
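
The same exchange can be tried without telnet. The following Python sketch, with function names of my own choosing, opens a TCP connection, sends the query followed by an empty line, and reads until the server closes the connection. Note that a bare "GET /" without a protocol version is the oldest form of query, and some present-day servers may refuse it.

```python
import socket


def build_simple_query(path="/"):
    """The query line, then an empty line that marks the end of the query."""
    return ("GET %s\r\n\r\n" % path).encode("ascii")


def fetch(host, path="/", port=80, timeout=10.0):
    """Send the query and collect everything until the server hangs up."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_simple_query(path))
        while True:
            data = sock.recv(4096)
            if not data:
                break                    # the server closed the connection
            chunks.append(data)
    return b"".join(chunks)
```

Calling fetch("www.linuxfocus.org") would play the part of the telnet session described above, assuming the host is reachable and still accepts this kind of query.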

As you can see, the language used between the client and the server
is very simple, but things get more involved when you use version 1.1 instead of 1.0
for your query.

A query using the newer HTTP 1.1 protocol requires more
information fields and spans several lines. The added lines allow
more precise information to be transmitted and therefore
improve the quality of the communication.
The Apache team has
strictly followed the new specification, which provides more
functionality: authentication, virtual sites (several sites
sharing the same IP address) and so on.
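
As a rough illustration of such a multi-line query, here is a small Python sketch that builds an HTTP/1.1 GET query. The function name build_http11_query is hypothetical. The Host line is the one field HTTP/1.1 makes mandatory, and it is what lets a server tell apart several sites sharing one IP address.

```python
def build_http11_query(host, path="/", extra_fields=None):
    """Build an HTTP/1.1 GET query as text, one field per line."""
    lines = ["GET %s HTTP/1.1" % path, "Host: %s" % host]
    for name, value in (extra_fields or {}).items():
        lines.append("%s: %s" % (name, value))
    # Every line ends with CR LF; an empty line ends the query.
    return "\r\n".join(lines) + "\r\n\r\n"
```

For instance, build_http11_query("www.linuxfocus.org", "/Francais", {"Connection": "close"}) produces a three-line query followed by the empty line.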

As with most client-server programs, when the server
receives a query:

it forks a child process to answer the query;

the "parent process" keeps listening on port 80 for
a new query;

the child answers the query.
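
The fork-per-query model described above can be sketched as follows in Python. This is a toy for Unix-like systems, not Apache's actual code: the function names, the port 8080 and the fixed text answer are all my own assumptions.

```python
import os
import signal
import socket


def build_answer(body):
    """Return a minimal answer: header, empty line, then the body (bytes)."""
    header = (
        "HTTP/1.0 200 OK\r\n"
        "Content-Type: text/plain\r\n"
        "Content-Length: %d\r\n"
        "\r\n" % len(body)
    )
    return header.encode("ascii") + body


def serve(port=8080):
    """Parent loop: accept a connection, fork a child for it, listen again."""
    signal.signal(signal.SIGCHLD, signal.SIG_IGN)   # finished children are reaped
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", port))
    listener.listen(5)
    while True:
        conn, _addr = listener.accept()
        if os.fork() == 0:
            # Child process: answer this one query, then exit.
            listener.close()
            conn.recv(4096)                         # read the query (ignored here)
            conn.sendall(build_answer(b"hello from child %d\n" % os.getpid()))
            conn.close()
            os._exit(0)
        # Parent process: drop its copy of the connection and keep listening.
        conn.close()
```

Running serve() and then connecting with telnet to port 8080 of localhost should show a different child process number for each query, while the parent keeps listening.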

Functionality

The main principle is that a web server sends exactly one
answer back to the client for each query. The client only sees that it sends a
query and gets back an answer.

The web server is an interface between the web client asking for a
URL (Uniform Resource Locator; you will also come across the related
abbreviations URI and URN, which for our purposes mean much the same thing) and the operating system
Apache is running on. The web client sends its query and the server answers
with the page that corresponds to the requested URL.

Some queries sent by the client cannot be answered directly by the
server. The server can spawn other programs
to do the job and return their results: this is exactly how
CGI scripts (Common Gateway Interface) work.
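
The idea of handing a query over to an external program can be sketched like this in Python. It is a simplification under my own assumptions: a real CGI script also receives details of the query through environment variables and writes part of the header itself, neither of which is shown here, and the function names are hypothetical.

```python
import subprocess
import sys


def run_program(argv, timeout=10):
    """Run an external program and return what it wrote to standard output."""
    result = subprocess.run(argv, stdout=subprocess.PIPE,
                            timeout=timeout, check=True)
    return result.stdout


def answer_from_program(argv):
    """Wrap the program's output in a minimal answer (no Content-Length)."""
    header = "HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\n"
    return header.encode("ascii") + run_program(argv)
```

As a stand-in for a script, answer_from_program([sys.executable, "-c", "print('hello')"]) runs a one-line Python program and returns its output as the body of the answer.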

Conclusion

To understand how Apache works, just
try telnet on different HTTP servers. This way you can also see which
server software a specific site is running, as the name of the server appears
in the answer.
