Miniature Web Server

Just how small can a usable HTTP server be? The answer might surprise you!

The phenomenal growth of the Internet, and its entry into many aspects of daily life, has led to the suggestion that TCP/IP will find its way into the most humble of domestic devices. Leaving aside the marketing question as to which appliances should be web-enabled, we're faced with a fundamental technical question: how small can a web server get?

It is not my intention to get caught up in a battle to create the world's smallest web server, since this would require the use of highly optimized machine code, and result in an implementation so inflexible as to be of little practical use. Instead, we will be looking at the underlying techniques for miniaturizing a web server, based on a microcontroller implementation in the C language. Furthermore, the web server must have the ability to control and monitor real-world I/O signals, to pave the way for its use as a net appliance.

To do this, I'll need to re-evaluate the web server from scratch. I'll be taking a fundamental look at TCP/IP from first principles, with a view to establishing the core elements that are required, and the simplest way of implementing them. I'll review:

Typical microcontroller hardware and its constraints

The network protocols needed for a web server

Implementation techniques to minimize resource usage

It is important to remember that a web server is just a method of delivering web pages (web content) to the user; there's little point in creating a server that can only deliver the most rudimentary of pages. Discussion of the web content is outside the scope of this article.

Microcontroller hardware
A microcontroller is a computer-on-a-chip that is designed for high-volume low-cost applications. When compared with a conventional CPU, there are notable differences.

Program memory
The program code is usually stored in a dedicated on-chip ROM, which is programmed by inserting the microcontroller into an external system (such as an EPROM programmer), or is programmed with the device already fitted into the final system, using a dedicated serial interface. There may be two distinct memory spaces: program memory and data memory, with severe limitations as to their usage; the data memory may be unusable for program storage, and vice-versa.
Until recently, the size of the on-chip program memory (1,000 bytes or so) would have been a problem for the kind of protocols I will be discussing, necessitating the use of an external ROM and its associated support circuitry. However, on-chip ROM sizes of 8KB and greater are now widely available, so it is possible to embed complex applications without additional external memory.
Similarly, it used to be the case that all microcontrollers had to be programmed in assembly language to make the best possible use of their slender resources. The increase in on-chip ROM size, coupled with improvements in compiler efficiency, means that a high-level language can be used, with the attendant benefits of source-code readability, maintainability, and portability.

Data memory
At the time of writing, there seems to be an unfortunate tendency amongst the microcontroller manufacturers to restrict the amount of data memory (RAM) on devices with low pin counts; that is, it is necessary to use a device with a large number of I/O pins if one requires a few kilobytes of RAM. To a certain extent, the reverse is true when dealing with protocols. A protocol handler chip needs lots of RAM (for data buffering and state-machine storage), yet may only need a few pins for data I/O and diagnostic indicators.
To fulfill the objective of being able to embed the microcontroller into an appliance, the decision was taken to use a device with a low pin count, and hence a small amount of RAM. This has profound effects on the design philosophy, as will be discussed later; for example, the total RAM size is considerably smaller than the size of the messages that will actually be sent and received. Nevertheless, the discipline of carefully scrutinizing RAM usage is a sensible one, and will still be of use when dealing with large RAM sizes.

CPU limitations
To minimize the complexity of the microcontroller CPU core, significant compromises have to be made in the instructions it implements, such as:

Native word-length restrictions. Arithmetic and logical operations may be restricted to eight bits only. The high-level language compiler will support longer word-lengths by chaining instructions, but the programmer must be aware of the speed and code-size penalties this carries.

Stack size. The processor may have a call/return stack of limited size implemented in hardware, which will restrict the depth to which function calls can be nested.

Local variables. The high-level language compiler must support local variables, even though the CPU may have no suitable hardware for this; it may have no provision for stack-based data storage or index-plus-offset addressing. The compiler can work around this by allocating a fixed memory location for each local variable, but this will make the code non-reentrant, and prevent the use of recursive calls.

Pointers. C programmers are accustomed to using pointers for a wide variety of purposes, and they are particularly useful when encoding or decoding data streams. The separation of the microcontroller memory space into special-purpose areas, and the possible segmentation of those areas, can lead to considerable inefficiencies in the use of pointers, and make it impossible to use them for certain tasks. This reinforces the need for careful planning of memory usage, with particular reference to the buffering of incoming and outgoing data.

Choice of microcontroller
A very large number of microcontrollers are on the market, and I don't claim to have performed an exhaustive analysis to find the best one. The choice of the Microchip PICmicro family was dictated by personal preference based on past experience, and the PIC16C76 device (shown in Figure 1) was chosen as a good compromise between a small physical size (low pin count), and adequate on-chip peripherals, such as a bi-directional serial port that can use interrupts.

PIC16C76/16F876
The PIC16C76 has the following on-chip hardware:

8,192 14-bit words of EPROM program memory

368 bytes of RAM data memory

Eight-level hardware stack

Interrupt capability

8-bit analogue-to-digital converter (ADC) with input multiplexer

One 16-bit timer and two 8-bit timers

Two capture/compare/PWM modules

Synchronous serial port (SSP)

Universal synchronous asynchronous receiver transmitter (USART)

22 input/output pins (parallel I/O, shared with the above functions)

The PIC16F876 is essentially an enhanced version of the PIC16C76, with flash memory in place of EPROM and additional EEPROM for non-volatile data storage. The flash device is more convenient, as it can be programmed and erased in-circuit.

The memory architecture will seem fairly strange to a PC programmer. It has three completely separate memory spaces:

Read-only program memory

Read/write data memory

Dedicated hardware stack

I must stress that these spaces are absolutely separate, and are accessed using completely different addressing schemes.

Program memory
The program memory is 14 bits wide, so that every instruction can fit in a single program word, and take a single CPU cycle (four external clock cycles) to execute, with the exception of branches, which take two CPU cycles. The program memory only contains program instructions and no data, not even constant data, because there is no mechanism for accessing it. The program memory is segmented into four banks of 2,048 (2K) words each. There are specific bit-manipulation operations to switch between banks, which are automatically inserted by the compiler, with the limitation that one function cannot straddle two banks.

The inability to put constant data in the program space (or, more specifically, the inability of the CPU to read any such data) makes it difficult to store large amounts of constant data, such as constant strings. There is a "return with a byte value" instruction, which has to be used repeatedly to form the string from a series of character-value instructions. Such strings have awkward properties; they can't be accessed by pointers, and must be copied into RAM before use. Although the Custom Computer Services PCM compiler provides some support for this, mistakes can easily go undetected, so string constants should be avoided if possible.
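
The "return with a byte value" technique can be pictured in portable C as a function that hands back one character per index, with a second routine copying the result into RAM. This is only a sketch of the idea, not the output of any particular compiler; the names greeting_char and copy_rom_string, and the string itself, are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for a string stored as a series of "return with a
 * byte value" instructions: each case models one such instruction. */
char greeting_char(unsigned char index)
{
    switch (index) {
    case 0: return 'H';
    case 1: return 'e';
    case 2: return 'l';
    case 3: return 'l';
    case 4: return 'o';
    default: return '\0';   /* terminator, and anything past the end */
    }
}

/* Copy the string into a RAM buffer so that it can be reached through an
 * ordinary data pointer. The copy is truncated if the buffer is too small.
 * Returns the number of characters copied, excluding the terminating null. */
size_t copy_rom_string(char *ram, size_t ram_size)
{
    size_t n = 0;

    if (ram_size == 0)
        return 0;
    while (n + 1 < ram_size) {
        char c = greeting_char((unsigned char)n);
        if (c == '\0')
            break;
        ram[n++] = c;
    }
    ram[n] = '\0';
    return n;
}
```

Every character costs one program word plus the call overhead, and the RAM copy consumes scarce data memory, which is why the text recommends keeping string constants to a minimum.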

Data memory
The data memory space is eight bits wide and is shared between the I/O registers and the workspace RAM. It is segmented into four banks, using bank-switching bits that are completely independent of the code-space switching. The workspace RAM occupies the memory locations that aren't taken up by I/O registers (of which there are a large number), so the 368 bytes of RAM is fragmented into the following address ranges:

In addition, the following areas are common to all banks; data written in one bank can be read in all others:

70h - 7Fh
F0h - FFh
170h - 17Fh
1F0h - 1FFh

It is fortunate that we don't have to work our way around this strange map, but can offload the job onto the compiler. However, it is possible to confuse the compiler into generating wrong code, so it is best to be cautious in the use of data pointers.
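
The common areas listed above follow a simple pattern, which the sketch below makes explicit. It assumes that each of the four banks spans 128 (80h) addresses, as the listed ranges imply; the function names are hypothetical and are not part of any compiler library.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the 9-bit data address splits into a 2-bit bank number and a
 * 7-bit offset within the bank. */
uint8_t ram_bank(uint16_t addr)
{
    return (uint8_t)((addr >> 7) & 0x03);
}

/* True for 70h-7Fh, F0h-FFh, 170h-17Fh and 1F0h-1FFh: the top 16 locations
 * of every bank, which all refer to the same physical RAM. */
bool ram_is_common(uint16_t addr)
{
    return (addr & 0x7F) >= 0x70;
}

/* Reduce an address in a common area to its bank 0 alias; other addresses
 * are returned unchanged. */
uint16_t ram_canonical(uint16_t addr)
{
    return ram_is_common(addr) ? (uint16_t)(addr & 0x7F) : addr;
}
```

Variables that must be reachable without bank switching, such as those shared with an interrupt handler, are natural candidates for these 16 locations.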

Hardware stack
There is a hardware stack for machine-code calls and returns; it is 13 bits wide (to accommodate the full 8,192-word address range) and eight levels deep. There is no provision for extending the stack into RAM, or detecting overruns. Nesting function calls more than eight deep will have unforeseen consequences, though this should be detected at compile-time.

Data values cannot be stored in the stack, so how are function arguments and local variables handled? The compiler assigns fixed locations in RAM for these variables, having carefully assessed the function nesting, to ensure one function won't destroy another's variables.
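
The effect of fixed variable locations on recursion can be imitated on any C compiler by using a file-scope variable in place of a true local. The sketch below is an illustration only: sum_n stands in for the fixed RAM slot a PIC compiler might assign, and both function names are invented.

```c
/* Fixed RAM slot standing in for a compiler-allocated "local" variable. */
static unsigned int sum_n;

/* Intended to return 1 + 2 + ... + n recursively. Because every nesting
 * level shares sum_n, the inner calls overwrite the outer call's value.
 * (rest is still a genuine local here; only n is modelled as fixed.) */
unsigned int sum_to_fixed(unsigned int n)
{
    unsigned int rest;

    sum_n = n;
    if (sum_n == 0)
        return 0;
    rest = sum_to_fixed(sum_n - 1);   /* overwrites sum_n */
    return sum_n + rest;              /* sum_n is no longer this call's n */
}

/* The same calculation written as a loop, which needs no nesting. */
unsigned int sum_to_loop(unsigned int n)
{
    unsigned int total = 0;

    while (n > 0)
        total += n--;
    return total;
}
```

On a host compiler sum_to_fixed(3) yields 0 rather than 6, while sum_to_loop(3) gives the expected answer; a compiler targeting the PIC would typically refuse the recursive version outright.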

External memory
A web server needs ample storage for web pages, and the on-chip ROM is clearly inadequate for this. The I2C bus is a simple two-wire synchronous interface that can be used to add external devices, such as a 32KB EEPROM, which is adequate for a miniature web server.

Network interface
Mindful of the large numbers of laptop and palmtop computers with an infrared interface, and the fragility of sub-miniature serial connectors, I decided to implement an infrared interface, using the IrDA (Infrared Data Association) standards for low-level communications.

After considerable work, it became clear that the IrDA protocols weren't as low-level as I thought, and the simple task of sending IP datagrams over an infrared link demanded a significant additional programming effort, which threatened to eclipse the rest of the project. In view of this, I decided to revert to an RS-232 SLIP interface, which may be a lowest-common-denominator interface, but is still very useful for a wide range of applications.

The PIC micro has an on-chip asynchronous serial interface (USART), so the only extra network components are the level-shifters for the RS-232 voltage-levels. To allow modem interfacing, a three-wire interface is implemented, using a general-purpose output line for the output handshake.

Web server protocols
From the top down, the protocols we need for our web server are:

HTTP: document request/response

TCP: reliable communications

IP: low-level data transport

ICMP: diagnostics (ping)

SLIP: serial interface

Modem emulation

I will now review each of these, with the aim of creating a small, yet fully-functional, web server implementation.

HTTP request
The Hypertext Transfer Protocol (HTTP) defines a request-response mechanism for obtaining documents from a web server.
The web browser sends a request to the server in the form of a multi-line string, each line being terminated with Carriage Return and Line Feed (<CR> and <LF>) characters. The first line specifies an upper-case "method" (that is, command), followed by an argument string. The most common method is "GET," followed by a filename to be fetched, and a protocol version identifier. Subsequent lines contain additional information about the browser configuration:

If we're keen to keep memory usage to a minimum, the question must be asked: what use is the additional information? We don't care about the type of browser, our server hasn't got sufficient resources to maintain a cache or an access log, and if the file we're sending has an unacceptable character set, there's nothing that can be done about it. Even the HTTP version number on the first line isn't needed; we're planning to use the simplest HTTP interface anyway.

It would seem that we can chop off the remainder of the command after the filename, without losing any functionality, but what is the maximum length of the filename? Surprisingly, this is largely under the control of the server, for the following reason: when the user wishes to access the server for the first time, an IP address is entered into a browser window, such as:

http://172.16.1.2

The web client (browser) locates the given address, and submits the request to that server using a null filename:

GET / HTTP/1.0

By convention, most web servers interpret this as a request for the default index file, which is INDEX.HTM or INDEX.HTML. This, in turn, contains pointers to other files on the server. So long as the user only follows links on pages we've provided, they will only be requesting filenames we've defined, so if we keep these short, we won't have to handle any long filenames.

A possible exception to this rule would occur if we included any HTML forms on our server. When a form is submitted, all the state-information is appended to the filename, making a much longer string. Various other difficulties are associated with forms-handling on a very small web server, so for the time being, we'll assume that forms aren't being used.
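
Chopping the request off after the filename might look something like the sketch below. It is written for a host C compiler rather than the PIC, and the function name, the 12-character limit, and the choice of INDEX.HTM as the default are all assumptions made for the example.

```c
#include <string.h>

#define MAX_FNAME 12    /* assumed limit, enough for an 8.3 filename */

/* Extract the filename from the first line of an HTTP request such as
 * "GET /abc.htm HTTP/1.0". Everything after the filename, including the
 * version string and all later header lines, is ignored.
 * A null filename ("GET / ...") is mapped to the default index file.
 * name must have room for MAX_FNAME + 1 characters.
 * Returns 1 on success, 0 if the method isn't GET or the name is too long. */
int get_filename(const char *req, char *name)
{
    size_t n = 0;

    if (strncmp(req, "GET /", 5) != 0)
        return 0;
    req += 5;
    while (*req != '\0' && *req != ' ' && *req != '\r' && *req != '\n') {
        if (n >= MAX_FNAME)
            return 0;
        name[n++] = *req++;
    }
    name[n] = '\0';
    if (n == 0)
        strcpy(name, "INDEX.HTM");
    return 1;
}
```

Since the parser stops at the first space, the RAM needed for the request is bounded by the longest filename the server itself has defined, not by whatever the browser chooses to send.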

HTTP response
The response from the server to the client consists of an HTTP header, and, if the request succeeded, the document itself. The header consists of several text lines, each terminated with a <CR> <LF> delimiter. The header is separated from the document's contents by a single blank line.

As a minimum, the header must identify the HTTP protocol version, the success or failure status, and the content-type of the document (plain text, HTML, GIF graphic). For example:

HTTP/1.0 200 OK
Content-type: text/html

Unfortunately, it isn't possible to send out the same HTTP header for all files; it must be adapted to reflect the file's contents. A minimum list of file formats would be:

text/plain
text/html
image/gif

though it would be highly desirable to add other formats to the list.
If the client's request fails, an appropriate HTTP error message must be sent out, and it is also desirable to send out a document explaining the problem, so that the browser has something to display. The simplest explanation could be in the form of a plain text document, which has no fancy formatting:

HTTP/1.0 404 Not found
Content-type: text/plain

File 'abc.htm' not found
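
One way to adapt the header to the file is to key the content-type off the filename extension. The following is a host-side sketch under that assumption; the function names are hypothetical, the extension test is case-sensitive for brevity, and a PIC build would probably emit the strings piecemeal rather than use snprintf.

```c
#include <stdio.h>
#include <string.h>

/* Choose a content-type from the filename extension. Only the three
 * formats listed in the text are recognised; anything else is sent as
 * plain text. */
const char *content_type(const char *fname)
{
    const char *ext = strrchr(fname, '.');

    if (ext != NULL) {
        if (strcmp(ext, ".htm") == 0 || strcmp(ext, ".html") == 0)
            return "text/html";
        if (strcmp(ext, ".gif") == 0)
            return "image/gif";
    }
    return "text/plain";
}

/* Write the header for a successful response, including the blank line
 * that separates it from the document. Returns the header length, or -1
 * if the buffer is too small. */
int http_ok_header(char *buf, size_t size, const char *fname)
{
    int n = snprintf(buf, size,
                     "HTTP/1.0 200 OK\r\nContent-type: %s\r\n\r\n",
                     content_type(fname));

    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Write a complete "not found" response: header plus a plain-text
 * explanation. Returns the response length, or -1 if it doesn't fit. */
int http_not_found(char *buf, size_t size, const char *fname)
{
    int n = snprintf(buf, size,
                     "HTTP/1.0 404 Not found\r\n"
                     "Content-type: text/plain\r\n\r\n"
                     "File '%s' not found\r\n", fname);

    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```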

TCP
To convey the HTTP request and response between client and server, a reliable communications channel is required. This is provided by Transmission Control Protocol (TCP), which provides a reliable logical connection between two endpoints on the network, known as sockets. The objective of TCP is to make the network connection appear as transparent as possible. Regardless of the network type, or the distances involved, data should be transferred between sockets in as timely and error-free a fashion as possible.

TCP sockets
A socket is an endpoint of a network connection that acts as a source and sink of connection data. Each active socket is implicitly linked to an application that sends and receives this data. In the case of a web client, the application is a browser; in the case of a web server, the application is an HTTP server as described in the previous section.
Aside from the IP (Internet Protocol) addresses of client and server, the other parameters that define a socket are the port numbers. In the case of servers, a port number defines the service being offered; for example, a web server should only respond to incoming requests on port number 80.

At any one time, a server may support several simultaneous transactions, each of which involves a unique client-server socket pair. Clients will frequently open up several simultaneous connections to the same server, in order to fetch several items in parallel, such as the graphical items on a web page. To save on resources, a web server can restrict the number of simultaneous connections to one, but this leads to a very sluggish response, even when in use by only one client. If the client attempts to fetch, say, a page of text and three graphic images simultaneously, the server can do one of two things:

Ignore the request. The TCP client will retry after about 1.5 seconds; the next retry time doubles to three seconds, the one after that to six seconds, and so on. If several images are being requested, there can be an unacceptably long wait until the last one is successfully obtained.

Reject the request. If a TCP reset is sent, the client will quickly retry the request; I have observed retry rates of around two per second, for 40 seconds. This is a lot of extra traffic for the serial link to handle, and will only succeed in slowing up the data transfer yet further.

Ideally, we would respond to as many simultaneous network requests as the network bandwidth would permit; the problem is how to do this, without requiring a large amount of socket storage.
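
To get a feel for the cost of ignoring requests, the doubling retry intervals can be added up. The sketch below assumes the figures quoted above (a first retry after about 1.5 seconds, doubling each time); real clients vary, and the function name is invented.

```c
#include <stdint.h>

#define FIRST_RETRY_MS 1500u    /* approximate figure quoted in the text */

/* Time in milliseconds from the first attempt to the client's n-th retry,
 * assuming every earlier attempt was ignored and the interval doubles
 * after each one. */
uint32_t retry_time_ms(unsigned int n)
{
    uint32_t interval = FIRST_RETRY_MS;
    uint32_t total = 0;

    while (n-- > 0) {
        total += interval;
        interval *= 2;
    }
    return total;
}
```

Under these assumptions a connection that is only accepted on its third retry has already been waiting 1.5 + 3 + 6 = 10.5 seconds.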

Passive open
Convention dictates that a server application must passively open a TCP socket, before exchanging data through it. This model is derived from standard implementations on multi-user systems, where there is a strong separation between the system's TCP code and the user's application code. To fit in the microcontroller, our application code will have to be tightly coupled to the TCP stack, so the distinction between the two becomes blurred.

There is no point in maintaining the fiction of passive open; if a network frame arrives, and the server has the resources to handle it, then it should do so.

Sequence space
To control and monitor the establishment of a connection, transfer of data, and closure of a connection, each TCP transmission (segment) is identified by a TCP sequence number, which refers to its position in an imaginary sequence space. The start and end of a transaction (known as SYN and FIN) can be seen as fixed points in this space; using the sequence number, the recipient can place an incoming segment in its rightful location within that space, and detect whether it forms a logical progression from the last segment received, is a duplicate of a previous segment, or is a future segment received out of sequence.

The sequencing process is symmetrical; both client and server use sequence numbers to place their transmitted segments in the outgoing sequence space and send acknowledgment numbers to confirm the point they have reached in the (completely separate) incoming sequence space. Figure 2 shows a sample data transfer, with 120 hex bytes being sent in two unequal-sized blocks. In addition to the actual 32-bit sequence number, the relative sequence number is shown in brackets; I've used the convention that the first data byte has a relative sequence number of zero, which means that the first synchronizing byte has a value of -1.
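
Placing an incoming segment in the sequence space comes down to comparing its sequence number with the next one expected, using arithmetic that wraps at 32 bits. The sketch below shows one way of doing this on a compiler that has 32-bit types; the type and function names are hypothetical, and partially overlapping segments are simply treated as duplicates.

```c
#include <stdint.h>

typedef enum { SEG_IN_ORDER, SEG_DUPLICATE, SEG_FUTURE } seg_class;

/* Classify an incoming segment by comparing its sequence number with the
 * next sequence number expected from the sender. The subtraction wraps
 * modulo 2^32, so the comparison still works when the sequence space
 * rolls over from FFFFFFFFh to 0. */
seg_class classify_segment(uint32_t expected, uint32_t seq)
{
    uint32_t diff = seq - expected;

    if (diff == 0)
        return SEG_IN_ORDER;            /* logical progression */
    if (diff & 0x80000000UL)
        return SEG_DUPLICATE;           /* behind us: seen already */
    return SEG_FUTURE;                  /* ahead of us: out of sequence */
}
```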

At the start of any transaction, the 32-bit sequence number must be set to a completely new value, to avoid confusion with past transactions. The good news is that, within certain constraints, the server can choose any 32-bit starting sequence value it likes. This suggests it might be possible to use the sequence number as a kind of file pointer, indicating the current location of the file (in ROM) being sent. The bad news is that the sequence value must be chosen before the first segment of the new transaction is sent. At this time, the client hasn't yet revealed the filename to be accessed, so it isn't possible to choose a value that is convenient for accesses to that file.

A lesser option, which is still very useful, is to choose a sequence value that reflects the relative position within the file. This has already been done in Figure 2; the least-significant word of the sequence number has the following hex values:

So long as the file size is less than 64KB (a reasonable assumption, given the ROM size limitations of our miniature web server), this technique will result in a useful simplification to our code.
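
A sketch of how the low word can double as a position counter is given below. It assumes the convention described earlier, in which the synchronizing byte sits at relative position -1, so the initial sequence number is given a low word of FFFFh; that specific value, and the function names, are assumptions made for the example.

```c
#include <stdint.h>

/* Build an initial sequence number whose low word is FFFFh. The SYN uses
 * up one sequence number, so the first data byte then has a low word of
 * 0000h. The upper word can be any fresh value. */
uint32_t initial_seq(uint16_t upper)
{
    return ((uint32_t)upper << 16) | 0xFFFFUL;
}

/* Given the acknowledgment number in an incoming segment, return the
 * relative position of the next byte the client expects. Valid only
 * while the outgoing data is smaller than 64KB. */
uint16_t file_offset(uint32_t ack)
{
    return (uint16_t)(ack & 0xFFFFUL);
}
```

For example, with an upper word of 1234h the server starts at 1234FFFFh; an incoming acknowledgment of 12350100h then means that the client has received the first 100h bytes, and the next byte to send is at offset 100h.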

Of course, we have no control over the client's choice of sequence number, which makes it impossible to use a similar trick with that. However, the client's request string is sufficiently short that it should fit within a single frame, so we have only the one incoming data frame to worry about; if any more arrive, they can be discarded.

Managing connections
It is traditional to view the opening and closing of a TCP connection in terms of the state diagram transitions. This implies we need to maintain an individual state machine for each simultaneous connection (that is, each socket), which will consume a lot of RAM.

To solve this problem, it is worth bearing in mind that the web server always responds to client HTTP requests; it never takes the initiative. If we could also guarantee that it only ever responded to TCP segments, rather than initiating them, then a massive simplification in the TCP stack would result. We wouldn't have to store any node addresses or port numbers, since these could just be copied from the incoming to the outgoing message. By careful choice of initial sequence number (as discussed in the previous section), we can deduce the location in the current file by examining the incoming acknowledgment value, so we'd always know what action to take next. We'd be creating an implicit, rather than explicit, TCP state machine, using the incoming acknowledgment value as a state variable. Such an implementation is known as "stateless" TCP, because the server doesn't keep any state information about the client.
This looks promising, but two more problems need to be solved:

Current filename. If the server doesn't keep a record of which client requested which file, how does it know what to send next? It may know the relative position within the file from the sequence number, but if it doesn't know the filename, this isn't a lot of use.

Retransmissions. A normal TCP stack will retransmit a TCP segment if it doesn't receive an acknowledgement within a certain time. If our stack doesn't store any information about its clients, it won't ever be able to retransmit anything without being prompted.

My initial attempt to solve the first problem was a fiendishly clever plan to use the least-significant bits of the sequence number to indicate the filename (or more precisely, the index number in a file directory). So long as the file is sent in blocks of, say, 64 bytes and the front is padded with a variable-length HTTP message that depends on the filename, then...well, work it out for yourself. The disadvantage of this technique is the inflexibility of having to send out fixed-length blocks, which is a real nuisance when generating web pages dynamically.

To solve the second problem, it is tempting to rely on the client's retry mechanisms, but this doesn't work. The server can't rely on receiving an acknowledgment for every segment, since some may have been lost in transit. If the server stops sending data (because it failed to see an acknowledgement), it will be a long time (two hours) before the client's "keepalive" timer triggers it to send a "keepalive probe" to see if the server has crashed. That is a long time to wait; the client application will have abandoned the connection long before.

A simple solution to both problems is to restrict the outgoing page to one TCP data block (segment), and to send it out as soon as the client's HTTP request has been received.

One-segment pages
We are dealing with a small web server, with extremely limited resources, so the idea of fitting a web page into a single TCP segment isn't quite as crazy as it sounds. True, this may force web designers to be less lavish in their use of page embellishments, but is that necessarily a bad thing? A small web server has a small amount of information to convey, and padding it out unnecessarily is pointless.

We have adopted the usual maximum SLIP size of 1,006 bytes, and the IP and TCP headers are a minimum of 20 bytes each, so our maximum TCP data size is 1,006 - 20 - 20 = 966 bytes. It is remarkable how much can be achieved within this limitation.
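
The size budget can be written down as a few constants, which also gives a convenient check when preparing pages. This is a sketch only; the macro and function names are invented, and the page length is taken to include the HTTP header, since that travels in the same segment.

```c
#define SLIP_MAX_LEN  1006u   /* usual maximum SLIP frame size, in bytes */
#define IP_HDR_LEN      20u   /* minimum IP header, no options */
#define TCP_HDR_LEN     20u   /* minimum TCP header, no options */
#define TCP_MAX_DATA  (SLIP_MAX_LEN - IP_HDR_LEN - TCP_HDR_LEN)

/* Returns nonzero if a page of page_len bytes (HTTP header included)
 * can be sent in a single TCP segment. */
int fits_one_segment(unsigned int page_len)
{
    return page_len <= TCP_MAX_DATA;
}
```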

The key advantage of one-segment pages is that a one-to-one relationship exists between the actions of the client and the server; this relationship is reinforced if the closing FIN is piggybacked onto the page data.
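
Because each reply is derived entirely from the segment that prompted it, the addressing fields of the response can be filled in without consulting any stored connection state. The sketch below illustrates this for the page reply; the segment_info structure is a hypothetical summary of the relevant header fields, not a wire format, and the flag values are the standard TCP bit assignments.

```c
#include <stdint.h>

#define TCP_FIN 0x01
#define TCP_ACK 0x10

/* Hypothetical summary of the header fields that matter here. */
typedef struct {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint32_t seq, ack;
    uint8_t  flags;
    uint16_t data_len;
} segment_info;

/* Derive the reply to an incoming HTTP request from the request alone:
 * addresses and ports are swapped, our sequence number is whatever the
 * client last acknowledged, and we acknowledge the client's request data.
 * The closing FIN is piggybacked onto the page. */
void make_page_reply(const segment_info *in, uint16_t page_len,
                     segment_info *out)
{
    out->src_ip   = in->dst_ip;
    out->dst_ip   = in->src_ip;
    out->src_port = in->dst_port;
    out->dst_port = in->src_port;
    out->seq      = in->ack;
    out->ack      = in->seq + in->data_len;
    out->flags    = TCP_ACK | TCP_FIN;
    out->data_len = page_len;
}
```

Nothing in make_page_reply refers to a socket table, which is what allows the server to handle as many simultaneous clients as the link can carry.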

TCP segment format
A TCP header plus data block is known as a "segment." The format is shown in Figure 4.

Destination port. We'll be checking the destination port field of every incoming TCP segment to see if it refers to a service we support; at present there are only two of these:

Port 13: daytime service
Port 80: HTTP server

The daytime service returns a simple string giving the current date and time. It is by no means essential for a web server to provide this, but is nevertheless a useful step on the road to creating an HTTP server, since the TCP transaction is simpler and easier to debug.
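
The destination port check amounts to a small dispatch on the port number. A minimal sketch, with hypothetical names for the enumeration and function:

```c
#include <stdint.h>

#define DAYTIME_PORT 13
#define HTTP_PORT    80

typedef enum { SVC_NONE, SVC_DAYTIME, SVC_HTTP } service;

/* Map the destination port of an incoming TCP segment to a supported
 * service. SVC_NONE means the segment isn't for us. */
service service_for_port(uint16_t dest_port)
{
    switch (dest_port) {
    case DAYTIME_PORT: return SVC_DAYTIME;
    case HTTP_PORT:    return SVC_HTTP;
    default:           return SVC_NONE;
    }
}
```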

Sequence and acknowledgement numbers. The 32-bit sequence and acknowledgment numbers have already been discussed. 32-bit arithmetic is a problem on the PIC, since the CPU only supports 8-bit operations directly. Our chosen compiler provides no support for 32-bit data types, so we have to create all our own functions. If we assume that the incoming and outgoing data is less than 64KB, then a useful simplification is to only perform 16-bit operations on the 32-bit values, propagating the carry value to the upper 16 bits.
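
The carry-propagation idea can be sketched as follows, holding the 32-bit value as two 16-bit halves. The seq32 type and seq32_add function are invented for this example and are not taken from any compiler library.

```c
#include <stdint.h>

/* A 32-bit sequence or acknowledgment number held as two 16-bit halves. */
typedef struct {
    uint16_t hi;
    uint16_t lo;
} seq32;

/* Add a 16-bit count to the 32-bit value using only 16-bit operations.
 * If the low word wraps round, a carry is propagated into the high word. */
void seq32_add(seq32 *s, uint16_t n)
{
    uint16_t old = s->lo;

    s->lo = (uint16_t)(s->lo + n);
    if (s->lo < old)                    /* low word wrapped: carry out */
        s->hi = (uint16_t)(s->hi + 1);
}
```

For instance, adding 1 to 1234FFFFh gives 12350000h: the low word wraps to zero and the carry increments the high word.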

Header length. The header length reflects the length of the standard header plus the options; there is no length value for the data, since this can be deduced from the value in the lower protocol layer (IP).
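
Deducing the data length from the IP layer might be done as in the sketch below. It relies on the fact that both the IP and TCP header-length fields count 32-bit words, with a minimum value of five (20 bytes); the function name and the use of -1 as an error value are assumptions.

```c
#include <stdint.h>

/* Work out how many data bytes a TCP segment carries.
 *   ip_total_len   total length field from the IP header, in bytes
 *   ip_hdr_words   IP header length field, in 32-bit words
 *   tcp_hdr_words  TCP header length field, in 32-bit words
 * Returns the data length in bytes, or -1 if the fields are inconsistent. */
int32_t tcp_data_len(uint16_t ip_total_len, uint8_t ip_hdr_words,
                     uint8_t tcp_hdr_words)
{
    uint16_t hdrs;

    if (ip_hdr_words < 5 || tcp_hdr_words < 5)
        return -1;                      /* shorter than a minimal header */
    hdrs = (uint16_t)((ip_hdr_words + tcp_hdr_words) * 4);
    if (hdrs > ip_total_len)
        return -1;                      /* headers exceed the datagram */
    return (int32_t)ip_total_len - hdrs;
}
```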