Embedding Web Servers

As with most of my previous articles, this one grew out of a project at my $DAY_JOB. The project du-jour involves large dependency graphs, often containing thousands of nodes and edges. Some of the relationships are automatically generated and can be quite complicated. When something goes wrong, it’s useful to be able to visualize the graph.

Simple Graph:

We use GraphViz for rendering the graph, but it falls down on huge graphs. They turn into an unreadable mess of thick lines – less than useful. To work around this, we trim down the graph to just a segment, centered around one node, and display only n inputs or outputs.

This works great, except that the startup time to create the graph data can be quite long, because of all the graph processing that is necessary to make sure the information is up to date. (The actually graph rendering is quite fast, for small graphs.)

The solution? Process the data once, and render it multiple times, using, yes, you guessed it, a Web interface!

After the above “header” section, there must be a blank line, and then the bytes containing the data. There’s a lot more information that can go into the header block, but for the simple applications we will be developing, they are not needed.

You can use a telnet client to retrieve data from any Web server. You need to be careful though - many modern Web servers are virtual hosted, which means they require the Host: header in the request to retrieve the appropriate data.

Writing A Simple Web Server

The Basics

With the above information, it isn’t hard to write your own simple Web server. There are several ways to do this and a few already written on CPAN. We’re going to start from first principles though, and pretend, for the moment, we don’t know about CPAN.

A good place to start looking for client/server information is in the perlipc document. About 2⁄3 of the way through is a section on “Internet TCP Clients and Servers”. This section shows how to use simple socket commands to setup a simple server. A little further down is the section we’re interested in - it demonstrates using the IO::Socket module to write a simple TCP server. I’ll replicate that here.

If the requested file exists, then send back a HTTP header that says that, along with a content type, and then the data. (We are assuming the content type is HTML here. Most http servers figure out the content type from the extension of the file.)

Almost 50 percent of the code is for error handling, and that doesn’t even take into account the error handling we didn’t do for I/O issues. But that’s the core of a Web server, all in about 15 lines of Perl.

If you use the above code without modification, it will allow every file on your system to be read. Generally, this is a bad thing. An explanation of proper security is outside the scope of this article, but generally you want to limit access to a subset of files, located under some directory prefix. File::Spec::canonpath and Cwd::realpath are useful functions for testing this.

Single Threaded, Nonforking, Blocking

The Web server presented above is very simple. It only deals with one request at a time. If a second request is received while the first is being processed, then it will be blocked until the first completes.

There are two schemes used to take advantage of modern computers’ ability to multiprocess (run more than one thing at once.) The simplest way is to fork off a Web server process for each incoming request. Because of forking overhead, many servers pre-fork. The second method is to create multiple threads. (Threads are lighter weight than processes.)

For a simple embedded server, it isn’t much more difficult to build a forking server, but the extra work is unnecessary if it’s only going to be used by one person or with a low hit-rate. The only advantage to the forking method is that it can serve multiple pages at once. (Taking advantage of modern operating systems ability to multiprocess.)

More information on forking servers, can be found in the perlipc documentation.

With a simple modification to our loop, we can turn our Web server into a forking client:

Structure

The example server above is useful for simple reporting of generated data. Because the accept loop is closed, all processing by the main part of the program needs to be complete before the Web server is run. (Of course, actions from the Web server can trigger other pieces of the program to run.)

There are other ways to integrate a simple Web server depending on the structure of your program, but for this article, we’ll stick with the design above.

Graph Walker

Above we mentioned using GraphViz to create an embedded graph viewer. To do that we’ll use a Graph class that has some methods that will make our life easier. (There isn’t actually a class that does all this, but you can do it with a combination of Graph and GraphViz available on CPAN.)

It is outside the scope of this article to cover graph operations, but I’ve named the methods so that they should be easy to figure out. I am also going to gloss over some of the GraphViz details. They can be picked up from a tutorial.

Three Easy Steps

Define the Goal

This is the easy part.

To develop a graph browser that allows the user to click on a node to recenter the graph.

Define the URL scheme

How is the Web browser going to communicate back to the Web server? The only way it can is by requesting pages (via URLs.) For a graph browser we need two different kinds.

First, an image representing the graph. Second, a HTML page containing the IMG tag for the graph and the imagemap.

Since every node has a unique name we can use that to represent which node, and then use an extension to determine whether it is the HTML page or the graphic.

I will admit that we glossed over some of the HTML and Client Side Imagemap details – because they’re tangential to the issue of embedding a Web server into a tool. An embedded Web server is like merging the Web server, cgi script and source of the data into one program – sometimes the best way to build one is to start with a standard CGI script and use that.

More Details

Query Strings

Because our embedded Web server isn’t serving actual files off of your hard drive you have lots of flexibility as to how to parse the requested URL. In normal Web servers the common way to pass extra arguments to requested pages/scripts is by using a query string.

A URL with a query string looks like this:

http://foo.bar/thepage.html?querystring

There is a convention for passing key/value data in query strings (from HTML FORM’s for example):

http://foo.bar/page.html?keyone=dataone&keytwo=datatwo&keythree=3"

It’s easy to modify our embedded webserver template to accept query strings. Just add it to the regular expression that parses the request:

if ($request =~ m|^GET /(.+)(?:\?(.*))? HTTP/1.[01]|) {

$2 will then contain the query string. You can parse it by hand or pass it to CGI.pm or another CGI Module to parse it.

URI Escaping

Some characters have special meaning in URIs. (We’ve already seen ? and &. Others are “ (space), %, and #. See the RFC for the full list.) In order to allow them to be passed in requests they need to be escaped. Escaping a URI changes the special characters into their hex representation with a prepended %. For example, “ becomes %20.

The easiest way to perform this encoding is to use the URI::Escape module.

More Ideas

You might want to embed a Web server into a tool to display the status of a task in a complicated way. Sure, you could just write the file to disk, but that’s less fun!

If a task has multiple possible outputs, then you could use the Web server to allow the user to choose between them visually, or pop up a browser at various states of a project to let the user confirm that things are going according to plan.

Speaking of popping up, you don’t need to make the user do anything to see the results of the page. You can force the browser to do it for you.

Our example code has a port number hardcoded into it. If that port is already being used on your system, then the IO::Socket::INET::new() call will fail. An easy improvement is to loop over a range of ports, or random port numbers, until an available port is found.

Reusable Code

In some cases, I’ve avoided the use of modules in this article. There are many things that could be done with modules including argument handling (CGI.pm), URI/URL parsing (the URI family of modules), and even the HTTP server itself. (HTTP::Daemon)

The code we’ve presented here tries to go through the behind the scenes process so you know what’s going on.

For quick and dirty servers, HTTP::Daemon is probably easier to use. Here’s an example:

The above HTTP::Daemon based server is a simple, single purpose, POD->HTML converter. I provide it with the name of a pod file to parse, and every time I reload the page, it will pass it through Pod::Simple and display the HTML to the browser.

You’ll note it has the same structure as our hand-made examples, but it handles some of the nitty-gritty work for you by encapsulating it in classes.

(If you think that HTTP::Daemon is much simpler than the original, then you should see the first version of this article which used the low-level socket calls.)

In Conclusion

Embedded hardware is all the rage these days - but embedded software can be quite useful, too. By embedding a Web server into your software you gain lots of possible output options that were difficult to have before. Tables, graphics and color! (Not to mention, output can be viewed by multiple computers.) Embedded Web servers open a world of opportunities to you.

Site Map

Contact Us

License

Legal

Perl.com and the authors make no representations with respect to the accuracy or completeness of the contents of all work on this website and specifically disclaim all warranties, including without limitation warranties of fitness for a particular purpose. The information published on this website may not be suitable for every situation. All work on this website is provided with the understanding that Perl.com and the authors are not engaged in rendering professional services. Neither Perl.com nor the authors shall be liable for damages arising herefrom.