
This is an introduction to writing CGI programs in the
C language.
The reader is assumed to know the basics of C as well as how
to write simple forms in HTML
and to be able to install CGI scripts on
a Web server.
The principles are illustrated with very simple examples.

Two important warnings:

To avoid wasting your time, please check (from applicable local
documents or by contacting your local webmaster) whether you can
install and run CGI scripts written in C on the server.
At the same time, please check how to do that in detail, specifically
where you need to put your CGI scripts.

This document was written to illustrate the idea of CGI
scripting to C programmers. In practice, CGI programs are usually
written in other languages, such as
Perl, and for good reasons:
except for very simple cases, CGI programming in C is clumsy
and error-prone.

As my document
How to write HTML forms briefly explains,
you need a server-side script in order to use HTML
forms reliably. Typically, there are simple server-side scripts
available for
simple, common ways of processing form submissions, such as sending
the data in text format by E-mail to a specified address.

However, for more advanced processing, such as collecting data into
a file or database, or retrieving information and sending it back,
or doing some calculations with the submitted data, you will
probably need to write a server-side script of your own.

If someone suggests using JavaScript as an alternative to CGI,
ask him to read my
JavaScript and HTML: possibilities and caveats.
Briefly, JavaScript is inherently unreliable at least if not
“backed up”
with server-side scripting.

The above-mentioned
How the web works: HTTP and CGI explained
is a great tutorial.
The following introduction of mine is just another attempt to present
the basics; please consult other sources if you get confused or need
more information.

Multiplication results

Assume that you type 4 into one input field and
9 into another and then invoke submission, typically
by clicking on a submit button. Your browser will
send, by the HTTP protocol, a request to the server at
www.cs.tut.fi. The browser picks up this server name
from the value of the ACTION attribute, where it
occurs as the host name part of a URL.
(Quite often, the ACTION attribute refers, using
a relative URL, to a script on the same server as the document
resides on, but this is not necessary, as this example shows.)

When sending the request, the browser provides additional information,
specifying a relative URL, in this case
/cgi-bin/run/~jkorpela/mult.cgi?m=4&n=9
This was constructed from that part of the ACTION value
that follows the host name, by appending a question mark
“?” and the form data in
a specifically encoded format.

The server to which the request was sent
(in this case, www.cs.tut.fi)
will then process
it according to its own rules. Typically, the server’s configuration
defines how the relative URLs are mapped to file names and which
directories/folders are interpreted as containing CGI scripts.
As you may guess, the part cgi-bin/ in the URL causes
such interpretation in this case. This means that instead of just
picking up and
sending back (to the browser that sent the request) an HTML document
or some other file, the server invokes a script or a program
specified in the URL (mult.cgi in this case) and passes
some data to it (the data m=4&n=9 in this case).

It depends on the server how this really happens.
In this particular case, the server actually runs the (executable)
program in the file mult.cgi in the subdirectory
cgi-bin of user
jkorpela’s home directory. It could be something
quite different, depending on server configuration.

The often-mystified abbreviation CGI, for
Common Gateway Interface, refers just to a convention on how
the invocation and parameter passing take place in detail.

Invocation means different things in different cases.
For a Perl script, the server would invoke a Perl interpreter and
make it execute the script in an interpretive manner. For an
executable program, which has typically been produced by a compiler
and a loader from a source program in a language like C, it would
just be started as a separate process.

In order to set up a C program as a CGI script, it needs to
be turned into a binary executable program. This is often problematic,
since people largely work on Windows whereas servers often run some
version of UNIX or Linux. The system where you develop your program
and the server where it should be installed as a CGI script may have
quite different architectures, so that the same executable does not
run on both of them.

This may create an unsolvable problem. If you are not allowed to
log on to the server and you cannot use a binary-compatible system
(or a cross-compiler) either,
you are out of luck. Many servers, however, allow you to log on and use
the server in interactive mode, as a “shell user,”
and contain a C compiler.

You need to compile and load
your C program on the
server (or, in principle, on a system with the
same architecture, so that binaries produced for it are executable
on the server too).

Normally, you would proceed as follows:

Compile and test the C program
in normal interactive use.

Make any changes that
might be needed for use as a CGI script.
The program should read its input
according to the intended
form submission method. Using the default GET method,
the input is to be read from the environment variable
QUERY_STRING.
(The program may also read data from files, but these must then
reside on the server.)
It should generate output on the standard output stream
(stdout) so that it starts with suitable HTTP
headers. Often, the output is in HTML format.

Compile and test again. In this testing phase,
you might set the environment variable QUERY_STRING
so that it contains the test data as it will be sent as form
data. E.g., if you intend
to use a form where a field named foo
contains the input data,
you can give the command
setenv QUERY_STRING "foo=42" (when using the tcsh shell)
or
export QUERY_STRING="foo=42" (when using the bash shell;
the export is needed so that the program you start sees the variable).

Check that the compiled version is in a format that
works on the server. This may require a recompilation.
You may need to log on to the server computer (using
Telnet, SSH, or some other terminal emulator) so that
you can use a compiler there.

Upload the compiled and loaded program, i.e. the executable binary program
(and any data files needed), to the server.

Set up a simple HTML document that contains a form for
testing the script, etc.

You need to put the executable into
a suitable directory and name it according to server-specific conventions.
Even the compilation commands needed here might differ from what
you are used to on your workstation.
For example, if the server runs some flavor of Unix and has the
Gnu C compiler available, you would typically use a compilation
command like gcc -o mult.cgi mult.c and
then move (mv) mult.cgi to a directory with
a name like cgi-bin.
Instead of gcc, you might need to use cc.
You really need to check local
instructions for such issues.

The filename extension .cgi
has no fixed meaning in general. However,
there can be server-dependent
(and operating system dependent) rules for naming executable files.
Typical extensions for executables are .cgi
and .exe.

As usual when starting work with some new programming technology,
you should probably first make a trivial program work.
This avoids fighting with many potential problems at a time and
lets you concentrate first on the issues specific to the environment,
here CGI.

You could use the following program that just prints
Hello world but preceded by HTTP headers as required by
the CGI interface. Here the header specifies that the data is plain
ASCII text.

After compiling, loading, and uploading, you should be able
to test the script simply by entering the URL in the browser’s
address bar. You could also make it the destination of a normal
link in an HTML document. The URL of course depends on how you set
things up; the URL for my installed Hello world
script is the following:
http://www.cs.tut.fi/cgi-bin/run/~jkorpela/hellow.cgi

For forms that use METHOD="GET" (as our
simple example above uses, since
this is the default),
CGI specifications say that the data is passed to
the script or program in an environment variable called
QUERY_STRING.

It depends on the scripting or programming
language used how a program can access the value of an environment
variable. In the C language, you would use the
library function getenv (defined in the
standard library stdlib) to access the value as
a string. You might then use various techniques to pick up data
from the string, convert parts of it to numeric values, etc.

The output from the script or program to the “primary output
stream” (such as stdout in the C language) is handled
in a special way. Effectively, it is directed so that it gets sent
back to the browser. Thus, by writing a C program that writes
an HTML document onto its standard output, you will make that document
appear on the user’s screen as a response to the form submission.

As a disciplined programmer, you have probably noticed
that the program makes no check against integer overflow, so it
will return bogus results for very large operands. In real life,
such checks would be needed, but such considerations would take us
too far from our topic.

Note: The first printf function call prints out
data that will be sent by the server as an HTTP header.
This is required for several reasons, including the fact that
a CGI script can send any data (such as an image or a plain text file)
to the browser, not just HTML documents.
For HTML documents, you can just use the printf function
call above as such; however, if your
character encoding is different from ISO 8859-1 (ISO Latin 1),
which is the most common on the Web, you need to replace
iso-8859-1 by the
registered name of the encoding (“charset”) you use.

I have compiled this program and saved the executable program
under the name mult.cgi in my directory for CGI scripts
at www.cs.tut.fi.
This implies that any
form with
action="http://www.cs.tut.fi/cgi-bin/run/~jkorpela/mult.cgi"
will, when submitted, be processed by that program.

Consequently, anyone could write a form
of his own with the same ACTION attribute and pass
whatever data he likes to my program. Therefore, the program
needs to be able to handle any data.
Generally, you need to check the data before starting to process it.
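
As one possible way of checking a numeric field, the following sketch
converts a field value with strtol and rejects anything that is not
entirely a number in the range of long. The function name
parse_long_field is hypothetical, and the sketch assumes that the value
of the field has already been isolated from the rest of the form data.

```c
#include <errno.h>
#include <stdlib.h>

/* Convert the text of one form field to a long.
   Returns 1 only if the whole string is a decimal integer (optionally
   preceded by white space and a sign, as strtol allows) that fits in a
   long; returns 0 for empty, malformed, or out-of-range input. */
int parse_long_field(const char *s, long *out)
{
    char *end;
    long value;

    if (s == NULL || *s == '\0')
        return 0;
    errno = 0;
    value = strtol(s, &end, 10);
    if (errno == ERANGE)            /* does not fit in a long */
        return 0;
    if (end == s || *end != '\0')   /* no digits, or trailing garbage */
        return 0;
    *out = value;
    return 1;
}
```

A pattern such as sscanf with "m=%ld&n=%ld" is more lenient: it accepts
trailing garbage after the last number, as in m=4&n=9xyz.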

The idea of METHOD="POST"

Let us consider next a different processing for form data.
Assume that we wish to write a form that takes a line of text
as input so that the form data is sent to
a CGI script that
appends the data to a text file on the server.
(That text file could be readable by the author of the form and the
script only, or it could be made readable to the world through another
script.)

It might seem that the problem is similar to the
example considered above; one would just need
a different form and a different script (program).
In fact, there is a difference. The example above can be regarded
as a “pure query” that does not change the
“state of the world.”
In particular, it is “idempotent,” i.e. the same form data could
be submitted as many times as you like without causing any problems
(except minor waste of resources). However, our current task needs
to cause such changes—a change in the content of a file that is
intended to be more or less permanent. Therefore, one should use
METHOD="POST". This is explained in more detail in
the document
Methods GET and POST in HTML forms - what’s the difference?
Here we will take it for granted that
METHOD="POST" needs to be used and we will consider the
technical implications.

For forms that use METHOD="POST",
CGI specifications say that the data is passed to
the script or program
in the standard input stream (stdin), and the
length (in bytes, i.e. characters) of the data is passed
in an environment variable called
CONTENT_LENGTH.

Reading input

Reading from standard input probably sounds simpler than
reading from an environment variable, but there are complications.
The server is not required to pass the data so that
when the CGI script tries to read more data than there is, it would
get an end of file indication! That is, if you read e.g. using
the getchar function in a C program, it is undefined
what happens after reading all the data characters; it is not guaranteed
that the function will return EOF.

When reading the input, the program must not try to read more
than CONTENT_LENGTH characters.
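
A minimal sketch of such bounded reading follows. The function names and
the limit MAX_BODY are assumptions made for this illustration; the point
is that the byte count comes from CONTENT_LENGTH and that the program
never waits for an end of file.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_BODY 1024   /* assumed upper limit for accepted input, in bytes */

/* Interpret the value of CONTENT_LENGTH. Returns 1 and stores the
   length if the text is a number in the range 0..MAX_BODY,
   otherwise returns 0. */
int parse_content_length(const char *text, size_t *len)
{
    char *end;
    long value;

    if (text == NULL || *text == '\0')
        return 0;
    value = strtol(text, &end, 10);
    if (*end != '\0' || value < 0 || value > MAX_BODY)
        return 0;
    *len = (size_t)value;
    return 1;
}

/* Read exactly `len` bytes of form data from `in` into `buf` and
   terminate the result with '\0'. `buf` must have room for len + 1
   bytes. Returns 1 on success, 0 if fewer bytes were available. */
int read_form_data(FILE *in, char *buf, size_t len)
{
    size_t got = fread(buf, 1, len, in);

    buf[got] = '\0';
    return got == len;
}
```

In a CGI program, the first function would be called with
getenv("CONTENT_LENGTH") and the second with stdin and a buffer of
MAX_BODY + 1 bytes.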

Sample program: accept and append data

A relatively simple
C program
for accepting input via CGI and
METHOD="POST" is the following:

Essentially, the program retrieves the
information about the number of characters in the input
from the value of the
CONTENT_LENGTH environment variable.
Then it unencodes (decodes) the data, since the data arrives in the
specifically encoded format that was already mentioned.
The program has been written for a form where the text input field
has the name data (actually, just the length of the name
matters here). For example, if the user types
Hello there!
then the data will be passed to the program encoded as
data=Hello+there%21
(with space encoded as + and exclamation mark encoded
as %21). The unencode routine in the program
converts this back to the original format. After that,
the data is appended to a file (with a fixed file name),
as well as echoed back to the user.

Having compiled the program, I have saved it as collect.cgi
in the directory for CGI scripts. Now a form like the following
can be used for data submissions:

Form for submitting data

Please notice that anything you submit here will become
visible to the world:

Your input (80 chars max.):

Form for checking submitted data

The content of the text file to which the submissions are
stored will be displayed as plain text.

Even though the output is declared to be plain text, Internet Explorer
may interpret it partly as containing HTML markup. Thus, if someone enters
data that contains such markup, strange things would happen.
The viewdata.c program takes this into account
by writing the NUL character ('\0') after each occurrence
of the less-than character (<), so that it will not
be taken (even by IE) as starting a tag.
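
The idea can be sketched as a simple copying loop. This is not the
actual source of viewdata.c; the function name copy_defused is
hypothetical, and the sketch only shows the technique of writing a NUL
character after each less-than character.

```c
#include <stdio.h>

/* Copy `in` to `out`, writing a NUL character after each '<' so that
   a browser does not take the following text as the start of a tag. */
void copy_defused(FILE *in, FILE *out)
{
    int c;

    while ((c = getc(in)) != EOF) {
        putc(c, out);
        if (c == '<')
            putc('\0', out);
    }
}
```

A viewing script would first print a Content-Type header for plain text
and the empty line, then open the data file and call this function with
stdout as the destination.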

Further reading

You may now wish to read
The CGI specification,
which tells you all the basic details about CGI. The next step is
probably to see what the
CGI Programming FAQ contains. Beware that it is
relatively old.

The C language was originally designed for an environment where
only ASCII characters were used. Nowadays, it can be
used—with caution—for processing 8-bit
characters. There are various ways to overcome the
limitation that in C implementations,
a character is generally an
8-bit quantity. See especially the last section in
my book
Unicode Explained.