A Simple Client with a Twist

If you are creating networked applications in Python, there is a powerful,
open source tool that can help you. The Twisted framework had its big
debut during the 2003 Python Conference in Washington, DC (PyCon DC 2003). Twisted takes
care of many networking details, thus reducing the amount of development work
required, particularly for complex systems. Here is another implementation of
our simple client, this time using Twisted.

In particular, note the error handling function, which is passed to the
Twisted framework. Such "hooks" for error handling are very important if the
system is to be extensible. Dealing with errors is the topic of the next
section. Make sure to read it before writing your own applications.

One very notable characteristic of the Twisted example is evident in the
last line of the code. Twisted's event loop resembles that of the Tkinter
example shown previously. In Tkinter, the event loop drives the GUI,
responding to inputs from the user. Twisted processes network events in a
similar way; like a waiter in a restaurant, the framework can keep track of
several "customers" (connections), responding when each one asks for service
(i.e., issues an event). This is called asynchronous I/O, and will be
covered in more detail in the next article.
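
The waiter analogy can be illustrated with a small sketch. Note the assumptions: it uses the event loop from the standard library's asyncio module rather than Twisted, and it simulates each connection with a timed delay instead of real network traffic. The function name serve_customer and the delays are made up for illustration.

```python
import asyncio

async def serve_customer(name, delay, log):
    """Simulate one connection that asks for service after `delay` seconds."""
    await asyncio.sleep(delay)   # while this one waits, the loop serves others
    log.append(name)             # record the order in which service completed
    return f"{name} served"

async def main():
    log = []
    # Start three "customers" at once; none of them blocks the others.
    results = await asyncio.gather(
        serve_customer("slow", 0.6, log),
        serve_customer("fast", 0.2, log),
        serve_customer("medium", 0.4, log),
    )
    return results, log

results, log = asyncio.run(main())
print(results)  # gather() keeps the order in which the calls were listed
print(log)      # the log reflects the order in which they finished
```

The whole run takes about as long as the slowest customer, not the sum of all three delays, because a single loop interleaves the waiting.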

The Importance of Error Handling

The admonition to handle error cases in your code is heard so often that you
may have learned to ignore it. In networked applications, however, error
handling is far more critical than you may be used to.
Network-related errors are frequent, even commonplace. In such an environment,
a "quick and dirty" utility will not survive at all.

Fortunately, there are simple things that can be done to deal with errors in
network applications. We will discuss them in this section. In addition, we
will modify our original simple client to handle common
error conditions.

Notes on Security

Security concerns itself with a very specific kind of error: a deliberate
assault on your system by a remote attacker. This is a monumental subject that
can easily become the focus of your entire life. Nevertheless, a network
application that ignores security does so at great peril. The following list provides
several guidelines that should keep you reasonably safe in simple
situations.

Run your program with the lowest possible privileges that still allow
it to work. Even a client performs actions on your system in response to
untrusted information received over the network. The less power it has, the
less damage attackers (or buggy code) can do.

Control the sizes of things — even when using a language such as
Python. While Python should protect you against the buffer overflow errors so
common in C and C++, unexpectedly large input can still cause unpredictable
behavior as system resources are used up. For example, it is not reasonable for
a person's last name to be ten megabytes long; you should catch such conditions
early, truncate the input, or even refuse to continue processing
altogether.
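
As a minimal sketch of this guideline, the helper below refuses input that exceeds a fixed size. The name read_limited and the limit MAX_NAME_LEN are hypothetical, and the sketch assumes a file-like object whose read() returns as many bytes as are available up to the requested count; a raw socket may need a loop instead.

```python
import io

MAX_NAME_LEN = 100  # hypothetical limit, in bytes

def read_limited(stream, max_len):
    """Read at most max_len bytes; raise ValueError if more is available."""
    data = stream.read(max_len + 1)   # one extra byte reveals oversized input
    if len(data) > max_len:
        raise ValueError(f"input exceeds {max_len} bytes")
    return data

print(read_limited(io.BytesIO(b"Smith"), MAX_NAME_LEN))
try:
    # A ten-megabyte "last name" is rejected without being read in full.
    read_limited(io.BytesIO(b"x" * 10_000_000), MAX_NAME_LEN)
except ValueError as err:
    print("rejected:", err)
```

Asking for one byte more than the limit is a cheap way to tell "exactly at the limit" apart from "too large" without reading the rest of the input.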

Make sure things are in the right format. A country name should not
contain question marks, for example. Regular expressions are very useful
for this sort of input validation, provided that you construct them
carefully, so that only correctly formatted input will match.
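
A format check of this kind might look like the following sketch. The pattern and the length limit are assumptions chosen for illustration: they accept only unaccented Latin letters separated by single spaces, apostrophes, or hyphens, so real country names with other characters would need a broader rule.

```python
import re

# Hypothetical rule: words of ASCII letters joined by a space, apostrophe or hyphen.
COUNTRY_RE = re.compile(r"[A-Za-z]+(?:[ '-][A-Za-z]+)*")
MAX_COUNTRY_LEN = 60  # hypothetical upper bound on the length

def is_valid_country(name):
    """Return True only if the whole string matches the expected format."""
    # fullmatch() requires the pattern to cover the entire string, so
    # trailing junk such as "?" or a newline cannot slip through.
    return len(name) <= MAX_COUNTRY_LEN and COUNTRY_RE.fullmatch(name) is not None

for candidate in ["Canada", "United States", "Canada?", "Canada\n"]:
    print(repr(candidate), is_valid_country(candidate))
```

Using fullmatch() rather than search() or match() is the careful part: a pattern that matches only a prefix of the input would let malformed data through.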

Choosing files to manipulate based on a network request should be
avoided. After all, a filename is basically a pointer — it refers to some
place in the filesystem, similar to the way a pointer in the C language refers
to some place in memory. Reading a file based on a network request can expose
private information, while writing a file can overwrite critical data and
compromise the system. In contrast, if you locally specify the name of a
configuration file to read, or a log file to write to, then using these locally
specified files is usually safe.
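
One way to follow this guideline, sketched below, is to never build a path from the request at all. The request supplies only a key, and the program looks that key up in a table of locally chosen files. The table contents and the name resolve_document are hypothetical.

```python
from pathlib import Path

# Hypothetical table: the program, not the remote peer, decides which files exist.
ALLOWED_DOCUMENTS = {
    "help": Path("docs/help.txt"),
    "license": Path("docs/license.txt"),
}

def resolve_document(requested_key):
    """Return the locally specified path for a key, or None if the key is unknown."""
    return ALLOWED_DOCUMENTS.get(requested_key)

print(resolve_document("help"))
print(resolve_document("../../etc/passwd"))  # unknown key, so None
```

Because the untrusted string is used only as a dictionary key, tricks such as "../" sequences never reach the filesystem.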

Executing another program with arguments received over the network can
lead to very serious security breaches. Consider eliminating such requirements
by design, as you would a goto statement in some languages. If you
still choose to go ahead, always analyze the input very carefully, in order
to avoid feeding malformed data to the other program. Under Unix-like
operating systems, for example, these kinds of calls are typically made via the
intervening shell program. This is especially dangerous: the
shell supports a quite capable programming language, meaning that insufficient
input validation will actually allow an attacker to write and (possibly) execute
arbitrary code on your system.
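
If another program must be run, the shell can at least be taken out of the picture. The sketch below uses the standard subprocess module with an argument list, so the untrusted text reaches the child as a single argument and shell metacharacters have no special meaning. The child here is simply the Python interpreter echoing its argument, a stand-in chosen so that the example runs anywhere; the name echo_via_child is hypothetical.

```python
import subprocess
import sys

def echo_via_child(untrusted):
    """Pass untrusted text to a child process as one argument, with no shell."""
    result = subprocess.run(
        # A list of arguments (and no shell=True) bypasses the shell entirely.
        [sys.executable, "-c", "import sys; print(sys.argv[1])", untrusted],
        capture_output=True, text=True, check=True, timeout=10,
    )
    return result.stdout.rstrip("\r\n")  # drop only the newline added by print

# The semicolon is delivered literally; no second command is executed.
print(echo_via_child("hello; echo INJECTED"))
```

Avoiding the shell does not make the input safe for the other program itself, which may still interpret its arguments in surprising ways, so the input validation described above remains necessary.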

A Simple Client with Error Checking

The following example shows our simple client, modified to perform several
important error checks.

First, we limit the size of the web page that we are prepared to read (the
MAX_PAGE_LEN variable). This is a security precaution, as described in the
second guideline (control the sizes of things) on the list in the previous
subsection.

Next, we make sure to catch any I/O errors from the network operation. The
urllib module raises an exception in such situations. Network I/O operations
fail much more frequently than operations on a local hard drive, for example.
While the lower levels of networking software on your machine will
(in cooperation with remote systems) try to effect a recovery, this is not
always possible. You must therefore be prepared to deal with unrecoverable
errors yourself. In this case, we simply print an error message and exit.

Finally, we add an explicit check of whether the regular expression pattern
has matched. If there is no match, the temperature reading is not available, and an
error message is printed instead. A pattern match failure is also a clear
indication that something unexpected has been received — it is therefore
important that your code deals with these faults explicitly.
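
The three checks can be sketched as follows. This is an illustration under stated assumptions, not the original listing: it uses urllib.request from Python 3, the value of MAX_PAGE_LEN and the pattern TEMP_RE are hypothetical, and the function names are invented. Instead of exiting directly, report() returns an exit status that the caller can pass to sys.exit().

```python
import http.client
import re
import urllib.request

MAX_PAGE_LEN = 20000  # hypothetical limit: never read more than this many bytes
# Hypothetical pattern; a real page needs a pattern fitted to its contents.
TEMP_RE = re.compile(r"Temperature:\s*(-?\d+(?:\.\d+)?)")

def fetch_page(url, timeout=10):
    """Return at most MAX_PAGE_LEN bytes of the page as text, or None on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            raw = response.read(MAX_PAGE_LEN)      # check 1: bounded read
    except (OSError, http.client.HTTPException) as err:
        # check 2: URLError and socket timeouts are subclasses of OSError
        print("Error retrieving page:", err)
        return None
    return raw.decode("utf-8", errors="replace")

def extract_temperature(page):
    """Return the temperature as a float, or None if the pattern does not match."""
    match = TEMP_RE.search(page)
    if match is None:                              # check 3: explicit match test
        return None
    return float(match.group(1))

def report(url):
    """Print the temperature or an error message; return a process exit status."""
    page = fetch_page(url)
    if page is None:
        return 1
    temperature = extract_temperature(page)
    if temperature is None:
        print("Temperature reading not available")
        return 1
    print("Temperature:", temperature)
    return 0
```

A script would typically end with sys.exit(report(url)), where url is the address of the page to be read.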

Network I/O Is Unpredictable

One of the most challenging — but fascinating — aspects of
network I/O is its unpredictability. As mentioned in the previous subsection,
such operations are not reliable; it is not unexpected for a network request to
simply fail. Sometimes, however, it does not fail in a clean, readily apparent
fashion. Instead, data transmission might start only after a lengthy delay
(high latency), proceed very slowly (low bandwidth), or
both.
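
A common defense against such silent stalls is a timeout. The sketch below shows the idea with the standard socket module; a connected local socket pair stands in for a remote peer so that the example needs no network, and the name recv_with_timeout is hypothetical. In Python 3, urllib.request.urlopen() accepts a timeout argument for the same purpose.

```python
import socket

def recv_with_timeout(sock, nbytes, seconds):
    """Return received bytes, or None if nothing arrives within `seconds`."""
    sock.settimeout(seconds)
    try:
        return sock.recv(nbytes)
    except socket.timeout:   # an alias of TimeoutError since Python 3.10
        return None

# A connected local socket pair stands in for a remote peer.
local, peer = socket.socketpair()
peer.sendall(b"hello")
print(recv_with_timeout(local, 1024, 0.5))   # data is already waiting
print(recv_with_timeout(local, 1024, 0.2))   # silent peer: gives up after 0.2 s
local.close()
peer.close()
```

Choosing the timeout value is a judgment call: too short and slow but healthy connections are abandoned, too long and the program appears to hang.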

Many factors, in myriad combinations, can cause such unpredictable behavior.
On the Internet, data routinely travels over very long distances — even
across continents — as it hops from one system to another towards its
final destination. Anywhere along the route, hardware failures, software
crashes, excessive network traffic, misconfigured systems, electromagnetic
interference, and many other causes can disrupt the orderly flow of data.

Unpredictability of network operations becomes a central concern for
servers. It is rarely acceptable to make all clients wait because one
particular connection is having trouble. If you write a more complex client,
such as a spider that gathers information from multiple web sites, you
will also run into this problem.

Waiting for each query to complete fully before starting the next is very
time-consuming. In addition, failing to complete a crawl of a thousand sites
just because the connection to number five on your list is "hanging" is
unlikely to be the desired behavior. Fortunately, a modern computer is physically
capable of handling hundreds or thousands of network operations in parallel. In
consequence, many useful strategies for concurrent network I/O have been
developed, researched, and deployed in actual systems. This will be the topic of the
next article.
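
As a small preview of one such strategy, the sketch below runs the queries of a crawl in a thread pool from the standard concurrent.futures module and records failures per site instead of aborting. The function names are hypothetical, and fake_fetch is a stand-in for a real download so that the example runs without a network.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def crawl(sites, fetch, max_workers=8):
    """Fetch all sites concurrently; return (results, failures) dictionaries."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, site): site for site in sites}
        for future in as_completed(futures):
            site = futures[future]
            try:
                results[site] = future.result()
            except Exception as err:   # one bad site must not stop the crawl
                failures[site] = err
    return results, failures

def fake_fetch(site):
    """Stand-in for a real download; site number five always fails."""
    if site == "site5":
        raise OSError("connection timed out")
    return f"contents of {site}"

results, failures = crawl([f"site{i}" for i in range(1, 11)], fake_fetch)
print(len(results), "succeeded;", sorted(failures), "failed")
# prints: 9 succeeded; ['site5'] failed
```

A real fetch function should also use a timeout, since a connection that hangs forever would otherwise tie up one of the worker threads.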