Source Code Accompanies This Article. Download It Now.

Keeping servers from knowing your true identity

Marc is an Assistant Professor in the Computer Information Systems Department at Manhattan College. He can be contacted at marc.waldmanmanhattan.edu. Stefan is a Ph.D. student at the Institute for System Architecture, Dresden University of Technology, Germany. He can be contacted at sk13inf.tu-dresden.de.

The need for Internet-based anonymous communication has been well documented in the media. Unfortunately, the Internet currently lacks a deployed, general-purpose infrastructure that supports anonymous communication between distributed applications. With a little work, however, you can use existing web-based anonymity tools to support anonymous communication. In this article, we describe how to create a general-purpose request-reply anonymous communication channel by utilizing currently deployed web-based "anonymizing" tools. Channels such as these let clients and servers exchange messages in such a way that servers don't know the true identity of clients.

Admittedly, anonymous Internet-based communication is a controversial topic. As with a lot of software, there is always concern that individuals could abuse anonymity for malicious purposes. However, today's Internet-based applications make it relatively easy for third parties to track and catalog an individual's online activity, including web-browsing history, online purchases, news postings, and the like. Anonymizing tools provide individuals with some form of protection against these third parties. For more information about anonymizing tools and the need for them, see "Privacy-enhancing Technologies for the Internet," by Ian Goldberg, David Wagner, and Eric Brewer (http://www.cs .berkeley.edu/~daw/papers/privacy- compcon97-www/privacy-html.html). In this article, we examine the anonymity provided by various web-based anonymizing tools, and show how you can interface with an existing web-based anonymizing tool to create an anonymous communication channel between distributed applications. While all source code is implemented in Java, the techniques we present can be used with any programming language that supports socket-based communication.

Web-Based Anonymity

Over the past few years, a number of tools have been developed that let you anonymously retrieve web pages. By anonymously we mean that the web server does not learn the true IP address of the web-browsing client. Roughly speaking, these tools can be classified as being either mix-net-based or proxy-based. Proxy-based designs are simpler than mix-net-based designs, but usually offer a weaker form of anonymity.

Figure 1 shows the typical architecture of proxy-based web anonymity tools such as Anonymizer (http://www.anonymizer.com/) and Rewebber (http://www.rewebber.de/). The web-browsing client sends its web-page request to the proxy server. The proxy server makes the web-page request on behalf of the client. The web page that results from this request is then sent back to the requesting client. The server only learns the proxy's IP address. Clearly, this is not a strong form of anonymity, as the proxy itself knows the client and the requested web page. This information could be logged or shared with third parties. Mix-net-based anonymity tools address this problem.

Figure 2 shows the typical architecture of mix-net-based web-anonymity tools. A mix-net consists of a collection of computers that store-and-forward encrypted messages. Each computer in the mix-net is called a "mix." The web-browsing client uses a layered encryption technique to construct a web-page request message, suitable for sending over the mix-net. This message is then sent to the first mix in the mix-net. When a mix receives an encrypted message, it decrypts the message. This decryption reveals only the name of the next mix that the decrypted message should be forwarded to. All other information contained in the message is unreadable due to the layered encryption. When the last mix in the mix-net receives a message, it also decrypts it. However, this decryption reveals the URL that the client wishes to retrieve. The last mix then contacts the appropriate web server to retrieve the document specified by the URL. The retrieved document is sent back to the client over the same mix-net path that the client request tookthe last mix sends the document to the next-to-last mix, and so on.

The anonymity that can be provided by mix-net-based tools is much stronger than that of proxy-based tools because of the encryption/decryption performed by the client and the mixes. The client encrypts the initial request in such a way that each mix can fully decrypt and interpret only part of the requestthe part of the request that specifies the next mix that the request should be forwarded to. Only the first mix in the mix-net learns the IP address of the client and only the last mix in the mix-net learns the URL that the client has requested. As the document makes its way back to the requesting client, each mix in the mix-net encrypts the document, hiding the content from the other mixes.

Interfacing with Web Anonymity Tools

Although the architecture of the proxy-based and mix-net-based anonymity tools are quite different, they do share an important design characteristicthey both carry HTTP-based messages. HTTP is the protocol that governs how web browsers and web servers talk to each other. All messages sent between browsers and servers must be formatted according to the HTTP specification. All of the web-based anonymity tools we just described speak the HTTP protocolthe tools expect HTTP-formatted messages. Therefore, if you want distributed applications to utilize these anonymity tools, your applications must be capable of sending/receiving HTTP-based messages.

Figure 3 shows the HTTP message a web browser might send to Amazon.com to retrieve the file "index.html." The first line specifies the type of action the browser wishes the server to perform. The word GET is called the HTTP "method." In the context of HTTP, a method is essentially the name of a command. This HTTP method tells the web server to send the file "index.html." The second and third line of Figure 3 specify the HTTP headers, which are essentially just a collection of name-value pairs that provide additional information to the recipient of the message. The HTTP header name appears to the left of the colon and the value appears to the right. The server responds with an HTTP-based message similar to that in Figure 4. Every HTTP response begins with a line, called the "status line," which specifies the result of the corresponding request. Requests can succeed or fail. A numeric code, called the "response code," specifies the result of the request. In Figure 4, the response code is 200. A set of numeric codes have been defined by the HTTP standard. For example, the response code 200 means that the request succeeded. Immediately following the response code is a text string, called the "reason phrase," that specifies (in natural language) what the response code means. In Figure 4, the reason phrase is "OK," meaning that the request succeeded. Immediately following the status line are the HTTP headers, which serve the same purpose as those in the HTTP request messagenamely to provide information to the recipient of the message.

Following the HTTP response headers is a blank linemore specifically it is a carriage return character followed by a linefeed character (CR/LF). This line marks the end of the HTTP headers and the beginning of the response body. The response body usually contains the item that the browser requested. In Figure 4, the response body contains the HTML code that is stored in the server's index.html file (the file requested in Figure 3). The size (in bytes) of the response body must be specified in the "Content-Length" HTTP header (Figure 4). The "Content-Type" HTTP header identifies the type of data being stored in the response body. This header essentially tells the browser how to interpret the data that is stored in the response body.

The POST method is similar to the GET method in that it lets browsers request content from servers. However, it is typically used when browsers need to send data, usually from an HTML form, to servers. The data to be sent to the server follows the request HTTP headers. This data is referred to as the "request body" and is separated from the HTTP headers by a blank line (more specifically, a CR/LF). The content of the request body is not limited to ASCII textany sequence of bytes can be sent in the request body. The size (in bytes) of the request body must be specified in the "Content-Length" HTTP header.

The HTTP specification defines a number of standard HTTP headers. However, the spec does not exclude the inclusion of additional headers that might be application specific. For example, you can develop a web-based application that requires new HTTP header values. Web servers, browsers, or other programs that read HTTP-based messages are supposed to ignore headers they don't understand. Ignored headers are simply meant to be passed to the next application that processes the HTTP-based message. As the HTTP message is sent over the Internet from one computer to another, it may gain or lose HTTP headersthe exact action taken depends on the headers and the applications that process them. A header that is unknown to all computers in this forwarding chain is simply left untouched as it moves to its destination. You can use this aspect of HTTP to transport application-specific messages over web-based anonymity tools.

To send messages over web-based anonymity tools, your applications need to send/receive HTTP-based messages. However, we are only interested in a small subset of HTTPnamely, the POST method and a few headers and response codes.

Assume you are developing a system that lets students anonymously send/receive reviews of college courses. Clearly, students would be more forthcoming in their reviews if they believed the reviews could be sent anonymously. The system consists of two components: the server that stores the reviews, and the client that can send reviews to the server and retrieve reviews from the server. You want the client to contact the server using a web-based anonymizing tool. Therefore, both the server and client must send/receive HTTP-based messages.

Figure 5 shows the HTTP that the client generates to store a review on the server. The client simply creates an HTTP-based POST message that contains the server's name (www.mycourseserver.edu) and listening port (4321) in the URL. A colon must separate the server name from the port. The client knows that the server is listening for connections on port 4321. We are not utilizing port 80the port traditionally utilized by web servers. By specifying the port in the URL, we can accommodate any number of applications running on the same server, each awaiting connections on a different port. (You may need to use port 80 if the other server ports are blocked by a firewall.)

The client generates three HTTP headers. The first two are standard HTTP headers that specify the size and type of the request body. The "Anon-Message-ID" header was created specifically for our application and is not a standard HTTP header. Its purpose is to identify the type of message we are sending. This would be unnecessary if the client could only send one type of message to the server, but that is rather limiting. The request body consists of a serialized version of the object that stored the course comments. How the comments are stored in the request body is immaterialthey could just as easily be sent as ASCII text. The "Content-Length" header specifies the size, in bytes, of the request body. Therefore, the size of the request body must be computed before the HTTP headers can be generated. The client sends this HTTP-based message to the appropriate web-based anonymizing tool. This is typically done by opening a socket connection to the tool and sending the message.

Once the client's message has been received by the server, via the anonymizing tool, the server parses the HTTP message and prepares a response. Figure 6 shows the server's HTTP-based response. The first line of the response indicates that the client's request was successfully processed (response code 200). The HTTP headers indicate the size and type of the response body. We once again utilize the "Anon-Message-ID" header to identify the type of message the server is sending back to the client. The response body holds a serialized version of an object that will be sent back to the client. For example, this object could hold a username and password that clients could later use to update their course review. The response body is optionalit is presented simply to show the HTTP necessary to return an object to clients. The anonymizing tool sends the entire response back to clients.

Implementation

The Java code we present interfaces with JAP, a freely available mix-based web anonymizing tool developed by Technische Universitt Dresden (http://anon .inf.tu-dresden.de/index_en.html). Although we are using Java in our example implementation, you can use any programming language that supports sockets. Since our program will be sending/receiving HTTP-based messages, it should be relatively easy to alter the code so that it can be used with other web-based anonymizing tools.

JAP is a free mix-based anonymizing tool that lets you browse the Web anonymously by configuring your browser to use JAP as an HTTP web proxy. Setting a web proxy is a standard option on almost all web browsers. However, our programs will not be using JAP in this way. Instead, our Java program generates HTTP-based messages and sends these requests to JAP. JAP delivers these messages to recipients, using a mix network, and returns the recipient's reply.

Listing One is used to connect to JAP. This source code assumes that JAP is running on the client (localhost) and that JAP is listening for client connections on port 4000. The port number that JAP listens on can be set using JAP's user interface. The source code simply creates a socket connection to JAP.

Listing Two shows the HTTPMessageHeader class. This class contains two static methods that generate HTTP messages. The prepareRequestMessageHeader method generates the POST method and associated HTTP headers are used when a client is sending a message to a server. The terms "client" and "server" could easily be replaced by "sender" and "receiver," respectively; the methods described here are meant for peer-to-peer communication. The prepareReplyMessageHeader method is used to generate the status line and HTTP headers needed for a response message. As required by the HTTP specification, CR/LF characters are placed at the end of every line. The size (in bytes) of the request and reply body must be known at the time the headers are created. This is because the "Content-Length" HTTP header appears before the request and reply body.

Listing Three is the sendMessage method, which sends an HTTP-based message to the recipient via JAP. The code sends a serialized copy of an object of type SendObject to the recipient. First, a socket connection is opened to JAP. Next, the object to be sent to the recipient is serializedconverted to a series of bytes that can be used by the recipient to reconstruct the object. Once the object has been serialized, its size, in bytes, can be determined. The intended recipient of the message is "courseserver.edu." The port number must be appended to the server's namethe port number follows the colon. This host and port format is specified by the HTTP standard. Finally, the HTTP-based message is sent to JAP. JAP will read the message and format it for transport over the mix-net. The last mix in the mix-net sends the HTTP-based message to the intended recipient (courseserver.edu, port 1765).

The intended recipient of the message (courseserver.edu) must be listening for mix-net connections on port 1765 (based on our example code). Once the mix-net delivers the message to courseserver.edu, the HTTP-based message must be parsed to determine the message ID and the size of the enclosed object. These items can be found using Java's regular expression matching library. Given the size of the enclosed object, the recipient can read the serialized bytes and reconstruct the object. The recipient is then free to send a reply to the sender using the HTTPMessageHeader.prepareReplyMessageHeader method (in Listing Two). An example Java program that utilizes these methods is available electronically from DDJ (see "Resource Center, page 3) and at http://home.manhattan.edu/ ~marc.waldman/DDJ_JavaAnon.zip.

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task.
However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

This month's Dr. Dobb's Journal

This month,
Dr. Dobb's Journal is devoted to mobile programming. We introduce you to Apple's new Swift programming language, discuss the perils of being the third-most-popular mobile platform, revisit SQLite on Android
, and much more!