A Stanford Project

Technology

How exactly does the Great Firewall work? In this post, we’ll discuss the technical details of China’s firewall infrastructure. You’ll come to understand some of the technology China employs, as well as its limitations.

First, we’ll clarify some terminology and give some background on how the internet works. You’ll need this to follow the rest of the article. If you’re already internet- and tech-savvy, feel free to skip ahead.

Quick Layman’s Guide to How the Internet Works:

The internet is basically a huge pile of computers. Each time you go to your favorite website (e.g. facebook.com), you’re establishing a connection with the computer that hosts it. That remote computer sends you some data, which you can view in your browser. You can keep the connection open for as long as you need data from it.

The dirty secret of the internet is that when you go to your favorite website, you may not always reach the same computer. Why’s that? Depending on where you are, it may be more convenient for you to reach a computer that’s closer to you. It’s just like shopping for groceries: you would go to a different store depending on where you lived. So a grocery store is really better identified by its address than by its name. There are many Ralphs stores, but there is only one a block away from your home.

Computers work much the same way. In the computer world, the equivalent of your grocery store’s street address is an IP address: a unique identifying number for every machine on the internet. Most addresses are IPv4, which just means that they are 32 bits long, conventionally written as four numbers from 0 to 255 separated by dots. A typical IP address looks like 92.243.11.196.
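To make “32 bits” concrete, here is a minimal Python sketch that packs the four dot-separated numbers of an IPv4 address into a single 32-bit integer and back again. It uses 92.243.11.196, the address from the name-server example in this post; the function names are our own, and the standard-library `ipaddress` module is used only as a cross-check.

```python
import ipaddress

def ipv4_to_int(address: str) -> int:
    """Pack a dotted-quad IPv4 address into one 32-bit integer."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"not a valid IPv4 address: {address!r}")
    value = 0
    for octet in octets:
        value = (value << 8) | octet  # each of the 4 numbers supplies 8 of the 32 bits
    return value

def int_to_ipv4(value: int) -> str:
    """Unpack a 32-bit integer back into dotted-quad form."""
    if not 0 <= value < 2**32:
        raise ValueError("IPv4 addresses are exactly 32 bits")
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))

n = ipv4_to_int("92.243.11.196")
print(n)               # 1559432132
print(int_to_ipv4(n))  # 92.243.11.196
# Cross-check against the standard library's own parser.
print(n == int(ipaddress.IPv4Address("92.243.11.196")))  # True
```

The dotted form is just a human-friendly way of writing one number; the computer only ever sees the 32-bit value.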

You might notice that the number in an IP address seems a bit random. You’d be exactly right, and that makes IP addresses very hard to remember. So instead of using IP addresses to navigate to the computers we want, we use human-readable domain names, like facebook.com and google.com. The side effect is that each time we visit a domain name, we must first look up its IP address before we can connect and fetch the data we want. The machine that performs this lookup is called a name server.

Each of the beige boxes in the above diagram is a name server. In essence, name servers work as follows:

The client gives the name server the domain name it remembers. In this diagram, it’s laboit.net.

The name server asks other, authoritative name servers for that name’s address. In the diagram, this process is labelled 1, 2, and 3.

The name server returns the IP address to the client. In this diagram, it’s 92.243.11.196.
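The steps above can be modeled with a toy Python sketch. The “name servers” here are in-memory dictionaries with made-up names (`root`, `net-tld`, `laboit-auth`), not real servers or real DNS messages; the resolver follows referrals from the top down, much like steps 1, 2, and 3 in the diagram, and ends at 92.243.11.196. Note that it simply believes whatever each server tells it.

```python
# Hypothetical servers: each either answers with an address ("A")
# or refers the asker to a more specific server ("NS").
SERVERS = {
    "root":        {"net.":        ("NS", "net-tld")},
    "net-tld":     {"laboit.net.": ("NS", "laboit-auth")},
    "laboit-auth": {"laboit.net.": ("A", "92.243.11.196")},
}

def resolve(name: str, start: str = "root") -> str:
    """Iteratively resolve `name`, following referrals from the root down."""
    fqdn = name.rstrip(".") + "."
    server = start
    for _ in range(10):  # guard against referral loops
        records = SERVERS[server]
        # Zones this server knows about that cover the requested name.
        matches = [z for z in records if fqdn == z or fqdn.endswith("." + z)]
        if not matches:
            raise LookupError(f"{server} knows nothing about {fqdn}")
        kind, value = records[max(matches, key=len)]  # most specific zone wins
        if kind == "A":
            return value  # accepted on trust: nothing here verifies the answer
        server = value    # referral: go ask the next server down
    raise LookupError("too many referrals")

print(resolve("laboit.net"))  # 92.243.11.196
```

Real resolvers speak the DNS protocol over the network and cache answers, but the chain of trust is the same: the final answer is only as good as the server that gave it.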

The key point is that every name server ultimately depends on an authoritative name server, and all of the trust rests on that authoritative server. This becomes interesting in the case of the Great Firewall, because the Chinese government controls all of the national authoritative name servers. We’ll talk more about this in a bit.

The Great Firewall uses three distinct methods to block access to websites in China:

IP Blocking

IP Address Misdirection

Data Filtering

We’ll address each of them in turn.

IP Blocking

In this method, the firewall refuses connections to certain IP addresses. Because the firewall intercepts all of the data sent and received by computers within the network, this prevents the user from gaining any access at all to the remote computer.

Example:

www.facebook.com -> blocked

In the above example, facebook.com maps to a known IP address (e.g. 69.63.187.17), so the firewall disconnects any connection made to that address.
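The blocking decision itself can be sketched in a few lines of Python. This is a minimal illustration, assuming a blocklist of IP networks: 69.63.187.17 comes from the example above, while 203.0.113.0/24 is a made-up range (a block reserved for documentation) included to show that whole ranges can be blocked at once. The firewall’s real list is not public, and the sketch only shows the yes/no decision, not how packets are actually dropped.

```python
import ipaddress

# Hypothetical blocklist for illustration only.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("69.63.187.17/32"),  # the single address from the example
    ipaddress.ip_network("203.0.113.0/24"),   # made-up range (documentation block)
]

def allow_connection(destination: str) -> bool:
    """Return False if the destination IP falls inside any blocked network."""
    addr = ipaddress.ip_address(destination)
    return not any(addr in network for network in BLOCKED_NETWORKS)

for ip in ["69.63.187.17", "203.0.113.45", "92.243.11.196"]:
    print(ip, "allowed" if allow_connection(ip) else "blocked")
```

Because the check looks only at the destination address, it cannot tell one website from another when they share an IP address, which is one of the limitations of this method.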

IP Address Misdirection

From our primer on name servers, we learned that the internet must trust authoritative name servers to give it the correct IP address for a given name. In the case of the Great Firewall, however, the Chinese government controls most of the country’s internet infrastructure and many of the authoritative name servers. By exploiting a flaw in the naming system, namely that its answers by default carry no proof of where they came from, the government can redirect a given domain name to whichever website it would rather people see. This technique is often called DNS hijacking or DNS poisoning.

Example:

www.mit.edu -> (www.misdirected.mit.clone.edu)

In the above example, note that MIT itself is based in the United States. Since MIT runs the authoritative name server for mit.edu, it is the final arbiter of its own IP address. However, when a user inside China tries to reach www.mit.edu, the Great Firewall intercepts the request as it leaves China. Then, before mit.edu (on the other side of the world) even sees the request, the firewall sends a forged response back to the user. The naming system, which is built entirely on trust, accepts the first matching answer that arrives without further checks, so the user is sent to the fake website.
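The race described above can be simulated in Python. This is a toy model, not real DNS traffic: the addresses come from reserved documentation ranges and stand in for the genuine and fake sites, and the arrival times are invented purely to show the ordering. The one realistic detail is that a plain DNS client checks only that a response matches its query (same 16-bit transaction ID and name); since the firewall sits on the path and sees the query, it can copy both.

```python
from __future__ import annotations

import random
from dataclasses import dataclass

@dataclass
class DnsResponse:
    txid: int          # transaction ID copied from the query
    name: str
    ip: str
    arrival_ms: float  # hypothetical time the answer reaches the user
    sender: str        # for illustration only; the client cannot see this

def accept_first(txid: int, name: str, responses: list[DnsResponse]) -> DnsResponse:
    """Accept the earliest response that matches the query, as a plain DNS client does."""
    for resp in sorted(responses, key=lambda r: r.arrival_ms):
        if resp.txid == txid and resp.name == name:
            return resp  # first match wins; later (genuine) answers are ignored
    raise LookupError("no matching response")

query_id = random.randrange(2**16)  # 16-bit DNS transaction ID
responses = [
    # Genuine answer must cross the Pacific and back: slow (hypothetical timing).
    DnsResponse(query_id, "www.mit.edu", "198.51.100.10", arrival_ms=180.0, sender="MIT"),
    # Forged answer injected at the border, copying the ID it saw on the wire.
    DnsResponse(query_id, "www.mit.edu", "203.0.113.66", arrival_ms=15.0, sender="firewall"),
]
winner = accept_first(query_id, "www.mit.edu", responses)
print(winner.ip, winner.sender)  # 203.0.113.66 firewall
```

Nothing in `accept_first` is wrong by the rules of the protocol; the forged answer simply matches the query and gets there first.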

Data Filtering

The Great Firewall also examines the URL a user requests, as well as the data carried over the connection. These techniques are called URL filtering and packet filtering, respectively.

In our first example, a user runs a Google search for “Tiananmen Square”. The resulting request URL contains “tiananmen+square”, so the firewall intercepts the request and drops it. This is an example of URL filtering.
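URL filtering can be sketched as a keyword check on the decoded URL. In this minimal Python version the keyword list is hypothetical (the firewall’s real list is not public), and the Google search URL is only an illustration of a request containing “tiananmen+square”. The URL is decoded first, since `parse_qsl` turns `+` and `%20` into spaces, so different spellings of the same query are caught.

```python
from urllib.parse import parse_qsl, unquote, urlsplit

# Hypothetical keyword list for illustration only.
BLOCKED_KEYWORDS = ["tiananmen square", "falun gong"]

def url_is_blocked(url: str) -> bool:
    """Return True if any blocked keyword appears in the URL's path or query."""
    parts = urlsplit(url)
    # Decode the path and every query value: "q=tiananmen+square" -> "tiananmen square".
    pieces = [unquote(parts.path)] + [value for _, value in parse_qsl(parts.query)]
    text = " ".join(pieces).lower()
    return any(keyword in text for keyword in BLOCKED_KEYWORDS)

print(url_is_blocked("http://www.google.com/search?q=tiananmen+square"))  # True
print(url_is_blocked("http://www.google.com/search?q=stanford"))          # False
```

Matching on the raw, undecoded URL instead would miss trivial variations such as `Tiananmen%20Square`.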

Example 2:

http://www.folfg.org/ -> blocked

In this case, the user navigates to a URL with nothing in it that obviously marks it for blocking. However, the website itself contains information about Falun Gong, a system of beliefs and religious movement in China that the government has relentlessly cracked down on and persecuted. As this data passes through the firewall, the firewall identifies references to Falun Gong and blocks further transmission.
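Content filtering has a subtlety that a short Python sketch can show: a web page arrives split into many packets, and a keyword can straddle the boundary between two of them. The scanner below is a minimal illustration, assuming a hypothetical keyword list and made-up page text; it carries the last few bytes of each chunk into the next check so that split keywords are still found. It is not a description of the firewall’s actual implementation.

```python
# Hypothetical keyword list (lower-case bytes), for illustration only.
BLOCKED_KEYWORDS = [b"falun gong"]

class StreamScanner:
    """Scan a byte stream chunk by chunk for blocked keywords.

    Keeps the last (longest_keyword - 1) bytes seen, so a keyword split
    across two packets is still detected.
    """

    def __init__(self, keywords):
        self.keywords = [k.lower() for k in keywords]
        self.keep = max(len(k) for k in self.keywords) - 1
        self.tail = b""

    def feed(self, chunk: bytes) -> bool:
        """Return True if a blocked keyword appears in the stream so far."""
        window = (self.tail + chunk).lower()
        hit = any(k in window for k in self.keywords)
        self.tail = window[-self.keep:] if self.keep else b""
        return hit

# Made-up page content split across two packets, mid-keyword.
packets = [b"<p>Practitioners of Falun ", b"Gong meditate...</p>"]

naive = any(k in p.lower() for p in packets for k in BLOCKED_KEYWORDS)
print(naive)  # False: checking each packet alone misses the split keyword

scanner = StreamScanner(BLOCKED_KEYWORDS)
print([scanner.feed(p) for p in packets])  # [False, True]
```

Once `feed` returns True, a filter like this could stop forwarding the rest of the stream, which corresponds to the “block further transmission” step described above.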

This concludes our short primer on the technical aspects of the Great Firewall. Feel free to comment and give your opinion.