Main menu

Top changes in Tor since the 2004 design paper (Part 2)

This is part 2 of Nick Mathewson and Steven Murdoch's series on what has changed in Tor's design since the original design paper in 2004. Part one is back over here.

In this installment, we cover changes in how we pick and use nodes in our circuits, and general anticensorship features.

5. Guard nodes

We assume, based on a fairly large body of research, that if an attacker controls or monitors the first hop and last hop of a circuit, then the attacker can de-anonymize the user by correlating timing and volume information. Many of the security improvements to path selection discussed in this post concentrate on reducing the probability that an attacker can be in this position, but no reasonably efficient proposal can eliminate the possibility.

Therefore, each time a user creates a circuit, there is a small chance that the circuit will be compromised. However, most users create a large number of Tor circuits, so with the original path selection algorithm, these small chances would build up into a potentially large chance that at least one of their circuits will be compromised.

To help improve this situation, in Tor 0.1.1.2-alpha, the guard node feature was implemented (initially called "helper nodes", invented by Wright, Adler, Levine, and Shields and proposed for use in Tor by Øverlier and Syverson). In Tor 0.1.1.11-alpha it was enabled by default. Now, the Tor client picks a few Tor nodes as its "guards", and uses one of them as the first hop for all circuits (as long as those nodes remain operational).

This doesn't affect the probability that the first circuit is compromised, but it does mean that if the guard nodes chosen by a user are not attacker-controlled all their future circuits will be safe. On the other hand, users who choose attacker-controlled guards will have about M/N of their circuits compromised, where M is the amount of attacker-controlled network resource and N is the total network resource. Without guard nodes every circuit has a (M/N)2 probability of being compromised.

Essentially, the guard node approach recognises that some circuits are going to be compromised, but it's better to increase your probability of having no compromised circuits at the expense of also increasing the proportion of your circuits that will be compromised if any of them are. This is because compromising a fraction of a user's circuits—sometimes even just one—can be enough to compromise a user's anonymity. For users who have good guard nodes, the situation is much better, and for users with bad guard nodes the situation is not much worse than before.

6. Bridges, censorship resistance, and pluggable transports

While Tor was originally designed as an anonymous communication system, more and more users need to circumvent censorship rather than to just preserve their privacy. The two goals are closely linked – to prevent a censor from blocking access to certain websites, it is necessary to hide where a user is connecting to. Also, many censored Internet users live in repressive regimes which might punish people who access banned websites, so here anonymity is also of critical importance.

However, anonymity is not enough. Censors can't block access to certain websites browsed over Tor, but it was easy for censors to block access to the whole of the Tor network in the original design. This is because there were a handful of directory authorities which users needed to connect to before they could discover the addresses of Tor nodes, and indeed some censors blocked the directory authorities. Even if users could discover the current list of Tor nodes, censors also blocked the IP addresses of all Tor nodes too.

Therefore, the Tor censorship resistance design introduced bridges – special Tor nodes which were not published in the directory, and could be used as entry points to the network (both for downloading the directory and also for building circuits). Users need to find out about these somehow, so the bridge authority collects the IP addresses and gives them out via email, on the web, and via personal contacts, so as to make it difficult for the censor to enumerate them all.

That's not enough to provide censorship resistance though. Preventing the censor from knowing all the IP addresses they need to block to block access to the Tor network will be enough to defeat some censors. But others have the capability to block not only by IP address but also by content (deep packet inspection). Some censors have tried to do this already and Tor has, in response, gradually changed its TLS handshake to better imitate web browsers.

Impersonating web browsers is difficult, and even if Tor perfectly impersonated one, some censors could just block encrypted web browsing (like Iran did, for some time). So it would be better if Tor could impersonate multiple protocols. Even better would be if other people could contribute to this goal, rather than the Tor developers being a bottleneck. This is the motivation of the pluggable transports design which allows Tor to manage an external program which transforms Tor traffic into some hard-to-fingerprint obfuscation.

7. Changes and complexities in our path selection algorithms

The original Tor paper never specified how clients should pick which nodes to use when constructing a circuit through the network. This question has proven unexpectedly complex.

Weighting node selection by bandwidth

The simplest possible approach to path construction, which we used in the earliest versions of Tor, is simply to pick uniformly at random from all advertised nodes that could be used for a given position in the path. But this approach creates terrible bandwidth bottlenecks: a server that would allow 10x as many bytes per second as another would still get the same number of circuits constructed through it.

Therefore, Tor 0.0.8rc1 started to have clients weight their choice of nodes by servers' advertised bandwidths, so that a server with 10x as much bandwidth would get 10x as many circuits, and therefore (probabilistically) 10x as much of the traffic.

(In the original paper, we imagined that we might take Morphmix's approach, and divide nodes into "bandwidth classes", such that clients would choose only from among nodes having at least the same approximate bandwidth as the clients. This may be a good design for peer-to-peer anonymity networks, but it doesn't seem to work for the Tor network: the most useful high-capacity nodes have more capacity than nearly any typical client.)

Later, it proved that weighting by bandwidth was also suboptimal, because of nonuniformity in path selection rules. Consider that if node A is suitable for use at any point in a circuit, but node B is suitable only as the middle node, then node A will be considered for use three times as often as B. If the two nodes have equal bandwidth, node A will be chosen three times as often, leading to it being overloaded in comparison with B. So eventually, in Tor 0.2.2.10-alpha, we moved to a more sophisticated approach, where nodes are chosen proportionally to their bandwidth, as weighted by an algorithm to optimize load-balancing between nodes of different capabilities.

Bandwidth authorities

Of course, once you choose nodes with unequal probability, you open the possibility of an attacker trying to see a disproportionate number of circuits -- not by running an extra-high number of nodes -- but by claiming to have a very large bandwidth.

For a while, we tried to limit the impact of this attack by limiting the maximum bandwidth that a client would believe, so that a single rogue node couldn't just claim to have infinite bandwidth.

In 0.2.1.17-rc, clients switched from using bandwidth values advertised by nodes themselves to using values published in the network status consensus document. A subset of the authorities measure and vote on nodes' observed bandwidth, to prevent misbehaving nodes from claiming (intentionally or accidentally) to have too much capacity.

Avoiding duplicate families in a single circuit

As mentioned above, if the first and last node in a circuit are controlled by an adversary, they can use traffic correlation attacks to notice that the traffic entering the network at the first hop matches traffic leaving the circuit at the last hop, and thereby trace a client's activity with high probability. Research on preventing this attack has not yet come up with any affordable, effective defense suitable for use in a low-latency anonymity network. Therefore, the most promising mitigation strategies seem to involve lowering the attacker's chances of controlling both ends of a circuit.

To this end, clients do not use any two nodes in a circuit whose IP addresses are in the same /16 – when we designed the network, it was marginally more difficult to acquire a large number of disparate addresses than it was to get a large number of concentrated addresses. (Roger and Nick may have been influenced by their undergraduacy at MIT, where their dormitory occupied the entirety of 18.244.0.0/16.) This approach is imperfect, but possibly better than nothing.

To allow honest node operators to run more than one server without inadvertently giving themselves the chance to see more traffic than they should, we also allow nodes to declare themselves to be members of the same "family", such that a client won't use two nodes in the same family in the same circuit. (Clients only believe mutual family declarations, so that an adversary can't capture routes by having his nodes claim unilaterally to be in a family with every node the adversary doesn't control.)

8. Stream isolation

Building a circuit is fairly expensive (in terms of computation and bandwidth) for the network, and the setup takes time, so the Tor client tries to re-use existing circuits if possible, by sending multiple TCP streams down them. Streams which share a circuit are linkable, because the exit node can tell that they have the same circuit ID. If the user sends some information on one stream which gives their identity away, the other streams on the same circuit will be de-anonymized.

To reduce the risk of this occurring, Tor will not re-use a circuit which the client first used more than 10 minutes ago. Users can also use their Tor controller to send the "NEWNYM" signal, preventing any old circuits being used for new streams. As long as users don't mix anonymous and non-anoymous tasks at the same time, this form of circuit re-use is probably a good tradeoff.

However, Manils et al. discovered that some Tor users simultaneously ran BitTorrent over the same Tor client as they did web browsing. Running BitTorrent over Tor is a bad idea because the network can't handle the load, and because BitTorrent packets include the user's real IP address in the payload, so it isn't anonymous. But running BitTorrent while doing anonymous web browsing is an especially bad idea. An exit node can find the user's IP address in the BitTorrent payload then trivially de-anonymize all streams sharing the circuit.

Running BitTorrent over Tor is still strongly discouraged, but this paper did illustrate some potential problems with circuit reuse so proposal 171 was written, and implemented in Tor 0.2.3.3-alpha, to help isolate streams which shouldn't share the same circuit. By default streams which were initiated by different clients, which came from SOCKS connections with different authentication credentials, or which came to a different SOCKS port on the Tor client, are separated. In this way, a user can isolate applications by either setting up multiple SOCKS ports on Tor and using one per application, or by setting up a single SOCKS port but using different username/passwords for each application. Tor can also be configured to isolate streams based on destination address and/or port.

Proposal 171 sure is an interesting read, and implements very nice stream separation measures. Looking forward to Tor 0.2.3.3-alpha releasing so I can try them out!
.
Specifically, I want to separate my RSS news reader streams from my TorBrowser streams. Obviously, IsolateDestPort won't work for this.
.
So I'm looking forward to trying out IsolateSOCKSUser option. Then launching the RSS reader with Torsocks and assigning it a randomly generated (UUID), to use as the username for SOCKSPort. I plan to use the "torlaunch" bash script, listed in proposal 171, to accomplish this. Brilliant!
.
I can't thank everyone enough at Tor Project, for all the hard work they've put into this project. Thank you for sharing your hard work with the rest of the world!

"If the location of the computer cannot be determined, for example in the case of Tor-hidden services, the police are not required to submit a request for legal assistance to another country before breaking in."

"If it became law, this proposal would allow Dutch police to launch direct attacks against international cloud computing services. It would allow Dutch police to use exploits and malware against privacy systems like the Tor network, endangering the hundreds of thousand of people who use Tor clients every day, and those who publish hidden services in the network. It would, in short, allow Dutch police to use the methods of cyberwar to enforce Dutch law on people living anywhere in the world."

If this were enacted, initial targets would no doubt include Tor hidden services hosting content objectionable to someone in the Dutch government. Next, content offensive to someone in the Syrian government. Then, content offensive to anyone who hires a "malware as a service" company founded in Holland.

And what about peer to peer services such as i2p which is bundled with Tails? Isn't it likely that in the near future, clog plods who are fully aware that most users of p2p software are consistently law-abiding citizens might nonetheless break into all their computers for the purpose of searching out and destroying the computers of the persons they suspect might actually have done something wrong?

Civil libertarians are very familiar with the kind of mission creep which has seen laws enacted "to combat terrorism and cyberporn" are quickly exploited to target political bloggers, persons accused of downloading pirated media, persons accused of inadequate effort to recycling, persons who simply happen to live in the "wrong" neighborhoods, and other far less offensive figures than the late Mr. Bin Laden.

Some of us have long urged Tor to modify that part of its mission statement which rules out attempting to defend against "a global adversary". Because in the modern world, everyone in the world is potentially a target of the new breed of cyberwarriors and paramiitary e-SWAT cybercops.

Hi! There's a new alpha release available for download. If you build Tor from source, you can download the source code for 0.3.3.2-alpha from the usual place on the website. Packages should be available over the coming weeks, with a new alpha Tor Browser release some time in February.

Remember, this is an alpha release: you should only run this if you'd like to find and report more bugs than usual.

Tor 0.3.3.2-alpha is the second alpha in the 0.3.3.x series. It introduces a mechanism to handle the high loads that many relay operators have been reporting recently. It also fixes several bugs in older releases. If this new code proves reliable, we plan to backport it to older supported release series.

Changes in version 0.3.3.2-alpha - 2018-02-10

Major features (denial-of-service mitigation):

Give relays some defenses against the recent network overload. We start with three defenses (default parameters in parentheses). First: if a single client address makes too many concurrent connections (>100), hang up on further connections. Second: if a single client address makes circuits too quickly (more than 3 per second, with an allowed burst of 90) while also having too many connections open (3), refuse new create cells for the next while (1-2 hours). Third: if a client asks to establish a rendezvous point to you directly, ignore the request. These defenses can be manually controlled by new torrc options, but relays will also take guidance from consensus parameters, so there's no need to configure anything manually. Implements ticket 24902.

Major bugfixes (netflow padding):

Stop adding unneeded channel padding right after we finish flushing to a connection that has been trying to flush for many seconds. Instead, treat all partial or complete flushes as activity on the channel, which will defer the time until we need to add padding. This fix should resolve confusing and scary log messages like "Channel padding timeout scheduled 221453ms in the past." Fixes bug 22212; bugfix on 0.3.1.1-alpha.