Posted by timothy on Friday November 14, 2014 @02:51PM from the keep-him-on-the-line dept.

An anonymous reader writes: A former researcher at Columbia University's Network Security Lab has conducted research since 2008 indicating that traffic flow software included in network routers, notably Cisco's 'Netflow' package, can be exploited to deanonymize 81.4% of Tor clients. Professor Sambuddho Chakravarty, currently researching Network Anonymity and Privacy at the Indraprastha Institute of Information Technology, uses a technique which injects a repeating traffic pattern into the TCP connection associated with an exit node, and then compares subsequent aberrations in network timing with the traffic flow records generated by Netflow (or equivalent packages from other router manufacturers) to individuate the 'victim' client. In laboratory conditions the success rate of this traffic analysis attack is 100%, with network noise and variations reducing efficiency to 81% in a live Tor environment. Chakravarty says: 'it is not even essential to be a global adversary to launch such traffic analysis attacks. A powerful, yet non-global adversary could use traffic analysis methods [] to determine the various relays participating in a Tor circuit and directly monitor the traffic entering the entry node of the victim connection.'

There's just one problem: responses. If I send data to B and B never sends data back, then that's clearly junk data. If I send data to B and B immediately sends data back then that's clearly junk data unless B is a hidden service. Apply this to every node B talks to (and the nodes they talk to) and it's readily apparent which ones are actually having a conversation.

Yep, you can't beat simple traffic analysis. How come we aren't doing more of that on government/corporate communications? I mean, turnabout is fair play, no? We might not know the content of the secret deals they make with the terrorists behind our backs, but we will know when they are talking to each other. Take away their privacy and maybe they'll respect ours.

People with the resources (if you get my drift) can conduct passive traffic analysis, and anonymously post the results, and even post them right here.

Oh I got your drift. You're the same as the people who proclaim from their basements how everyone else should uprise against the government. They, on the other hand, will do nothing but play armchair general.

Apparently I'm only pissing into the wind with the suggestion that we defend ourselves against domestic threats...

Yes, you are when you expect everyone else to do the work for you. Get off your fat ass and stop expecting others to do all the work. Then you might see some real change happen.

Wouldn't adding random timing jitter to the packets deal with the problem without using up more network resources with junk data? As long as the timing noise distribution between routers is not grossly dissimilar, that should work.

Not really. Random jitter can be dealt with statistically: collect more data, compute the mean, and use the mean where you would have used the exact timing.

In order to defeat timing analysis through noise injection, you need to introduce a large amount of variation compared to the number of packets being sent; for any realistically-sized data transfer, this requires jitter on the order of minutes to hours.
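
To put rough numbers on the averaging argument, here is a toy simulation. It is a sketch, not a model of Tor: the 40 ms gap, the +/-30 ms uniform jitter, and the function names are all made-up assumptions. The only point is that the error in the averaged estimate shrinks as more packets are observed.

```python
import random
import statistics

def observed_gaps(true_gap_ms, jitter_ms, n, rng):
    """Inter-packet gaps as an observer sees them: true gap plus zero-mean uniform jitter."""
    return [true_gap_ms + rng.uniform(-jitter_ms, jitter_ms) for _ in range(n)]

def estimate_gap(samples):
    """Averaging cancels zero-mean jitter; the error shrinks roughly as 1/sqrt(n)."""
    return statistics.fmean(samples)

rng = random.Random(2014)   # fixed seed so the run is repeatable
TRUE_GAP_MS = 40.0          # hypothetical spacing the observer wants to recover
JITTER_MS = 30.0            # hypothetical jitter added along the path

errors = {}
for n in (10, 1_000, 100_000):
    est = estimate_gap(observed_gaps(TRUE_GAP_MS, JITTER_MS, n, rng))
    errors[n] = abs(est - TRUE_GAP_MS)
    print(f"{n:>7} packets: estimate {est:6.2f} ms, error {errors[n]:.3f} ms")
```

With jitter this small relative to the packet count, the large-sample estimate lands very close to the true gap, which is why the jitter would have to be huge compared to the transfer length to hide anything.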

Imagine you are a spook who has compromised a 'secure' means of communication.

Can you think of anything better to do with this than shut it down immediately? Should Bletchley Park have gotten on the radio and told the Germans 'neener neener, we broke your codes you jerry morons.'?

I've been posting it ever since the tormail take down and I posted it since the silk road takedown, and so on. PRISM's metadata collection is precisely what this article is talking about: timestamped lists of what computer talked to what computer for how long with how much data.

Just because something is theoretically possible in lab conditions does not mean that anyone in the real world is actually doing it. The FBI doesn't even have the resources to do something as trivial as brute-forcing an iPhone 4-digit passcode; you think they or the NSA have the resources to do this on any kind of real scale?

Despite what urban myths are out there, the NSA uses relatively simple means to do 99% of their spying and traffic interception.

Are you kidding me? Name one 'service' on TOR that has been up for long enough to get attention and not been busted?

What came out about both SR takedowns indicates that those were not taken down by sophisticated cyberattacks using high-grade NSA traffic analysis techniques. They were taken down because the people behind those sites were bad at being criminals and operating out of the US. I'm almost sure there are several alternatives to SR that are being run out of SE Asia or the former USSR that are not being taken down because the people running them are either good at being criminals or otherwise out of the r

"So, yes, some of us are still being paranoid. But that doesn't mean that we're not right."

Spoken like a true paranoid.

Why, thank you. That's the nicest thing anybody has said to me all week.

Look, if the reality wasn't that the surveillance programs in place are far more invasive, sophisticated, and all encompassing than we've ever thought possible, I would happily be a slightly paranoid guy in the corner tilting at windmills. I'm OK with that. Everybody needs a hobby, and it's fun at parties.

The reality is, stuff which we know to be happening is far more widespread than anybody would have believed. They've demonstrated themselves willing to lie to Congress. They get funding from alternate sources which they don't always tell us about. They don't always care about the niceties of the law.

They've colluded with law enforcement to conceal their ways and means, and come up with ways to charge you and hide how they got there by writing a handbook of perjury and lying.

They can use secret laws to make it illegal to tell anybody the scope of what they're actually doing.

So, the problem becomes... when a high degree of paranoia has been demonstrated to be not nearly paranoid enough... being somewhat paranoid becomes pretty much mandatory.

And these guys have made what would have been dismissed as merely paranoid ravings only a few years ago into something which is documented and commonplace.

So, yeah, I sound paranoid. Because the people who make me paranoid have upped their game to the level where it's hard to imagine I'm being paranoid enough.

No one with a clue in their head thought this stuff was impossible 5-10 years ago. Everyone who had the slightest background knowledge in how things operate already knew and assumed it was happening. The movie Enemy of the State came out in 1998 for god's sake, and people still did not wake up as to what was possible. This stuff wasn't fiction then, and it isn't fiction now.

That doesn't change the fact that 99% of the interception the NSA does is trivial. Using Tor is still a very good idea and can save yo

While I haven't read the paper, the article seems to have a reasonably big "correlation for non-victim" bar. If this means false positives, it makes this technique at least a lot less useful than the "81%" deanonymization rate that they claim. It might make it useless for anything really.

Honestly, this all seems like more headline and less news. But I do still have to read the paper.
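
For what it's worth, the false-positive worry can be made concrete with a base-rate calculation. This is only a sketch: the 81% figure is the one from the summary, but the 5% false-positive rate and the candidate counts below are hypothetical numbers picked for illustration, not taken from the paper, and the function name is made up.

```python
def posterior_true_match(tpr, fpr, candidates):
    """P(a flagged client is the real one), assuming exactly one real client
    among `candidates` and a uniform prior over them (Bayes' rule)."""
    prior = 1.0 / candidates
    flagged_true = tpr * prior            # real client, correctly flagged
    flagged_false = fpr * (1.0 - prior)   # some other client, wrongly flagged
    return flagged_true / (flagged_true + flagged_false)

TPR = 0.81   # detection rate quoted in the summary
FPR = 0.05   # hypothetical false-positive rate, NOT taken from the paper

for candidates in (10, 1_000, 100_000):
    p = posterior_true_match(TPR, FPR, candidates)
    print(f"{candidates:>7} candidate clients -> P(real match | flagged) = {p:.4f}")
```

Under these assumptions a flag is fairly convincing when there are only a handful of candidates and close to worthless when the candidate pool is large, so how much the false positives matter depends on how far the attacker can narrow the pool first.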

I read it as meaning "This type of attack can deanonymize a single TOR user 81% of the time" and not "This type of attack can deanonymize 81% of ALL TOR users at the same time"

You can add a fingerprint without changing the data. One way is by timing. A 10 Mbps cable modem, for example, can send at maybe 50 Mbps for 100 milliseconds, then it stops for 400 ms to average 10 Mbps, the speed you paid for. If I want to mark a traffic flow I'm relaying, I can send the packets out in bursts of 120KB, 60KB, 120KB, 60KB. Assuming a sufficiently uncongested network, that pattern will be visible several routers further down the line.

I've relayed precisely the data I was sent, I just modulated the rate at which I sent it.
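
Roughly what that looks like, as a sketch only: the 120KB/60KB burst sizes come from the comment above, while the 500 ms period, the stand-in payload, and the `burst_schedule` helper are assumptions for illustration. It only plans the bursts; it does not touch a real socket.

```python
from itertools import cycle

def burst_schedule(payload_len, pattern, period_s):
    """Split `payload_len` bytes into bursts whose sizes follow `pattern`,
    one burst per `period_s`. Returns (send_time_s, offset, size) tuples."""
    schedule, offset, t = [], 0, 0.0
    for size in cycle(pattern):
        if offset >= payload_len:
            break
        size = min(size, payload_len - offset)   # last burst may be short
        schedule.append((t, offset, size))
        offset += size
        t += period_s
    return schedule

KB = 1000
PATTERN = (120 * KB, 60 * KB)   # burst sizes from the comment above
PERIOD_S = 0.5                  # assumed: one burst per 500 ms window

payload = bytes(range(256)) * 2000   # 512,000 bytes of stand-in data
plan = burst_schedule(len(payload), PATTERN, PERIOD_S)

# Reassembling the bursts gives back exactly the bytes that came in.
relayed = b"".join(payload[off:off + size] for _, off, size in plan)

for t, _, size in plan:
    print(f"t={t:4.1f}s  send {size // KB:3d} KB")
```

The payload is byte-for-byte unchanged; the only thing an observer downstream can pick up is the alternating byte count per time window.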

This is what I tell people about using Tor. It's not ironclad but it adds a lot of difficulty for people who want to collect everyone's data. And even if the NSA can break it, the coffee shop can't, your ISP can't, and the websites that track your every move across the web can't, at least not all of the time. And currently Tor is the best way for people to voice their discontent with the surveillance state that's been forced on us in recent years. So that's better than doing nothing at all.

The whole point of Tor, for those who are morally and ethically sane, is that it makes monitoring the populace orders of magnitude more expensive!

Forcing the NSA and their ilk to actually target people individually, instead of just passively collecting plain text data on everyone, is exactly what needs to happen!

Use Tor as much as possible, it is the only thing stopping complete internet surveillance.

What can make things even more expensive is using strong end-to-end encryption for all network connections and strong encryption for everything stored on someone else's servers. This is *mostly* feasible if you have some technical knowledge, much less so for those that don't.

Things that aren't really there yet but could really help the non-technical are:
1. Easy to use, verifiable but decentralized email encryption/non-repudiation
2. Ubiquitous network connection encryption with decentralized/anonymized

Everything can be hacked and/or de-anonymized sooner or later. What is the point in using anti-virus and firewalls, Tor and the like? Seems everything is flawed by design.

Exactly. Those windows on your house are vulnerable to rocks, so there's no point in locking your door. Safes can be cracked or blown open, so why keep your valuables in one? It's just going to get broken into, so why bother?

A peeping Tom can look in your window or drill a hole in your wall to watch you, so put up cameras everywhere in your house and broadcast the output on the Internet and a large screen TV outside your house. They're going to see it anyway, so why risk damage to the house?

Basically what they are saying is that you should not use Tor at home or at work, but in other places, where you don't do your normal browsing.

Close, but not quite ideal. You should use TOR at home to do strictly legitimate things, to create the haystack in which the needles can be hidden. Then, when you want to do something without being watched, you use TOR with clean hardware and connectivity. Also, when travelling to your clean connectivity, leave your cell phone and other tracking devices at home, and do it somewhere with lots of other people.

> Then, when you want to do something without being watched, you use TOR with clean hardware and connectivity.

So what is clean? I can only think of an Ubuntu VM, default install with maybe one or two addons in Firefox to delete cookies. Nothing that changes or adds fonts. Make snapshots and always revert to that. Create new snapshots after updates. Don't update when using public wifi, but update at home while not doing anything else - no browsing!

>> when you want to do something without being watched, you use TOR with clean hardware and connectivity.

> So what is clean? I can only think of an Ubuntu VM, default install with maybe one or two addons in Firefox to delete cookies. Nothing that changes or adds fonts...

That's a fairly good version. I think it's about how extreme you want to go and how secure you feel you need to be. You could grab a fresh laptop off Craig's List and only use it for a few days. You could get a Raspberry Pi with no

It's clear that there are significant limitations to the tested identification method. Firstly, it requires that the server endpoint be under the control of the entity attempting identification. Secondly, the TOR *entry* node being used must be identified (if you have the resources, I guess you could monitor traffic flows from *all* entry nodes) in order for the Netflow data to be compared between the Server-->Exit Node and the Entry Node-->potential target client. Thirdly, in order to generate enough traffic for correlation, a "large" amount of data must be downloaded from the server (the authors' term; they do not identify the size of the file/data required, only that downloads must last ~seven minutes to collect enough data).

It's an interesting piece of work, but pulling off an identification like this requires the anonymized client to both connect to a server specifically configured to generate traffic flows that can be identified, and once connected, the client must be induced to download a "large" file/dataset. What is more, those attempting the identification must also be able to gather Netflow records from the interface(s) associated with the specific (and likely unknown) TOR entry node as well, or monitor flows from *all* TOR entry nodes.

It seems to me, that while the above scenario is certainly feasible, if you can get a potential target to visit a server that's under your control and download a large file, you can probably infect the client with malware from that server, and have said malware phone home without TOR, producing a specific identification without false positives or negatives. Which would be much less resource intensive and more useful, IMHO.

I read the paper, too. While the researchers used a server, the server was not part of the TOR network. It communicated with the TOR exit node. Further, the server only "injected" timing patterns. So, it would be possible for a router, located between the server and the exit node, to inject the timing patterns. While not as clean as having the content server impose the timing patterns, it would still work.

An interesting point. Unfortunately, there's a problem with that: at the hypothetical intermediate router, how do you determine which data flow(s) should have their timing modified? If you do it to *all* the flows, that destroys the uniqueness of the pattern and hence makes identification orders of magnitude more difficult, if not impossible. The whole point of this is to create identifiable patterns that can be correlated with data flow patterns external to Tor on the *client* side.
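
The correlation step being argued about can be sketched in a few lines. Everything here is a toy under stated assumptions: the per-interval byte levels, the noise level, the client names, and the use of a plain Pearson coefficient are illustrative choices, not the procedure or the data from the paper.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(81)   # fixed seed so the run is repeatable

# Hypothetical injected pattern: bytes sent per reporting interval on the server side.
injected = [rng.choice((750_000, 1_000_000, 3_000_000)) for _ in range(60)]

def noisy_copy(pattern, noise):
    """What a flow record near the victim might show: same shape plus noise."""
    return [b * (1 + rng.uniform(-noise, noise)) for b in pattern]

def unrelated(n):
    """A flow from some other client, unrelated to the injected pattern."""
    return [rng.uniform(500_000, 3_500_000) for _ in range(n)]

flows = {
    "client-A": unrelated(60),
    "client-B": noisy_copy(injected, 0.3),   # the 'victim' in this toy setup
    "client-C": unrelated(60),
}

scores = {name: pearson(injected, series) for name, series in flows.items()}
best = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: r = {score:+.2f}")
print("best match:", best)
```

This also shows why modulating *all* flows at an intermediate router would hurt: if every client-side series carried the same shape, every score would be high and the maximum would no longer single anyone out.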

As for length of time, this attack could be useful for tracking movie downloads - especially if the download speed was limited,

But it probably is a problem if your opponent is a state-level actor. For example, China (and probably the US too) probably monitors connections to known Tor entry/exit nodes. Given the attack mentioned, someone using Tor in China is safe as long as the server being contacted is known to not be acting in concert with the adversary. However, if the server (or its connection to the Tor entry/exit nodes) is also under control of the same adversary, then the connection can be de-anonymized. So this is a problem for Chinese bloggers blogging on Chinese blogs, but not so much on foreign blogs hosted outside China. Though it appears blog traffic would probably be too small to facilitate a successful attack.

As Tor nodes are scattered around the globe, and the nodes of circuits are selected at random, mounting a traffic analysis attack in practice would require a powerful adversary with the ability to monitor traffic at a multitude of autonomous systems (AS). Murdoch and Zielinski, however, showed that monitoring traffic at a few major Internet exchange (IX) points could enable traffic analysis attacks to a significant part of the Tor network [13]. Furthermo

So if you can spy on the traffic from the user to the tor entry node, and can spy on the traffic leaving the tor exit node at the same time... then you can tell that the traffic you saw going to the entry node is linked to the traffic leaving the exit node?

NO FREAKING DUH!?

Good luck being able to sniff traffic on *both* ends.

You're misunderstanding the methodology. The trick isn't to sniff the actual data being transferred, which is why it can be used even with encrypted traffic.

The way it works is that you get the target client to initiate a file transfer from a server specifically set up for this, then you modulate the data rate (2 seconds at 1Mb/sec, 5 seconds at 3Mb/sec, 5 seconds at 750kb/sec, etc., etc. in a

There is no need to be rude or presumptive about my level of education. I shall explain what I meant in more depth to clear up any misunderstandings.

OP said: "So if you can spy on the traffic from the user to the tor entry node, and can spy on the traffic leaving the tor exit node at the same time... then you can tell that the traffic you saw going to the entry node is linked to the traffic leaving the exit node"

You said: "If you can correlate the server-->exit node flow to a specific entry node-->client flow, you've just identified the client outside of Tor."

Distinction Without a Difference [logicallyfallacious.com] - The assertion that a position is different from another position based on the language when, in fact, both positions are exactly the same -- at least in practice or practical terms.

Your provided links show that "packet sniffing" and "traffic flow analysis" are not different concepts in practice. The difference is in how the collected data is analyzed or for what purpose. For the purposes of this discussion where analysis of collected packets is for identical purposes, this is also a distinction without a difference. "A packet analyzer...is a computer program or a piece of computer hardware that can intercept and log traffic passing over a digital network or part of a network." "NetFlow is a feature that was introduced on Cisco routers that provides the ability to collect IP network traffic as it enters or exits an interface."

If you feel I have misinterpreted your statements, I would appreciate additional feedback.

My points were literal, rather than pejorative. Sniffing packets is gathering the *actual* packets. Netflow collects statistics about packets being transmitted/received. Do you see the difference?

GP stated "Good luck being able to sniff traffic on *both* ends." Firstly, traffic isn't being "sniffed." Secondly, with Netflow, it's not necessary to have packet sniffers on the specific links used in order to gather packet statistics.
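
To illustrate the difference in what is collected, here is a sketch that collapses a few captured packets into per-flow counters. The `Packet` type, the addresses, and the record fields are made up for the example, and this is a simplified stand-in for what a flow exporter reports, not the NetFlow record format (real exporters also sample and expire flows on timers, and count bytes differently).

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Packet:
    ts: float        # capture time in seconds
    src: str
    dst: str
    sport: int
    dport: int
    proto: str
    payload: bytes   # a sniffer keeps this; a flow record never contains it

def to_flow_records(packets):
    """Collapse packets into per-flow counters keyed by the usual 5-tuple."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0, "first": None, "last": None})
    for p in packets:
        rec = flows[(p.src, p.dst, p.sport, p.dport, p.proto)]
        rec["packets"] += 1
        rec["bytes"] += len(p.payload)
        rec["first"] = p.ts if rec["first"] is None else min(rec["first"], p.ts)
        rec["last"] = p.ts if rec["last"] is None else max(rec["last"], p.ts)
    return dict(flows)

# Hypothetical capture: two packets on one connection, one packet on another.
captured = [
    Packet(0.00, "10.0.0.5", "192.0.2.9", 40000, 443, "tcp", b"x" * 512),
    Packet(0.12, "10.0.0.5", "192.0.2.9", 40000, 443, "tcp", b"y" * 1024),
    Packet(0.30, "10.0.0.7", "192.0.2.9", 40001, 443, "tcp", b"z" * 256),
]
records = to_flow_records(captured)
for key, rec in records.items():
    print(key, rec)
```

The flow records keep who talked to whom, when, and how much, and nothing of the payload, which is the kind of data the timing correlation works from.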

What is more, since context is everything, GP was responding to my assessm [slashdot.org]

Your provided links show that "packet sniffing" and "traffic flow analysis" are not different concepts in practice. The difference is in how the collected data is analyzed or for what purpose.

This is an incorrect conclusion. Packet sniffing and Netflow analysis are significantly different in both theory and practice, both from the standpoint of data collected, as well as the method(s) of collection. Granted, if you are sniffing packets, you can perform a similar analysis, but that's both completely impractical (and in the context of the research) self-defeating. Attempting to sniff all packets off an IX Node [wikipedia.org] requires mirroring all packets. Which would almost certainly cause serious congesti

Distinction Without a Difference [logicallyfallacious.com] - The assertion that a position is different from another position based on the language when, in fact, both positions are exactly the same -- at least in practice or practical terms.

To clarify once again. The distinctions drawn are not based on nomenclature. There are specific and important technical differences which have real impact on the discussion.

As I read your post again, I'm sorely tempted to respond in kind. However, I understand that you thought I was assigning ignorance of this particular area of knowledge to you as an insult (although you did do so in your original reply -- note that I simply repeated what you said first), rather than as a simple statement of fact. In your position, I would likely have responded similarly.

My apologies. I mis-stated both what you and I posted. The above paragraph should read:

As I read your post again, I'm sorely tempted to respond in kind. However, I understand that you thought I was assigning ignorance of this particular area of knowledge to you as an insult, rather than as a simple statement of fact. In your position, I would likely have responded similarly.

Where do people get the idea that privacy is some sort of inalienable right? I'll agree that it's a civic courtesy, and certainly it's impolite to disregard another person's privacy, but to that end, I see it as more of a social contract than any sort of actual right. I would suggest that any appearance of privacy we might seem to have is actually just an illusion offered by the fact that other people are either making a deliberate choice to be polite in that regard, or else they are simply not interested enough in what we think is private for others to be bothered with it. Either way, it's not something that you can actually control... it's largely determined by what other people do or want.

The right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures, shall not be violated, and no Warrants shall issue, but upon probable cause, supported by Oath or affirmation, and particularly describing the place to be searched, and the persons or things to be seized.

I don't know. I'm a private person, but not a secretive one. I don't mind sharing personal information with the folks I want to share with. I feel it's incumbent on me to keep things to myself. That may include encryption or access controls or just keeping my mouth shut.

Yes, there are those out there who want to know all about everyone, for their purposes. That doesn't mean I have to roll over and give it all up to anyone who wants it. If I take steps to protect data, ideas, information or anything el

Just because I don't think people should really have any expectation of privacy at any time doesn't mean I think people should not have any right to do whatever is within their own personal power and ability to directly control to preserve whatever privacy they feel they might be able to secure for themselves, to the extent that such efforts do not infringe on anyone else's freedoms or rights.

I'm not suggesting if you haven't done anything wrong you have nothing to hide, because that's actually a completely misleading argument that can be easily shown to be a false notion anyways.

Privacy, as I said, is created by two things, neither of which one is really in direct control of. The first thing is how polite other people are making a deliberate choice to be... invading someone else's privacy, for any reason, almost invariably amounts to

From the Universal Declaration of Human Rights, as adopted by the General Assembly of the United Nations on December 10, 1948. Article 12: Right to Privacy...

No one shall be subjected to arbitrary interference with his privacy, family, home or correspondence, nor to attacks upon his honour and reputation. Everyone has the right to the protection of the law against such interference or attacks.

It's one of our fundamental human rights, right up there with other inconvenient courtesies such as right to life, freedom from slavery, freedom from arbitrary detention, freedom from torture, right to asylum, and freedom of thought and religion. Everyone should know their rights. If you don't know your rights, you won't know what you risk losing.

Except that society seems to function perfectly fine, even if not necessarily ideally, without everyone following the golden rule everywhere... which is what any kind of ubiquitous expectation of privacy actually generalizes to.

I'm pretty sure reddit, probably through Google Analytics, may have started doing this around eighteen months ago. I tested trolling them with sock puppets and they could identify my house through Tor but could not differentiate between individual computers in the house. So pretty much anybody that uses Google Analytics probably has this capability.

I'm not surprised. I wrote a paper back in 2003, Techniques for Cyber Attack Attribution [dtic.mil], that listed a LONG list of ways to do attribution. This sounds like a variant combining "modify transmitted messages" and "matching streams" via timing (see the paper).