The answer, perhaps unsurprisingly, is: yes. It’s easy to do, and it’s revealing about what I do, when I do it, and where I go.

Like many other websites, Ars Technica employs a system of voluntary user logins. These logins allow you to do things like leave comments at the bottom of every story and engage in our user forums. Each time you log in to Ars, we record the date, time, and IP address that you logged in from. This is a common practice: nearly every website maintains similar records. Typically though, Ars only keeps one record per user of the last date, time, and IP address used. We do not keep any historical records of login data.

However, Ars lead developer Lee Aylward was kind enough to make an exception for me. For 11 days in February 2014, Ars tracked all of my logins. The working theory was that since I’m telling Ars who I am (my login name is the frequently used and obvious “cfarivar”) and loading the site multiple times per day, my logins would give Ars a clear idea of my actions and movements.

In turn, I sent this 11-day log along to Nicholas Weaver, a computer security researcher at the International Computer Science Institute in Berkeley, California. It took Weaver only a short time to write a Python script that converted the raw CSV data file, including its Unix timestamps, into a more readable form. The output would start with a line like this:

That means Ars showed I was editing a particular story for about three hours on the morning of February 14, and that I was likely connected through Private Internet Access (PIA), the commercial VPN that I frequently use. Normally, for privacy reasons, I use PIA to obscure my tracks online. While I tried leaving it off for the purposes of this experiment, sometimes I left it on by accident. That turned out to be useful, allowing us to see what it looks like when online origins are obscured.
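Weaver's script isn't reproduced here. As a rough idea of what the conversion step involves, here is a minimal Python sketch. It assumes a hypothetical CSV layout with a header row of timestamp,ip,url and Unix timestamps in whole seconds; the function name humanize_log, the IP address, and the sample row are made up for illustration.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

# February 2014 falls outside daylight saving time, so Pacific time is UTC-8.
PACIFIC_STANDARD = timezone(timedelta(hours=-8), "PST")


def humanize_log(csv_text):
    """Yield (local time, IP, URL) for each row of a login log.

    Assumes a hypothetical header row "timestamp,ip,url", where timestamp
    is a Unix epoch time in whole seconds.
    """
    for row in csv.DictReader(io.StringIO(csv_text)):
        when = datetime.fromtimestamp(int(row["timestamp"]), tz=PACIFIC_STANDARD)
        yield when.strftime("%a %b %d %H:%M"), row["ip"], row["url"]


sample = "timestamp,ip,url\n1392397200,203.0.113.5,/example-story/\n"
for line in humanize_log(sample):
    print(*line)  # Fri Feb 14 09:00 203.0.113.5 /example-story/
```

The raw number 1392397200 means nothing to a person; rendered as "Fri Feb 14 09:00" in local time, it starts to describe someone's morning.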

Home is where the data is

Looking at the raw data and the cleaned-up output on my own, a few things seemed obvious. First, it showed when I started and ended my workday. Some days, I was logged into Ars as early as 4:14am (February 13) and was active as late as 9:30pm (February 16). But generally speaking, I was consistently online by about 7am and finished around 5pm. Then there was a gap of a few hours (I knew this was for dinner) and sometimes another check-in before calling it a night.
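Reading a work schedule out of timestamps takes very little code. A minimal sketch, assuming the login times have already been converted to local datetime objects; daily_spans, longest_gap, and the sample times are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime


def daily_spans(times):
    """Map each calendar date to its (first, last) login time."""
    by_day = defaultdict(list)
    for t in times:
        by_day[t.date()].append(t)
    return {day: (min(ts), max(ts)) for day, ts in sorted(by_day.items())}


def longest_gap(times):
    """Return (gap, start, end) for the longest quiet stretch between logins."""
    ordered = sorted(times)
    gaps = [(later - earlier, earlier, later)
            for earlier, later in zip(ordered, ordered[1:])]
    return max(gaps) if gaps else None


# Made-up login times for a single day, already in local time.
day = [datetime(2014, 2, 13, h, m) for h, m in
       [(7, 2), (9, 15), (11, 40), (14, 5), (16, 55), (20, 30)]]
first, last = daily_spans(day)[date(2014, 2, 13)]
print(first.time(), last.time())          # 07:02:00 20:30:00
gap, start, end = longest_gap(day)
print(gap, start.time(), end.time())      # 3:35:00 16:55:00 20:30:00
```

The code only finds the longest gap; deciding that an early-evening gap means dinner still takes context, which is exactly what a persistent observer accumulates.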

Second, the data showed physical places that I knew I visited in the Bay Area: a particular San Francisco office building, an Oakland café, and the University of California, Berkeley, campus.

But Weaver’s analysis was far cleverer than I expected.

“I assumed you worked at home, because you had a residential Comcast IP address,” Weaver said. (He’s right: like nearly all of us at Ars, I work primarily at home.)

I didn’t realize that Comcast distinguishes business from residential accounts in the hostnames attached to its IP addresses. Anything that shows up under comcast.net is likely a residence, while anything that shows up under comcastbusiness.net is likely a business. (Of course, anyone can sign up for a “Business-class” account at home, like Ars editor Lee Hutchinson, but most people don’t go that route.)
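The check Weaver describes amounts to reading the hostname that an IP address resolves back to. A minimal Python sketch, assuming the simple domain rule stated above holds (real-world naming can be messier). The functions classify_comcast_hostname and reverse_lookup and the example hostnames are hypothetical; socket.gethostbyaddr is the standard-library reverse lookup.

```python
import socket


def classify_comcast_hostname(hostname):
    """Apply the comcast.net vs. comcastbusiness.net heuristic to a hostname."""
    host = hostname.lower().rstrip(".")

    def under(domain):
        # Match the domain itself or a subdomain of it, not an arbitrary suffix.
        return host == domain or host.endswith("." + domain)

    if under("comcastbusiness.net"):
        return "likely business"
    if under("comcast.net"):
        return "likely residential"
    return "unknown"


def reverse_lookup(ip):
    """Return the reverse-DNS hostname for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


# Made-up hostnames in the two styles; no network access needed.
print(classify_comcast_hostname("c-203-0-113-5.hsd1.ca.comcast.net"))
print(classify_comcast_hostname("host-203-0-113-9.static.comcastbusiness.net"))
```

In practice one would feed each logged IP through reverse_lookup and then classify the result; a hostname can be missing or misleading, so the output is a guess, not a fact.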

Apparently, the original CSV file he used also contained URL information for which article I was viewing. “I knew what you were reading,” Weaver added. “That tells me what article you were working on, if you're reading old stuff it means you're looking for links.”

(Again he's right. If I’m pulling up the last three stories I wrote about Bitcoin, there’s a high likelihood that I was working on a new story on Bitcoin.)
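Inferring a work-in-progress topic from reading history can be as crude as counting words in URLs. A minimal sketch, assuming article URLs that end in a hyphenated headline slug; the URLs, the stopword list, and the likely_topics function are hypothetical.

```python
import re
from collections import Counter

# A tiny, illustrative stopword list; a real one would be much longer.
STOPWORDS = {"with", "from", "that", "this", "about", "after", "into"}


def likely_topics(urls, top=3):
    """Count the words in article URL slugs and return the most common ones."""
    counts = Counter()
    for url in urls:
        slug = url.rstrip("/").rsplit("/", 1)[-1]
        words = re.split(r"[-_]+", slug.lower())
        counts.update(w for w in words if len(w) > 3 and w not in STOPWORDS)
    return counts.most_common(top)


viewed = [
    "/tech-policy/2014/02/bitcoin-exchange-halts-withdrawals/",
    "/business/2014/01/bitcoin-price-falls-again/",
    "/security/2014/02/new-tor-release/",
]
print(likely_topics(viewed, top=1))  # [('bitcoin', 2)]
```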

“When VPN was active I could see that you were active, but not where,” he said.

"I am person X at this location."

The precision of the IP addresses was surprising.

In one instance, on Thursday, February 6, at 9:30am, I was logged in from a particular San Francisco IP address. Looking up that IP on myip.ms turned up not only the city but also one of two possible street addresses. The search was again correct: on that particular day at that particular hour, I was conducting an interview with Boxbee CEO Kristoph Matthews at The Hatchery, a co-working space and startup incubator at 645 Harrison Street, in San Francisco’s South of Market district.

Weaver explained that a stronger and more persistent adversary, like the NSA, would have a much longer-term and comprehensive data set. Data sets like that would include information from plenty of sites beyond Ars.

“Facebook knows if you hit any page that has a Like button on it,” he said. “Same with TweetThis, unless the site goes out of the way to mask them, then these are specifically reporting them to social networks. This is why NSA loves it, is because they can go along for the ride.

“One thing that we know that the NSA does on their non-US wiretaps is bind usernames to cookies, so if you see a request for LinkedIn or YouTube or Yahoo, these are all sites that have user ID in the clear. All you need to do is see a request, and say I don't know who this is or I know who this is, but then you look at the HTML body and look for the username. This is why the NSA went after Google ad networks; they include user identification [broadcast] in the clear: ‘I am person X at this location.’”
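Weaver's description boils down to a join: find one cleartext page where a cookie appears next to a username, then attribute every other request carrying that cookie to the same person. A toy Python sketch of that join, not any agency's actual tooling; the data-username marker, the bind_cookies function, and the sample traffic are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical marker a site might embed in a cleartext page for a logged-in user.
USERNAME_PATTERN = re.compile(r'data-username="([^"]+)"')


def bind_cookies(observations):
    """observations: (cookie_id, url, html_body) tuples seen on the wire.

    Any page that reveals a username ties that cookie to the person; every
    other request carrying the same cookie is then attributed to them too.
    """
    owner = {}
    for cookie_id, _url, body in observations:
        match = USERNAME_PATTERN.search(body)
        if match:
            owner[cookie_id] = match.group(1)

    history = defaultdict(list)
    for cookie_id, url, _body in observations:
        if cookie_id in owner:
            history[owner[cookie_id]].append(url)
    return dict(history)


seen = [
    ("c1", "http://video.example/watch?v=1", "<p>No login info here</p>"),
    ("c1", "http://video.example/home", '<span data-username="alice">'),
    ("c2", "http://video.example/watch?v=2", "<p>Anonymous</p>"),
]
print(bind_cookies(seen))
# {'alice': ['http://video.example/watch?v=1', 'http://video.example/home']}
```

Note that the first request, which never mentioned a username, still ends up attributed to "alice"; that is the point of binding identities to cookies.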

Despite the vast amount of data involved, it's easy to store. “It works out to only a few kilobytes per person for everyone on the planet,” Weaver added. In other words, if I had the access, it'd cost just a few thousand dollars to buy enough consumer-grade storage to keep data on everyone in the United States. It would comfortably fit on my desk.
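Weaver's point holds up on the back of an envelope. A quick check in Python, assuming 4 KB per person as a stand-in for "a few kilobytes" and round population figures of about 320 million for the US and 7 billion worldwide (all three are assumptions, not Weaver's numbers); storage_terabytes is a hypothetical helper.

```python
def storage_terabytes(people, bytes_per_person):
    """Total storage in decimal terabytes (10**12 bytes)."""
    return people * bytes_per_person / 10**12


# Assumed round figures: 4 KB = 4,000 bytes per person.
print(storage_terabytes(320_000_000, 4_000))    # 1.28
print(storage_terabytes(7_000_000_000, 4_000))  # 28.0
```

At that rate, the whole US fits on a single consumer hard drive, and the whole planet on a small stack of them.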

Metadata is surveillance

There was good news from this exercise. Mainly, the digital obfuscatory tools I normally run did help mask my online trail.

Generally speaking, I run all kinds of anti-tracking tools: constant private-browsing mode, Ghostery, Disconnect, and my VPN. (I also have Tor and use it occasionally.) The VPN, of course, concealed my location but not my activity: I was still clearly logged into Ars. And Weaver said that, yes, these tools do help to thwart tracking to some degree.

“The biggest reason why the NSA thinks Tor stinks is that it's actually really hard to link user activity to people,” he said. “Because the [Tor browser] bundle operates [by default] not storing cookies and [doesn’t allow Flash]. The browser bundle is allowed to not have linkages across sessions. Every time you exit the tor browser it looks like a new user. Normal browsers are not set with clear all cookies. The real fault lies in the architecture of the Web. The Web is designed [to allow the] business model of tracking. If you have your browser set to clear cookies every time you quit, it really helps. Tor is overkill; your single hop VPN is still bouncing all over the place.”
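Why clearing cookies at exit helps can be shown with a toy model in which a tracker recognizes a returning browser only by the ID cookie it set earlier. A minimal Python sketch; ThirdPartyTracker, browse, and the cookie name tid are hypothetical. It models cookies only: as this experiment showed, an IP address or a login can still link sessions.

```python
import uuid


class ThirdPartyTracker:
    """A hypothetical tracker that tags each new browser with an ID cookie."""

    def __init__(self):
        self.profiles = {}  # tracker ID -> pages seen under that ID

    def handle(self, cookie_jar, page):
        tracker_id = cookie_jar.get("tid")
        if tracker_id is None:
            # No cookie yet: this browser looks like a brand-new user.
            tracker_id = uuid.uuid4().hex
            cookie_jar["tid"] = tracker_id
        self.profiles.setdefault(tracker_id, []).append(page)


def browse(tracker, sessions, clear_cookies_on_exit):
    jar = {}  # the browser's cookie store for the tracker's domain
    for pages in sessions:
        for page in pages:
            tracker.handle(jar, page)
        if clear_cookies_on_exit:
            jar.clear()  # what "clear cookies when I quit" does


days = [["news", "bank"], ["forum"], ["shop", "news"]]
for clear in (False, True):
    tracker = ThirdPartyTracker()
    browse(tracker, days, clear)
    print(clear, len(tracker.profiles))  # False -> 1 profile, True -> 3
```

With persistent cookies, three days of browsing collapse into one profile; with cookies cleared on exit, the tracker sees three unrelated users, which is the "new user" effect Weaver attributes to the Tor Browser Bundle.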

As many privacy activists and security researchers have long noted, free products turn their users into the product. Google and Facebook are among the biggest companies making billions of dollars by tracking their users' behavior and selling ads against it. But even my work account turned out to have potential for data mining.

“[Your Ars log] didn't tell me anything new about your site, but it does tell me about your workflow. It tells me where you go and when you're active,” Weaver concluded. “This is why everybody says metadata is surveillance.”

Promoted Comments

I don't see how anyone can justify metadata as being somehow less than the data it is associated with. It's more, much more.

I can't readily use the content of your HTTP sessions to work out when you send them, from where and what site to and frankly, it's probably not interesting to me as a snoop. Likewise the content of your phone calls isn't interesting and is simply cumbersome nonsense. I want to know about them, not their content.

The important bit is indeed the metadata, the content is just noise. Where from? Who to? When? For how long? I can build up an entire picture of your life. Your search queries, another metadata term, are also very important. I can work out those little things about you that make you unique, once I have those, you're so much easier to trace.

Through metadata a full picture of your life emerges. I can infer your doctor, and probably your medical conditions. I can work out your routine, and set alerts if you deviate from it. I can tie you in with your social contacts from SMS and phone metadata, as well as rank them in order of importance. 20 minutes talking to a wedding planner service? Congratulations. 45 minutes on the phone to an employment lawyer? Guess things aren't working out too well at work.

I can run a PageRank-like algorithm over your phone records and everyone who you called, and their contacts, and get my own database of everything that makes you, you.

I may never get your name, but I will know everything about you, your job, what's happening in your life, and all your friends, acquaintances and colleagues.
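The commenter's "PageRank-like algorithm" can be sketched with standard PageRank, computed by power iteration over a who-called-whom graph. A minimal Python version, assuming unweighted calls (a real analysis would likely weight by call count or duration); the pagerank helper, names, and call data are hypothetical.

```python
def pagerank(calls, damping=0.85, iterations=50):
    """Rank people in a call graph by plain PageRank power iteration.

    calls maps each caller to the list of people they phoned.
    """
    people = set(calls)
    for callees in calls.values():
        people.update(callees)
    n = len(people)
    rank = {p: 1.0 / n for p in people}

    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in people}
        for caller in people:
            callees = calls.get(caller, [])
            if callees:
                share = damping * rank[caller] / len(callees)
                for callee in callees:
                    new_rank[callee] += share
            else:
                # Someone who never calls out spreads their rank evenly.
                for p in people:
                    new_rank[p] += damping * rank[caller] / n
        rank = new_rank
    return rank


calls = {
    "you": ["mom", "boss", "lawyer"],
    "mom": ["you"],
    "boss": ["you", "lawyer"],
    "lawyer": ["you"],
}
for person, score in sorted(pagerank(calls).items(), key=lambda kv: -kv[1]):
    print(f"{person:7s}{score:.3f}")
```

Even on this toy graph, the lawyer outranks mom and the boss, because both "you" and the boss call the lawyer.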

This article is a good first step. I'd like to see more in-depth information in the future, as well as how effective the tools you mentioned are for a spectrum of users (grandma, the tyke, the teenager, Paranoid Polly, dad, etc.)

Another day, another article confirming that my NSA paranoia is actually a perfectly sane self-preservation reflex...

Well, to be fair, pretty much anyone can track you via your metadata.

Not quite. If SSL is active, then basically only the websites you visit can track you, and also with sites like Ars, any ad network they hand all your metadata off to. The NSA can only really track IP addresses when SSL is active, and since many users can share an IP that isn't very useful (eg: Internet cafe or VPN).

Maybe a good idea would be some kind of industry standard privacy policy, that clearly defines how much data is gathered - and in particular how quickly it will be deleted. And it could also include a statement that you have not knowingly handed data to law enforcement in the last 12 months.

Having this privacy policy in place gives you the right to use a trademarked logo on your site.

You also pay a small membership fee, which is used to enforce the trademark and lobby government.

We did it with Creative Commons, how about Privacy Commons? There could be different grades of privacy policy too (eg: one for services that encrypt all your data with client side keys, and one for services that are regularly targeted by search warrants but are otherwise doing the right thing).

It's impossible to prevent websites from being able to track you. But we can stop third party tracking and we can build a network of websites who can be trusted not to do much tracking.

This article is a good first step. I'd like to see more in-depth information in the future, as well as how effective the tools you mentioned are for a spectrum of users (grandma, the tyke, the teenager, Paranoid Polly, dad, etc.)

Keep these stories coming!

Yes! I agree that ease of use for the appliance operators (general public) is the most important task for any solutions to gain acceptance. Something akin to leading a horse to water.

Here are some possibly relevant excerpts from a four-page article printed off years ago by a law student I know. The citation is no longer available online. My take on this situation is that the NSA overreach is a top-down problem. In the post-9/11 environment there is a power grab occurring which has been enabled by the last two executives. I have requested that the Baltimore Sun restore it. I have had no response after 3 days.

"WASHINGTON -- The National Security Agency developed a pilot program in the late 1990s that would have enabled it to gather and analyze massive amounts of communications data without running afoul of privacy laws. But after the Sept. 11 attacks, it shelved the project -- not because it failed to work -- but because of bureaucratic infighting and a sudden White House expansion of the agency's surveillance powers, according to several intelligence officials."

"The program the NSA rejected, called ThinThread, was developed to handle greater volumes of information, partly in expectation of threats surrounding the millennium celebrations."

"By 1999, as some NSA officials grew increasingly concerned about millennium-related security, ThinThread seemed in position to become an important tool with which the NSA could prevent terrorist attacks. But it was never launched. Neither was it put into effect after the attacks in 2001. Despite its success in tests, ThinThread's information-sorting system was viewed by some in the agency as a competitor to Trailblazer, a $1.2 billion program that was being developed with similar goals. The NSA was committed to Trailblazer, which later ran into trouble and has been essentially abandoned."

"In what intelligence experts describe as rigorous testing of ThinThread in 1998, the project succeeded at each task with high marks. For example, its ability to sort through massive amounts of data to find threat-related communications far surpassed the existing system, sources said. It also was able to rapidly separate and encrypt U.S.-related communications to ensure privacy."

"But the NSA, then headed by Air Force Gen. Michael V. Hayden, opted against both of those tools, as well as the feature that monitored potential abuse of the records."

Another day, another article confirming that my NSA paranoia is actually a perfectly sane self-preservation reflex...

Well, to be fair, pretty much anyone can track you via your metadata.

Not quite. If SSL is active, then basically only the websites you visit can track you, and also with sites like Ars, any ad network they hand all your metadata off to. The NSA can only really track IP addresses when SSL is active, and since many users can share an IP that isn't very useful (eg: Internet cafe or VPN).

Sort of, kind of, half effort.

For example, do you routinely empty the pictures off your cell phone?

Yes. My oldest photo is a few months old, and I have deleted many individual pictures taken since. Flickr is the only online service with photos of mine, and I only upload photos of special activities, like the time I took a 3 day motorcycle ride with my dad through a remote part of Australia.

Thank you for doing this. It is hard to explain to people sometimes why meta data = surveillance and this is a great article to do just that. Now, if I could just get people (my family) to care...

This seems to be the problem when I hear people I know mention the Snowden files. 'Big whoop, I have nothing to hide, I'm a good citizen.' They seem to not understand/know what it really means, or just plain don't care. Having a conversation with them about it is almost pointless.

I think the only way to get the point across is either articles like this in the news, or for someone to track them and publicly post the results and see how they feel about it then (not saying that that should be done, just that I think it's the only way to get it through their head how important meta data is.)

If you have browser set to clear cookies every time you quit, it really helps.

Out of curiosity, in Firefox you can configure cookies to be deleted "at the end of the session". Does someone know what Firefox considers the end of the session? Is closing a tab enough for the cookies related to the corresponding website, or is it necessary to completely close Firefox itself?

Yes, close Firefox entirely.

You can also delete individual sets of cookies in FF by selecting Tools -> Options -> Privacy -> Show Cookies, then selecting the ones you want and deleting them.

There's also History-> Clear Recent History, but the smallest time increment is the last hour, which is a bit long if you only want to wipe the last web site that you visited.

"In what intelligence experts describe as rigorous testing of ThinThread in 1998, the project succeeded at each task with high marks. For example, its ability to sort through massive amounts of data to find threat-related communications far surpassed the existing system, sources said. It also was able to rapidly separate and encrypt U.S.-related communications to ensure privacy."

"But the NSA, then headed by Air Force Gen. Michael V. Hayden, opted against both of those tools, as well as the feature that monitored potential abuse of the records."

Ah, I see exactly why NSA killed ThinThread: it had a "feature that monitored potential abuse of the records"!

A company with IPs based in Los Angeles can have their IP matched to their head office in Seattle. Guess what? Everyone looking at the geodata will see Seattle as the location, not where they actually are physically.

The same is true for IPs from Internet service providers: you can assume about 80% are correct and the rest are completely wrong, so IP-based geotagging is incorrect in at least 20% of cases.

This is also true where one IP is shared among many, many customers (ADSL, etc.).

So, yes, while this works, it's not accurate, and it has been said many times that an IP does not equal a person or a location, as there is no way to relate it physically to a person or device.

If this is the way the NSA tracks someone, it's very bad and easy to cheat, not to mention extremely inaccurate.

I mean, honestly, if this is a concern, then every website can do this, and it has existed for as long as the Internet has. There is a far better way to track someone who is connected, which I will not mention here, and it does not depend on IP or on any data sent by any browser.

Also for those that are worried about this, let me tell you this. Google tracks users way better than the NSA and it does not rely on something this basic.

Just look at AdWords re-branding: did you ever notice that the ads that appear from Google are always from websites you actually visited before? Google collects and matches data from several services, starting with your searches, from the unique ID sent by Chrome to the point you log into your Google account. They know who you are, what you do, what you searched, which websites you visited, where you are, everything in the name of advertising.

If you are worried about the NSA, then that is the least of your worries; at least they don't sell your data or share it with third parties.

If the NSA really wants to track you, all they have to do is hack Google, partner with some online company, or just open their own Internet services that everybody uses.

Here are some possibly relevant excerpts from a four-page article printed off years ago by a law student I know. The citation is no longer available online. My take on this situation is that the NSA overreach is a top-down problem. In the post-9/11 environment there is a power grab occurring which has been enabled by the last two executives. I have requested that the Baltimore Sun restore it. I have had no response after 3 days.

The Sun article might be gone, but there are literally hundreds of pages on the internet discussing ThinThread and Trailblazer. Gorman's source was a top NSA executive named Thomas Drake, who was later arrested and threatened with 30 years in prison for giving information to her, and several of his friends were raided by the FBI, including a congressional staffer. They have all appeared in the media, written articles, etc.

Thank you for doing this. It is hard to explain to people sometimes why meta data = surveillance and this is a great article to do just that. Now, if I could just get people (my family) to care...

I think the only way to get the point across is either articles like this in the news, or for someone to track them and publicly post the results and see how they feel about it then (not saying that that should be done, just that I think it's the only way to get it through their head how important meta data is.)

You could always go Godwin or Red Bait.

Point out that the Nazis were big lovers of domestic surveillance and had a huge computer system built to track everyone, including the tattoos on Jews' arms.

Stalin... well... I've a feeling we have never yet learned of his use of information technology to build the Gulag, although it is highly interesting that one of IBM's biggest customers in the '20s and '30s was the Soviet Union.

Then, of course, there was the great movie a few years back, "The Lives of Others", about the Stasi in East Germany during the Communist era... and how they tracked everyone... even Katarina Witt was made to inform on people...

Or hey. You can go read Aristotle's The Politics, even he has a section on the tendency of Dictatorships to employ massive expenditures in spying on one's own people. (It was the White Rose society, one of Hitler's victims, who pointed this out in anonymous pamphlets they left lying around...)

There has always been an element of the US government, as in almost all governments, that views democracy and equal rights under the law as a threat to its power and wealth. This philosophy is probably stronger now in the US government and big business than it has been in a very long time.

If SSL is active, then basically only the websites you visit can track you, and also with sites like Ars, any ad network they hand all your metadata off to.

Or any social network with a button on the site. Or Google with its analytics software (which Ghostery tells me is running on about three quarters of the sites I visit). Or any other company or organisation using freely available OTS tracking technology that has an affiliation with the site.

But apart from that, SSL protects you from being tracked. (Sounds like a Monty Python sketch.)

Quote:

The NSA can only really track IP addresses when SSL is active

Or they can just request the multitude of metadata Facebook/Google/etc have already collected on you. That's a much easier and less time consuming solution.

SSL protects content in transit. That's all. Ignoring the metadata that can be collected anyway, once the content arrives at either the server or your machine, it's just sitting there waiting to be collected/collated/reviewed/requested/subpoena'd/whatever.

Thank you for doing this. It is hard to explain to people sometimes why meta data = surveillance and this is a great article to do just that. Now, if I could just get people (my family) to care...

This seems to be the problem when I hear people I know mention the Snowden files. 'Big whoop, I have nothing to hide, I'm a good citizen.' They seem to not understand/know what it really means, or just plain don't care. Having a conversation with them about it is almost pointless.

Yep. And when you swap "NSA" for "Google" they act as though they're pleased to be tracked.

Quote:

I think it's the only way to get it through their head how important meta data is.

Nope, cognitive dissonance doesn't work that way. Posting the personal information that's been collected on them will cause them to blame the person posting the info, not the group collecting the information (e.g. take a look at the reaction people had over Bradley Manning or Julian Assange doing exactly what you're suggesting).

For anyone who believes Google's and Facebook's pervasive surveillance are only about pushing ads in your face, I have a bridge for sale, cheap.

The reason there hasn't been a "Snowden-equivalent" from Google or Facebook yet, is that both have large financial incentives for people who work there to keep their mouths shut. Sooner or later, someone will spill the docs, it's only a matter of time.

I can't wrap my head around how people think the NSA only got up to its shenanigans recently. I recall hearing about ECHELON and its follow-ups more than a decade ago. That was basically sifting through all electronic communications with no real oversight or limits of any kind. Further back, the NSA got busted in the '70s by some fortunately half-decent senators and journalists for sifting through all incoming and outgoing telegrams. No defined scope and no oversight.

It disgusts me how easy it is to not only track someone online but to personally identify them, even when they have "private browsing" turned on. As another poster said, metadata tells you a whole hell of a lot about someone. Just ask Target and their predictive algorithms that revealed pregnancies.

I'm not particularly paranoid, but I decided a while ago to set my browser to accept, but clear all cookies at exit, which I do maybe once a day. With saved passwords, that's something I can live with. I also have multiple browser "instances" set up for specific tasks. Most of the time I'm logged out of Facebook, Twitter, LinkedIn. Google's the exception, but I have an instance just for Gmail.

I'd honestly appreciate some more, less intrusive privacy features, such as automatically clearing history more than a month old. Most of the privacy extensions are far too intrusive.

Between simple privacy features (and password managers) mom&dad could use, and the lack of these settings on mobile, there's still space to innovate on browsers.

I can't wrap my head around how people think the NSA only got up to its shenanigans recently. I recall hearing about ECHELON and its follow-ups more than a decade ago. That was basically sifting through all electronic communications with no real oversight or limits of any kind. Further back, the NSA got busted in the '70s by some fortunately half-decent senators and journalists for sifting through all incoming and outgoing telegrams. No defined scope and no oversight.

It disgusts me how easy it is to not only track someone online but to personally identify them, even when they have "private browsing" turned on. As another poster said, metadata tells you a whole hell of a lot about someone. Just ask Target and their predictive algorithms that revealed pregnancies.

Keyword "predict." The more I can predict you, the more I can control you.

Knowledge is power. When "they" know all about you, and you know nothing about "them," who has the power?

Why is Ars recording my IP in combination with my username? Do you use it? Can't you turn it off? Clever persons like Cyrus are already hiding behind layers, so what's the point?

Most often it is used in the forums. It can be useful for detecting sock puppet accounts, or coming up with IP bans for problem users.

As was stated in the article, we only keep the last IP that was used for a user. And each post also records the IP it was submitted from. This is all the default behavior of phpBB (our forum software), and not something we added.
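Since each post records the IP it was submitted from, a first pass at spotting sock puppets is simply grouping accounts by IP. A minimal Python sketch, not phpBB's own code; the shared_ip_accounts function and the sample data are hypothetical. Shared IPs (an Internet café, a VPN exit) also produce matches, so a hit is a lead rather than proof.

```python
from collections import defaultdict


def shared_ip_accounts(posts):
    """posts: (username, ip) pairs, one per forum post.

    Returns {ip: sorted usernames} for every IP used by more than one account.
    """
    users_by_ip = defaultdict(set)
    for username, ip in posts:
        users_by_ip[ip].add(username)
    return {ip: sorted(users) for ip, users in users_by_ip.items() if len(users) > 1}


posts = [
    ("alice", "198.51.100.7"),
    ("alice_again", "198.51.100.7"),
    ("bob", "203.0.113.9"),
]
print(shared_ip_accounts(posts))  # {'198.51.100.7': ['alice', 'alice_again']}
```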