GCHQ documents use Angry Birds – reportedly downloaded more than 1.7bn times – as a case study for app data collection.

The National Security Agency and its UK counterpart GCHQ have been developing capabilities to take advantage of “leaky” smartphone apps, such as the wildly popular Angry Birds game, that transmit users’ private information across the internet, according to top secret documents.

The data pouring onto communication networks from the new generation of iPhone and Android apps ranges from phone model and screen size to personal details such as age, gender and location. Some apps, the documents state, can share users’ most sensitive information such as sexual orientation – and one app recorded in the material even sends specific sexual preferences such as whether or not the user may be a swinger.

Many smartphone owners will be unaware of the full extent to which this information is being shared across the internet, and even the most sophisticated would be unlikely to realise that all of it is available for the spy agencies to collect.

Dozens of classified documents, provided to the Guardian by whistleblower Edward Snowden and reported in partnership with the New York Times and ProPublica, detail the NSA and GCHQ efforts to piggyback on this commercial data collection for their own purposes.

Scooping up information the apps are sending about their users allows the agencies to collect large quantities of mobile phone data from their existing mass surveillance tools – such as cable taps, or from international mobile networks – rather than solely from hacking into individual mobile handsets.

Exploiting phone information and location is a high-priority effort for the intelligence agencies, as terrorists and other intelligence targets make substantial use of phones in planning and carrying out their activities, for example by using phones as triggering devices in conflict zones. The NSA has cumulatively spent more than $1bn on its phone targeting efforts.

The disclosures also reveal how much the shift towards smartphone browsing could benefit spy agencies’ collection efforts.

A May 2010 NSA slide on the agency’s ‘perfect scenario’ for obtaining data from mobile apps. Photograph: Guardian

One slide from a May 2010 NSA presentation on getting data from smartphones – breathlessly titled “Golden Nugget!” – sets out the agency’s “perfect scenario”: “Target uploading photo to a social media site taken with a mobile device. What can we get?”

The question is answered in the notes to the slide: from that event alone, the agency said it could obtain a “possible image”, email selector, phone, buddy lists, and “a host of other social working data as well as location”.

In practice, most major social media sites, such as Facebook and Twitter, strip photos of identifying location metadata (known as EXIF data) before publication. However, depending on when this is done during upload, such data may still, briefly, be available for collection by the agencies as it travels across the networks.

Depending on what profile information a user had supplied, the documents suggested, the agency would be able to collect almost every key detail of a user’s life: including home country, current location (through geolocation), age, gender, zip code, marital status – options included “single”, “married”, “divorced”, “swinger” and more – income, ethnicity, sexual orientation, education level, and number of children.

The agencies also made use of their mobile interception capabilities to collect location information in bulk, from Google and other mapping apps. One basic effort by GCHQ and the NSA was to build a database geolocating every mobile phone mast in the world – meaning that just by taking tower ID from a handset, location information could be gleaned.

A more sophisticated effort, though, relied on intercepting Google Maps queries made on smartphones, and using them to collect large volumes of location information.

So successful was this effort that one 2008 document noted that “[i]t effectively means that anyone using Google Maps on a smartphone is working in support of a GCHQ system.”

The information generated by each app is chosen by its developers, or by the company that delivers an app’s adverts. The documents do not detail whether the agencies actually collect the potentially sensitive details some apps are capable of storing or transmitting, but any such information would likely qualify as content, rather than metadata.

Data collected from smartphone apps is subject to the same laws and minimisation procedures as all other NSA activity – procedures that the US president, Barack Obama, suggested may be subject to reform in a speech 10 days ago. But the president focused largely on the NSA’s collection of the metadata from US phone calls and made no mention in his address of the large amounts of data the agency collects from smartphone apps.

The latest disclosures could also add to mounting public concern about how the technology sector collects and uses information, especially for those outside the US, who enjoy fewer privacy protections than Americans. A January poll for the Washington Post showed 69% of US adults were already concerned about how tech companies such as Google used and stored their information.

The documents do not make it clear how much of the information that can be taken from apps is routinely collected, stored or searched, nor how many users may be affected. The NSA says it does not target Americans and its capabilities are deployed only against “valid foreign intelligence targets”.

The documents do set out in great detail exactly how much information can be collected from widely popular apps. One document held on GCHQ’s internal Wikipedia-style guide for staff details what can be collected from different apps. Though it uses Android apps for most of its examples, it suggests much of the same data could be taken from equivalent apps on iPhone or other platforms.

The GCHQ documents set out examples of what information can be extracted from different ad platforms, using perhaps the most popular mobile phone game of all time, Angry Birds – which has reportedly been downloaded more than 1.7bn times – as a case study.

From some app platforms, relatively limited, but identifying, information such as exact handset model, the unique ID of the handset, software version, and similar details is all that is transmitted.

Other apps choose to transmit much more data, meaning the agency could potentially net far more. One mobile ad platform, Millennial Media, appeared to offer particularly rich information. Millennial Media’s website states it has partnered with Rovio on a special edition of Angry Birds; with Farmville maker Zynga; with Call of Duty developer Activision, and many other major franchises.

Rovio, the maker of Angry Birds, said it had no knowledge of any NSA or GCHQ programs looking to extract data from users of its apps.

“Rovio doesn’t have any previous knowledge of this matter, and have not been aware of such activity in 3rd party advertising networks,” said Saara Bergström, Rovio’s VP of marketing and communications. “Nor do we have any involvement with the organizations you mentioned [NSA and GCHQ].”

Millennial Media did not respond to a request for comment.

In December, the Washington Post reported on how the NSA could make use of advertising tracking files generated through normal internet browsing – known as cookies – from Google and others to get information on potential targets.

However, the richer personal data available to many apps, coupled with real-time geolocation, and the uniquely identifying handset information many apps transmit give the agencies a far richer data source than conventional web-tracking cookies.

Almost every major website uses cookies to serve targeted advertising and content, as well as streamline the experience for the user, for example by managing logins. One GCHQ document from 2010 notes that cookie data – which generally qualifies as metadata – has become just as important to the spies. In fact, the agencies were sweeping it up in such high volumes that they were struggling to store it.

“They are gathered in bulk, and are currently our single largest type of events,” the document stated.

The ability to obtain targeted intelligence by hacking individual handsets has been well documented, both through several years of hacker conferences and previous NSA disclosures in Der Spiegel, and both the NSA and GCHQ have extensive tools ready to deploy against iPhone, Android and other phone platforms.

GCHQ’s targeted tools against individual smartphones are named after characters in the TV series The Smurfs. An ability to make the phone’s microphone ‘hot’, to listen in to conversations, is named “Nosey Smurf”. High-precision geolocation is called “Tracker Smurf”, power management – an ability to stealthily activate a phone that is apparently turned off – is “Dreamy Smurf”, while the spyware’s self-hiding capabilities are codenamed “Paranoid Smurf”.

Those capability names are set out in a much broader 2010 presentation that sheds light on spy agencies’ aspirations for mobile phone interception, and on their less-documented mass-collection abilities.

The cover sheet of the document sets out the team’s aspirations:

The cover slide for a May 2010 GCHQ presentation on mobile phone data interception. Photograph: Guardian

Another slide details weak spots where data flows from mobile phone network providers to the wider internet, which is where the agency attempts to intercept communications. These are locations either within a particular network, or international roaming exchanges (known as GRXs), where data from travellers roaming outside their home country is routed.

While GCHQ uses Android apps for most of its examples, it suggests much of the same data could be taken from iPhone apps. Photograph: Guardian

GCHQ’s targeted tools against individual smartphones are named after characters in the TV series The Smurfs. Photograph: Guardian

These are particularly useful to the agency as data is often only weakly encrypted on such networks, and includes extra information such as handset ID or mobile number – much stronger target identifiers than usual IP addresses or similar information left behind when PCs and laptops browse the internet.

The NSA said its phone interception techniques are only used against valid targets, and are subject to stringent legal safeguards.

“The communications of people who are not valid foreign intelligence targets are not of interest to the National Security Agency,” said a spokeswoman in a statement.

“Any implication that NSA’s foreign intelligence collection is focused on the smartphone or social media communications of everyday Americans is not true. Moreover, NSA does not profile everyday Americans as it carries out its foreign intelligence mission. We collect only those communications that we are authorized by law to collect for valid foreign intelligence and counterintelligence purposes – regardless of the technical means used by the targets.

“Because some data of US persons may at times be incidentally collected in NSA’s lawful foreign intelligence mission, privacy protections for US persons exist across the entire process concerning the use, handling, retention, and dissemination of data. In addition, NSA actively works to remove extraneous data, to include that of innocent foreign citizens, as early as possible in the process.

“Continuous and selective publication of specific techniques and tools lawfully used by NSA to pursue legitimate foreign intelligence targets is detrimental to the security of the United States and our allies – and places at risk those we are sworn to protect.”

The NSA declined to respond to a series of queries on how routinely capabilities against apps were deployed, or on the specific minimisation procedures used to prevent US citizens’ information being stored through such measures.

GCHQ declined to comment on any of its specific programs, but stressed all of its activities were proportional and complied with UK law.

“It is a longstanding policy that we do not comment on intelligence matters,” said a spokesman.

“Furthermore, all of GCHQ’s work is carried out in accordance with a strict legal and policy framework that ensures that our activities are authorised, necessary and proportionate, and that there is rigorous oversight, including from the Secretary of State, the Interception and Intelligence Services Commissioners and the Parliamentary Intelligence and Security Committee. All our operational processes rigorously support this position.”

• A separate disclosure on Wednesday, published by Glenn Greenwald and NBC News, gave examples of how GCHQ was making use of its cable-tapping capabilities to monitor YouTube and social media traffic in real-time.

GCHQ’s cable-tapping and internet buffering capabilities, codenamed Tempora, were disclosed by the Guardian in June, but the new documents published by NBC from a GCHQ presentation titled “Psychology: A New Kind of SIGDEV” set out a program codenamed Squeaky Dolphin which gave the British spies “broad real-time monitoring” of “YouTube Video Views”, “URLs ‘Liked’ on Facebook” and “Blogspot/Blogger Visits”.

A further slide noted that “passive” – a term for large-scale surveillance through cable intercepts – gives the agency “scalability”.

The means of interception mean GCHQ and NSA could obtain data without any knowledge or co-operation from the technology companies. Spokespeople for the NSA and GCHQ told NBC all programs were carried out in accordance with US and UK law.

When a smartphone user opens Angry Birds, the popular game application, and starts slinging birds at chortling green pigs, spies may be lurking in the background to snatch data revealing the player’s location, age, sex and other personal information, according to secret British intelligence documents.

In their globe-spanning surveillance for terrorism suspects and other targets, the National Security Agency and its British counterpart have been trying to exploit a basic byproduct of modern telecommunications: With each new generation of mobile phone technology, ever greater amounts of personal data pour onto networks where spies can pick it up.

According to dozens of previously undisclosed classified documents, among the most valuable of those unintended intelligence tools are so-called leaky apps that spew everything from users’ smartphone identification codes to where they have been that day.

The N.S.A. and Britain’s Government Communications Headquarters were working together on how to collect and store data from dozens of smartphone apps by 2007, according to the documents, provided by Edward J. Snowden, the former N.S.A. contractor. Since then, the agencies have traded recipes for grabbing location and planning data when a target uses Google Maps, and for vacuuming up address books, buddy lists, phone logs and the geographic data embedded in photos when someone sends a post to the mobile versions of Facebook, Flickr, LinkedIn, Twitter and other services.

The eavesdroppers’ pursuit of mobile networks has been outlined in earlier reports, but the secret documents, shared by The New York Times, The Guardian and ProPublica, offer far more details of their ambitions for smartphones and the apps that run on them. The efforts were part of an initiative called “the mobile surge,” according to a 2011 British document, an analogy to the troop surges in Iraq and Afghanistan. One N.S.A. analyst’s enthusiasm was evident in the breathless title — “Golden Nugget!” — given to one slide for a top-secret 2010 talk describing iPhones and Android phones as rich resources, one document notes.

The scale and the specifics of the data haul are not clear. The documents show that the N.S.A. and the British agency routinely obtain information from certain apps, particularly some of those introduced earliest to cellphones. With some newer apps, including Angry Birds, the agencies have a similar capability, the documents show, but they do not make explicit whether the spies have put that into practice. Some personal data, developed in profiles by advertising companies, could be particularly sensitive: A secret 2012 British intelligence document says that spies can scrub smartphone apps that contain details like a user’s “political alignment” and sexual orientation.

President Obama announced new restrictions this month to better protect the privacy of ordinary Americans and foreigners from government surveillance, including limits on how the N.S.A. can view “metadata” of Americans’ phone calls — the routing information, time stamps and other data associated with calls. But he did not address the avalanche of information that the intelligence agencies get from leaky apps and other smartphone functions.

And while he expressed concern about advertising companies that collect information on people to send tailored ads to their mobile phones, he offered no hint that American spies routinely seize that data. Nothing in the secret reports indicates that the companies cooperate with the spy agencies to share the information; the topic is not addressed.

The agencies have long been intercepting earlier generations of cellphone traffic like text messages and metadata from nearly every segment of the mobile network — and, more recently, computer traffic running on Internet pipelines. Because those same networks carry the rush of data from leaky apps, the agencies have a ready-made way to collect and store this new resource. The documents do not address how many users might be affected, whether they include Americans, or how often, with so much information collected automatically, analysts would see personal data.

“N.S.A. does not profile everyday Americans as it carries out its foreign intelligence mission,” the agency said in a written response to questions about the program. “Because some data of U.S. persons may at times be incidentally collected in N.S.A.’s lawful foreign intelligence mission, privacy protections for U.S. persons exist across the entire process.” Similar protections, the agency said, are in place for “innocent foreign citizens.”

The British spy agency declined to comment on any specific program, but said all its activities complied with British law.

Two top-secret flow charts produced by the British agency in 2012 show incoming streams of information skimmed from smartphone traffic by the Americans and the British. The streams are divided into “traditional telephony” — metadata — and others marked “social apps,” “geo apps,” “http linking,” webmail, MMS and traffic associated with mobile ads, among others. (MMS refers to the mobile system for sending pictures and other multimedia, and http is the protocol for linking to websites.)

In charts showing how information flows from smartphones into the agency’s computers, analysts included questions to be answered by the data, including “Where was my target when they did this?” and “Where is my target going?”

As the program accelerated, the N.S.A. nearly quadrupled its budget in a single year, to $767 million in 2007 from $204 million, according to a top-secret Canadian analysis written around the same time.

Even sophisticated users are often unaware of how smartphones offer a unique opportunity for one-stop shopping for information about them. “By having these devices in our pockets and using them more and more,” said Philippe Langlois, who has studied the vulnerabilities of mobile phone networks and is the founder of the Paris-based company Priority One Security, “you’re somehow becoming a sensor for the world intelligence community.”

Detailed Profiles

Smartphones almost seem to make things too easy. Functioning as phones — making calls and sending texts — and as computers — surfing the web and sending emails — they generate and also rely on data. One secret report shows that just by updating Android software, a user sent more than 500 lines of data about the phone’s history and use onto the network.

Such information helps mobile ad companies, for example, create detailed profiles of people based on how they use their mobile device, where they travel, what apps and websites they open, and other factors. Advertising firms might triangulate web shopping data and browsing history to guess whether someone is wealthy or has children, for example.

The N.S.A. and the British agency busily scoop up this data, mining it for new information and comparing it with their lists of intelligence targets.

One secret 2010 British document suggests that the agencies collect such a huge volume of “cookies” — the digital traces left on a mobile device or a computer when a target visits a website — that classified computers were having trouble storing it all.

“They are gathered in bulk, and are currently our single largest type of events,” the document says.

The two agencies displayed a particular interest in Google Maps, which is accurate to within a few yards or better in some locations. Intelligence agencies collect so much data from the app that “you’ll be able to clone Google’s database” of global searches for directions, according to a top-secret N.S.A. report from 2007.

“It effectively means that anyone using Google Maps on a smartphone is working in support of a G.C.H.Q. system,” a secret 2008 report by the British agency says.

(In December, The Washington Post, citing the Snowden documents, reported that the N.S.A. was using metadata to track cellphone locations outside the United States and was using ad cookies to connect Internet addresses with physical locations.)

In another example, a secret 20-page British report dated 2012 includes the computer code needed for plucking the profiles generated when Android users play Angry Birds. The app was created by Rovio Entertainment, of Finland, and has been downloaded more than a billion times, the company has said.

Rovio drew public criticism in 2012 when researchers claimed that the app was tracking users’ locations and gathering other data and passing it to mobile ad companies. In a statement on its website, Rovio says that it may collect its users’ personal data, but that it abides by some restrictions. For example, the statement says, “Rovio does not knowingly collect personal information from children under 13 years of age.”

The secret report noted that the profiles vary depending on which of the ad companies — which include Burstly and Google’s ad services, two of the largest online advertising businesses — compiles them. Most profiles contain a string of characters that identifies the phone, along with basic data on the user like age, sex and location. One profile notes whether the user is currently listening to music or making a call, and another has an entry for household income.

Google declined to comment for this article, and Burstly did not respond to multiple requests for comment. Saara Bergstrom, a Rovio spokeswoman, said that the company had no knowledge of the intelligence programs. “Nor do we have any involvement with the organizations you mentioned,” Ms. Bergstrom said, referring to the N.S.A. and the British spy agency.

Another ad company creates far more intrusive profiles that the agencies can retrieve, the report says. The apps that generate those profiles are not identified, but the company is named as Millennial Media, which has its headquarters in Baltimore.

In securities filings, Millennial documented how it began working with Rovio in 2011 to embed ad services in Angry Birds apps running on iPhones, Android phones and other devices.

According to the report, the Millennial profiles contain much of the same information as the others, but several categories listed as “optional,” including ethnicity, marital status and sexual orientation, suggest that much wider sweeps of personal data may take place.

Millennial Media declined to comment for this article.

Possible categories for marital status, the secret report says, include single, married, divorced, engaged and “swinger”; those for sexual orientation are straight, gay, bisexual and “not sure.” It is unclear whether the “not sure” category exists because so many phone apps are used by children, or because insufficient data may be available.

There is no explanation of precisely how the ad company defined the categories, whether users volunteered the information, or whether the company inferred it by other means. Nor is there any discussion of why all that information would be useful for marketing — or intelligence.

Unwieldy Heaps

The agencies have had occasional success — at least by their own reckoning — when they start with something closer to a traditional investigative tip or lead. The spies say that tracking smartphone traffic helped break up a bomb plot by Al Qaeda in Germany in 2007, and the N.S.A. bragged that to crack the plot, it wove together mobile data with emails, log-ins and web traffic. Similarly, mining smartphone data helped lead to arrests of members of a drug cartel hit squad for the 2010 murder of an employee of an American Consulate in Mexico.

But the data, whose volume is soaring as mobile devices have begun to dominate the technological landscape, is a crushing amount of information for the spies to sift through. As smartphone data builds up in N.S.A. and British databases, the agencies sometimes seem a bit at a loss on what to do with it all, the documents show. A few isolated experiments provide hints as to how unwieldy it can be.

In 2009, the American and British spy agencies each undertook a brute-force analysis of a tiny sliver of their cellphone databases. Crunching just one month of N.S.A. cellphone data, a secret report said, required 120 computers and turned up 8,615,650 “actors” — apparently callers of interest. A similar run using three months of British data came up with 24,760,289 actors.

“Not necessarily straightforward,” the report said of the analysis. The agencies’ extensive computer operations had trouble sorting through the slice of data. Analysts were “dealing with immaturity,” the report said, encountering computer memory and processing problems. The report made no mention of anything suspicious in the enormous lumps of data.