What is online tracking?

Have you ever read a newspaper and noticed a stranger reading it over your shoulder? Reading the news online is like having Google, Facebook, or Twitter doing the same thing. Known as "third party trackers", these companies collect data about who you are, what you’re reading and what you’re interested in, usually without you ever knowing it. Online tracking is an integral component of the internet's business model and it plays a vital role in a larger industry which profits out of our data.

So how does online tracking work? Most websites include embedded images and code which come from the domains and servers of third party companies. These companies are able to track us through the use of cookies and other technologies which collect different pieces of information about us. Such data can include our IP address, type of computer or mobile phone, operating system and the plugins we have installed, as well as data about our online behaviour, such as the websites we visit, where and for how long our mouse lingers on a page and what we search for. Data about our device and online behaviour enables companies to link our likes and interests directly to us and to create profiles about us, which are then sold to advertisers.
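
To make the idea of embedded third party resources concrete, here is a minimal Python sketch. It is not Trackography's own code. It takes the HTML of a page and lists the hosts of embedded scripts, images and iframes that are not served by the page's own host. The class and function names, the sample HTML and the .example domains are all invented for illustration, and comparing hostnames is a simplification of how first and third parties are really told apart.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ResourceCollector(HTMLParser):
    """Collect the URLs of resources a page embeds (scripts, images, iframes)."""

    EMBED_TAGS = {"script", "img", "iframe"}

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag in self.EMBED_TAGS:
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths against the page's own address.
                self.urls.append(urljoin(self.page_url, src))


def third_party_hosts(page_url, html):
    """Return the sorted hosts of embedded resources not served by the page's host."""
    collector = ResourceCollector(page_url)
    collector.feed(html)
    first_party = urlparse(page_url).hostname
    hosts = {urlparse(url).hostname for url in collector.urls}
    return sorted(
        host
        for host in hosts
        # Simplification: subdomains of the page's host count as first party.
        if host and host != first_party and not host.endswith("." + first_party)
    )


# Invented sample page: two resources of its own and two from other hosts.
SAMPLE_HTML = """
<img src="/logo.png">
<script src="https://static.news.example/app.js"></script>
<script src="https://analytics.tracker-one.example/t.js"></script>
<img src="https://pixel.tracker-two.example/p.gif">
"""

print(third_party_hosts("https://news.example/article", SAMPLE_HTML))
```

Every host in the printed list receives a request from the reader's browser, even though the reader only asked for the news page.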

Why are we being tracked? Online tracking is part of a larger industry which makes a profit out of our data. The data industry makes billions of dollars from collecting data about who we are and what we are interested in by tracking the websites we access every day.

What is Trackography?

Trackography is an open source project of Tactical Tech that aims to increase transparency about the online data industry by illustrating who tracks us online and where our data travels to when we access websites. In particular, Trackography shows:

the companies that track us

the countries which host the servers of the websites we access

the countries which host the servers of tracking companies

the countries which host the network infrastructure required to reach the servers of websites and tracking companies

information about how some of the "globally prevailing tracking companies" handle our data based on their privacy policies

The data collected through Trackography is open and can serve as a resource for researchers, lawyers, activists, advocates, campaigners and digital security trainers who are interested in raising critical questions about third party trackers or who want to show what happens to our data online.

Why Trackography?

Tactical Tech aims to empower groups and individuals with practical ways to defend their right to privacy. To this end it is important to understand the broader data ecology, especially who collects data, how it is collected and what is done with it.

Trackography was developed to:

increase transparency and public awareness about the global data industry

help internet users better understand how online tracking works

show how data tracking is inherent in the use of the internet

raise critical questions regarding the global data industry

motivate groups and individuals to use existing privacy enhancing and tracking circumvention tools

By detecting the specific companies which track our online activity and the geographical location of servers that our data travels to when we access websites, we hope to contribute to the discussion on unseen and unconsented data collection and on the politics of data.

Why does online tracking even matter?

Online tracking means that our online behaviour is put under the microscope by parties to whom we have not given explicit consent.

Advertising is the default business model of the internet. Almost every single website we access is being tracked by someone, somewhere. This is enabled through the use of tracking technologies, such as cookies, by companies which make millions out of tracking, collecting, analysing, processing, aggregating and selling our data - often at the cost of our civil liberties (such as our right to privacy).

And while this might all sound harmless, we have very little control over how and when our data is collected, how our profiles are created, whether they are accurate, who they are subsequently shared with, who has access to them, what they are used for, where they are stored and for how long.

This is all part of a large industry which profits out of tracking, collecting and aggregating data with the purpose of creating individual and group profiles. Such profiles are then sold to various third parties, ranging from advertisers, publishers, insurance companies, pharmaceutical companies and banks to communications service providers and government departments.

Why does profiling matter?

Individual profiling can raise various types of concerns. Imagine not being able to get a bank loan because your bank has bought data about you which shows that you are an "unreliable customer". Or imagine your insurance company classifying you as someone with "risky behaviour" due to the fact that your browsing activities show that you have an interest in extreme sports. Or even worse, imagine law enforcement agencies knocking on your door because you "read too much" anarchist material online.

Group profiling can be equally problematic and can raise concerns for societies at large. Sociologist David Lyon argues that profiling is a powerful means of creating and reinforcing long-term social differences. Research has shown that clustering data about groups can lead to social stratification and discrimination, which is reinforced by an entire data brokerage industry that operates behind the scenes. Data brokers - companies that collect, analyse and sell consumer information - enable discriminatory targeting of groups based on sensitive information like financial situation or health indicators. By selling marketing lists like "Rural and Barely Making it" or "Tough Start: Young Single Parents", data brokers are putting people into categories ("data segments") which can lead to discriminatory behaviour towards them by those who acquire such lists.

Why does it matter where our data travels to?

Online tracking means that when we connect to a website, we don't only connect to the server of that specific website. Instead, we also connect to the servers of all the other companies which are tracking our access to that website. While we might only intend on connecting to cnn.com, for example, in reality we are also connecting to the servers of at least nine additional (tracking) companies. Each of these companies handles data based on its own privacy policy and complies with the laws and regulations of the country where it is based.

Many countries around the world, though, do not have data protection laws. In the countries that do have privacy laws, these are not always properly enforced and/or do not adequately safeguard data. The European Union is considered to have the strongest privacy frameworks globally, but even its Data Protection Directive is unable to keep up with the fast-paced developments on the internet.

The architecture and business model of the internet enable multiple third parties to constantly collect, process, aggregate, share, sell and store data in various countries around the world. This means that while our data might initially be collected within the EU, it might end up travelling to various other countries before it is ultimately stored in a final, non-EU country - only to then be shared again with parties located in other countries. In other words, it is in practice very difficult to pinpoint the precise location of our data at any given moment, which makes its regulation and protection even harder.

Trackography highlights this problem by illustrating the specific countries that our data travels to when we access websites.

What does Trackography examine?

Online tracking: Media websites

Tactical Tech started Trackography by exploring online tracking through media websites across 38 countries around the world.

The premise was that, unlike other types of websites, online news is read by most of us every day, regardless of our background. Third party trackers can potentially identify a lot of information about individuals based on the type of news they regularly read - such as their political beliefs, economic status, and much more - and create profiles about them.

Beyond the media: Online tracking across websites

Trackography then expanded to examine online tracking through various other types of websites. Such websites fall under the following sectors:

Government and Politics

Finance

Health

Society

Each of the above sectors includes further sub-categories. Websites under the financial sector, for example, cover business, jobs, e-commerce, banks and consultancy. Under the health sector, we also included the websites of insurance companies. A wide range of different types of websites are included under "society" which cover human rights, activism, LGBT rights, dating, entertainment, culture and travel. We subsequently created new lists of websites and ran our software locally in the following three Asian countries:

India

Thailand

Philippines

Students from the University of Amsterdam subsequently contributed to the project by compiling lists of websites for 17 countries in the European Union which cover the following:

Ministries

Health

Security

European Union

Through the use of VPNs, the students ran our software for each of the 17 European countries and collected results which show which third party companies track Europeans' access to these websites and where around the world their data travels to.

Main findings: Online tracking through media websites

Trackography provides a snapshot of the third party tracking in over 2,500 media websites across 38 countries at specific moments in time. Some key findings based on the data that we collected include the following:

1. The United States of America (U.S.) is the main country globally which tracks what we read online.

At least 90% of all media websites have connections which pass through the network infrastructure of the U.S. due to the following reasons:

the U.S. hosts the servers of most tracking companies globally, including the servers of some of the prevailing tracking companies, such as Google, Facebook and Twitter

the U.S. hosts the servers of most media websites

the U.S. owns most of the network infrastructure required to access the servers of media websites and tracking companies

Google, Facebook and Twitter infrastructure is included in 87.32% of the 2,508 media websites that were analysed by our software. Through these three U.S. companies most data about what individuals around the world read online ends up in U.S. servers.
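
As a rough illustration of how a share like this can be computed, the Python sketch below counts the websites that embed at least one tracker from a given set. The function name and the toy data are invented for illustration and are not taken from the Trackography dataset. The last line only converts the published percentage back into an approximate number of websites.

```python
def share_embedding(sites, trackers):
    """Percentage of sites that embed at least one of the given trackers."""
    if not sites:
        return 0.0
    hits = sum(1 for embedded in sites.values() if embedded & trackers)
    return round(100 * hits / len(sites), 2)


# Toy data, invented for illustration only (not Trackography results).
TOY_SITES = {
    "news-a.example": {"Google", "Facebook"},
    "news-b.example": {"Twitter"},
    "news-c.example": {"AddThis"},
    "news-d.example": set(),
}

# Two of the four toy sites embed Google, Facebook or Twitter.
print(share_embedding(TOY_SITES, {"Google", "Facebook", "Twitter"}))

# For scale: 87.32% of 2,508 websites is roughly this many websites.
print(round(0.8732 * 2508))
```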

2. In some cases, reading the news online results in individuals' data landing in the servers of adversary states. For example:

Ukraine: 80% of national media websites and, on average, 84.61% of regional media websites have at least one connection which passes through the network infrastructure of Russia.

Palestinian Territory: 27.59% of national media websites, 50% of regional media websites and 16.67% of news blogs have at least one connection which passes through the network infrastructure of Israel, while 93.1% of national media websites and 100% of regional media websites pass through the network infrastructure of the U.S.

Pakistan: 5.71% of national media websites and 12.5% of regional media websites have at least one connection which passes through the network infrastructure of India.

3. While your country might have privacy legislation, reading the news online might result in your data travelling to countries which have no privacy law. For example:

Germany: Some national media websites from our tests have connections which pass through the network infrastructure of India, which currently has no privacy law.

4. Some media organisations which advocate and promote human rights enable multiple companies to track individuals who access their websites. For example:

Spain's Libertad Digital, an online newspaper for journalism advocacy, enables 49 companies to track its website's visitors - according to one of our tests.

5. The Wall Street Journal, the Philippine Daily Inquirer and Kashmir Times enable the most companies globally to track the visitors of their websites - according to some of our tests.

6. Unlike the Global North, most countries in the Global South do not host the servers of their media websites. Instead, they are usually hosted in countries of the Global North which means that their citizens' data is subsequently handled and regulated under different laws and jurisdictions.

7. When we access media websites globally, the main countries our data travels to include the following:

United States of America

United Kingdom

The Netherlands

Ireland

Singapore

Italy

Germany

France

Japan

Even though all the above countries have privacy laws, there are limits. Such laws don't necessarily protect the data of foreign citizens nor does all data fall under these laws. Additionally, it is currently unclear who these countries share collected data with and where such data is eventually stored.

8. Country-specific highlights:

Kenya: All national media websites and news blogs from our test have at least one connection which passes through the network infrastructure of the United Arab Emirates.

Pakistan: 74.29% of national media websites and 62.5% of regional media websites have at least one connection which passes through the infrastructure of Oman.

Syria: 86.96% of national media websites have at least one connection which passes through the network infrastructure of the U.S. and the UK.

Saudi Arabia: 93% of national media websites have at least one connection which passes through the network infrastructure of Italy.

Russia: 77.78% of national media websites have connections which pass through the network infrastructure of the U.S. and the UK.

Australia: About 94% of national media websites, 81.51% of regional media websites and 77.27% of news blogs have at least one connection which passes through the network infrastructure of Singapore.

India: 86.3% of national media websites have at least one connection which passes through the network infrastructure of Singapore.

Brazil: All national media websites from our test have at least one connection which passes through the network infrastructure of Italy.

Indonesia: National media websites have connections which pass through the network infrastructure of India, Singapore and the U.S. in all 3 of our tests.

Nigeria: All national media websites and all news blogs from our test have at least one connection which passes through the network infrastructure of South Africa.

UK: National media websites have at least one connection which passes through the network infrastructure of the Netherlands in all 5 of our tests.

Main findings: Online tracking across websites in Asia and the EU

Trackography provides a snapshot of third party tracking through various different types of websites (e.g. governmental and financial) in India, Thailand, the Philippines and 17 European countries. Some key findings based on the data that we collected include the following:

India

1. Google tracks Indians' online behaviour more than any other company. In particular, it tracks 68.7% of users' access to health, financial, social and political websites in India.

2. Even though the server of onlymyhealth.com is located in India, when users access that website they also connect to the servers of 18 tracking companies which are located in the United States, Australia, Japan, Vietnam, Germany, the Netherlands, Sweden, Ireland and the United Kingdom. Such companies include data brokers like PubMatic and AppNexus, which have data retention periods of 270 and 730 days respectively.

*onlymyhealth.com includes information on various health conditions, ranging from pregnancy and STDs to diabetes and cancer

Thailand

1. Google tracks all (100%) access to Thailand's most popular websites.

2. Google tracks the online behaviour of Thai online users more than any other company. In particular, it tracks 66.8% of users' access to financial, social, health and governmental websites in Thailand.

3. Even though the server of ohozaa.com (one of the most popular websites in Thailand) is located in Thailand, when users access that website they also connect to the servers of 11 tracking companies which are located in the United States, Japan, Malaysia and Ireland.

Philippines

1. Google tracks the online behaviour of Philippine users more than any other company. In particular, it tracks 73.2% of users' access to financial, social, health and governmental websites.

2. Google tracks 80% of access to the most popular websites in the Philippines.

3. When users in the Philippines access couragephilippines.blogspot.com they connect to the servers of 16 tracking companies which are located in Canada, Australia, France and the United Kingdom. Such companies include data brokers like PubMatic and Lotame, which have data retention periods of 270 days.

*couragephilippines.blogspot.com is a website run by the Roman Catholic Church which provides "spiritual support for men and women with same-sex attractions"

European Union

Based on the following results, Google is the company which most often tracks users' access to governmental websites in each of the 17 European countries examined:

Spain

Google, 43.1%

Facebook, 5.2%

AddThis, 3.4%

Switzerland

Google, 29.7%

Facebook, 1.9%

AddThis, 1.3%

Belgium

Google, 49.2%

Facebook, 3.1%

Neustar, 2.7%

Hungary

Google, 47.9%

Facebook, 13.7%

Yandex, 1.6%

Latvia

Google, 46.5%

Gemius, 1.6%

Facebook, 3.6%

Cyprus

Google, 48.2%

AddThis, 3.6%

Facebook, 5.8%

Estonia

Google, 58%

Twitter, 11.6%

Brightcove, 2.3%

England

Google, 68.2%

Twitter, 9.1%

AddThis, 1.2%

Netherlands

Google, 32.6%

Facebook, 2.3%

Neustar, 2.9%

Ireland

Google, 50%

Facebook, 13.2%

Krux, 31.9%

Austria

Google, 80.4%

comScore, 34.8%

Zopim, 1.1%

Romania

Google, 52.2%

Twitter, 10.9%

LongTail Video, 2.9%

Germany

Google, 22.9%

etracker, 11.4%

Specific Media, 7.7%

Malta

Google, 43.1%

Twitter, 13.8%

WPP, 10%

France

Google, 40%

Twitter, 16.7%

AddThis, 8.3%

Italy

Google, 49.2%

Facebook, 4.6%

Yahoo!, 1.5%

Sweden

Google, 41.7%

LongTail Video, 2.1%

Facebook, 2.1%

View more results on online tracking across websites in the European Union, Asia, Thailand and the Philippines here.

Meet the Trackers

For our case study on media websites we ran our distributed data collection software in 38 countries around the world and identified hundreds of companies which track individuals through media websites. According to our results, some of these companies track individuals in almost all of the countries and media websites that we examined. We call them the "globally prevailing tracking companies". We analysed their privacy policies to gain insight into how they claim to handle our data.

In particular, we collected data on the following fields from their privacy policies:

The types of data they collect (PII, non-PII, technical data)

Whether they provide safeguards to prevent the full identification of individuals' IP addresses

1. Out of 25 globally prevailing tracking companies, 19 of them state in their privacy policies that they collect personally identifiable information (PII) and disclose data to third parties, without explicitly prohibiting them from using such data for unspecified purposes.

2. Only 11 out of 25 globally prevailing tracking companies disclose how long they retain data for in their privacy policies.

3. 22 out of 25 globally prevailing tracking companies are based in the United States of America.

5. While all 25 globally prevailing tracking companies state in their privacy policies that users can "opt-out" from online tracking, this option is conditional in some cases, for reasons such as the following:

users can only opt-out if their browser is not configured to block third party cookies

users can only opt-out by cancelling their account with a service

users need to opt-out from every device that they use

users can only opt-out from the browser that they are using, which means that cross-site tracking across other browsers might continue

if users remove tracking cookies, they will not be able to access certain services

if users opt-out, they will have restricted access to content and features

For more information about these companies, view the data we have collected on github. Additionally, read our article about how the prevailing tracking companies handle our data here.

Methodology of Trackography

Through Trackography we examined which companies track us and where our data travels to when we access websites. Our methodology includes the following:

1. Creation of datasets

Tactical Tech started Trackography by exploring online tracking through media websites. We created datasets which contain the URLs of global, national and local media websites and blogs covering the news for 38 countries around the world. These datasets were reviewed by global contributors to the project.

2. Running Trackography's data collection software

Our software is designed to emulate a browser and to connect to the websites included in the datasets. The software not only allows us to view a user's traceroute to the server of a specific website every time he or she accesses it, but also to collect all the third party URLs which are included in the websites.

Details about how to run our software can be viewed through our repository on github.

3. Analysis of results

Some of the results collected from our software illustrate which specific companies track us and where our data travels to when we access websites included in our datasets.

In our case studies we examined the results we collected based on the following:

how and why online tracking differs in various countries around the world

which countries your data passes through when you read the news online

which countries are hosting the servers of the companies tracking you

which countries are hosting the servers of the media websites you access

how tracking companies handle your data

Frequently Asked Questions (FAQs)

FAQ: General

Which types of websites has Trackography looked at?

We started off the project by examining online tracking through media websites. Afterwards we expanded the project and are currently examining online tracking through various other types of websites.

Who is tracking us when we access websites? Who are the 'trackers'?

Most webpages we visit include embedded images and code which belong to the domains and servers of other companies. These companies - such as Google and Facebook - are the "third party trackers" which track our online activity through the use of cookies and other technologies.

For more information about the specific companies tracking you when you access specific websites in various countries, please view our map.

Why are these trackers interested in tracking us?

Companies track users' access to websites because they engage in (one or more of) the following:

Profiling

Advertising

Market research

Web analytics

Web crawling

Many of these companies argue that they track individuals' access to websites so that they can improve the services that they provide. Companies in the advertising business aim to understand their audience as much as possible so that they can provide targeted advertisements.

Do trackers change across time?

Yes. Our results provide a snapshot and show which companies track us when we access a website at a specific moment in time. The third party trackers will change depending on the browser, the location and the time at which the website is accessed.

What data is being tracked when we access websites?

Companies in the data industry can track your online behaviour. Third party scripts (such as JavaScript) monitor your website usage, such as your mouse scrolling down a webpage. When you share the news with a friend, you and your friend will be associated once he or she clicks on the link, thus mapping your network of contacts. The amount of time spent on a webpage, the movements of your mouse and the section of a text that you copy-pasted are examples of data collected by third party scripts. Such data is collected through web analytics, which is the analysis of how a website is used by its audience.

Moreover, every third party tracker collects your IP address and other identifiable data and stores browser cookies, local shared objects and other tracking technologies on your browser. This permits them to keep track of your online habits and behaviour and to create profiles about you.
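
The way a cookie lets one tracker link visits across unrelated websites can be sketched with a small Python simulation. This is a deliberately simplified toy model and not any real company's system: the ToyTracker class, the "uid" cookie name and the .example sites are invented for illustration. A real tracker typically learns which page embedded it from information the browser sends with the request, such as the Referer header.

```python
import itertools
from collections import defaultdict


class ToyTracker:
    """Simulates one third party tracker embedded on several websites."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.profiles = defaultdict(list)  # cookie id -> sites where it was seen

    def handle_request(self, cookie_jar, embedding_site):
        """Handle a request made to the tracker while embedding_site is loading.

        cookie_jar stands in for the cookies one browser stores for the
        tracker's domain.
        """
        if "uid" not in cookie_jar:
            # First contact: set an identifier the browser sends back next time.
            cookie_jar["uid"] = f"user-{next(self._ids)}"
        self.profiles[cookie_jar["uid"]].append(embedding_site)
        return cookie_jar["uid"]


tracker = ToyTracker()
browser_cookies = {}  # one browser, one cookie jar for the tracker's domain
for site in ["news.example", "health.example", "jobs.example"]:
    tracker.handle_request(browser_cookies, site)

# The same identifier now ties three unrelated visits into one profile.
print(dict(tracker.profiles))
```

Because the browser returns the same identifier on every site that embeds the tracker, the visits accumulate under one profile without the reader ever visiting the tracker's own website.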

FAQ: Media websites

Why are you focusing on media websites?

We chose media websites for our first examination because they are commonly accessed by the majority of citizens around the world who have Internet access - regardless of their background, gender, ethnicity, occupation, affiliations and other characteristics. We are interested in exploring how regular daily browsing habits, such as reading the news online, can result in our tracking.

Furthermore, third party trackers can potentially identify a lot of information about individuals based on the type of news they regularly read - such as their political beliefs, economic status, and much more - and create profiles about them.

Why is information about my country missing from the map?

If information about your country is missing from our map, that's likely because we haven't found someone yet to assist us with the review of the list of media websites and/or to run our software from your country. Please help us add information by connecting us with a media expert and/or someone who runs Linux from your country.

Why is my media organisation missing from the map?

We collected lists of media websites with the assistance of local partners. If your media organisation is missing from our map and you would like it to be included in the tests, please contact us at trackmap@tacticaltech.org.

If you found your country's media list in the unverified section and you're a media expert, a journalist or generally have good knowledge of your country's media, you can review the media list through the following steps:

1. Add missing websites which cover the news, are of public interest and which are regularly accessed by most individuals on a national or regional level in your country

2. Delete websites which are not regularly updated, do not necessarily cover the news and are not regularly accessed by most individuals on a national or regional level in your country

3. Separate the following in the list:

National media websites

Regional media websites

Blogs covering the news

Why is my country's list of media websites in the unverified section?

If you found your country's list of media websites in the unverified section, that's probably because it has not been reviewed by a media expert yet.

Why doesn't my country have a list of media websites on github?

If you didn't find your country's list of media websites in the verified or unverified sections on github, that's probably because we have not compiled a list for your country yet. Contact us at trackmap@tacticaltech.org, ask us to add a list of media websites for your country or help us create it.

How can my country's list of media websites be transferred to the verified section on github?

Once your country's list of media websites has been reviewed by a media expert, we transfer it from the unverified section to the verified section on github.

Should I add Facebook pages in the lists of media websites?

No, because Facebook is one of the third party trackers often included in media websites that we are interested in detecting. We are interested in media websites which include the domains and servers of third party trackers, but not in webpages hosted by third party trackers themselves, such as Facebook.

Should the media lists be restricted to citizens accessing them in my country or can they also be expanded to media websites accessed by my country's diaspora?

Preferably, we would like to restrict media websites to ones accessed by individuals residing in your country. However, websites accessed by your country's diaspora can also be included - but that is not our priority.

Should media websites accessed via mobile phones also be included in the lists?

Currently, we are not including websites accessed via mobile phones. However, we hope to expand the project to include those in the future.

What do you mean by "network infrastructure" in the map?

Companies in the "purple countries" of the Trackography map host the network infrastructure required to reach the servers of the media websites you have selected, as well as the servers of the companies which track users through the selected websites. By network infrastructure we mean the satellites, fibre optic cables, switches, routers and international or national Internet carriers.

FAQ: Beyond the media

How did Trackography expand beyond the media?

More recent tests have examined online tracking through websites that fall under the following categories:

Government and Politics

Finance

Health

Society

What other types of websites are included in Trackography's tests?

Various non-media websites have been included in Trackography's latest tests, all of which can be viewed through our repository on github. Such websites cover banks, consultancy, health insurance, government services, human rights, activism, LGBT rights, dating, culture and travel, to name a few.

Why did Trackography expand beyond the media?

Trackography expanded to the examination of online tracking across a broad spectrum of different types of websites to:

increase transparency about which specific third parties are in a position to aggregate tracked data and to potentially create profiles about groups and individuals

illustrate the countries under which our data is potentially regulated following our access to different types of websites

foster a debate about profiling which can occur through aggregated online tracking

If the various types of websites that we regularly access are matched together, one can potentially reach inferences about us. For example, if someone knows that you regularly access LGBT websites, as well as the websites of the European Union and other job-seeking websites, it's not that hard for someone to correlate such data and to reach the inference that you are an unemployed LGBT person based in the EU, right? That may or may not be accurate, but that's not the point. That is an inference that algorithms are likely to reach when the above has been correlated and aggregated.

We expanded Trackography to the examination of online tracking across various types of websites to identify which main companies are in a position to aggregate tracked data and to potentially play a key role in the profiling business.

Which countries have been included in such tests?

We have collected results on online tracking across various different websites - as mentioned above - in India, Thailand and the Philippines, as well as in 17 European countries. More details can be viewed through our repository on github.

Why was India selected for these tests?

India was selected as a case study because we are interested in exploring online tracking in the world's largest (in terms of population) democracy. Furthermore, we are interested in exploring the potential role that the data brokerage industry plays in the global south.

Why was Thailand selected for these tests?

Thailand was selected as a case study because we are interested in exploring online tracking in a non-democratic regime of the global south, which can then potentially be compared with online tracking in democratic regimes in both the global south and north.

Why was the Philippines selected for these tests?

The Philippines was selected as a case study because we are interested in exploring online tracking in the global south and we happened to be in the region for RightsCon 2015.

Why were tests run on governmental websites in the European Union?

Given that governments in the European Union are committed to protecting their citizens' data, we are interested in exploring whether their websites enable third parties to collect data through online tracking.

How can I contribute to the creation of datasets which include various types of websites from my country?

Our software is designed to run on a list of websites to detect the third party trackers and the traceroutes that are performed when we access the websites in the list. In order for our software to be able to run on a list of websites, the list needs to be included in our repository on github.

You can create your own list of websites that you are interested in examining and add it to our github repository. Alternatively, if you are not a github user you can drop us an email at trackmap@tacticaltech.org and we can add it for you.

FAQ: Software

What does Trackography's software do?

Our software is designed to:

perform an HTTP connection (using phantomjs) to every media website under analysis

collect all the third party URLs which are included in the media websites under analysis

perform a traceroute for every URL included in the media websites under analysis

identify the countries which host the network infrastructure required to reach the servers of the media websites under analysis, as well as the third party servers included in the specific websites through a GeoIP conversion of all the included IP addresses

send the results to our server
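
The traceroute and GeoIP steps above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the project's actual implementation. It assumes numeric traceroute output (as produced by "traceroute -n"). The function names are invented, and a small hand-written table stands in for a real GeoIP database. The addresses come from ranges reserved for documentation and the countries assigned to them are made up.

```python
import re

# Matches an IPv4 address in a line of traceroute output.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def hop_addresses(traceroute_output):
    """Extract the first IPv4 address of every hop line."""
    hops = []
    for line in traceroute_output.splitlines():
        if not re.match(r"\s*\d+\s", line):
            continue  # header line ("traceroute to ...") or blank line
        match = IPV4.search(line)
        if match:  # hops that only print "* * *" carry no address
            hops.append(match.group())
    return hops


# Stand-in for a GeoIP database: documentation addresses, invented countries.
TOY_GEOIP = {
    "192.0.2.1": "DE",
    "198.51.100.7": "NL",
    "203.0.113.9": "US",
}


def countries_on_path(traceroute_output, geoip=TOY_GEOIP):
    """Return the countries traversed, in order of first appearance."""
    seen = []
    for address in hop_addresses(traceroute_output):
        country = geoip.get(address)
        if country and country not in seen:
            seen.append(country)
    return seen


SAMPLE = """traceroute to 203.0.113.9 (203.0.113.9), 30 hops max, 60 byte packets
 1  192.0.2.1  1.2 ms  1.1 ms  1.0 ms
 2  * * *
 3  198.51.100.7  9.8 ms  9.9 ms  9.7 ms
 4  203.0.113.9  80.3 ms  80.1 ms  80.2 ms
"""

print(countries_on_path(SAMPLE))
```

Running the same lookup for the website's own server and for every third party URL it embeds yields the set of countries that a single page visit touches.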

Who can run the software?

Any Linux user can potentially run our software. It's quite easy and details about how to run it can be viewed here.

I want to run the software, but I am not a Linux user. Can I?

Unfortunately not (yet).

How can I run the software for all non-media websites?

When running the software on a list of websites under our "special media" category on github (which includes all other, diverse lists of websites, excluding media websites), please run the following:

The software usually requires about 30 minutes to run and sends 8-15 megabytes of data to our server.

The software hasn't finished running and I need to relocate to another location under a different ISP. What should I do?

Freeze the software by pressing control + C. When you return to the same ISP, you can restart the software and resume from where you left it.

Should I run the software over Tor?

No, because our software performs traceroutes which cannot run over Tor. If the software runs over Tor, the web connection would appear from a different network point than the traceroutes and would lead to inaccurate results.

Can I run the software over a VPN?

If you would like to run the software over a VPN, please specify the country of your endpoint in the required field right before you start running the software. For example, if you are based in the United States but your VPN ends in Sweden, please specify the country with "-c sweden". It is also recommended that you add the option "-i".

Once the software has run, should I send the results to you?

No need to. Once you've run the software, the results will automatically be transmitted to our server. If you would like to prevent your collected data from automatically being transmitted to our server, please add the option "-d".