⛅ invi.sible.link

In April 2016 I submitted a funding proposal to the Internet Control Fellowship Program. Over the summer we iterated on a lot of feedback, shaping the one-year project. Below you'll find the project framing, scope and roadmap.

Objectives

Repressive governments have abused “third party trackers” to exploit users or even third parties. For example, malware by FinFisher/HackingTeam was injected by exploiting unencrypted HTTP connections, and the so-called “Chinese Great Cannon” turned users, even international ones, who were accessing a popular Chinese website into bots taking part in a DDoS attack. Considering the recent trend of exploiting unencrypted HTTP connections, or the insecurity of some third party services included in common web navigation, we need a new way to raise awareness. Additionally, we don’t know yet whether third party trackers are deploying more invasive tracking technology to a subset of users. This investigation has never been done, and my goal is to perform research comparing how tracking differs depending on the country of the user and how it changes over time.

The process

The analysis set will initially be the list of the most accessed websites per country and per category, but the testing list will be opened up for community contributions at a later stage.

The host organisation, Coding Rights, is running an extensive anti-surveillance project in Latin America. I picked this NGO to support me on the communication and advocacy side. I don’t need technical support from this organisation; my limit is rather the outreach.

This means explaining what the findings are, who is responsible, and which protection tools exist.

Drawing the line between web tracking for advertising and surveillance is a tricky topic that touches several aspects. I am paying attention to the following:

Through low-level analysis (done by integrating the honeynet tool Thug), the research project will map out which tracking techniques besides cookies are used. The data produced will be open data, allowing other researchers to do their own analysis. This is the key value of the project: enabling an understanding of tracking technology by providing up-to-date information and raw data to support a campaigning effort.

If malware or malvertising spreads through compromised third parties (finding this is unlikely, because I do not have the same capacity as the antivirus industry), or very invasive fingerprinting techniques appear overnight, an “emergency” communication will be dispatched. This has to be done in conjunction with Coding Rights, and the goal would be to raise awareness in the impacted communities and report the actors involved. These intermediate reports will be published and improved during the course of the year. It is very hard for me as an individual to have enough resources to actively look for malvertising campaigns (this is something that even antivirus companies, which have far more resources, struggle with). However, the researchers’ interface will be designed to identify spikes (or other anomalies) as well as other privacy-invasive behavior.

Provide a resource useful to website owners that include third parties so they can act more responsibly and make a more informed decision over what they should or should not be including on their site.

Investigate whether third party tracking changes depending on the country of the user. For example, does an Indian company serve more “aggressive” trackers to a Pakistani audience? Are certain geopolitical factors linked to serving different trackers to users? To date nobody has found any evidence of this occurring, but it is something that can happen, and if found it would be an example of “algorithmic discrimination”.

Having a tool flexible enough to react when new conflicts arise. At the beginning the analysis takes into account the “most accessed websites per country and category”, but the goal is to follow current events and be reactive to what is happening in the world.

The research goals

The innovative aspect of my approach is the deep analysis of the tracking techniques. JavaScript is delivered after certain transformations (uglification, minification) that make it hard to analyze via static code analysis. Using the Thug sandbox, I will profile the JavaScript trackers by behavior. The research produced will show the many shades and ways of doing user tracking. At this historical moment the behavior of the privacy/security community is quite “binary”, as reported in “It’s people vs abuse, not publishers vs adblockers”. As a user you either enable JavaScript execution or you don’t. You either block trackers via AdBlock, NoScript and others, or you permit them. Considering the potentially huge impact on society, whether through passive surveillance or malvertising campaigns, I want to add more elements to the debate.
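As a rough illustration of what profiling by behavior could look like, the sketch below turns a list of browser API calls observed during sandboxed execution into a graded profile rather than a binary block/allow verdict. The input format, the `TECHNIQUES` table, the weights and the `profile_script` helper are all assumptions made for this example; they are not Thug's actual output format or the project's scoring method.

```python
from collections import Counter

# Hypothetical mapping from observed Web API names to tracking techniques.
# The grouping and the weights are illustrative assumptions, not project data.
TECHNIQUES = {
    "document.cookie": ("cookie", 1),
    "localStorage.setItem": ("local storage", 1),
    "navigator.plugins": ("plugin enumeration", 2),
    "HTMLCanvasElement.toDataURL": ("canvas fingerprinting", 3),
    "AudioContext.createOscillator": ("audio fingerprinting", 3),
}


def profile_script(observed_calls):
    """Summarise which techniques a script used and give it a graded score."""
    counts = Counter(observed_calls)
    techniques = {}
    score = 0
    for api, (technique, weight) in TECHNIQUES.items():
        if counts[api] > 0:
            # Record how often the technique was seen; each technique adds its weight once.
            techniques[technique] = counts[api]
            score += weight
    return {"techniques": techniques, "score": score}


# Hypothetical trace of calls, as a sandbox might log them.
trace = [
    "document.cookie",
    "HTMLCanvasElement.toDataURL",
    "document.cookie",
    "Math.random",
]
profile = profile_script(trace)
```

A script that only sets a cookie and one that also draws on a canvas to fingerprint the device end up with different scores, which is the kind of nuance a purely binary approach loses.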

The end goal of the research project is to provide a daily updated database on tracking technology, and to enable researchers and web content managers to understand the security and privacy implications of their third party inclusions.

The mid-term goal is to engage the privacy-aware community to exert pressure on site owners that include highly invasive tracking technologies. Never before have the security and privacy implications of third party trackers been assessed in this way. This represents a new way to express critical and technical judgment on decisions about trackers.

The application under development is a pipeline that supports data collection, analysis, minimization and open data. The open data factor is a key value: I want to enable researchers and analysts around the world to understand the impact of tracking and tracking technologies. It is generally not so hard to understand the technology behind tracking. What is hard to communicate is the broader impact that this phenomenon has.

This data collection will enable my research on tracking and surveillance, and that of others, since every result will be open data.

Having a community of supporters will be necessary to provide “local contextual knowledge” that I cannot possibly have for all the regions involved. This community will also provide lists of websites to test, and will campaign based on the results of my analysis.

Coding Rights will be the first NGO implementing this workflow. This will fit into Coding Rights’ ongoing investigation of online surveillance practices in Mexico, Argentina and Brazil.

Milestones and dates

The project will include the following high-level stages of research and implementation:

Make a list of sites, technically selected and community selected. Have a flexible methodology to keep the sources updated to reflect current events.

Collect data on the sites in question from one (or more) network vantage points. One is enough, but more vantage points mean that a more thorough and in-depth analysis can be performed.

Extend the analysis by including more information about every website (what the JavaScript is doing, how invasive it is, which privacy-harmful behavior is present, how much it can be used for surveillance and by whom, where the data is stored, how much has changed since previous collections, whether the same content appears different from a different place in the world, which company is associated with it if any, and which transport security is used or supported).

Use the analysis to feed a daily updated visualization for researchers, based on parallel coordinates.

Disseminate the results of the research and analysis with help from Coding Rights.
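The collection and comparison stages above can be sketched in a few lines. This is a minimal example under stated assumptions: the list of request URLs is supposed to come from some page-loading tool, the "third party" rule is deliberately naive (any host that is not the page host or one of its subdomains), and the function names and example domains are hypothetical.

```python
from urllib.parse import urlsplit


def third_party_hosts(page_url, request_urls):
    """Return the hosts contacted while loading page_url that are not its own."""
    first_party = urlsplit(page_url).hostname
    hosts = set()
    for url in request_urls:
        host = urlsplit(url).hostname
        # Naive rule (assumption): the page host and its subdomains are first party.
        if host and host != first_party and not host.endswith("." + first_party):
            hosts.add(host)
    return hosts


def insecure_inclusions(request_urls):
    """Return the requests made over unencrypted HTTP."""
    return [url for url in request_urls if urlsplit(url).scheme == "http"]


def compare_vantage_points(hosts_a, hosts_b):
    """Show which third parties appear from only one of two vantage points."""
    return {
        "only_a": sorted(hosts_a - hosts_b),
        "only_b": sorted(hosts_b - hosts_a),
        "shared": sorted(hosts_a & hosts_b),
    }


# Hypothetical requests seen for the same page from two network locations.
page = "https://news.example.com/"
seen_from_a = [
    "https://news.example.com/app.js",
    "https://static.news.example.com/style.css",
    "http://ads.example.net/t.js",
]
seen_from_b = seen_from_a + ["https://cdn.example.org/fp.js"]

diff = compare_vantage_points(
    third_party_hosts(page, seen_from_a),
    third_party_hosts(page, seen_from_b),
)
```

A real pipeline would need a public-suffix-aware notion of "same site" and repeated measurements, since trackers also vary between two loads from the same location.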

Milestones

Having a tool able to emulate client navigation on a specific set of websites, and a working pipeline able to publish results continuously with a few simple visualizations integrated. At the end of this milestone, a website with a visualization of third party presence will be published. No deep JavaScript analysis will be done at this stage. Timing of the 1st milestone: 2 months from the beginning of the project (the software is partially working, at a prototype stage; the visualization and localization are not yet done).

Having an outreach capacity. Assess the “campaign fed by data” concept. Improve the analysis of the injected scripts in order to extract more information. Have more than one observation point, in order to compare the same trackers from two different points in the network. Timing of the 2nd milestone: 6 months from the beginning of the project (a narrative and a campaign strategy have to be implemented, in order to get supporters around the world).

Improve the technical analysis of the JavaScript, completing the integration of the honeynet tool. Taking the Latin American region as reference, complete the “researcher visualization” to enable other analysts to perform their own research. Publish a research paper on the key findings, the methodology, and the tests done during sensitive events.

Timing of the 3rd milestone: to be delivered in the last two months of the fellowship

Firm list of technical activities

Every activity is supposed to fit in one month of work.

1. Improve browser emulation and JavaScript sandboxing by integrating the Honeynet Project’s Thug. Technically, this allows us to get a list of all the JavaScript functions executed, going beyond static source code analysis.

2. Have a data-sharing capability in every node, and look for differences in tracking code between nodes.

3. In-browser visualization of the results, usable to monitor trends or visually identify anomalies.

4. Import the browser history of a person to map their profile of exposure, and support community-driven input (through GitHub files). This approach would allow a more personalized analysis that goes beyond looking at the Alexa top 500 sites for each country.

5. Research into how to identify anomalies and tracking-related functionality based on the dynamic code analysis provided by 1.

6. Research into the privacy implications of tracking and the device fingerprinting used in it.

7. Support Latin American communities running the tool, interpolating their results.

8. Write a research report.

9. Work with Coding Rights in disseminating the results in Latin American communities.

10. Researcher visualization: the difference between this and point 3 is the amount of detail provided.

11. Wrapping up the project and performing last touches and cleanups.
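One simple way to approach the anomaly identification mentioned in the activities above is to flag days whose value sits far above a rolling baseline. The sketch below assumes a daily count of third party inclusions for one site. The `find_spikes` helper, the 7-day window and the threshold are hypothetical choices for illustration, not the method the project commits to.

```python
from statistics import mean, pstdev


def find_spikes(daily_counts, window=7, threshold=3.0):
    """Return the indexes of days whose count is far above the preceding window."""
    spikes = []
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        baseline = mean(history)
        spread = pstdev(history)
        # Floor the spread at 1 so a perfectly flat history does not flag tiny changes.
        if daily_counts[i] > baseline + threshold * max(spread, 1.0):
            spikes.append(i)
    return spikes


# Hypothetical daily number of third party hosts observed on one site.
counts = [12, 13, 12, 14, 13, 12, 13, 13, 31, 14]
spikes = find_spikes(counts)
```

Here only the jump to 31 is flagged. The following day is not, because the spike itself widens the baseline, which is one reason a production version might prefer a median-based baseline.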

Anticipated outputs and outcomes

The surveillance implications of third party trackers have still to be explained to a wider audience. The past year’s debate around “ad blocking” has shown a certain level of misunderstanding of the privacy and security implications of unsafe (non-HTTPS) third party inclusions.

Beyond the technical findings, the large-scale visualization will improve the understanding of script blocking, JavaScript integrity and transport security.

As a security outcome for activists and journalists, the outreach goals of this project are a better understanding of the attack vector and pointers to countermeasures.

Note: TacticalTech, my former employer, has no role in this fellowship. I think the whole project would also be beneficial to the past product Trackography, currently maintained by TacticalTech, since most of my results would be published as open data.

For this project, I will use another domain name to present the results.

Why is the selected host organization best suited to mentor your project?

Coding Rights has been doing research, advocacy and awareness raising on privacy and surveillance practices in Latin America, particularly through antivigilancia.org (with content available in Spanish and Portuguese). I will integrate my results with their communication. On my engagement, Coding Rights said: “We consider that the expanded version of Trackography could be a great tool for visually translating privacy rights into clear abuses in our daily transfer of data while simply browsing. And it particularly fits with our project on story telling entitled “Unveiling Surveillance Practices in Latin America”, which will be a platform/repository for investigations and storytelling experimentation on surveillance and privacy rights.”

What do you expect to be the primary outcome of this project for a general audience?

An improved understanding of tracking techniques: having a worldwide assessment of the tracking systems that exist besides cookies.

Report the most abusive behavior and stimulate a technical, critical judgment of third party trackers. At the moment website owners choose carelessly which scripts to include on their site. Through this project we hope to raise global awareness of the significance of making an informed choice on the matter.

How will you collaborate with other researchers working in this field?

Princeton University conducted a study of 1 million websites, but thanks to my previous experience I know that trackers change quite fast. Researchers shouldn’t use static data as a reference. The Princeton research also doesn’t consider all the subtle ways trackers can fingerprint users. In my case, with the integration of Thug, I can provide a more detailed analysis that other researchers in this field can reuse.

In theory, I’m operating in a field where at least four kinds of researchers may be interested:

Policy analysts: to assess whether Terms of Service, EULAs and international policies are aligned with the state of the art.

With OONI project lead Arturo Filastò, I discussed the possibility of an integration with the Raspberry Pi network deployed by OONI. This could allow the use of many vantage points in different networks. This is a viable hypothesis, but it should be explored only if the vantage points become essential to the comparative analysis.