The goal of this issue is to describe a privacy challenge in Piwik: the ability to spy, over time, on users tracked in Piwik.

What is the Visitor ID?

The unique visitor ID is a 16-character hexadecimal string. Every unique visitor is assigned a different ID, and this ID does not change once it is assigned.

It is stored in a first-party cookie. The ID is renewed 13 months after the user's first action.

The Visitor ID is stored in the Piwik database in the field idvisitor
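To make the shape of the Visitor ID concrete, here is a minimal Python sketch. It is not Piwik's code: the function names are hypothetical, the ID is drawn from a random source as an assumption, and "13 months" is approximated as 393 days.

```python
import secrets
from datetime import datetime, timedelta

# Assumption: "13 months" approximated as 393 days for this illustration.
COOKIE_LIFETIME = timedelta(days=393)


def new_visitor_id() -> str:
    """Return a 16-character hexadecimal string (8 random bytes)."""
    return secrets.token_hex(8)


def needs_renewal(first_action: datetime, now: datetime) -> bool:
    """True once the lifetime counted from the first action has elapsed."""
    return now - first_action >= COOKIE_LIFETIME
```

The point of the sketch is that the ID carries no information by itself; it becomes sensitive because it stays stable for the whole cookie lifetime.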

What is the fingerprinting hash?

When tracking a new user, Piwik computes a fingerprint hash for this user. The hash is built from a list of user attributes such as IP address, screen resolution, and browser plugins (this is done in the method getConfigHash). The Piwik Tracking API uses the fingerprint hash to try to record actions in the correct user visit. It is used only when the Visitor ID (in the first-party cookie) is not found; otherwise the Visitor ID is used by default.

Notes about how the fingerprint hash is created:

The fingerprint hash is currently seeded with a salt that is different for each Piwik instance.

(this ensures that the same person tracked in multiple Piwik instances cannot be cross-matched across those instances.)

The fingerprint hash is also seeded with the Website ID (done in #6824)

(this ensures that the same person tracked on several websites within one Piwik instance cannot be cross-matched across those websites.)

The fingerprint hash is stored in the Piwik database in the field config_id
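The seeding described in the notes above can be sketched as follows. This is an illustration in Python, not the actual getConfigHash implementation: the function name, the attribute keys, the choice of SHA-256, and the 16-character truncation are all assumptions.

```python
import hashlib


def fingerprint_hash(attributes: dict, instance_salt: str, site_id: int) -> str:
    """Hash visitor attributes together with a per-instance salt and the site ID.

    Hypothetical helper: attribute names, SHA-256, and the 16-character
    truncation are assumptions made for this sketch.
    """
    # Sort keys so the same attributes always serialize the same way.
    parts = [f"{key}={attributes[key]}" for key in sorted(attributes)]
    material = "|".join([instance_salt, str(site_id)] + parts)
    return hashlib.sha256(material.encode("utf-8")).hexdigest()[:16]
```

Because the salt and the site ID are part of the hashed material, the same browser yields unrelated hashes on two Piwik instances, or on two websites within one instance.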

Privacy challenges

Imagine, for example, that a Piwik database is seized by ex-colleagues of Edward Snowden (spies) who would like to use the Piwik data to spy on the users who were tracked in Piwik.

When a Piwik database is seized:

if IP anonymisation is not enabled, the Piwik DB will give spies the complete trail of user actions on the website for a given 'IP address' or 'Visitor ID'

if IP anonymisation is enabled, the ability to spy is a bit more limited. The Piwik DB will give spies the complete trail of user actions on the website for a given 'Anonymised IP address' or 'Visitor ID'

Note: once #5907 is implemented, spies will no longer be able to get the complete trail of user actions for a given 'Anonymised IP address'. Why? When IP anonymisation is enabled, this IP address will be hashed with a daily seed, so the fingerprint hash, which uses the Anonymised IP address, will also change every day for a given user, preventing spying over time.

Spies can always look up all actions for a given 'Visitor ID', assuming:

the user had First party cookies enabled.

the user was using the same browser over time, and did not delete the cookies

Spies can look up all actions for a user who uses a particular browser, and/or a particular OS, and/or a particular set of plugins

(Piwik stores the browser, OS and plugin info in the tracking log tables)
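The daily-seed idea mentioned in the note about #5907 above can be illustrated with a short Python sketch. How #5907 will actually be implemented is not specified here, so the function name, the use of HMAC-SHA256, and the way the daily seed is derived from a server secret are assumptions.

```python
import hashlib
import hmac
from datetime import date


def daily_ip_token(anonymised_ip: str, secret: bytes, day: date) -> str:
    """Hash an anonymised IP with a seed that changes every day.

    Hypothetical helper: deriving the seed from a server secret plus the
    calendar date is an assumption made for this sketch.
    """
    # Derive the seed for this day from the server secret.
    seed = hmac.new(secret, day.isoformat().encode("utf-8"), hashlib.sha256).digest()
    # Hash the anonymised IP under that day's seed.
    token = hmac.new(seed, anonymised_ip.encode("utf-8"), hashlib.sha256)
    return token.hexdigest()[:16]
```

With such a scheme, the same anonymised IP maps to a different token each day, so rows stored on different days cannot be joined on this value alone. The Visitor ID would still link them, which is why it is listed separately above.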

Improve privacy

Since our goal is to improve privacy by default for users being tracked in Piwik (#6160), we wanted to explain how this works.

Note that to improve Privacy in your Piwik server and prevent long term surveillance of users via the Piwik database, you can already do the following: