WordPress is leaking user/blog information during wp_version_check()

Description

Hi,
We've noticed that WordPress sends the number of users and blogs in a given installation during the GET request to api.wordpress.org, together with the installation URL in the headers.

Is there any reason why this is done? It seems like quite a leak of information. Can it be turned into an option that defaults to off, so that admins can opt in if they want to report how many users/blogs are currently there?

Thanks.

PS: Slightly related: WP also leaks which blog in MU mode is requesting any URL, via the user-agent set in the WP_Http class (for example, while updating the news feed on the dashboard).

Thanks for the quick reply. A question still stands, though: can it be turned into an option to be opted into by administrators? While it is clear that WordPress is checking for updates, it isn't clear that it is also leaking how many blogs and users are on the current installation.

Thanks for providing a patch; I could review it. It would probably be more fitting to name the filter wp_version_check_query_variables instead of update_core_data_send, as it does not modify the core data sent with the request headers.

I'm closing this again. The call to api.wordpress.org was discussed extensively prior to [14010] being committed to trunk, including being an agenda item at at least one of the weekly dev meetups in IRC.

The fix is backwards compatible and the privacy concerns were not resolved in [14010].

There are plugins available already which allow you to disable the checks if you don't want to send the data.

We are not going to add any UI option for this, and I don't see that we need a new filter either, as plugins have already successfully been created using the filters/actions we already have.

Please be so kind and leave the ticket open until it's solved. I'm sure that for you personally this is all okay but this should be about the common user.

Toscho is right that the privacy concerns were not resolved. That they have been discussed in the past has obviously not done much so far to increase awareness within the WordPress core team.

I think the problem is that most users are not aware which of their data is spread to which third parties and for what reason.

And if this weren't such an important topic, I'm sure it wouldn't keep coming up again.

Even if a user knows that some data needs to be passed for a version check of core, plugins or themes, the amount of data passed to the remote server is obviously more than needed to do the version check. It has already been written in this ticket that the additional data gets passed for stats.

But users should be made aware upfront, so they can freely decide on their own whether they want to, instead of being forced to support the project with their usage data. They could be offered an opt-in to do so.

But instead, you're promoting that users that have no clue what a plugin is should search through many of them and probe some until they luckily find one that prevents leaking their data. It's more likely that they can not even verify if a plugin is doing what it announced. So the only safe bet is to have that as part of the application itself.

WordPress does not offer such an option, and the privacy settings page in the backend is not informative at all about this issue. The installation screens do not contain a single word about this either.

Maybe I've just overlooked it, but where is the information available about which data gets transferred to whom, for what reason, and how this can be prevented? Please keep in mind that this is about the average user and not plugin coders who have no problem removing such a check within a minute.

Let's be more constructive here. Perhaps a statement could be created so users can learn more about the privacy issues when downloading from wordpress.org.

Additionally, I'm interested to learn about the reasons for not offering an option for submitting stats.

1) WordPress should be more open about what data they collect and why.

2) WordPress should offer a checkbox, perhaps on Settings -> Privacy, that gives the user the option to opt out of sending unnecessary information; the default setting can still be to send statistical information.

Privacy nowadays is a concern for many users; transparency and options are important. A plugin to do this is not enough, as many users do not know that this data is even sent.

@investici, @toscho, @hakre: +1 Having plugins which can solve a task means nothing if you're not aware of the fact that you're sending data. If this doesn't get into core as an option, imo wp should note that on update.

@toscho: thanks for your patch, it is a step in the right direction. Any plans for review/inclusion in core?

whether this bug is fixed by that patch is debatable though, we'd much prefer having the users opt-in before leaking so much information or at least make the users/admins aware of the fact (and the reason why?) that information not related to check for updates is sent while checking for updates.

what is the best way to get consensus (and a fix) for this from the wp developers?

whether this bug is fixed by that patch is debatable though, we'd much prefer having the users opt-in before leaking so much information or at least make the users/admins aware of the fact (and the reason why?) that information not related to check for updates is sent while checking for updates.

I agree, opt-in via wp-admin/options-privacy.php would be much better. But seeing how strongly some people are against more privacy I thought my patch could be a first compromise. Not good enough – but better than nothing.

I would write a patch for a user controlled opt-in per backend if we find a consensus.

I know this ticket hasn't got any traction for quite some time, but I think a filter here could still be useful and make it easier for people to change query args. 16778.diff adds a core_version_check_query_args filter.
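A minimal sketch of how a plugin might use such a filter, assuming the patch is applied and that the filter passes the query arguments as an associative array. The key names `users` and `blogs` and the function name are assumptions for illustration, not confirmed in this thread:

```php
<?php
// Assumes the core_version_check_query_args filter proposed in 16778.diff exists.
add_filter( 'core_version_check_query_args', 'myprefix_strip_version_check_counts' );

/**
 * Drop the user and blog counts from the version-check query arguments.
 *
 * @param array $query Query arguments about to be sent with the version check.
 * @return array The same arguments without the (assumed) 'users' and 'blogs' keys.
 */
function myprefix_strip_version_check_counts( $query ) {
	// unset() is a no-op for keys that are not present.
	unset( $query['users'], $query['blogs'] );
	return $query;
}
```

Dropped into a small plugin or mu-plugin, this would leave the version, PHP and database fields untouched, so the update check itself should still work.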

@investici: "close" means the ticket is a candidate for closure with a disposition other than fixed. So no, there's no such filter in WordPress right now. I re-added the keyword because of a lack of traction and no high demand for such a filter.

I agree with half of your statement: 'low traction'. This ticket has basically been ignored for 6 years. On what basis are you presuming there is no high demand? I manually patch WordPress myself after every new release to put this into the code. I accept I'm only one person among millions using WordPress, but then the average user is blissfully ignorant of privacy, otherwise we wouldn't have so many DRDoS web attacks.

WordPress does not publish (to my knowledge) any statement of what data are collected about my site(s) nor what that data is used for or how long it is kept. The current code doesn't allow me to opt out of sharing any information either.

WordPress is an open source project. Unlike closed source projects, you can freely read and edit the codebase and see exactly what is sent, or learn about how different parts of the project operates.

Additionally, the WordPress project maintains an open information section, similar to Wikipedia, where anyone can contribute new documentation or information about the platform that, to a reasonable extent, would be useful to other users. As such, you're free to create a page for this, and the instructions for doing so are here: https://codex.wordpress.org/Codex:Creating_a_New_Page. It would likely be categorized here: https://codex.wordpress.org/About_WordPress. As a volunteer-based project, there's no group that's "responsible", so to speak, for creating content of any sort for WordPress.org. The best way to ensure that things get done is often to do them or spearhead them. This could be something to consider, as it doesn't appear that any of the other volunteers who work on the project have had the interest in doing so for this topic. As a place to start, the data is stored by WordPress.org for calculation purposes for 48 hours, and then discarded.

There is a balance between having too much and too little information about usage, and what that entails. There are groups on both sides of this. This ticket mostly comprises users who would like (from my understanding of reading the comments) less information to be sent back. There are other tickets that want WordPress to collect more information than it currently does (generally with the argument that WordPress needs to know more about its users to make better software, OR alternatively that the WordPress core developers aren't collecting enough information to base decisions on). This is a balancing act between privacy and practicality.

As for this ticket, WordPress is now used by almost a quarter of the internet, and since 6 years ago a total of what appears to be just 6 (quick count on my part; could be off +/-2) have expressed interest in a filter for this. Aside from the performance implications of calling apply_filters(), which while small are still a consideration, there are also WordPress's core philosophies of "Design for the majority" and "The Vocal Minority": https://wordpress.org/about/philosophy/. It is unlikely that of the many tens of millions of active WordPress installs more than a handful would actually use this filter. Furthermore, introducing new filters has to be done with care, particularly out of consideration for future development. Does a filter here prevent WordPress from being able to achieve future goals due to backwards compatibility concerns? Probably not, but again another thing to consider.

Finally, there is already an applicable WordPress filter that can be used to achieve the same result: http_request_args, where the existence of the wp_install header (which is exclusively used on wp_version_check() calls in WordPress) could be used to filter the information from the body.

Given there's a way of filtering this data already, and there's a lack of significant interest, closing this ticket. As a reminder, ticket conversations can continue while the ticket is in a closed status.

I'm not sure I'd be the best person to start writing a new 'privacy' page as 1, I don't have all of the information about what happens to the data on server end (which is perhaps why I'm so concerned about my privacy) and 2, it would probably be worded in an incredibly negative and potentially damaging way!

The other concern here is that you suggest that the http_request_args filter can be used for the same purpose. I disagree (but am happy to be corrected). That filter takes 2 parameters, $r and $url, but only the former can be amended during that filter call, so where and how can I amend the private information added to the $url requested? That's what this patch addresses.

I've drafted a privacy page and then checked in on the Slack #docs channel. They advised that the Codex is being shut down soon, so where should this go now? They suggested attaching it to this ticket, but it needs to be findable by WordPress users.
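To illustrate the limitation described above, here is a hedged sketch of what an http_request_args callback can reach: the request arguments (headers, user agent, body), but not the URL. The header names wp_install and wp_blog and the site URL in the user agent come from this thread; the URL substring used to recognize the request, the replacement user agent and the function name are assumptions:

```php
<?php
add_filter( 'http_request_args', 'myprefix_trim_version_check_args', 10, 2 );

/**
 * Strip site-identifying request arguments from core version-check requests.
 *
 * @param array  $r   Request arguments (headers, user-agent, body, ...).
 * @param string $url URL being requested. Passed for inspection only.
 * @return array Possibly modified request arguments.
 */
function myprefix_trim_version_check_args( $r, $url ) {
	// Assumption: version-check requests can be recognized by this URL fragment.
	if ( false === strpos( $url, 'api.wordpress.org/core/version-check' ) ) {
		return $r;
	}

	// Remove the headers that carry the site URL.
	if ( isset( $r['headers'] ) && is_array( $r['headers'] ) ) {
		unset( $r['headers']['wp_install'], $r['headers']['wp_blog'] );
	}

	// Replace the user agent so it no longer includes the site URL.
	$r['user-agent'] = 'WordPress';

	// Only $r is returned to the caller. Changing $url here would have no
	// effect, so values in the query string cannot be removed by this filter.
	return $r;
}
```

This covers the headers and user agent, but it leaves untouched anything that is appended to the requested URL, which is the gap the patch is meant to close.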

This strikes me as something that would be trivially fixed by adding a sentence to wp-admin/about.php

I understand that adding checkboxes and steps is something people don't want to add, and I agree, the only suitable place for such a UI is during install and on update.

Stating what information gets sent to .org and why should only take a short paragraph of text at the bottom of the about page. If we can add an entire page talking about Freedoms I think we can write a short privacy statement.

Here's a suggestion:

Note: WordPress may send statistics to WordPress.org when requesting updates. This is to help plan and improve future updates.

With perhaps a "For more information, click here" that leads to a .org page

I like your idea but fear that it won't gain any traction and also perhaps isn't as straightforward as it first seems.

If you take the principles of individual data privacy, all collected data needs to have a stated purpose. If the assertion is that the data are collected as you say to 'plan and improve future updates' why is there a need to know how many users and blogs are running? I cannot think of any justification for collecting this data.

It saddens me to read through this ticket and notice the general unwillingness to improve.

Let me start out by saying that the number of registered users I have on my site, tied to the URL that is sent with the tracking request, gives out vital information on how well my business could be doing. Information that is mine and mine only.

If this is really used to "help plan and improve future updates" then there are much more privacy friendly ways to go about this. At the very least we could make it very clear that WordPress is tracking this information and what exactly it is doing with it, I really do not think there is any excuse for that.

We would not opt-in to usage tracking in a plugin without knowing what exactly it tracks. WordPress doesn't have to play by this rule as the download is the opt-in, but let's at least make it super clear what we're opting into then.

This becomes even more important as the collected data is not visible to us, lone contributors outside of a8c. All we have is your word.

As for this ticket, WordPress is now used by almost a quarter of the internet, and since 6 years ago a total of what appears to be just 6 (quick count on my part; could be off +/-2) have expressed interest in a filter for this. Aside from the performance implications of calling apply_filters(), which while small are still a consideration, there are also WordPress's core philosophies of "Design for the majority" and "The Vocal Minority": https://wordpress.org/about/philosophy/. It is unlikely that of the many tens of millions of active WordPress installs more than a handful would actually use this filter. Furthermore, introducing new filters has to be done with care, particularly out of consideration for future development. Does a filter here prevent WordPress from being able to achieve future goals due to backwards compatibility concerns? Probably not, but again another thing to consider.

This is a very oversimplified way of looking at things. Just because only 6 people replied to this Trac ticket does not mean that no one else has an issue with this. WordPress sending the number of users your site has is undocumented behaviour which you would only know of by going through the WordPress source code, and we both know that the majority of WordPress users never do this. Furthermore, you are comparing "a quarter of the internet" vs "the # of Trac users". Certainly a quarter of the internet is not using Trac.

Wrapping up: the very least we could do to improve is to document this behavior and to create a page on what data exactly WordPress is collecting, and why.

People should know without having to go through each line of code in WordPress one by one, so they can make an informed decision on whether they want this or not. Alternatively, WordPress should quit saying stuff like "own your data", because apparently you don't.

Totally agree with @DvanKooten. Awareness is key here.
The majority of users don't know this, and they should. Best solution IMHO is an opt-in setting in the General settings to share this data with wordpress.org.

Completely agree, also agree that this should be opt-in, not opt-out. If a user *wants* to share this information, they should be able to do so, but by default sites should not be phoning home and sharing this information.

I just found out about this ticket a little while ago and I have a SIGNIFICANT problem with the amount of personal and detailed information about my blog and my users being relayed back to Automattic. I manage 50 some-odd WordPress sites, all on private hosting, and this is unacceptable.

I'm hoping this can be fixed with a Plugin to strip out this data if WP core refuses to take this out (or at the very least make this optional).

Not my place to step on any legal team toes but what steps are being taken towards GDPR compliance? WP will need to publicly clarify all data collection as well as the legal basis behind it in any case.

Rather than thinking in terms of general, wooly concepts of data collection, you need to be working towards compliance with that specific framework.

Automatic WordPress updates are still something that many organizations avoid at all costs (or even pick another CMS/WCM over) due to the lack of control and the inherent dependency on a 3rd party (WordPress.org in this case).

Being transparent about what internal data is being sent externally is mandatory. Handling an undefined set of data probably falls into a legal gray area within the EU, possibly some USA states, China, many Arab countries and others that actually care about data privacy.

I like the patch sent by @toscho many years ago as the safer option (or a Settings -> Privacy checkbox), together with an update of https://wordpress.org/about/privacy/ that @dd32 shared here, with a more detailed list of internal data items transferred over the web.

The bold statement above regarding "reading and editing the codebase" completely fails to comply with "Democratize Publishing". The point of a Content Management System is that it doesn't require IT intervention at all times.

Note that sending the site URL (user agent and wp_blog header) along with these checks makes every WP installation vulnerable to targeted malicious updates. It is even possible that that has happened already: There are gag orders in the US making it impossible for the .org site admins to deny such a scenario convincingly. So we have a bad situation for both sides. Reducing the data and offering an opt-in would really help.

Pending action in the core code that may or may not happen I've created some code after many hours of messing about logging and blocking all requests and come up with a few functions that reduce the leaking of data. Apologies it's not well documented in what it is doing at the moment and there may be more in there than you need (like blocking auto-updates) but if you are concerned already you are free to use my code:

I just created a simple plugin for this as well, although it only strips off the number of users from the version check request for now. It's on GitHub here: my-precious. It also does not get rid of the auto-update functionality, which is super valuable IMO and makes for another discussion altogether. :-)

Not my place to step on any legal team toes but what steps are being taken towards GDPR compliance? WP will need to publicly clarify all data collection as well as the legal basis behind it in any case.

I would note that this information is being sent to WordPress.org, not Automattic. WP is an open-source community project, not an Automattic product

I'd also note that an opt in is going to be much more complicated to implement as the immediate result is no stats or a prompt on update, both of which are bad. WP just needs to state what it sends and where, and we should be doing this anyway if only for documentation purposes

Your bold text misses one valuable point. I agree that WordPress needs to tell users what information is being sent and where to, but users also deserve to be told exactly why, and for each individual piece of data collected.

... you can freely read and edit the codebase and see exactly what is sent, or learn about how different parts of the project operates.

The "you" used here seems to imply that everyone has the knowledge, time and willingness to inspect the entire WordPress codebase prior to installing or upgrading it. It also contradicts the "Design for the majority" philosophy you quoted later.

... the data is stored by WordPress.org for calculation purposes for 48 hours, and then discarded.

That is enough to warrant disclosure. People need to know what you are collecting, if it is anonymous and for what purposes you use that data.

I don't know the full details of that, and I'd wager a lot of other WordPress users do not either.

You speak of editing the Codex, but:

How does someone know what to add to the Codex, if one doesn't know what you do with the data?

How will the ordinary WordPress users come to know of it PRIOR to installing or upgrading?

That is a problem and the reason @investici opened this ticket six (!!) years ago.

Also, keep in mind that if the data is not entirely anonymous, then in addition to disclosure, WordPress.org will also be required by the upcoming EU GDPR (2018) to allow WordPress users to opt-out from this data collection, as that regulation will also apply to non-EU organisations.

As for this ticket, WordPress is now used by almost a quarter of the internet, and since 6 years ago a total of what appears to be just 6 (quick count on my part; could be off +/-2) have expressed interest in a filter for this.

Has it occurred that this may have been due to the lack of information to begin with? Had I known about it when I started using WordPress (2008), then I would have certainly chimed into this debate then too.

Aside from the performance implications of calling apply_filters(), which while small are still a consideration

To sacrifice privacy or security over performance sets a very, very dangerous precedent. I certainly hope this is not the case for other parts of the WordPress codebase.

I wholeheartedly agree with @DvanKooten's closing statement, and would like to repeat it in closing:

the very least we could do to improve is to document this behavior and to create a page on what data exactly WordPress is collecting, and why.

I would note that this information is being sent to WordPress.org, not Automattic. WP is an open-source community project, not an Automattic product

That doesn't matter for the user. It is an external institution.

I'd also note that an opt in is going to be much more complicated to implement as the immediate result is no stats or a prompt on update, both of which are bad. WP just needs to state what it sends and where, and we should be doing this anyway if only for documentation purposes

It is clear that the exact version numbers of PHP, the database and WordPress itself are needed to generate a useful response. The rest needs to be removed. And even then the user should be made aware of the fact that these data are sent.

As for this ticket, WordPress is now used by almost a quarter of the internet, and since 6 years ago a total of what appears to be just 6 (quick count on my part; could be off +/-2) have expressed interest in a filter for this.

Has it occurred that this may have been due to the lack of information to begin with? Had I known about it when I started using WordPress (2008), then I would have certainly chimed into this debate then too.

Working with WP since 2006, I wrongly assumed that core developers would care about user privacy, and I wrongly assumed that the development process was open and trustworthy for everyone. Now that I know I can't really trust either anymore, I'll take action on my side to limit this issue.

The suggested solution - a simple filter - would have been enough, but six years without the intention to solve this issue is a show stopper for me and my business partners as well.

On a side note: I really trust @investici and their work, as many other people in the EU and elsewhere do, because they have taken action to protect digital rights and user privacy from day one. Feel free to have a look at one of their projects to know more: http://www.autistici.org/en/index.html

Like @pixline, I too didn't care about this until now, assuming that there wouldn't be any concerns/issues about user privacy. Not sure how the number of registered users or blogs would help, especially during version check.

A checkbox in the Settings -> Privacy page along with a page on what data WP sends back would be ideal.

I have a major problem with any of the software that I use at work that "calls home" (for lack of a better term). I run WordPress Multisite on three different networks, two of which are completely closed. I can't have WordPress reaching to .com or .org and timing out. Please let me opt-in for stuff like this.

I have been a WP user since 2008, and I remember privacy issues being raised repeatedly for just as long, in numerous channels from blog posts to Trac to the stage of WordCamp Europe. More often in the context of plugin/theme updates, which have relatively more impact than multisite stats.

"Silent collection of private data is fine": no, it's not.

"WP org is run by good guys and that makes it ok": it does not.

"There is a way with the HTTP API": as the author of the Update Blocker plugin, I can say that "solution" is ugly, unreliable, and extremely inconvenient. Moreover, at one point in history WP changed the API format, which broke every existing implementation of such filters.

I do grasp the context and challenges of such an old and organically grown system.

But WP should either treat such concerns with respect and attention they deserve or get off the corpse of its "own your data" high horse.

Not sure how many here know this or have participated in the old discussions on Gravatar in #14682, which was closed last year as wontfix again. In my mind everything here and in that ticket converges on what view of privacy the project should have. Currently the view is that users should educate themselves and the project has no responsibility, and that in my mind is not a good way to handle things.

Not sure how the number of registered users or blogs would help, especially during version check.

Specifically, this code was imported during the merge of WordPress to WordPress-MU, when multisite was added. The reason for this particular data being sent has to do with the possible necessity of providing alternate upgrade paths for large installations.

When in multisite mode, WordPress stores each site in the network as a separate set of tables. For anybody who has done an update on such a site, you'll know that there is an "Update Network" process which runs through each of those tables performing any changes to them that may have happened in the update. These can be as simple as running some queries to update a few rows in the options table, or entire schema adjustments. While this process is fine for smaller installations, it won't work on truly big installations. Too slow, basically.

The "number of blogs" is thus sent back in case there is some major update to the schema that might require the dev team to create a separate update path. It's one thing to have a couple of blogs do some schema changes, but multisite instances exist with hundreds of thousands of sets of tables. Those sites might need special handling at some future point. It has not happened yet, but it is certainly possible.

The "number of users" information is for the same reason. Large multisite instances can have tens of millions of rows in the users table. Schema changes to that table can literally cripple sites and bring them to a screeching halt. Such large updates might need special handling if the users table schema changes (and that has happened before).

So, these checks exist in the API just in case an update ever needs to be created specially for larger instances of WordPress. They were added to the update checks solely for that reason. They're not there specifically for data gathering purposes.

If you wish to filter the data for privacy purposes, then you can do so and it will not affect the update process for small WordPress installations. At present, there is not a secondary update path for large installations, but that does not preclude the possibility of one occurring in the future.

Let me just add here that the pre_http_request filter does not seem to be entirely sufficient, as any code hooking into it has to fire off a request of its own, meaning a request can only be pre-fired (or pre-empted) once. So if two plugins are short-circuiting the request like this, a request is effectively fired twice and the entire point is defeated.

@DvanKooten If you're wanting to not send the number of users or blogs back as part of the update request, I would recommend using the pre_site_option_blog_count and pre_site_option_user_count filters to simply return whatever values you want. To make it only apply to the WP update check, I'd use the wp_version_check action hook to attach these filters. Like so:
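A minimal sketch of that approach, written from the description in this comment and not copied from it. The function name is hypothetical, and whether these filters also cover single-site installs is an assumption that depends on how the WordPress version in use counts users:

```php
<?php
// Priority 1 so this runs before wp_version_check() itself, which the thread
// says is attached to the same hook.
add_action( 'wp_version_check', 'myprefix_hide_counts_from_version_check', 1 );

/**
 * Make the blog and user count options report zero for the rest of this request.
 * The filters are only attached when the wp_version_check action fires.
 */
function myprefix_hide_counts_from_version_check() {
	// A pre_site_option_* filter that returns anything other than false
	// short-circuits the option lookup; __return_zero() returns 0.
	add_filter( 'pre_site_option_blog_count', '__return_zero' );
	add_filter( 'pre_site_option_user_count', '__return_zero' );
}
```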

Essentially this preempts the data sent back, making it send zeros for that data instead. By hooking to wp_version_check action with a priority of 1, your actions connect before the data is retrieved in the wp_version_check function, and won't be connected the rest of the time (like when you're looking at the network dashboard). Since the wp_version_check action is fired via wp-cron, it's not fired in the main web process at all, and thus can't affect anything else.

Multisite instances with hundreds of thousands of sets of tables aren't going to be automatically updating, are they?

They certainly should be, yes. And to be clear, the alternate upgrade path I was referring to would be a case where we prevented update notifications from being sent to sites with large number of tables or users. That's why the info is needed: Because a case might exist where we don't want to send update notifications to those sites, in favor of a more manual upgrade path.

In any case, this is all orthogonal to the main issue, which is user data privacy.

As you like, I was only providing information as to the question of why this data is included in update checks. It's there because it is relevant to whether the API sends update information or not.

Thank you for taking the time to clarify matters for the ticket and at the very least explain the 'why' part.

As already indicated in the sentiments above, however, the greater issue of data privacy still exists, and being able to explain what data are collected, what for and why comes a little too late to the discussion. Privacy needs to be addressed in full consideration of pertinent laws and users' wishes, and I'd suggest this is one area where the wishes of the minority need to be heard. This isn't going to go away.

Experience shows the information is not needed. Gathering info just because it might turn out to be useful, with no idea when, is just not cool.

I am not a lawyer but most likely it violates EU privacy laws or directive, if not by letter then by spirit.

It still gives Matt an insight on the business of competitors, which is very not cool. Obviously the conflict of interest has a bigger impact than just this ticket, and should not be the reason to do anything, but it is worth keeping it in mind.

When we're talking about the data being passed it's important to clarify whether it contains personal information or identifiers. Aggregated and de-identified data is not in violation of European laws or directives, although users should still have a right to opt out of it.

GDPR is a fresh opportunity to build in better privacy structures and legal certainty. Although it is a European law, it creates a very healthy baseline for all users (see, for example, yesterday's piece on tracking data which European Uber users have a legal right to see but US users don't). Everyone needs to be working on implementations for their own businesses and sites in any case ahead of deadline day, in addition to any changes that need to be made in the WP code. Start there by working towards specific requirements for GDPR compliance, rather than being sidetracked by a general discussion of ethics.

Suggest we loop in the A8c legal team on further discussions of this as they'll be needing to appoint a DPO as part of GDPR, and these questions will be part of that person's job.

Essentially this preempts the data sent back, making it send zeros for that data instead. By hooking to wp_version_check action with a priority of 1, your actions connect before the data is retrieved in the wp_version_check function, and won't be connected the rest of the time (like when you're looking at the network dashboard). Since the wp_version_check action is fired via wp-cron, it's not fired in the main web process at all, and thus can't affect anything else.

Interesting piece of code - however I cannot find anywhere in the WordPress core files where do_action( 'wp_version_check' ); gets called. Can you confirm where and when that hook gets fired?

I can see in wp-includes/update.php where the wp_version_check() is added to the action, but if that action never gets called how is it going to work?
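For anyone wanting to check this on their own install, here is a minimal sketch. It assumes (as suggested earlier in the thread) that the hook is fired by wp-cron as a scheduled event, by name, rather than through a literal do_action( 'wp_version_check' ) call in core. wp_next_scheduled() and wp_get_schedule() are existing core functions; the example_ function name is made up for illustration.

```php
<?php
/**
 * Report when the 'wp_version_check' cron event is next due.
 *
 * Returns null when WordPress is not loaded or when no such
 * event is scheduled. Hypothetical helper, not part of core.
 */
function example_describe_version_check_schedule() {
	if ( ! function_exists( 'wp_next_scheduled' ) ) {
		return null; // Not running inside WordPress.
	}

	$timestamp = wp_next_scheduled( 'wp_version_check' );
	if ( false === $timestamp ) {
		return null; // Nothing scheduled under that hook name.
	}

	return array(
		'next_run'   => gmdate( 'Y-m-d H:i:s', $timestamp ),
		'recurrence' => wp_get_schedule( 'wp_version_check' ),
	);
}
```

If the assumption holds, this should return a next run time and a recurrence name, which would mean the cron runner is what triggers the hook when the event comes due, and the add_action() in wp-includes/update.php is what connects it to wp_version_check().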

I've been having a look at what other providers' positions are on this situation.

Joomla! has an opt-out on first login, it seems. Drupal explicitly covers it in their privacy policy, although I disagree with their assertion that collecting data linked to an IP address is not personally identifiable!

Just sharing my support +1 for fixing this as "opt-in" and a UI setting for non-technical users for the reasons and rationales by @idea15 @mark-k @andreasnrb @NathanAtmoz @Rarst @roberteessels and others.

And politely giving my cents:

From what I could grasp from previous conversations, it's less about technical reasons (*) and more about decisions. So, besides the reasons stated by others, I would add that this has the potential to become another #wpdrama really quickly (maybe even beyond the WP community), and I don't think the project needs more drama, especially if the project really wants to gain traction with non-tech-savvy users and corporate users. Maybe not because of the #16778 issue on its own, but because of the subject of privacy and the way the project handles it.

The expectations and awareness about user privacy and transparency have changed a lot since this ticket was opened and discussed. It's almost a cliché to cite #Snowden, #Wikileaks and all the NSA thing in conversations, but here I am, citing these subjects alongside transparency, privacy and data protection laws within the EU and other territories (e.g. Brazil, my country), corporate compliance, et cetera.

So, maybe we should go forward with fixing it while this is still "small"?

I think it's appalling that WordPress has not been open about this and it’s even more worrying that certain people here have been so obstinate about doing the right thing. This sort of devious activity may be expected of certain dubious plugin developers…but WordPress? Really?

Precisely. I mentioned that ticket in a blog post this week after reading @mor10's post on his blog and checking other tickets like the Gravatar one. That's why I believe the issue goes deeper than this ticket, as mentioned in:

"Maybe not because of the #16778 issue on its own, but because of the subject of privacy and the way the project handles it."

The answer is definitely nope. It’s a niche option that could be covered in a potentially super cool plugin.

So in a nutshell, WordPress.org considers sending 'non-identifiable' data without the knowledge nor consent of the end-user as "okay"?

If that is so, then WordPress.org is certainly running a conflicting standard between its own developers and third party developers (a.k.a. plugin developers):

The plugin may not “phone home” or track users without their informed, explicit, opt-in consent.

In the interest of protecting user privacy, plugins may not contact external servers without the explicit consent of the user via requiring registration with a service or a checkbox within the settings.

...
Documentation on how any user data is collected, and used, should be included in the plugin’s readme, preferably with a clearly stated privacy policy.

The plugin guideline does not state that there's an exception to 'non-identifiable' data, and there certainly ought not be any "It is okay if WordPress.org does it" exception.

It seems like this ticket is getting out of hand and it's impossible to follow the discussion any longer.

Concerns about WordPress sending anonymous data have been raised multiple times now, slowly drifting away from the original request to add an option or a filter.

In the past, we've tried to use Trac as a platform to discuss the "how"-side of things, with debates on principles (the "why") and +1's happening elsewhere, e.g. on make.wordpress.org and Slack.

Recently some people began with some research on opt-in data collection in WordPress, which seems to be what the majority of people commenting here is striving for. See #38418. Why not join forces?

Also, I'd like to quote @chriscct7 here and encourage folks to document current behaviour:

Additionally, the WordPress project maintains an open information section, similar to Wikipedia, where anyone can contribute new documentation or information about the platform, that to a reasonable extent would be useful to other users. As such, you're free to create a page for this. As a volunteer-based project […] the best way to ensure that things get done, is often to do them or spearhead them.

I'm curious why nobody followed up on @TJNowell's suggestion as well:

This strikes me as something that would be trivially fixed by adding a sentence to wp-admin/about.php
Stating what information gets sent to .org and why should only take a short paragraph of text at the bottom of the about page. If we can add an entire page talking about Freedoms I think we can write a short privacy statement.
Here's a suggestion:

Note: WordPress may send statistics to WordPress.org when requesting updates. This is to help plan and improve future updates.

With perhaps a "For more information, click here" that leads to a .org page

It would be awesome if someone could whip up a small proof-of-concept for this. This is open-source software after all, everyone can get involved.

Folks on the ticket mentioned "lack of chiming in" as a reason to dismiss the ticket, so naturally, I feel that's an invitation to comment. ;-)

On the topic of EU privacy regulation raised by @idea15 and others: There's certainly blog identifying info transmitted, but not person identifying info. I'm not a lawyer, nor an expert, nor do I have access to api.wordpress.org's code, but I really don't think this runs afoul of privacy regs in the EU (I'm not even European, so 'grain of salt' on this).

Because of the fact that I can only read the code that sends the data, and not the code that creates the dataset, I have to make some assumptions. And because I can't find any real source for how or where that data is stored, I do have some concerns. I don't see how creating analytics that can determine when a blog becomes abandoned by when it stops pinging (which is how blogs drop off the php support on the stats page)

I assume that the potentially outdated and vulnerable versions are aggregated by blog URL, into a list somewhere. Seems straight-forward enough. What's the worst that can happen to such a list? :-/

Some commenters are expressing concerns about security and privacy, while some are being dismissive and closing the ticket, or otherwise trivializing the concerns. Perhaps a better way is to have someone who has access and knowledge of the system adequately describe the potential hazards that they've considered, and how they deal with those concerns, and what security is in place to safeguard that list of vulnerable sites. Basically, assure everyone that they've privately done what is usually publicly done with most other things WordPress.

For the record, while I see a potential concern here, I wouldn't opt-out of the stats. And as I said above, I could be wrong about all of this, since I don't have any access, I don't know.

TL;DR I don't think this is a minor enhancement. I think it's at least somewhat security related, and of at least normal severity.

On the topic of EU privacy regulation raised by @idea15 and others: There's certainly blog identifying info transmitted, but not person identifying info. I'm not a lawyer, nor an expert, nor do I have access to api.wordpress.org's code, but I really don't think this runs afoul of privacy regs in the EU (I'm not even European, so 'grain of salt' on this).

An Internet URL can be anything, including but not limited to your name! That flips this argument completely, because all of a sudden the data transmitted is linked to a person.

Equally, the reassurance that the data are only held for 48 hours (see above) is fine but your blog stats are transmitted back to the WordPress API every 12 hours so it's a continuous record unless WordPress is deleted from the server or it can be turned off or filtered as suggested in this ticket.

Under the current directive and the 2018 regulation, personal data includes information about a device that an individual uses, such as an IP address, MAC address, browser fingerprint, etc. (This is why analytics are constantly in regulators' sights.) You do not even have to have the individual's name, address, billing information, etc. to constitute a personal data record: an IP address attached to a dataset is personal data.

What I would like to know is: is whatever data is being collected and passed being pseudonymised? Pseudonymised data (information separated from personal identifiers which could be put back together as required) is a special category. It has less stringent requirements and ticks the privacy by design (PBD) box required under GDPR. It is a good place to start where this issue is concerned.

I would also highly recommend you take a look at Recitals 19, 20 and 21 (pages 16-17) of the draft ePrivacy Directive refresh, which dovetails with GDPR: http://ec.europa.eu/newsroom/dae/document.cfm?doc_id=41241 It underlines the need to be absolutely clear about what data is essential for technical purposes (e.g. version and security updates) and what data is for non-essential purposes, e.g. telemetry.

Will be at the WCLDN contributor day on Friday if anyone wants to dive into this.

Can someone from the core team tell me why exactly adding a simple filter gets pushback for years? It's backwards compatible, it doesn't affect people who don't care about it, and it would make people who do care about it happier.

I understand the need to do a different upgrade path for sites that are too large, but as was indicated earlier, that information has never been used. Essentially this is premature optimisation, which is not something you should do in software to begin with, certainly not in a project that powers 1/4 of the internet.

Can members of the core dev team also tell us how many features that are in WordPress core today started out as a plugin? Why were those brought in? Why wasn't the argument that "oh, there's a plugin that already does that" used then?

Then there are the examples of other projects dealing with the issue differently (better):

Not only that, Ghost offers the ability to turn off updates / gravatars / google fonts, etc, because each and every one of them are leaking personally identifiable information (no, I'm not interested in debating how that information is personally identifiable, that's been established in other tickets / in blog posts, etc).

Concerns of business users have been mentioned in that thread, but I do non-profit work for charities, and they will be concerned about even pseudonymised data being sent to a 3rd party, for whatever reason.

A filter - with no side effects to the rest of WP core - is a good initial solution. This is going to be an issue when GDPR arrives next year, and should go hand-in-hand with a solid WP privacy explanation and opt-in/out on install and upgrade.
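To make the filter idea concrete, here is a rough sketch of what a site owner's opt-out could look like. It is only an illustration: the filter name wp_version_check_query_variables is the one proposed earlier in this ticket and does not exist in core at this point, and the 'blogs' and 'users' array keys are my assumption about how the counts are named in the version-check query. The example_ function is hypothetical.

```php
<?php
/**
 * Zero out the install-size counts in the version-check query.
 *
 * The 'blogs' and 'users' keys are assumed, not confirmed here.
 * All other query values pass through unchanged.
 */
function example_strip_install_counts( array $query ) {
	foreach ( array( 'blogs', 'users' ) as $key ) {
		if ( array_key_exists( $key, $query ) ) {
			$query[ $key ] = 0;
		}
	}
	return $query;
}

// Register only when running inside WordPress. The filter name is the
// one proposed in this ticket; core would need to apply it for this
// callback to have any effect.
if ( function_exists( 'add_filter' ) ) {
	add_filter( 'wp_version_check_query_variables', 'example_strip_install_counts' );
}
```

Dropped into a small plugin or mu-plugin, something like this would leave the update check itself intact (version, PHP, locale and so on still go out) while sending zeros for the two counts.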

I would keep in mind that WP still sends this data. Adding a filter documented in a dev handbook doesn't indicate to an end user that their data is being sent elsewhere, nor would this stand up against regulators.

Considering GDPR is incoming in Europe, we need to explicitly state what information is collected, why, what it's used for, who it's shared with, and how long it's retained for. What's more, we need to actively gain opt-in consent to do so in an unambiguous, straightforward way, using plain language anybody can understand. Telling users there's a filter, or a plugin that they can use to opt out, isn't enough.

I understand the reasons this data was collected, and why it was done as a developer. From a technical standpoint it makes good sense to do so. The problem here is that of privacy, and more pressingly, legality and compliance. At the moment, this issue is low-hanging fruit for any regulator who wants to shut down or hurt a site running WP in the EU once legislation comes into effect in May.