First pass, how does that look?
This adds a stable client id to the telemetry payload. The FHR client id is used, if FHR is enabled.
To keep this (and upcoming later state) around between sessions, i introduced a TelemetryState module, which persists state in a JSON file in $profdir/telemetry/state.json.
If we keep it that way, we could move the saved-pings directory into that dir as well later.
I noticed that logging is mostly absent from Telemetry, is there a specific reason?
I added logging here because proper logging facilities turned out to be very helpful with the telemetry experiments feature.
I ran into test-setup issues with the fhr/datareporting part, so left this part out for this pass.

Per talks in #fhr with gps and rnewman, this doesnt work properly in regards to early initialization synchronization and we should do it differently.
We want to move things to a common datareporting client id now, that properly imports the client id from the FHR state.json, if any.

Comment on attachment 8501865[details][diff][review]
Migrate FHR client id to the datareporting service
Review of attachment 8501865[details][diff][review]:
-----------------------------------------------------------------
Aside from some robustness issues, this is mostly good.
I am a bit concerned about the drop in test coverage. We should add unit tests for client ID handling in DRS.
::: services/common/utils.js
@@ +398,5 @@
> * @return a promise, as produced by OS.File.writeAtomic.
> */
> writeJSON: function(contents, path) {
> + let data = JSON.stringify(contents);
> + return OS.File.writeAtomic(path, data, {encoding: "utf-8", tmpPath: path + ".tmp"});
Oooh - when did support for this land?
::: services/datareporting/DataReportingService.js
@@ +304,5 @@
> + if (this._loadClientIdTask) {
> + return this._loadClientIdTask;
> + }
> +
> + // Try to load the client id from the DRS state file
You should add some comments here that explain the history of the state files and client IDs. Pretend rnewman, you, and me all get hit by a bus. The pour soul left with this doesn't want to be haunted by our ghosts.
@@ +318,5 @@
> + // If we dont have DRS state yet, try to import from the FHR state.
> + try {
> + let fhrStatePath = OS.Path.join(OS.Constants.Path.profileDir, "healthreport", "state.json");
> + let state = yield CommonUtils.readJSON(fhrStatePath);
> + this._clientID = state.clientID;
If the state file is empty, state.clientID will be undefined. Please add code and a test for this.
You also dropped a branch from FHR that checks the type is a string. Please also port that. (Normally I'm not this paranoid, but you know the pain that's been caused by orphans.)
@@ +324,5 @@
> + this._saveClientID();
> + return this._clientID;
> + } catch (e) {
> + // fall through to next option
> + }
There's an interesting failure case here. But I'm not sure how realistic it is and whether we should pay any attention. But given the implications on orphan creation and the immense pain that's caused us, it's tempting to be safe rather than sorry.
Scenario:
1) Client is running FHR as written today and has client ID in FHR's state.json file.
2) Client upgrades to this code. Client ID is copied to DRS state.json.
3) For whatever reason we generate a new client ID. It gets stored in DRS state.
4) DRS state.json somehow gets corrupted. We throw an exception trying to read it.
5) We fall back to reading the client ID from FHR's state.json.
6) We start using the old FHR client ID and our metrics get screwy.
Now, we can't simply delete the client ID from the FHR state because if Firefox downgrades, a new client ID will be generated. (This is probably rare, but it is a source of orphans.)
We shouldn't fall back to using the FHR client ID if we've ever created a new ID in DRS because that will look like a client came back from the dead.
The only bulletproof solution is to write the new client ID to both DRS and FHR state.json when DRS generates a new client ID. Oy.
Of course, if DRS's state.json becomes corrupted, we've created an orphan. So, uh, does that mean we don't need to worry about this scenario?
Making this even more complicated, if you fall back to reading FHR state.json if DRS is corrupted and the client ID hasn't changed, you are in a *better* position than if you didn't fall back because you'll be using the proper ID instead of generating a new one.
When in doubt, measure. I'd really like to see a counter in FHR reporting on how many times the DRS state.json is present but corrupted. If data says it is a problem, we can address it.
What to do? If we don't plan on regenerating client IDs, fall back to FHR. If we do, consider touching a file whose presence indicates "client ID migrated to DRS *and* regenerated" and don't fall back to reading FHR state.json if that file is present.
These are hard problems.
@@ +330,5 @@
> + // We dont have an id from FHR yet, generate a new ID.
> + this._clientID = CommonUtils.generateUUID();
> + this._loadClientIdTask = null;
> + this._saveClientID();
> + return this._clientID;
This pattern of setting the load task to null *then* saving the file asynchronously and then returning is racey. A subsequent caller can get a new task that generates a new ID before the first caller has finished saving the file. Since a few components will access this client ID during startup, I'd say this race is more than theoretical, especially on machines with slow I/O.
Let's add a yield for _saveClientID() and let's not set _loadClientIdTask to null until immediately before the return. Consider putting this in a closure since the code is repeated and prone to one-off bugs.
@@ +339,5 @@
> + yield OS.File.makeDir(this._stateDir);
> + yield CommonUtils.writeJSON(obj, this._stateFilePath);
> + }),
> +
> + get clientID() {
Document this!
I find it a bit weird this is a property and not a function. I think anything involving sufficient magic should be a function.
::: services/healthreport/healthreporter.jsm
@@ +131,5 @@
> yield OS.File.makeDir(this._stateDir);
>
> + let drs = Cc["@mozilla.org/datareporting/service;1"]
> + .getService(Ci.nsISupports)
> + .wrappedJSObject;
It might be time to create a proper IDL interface for DRS.
@@ -1527,5 @@
> - // This is done for privacy reasons. Why preserve the ID if we
> - // don't need to?
> - if (!this.haveRemoteData()) {
> - yield this._state.resetClientID();
> - }
Please get bsmedberg's sign-off on nuking this behavior.

(In reply to Gregory Szorc [:gps] from comment #17)
> @@ +324,5 @@
> > + this._saveClientID();
> > + return this._clientID;
> > + } catch (e) {
> > + // fall through to next option
> > + }
>
> There's an interesting failure case here. But I'm not sure how realistic it
> is and whether we should pay any attention. But given the implications on
> orphan creation and the immense pain that's caused us, it's tempting to be
> safe rather than sorry.
>
> Scenario:
>
> 1) Client is running FHR as written today and has client ID in FHR's
> state.json file.
> 2) Client upgrades to this code. Client ID is copied to DRS state.json.
> 3) For whatever reason we generate a new client ID. It gets stored in DRS
> state.
> 4) DRS state.json somehow gets corrupted. We throw an exception trying to
> read it.
> 5) We fall back to reading the client ID from FHR's state.json.
> 6) We start using the old FHR client ID and our metrics get screwy.
Let's do metrics for state.json corruption in a follow-up?
I'd rather not build fallback solutions on a system part that we want to move away from.
> @@ +330,5 @@
> > + // We dont have an id from FHR yet, generate a new ID.
> > + this._clientID = CommonUtils.generateUUID();
> > + this._loadClientIdTask = null;
> > + this._saveClientID();
> > + return this._clientID;
>
> This pattern of setting the load task to null *then* saving the file
> asynchronously and then returning is racey. A subsequent caller can get a
> new task that generates a new ID before the first caller has finished saving
> the file. Since a few components will access this client ID during startup,
> I'd say this race is more than theoretical, especially on machines with slow
> I/O.
>
> Let's add a yield for _saveClientID() and let's not set _loadClientIdTask to
> null until immediately before the return. Consider putting this in a closure
> since the code is repeated and prone to one-off bugs.
There are no repeated calls to _loadClientID(), so i don't see a race here.
State is loaded once, changing the ID is not supported here. I'd rather not block callers waiting on _save() as well if there is no urgent reason to do so.
I only see a potential issue with tests having to wait for _saveClientID(), which i may have to account for.
> ::: services/healthreport/healthreporter.jsm
> @@ +131,5 @@
> > yield OS.File.makeDir(this._stateDir);
> >
> > + let drs = Cc["@mozilla.org/datareporting/service;1"]
> > + .getService(Ci.nsISupports)
> > + .wrappedJSObject;
>
> It might be time to create a proper IDL interface for DRS.
Now or follow-up?

(In reply to Gregory Szorc [:gps] from comment #17)
> @@ -1527,5 @@
> > - // This is done for privacy reasons. Why preserve the ID if we
> > - // don't need to?
> > - if (!this.haveRemoteData()) {
> > - yield this._state.resetClientID();
> > - }
>
> Please get bsmedberg's sign-off on nuking this behavior.
Context: Previously we would reset the client ID here after we deleted all of FHRs remote data.
Now we use the stable client id for more than FHR, so we can't just nuke it there.
I propose that we take care of clearing it later, after we made FHR & Telemetry activation interdependent.
Acceptable?

Long-term, the behavior we want is:
* turning off FHR data collection should delete all the pings associated with the user ID (telemetry and FHR). The user ID should be removed from the client. Currently telemetry doesn't have an endpoint for this, so part of the breakdown for this should include filing both client and server-side bugs to make this possible.
* If/when FHR is re-enabled, we generate a new ID.
Short-term:
* when FHR is disabled we should delete the FHR records and remove the client ID.
* since it's currently possible to have FHR off and telemetry on, in that case telemetry should not send any client ID. In the future this won't be possible (bug 1075055)

(In reply to Georg Fritzsche [:gfritzsche] from comment #27)
> Created attachment 8504769[details][diff][review]
> Migrate FHR client id to the datareporting service
Note that the changes in utils.jsm and policy.jsm are needed to address comment 25, i.e. waiting on the whole "delete remote data operation" and not just on the server requests etc. being finished.

(In reply to Georg Fritzsche [:gfritzsche] from comment #25)
> So, currently there is no good way to wait on this in testing (and the tests
> only worked fine by accident before).
> Not quite sure yet how to fix that.
Yikes. That's ugly.
So, sometimes you need to introduce ugly hacks to facilitate testing. This might be one of those times.
The correct approach is to move the "initiate FHR upload" APIs into DRS. healthreport.jsm is kinda/sorta laid out in such a way to facilitate this. There is an inline comment about that somewhere...

(In reply to Georg Fritzsche [:gfritzsche] from comment #47)
> As of the patches here we have two places - it is required in
> test_TelemetryPing.js too, which tests the client id.
Ah, very good.
> I'm not sure if pre-emptively adding them to more tests is useful right now.
Let's leave well enough alone, then.

Comment on attachment 8515306[details][diff][review]
Beta rebase - Add datareporting client id to Telemetry ping
In future, please try to put beta approvals on the rebased patches in order to make it clear which patches should be uplifted to which branches. (I took care of putting the approvals on the right patches this time.)

(In reply to Gervase Markham [:gerv] from comment #65)
> Can someone point me at the privacy review or analysis which was done for
> this change?
There were some privacy discussions and privacy policy changes etc.
I can't find the relevant one off-hand, Benjamin, can you?

(In reply to Benjamin Smedberg [:bsmedberg] from comment #67)
> The first public note about this was
> https://mail.mozilla.org/pipermail/fhr-dev/2014-March/000186.html
> https://mail.mozilla.org/pipermail/fhr-dev/2014-October/000341.html has a
> link to the current plan.
Forgive me if I've missed it, but neither of these emails nor the linked document indicate that we were planning to change the released build of Firefox to send back a persistent per-user identifier, something we explicitly disclaimed doing when Telemetry initially shipped IIRC.
(If that's not quite what's happened, please do clarify - which channels send this identifier, how persistent is it, and how does it depend on the telemetry and FHR prefs?)
Gerv

Yes:
* FHR upload is enabled by default on all channels, but no upload happens until we should the data notification
* Telemetry is off by default on release, on by default on pre-release channels
The client id is only submitted if both of these are on, which matches previous FHR behavior.
Given that we are moving the FHR data into Telemetry, we will end up submitting data that was previously in FHR via Telemetry in different "buckets" (pre-release, release and some different pings for data that should be separate from the ping... that is detailed in the referenced document) to end up with matching behavior re privacy.