yianni: First topic — what term
should we use to describe "unlinkable" / "deidentified" /
etc.?

… Concern is that there's always a chance of
reidentification.

<johnsimpson> someone typing
near telephone should mute

<BerinSzoka> unlinkable =
unfair & deceptive trade practice! ;)

<BerinSzoka> much like "Do
Not Track"...

<BerinSzoka> link again,
please?

robsherman: Deidentified.

<BerinSzoka> how about
"Anonymish?"

<BerinSzoka> or
"Pseudononymish?"

<BerinSzoka> Or...
Deidentifish?

What's the difference between "deidentified"
and "anonymous"?

<aleecia_> perfect for all
your phishing needs

… "Deidentified" covers both unlinkable and
what could be relinkable.

yianni: Anyone on call have a
view on this?

<dan_auerbach> I don't feel
strongly either way

<dan_auerbach> but the
substance matters more than the word we use

<aleecia_> If we've started,
note on the phone I cannot hear well at all

<aleecia_> And we appear not
to have a scribe

<dan_auerbach> i'm in your
situation, aleecia_

Thomas: If deidentified is closer
to anonymous data, then deidentified is less able to explain
"unlinked." If we use "unidentified," then we are closer to
both ideas - unlinkable but also acknowledge the possibility of
reidentification.

Rachel_Thomas: Are we getting too
far into semantics?

<BerinSzoka> again, could you
please send the link again to the questions? some of us joined
the IRC after it was shared

Thomas: We need to decide how far
we are able to go and come up with a way to describe it.

<aleecia_> If someone in the
room could please scribe, those of us on the phone might be
able to keep up

<BerinSzoka> AMEN, Rachel

<aleecia_> thank you, Rob

Rachel_Thomas: Deidentified
implies that you have taken reasonable steps to unlink;
unlinkable is an impossibility.

<aleecia_> note that
unlinkable means something specific and different in the EU

yianni: robsherman made this
point - we've done this in the HIPAA context and elsewhere.
What we say in the text is reasonableness.

<BerinSzoka> could whoever's
typing move away from the mic? it's loud

Thomas: Explain that there's a
small gray area that's not completely anonymous.

Yianni: "Reasonable
deidentification" conveys that there's always the chance of
reidentification, but has to be a low chance. Are you okay with
that?

Thomas: Yes.

<aleecia_> "deidentification"
to me suggests after a process.

Dan_Auerbach: Can we all agree
that whatever word we use shouldn't inform the details of the
process or standard we agree upon?

Rachel_Thomas: What do you
mean?

Dan_Auerbach: Don't feel strongly
about terminology, but I do care about substance.

Yianni: Agree. Just because we
use the word "deidentified" or "unlinkable" doesn't change the
substance.

Rachel_Thomas: But they're
different concepts - we're trying to decide whether working
toward deid or unlinkability.

robsherman: I think consensus is
that we're not going for theoretical unlinkability but for
reasonable delinking.

<aleecia_> data is not
“reasonably linkable” to the extent that a company: (1) takes
reasonable measures to ensure that the data is de-identified;
(2) publicly commits not to try to reidentify the data; and (3)
contractually prohibits downstream recipients from trying to
re-identify the data. Commission's definition of
"de-identified": "First, the company must take reasonable
measures to ensure that the data is de-identified. This means
that the company must [CUT]

dan_auerbach: FTC language is a
good starting point. Makes sense and not too prescriptive. Also
favor adding to it the idea that there should be privacy
penetration testing.

<aleecia_> cut off, once
more: "First, the company must take reasonable measures to
ensure that the data is de-identified. This means that the
company must achieve a reasonable level of justified confidence
that the data cannot reasonably be used to infer information
about, or otherwise be linked to, a particular consumer,
computer, or other device."

… Having people try to reidentify data. There has
to be some sort of normative way to distinguish companies that
just try to hash IDs from companies that make a more serious
effort at it.

rachel_thomas: Sounds like you're
onboard with the idea as long as we come up with some clarity
around what deidentification is?

Dan_Auerbach: Yes.

?: Should we be requiring people to publicly
commit not to reidentify?

yianni: I think that's what we
are trying to do here - get people to publicly commit to follow
a standard.

bryan: Privacy policy.

robsherman: Much cleaner to not
have lots of individual "public commitments," as FTC would have
under Section 5. Let's just agree on what's required — for
example, not reidentifying data — and treat server response as
the public commitment.

yianni: Is it better to define
reasonableness here? Have examples?

Rachel_Thomas: There's already a
legal standard for "reasonableness."

Dan_Auerbach: Helpful to have
examples.

<rachel_thomas> For reference
(since we're also looking at the FTC language) here is the DAA
definition of De-Identification Process: Data has been
De-Identified when an entity has taken reasonable steps to
ensure that the data cannot reasonably be re-associated or
connected to an individual or connected to or be associated
with a particular computer or device. An entity should take
reasonable steps to protect the non-identifiable nature of data
if it is dist[CUT]

<rachel_thomas> (cont)
Affiliates and obtain satisfactory written assurance that such
entities will not attempt to reconstruct the data in a way such
that an individual may be re-identified and will use or
disclose the de-identified data only for uses as specified by
the entity. An entity should also take reasonable steps to
ensure that any non-Affiliate that receives deidentified data
will itself ensure that any further non-Affiliate entities to
which such dat[CUT]

bryan: Regarding examples, that's
something that advocacy groups or public sites will do as far
as creating best practices. Maybe something that could be
documented through W3C community group process,
webplatform.org, etc. Agree with Rachel that we should limit
non-normative language within spec itself.

<rachel_thomas> i don't feel
strongly about not having examples (just depends on what they
are)

<aleecia_> And we likely do
NOT want to hard code what is required

… We do have specific examples, and it would
be tremendously helpful to have that language included in a
non-normative way.

<aleecia_> +1

<dan_auerbach> +1

robsherman: Agree w/ aleecia that
hard-coding technology is really hard to implement and will
break the standard 5 years from now.

<aleecia_> Could put them
into an appendix if they clutter up the text

yianni: What examples do people
have as "clearly good enough" or "clearly not good enough"?

<aleecia_> Not good enough:
removing names, removing unique ids

Sam: Looking at Ed's examples, he
talked about admin/procedural controls that leaves database or
is reported outside of controlling entity. Focusing on that
would be interesting. Looking at 3rd parties, how they
anonymize and report out, that's really what we're talking
about in terms of protecting privacy.

… Lots of technologies to do that. Will be
different for every entity. So I'd like to focus on what are
the controls that keep identifiable info from leaking out.

<Yianni> ack:
dan-auerbach

… No specific examples.

<dan_auerbach> whoops seems
i've dropped off the call

<dan_auerbach> will try back
in a moment

rachel_thomas: In IRC, put in DAA
language on deidentification. We don't have specific examples,
but text is helpful in terms of explaining what's meant by
de-identification.

<aleecia_> (The DAA text
isn't bad)

unmute aleecia_

<dan_auerbach> agree that DAA
text is OK, though FTC seems a little better to me

<aleecia_> Ah, sorry. Rachel
will be better taking that than I would anyway, but I note it's
similar to FTC's in large measure. Same direction.

… Any comments on Ed's presentation about
hashing or k-anonymity not being a good method?

Sam: I'm a big proponent of
hashing. It's great but has its limits. If you're using it to
anonymize or otherwise deidentify, some day it will be broken.

… Have to come up with better ways of doing
this long-term.

rachel_thomas: We don't need to
identify specific standards.

<aleecia_> My hope is that
DAA's members won't have to change much, though from Shane's
questions, presumably Yahoo! would

<dan_auerbach> (i'm unable to
rejoin the call -- it seems to be on W3C's side? -- so will
follow via scribe)

<aleecia_> (Dan, me too)

<bryan> robsherman: talking
about specific tech is a mistake, also demonizing hashing,
there are good uses in the de-id context

<vincent> it strongly depends
on how often you change the seed you use to hash

<aleecia_> Vincent - and the
richness of data collected

Sam: Good examples of how hashing
works, but it's in the context of a specific data set. When you
become able to correlate, things break down, so hard to say
whether a particular technique will work.

<dan_auerbach> here's an
example of something which is NOT good enough: a wide table
keyed with pseudonyms (say, hashes of cookies), but which also
has timestamps, urls, etc
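dan_auerbach's wide-table example can be made concrete with a small sketch (all data, field names, and URLs here are hypothetical): even after the cookie ID is replaced by its hash, the remaining fields still link one person's rows together, and a single identifying URL re-identifies the whole profile.

```python
import hashlib

# Hypothetical "pseudonymized" log: the cookie ID has been hashed,
# but each row still carries a timestamp and a URL.
rows = [
    {"key": hashlib.sha256(b"cookie-1234").hexdigest(),
     "ts": "2013-02-12T09:01", "url": "https://webmail.danauerbach.org/inbox"},
    {"key": hashlib.sha256(b"cookie-1234").hexdigest(),
     "ts": "2013-02-12T09:07", "url": "https://news.example.com/"},
]

# The hash is a stable pseudonym, so all of one person's rows still
# link together under one key...
by_key = {}
for r in rows:
    by_key.setdefault(r["key"], []).append(r["url"])

# ...and one identifying URL (a personal webmail domain) re-identifies
# every row in that profile.
identified = {k: urls for k, urls in by_key.items()
              if any("danauerbach.org" in u for u in urls)}
```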

bryan: This is a point in time
where anything we write on technology today will be superseded
tomorrow. Let's establish a reasonable expectation, let
technologists figure out what is reasonable today, and not put
it in the spec.

johnsimpson: If we know what
works, it should be cited in non-normative language as an
example.

… Also, in normative language, we should
require that there be transparency about the method you use to
hash.

<aleecia_> can we actually
just nail this down (for the group) right now?

Thomas: From my presentation
today, there's a split between anonymous and pseudonymous.
Anonymous is nearly absolutely impossible, and with
pseudonymous someone has the key to reidentify.

yianni: [summarizes Dan's comment
from IRC].

<dan_auerbach> regarding
pseudonyms, it's dangerous to think of data that way because it
leads to the false impression that only certain fields are the
"identifiers", whether they be real names or pseudonyms

SamS: Comes down to a contract
that says, if I get data, I won't try to reidentify it. Even if
it's pseudonymous or a weak version of anonymity, as long as I
protect data, don't leak it, and don't reidentify, it's
effectively the same thing. But the burden is on me to be sure
I have the right admin controls to be sure all of this happens.
Ultimately that's where we are going.

<dan_auerbach> in fact, each
field will have some bits of identifying information

<vincent> where do we stand
on "hashing"? do we agree that it's not enough, or the
opposite?

yianni: You think contractual
language does a lot?

SamS: Yes.

bryan: Want to be sure we're
still talking about a third party.

<vincent> I think there is a
misunderstanding about pseudonym definition

<aleecia_> right now i cannot
imagine a straight up pseudonym helping

<dan_auerbach> i want strong
guarantees on the data itself, not just contracts

<vincent> agree with
robsherman

<aleecia_> replacing one GUID
with another does not help

<vincent> for instance
cookieID is a pseudonym

<rachel_thomas> From Thomas'
presentation this morning: “Pseudonymising” shall mean
replacing the data subject’s name and other identifying
features with another identifier in order to make it impossible
or extremely difficult to identify the data subject.

robsherman: (Clarifies use of
pseudonyms in third party context)

<dan_auerbach> aleecia_:
especially when that guid is linked with tons of other
identifying information

<vincent> IP address+ User
Agent could be a pseudonym

<aleecia_> yes

<rachel_thomas> Pseudonymous
info is...Unique identifier does not identify a specific
person, but could be associated with an individual. Includes:
Unique identifiers, biometric information, usage profiles not
tied to a known individual. Until associated with an
individual, data cannot be treated as anonymous.

<vincent> imho, hashing a
cookie ID does not bring a lot of guarantee (if any)...

<aleecia_> +1 vincent
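vincent's caution can be illustrated with a minimal sketch (the cookie IDs below are hypothetical; it assumes the adversary can enumerate, or already holds, the candidate identifiers): an unsalted hash is deterministic, so it is just another pseudonym, and it can be reversed by recomputation.

```python
import hashlib

def pseudonymize(cookie_id: str) -> str:
    # Unsalted hash: deterministic, so anyone can recompute it.
    return hashlib.sha256(cookie_id.encode()).hexdigest()

# An adversary who can enumerate candidate cookie IDs simply hashes
# them all and looks the published pseudonym back up.
candidates = ["cookie-%04d" % i for i in range(10000)]
reverse = {pseudonymize(c): c for c in candidates}

pseudonym = pseudonymize("cookie-1234")
recovered = reverse[pseudonym]  # recovers "cookie-1234"
```

A keyed hash with a frequently rotated and then discarded secret (vincent's "seed") resists this lookup, since the adversary can no longer recompute the mapping once the key is gone.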

<dan_auerbach> getting into
the mentality that certain fields are the "identifying" ones
(e.g. a cookie, hash of ip+ua) is a mistake

David_Stark: Thought the
conversation this morning was excellent. At my market research
co, when people participate in a survey, we assign them a
pseudonymous identifier that allows for quality control.
Without this, you'd have no control. Just anyone could come
in and respond many times - fundamentally undermining data
quality.

<dan_auerbach> if i visit the
domain, webmail.danauerbach.org, that field becomes
identifying

<BerinSzoka> whoever was
typing but just stopped--was REALLY loud

… Only have identifiers of panel members and
their numbers. Researchers have access to survey responses. But
nobody in our company can link individuals to their
responses.

yianni: You're bringing up the
point of administrative controls, which Sam also mentioned.

<aleecia_> as a suggestion:
you could have a marker in a cookie of the number of times
people have taken a survey, rather than the unique id on the
person

… If I can get at data in a way that bypasses
proper controls, I can maybe reidentify.
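aleecia_'s suggestion can be sketched as follows (a hypothetical illustration; the cookie field name is invented): the cookie carries only a participation count, not a per-person identifier, so the server can cap repeat responses without ever linking a response back to an individual.

```python
# Hypothetical sketch: instead of a per-person unique ID, the survey
# cookie stores only how many times its holder has taken a survey.
def update_cookie(cookie: dict) -> dict:
    cookie = dict(cookie)  # don't mutate the caller's copy
    cookie["surveys_taken"] = cookie.get("surveys_taken", 0) + 1
    return cookie

MAX_RESPONSES = 3

c = update_cookie({})   # first survey
c = update_cookie(c)    # second survey
allowed = c["surveys_taken"] <= MAX_RESPONSES
```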

bryan: It has strong value. It's
something we do for a variety of regulatory requirements
already. Tons of data and admin controls are only way to meet
those requirements.

<dan_auerbach> to clearly
state my opposition to this, administrative controls are NOT
enough

<dan_auerbach> we need real
de-identification in the data itself

MikeZ: Some people have this in
place. Small companies. Which is why you have a sliding scale
in FTC standard. We're solving for the web, not just a handful
of big companies.

yianni: That's the problem with
having specific examples in text.

bryan: Same goes for
sophistication of tech approaches that we mandate.

<vincent> aleecia_,
dan_auerbach have you retried? call quality is fine for me

<johnsimpson> quality ok for
me, too

<dan_auerbach> vincent, i've
retried several times but am just unable to connect, but will
do so one more time...

<aleecia_> did get back in.
is somewhat better

<aleecia_> still
scribe-dependent but that's ok

robsherman: Concept of
reasonableness encompasses admin, tech, and physical practices,
and the specific steps might vary based on circumstances.

<dan_auerbach> vincent, now
connected but call quality is quite bad

yianni: Would having specific
examples of what's appropriate help bridge the gap
between John Simpson's demand for transparency and the need to
keep it secure?

<johnsimpson> +q

rachel_thomas: Security
information is kept secret because disclosing it gives hackers
a way in.

… Let's protect the information in the most
effective way.

<aleecia_> getting enough of
this - depending on "we don't tell people about our security
measures" does not offer real protection

bryan: Also opens the door to
social engineering around company practices.

johnsimpson: Not calling for
"putting the secret sauce of everything you do out there."

<dan_auerbach> i do think we
need transparency here, +1 to johnsimpson

… What I think needs to be explained publicly
is the category of measures that are taken. Not enough to say
"reasonable".

rachel_thomas: What would be
reasonable in your opinion?

<aleecia_> if we had a high,
specific standard transparency wouldn't be necessary, but that
would break future-proofing

johnsimpson: Difference between
security and fraud, where getting specific could tip your hand.
But you could say, if you believe that hashing is reasonable,
you can say you rely on hashing.

… You can describe techniques that would
provide meaningful insight without giving away the store.

rachel_thomas: But you've just
narrowed the world in terms of what a hacker needs to think
about.

<dan_auerbach> +1 to
aleecia

… You're narrowing a bunch of other things a
hacker will need to think about to get into your system.

<vincent> rachel_thomas, I'd
argue with that

<SamS> +q

<aleecia_> anyone who can
break your hash will be able to identify that's what you did
without disclosure

vincent: Transparency doesn't
provide a solution for hackers to break in. I don't see how it
would allow someone to break into your system.

<aleecia_> how useful is this
detail to anyone making privacy choices? likely not very. so
while i disagree with Rachel on this one, i'm also not strongly
pounding the table to support John

SamS: Let's say I published a
privacy policy on my website that showed all of the encryption
techniques that I use to anonymize data. How does that actually
protect the data that I have?

… Will consumers read it and say, "They use a
lot of encryption? That's really good."

yianni: So you're saying there's
no information that is worth releasing.

SamS: Saying that you use
reasonable technical measures to protect the data - which is
what the FTC requires - is reasonable. And if I have a
breach, people will hold me accountable.

… But using Encryption v.1 vs v.2 isn't going
to make a difference.

<rachel_thomas> agree with
aleecia!

aleecia: I think you're not
giving away the store to announce which of the few available
measures you might be taking. In terms of whether end users will
understand the difference, no. If it were in a privacy policy,
no.

<rachel_thomas> consumers
won't know what the techniques mean anyway.

… But what this could be helpful for is to
require companies themselves to write down what they do and
review it periodically to make sure it's what they do.

… There's no state secret or competitive
advantage.

<rachel_thomas> internal
policies are different from public policies - no one is saying
that companies can't review their internal policies to ensure
that they're keeping their practices up to date.

… Most value of privacy policies these days is
for internal analysis.

<aleecia_> smaller cos will
not

rachel_thomas: Agree with aleecia
that consumers won't understand what they are reading in
privacy policies. I'd also note that there are endless internal
policies for companies that describe in much greater detail
what you do. So I don't think that leaving it out of privacy
policy omits the process of internal analysis.

<aleecia_> major cos will

<dan_auerbach> rachel,
"privacy and data security" is too broad to cover what we're
after, which is ensuring data is de-identified

vincent: Maybe a few users might
like to know this, and this data might be useful. For example,
you might want to know if data you deleted can still be linked
back to you.

<aleecia_> having "if you
share de-id'ed data, state what you do to ensure it's actually
de-id'ed" is a small ask

<vincent> disclosing an
algorithm without data won't help hackers break in, it just
helps the community assess the security of your
algorithms

robsherman: We need to be clear
about what problem we're trying to solve. User transparency?
Challengeability? Forcing internal analysis? I tend to think
over time it's bad to disclose security practices because it
creates a vulnerability.

<SamS> +q

<aleecia_> so, i'd say we
don't have agreement here, but could have agreement on adopting
the FTC text and adding examples

… Beyond that, the more transparent you are —
sufficient to allow people to test vulnerabilities — you
actually become more vulnerable.

<dan_auerbach> i'd like to
voice my strong objection to conflating "security" with the
specific area of de-identification

<aleecia_> can we do that and
leave transparency for the full group? if we're split here,
odds are we're split with more people too

<aleecia_> not sure agreeing
here has much value

<dan_auerbach> they are very
different things

vincent: Google talks about how
they anonymize their server logs. Doesn't create
vulnerabilities but does help to evaluate.

SamS: We're not going to come to
agreement to get to minimum standards here.

yianni: Aleecia made a good
point. FTC language is a reasonable starting point. Examples
could be helpful if they don't limit what companies can do or
limit tech advancement.

… When I first read existing "unlinkable"
language, I was confused about why it was here.

… Specifically, why do we say "commercially
reasonable steps"?

<dan_auerbach> apologies all,
i have to take off

… Should it be in the document? From a legal
standpoint, "commercially" doesn't do much.

<aleecia_> i'd drop
"commercially"

<johnsimpson> should just be
reasonable

bryan: "Reasonable" is
enough.

<aleecia_> and agree
reasonable is sufficient

yianni: Anyone disagree?

<rachel_thomas> should be
reasonable. no need for commercial.

[General agreement that we should cut
"commercially."]

yianni: Also, "high
probability"?

rachel_thomas: What is the
context in which we're looking at this language? I think these
breakouts are designed to take us past what's already in the
text. We've been focused on de-id, so it's odd to be looking at
a definition of "unlinkable."

yianni: We had a statement
earlier from Dan that he doesn't care what it's called as long
as substantively that doesn't affect it.

<aleecia_> Let's please
please please use a different term other than unlink

… Trying to figure out why we're saying what
we're saying in the text.

… If it's substantively accurate to call it
"de-id," we should call it that.

<aleecia_> We've agreed to
change that, but not what to -- if deid'ed works for everyone,
let's run with that

rachel_thomas: We had two
definitions that we looked at earlier — FTC and DAA — that we
agreed were good. We shouldn't go back to draft. Peter made
clear that editing text from the draft should be done in the
full group.

yianni: Depends on what we decide
to call it.

… So, with the text we have in the draft, we
decided commercially should be cut. Should we change "high
probability"?

<vincent> we could remove
"high probability" as well

<aleecia_> None of these
things are absolutes. High probability is about as good as I
think we can reasonably get (no pun intended. This time.)

MikeZaneis: "Unlinkable" vs
"de-id." I think "unlinkable" presents a unique challenge as we
move forward when we talk about permitted uses, etc.

… We may as a group decide after x time, you
should de-identify data.

<johnsimpson> seems to me
"high probability" is necessary

… We identified reasons why companies should
need to go back to re-identify.

… I know this is really about what the process
for de-id might be, but we keep talking about "unlinkable."

… Connotation of "unlinkable" is a more
permanent break that would limit this group's long-term
success.

<aleecia_> If companies *can*
re-identify globally, they have not actually de-id'ed in the
first place

yianni: Agree. If that's not what
we mean by unlinkable, it wouldn't make sense to use that as
the word.

<aleecia_> To beat a dead
horse: unlinkable means something specific and different in
Europe. We should use a different term.

<aleecia_> As another
example, if you can append data to a record based on new data
you've collected, you don't have deidentified data

robsherman: I think we shouldn't
work off of old "unlinkable" definition if we've decided to
move away from it. We've identified DAA / FTC language as
reasonable baseline, so let's start there.

yianni: Do we think it's
reasonable to start with FTC language? DAA? They're very
similar.

<bryan> +1 to avoid
word-smithing the text here, and use DAA / FTC as base

johnsimpson: Earlier discussion
was about using de-id data, but the way you get to de-id is to
make it unlinkable.

<aleecia_> We appear to all
be agreeing

<aleecia_> So close

… It seems like the definition of unlinkable
is an excellent articulation of how you get to
de-identification.

yianni: Personally, looking at
text as proposed, it's similar to FTC with exception of "high
probability." Seems to be similar.

… For Option 1. Option 2 seems very
different.

<aleecia_> I'd be happy to
add "high prob" to FTC text, toss in a few examples, and go
home.

<BerinSzoka> +1 to Rob

<aleecia_> Ok: that persuades
me.

<aleecia_> (cannot understand
current speaker, did mostly get Rob)

<aleecia_> Agree that there
is utility to adopting FTC text unchanged.

robsherman: Worry about tweaking
FTC definition. FTC will develop body of caselaw around de-id,
and that will give guidance to industry. If we add "high prob"
to make people here happy, then it introduces uncertainty about
how things diverge.

SamS: Agrees.

<aleecia_> Adding examples
sounds reasonable, since I'm not sure implementers will get
it

MikeZaneis: Hesitate to add
adjectives where we don't know exactly what they do. It seems
like we're having a discussion in this group about whether
there's any reason or possibility for re-identification. Is the
goal to completely break even the possibility for
re-identification?

yianni: Everyone here understands
that when you de-identify data, there's a chance of
reidentification.

… If you got to 100% certainty, the data would
become almost useless.

… So we're trying to come up with language we
could move forward with. If we just adopted FTC language
without modification, would people be okay with it?

<aleecia_> Q was do we adopt
FTC text? Yes, with examples.

MikeZaneis: I don't think
anyone's ready to sign off, but it's a good starting point for
the discussion.

<aleecia_> And agree with Rob
that not changing it has value

Thomas: Agrees with johnsimpson
that there is legal opinion, and we need to think about whether
it is globally adoptable.

johnsimpson: I thought that the
proposed bare-bones language was the result of many long
discussions that took a lot of things into account, and we seem
to be throwing those out the window.

… I'm also puzzled about the implications of
going to a specific US agency's language.

… This needs to be thought through in terms of
a global standard.

yianni: Thomas, do you think FTC
language won't coincide with European standards?

Thomas: We need to think about
it, and also consider ongoing discussions about EU
Directive.

… We have the chance to create a level playing
field, and compare legal assessment of various
jurisdictions.

<BerinSzoka> given the
historical lack of meaningful enforcement in Europe, and the
very aggressive enforcement actions taken by this FTC, does
anyone really think it won't be the FTC that takes the lead on
the hard work of defining what level of deidentification is
reasonable?

<aleecia_> if it turns out
there are problems between EU and US, that's new information
and we reopen

… Also need to consider other
jurisdictions.

yianni: So not sure if language
works, but need to think about it?

Thomas: Yes.

<BerinSzoka> +q

<aleecia_> Berin, could you
kindly knock off the EU baiting?

yianni: With FTC language, what
would be wrong with it, assuming EU perspective works out?

<vincent> BerinSzoka, I'd
wait for a couple of months and then answer ;)

<aleecia_> In general I'd
hesitate about adopting FTC or EU or other "local" bits, but
I'll go with math being global. Ideally this works the same
everywhere.

berinszoka: There's no caselaw on
this yet. I'm trying to point out that, whatever anyone thinks
about EU vs US, it's pretty likely that the definition of that
term is going to be settled in the US.

… I don't think it's unreasonable to start
with that as a baseline.

<aleecia_> John's point is
right to consider and in other places I'd agree violently.

rachel_thomas: FTC is a good
starting point, but it requires looking at the whole document
and the impact of changing the definition.

<aleecia_> I think we need to
make decisions and get to drafts, which we will revise many,
many times.

<aleecia_> But deadlocking on
not making any decisions is getting boring

<MikeZaneis> Agreeing to
begin the discussion with the FTC language is good progress
that will allow us to iterate further

robsherman: (1) This group won't
get to a place of being able to sign off on FTC definition and
go home. But sounds like the consensus of the group is that we
like FTC and DAA definitions as starting points, with the
considerations we've discussed, and we need to think through it
more. (2) Don't think we should look at this as an effort to
codify all of the privacy laws in the world. It may be that
people have to comply with this standard AND local law[CUT]

<aleecia_> Ok. Do we have
specific issues with the FTC text, or a general "we aren't
willing to say yes to anything without going back for review"
as sort of a general approach to not getting anything done?

<aleecia_> I hear need for
review to see how it works with the EU

<aleecia_> Anything else
specific? What's the to do list to move forward?

<johnsimpson> what are you
saying about transparency?

yianni: Generally like FTC and
DAA language, want to think more. Okay with some examples as
long as not prescriptive. Admin, tech, physical measures. Do we
agree on these points?