HDYC, login requirement and "privacy"

Re: HDYC, login requirement and "privacy"

> On 5. May 2017, at 01:36, Frederik Ramm <[hidden email]> wrote:
>
> Only if you want to distribute it outside
> of OSM you'd either have to remove/pseudonymize the user names or get
> explicit permission (as in: "I am ok with you publishing this particular
> work with my name in it") from the participants. Would that really be
> such a big issue? I think you're making this into a much bigger issue
> than it needs to be.

You write a lot about personal data, but all OSM admins have about users is some email address, which often doesn't even exist anymore, and an associated user name, and this email address is never published. For GPX tracks you can already choose the level of privacy, and even for identifiable tracks you don't know whether the timestamps are real, whether the track was recorded with a GPS device or is simulated, or who recorded it. In the planet there are only usernames, which can be chosen freely; if I wanted I could choose "Frederik Ramm" or anything else, and nobody could know whether this was my real name or not. HDYC makes it possible to roughly locate someone in an area, but it doesn't reveal who someone is or where exactly she lives. If you know which username is used by which real person, then it is only because the person has disclosed this information and you believed her. If, for example, I map a nightclub frequented mostly by LGBT people, it doesn't mean I have been there; it just means I know where it is (and unless I have told you, you won't know who I am), and even if I've been there you still wouldn't know when and for what reason.

Also, everyone can create new users at will: if your concern is privacy, you could use a new user for every edit and nobody could associate these edits with the same person.

There are serious issues with surveillance and privacy in the world, but IMHO osm is the least of these problems. Does someone who sells a can of paint have to put a disclaimer on the can because people might write their name on a wall? Does an internet provider have to warn people not to disclose personal information in their blog? IMHO we have to account for different people wanting different levels of privacy: some people like to write their name on a wall (looking at the success of Facebook et al it seems that they are in a majority btw), others prefer to remain in the shadow.

Maybe it could become an option not to disclose usernames, but actually this metadata is useful for other mappers: you can see if a user is local to a place, how much experience she has, how many discussed changesets, where and for what reason.

Really, the people able to tell who someone likely is are those that already have a huge collection of really private data about everyone, for example those that store the location data of every single step you take from mobile cells (you mostly can't get anonymous SIM cards but have to identify with a document) and wireless networks, from passport controls at the borders and from flight lists, from your online orders and credit card payments, from CCTV face recognition and photos you uploaded, from your personal network in social networks, from the network of people you called and that called you, from the emails you send and receive, etc. Whom are you hiding from: the secret services, the government, big multinational companies? These actors will already know so much about you that your OSM edits won't change anything, and if you have been able to hide your details from them you can also hide them already in OSM.

Putting a login on HDYC, from my point of view, doesn't change anything (because everybody can sign up), except that more data is now created (Pascal will know who is interested in whom, and OSM admins can see how often someone uses the service and, if it becomes common to do it like this, which third-party services someone uses).

Re: HDYC, login requirement and "privacy"

On Friday 05 May 2017, Frederik Ramm wrote:
>
> I think that a viable middle ground could be to make user data
> available to signed-up project members only, and they'd have to
> promise to only use that data for project-internal purposes.

You know i have not formed an opinion on this matter yet but i wonder
how this is supposed to work. Do you suggest having an addition to
the contributor terms, kind of a 'terms for access to metadata', and
requiring existing users to newly agree to that? And after a transition
period disabling API access for those accounts that have not agreed?

In principle that would certainly be possible although there are tons of
practical problems that would come with such an approach. But
ultimately this would probably lead to the vast majority of people who
routinely get mapping metadata in bulk for whatever purpose to use
anonymous accounts for downloading it and to also publish possibly
problematic results of processing it in an anonymous way. Under this
scenario there would probably be some open source HDYC clone, you could
run it either privately for yourself, use an access restricted
officially sanctioned instance of it with your real or anonymous OSM
account or use some rogue open instance running anonymously somewhere.

For a balanced discussion - and i am not saying i would actually prefer
this approach to what you are suggesting - the whole problem could also
be approached from the other side by reconsidering the possibility for
partly anonymous edits. We don't have this primarily to fight
vandalism but it could be considered to give mappers the option to
activate an anonymous editing mode on their account which would mean
their edits and any other access to their user identity through for
example the API gets scrambled on a daily basis and resolution of the
generated random id to the real user is only available to the DWG.
This would certainly also generate tons of problems but i think it is
important to keep this possibility in mind when considering the matter
of privacy.
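To make the daily scrambling idea a bit more concrete, here is a minimal sketch. It assumes a keyed hash (HMAC) in place of the truly random id mentioned above, with a secret key that only the DWG would hold; the function names, the "anon_" prefix and the key handling are all hypothetical and not anything OSM implements.

```python
import hashlib
import hmac
from datetime import date


def daily_pseudonym(secret_key: bytes, user_id: int, day: date) -> str:
    """Derive a per-day pseudonym for a user (hypothetical scheme).

    The same user gets the same pseudonym within one day and a different
    one the next day, so edits cannot be linked across days without the key.
    """
    message = f"{user_id}:{day.isoformat()}".encode("utf-8")
    digest = hmac.new(secret_key, message, hashlib.sha256).hexdigest()
    return "anon_" + digest[:16]


def resolve_pseudonym(secret_key: bytes, pseudonym: str, day: date, candidate_user_ids):
    """Map a pseudonym back to a user id; only possible for the key holder."""
    for user_id in candidate_user_ids:
        expected = daily_pseudonym(secret_key, user_id, day)
        if hmac.compare_digest(expected, pseudonym):
            return user_id
    return None
```

Whoever holds the key can resolve a pseudonym by recomputing it for the candidate accounts on the given day; everyone else only sees unrelated ids from one day to the next. This says nothing about the practical problems mentioned above, for example edits that reveal the mapper through their location or style.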

> Hence,
> anyone with an OSM account could make such an animated progress map,
> and it could be shown to anyone with an OSM account. Only if you want
> to distribute it outside of OSM you'd either have to
> remove/pseudonymize the user names [...]

That part is really tricky, you'd have to be very specific on what kind
of aggregation is necessary to make the data ok to be published.
Obviously just replacing each user name with user<hash_value> is not
going to cut it. Without clear rules here anyone who publishes
anything based on such data would be in a legal mine field.
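One reason a plain hash is not enough can be sketched as follows, assuming the publisher uses an unsalted hash and that the list of existing user names is public (as it is in the planet file). The function names and the sample user names are made up for illustration.

```python
import hashlib


def pseudonymize(username: str) -> str:
    """Replace a user name with user<hash_value> (unsalted, for illustration)."""
    return "user" + hashlib.sha256(username.encode("utf-8")).hexdigest()[:12]


def reidentify(pseudonyms, known_usernames):
    """Reverse the pseudonyms by hashing every publicly known user name."""
    lookup = {pseudonymize(name): name for name in known_usernames}
    return {p: lookup[p] for p in pseudonyms if p in lookup}


published = [pseudonymize("alice_maps"), pseudonymize("bob_b")]
recovered = reidentify(published, ["carol", "alice_maps"])
# recovered maps the first pseudonym back to "alice_maps"
```

A secret salt would stop this particular lookup, but each pseudonym would still tie all of one person's edits together, which is the part that matters for profiling.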

Re: HDYC, login requirement and "privacy"

Again on the term "personal data". According to the General Data Protection Regulation (GDPR) (Regulation (EU) 2016/679) [1], pseudonymized data is not covered unless it would be possible to attribute it to a natural person:

___(26) "The principles of data protection should apply to any information
concerning an identified or identifiable natural person. Personal data
which have undergone pseudonymisation, which could be attributed to a
natural person by the use of additional information should be considered
to be information on an identifiable natural person. To determine
whether a natural person is identifiable, account should be taken of all
the means reasonably likely to be used, such as singling out, either by
the controller or by another person to identify the natural person
directly or indirectly. To ascertain whether means are reasonably likely
to be used to identify the natural person, account should be taken of
all objective factors, such as the costs of and the amount of time
required for identification, taking into consideration the available
technology at the time of the processing and technological developments.
The principles of data protection should therefore not apply to
anonymous information, namely information which does not relate to an
identified or identifiable natural person or to personal data rendered
anonymous in such a manner that the data subject is not or no longer
identifiable. This Regulation does not therefore concern the processing
of such anonymous information, including for statistical or research
purposes."___

Usually in statistics, information down to the block level is not considered personal information. You won't be able to say from OSM edits in which house someone lives, or who she is, so it doesn't seem to apply. The part "Personal data ... which could be attributed to a
natural person by the use of additional information should be considered
to be information on an identifiable natural person. To determine
whether a natural person is identifiable, account should be taken of all
the means reasonably likely to be used, such as singling out, either by
the controller or by another person to identify the natural person
directly or indirectly." leaves some risk, but is essentially stupid, because with any kind and amount of additional personal data you will hypothetically always be able to get to a person, and costs and amount of time are always negligible in the times of electronic data processing, given the rapid technological development. So, as pseudonymization is suggested in the regulation as a measure to be applied, it likely implicitly restricts this paragraph to reasonably expectable cases and not every hypothetical one. To get from OSM edits to a natural person you will need so much information about this person that you won't gain more insights from looking at their edits.

Also, I am not sure whether this applies at all to OSMF, because OSMF never collects personal data, it only collects an email address and doesn't verify to whom it belongs and never publishes it, so probably there is no "personal data which have undergone pseudonymisation", rather there wasn't any personal data at any time.

At the moment we can't know what kind of data protection rules will govern OSMF in the future, given that EU rules will not automatically apply any more, soon, if Brexit is not stopped (nonetheless, local chapters might be an issue here).

____________________

Btw: I think we should require our contributors to confirm that they are adults (or get explicit permission from their parents?), because children aren't able to legally sign the CT, and their data is particularly protected. The current CTs don't seem to account for this (or I haven't seen it).

Re: HDYC, login requirement and "privacy"

On 05.05.2017 at 10:37, Martin Koppenhoefer wrote:
> ..
> Also everyone can create new users at will, if your concern is privacy, you could use a new user for every edit and nobody could associate these edits to the same person.
>
> ..
Well if a "new user" includes

Thanks. Would it be possible to have the link in one's account page from
OSM to link directly to the historic version that was signed? Now I have
to judge that "about 6 years ago" will probably be later than the 1.2.4
version.
It wasn't even clear to me that this is a wiki page because it is so
modified.

Re: HDYC, login requirement and "privacy"

On 05.05.2017 at 11:38, Martin Koppenhoefer wrote:
>
> Usually in statistics, information down to the block level is not
> considered personal information. You won't be able from OSM edits to
> say in which house someone lives, or who she is, so it doesn't seem to
> apply.

Anybody that participated in contacting editors during the licence
change knows that the above is, sorry, rubbish. While it is true that
you can't identify every single contributor, the large majority can be
identified easily.

>
> At the moment we can't know what kind of data protection rules will
> govern OSMF in the future, given that EU rules will not automatically
> apply any more, soon, if Brexit is not stopped (nonetheless, local
> chapters might be an issue here).
>
The GDPR applies to anybody that processes data of EU residents (that
has been pointed out to you before) regardless of where they are located.

Re: HDYC, login requirement and "privacy"

I am aware that no matter what we do there will always be "rogue" uses
of our data.

Therefore making all contributors aware of what they are releasing about
themselves and how it could be used against them remains important no
matter what we do. (And we have to find ways to do that without sounding
alarmist.)

In fact, we have a similar situation with our license: We spent
countless years debating and then changed our license to what we thought
was best. We all know that we cannot keep a rogue user from ignoring our
license - but at least we can define what we want to allow.

I am expecting the same for the sensitive user data. We will never be
able to ensure that the data is not used against the wishes of the users
- but we can ensure that those who do this are in clear violation of our
terms and hence "bad guys".

Just to pick a random example:

Today, if you are looking for a job and you're being interviewed by a
potential employer, the potential employer could say: "I can see from
OpenStreetMap that you've been editing a lot during the day in your last
job. Did you not have any work to do?" - and the employer would not even
be "wrong". Harvesting the full history file for totally OSM unrelated
information like that is not against any of our rules; it might be
against the law in some countries but certainly not in others. If you
publicly complained about what happened to you, it is very likely that
there will be many people like in this thread who will say "duh, you
idiot why didn't you use a pseudonym, didn't you read what you signed up
for, lah lah lah".

I would like to come to a point where, if this happened to you in a job
interview, you could afterwards point to an OSM policy and say: Clearly
this company has violated OSM rules, they must have created an account
under false pretenses to get at this data and they're using it for
purposes not sanctioned by OSM. That won't make you get the job, but it
would at least make clear that we stand with our contributors against
abuse of their data.

(If that hasn't become clear already, I am of the opinion that the
current contributor terms don't necessarily mean that the contributor
asks OSMF to distribute their *metadata* under ODbL - I think it just
applies to the *geodata*, and if we wanted we could slap restrictions on
the *metadata* part of things.)

> For a balanced discussion - and i am not saying i would actually prefer
> this approach to what you are suggesting - the whole problem could also
> be approached from the other side by reconsidering the possibility for
> partly anonymous edits.

Yes. I think both approaches could be grouped under "restricted access
to personal information", and there will probably be still other
approaches with their own advantages and disadvantages.

>> Hence,
>> anyone with an OSM account could make such an animated progress map,
>> and it could be shown to anyone with an OSM account. Only if you want
>> to distribute it outside of OSM you'd either have to
>> remove/pseudonymize the user names [...]
>
> That part is really tricky, you'd have to be very specific on what kind
> of aggregation is necessary to make the data ok to be published.
> Obviously just replacing each user name with user<hash_value> is not
> going to cut it. Without clear rules here anyone who publishes
> anything based on such data would be in a legal mine field.

Yes; even today if a person uses a nickname with OSM and not their real
name, I think it would in many cases be easy to make the case that it is
very easy to de-pseudonymize the person. Currently when someone asks us
to delete their account we simply replace their user name with user_1234
(their numeric user id); it is quite possible that this is totally
insufficient at least in countries with strong data protection laws such
as the UK because the person can still be identified and connected to
all their edits.

Re: HDYC, login requirement and "privacy"

On 05.05.2017 08:49, joost schouppe wrote:
> Putting a somewhat pointless access limitation to
> HDYC is counterproductive, as it might give people a false sense of
> security.

This is correct, but so would

> A system to opt-out of being
> included in this particular system

because it would give people the idea that if they opt out then
their data wouldn't be visible, when in fact anyone can run software
like Pascal's.

I think that "raising awareness" is good; and if we could all unite
behind the idea that just because someone voluntarily contributes to OSM
that shouldn't mean they're automatically sacrificing their privacy then
that would already be a great step forward.

How the goals of transparency and quality control in the project and the
goal of protecting the privacy of the individual contributor can be
reconciled is something we can, and should, think about; I would be very
happy if as a first step we could at least agree that protecting the
privacy of the individual contributor *is* desirable. The knee-jerk
"well you knew what you signed up for" reaction doesn't help a
vulnerable community member when they see their privacy violated.

Re: HDYC, login requirement and "privacy"

On 05.05.2017 10:37, Martin Koppenhoefer wrote:
> you write a lot about personal data, but all osm admins have about
> users is some email address, which often isn't even existing anymore
> and an associated user name

Many people choose their real name, or at least something easily
linkable to their real name via one hop on Github, Facebook, etc.; many
social media platforms even *expect* you to give your real name.

Of course they don't *have* to in OSM. But if they do use their real
name then I don't think you can interpret that as willfully signing away
their right to privacy. "Ha ha, your own fault for using your real name,
didn't you think about your job application with the Chinese government
25 years later, shoulda been more careful!"

I think that even if they are careful enough not to use their real name,
the identity of a mapper will often be easy to reconstruct if you have
access to just a little bit of extra information (might be as little as
a name on a doorbell).

> Also everyone can create new users at will, if your concern is
> privacy, you could use a new user for every edit and nobody could
> associate these edits to the same person.

This is true. It would actually be possible to write a plugin for JOSM
to do that - automatically sign up to OSM with a different throw-away
account for each changeset you upload. Do we want to encourage that?
Frankly, I'd rather not. But if that is our official suggestion on how
to balance privacy with contribution to OSM, maybe we should offer such
a plugin.

> Putting a log in to hdyc, from my point of view, doesn't change
> anything (because everybody can sign up), besides that there are now
> more data created (Pascal will know who is interested in whom, and
> osm admins can see how often someone uses the service, and if it
> becomes common to do it like this, which third party services someone
> uses).

That is true. The log-in required for HDYC currently only has symbolic
character and it says "this is for community members only". We're an
open community and you can become a member with a few mouse clicks. But
I think the symbolism is of value and I support Pascal's decision.

Re: HDYC, login requirement and "privacy"

> How the goals of transparency and quality control in the project and the
> goal of protecting the privacy of the individual contributor can be
> reconciled is something we can, and should, think about

I still don't see how someone can be individually identified within OSM by her edits, and I fail to understand how these edits qualify as "personal data". Either the mapper is not editing much (so there is not sufficient information about her; these are most mappers), or she is editing a lot and from her editing habits you could maybe say something about her interests and the area where she lives, how often she goes to other places, at what times she is active in OSM and similar. This still won't help to identify single persons unless you have a very large database of many people which _already_ knows a whole lot about everyone, including when they went abroad or on vacation, what their interests are etc., so you probably won't gain more insight from looking at the OSM edits as well. I also fail to understand who would attack someone's privacy by looking at OSM edits and for what purpose, and why this can't be legally excluded by stating you must not do it if you want the data (which on the other hand will make OSM non-free data, at least with respect to data referring to mappers).

Re: HDYC, login requirement and "privacy"

> I think that even if they are careful enough not to use their real name,
> the identity of a mapper will often be easy to reconstruct if you have
> access to just a little bit of extra information (might be as little as
> a name on a doorbell).

If I look at my "local area" in HDYC, there are probably a million people living within it, but even if it were just a few thousand it would effectively not be possible to look at all those doorbells (where you won't have your name anyway if you are really concerned about privacy) and get a clue as to which username this might be related to. If you are living in a _very_ remote area (which most mappers are not), in very rare exceptional cases it might be possible to see who is which mapper, and that he mapped this remote area. Congratulations.

What is the scenario? The Chinese government? Your ex-wife? The NSA? Nazi terrorists? Your friends? Depending on who it is, the countermeasures will have to be very different.

Re: HDYC, login requirement and "privacy"

On 05.05.2017 12:27, Martin Koppenhoefer wrote:
> I also fail to understand who would
> attack someones privacy by looking at OSM edits and for what scope, and
> why this can't be legally excluded by stating you must not do it if you
> want the data (which on the other hand will make OSM non-free data, at
> least with respect to data referring to mappers).

I think it would be good to separate - at least in our minds - the core
geodata from the "user data" or maybe "metadata" of who did what when
and using which operating system and editor.

The core geodata will always be freely available under the ODbL, and you
would not "make OSM non-free" by omitting e.g. user information from
that. Many current distribution forms (e.g. standard Overpass responses,
vector tiles, Garmin maps) already omit user information.
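As a rough sketch of what "omitting user information" could look like for OSM XML, the following removes the per-object attributes that point at the contributor. Which attributes count as "user data" is an assumption here (user, uid and changeset); timestamps and versions are left alone, and the function name is made up.

```python
import xml.etree.ElementTree as ET

# Assumption: these per-object attributes are treated as "user data".
USER_ATTRIBUTES = ("user", "uid", "changeset")


def strip_user_metadata(osm_xml: str, attributes=USER_ATTRIBUTES) -> str:
    """Return OSM XML with contributor-related attributes removed."""
    root = ET.fromstring(osm_xml)
    for element in root.iter():
        if element.tag in ("node", "way", "relation"):
            for name in attributes:
                # pop with a default so objects lacking the attribute are fine
                element.attrib.pop(name, None)
    return ET.tostring(root, encoding="unicode")
```

The geometry and tags stay untouched, so the result is still the core geodata; only the link from each object to the account that last edited it is gone.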

You could then offer the user information (needed for quality control
etc.) under separate rules (that say "for project internal use only").
This would automatically mean that someone who runs a HDYC-like site
would have to put a login in front of the site in order to ensure that
he complies with the "internal use only" rule.

Re: HDYC, login requirement and "privacy"

> On 5. May 2017, at 12:24, Frederik Ramm <[hidden email]> wrote:
>
> This is true. It would actually be possible to write a plugin for JOSM
> to do that - automatically sign up to OSM with a different throw-away
> account for each changeset you upload.

then you'd know it's either a German or a Chinese and could see from the region of the edit which one ;-)

Re: HDYC, login requirement and "privacy"

On Friday 05 May 2017, Frederik Ramm wrote:
> [...]
>
> I would like to come to a point where, if this happened to you in a
> job interview, you could afterwards point to an OSM policy and say:
> Clearly this company has violated OSM rules, they must have created
> an account under false pretenses to get at this data and they're
> using it for purposes not sanctioned by OSM. That won't make you get
> the job, but it would at least make clear that we stand with our
> contributors against abuse of their data.

One of the things i was trying to point out is that this would not be
the case. That company would simply say: "We got that info from <rogue
website> or from our human resources consulting contractor and never
agreed to any terms not to use such data. Thanks for informing us that
they are using this data without permission, we will not use it any
more in the future." ;-)

> > For a balanced discussion - and i am not saying i would actually
> > prefer this approach to what you are suggesting - the whole problem
> > could also be approached from the other side by reconsidering the
> > possibility for partly anonymous edits.
>
> Yes. I think both approaches could be grouped under "restricted
> access to personal information", and there will probably be still
> other approaches with their own advantages and disadvantages.

Well - the difference with the scenario i outlined is that it much more
clearly aims at the protection of the mappers' privacy and gives the
mapper much broader and more immediate control over this. This is no
replacement for a solid strategy on educating mappers on what kind of
privacy risks are involved with contributing in OSM but it kind of
seems a more logical approach to the matter than a purely
after-the-fact approach to protecting the data.

This does not mean i am convinced this is ultimately the best solution,
this depends on a lot of details of the implementation.

Re: HDYC, login requirement and "privacy"

Actually, can an OSM username be considered as 'personal data'?
Can somebody point to a definition of 'personal data'?
How would this be different from, say, my github account?
Yves

Re: HDYC, login requirement and "privacy"

"It depends" the critical part (regardless of if it is your real
name or not) is that it can be used as a key to generate a profile
a la HDYC and that can then be associated with the help of
additional sources with a real person, potentially revealing all
kinds of things about your life. But strictly speaking the display
name is not necessary for that as the changeset meta data and
likely the edits themselves probably contain enough information
to generate unique or near unique fingerprints.
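To illustrate what such a fingerprint could look like without any display name, here is a toy sketch. It assumes each changeset is reduced to two pieces of metadata, the editor from the created_by tag and the hour of upload; the record layout and function names are invented, and a real analysis would use far more signals.

```python
from collections import Counter


def fingerprint(changesets):
    """Summarize an account's changesets as (most used editor, most common hour)."""
    editors = Counter(c["created_by"] for c in changesets)
    hours = Counter(c["hour_utc"] for c in changesets)
    return (editors.most_common(1)[0][0], hours.most_common(1)[0][0])


def unique_fingerprints(changesets_by_account):
    """Return the accounts whose fingerprint is shared with no other account."""
    prints = {acct: fingerprint(cs) for acct, cs in changesets_by_account.items()}
    counts = Counter(prints.values())
    return {acct for acct, fp in prints.items() if counts[fp] == 1}
```

With only two coarse signals most accounts will share a fingerprint with others; the concern is that adding location, tagging habits and so on quickly makes them unique.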

That is why I suspect that the consequence of this discussion
could be fairly drastic and result in essentially all meta data
being removed from the planet dumps, including changeset ids and
so on.

Simon

On 05.05.2017 at 18:25, Yves wrote:

> Actually, can an OSM username be considered as
> 'personal data'?
> Can somebody point out to a definition of 'personal data' ?
> How would this be different from, say, my github account?
> Yves

Re: HDYC, login requirement and "privacy"

This topic started a bit backwards -- with an action taken by one project within the OSM ecosystem. We've covered a lot of perspectives on the topic of privacy in OSM, and possible actions and their implications. To turn this thread into some forward movement for us, a good course of action will be as follows. This does not clearly fit into one Working Group responsibility, so the OSMF Board can consider taking up the design of the process at least.

* We need to carefully research and assess the personal information (PI) risk. This includes defining what PI is and what various parts of OSM might expose.

* LWG gets informed legal advice on EU and other jurisdictions' PI laws

* Consider the range of possible activities to address the risk

I reckon the most reasonable and effective starting activity will be to clearly define what OSM users need to know about contributing geodata to OSM, and the PI considerations they should keep in mind. As Frederik says, "raising awareness". For this to be effective, this means smarter design in the learning process and onboarding of new mappers.

And perhaps that's the ending point. Personally I can't see any way that removing contributor metadata from geodata would 1) really protect anyone 2) not hobble the project, which depends so much on user reputation to retain quality. In any case, let's kick that question down the road.

Re: HDYC, login requirement and "privacy"

On 05.05.17 at 19:11, Simon Poole wrote:
>
> That is why I suspect that the consequence of this discussion could be
> fairly drastic and result in essentially all meta data being removed
> from the planet dumps, including changeset ids and so on.
>
So, if you suspect, ... don't ?
Editing the map *yourself* *is* Openstreetmap !!

I'd really like to have a definition of 'personal data' in this context.
Otherwise, this discussion is quite useless, because while interesting, an
OSM-talk definition won't be anything close to a legal definition.

Re: HDYC, login requirement and "privacy"

"‘personal data’ means any information relating to an identified or
identifiable natural person (‘data subject’); an identifiable natural
person is one who can be identified, directly or indirectly, in
particular by reference to an identifier such as a name, an
identification number, location data, an online identifier or to one or
more factors specific to the physical, physiological, genetic, mental,
economic, cultural or social identity of that natural person;"

Re: HDYC, login requirement and "privacy"

> 2017-05-05 12:24 GMT+02:00 Frederik Ramm <[hidden email]>:
>
> > I think that even if they are careful enough not to use their real
> > name, the identity of a mapper will often be easy to reconstruct if
> > you have access to just a little bit of extra information (might be
> > as little as a name on a doorbell).
> >
>
>
> if I look at my "local area" in hdyc, there are probably a million
> people living within, but even if it were just a few thousand it
> would effectively not be possible to look at all those doorbells
> (where you won't have your name anyway if you are really concerned
> about privacy) and get a clue to which username this might be
> related. If you are living in a _very_ remote area (which most
> mappers are not), in very rare exceptional cases it might be possible
> to see who is which mapper, and that he mapped this remote area.
> Congratulations.

You're seriously underestimating how much information it's possible to
get from editing patterns. There are a quarter-million people in the
area I keep an eye on; maybe four of them are active OSM contributors.
Just from looking at changesets, I know where two of them live: which
house for one of them, and the general neighborhood for the other.

(I also know which university a couple dozen hit-and-run editors
attend, and can make a good guess at which class they took last fall.)