
GDPR – A Practical Guide For Developers

You’ve probably heard about GDPR, the new European data protection regulation that applies to practically everyone. If you are working in a big company, there is most likely already a process for bringing your systems into compliance with the regulation.

The regulation is basically a law that must be followed in all EU member states, and it also applies to companies that are not registered in Europe but have European customers. So that’s most companies. I will not go into yet another “12 facts about GDPR” or “7 myths about GDPR” post/whitepaper, as they are often aimed at managers or legal people. Instead, I’ll focus on what GDPR means for developers.

I’ll try to be a bit more comprehensive this time and cover as many aspects of the regulation that concern developers as I can. And while developers will mostly be concerned about how the systems they are working on have to change, it’s not unlikely that a less informed manager storms in in late spring, realizing GDPR is going to be in force tomorrow, asking “what should we do to get our system/website compliant”.

The rights of the user/client (referred to as “data subject” in the regulation) that I think are relevant for developers are: the right to erasure (the right to be forgotten/deleted from the system), right to restriction of processing (you still keep the data, but mark it as “restricted” and don’t touch it without further consent by the user), the right to data portability (the ability to export one’s data in a machine-readable format), the right to rectification (the ability to get personal data fixed), the right to be informed (getting human-readable information, rather than long terms and conditions), the right of access (the user should be able to see all the data you have about them).

Additionally, the relevant basic principles are: data minimization (one should not collect more data than necessary), integrity and confidentiality (all security measures to protect data that you can think of + measures to guarantee that the data has not been inappropriately modified).

Even further, the regulation requires certain processes to be in place within an organization (of more than 250 employees or if a significant amount of data is processed), and those include keeping a record of all types of processing activities carried out, including transfers to processors (3rd parties), which includes cloud service providers. None of the other requirements of the regulation have an exception depending on the organization size, so “I’m small, GDPR does not concern me” is a myth.

It is important to know what “personal data” is. Basically, it’s every piece of data that can be used to uniquely identify a person or data that is about an already identified person. It’s data that the user has explicitly provided, but also data that you have collected about them from either 3rd parties or based on their activities on the site (what they’ve been looking at, what they’ve purchased, etc.)

Having said that, I’ll list a number of features that will have to be implemented and some hints on how to do that, followed by some do’s and don’t’s. Note that (as pointed out in each feature) they don’t necessarily have to be automated – you could just have a manual process in place. But for bigger systems it would be much better to have them automated.

“Forget me” – you should have a method that takes a userId and deletes all personal data about that user (in case the data has been collected on the basis of consent or based on the legitimate interests of the controller (see more below), and not due to contract enforcement or legal obligation). It is actually useful for integration tests to have that feature (to clean up after the test), but it may be hard to implement depending on the data model. In a regular data model, deleting a record may be easy, but some foreign keys may be violated. That means you have two options – either make sure you allow nullable foreign keys (for example an order usually has a reference to the user that made it, but when the user requests that their data be deleted, you can set the userId to null), or make sure you delete all related data (e.g. via cascades). Deleting related data may not be desirable, e.g. if the order is used to track available quantities or for accounting purposes. It’s a bit trickier for event-sourcing data models, or in extreme cases, ones that include some sort of blockchain/hash chain/tamper-evident data structure. With event sourcing you should be able to remove a past event and re-generate intermediate snapshots. For blockchain-like structures – be careful what you put in there and avoid putting personal data of users. There is an option to use a chameleon hash function, but that’s suboptimal. Overall, you must constantly think of how you can delete the personal data. And “our data model doesn’t allow it” isn’t an excuse. What about backups? Ideally, you should keep a separate table of forgotten user IDs, so that each time you restore a backup, you re-forget the forgotten users. This means the table should be in a separate database or have a separate backup/restore process.
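To make the nullable-foreign-key option and the “re-forget on restore” idea more concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The table and function names (users, orders, forgotten_users, forget_user, reforget_after_restore) are made up for illustration, and the two in-memory databases merely stand in for an application database and a separately backed-up registry of forgotten IDs.

```python
import sqlite3


def open_app_db():
    """Hypothetical application database: orders survive, but lose the user link."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            -- nullable foreign key: deleting the user sets it to NULL
            user_id INTEGER REFERENCES users(id) ON DELETE SET NULL,
            total REAL NOT NULL
        );
    """)
    return conn


def open_registry_db():
    """Hypothetical separate database holding only the forgotten user IDs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE forgotten_users (user_id INTEGER PRIMARY KEY)")
    return conn


def forget_user(app, registry, user_id):
    # Record the ID first, so a crash in between never loses the request.
    with registry:
        registry.execute(
            "INSERT OR IGNORE INTO forgotten_users VALUES (?)", (user_id,)
        )
    with app:
        app.execute("DELETE FROM users WHERE id = ?", (user_id,))


def reforget_after_restore(app, registry):
    """Run after restoring a backup: delete everyone who was already forgotten."""
    ids = [row[0] for row in registry.execute("SELECT user_id FROM forgotten_users")]
    for user_id in ids:
        forget_user(app, registry, user_id)
    return len(ids)
```

In a real system the personal data is rarely confined to one table, so forget_user would have to touch every place where it lives; the sketch only shows the shape of the approach.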

Notify 3rd parties for erasure – deleting things from your system may be one thing, but you are also obligated to inform all third parties that you have pushed that data to. So if you have sent personal data to, say, Salesforce, HubSpot, Twitter, or any cloud service provider, you should call an API of theirs that allows for the deletion of personal data. If you are such a provider, obviously, your “forget me” endpoint should be exposed. Calling the 3rd party APIs to remove data is not the full story, though. You also have to make sure the information does not appear in search results. Now, that’s tricky, as Google doesn’t have an API for removal, only a manual process. Fortunately, it’s only about public profile pages that are crawlable by Google (and other search engines, okay…), but you still have to take measures. Ideally, you should make the personal data page return a 404 HTTP status, so that it can be removed.

Restrict processing – in your admin panel where there’s a list of users, there should be a button “restrict processing”. The user settings page should also have that button. When clicked (after reading the appropriate information), it should mark the profile as restricted. That means it should no longer be visible to the backoffice staff, or publicly. You can implement that with a simple “restricted” flag in the users table and a few if-clauses here and there.
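As a rough illustration of the “restricted” flag, here is a small sketch. The User class and the two helper functions are hypothetical; in a real application the same check would sit in the queries behind the admin list and the public profile page.

```python
from dataclasses import dataclass


@dataclass
class User:
    id: int
    name: str
    restricted: bool = False  # set by the "restrict processing" button


def restrict_processing(user):
    """Mark the profile as restricted; the data itself is kept."""
    user.restricted = True


def visible_users(users):
    """The if-clause: restricted profiles are hidden from staff and from the public."""
    return [user for user in users if not user.restricted]
```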

Export data – there should be another button – “export data”. When clicked, the user should receive all the data that you hold about them. What exactly that data is depends on the particular use case. Usually it’s at least the data that you delete with the “forget me” functionality, but it may include additional data (e.g. the orders the user has made may not be deleted, but should be included in the dump). The structure of the dump is not strictly defined, but my recommendation would be to reuse schema.org definitions as much as possible, for either JSON or XML. If the data is simple enough, a CSV/XLS export would also be fine. Sometimes data export can take a long time, so the button can trigger a background process, which would then notify the user via email when their data is ready (Twitter, for example, does that already – you can request all your tweets and you get them after a while). You don’t need to implement an automated export, although it would be nice. It’s sufficient to have a process in place to allow users to request their data, which can be a manual database-querying process.
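Here is a minimal sketch of what a JSON dump reusing schema.org vocabulary might look like. The input dictionaries and the outer “person”/“orders” envelope are my own assumptions; only the Person and Order types and the name, email, address, orderNumber and orderDate properties come from schema.org.

```python
import json


def export_user_data(user, orders):
    """Build a machine-readable dump of everything held about one user.

    `user` and `orders` are plain dicts standing in for database rows
    (hypothetical field names: name, email, address, id, placed_on).
    """
    document = {
        "person": {
            "@context": "https://schema.org",
            "@type": "Person",
            "name": user["name"],
            "email": user["email"],
            "address": user.get("address"),
        },
        "orders": [
            {
                "@context": "https://schema.org",
                "@type": "Order",
                "orderNumber": str(order["id"]),
                "orderDate": order["placed_on"],
            }
            for order in orders
        ],
    }
    return json.dumps(document, indent=2, ensure_ascii=False)
```

For a long-running export, the same function could be called from a background job that emails the user a download link once the file is ready.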

Allow users to edit their profile – this seems an obvious rule, but it isn’t always followed. Users must be able to fix all data about them, including data that you have collected from other sources (e.g. using a “login with facebook” you may have fetched their name and address). Rule of thumb – all the fields in your “users” table should be editable via the UI. Technically, rectification can be done via a manual support process, but that’s normally more expensive for a business than just having the form to do it. There is one other scenario, however, when you’ve obtained the data from other sources (i.e. the user hasn’t provided their details to you directly). In that case there should still be a page where they can identify somehow (via email and/or sms confirmation) and get access to the data about them.

Consent checkboxes – “I accept the terms and conditions” would no longer be sufficient to claim that the user has given their consent for processing their data. So, for each particular processing activity there should be a separate checkbox on the registration (or user profile) screen. You should keep these consent checkboxes in separate columns in the database, and let the users withdraw their consent (by unchecking these checkboxes from their profile page – see the previous point). Ideally, these checkboxes should come directly from the register of processing activities (if you keep one). Note that the checkboxes should not be preselected, as this does not count as “consent”. Another important thing here is machine learning/AI. If you are going to use the user’s data to train your ML models, you should get consent for that as well (unless it’s for scientific purposes, which have special treatment in the regulation). Note here the so called “legitimate interest”. It is for the legal team to decide what a legitimate interest is, but direct marketing is included in that category, as well as any common sense processing relating to the business activity – e.g. if you collect addresses for shipping, it’s obviously a legitimate interest. So not all processing activities need consent checkboxes.
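A small sketch of per-activity consent, assuming three made-up processing activities. The important parts are that nothing is preselected and that withdrawing is as easy as granting; in a database each activity would simply be its own boolean column.

```python
from dataclasses import dataclass, field

# Hypothetical processing activities; ideally these come from the register.
ACTIVITIES = ("newsletter", "ml_training", "third_party_sharing")


@dataclass
class Consents:
    # Every checkbox starts unchecked: a preselected box is not consent.
    granted: dict = field(
        default_factory=lambda: {activity: False for activity in ACTIVITIES}
    )

    def set(self, activity, value):
        """Check or uncheck one box (unchecking withdraws the consent)."""
        if activity not in self.granted:
            raise KeyError(f"unknown processing activity: {activity}")
        self.granted[activity] = bool(value)

    def allows(self, activity):
        """Call this before running a consent-based processing activity."""
        return self.granted[activity]
```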

Re-request consent – if the consent users have given was not clear (e.g. if they simply agreed to terms & conditions), you’d have to re-obtain that consent. So prepare a functionality for mass-emailing your users to ask them to go to their profile page and check all the checkboxes for the personal data processing activities that you have.

“See all my data” – this is very similar to the “Export” button, except data should be displayed in the regular UI of the application rather than an XML/JSON format. I wouldn’t say this is mandatory, and you can leave it as a “desirable” feature – for example, Google Maps shows you your location history – all the places that you’ve been to. It is a good implementation of the right of access. (Though Google is very far from perfect when privacy is concerned). This is not all there is to the right of access – you have to let unregistered users ask whether you have data about them, but that would be a more manual process. The ideal minimum would be to have a feature “check by email”, where you check if you have data about a particular email. You also need to tell the user in what ways you are processing their data. You can simply print all the records in your data processing register to which the user has consented.

Age checks – you should ask for the user’s age, and if the user is a child (below 16), you should ask for parent permission. There’s no clear way to do that, but my suggestion is to introduce a flow where the child specifies the email of a parent, who can then confirm. Obviously, children will just cheat with their birthdate, or provide a fake parent email, but you will most likely have done your job according to the regulation (this is one of the “wishful thinking” aspects of the regulation).

Keeping data for no longer than necessary – if you’ve collected the data for a specific purpose (e.g. shipping a product), you have to delete it/anonymize it as soon as you don’t need it. Many e-commerce sites offer “purchase without registration”, in which case the consent goes only for the particular order. So you need a scheduled job/cron to periodically go through the data and anonymize it (delete names and addresses), but only after a certain condition is met – e.g. the product is confirmed as delivered. You can have a database field for storing the deadline after which the data should be gone, and that deadline can be extended in case of a delivery problem.
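The scheduled job could look roughly like the sketch below, which uses sqlite3 and a made-up orders table with an anonymize_after deadline column stored as an ISO date string. The function names are hypothetical; the job itself would be triggered by cron or whatever scheduler you already have.

```python
import sqlite3
from datetime import date


def create_orders_table(conn):
    conn.execute("""
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_name TEXT,
            shipping_address TEXT,
            delivered INTEGER NOT NULL DEFAULT 0,
            anonymize_after TEXT NOT NULL  -- ISO date, e.g. '2018-05-25'
        )
    """)


def anonymize_expired_orders(conn, today):
    """Blank names and addresses of delivered orders whose deadline has passed.

    Returns the number of orders anonymized in this run.
    """
    with conn:
        cursor = conn.execute(
            """
            UPDATE orders
               SET customer_name = NULL, shipping_address = NULL
             WHERE delivered = 1
               AND anonymize_after <= ?
               AND (customer_name IS NOT NULL OR shipping_address IS NOT NULL)
            """,
            (today.isoformat(),),  # ISO dates compare correctly as strings
        )
    return cursor.rowcount


def extend_deadline(conn, order_id, new_deadline):
    """Push the deadline back, e.g. when there is a delivery problem."""
    with conn:
        conn.execute(
            "UPDATE orders SET anonymize_after = ? WHERE id = ?",
            (new_deadline.isoformat(), order_id),
        )
```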

Now some “do’s”, which are mostly about the technical measures needed to protect personal data (outlined in article 32). They may be more “ops” than “dev”, but often the application also has to be extended to support them. I’ve listed most of what I could think of in a previous post. An important note here is that this is not mandated by the regulation, but it’s a good practice anyway and helps with protecting personal data.

Encrypt the data in transit. That means that communication between your application layer and your database (or your message queue, or whatever component you have) should be over TLS. The certificates could be self-signed (and possibly pinned), or you could have an internal CA. Different databases have different configurations, just google “X encrypted connections”. Some databases need gossiping among the nodes – that should also be configured to use encryption.

Encrypt the data at rest – this again depends on the database (some offer table-level encryption), but can also be done at the machine level, e.g. using LUKS. The encryption key can be stored in your infrastructure, or in some cloud service like AWS KMS.

Encrypt your backups – kind of obvious.

Implement pseudonymisation – the most obvious use case is when you want to use production data for the test/staging servers. You should change the personal data to some “pseudonym”, so that the people cannot be identified. When you push data for machine learning purposes (to third parties or not), you can also do that. Technically, that could mean that your User object can have a “pseudonymize” method which applies hash+salt/bcrypt/PBKDF2 to the data that can be used to identify a person. Pseudonyms could be reversible or not, depending on the use case (the definition in the regulation implies reversibility based on secret information, but in the case of test/staging data they might not be). Some databases have such features built-in, e.g. Oracle.
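A sketch of such a “pseudonymize” method, using PBKDF2 from Python’s hashlib. The User fields, the choice of which fields count as identifying, and the output format are assumptions for illustration. This variant is not reversible, and the same salt always yields the same pseudonym, so records can still be matched across systems that share the salt.

```python
import hashlib
from dataclasses import dataclass


def pseudonym(value, salt, iterations=100_000):
    """Derive a stable, non-reversible token from a value and a secret salt."""
    digest = hashlib.pbkdf2_hmac("sha256", value.encode("utf-8"), salt, iterations)
    return digest.hex()


@dataclass
class User:
    id: int
    name: str
    email: str
    country: str

    def pseudonymize(self, salt):
        """Return a copy that is safe for test/staging: identifying fields replaced."""
        return User(
            id=self.id,
            name="user-" + pseudonym(self.name, salt)[:12],
            email=pseudonym(self.email, salt)[:12] + "@example.invalid",
            country=self.country,  # assumed not identifying on its own
        )
```

Names and emails are easy to guess, so if the salt leaks the pseudonyms can be brute-forced; the salt should be treated as a secret and kept away from the environment that receives the pseudonymized data.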

Protect data integrity – this is a very broad thing, and could simply mean “have authentication mechanisms for modifying data”. But you can do something more, even as simple as a checksum, or a more complicated solution (like the one I’m working on). It depends on the stakes, on the way data is accessed, on the particular system, etc. The checksum can be in the form of a hash of all the data in a given database record, which should be updated each time the record is updated through the application. It isn’t a strong guarantee, but it is at least something.
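The simple checksum variant could be sketched like this, treating a database record as a dictionary and storing a SHA-256 hash of its canonical JSON form in a “checksum” field. The function names are made up. As said above, this is not a strong guarantee: anyone who can modify the record directly can also recompute the hash.

```python
import hashlib
import json


def record_checksum(record):
    """Hash every field of the record except the stored checksum itself."""
    payload = {key: value for key, value in record.items() if key != "checksum"}
    # Sorted keys and fixed separators give the same bytes for the same data.
    canonical = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def seal(record):
    """Call on every update made through the application."""
    return {**record, "checksum": record_checksum(record)}


def is_intact(record):
    """False if the record was changed without going through seal()."""
    return record.get("checksum") == record_checksum(record)
```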

Have your GDPR register of processing activities in something other than Excel – Article 30 says that you should keep a record of all the types of activities that you use personal data for. That sounds like bureaucracy, but it may be useful – you will be able to link certain aspects of your application with that register (e.g. the consent checkboxes, or your audit trail records). It wouldn’t take much time to implement a simple register, but the business requirements for that should come from whoever is responsible for the GDPR compliance. But you can advise them that having it in Excel won’t make it easy for you as a developer (imagine having to fetch the excel file internally, so that you can parse it and implement a feature). Such a register could be a microservice/small application deployed separately in your infrastructure.

Log access to personal data – every read operation on a personal data record should be logged, so that you know who accessed what and for what purpose. This does not follow directly from the provisions of the regulation, but it is kinda implied by the accountability principle. What about search results (or lists) that contain personal data about multiple subjects? My hunch is that simply logging “user X did a search for criteria Y” would suffice. But don’t display too much personal data in lists – for example see how Facebook makes you jump through some hoops to get a person’s birthday. Note: some have treated article 30 as a requirement to keep an audit log. I don’t think it is saying that – instead it requires 250+ companies (or companies processing data regularly) to keep a register of the types of processing activities (i.e. what you use the data for). There are other articles in the regulation that imply that keeping an audit log is a best practice (for protecting the integrity of the data as well as to make sure it hasn’t been processed without a valid reason).
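One way to make sure reads are logged is to route them through a single function that refuses to run without a purpose. The sketch below keeps the log in memory and uses invented names (AccessLog, read_profile); a real system would write to an append-only store. Note that the log entry contains only identifiers, not the personal data itself.

```python
from datetime import datetime, timezone


class AccessLog:
    """In-memory stand-in for an append-only audit log."""

    def __init__(self):
        self.entries = []

    def record(self, actor_id, subject_id, purpose):
        self.entries.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "actor_id": actor_id,      # who accessed
            "subject_id": subject_id,  # whose data
            "purpose": purpose,        # why
        })


def read_profile(store, access_log, actor_id, subject_id, purpose):
    """Read one personal data record; the access is logged before the read."""
    if not purpose:
        raise ValueError("a purpose is required to read personal data")
    access_log.record(actor_id, subject_id, purpose)
    return store[subject_id]
```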

Register all API consumers – you shouldn’t allow anonymous API access to personal data. I’d say you should request the organization name and contact person for each API user upon registration, and add those to the data processing register.

Finally, some “don’t’s”.

Don’t use data for purposes that the user hasn’t agreed with – that’s supposed to be the spirit of the regulation. If you want to expose a new API to a new type of clients, or you want to use the data for some machine learning, or you decide to add ads to your site based on users’ behaviour, or sell your database to a 3rd party – think twice. I would imagine your register of processing activities could have a button to send notification emails to users to ask them for permission when a new processing activity is added (or if you use a 3rd party register, it should probably give you an API). So upon adding a new processing activity (and adding that to your register), mass email all users from whom you’d like consent. Note here that additional legitimate interests of the controller might be added dynamically.

Don’t log personal data – getting rid of the personal data from log files (especially if they are shipped to a 3rd party service) can be tedious or even impossible. So log just identifiers if needed. And make sure old log files are cleaned up, just in case.

Don’t put fields on the registration/profile form that you don’t need – it’s always tempting to just throw as many fields as the usability person/designer agrees on, but unless you absolutely need the data for delivering your service, you shouldn’t collect it. Names you should probably always collect, but unless you are delivering something, a home address or phone is unnecessary.

Don’t assume 3rd parties are compliant – you are responsible if there’s a data breach in one of the 3rd parties (e.g. “processors”) to which you send personal data. So before you send data via an API to another service, make sure they have at least a basic level of data protection. If they don’t, raise a flag with management.

Don’t assume having ISO XXX makes you compliant – information security standards and even personal data standards are a good start and they will probably cover 70% of what the regulation requires, but they are not sufficient – most of the things listed above are not covered in any of those standards.

Overall, the purpose of the regulation is to make you take conscious decisions when processing personal data. It imposes best practices in a legal way. If you follow the above advice and design your data model, storage, data flow, and API calls with data protection in mind, then you shouldn’t worry about the huge fines that the regulation prescribes – they are for extreme cases, like Equifax for example. Regulators (data protection authorities) will most likely have some checklists into which you’d have to somehow fit, but if you follow best practices, that shouldn’t be an issue.

I think all of the above features can be implemented in a few weeks by a small team. Be suspicious when a big vendor offers you a generic plug-and-play “GDPR compliance” solution. GDPR is not just about the technical aspects listed above – it does have organizational/process implications. But also be suspicious if a consultant claims GDPR is complicated. It’s not – it relies on a few basic principles that are in fact best practices anyway. Just don’t ignore them.

49 thoughts on “GDPR – A Practical Guide For Developers”

Hi Bozho,
Excellent article. I was wondering what your thoughts were on how to handle historical backups when implementing “Forget me”. Would every backup containing data on the subject need to be restored in order to delete the relevant data and then subsequently backed up again? This could be a nightmare scenario for a large company with a lot of data and a lot of forget me requests.
Kind Regards,
Darren

I really like your article! I have just one comment that I think is worth mentioning: you do not have to implement everything. If you can assume with high probability that, for example, the right to portability will be used very rarely, you could define a manual process for extracting personal data from the database and use it when needed. I think GDPR requires the data controller to make this possible; how it is done is up to the controller.

Further to Darren’s comment/question about right-to-erasure and backups: A similar problem also occurs when using the event sourcing architecture, if personal data is stored in an immutable event log. One option for these scenarios is to use cryptographic erasure: encrypt personal data fields up front with a key specific to the data subject, and delete the key when needed to enforce deletion of the data. This is something we’ve implemented for Java. More info here: https://axoniq.io/events/2017/11/gdpr-webinar.html

@Darren I added a little more about backups. Basically, you keep a list of forgotten user IDs and re-delete them on restore.

@Frans yes, that’s a good approach. In some cases events (in event sourcing) can be deleted or modified/anonymized without affecting anything else, so that’s also an option (slightly easier, but potentially breaking)

@Albert – that’s right. It better be automated, but it doesn’t have to be. I’ll add a clarification
@Dawn – yup

Basically, you assume that you already have perfect data quality and have identified all persons with some account ID. But the regulation never mentions an ID; it requires you to identify natural persons, not accounts.

An example from my real-life experience with data, which we have seen at almost every customer company: you have a contract with an ISP for your internet and another contract with the same ISP for your mobile phone. What we have seen is that most companies create TWO separate accounts for this and don’t connect the data, especially if there have been company mergers or there are simply different departments.
The result currently is that you might get two ad mails for a new product of the ISP.

For the GDPR it would NOT be sufficient to add some buttons behind the account login if the natural person has two accounts. You have to find ALL data regarding this one natural person. So the buttons are good, but if you don’t control your data quality you could get into trouble.

So, you are right, GDPR is not THAT complicated, but it isn’t THAT easy as you say. The basic implementation for some features might only require some weeks, but only if you already have solved some very hard problems. Maybe it is quite easy for small or “new” companies, which only have ONE (at most two) database, but for most companies we talk to, this is not the reality.

Just my thoughts (I work at a company with heavy experience with data(-quality))
Greetings Marcel

Hello Bozho,
first of all, thank you for a comprehensive and exhaustive article.
I have a bit of an obscure question.
How about third parties who generate user interaction data which is used for ROI, conversion and similar measurements?
Especially where they don’t explicitly or implicitly know the user?
Do those 3rd parties need to provide data export for the specific user?
I am asking because in order to offer an export, they’d need to be able to bind the actual app user to their user agnostic tracking system.

If they can’t deduce the user, they cannot do any of the above.
However, they should follow the e-privacy directive and the upcoming e-privacy regulation which defines how cookies and other tracking mechanisms are used

I think it covers the most common use cases. There will certainly be edge cases depending on the business needs that are not covered above, though. The other day we got such a question – “what to do in case we get the data of the user and their consent over the phone”. Seems like the proper thing is to just mark the consent in a CRM on behalf of the user, but it is not yet clear – maybe some call archiving will be needed in case of sensitive data? Can’t say at this point without consulting with legal experts.

Thanks for sharing your analysis.
I spent some days in 2017 scanning official resources, including the original GDPR text, and on some points I came to a slightly different conclusion.
Basically, almost every time you write “must” (encrypt database content; provide a data download button; allow direct personal data editing; etc…), I came to the conclusion that this is an option, not a requirement.
What is required is to grant each individual access to their personal data; the how (is it automated or manually) is not enforced. Thus, a snail mail process would meet the requirement.
Regarding encryption, the text states “shall implement appropriate technical and organisational measures to ensure a level of security appropriate to the risk”.
The notion of “level of security appropriate to the risk” is key here : whether data are usual ecommerce data (postal address) or personal insurance data (history of failures, …) does matter, and measures are to be adapted.
By the way, it is not only a technical point, but also a process point: what about the developer who codes the encryption of the data? How do you ensure that he will not be able to access/decipher all the data?

I was curious if GDPR only applies to client data or if it also applies to employee/admin user data as well.

For example with event sourcing or access logs, would have something like “Employee X changed Customer Y’s address on 01/01/2018”. Can the employee/admin ask for their data to be forgotten? (eg when employee leaves company)

What would you recommend for people who are both customers and employees?

It applies to employee/admin data as well, yes, BUT it is based on contract, rather than consent. So the employee can’t ask to be forgotten. You just have to define a data retention period for that kind of audit data (it shouldn’t be “forever”)

You are correct. It is more fuzzy than “must” vs “must not”. I’ve listed the general good practices that would make you safer, but whether a compliance audit will absolutely require them – depends on many factors.

Great article. Helps our developers really grasp the concepts I’ve been trying to get across on our implementation journey. We are now more focused. Our real challenge is in implementing a solution for data at rest that avoids having to encrypt the whole database.

Great article, I’m just in the process of ensuring GDPR compliance in our reporting databases, and one issue that we’re having is with our main Datawarehouse, in which we’ve got data from about 5 legacy systems combining, we’re finding that realistically we need to obfuscate the personal details identically for the systems, so that I’ve got the same fake name and postcode in each system (for instance) to be able to match or throw up anomalies in an exception report.

Ideally, the best approach would be to start afresh with empty systems and populate each with specific test data but getting the diversity, volume and historical issues would mean the data was hugely unrealistic and at the point of going live we’d hit new unforeseen issues that the clean, sensible test data didn’t expose.

Another challenge is when updating the data with live deltas, the obfuscation needs to be similarly consistent – so, for example, we’ve got me, Mr Smith, first appearing in our CRM system as a lead, so that creates a record in the warehouse and after anonymisation, I’m ‘Mr Jones’ (along with obfuscated email, phone, address etc) – then I sign up as a customer in the sales system, we have to go back to the CRM system and find my pseudonym and use that, whereas if I’ve just appeared directly in the Sales system, they’ll need to come up with a new pseudonym, randomly generated, and then later, if I appear in the CRM system (if they did a customer mailout for instance) they’ve got to do the same. Essentially, the first time I appear in any system I’m given my pseudonymous values, and appearances in subsequent systems must tie back to that first appearance. Also this needs to be done on a field by field basis, as not all systems have all the same fields – (one might have email, another not).

Hi Bozho,
I am not a developer but a “manager” 🙂 However, I would like to ask a developer-type question. Is it possible that access to data within a database is granted via an API that ensures you should have access? That way developers cannot use PHP encoded into the webpage to see all data without logging their access.
Thanks
James

For me there is a contradiction between the “forget me” functionality and what you were saying about restoring the database from a backup and then erasing those users’ personal data.
In my understanding this should not be enough to comply with the regulation. Same goes for encrypting your backup, I just fail to see how that is compatible with GDPR.
My understanding is that you have to delete EVERY piece of personal information you are storing about the specific person in your system. It doesn’t matter if it’s in a log file or database or happens to be in a backup file.

“Yes, but that just shifts the responsibility to the developers of the API. Ultimately someone will have to write queries”

Well, yes and no. The company can get audited, so it’s not really the responsibility of the API developer. It’s the whole company, and the developer has to make the required efforts to comply with the regulation. And again, I don’t think leaving the user’s personal data in backups is compatible with GDPR.
Although I don’t know a better solution either, since every company has incremental backups and that makes it close to impossible to remove personal data from those backups too.

About the API – from organizational point of view it is of course better to limit the number of people (and applications) that have direct access to the database. No doubt about it.

As for backups – since eventually old backups are discarded (even in the case of incremental backups, full backups are performed), then I think you are fine with having an encrypted backup + a separate table with forgotten users. Apart from that, I agree, you can’t delete personal data from backups. It’s sufficient to acknowledge that, to protect the backups (encrypt, limit access to them), and have them expire. I guess..

Hi! Thanks for the article. I wonder about the need to add a checkbox to express consent vs showing the text “By submitting, I accept…”. I’ve been doing some research and Recital 32 of EU GDPR says:

> This could include ticking a box when visiting an internet website, choosing technical settings for information society services or another statement or conduct which clearly indicates in this context the data subject’s acceptance of the proposed processing of his or her personal data.

By “statement or conduct which clearly indicates in this context the data subject’s acceptance of the proposed processing” I understand that the current legal solution “By submitting, I accept the terms and conditions…” works.

Restrict processing – […] That means it should no longer be visible to the backoffice staff, or publicly. […]

But if our staff is no longer able to see (for instance in an e-commerce environment) previous orders, addresses, phone numbers, e-mails or even registered issues, we will not be able to offer our services. If that would be the case, could we block that user data processing action to make it mandatory?

IPs are personally identifiable information, however you can store them in logs as long as you have some rotation policy. They are unstructured information used for diagnostics. And I believe they fall under the “legitimate interest” of the controller. So just mention in your privacy notice that you collect the IP and it should be fine.
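A rotation policy of the kind mentioned above could look roughly like this with Python’s standard `logging` module. The logger name, the file path and the 14-day retention are assumptions picked for the example, not values taken from the regulation.

```python
import logging
import os
import tempfile
from logging.handlers import TimedRotatingFileHandler

def build_access_logger(path, keep_days=14):
    """Create a logger whose file rotates at midnight.

    Only `keep_days` rotated files are kept; older ones are deleted
    by the handler, so logged IPs expire without manual cleanup.
    """
    handler = TimedRotatingFileHandler(
        path, when="midnight", backupCount=keep_days, encoding="utf-8"
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
    logger = logging.getLogger("access")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger

# Usage: a temporary directory stands in for the real log location.
log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "access.log")
logger = build_access_logger(log_path, keep_days=14)
logger.info("request from %s", "203.0.113.7")
```

The retention period is then a single parameter, which makes it easy to state in the privacy notice.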

Do you have to delete the User record? Or is it OK to just nullify all of the identifying fields (or all of the fields apart from the primary key and a ‘This user is deleted’ flag if you want to be sure)?

The user may remain identifiable by pattern. “This is a diabetic in North London who likes size 38 pink trousers” – based on all the Orders that have the same User ID?

Alternatively delete the User record and switch foreign keys to a special “The Forgotten User” instead?

The definition of personal data is not clear to me:
“every piece of data that can be used to uniquely identify a person or data that is about an already identified person.”
Does that mean that an IP address, a MAC address or a customer_id is personal data?
Is it allowed to cross-reference data from different systems using these fields, for example for statistics on services?

Does any of this also apply to emails that a company has received from and sent to its customers?

Also, what’s to stop this from becoming the next target for the scum of the world to prey on companies, like patent trolls and ransomware authors already do? Imagine a new industry of ill repute, where bad guys intentionally interact with company websites and apps, and take other actions to get their personally identifiable information into the company’s systems. Then a bad guy invokes some provision of the GDPR to test whether said company is in compliance, and when they see it isn’t (e.g. one piece of information they know they provided wasn’t reported in response to a request for their stored information)… lawsuit! Rinse and repeat for infinite profit.

Both may be okay. You’d have to assess how likely it is to be able to identify a particular person (and not just one, accidentally, but many) by the data. Probably not very likely, especially if you replace full addresses with a larger area.
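The two options from the question (blank out the identifying fields, or re-point the foreign keys to a shared “Forgotten User” and delete the row) can be sketched as follows. The schema, the sentinel ID `0` and the `region` column are assumptions made up for this example; the `region` column stands for the “larger area” that replaces the full address.

```python
import sqlite3

FORGOTTEN_USER_ID = 0  # hypothetical sentinel row shared by all erased users

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")
db.executescript("""
CREATE TABLE users (
    id INTEGER PRIMARY KEY, name TEXT, email TEXT,
    deleted INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    address TEXT, region TEXT
);
INSERT INTO users (id, name) VALUES (0, 'The Forgotten User');
INSERT INTO users (id, name, email) VALUES (1, 'Alice', 'alice@example.com');
INSERT INTO users (id, name, email) VALUES (2, 'Bob', 'bob@example.com');
INSERT INTO orders VALUES (10, 1, '1 High Street', 'North London');
INSERT INTO orders VALUES (11, 2, '2 Low Street', 'South London');
""")

def nullify_user(db, user_id):
    """Option 1: keep the row and primary key, blank the identifying fields."""
    with db:
        db.execute(
            "UPDATE users SET name = NULL, email = NULL, deleted = 1 WHERE id = ?",
            (user_id,),
        )
        # Keep only the coarse region, not the full address.
        db.execute("UPDATE orders SET address = NULL WHERE user_id = ?", (user_id,))

def reassign_and_delete(db, user_id):
    """Option 2: point the orders at the shared sentinel, then delete the user."""
    with db:
        db.execute(
            "UPDATE orders SET user_id = ?, address = NULL WHERE user_id = ?",
            (FORGOTTEN_USER_ID, user_id),
        )
        db.execute("DELETE FROM users WHERE id = ?", (user_id,))

nullify_user(db, 1)
reassign_and_delete(db, 2)
users = db.execute("SELECT id, name, email, deleted FROM users ORDER BY id").fetchall()
orders = db.execute("SELECT id, user_id, address, region FROM orders ORDER BY id").fetchall()
```

With option 1 the orders of one erased user still share an ID, so the “identifiable by pattern” concern remains. Option 2 mixes the orders of all forgotten users under one ID, which makes such patterns harder to isolate.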

IPs and MAC addresses are personal data, explicitly noted in the regulation. customer_id is an identifier and is personal data only in combination with the rest of the personal data.
You can have statistical data.

While deduplication might be important to have correct data, deduplication might also be against the data minimization principle. Suppose a customer orders twice but with separate addresses, and you are allowed to store the data because you still owe a warranty obligation to the customer: there is no need – and often no right – to link both purchases to the same id.
If users can have multiple accounts, you do not have to link them together in order to comply – quite the contrary. This can be against the data minimization requirement of the GDPR.
If a user called “Jack Smith” wants to know what data you hold about him, you are not required to use a unique id for this Jack Smith across your whole company. You can ask the user to provide his prior addresses in order to fulfill his information or deletion request.
Art. 15-18 are no excuse to join datasets that do not have to be linked together otherwise.
The same is true for large companies. If there is a request from an individual, rather than centrally collecting the data in one place you should send the request to all relevant systems, so they can individually comply with the request.

Thank you for your effort.
Let me ask common questions, but with a concrete scenario.
To pose this very question, I was asked to provide my email address. Will the email address alone be “personal data” subject to GDPR?
I suppose that WordPress here has sent you an email notification reporting my question and my email address.
So, if the email address were subject to GDPR and I should ask you to delete it, should you remove the notification email from your mailbox? Should you also make sure that WordPress and your email provider do the same on their archives?

Yes, the email is personal data. (You don’t have to give consent, because I’m processing it for a legitimate interest – to verify you and to notify you of follow-ups.)
As for erasure – if you send me an email “delete all my comments”, I’d have to. Whether I’ll have to delete my emails is a good question. I think so (can’t think of a reason not to).

We are a Spain-based company and would like to use Contactually CRM. I have not found anything about GDPR (maybe they are marketing mainly to the US). I found this in their privacy policy statement. Does this mean we could not use Contactually as our CRM? Thanks a lot, Ramon

“Consent To Processing In The United States. By providing any Personal Information and/or Content to Contactually, all users, including, without limitation, users in Canada and the member states of the European Union, fully understand and unambiguously consent to this Privacy Policy and to the transfer of such Personal Information across international borders in accordance with Contactually’s standard operations, including the collection, storage, and processing of such information in the United States of America or other countries in which our employees and contractors may be located.”

This article was one of the reasons why we started our work on GDPR SDK – I’m a developer, and it’s easier for me to understand something by looking at source code than at legal stuff.

We’ve developed and recently published an open source client SDK, which can be used as a starting point to make an application GDPR compliant. Some of the things covered in this SDK are data subject rights (inform, object, rectification, erasure, …). Since there is no one universal solution for the GDPR, our approach was to create and document interfaces (with explanations and links to specific GDPR articles) and default implementations. It’s up to the developer to implement the actual code for deletion of personal data, for rectification, etc., but I believe that the guidance this SDK provides can be helpful.

Great article Bozho! Re: backup strategy: our current thinking is (this is a brand new application) to store any personal data in a PersonalData table, and have a trigger-updated version of it called PersonalDataBackup that is automatically updated whenever the first changes, but in this latter table all actual personal data is stored encrypted (there will be a few fields that are not encrypted, mostly FKs; there will be a process (stored procedure) that will allow the restoration of this data to the PersonalData table, if you have the key from PersonalDataKey). The encryption key is stored in a separate PersonalDataKey table. Now, PersonalData will never be backed up. PersonalDataKey will be backed up only for a short moving window, e.g. two weeks. PersonalDataBackup will be backed up with the rest of the database with standard backup policies. When a “forget me” request arrives, we will delete the personal data part of the PersonalData record and will delete the key part of the PersonalDataKey table. This way it is guaranteed that within two weeks we will completely forget the personal data and we still maintain our PKs that may be FKs elsewhere. What do you think?
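The scheme described in this comment (delete the key and the encrypted copy becomes unreadable) can be sketched in a few lines. Everything here is an assumption for illustration: dictionaries stand in for the three tables, plain functions stand in for the trigger and the stored procedure, and the XOR one-time pad is only a stand-in cipher so that the example runs without third-party packages. A real system would use an authenticated cipher such as AES-GCM from a vetted library.

```python
import secrets

personal_data = {}         # live table, never backed up
personal_data_backup = {}  # ciphertext only, backed up with standard policies
personal_data_key = {}     # keys, backed up only for a short moving window

def _xor(data: bytes, key: bytes) -> bytes:
    # Stand-in cipher: one-time pad with a key as long as the data.
    return bytes(a ^ b for a, b in zip(data, key))

def store(record_id, value: str):
    """Write the live record and refresh its encrypted copy (the 'trigger')."""
    plaintext = value.encode("utf-8")
    key = secrets.token_bytes(len(plaintext))  # fresh key on every write
    personal_data[record_id] = value
    personal_data_key[record_id] = key
    personal_data_backup[record_id] = _xor(plaintext, key)

def restore(record_id):
    """Rebuild the live value from the encrypted copy, if the key still exists."""
    key = personal_data_key.get(record_id)
    if key is None:
        return None  # key deleted: the ciphertext can no longer be read
    return _xor(personal_data_backup[record_id], key).decode("utf-8")

def forget(record_id):
    """'Forget me': drop the live data and the key, keep the ciphertext row."""
    personal_data.pop(record_id, None)
    personal_data_key.pop(record_id, None)

store(1, "alice@example.com")
before = restore(1)
forget(1)
after = restore(1)
```

After `forget(1)` the row in `personal_data_backup` still exists, so primary keys that are foreign keys elsewhere stay intact, but it cannot be decrypted once the last backup of the key has expired.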