GDPR: Data Residency and Application Architecture

By now you’ve read numerous articles about how the European Union (EU) will enforce its General Data Protection Regulation (GDPR). They tell you how to figure out your exposure, establish processes for dealing with data, appoint a data protection officer, and figure out what “clear affirmative action” means for you. These are great ideas in a world where Facebook, credit reporting agencies, retailers and other companies leak personal information wholesale.

Data Residency and Processing Requirements

The GDPR’s data residency requirements stipulate that organizations can neither store personal data of EU data subjects in, nor transfer it through, countries that do not enforce equivalent data protections [1,2]. The United States does not require these protections, so, as a rule, data on EU data subjects must be stored and processed in the EU. This means that if you have both a US and an international customer base, you will need to store and process data in multiple countries. To complicate matters further, China and Russia are imposing similar restrictions. All of this has major implications for your processing architecture.

GDPRchitecture: Bringing Processing to Distributed Data

One definition of big data that we like is data that is too large to move. Regardless of how much data you have, if you do business in the EU you now have “big data”: not because of its volume, but because moving it across borders has become too expensive. So let’s look at how architectures typically handle big data.

Perhaps you’re already using Apache Hadoop or something else that uses a network of compute/storage nodes to process data in parallel. Grossly oversimplified, the MapReduce algorithm works like this:

Distribute the incoming data to a set of nodes

Process the data on each node in parallel (map)

Combine the results (reduce)
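The three steps above can be sketched in a few lines of Python. This is a single-machine illustration, not Hadoop itself: the thread pool stands in for a cluster of nodes, and the function names (`split`, `map_partition`, `combine`) and the per-country event count are assumptions chosen for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split(records, n_nodes):
    """Step 1: distribute records round-robin across n_nodes partitions."""
    return [records[i::n_nodes] for i in range(n_nodes)]


def map_partition(partition):
    """Step 2: process one node's partition (here: count events per country)."""
    return Counter(r["country"] for r in partition)


def combine(left, right):
    """Step 3: merge two partial results into one."""
    return left + right


def map_reduce(records, n_nodes=3):
    partitions = split(records, n_nodes)
    # Each worker thread plays the role of one compute/storage node.
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(map_partition, partitions))
    return reduce(combine, partials, Counter())


records = [{"country": c} for c in ["DE", "US", "FR", "DE", "US", "DE"]]
totals = map_reduce(records)
```

Each partition is processed without reference to any other, which is the property the rest of this post relies on.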

A parallel processing pattern similar to Hadoop can be used to meet data residency and processing requirements. Every affected region must have an isolated data set and application architecture to host and process its data.

A geographically distributed “MapReduce” pattern uses these independent data storage and processing nodes. Route or distribute incoming data to the appropriate node, process it in parallel, and then combine the aggregated and/or pseudonymised results in a single location. This approach lets you keep user data in the appropriate administrative domain to meet data residency requirements, while still running local operations on the data and detailed analytics globally.
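As a rough sketch of this pattern, the Python below routes each record to a regional node, processes it there, and combines only pseudonymised rows centrally. Everything named here is hypothetical: the `RegionalNode` class, the country-to-region table, the record fields, and the keys are assumptions for illustration. In a real deployment each node would be a separate stack running in its own region rather than an object in one process.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical mapping from a user's country to the region that must hold their data.
REGION_BY_COUNTRY = {"DE": "eu", "FR": "eu", "US": "us", "CN": "cn", "RU": "ru"}


class RegionalNode:
    """An isolated data store plus processing for one region (illustrative only)."""

    def __init__(self, region, secret_key):
        self.region = region
        self._key = secret_key  # never leaves the region
        self._records = []      # raw personal data stays here

    def ingest(self, record):
        self._records.append(record)

    def _pseudonym(self, user_id):
        # Keyed hash: re-identification needs the key, which is kept in-region.
        return hmac.new(self._key, user_id.encode(), hashlib.sha256).hexdigest()

    def export_rows(self):
        """Local processing: per-user totals keyed by pseudonym, not by user ID."""
        totals = defaultdict(float)
        for r in self._records:
            totals[self._pseudonym(r["user_id"])] += r["amount"]
        return [{"region": self.region, "pseudonym": p, "total": t}
                for p, t in totals.items()]


def route(record, nodes):
    """'Map' step: send each record to the node for its residency region."""
    nodes[REGION_BY_COUNTRY[record["country"]]].ingest(record)


def combine(rows):
    """'Reduce' step: global analytics over pseudonymised rows only."""
    return {"users": len({(r["region"], r["pseudonym"]) for r in rows}),
            "revenue": sum(r["total"] for r in rows)}


nodes = {"eu": RegionalNode("eu", b"eu-only-key"),
         "us": RegionalNode("us", b"us-only-key")}
events = [
    {"user_id": "alice", "country": "DE", "amount": 10.0},
    {"user_id": "alice", "country": "DE", "amount": 5.0},
    {"user_id": "bob", "country": "FR", "amount": 20.0},
    {"user_id": "carol", "country": "US", "amount": 7.0},
]
for event in events:
    route(event, nodes)

rows = [row for node in nodes.values() for row in node.export_rows()]
report = combine(rows)
```

The keyed hash is one possible way to pseudonymise: the key is the “additional information” kept separately in the region. If only aggregates may leave a region, `export_rows` could return per-region totals instead of per-user rows.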

Of course, there’s a lot more to the architecture than this brief introduction. Please contact us if you want to go into more detail.

[1] Article 4, EU GDPR, “Definitions”:

(1) ‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;

(2) ‘processing’ means any operation or set of operations which is performed on personal data or on sets of personal data, whether or not by automated means, such as collection, recording, organisation, structuring, storage, adaptation or alteration, retrieval, consultation, use, disclosure by transmission, dissemination or otherwise making available, alignment or combination, restriction, erasure or destruction;

(5) ‘pseudonymisation’ means the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person;

[2] Article 45, EU GDPR, “Transfers on the basis of an adequacy decision”: 1. A transfer of personal data to a third country or an international organisation may take place where the Commission has decided that the third country, a territory or one or more specified sectors within that third country, or the international organisation in question ensures an adequate level of protection.
