We had an issue today where several thousand customers received a notification in error due to a subset of data from our production system being used in a test system. There's been some disagreement between our team as to the best way to handle this. Management wants to scrub the production data, while some team members want to leave the (non-sensitive) data as-is.

Is there a general best practice for this? We already scrub sensitive information when using data in our test environments, but what about scrubbing less sensitive information such as email address or phone number?

7 Answers

Is there a general best practice for this? We already scrub sensitive
information when using data in our test environments, but what about
scrubbing less sensitive information such as email address or phone
number?

Most shops have practices for using test data. The specifics differ, but in general you don't want to use Production data in a way that would be illegal, would be called out in an audit, or could embarrass you in front of your customers.

Some locales require special handling for PII (personally identifiable information), so check local laws.

In addition, some audits will ask about the handling of such information, so check with your company's standards.

But your customers received an unintended notification due to your failure to scrub data that you have deemed "less sensitive".

To prevent another occurrence (which would be particularly embarrassing), consider reviewing your test data practices with an eye toward "escape" rather than just protection of PII. Ask yourself what could possibly happen with the data as it exists. If your system involves automated calls, for example, phone numbers provide a means of "escape" and hence should be scrubbed.

Additionally, do you really need to use a scrubbed copy of Production data? In many cases, synthesized data (which you could generate in an automated manner) is better than a copy of Production anyway. Synthesized data can be seeded so that it exposes test conditions that may not occur with a snapshot of Production. For example, my synthesized data usually includes at least some cases where fields are filled to their maximum length - a condition which might only occur in production by happenstance.
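As a rough illustration of seeded, synthesized data, here is a minimal Python sketch. The field names, the MAX_NAME_LEN limit and the make_customers helper are all hypothetical; substitute the real column widths from your own schema.

```python
import random
import string

# Hypothetical column width; use the real limit from your schema.
MAX_NAME_LEN = 50


def make_customers(count, seed=0):
    """Generate synthetic customer rows plus deliberate edge cases."""
    rng = random.Random(seed)  # a fixed seed makes the data set reproducible
    rows = []
    for i in range(count):
        name_len = rng.randint(2, MAX_NAME_LEN - 1)
        name = "".join(rng.choice(string.ascii_letters) for _ in range(name_len))
        # example.com is a domain reserved for documentation and testing.
        rows.append({"id": i, "name": name, "email": f"user{i}@example.com"})
    # Edge cases that a Production snapshot might contain only by happenstance.
    rows.append({"id": count, "name": "N" * MAX_NAME_LEN,
                 "email": f"user{count}@example.com"})
    rows.append({"id": count + 1, "name": "N",
                 "email": f"user{count + 1}@example.com"})
    return rows
```

Because the generator is seeded, a failing test can be reproduced with exactly the same rows, which a fresh Production snapshot cannot guarantee.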

Besides the data, look at your test environment itself to see how you can avoid such escapes in the future. Does your test environment actually need to deliver emails across the internet? Or could it be set to queue up the emails without delivering them, dump them to the file system, or block delivery to the outside world?
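One way the "dump them to the file system" option might look at the application level is sketched below in Python. The send_email wrapper, the environment string and the localhost relay are assumptions made for illustration, not a description of any particular system.

```python
import smtplib
import uuid
from email.message import EmailMessage
from pathlib import Path


def send_email(msg, environment, dump_dir="mail_dump"):
    """Deliver mail only in production; elsewhere write it to disk."""
    if environment == "production":
        # Assumed relay on localhost; replace with your real mail host.
        with smtplib.SMTP("localhost") as smtp:
            smtp.send_message(msg)
        return None
    out_dir = Path(dump_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # A random file name avoids collisions between concurrent test runs.
    path = out_dir / f"{uuid.uuid4().hex}.eml"
    path.write_bytes(msg.as_bytes())
    return path
```

Testers can then open the .eml files to check content and recipients, and nothing leaves the test environment.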

You should absolutely be scrubbing email addresses and phone numbers. They are, as others have said, personally identifiable information, and failure to scrub them is a breach of data security standards.

That said, the scrubbing method doesn't have to be terribly difficult - if there has to be something in those fields, set all the email addresses to a bogus address that meets format standards for testing (user1@deadmail.badserver or something) and an equally non-existent phone number. If the fields don't need data, clear them.
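A minimal sketch of that kind of scrubbing in Python follows. The record layout and the scrub_contact_fields name are hypothetical. The .invalid top-level domain is reserved so that it never resolves, and the 555-01xx range is the one commonly set aside for fictional numbers in North America; other regions need their own placeholder range.

```python
def scrub_contact_fields(record, index):
    """Return a copy of record with email and phone replaced by placeholders."""
    scrubbed = dict(record)  # leave the caller's record untouched
    if scrubbed.get("email"):
        # .invalid is a reserved TLD, so mail to it cannot be delivered.
        scrubbed["email"] = f"user{index}@example.invalid"
    if scrubbed.get("phone"):
        # 555-0100 to 555-0199: fictional-use range (North America only).
        scrubbed["phone"] = f"555-01{index % 100:02d}"
    return scrubbed
```

Numbering the placeholders keeps addresses distinct, which matters if the field has a unique constraint. The phone placeholder repeats after 100 records, so widen the scheme if phone numbers must be unique too.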

You had an embarrassing incident arising from insufficient data security - you're lucky you didn't have worse.

Data security and privacy standards vary from country to country in what can and cannot be stored, and are often guidelines rather than legislation.

That said, if your management are saying that data needs to be scrubbed, then it needs to be scrubbed regardless of what the technical team thinks.

There are other ways to look after data that do not require scrubbing. Some organisations simply give their test environments the same security, rigour and access controls as their production environments.

I always find "Best Practice" a loaded term. There are industry-wide practices, but in any particular space they may be lenient or far more strict, especially where there is a lot of Personally Identifiable Information (PII) or financial information. Much of that PII has to be very well protected, or you can encounter situations like the one you describe. It's often useful to test with Customer data (in one place I worked we insisted on it), but in another place it was just not feasible, as 90% of the data was considered PII. What you need is a good process to determine what your needs are, then adjust the production data to fit them.

First and Last Names are often ok, so long as they are not linked to other information

Addresses can be PII, but sometimes you can get away with keeping them unless you are testing a mailing system, in which case you can always update the Zip Codes or change the Street Addresses

Emails are difficult. They can be problematic if you have jobs or processes in TEST that use them, as in your case, but I have often backed that up with a rule that no emails from TEST go to the outside world. That way, if there are issues in TEST and some process or job does fire off, no email reaches its destination. This can be a problem if you really need that outside email, but you should be able to plan for that. I worked closely with my IT group on how to configure the Mail Server so that emails were blocked.

Phone Numbers may be fine, unless you have systems that make phone calls

Credit Card numbers should never be kept in TEST, nor should they be readable. In the US there are financial liabilities for these. Whenever I have prepared TEST data, this is one of the first things I scrub.

Take everything else on a case-by-case basis: look at the data and see if you really need it, or if you can find a way to mock or scrub it.
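The "no emails from TEST go to the outside world" rule above was enforced on the Mail Server. Where that is not possible, a similar guard can be sketched in application code. This Python version is an assumption-laden illustration: ALLOWED_DOMAINS and filter_recipients are hypothetical names, and the allowed domains are placeholders for whatever internal domains you trust.

```python
# Hypothetical allowlist; replace with the domains your testers actually use.
ALLOWED_DOMAINS = {"example.com", "example.invalid"}


def filter_recipients(recipients, environment):
    """Split recipients into (allowed, blocked) for the given environment."""
    if environment == "production":
        return list(recipients), []
    allowed, blocked = [], []
    for address in recipients:
        # Compare only the part after the last "@", case-insensitively.
        domain = address.rsplit("@", 1)[-1].lower()
        if domain in ALLOWED_DOMAINS:
            allowed.append(address)
        else:
            blocked.append(address)
    return allowed, blocked
```

An allowlist fails safe: an address nobody thought about is blocked by default, whereas a blocklist would let it through.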

People have already covered the privacy side of this issue. Couldn't have said it better than Kate.

What concerns me is that the risk of another embarrassing round of notifications greatly outweighs whatever benefit there is to not scrubbing (which is unclear). What is the reason some team members want to leave the data as-is?

You clearly have email notifications in place already. And if you have them for one action, chances are you have or will have more in place. A good rule of thumb: scrub anything that could possibly connect your test actions (automated and manual) with your real users. There should be no link that could change production data, or send notifications via email or SMS.
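One way to check that rule before a data set is loaded into test is a small audit pass. The Python sketch below assumes scrubbed data uses placeholder addresses at example.com or example.invalid and phone numbers in the 555-01xx fictional range; the patterns, field names and find_escape_risks helper are all hypothetical.

```python
import re

# Assumed placeholder conventions for scrubbed data; adjust to your own.
SAFE_EMAIL = re.compile(r"@example\.(com|invalid)$", re.IGNORECASE)
SAFE_PHONE = re.compile(r"^555-01\d{2}$")


def find_escape_risks(records):
    """Return (id, field) pairs for contact fields that still look real."""
    risks = []
    for record in records:
        email = record.get("email") or ""
        phone = record.get("phone") or ""
        if email and not SAFE_EMAIL.search(email):
            risks.append((record.get("id"), "email"))
        if phone and not SAFE_PHONE.match(phone):
            risks.append((record.get("id"), "phone"))
    return risks
```

Running this as a gate in the refresh job means an unscrubbed row stops the load instead of reaching a notification job.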

Trust me, once you get this in place, you will sleep much more soundly while running batch jobs on test overnight : )

It is considered good practice to scrub any identifying information, as jruberto noted. We also put other protections in place, such as dead-drop mail servers for internal environments, to further segregate our test environments and data from production and prevent incidents like this.