Obfuscating your data with SQL Server 2016 Dynamic Data Masking

In today’s IT landscape, data security is a big deal. Just ask people like Troy Hunt (Blog | @troyhunt), who is a security expert, how losing data can destroy your company. Companies all over the world lose data every single day, and you might not even be aware of this. I’m still seeing companies who give all IT employees admin permission on the whole IT infrastructure: from webserver to database servers, and from customer relations system to financial system. But why is it still a struggle for IT professionals to protect their data? In some companies they don’t even hash passwords, let alone protect them with a certificate or even heavier protection.

In this blog I’m not going to tell you how to secure your data, because honestly I’m not an expert at that. I do want to introduce you to a new way to obfuscate your data, that was introduced in SQL Server 2016. According to Wikipedia obfuscation means: “(…)the obscuring of intended meaning in communication, making the message confusing, willfully ambiguous, or harder to understand”. Or in easy terms: make it harder to read for your users and/or employees. Let’s take a look at an example.

If I would give you an email address like “JohnDoe@domain.com”, it’s really easy to guess the name of the person. Let’s be honest, it’s not rocket science right?! But what if I would give you the email address “Jo*****@d*****.com”, would you still guess the name correctly? Yes I agree with you there’s a small chance you might guess the name right, but those chances are slim. And instead of writing your own code to obfuscate data this way, with SQL Server 2016 you’ll get it out of the box.

Creating resources
The following is tested on SQL Server CTP 2.1. If you’re not running that version or higher, this might mean the scripts or test cases might not work the same as in my case.

This table contains some basic information about the users of our application. And of course we need some user data in the table. I’ve used Redgate’s SQL Data Generator to generate data for my table. Because I’ve generated a full table, I’ve made the script available here.

Now that we have a table with sensitive data, let’s look at the ways of obfuscating this data. From SQL Server 2016 onwards you get to use the Dynamic Data Masking feature. This means you can add a “mask” on top of your data. This is done by altering the columns in your table:

This takes care of the data obfuscation for you, and you can determine the level of obfuscation by using specific functions:

Default: Completely replaces the value in the column with “xxxx”Email: Replaces the email address with X’s, except for the first character, and the Top-level domainCustom: The most dynamic of the 3 functions. This allows you to configure a prefix and suffix, and replaces the rest of the values with X’s

Testing data masking
Now that we have a table with masked data, how does that look from different perspectives? Let’s create 2 users: an application login with just read permissions, and an admin login with db_owner permissions:

Now open up 2 new SSMS tabs, and log in with the 2 accounts. In both sessions run the query below:

USE DynamicDataMasking
GO
SELECT * FROM dbo.Users

The difference is clearly visible when you select the top 10 records in both sessions:

In the admin connection you get to see the full dataset, and in the application login the masked dataset is returned.

Querying data
The retrieval and querying of data isn’t changed when you have data masks applied. If you run a query where you search for users with a specific first name, the correct results is returned:

SELECT *
FROM dbo.Users
WHERE FirstName = 'Jeffrey'

Security issues
Now we get to the interesting part: the security issues. Even though this is not a form of data encryption, I did expect this could be a part of data security. But unfortunately this was a false sense of security…

Let’s try a normal query on the masked data, with the AppLogin session:

USE DynamicDataMasking
GO
SELECT * FROM dbo.Users

This returns the data as expected: masked. But is there any way around this? Let’s make the AppLogin a db_owner on another database (run this statement with a user that has permissions to do this):

I’m not sure what the idea behind of this feature was, or how the SQL Server designed it, but this sounds like a real security issue to me. I expect my data to be obfuscated for my application users, but it’s the complete opposite. If you have a direct connection to the database (and not through an application only) it’s really easy to get a hold of the unmasked data.

Conclusion
Even though I really like this feature (and yes, I’m serious!) there are some things the SQL Server team needs to look at. If you’re expecting your data to be masked for all non-admin users, you’re in for a nasty surprise.

Hoping this will be fixed soon, I’ve created a Connect item for this. If you agree with me on this, and you think it’s a problem as well, please up-vote it or leave a comment in the connect item. There is also a specific item for the join issue. Let’s make sure the SQL Server team is aware of these issues.