How TunnelBear Chose an Analytics Tool That Respects Privacy

Most companies believe you need a lot of data to build a good product. Recent security breaches, like the one at Equifax, however, have taught us how risky it is to store too much personal information in one place, and how hard it is to keep it safe.

At TunnelBear, we run our service with as little personal client information as possible, but in order to understand what features to develop, or how to scale our service, we do need some information. With our security audit, we shared how we protect your data when you’re connected to our network. Today, we’d like to share how we built a new analytics database that allows us to analyze product trends, quickly and securely without including any personal information.

Building a better database

Our production database was originally designed to securely store events that help us keep track of things like account creations or when people have paid for subscriptions. This was fine when TunnelBear was small, but now that we have more than 17 million users, we've realized that we built our database for operational use, and not for long term trend analysis.

Six months ago, we began building a new Redshift analytics database with two goals: it had to process information quickly, and it had to meet our strict privacy requirements.

Choosing the right level of detail

Building a new tool for analytics would be the first time we'd be giving a wider group of employees access to product analytics. We wanted our team to have the ability to spot trends and help us build a better product, without being able to identify individual customer accounts. While deciding how to correctly store analytics information, we asked ourselves these questions:

What event details do we need?

Is there anywhere we can remove timestamps from events?

Can customers be identified from information in our analytics database?

Who will be able to access this information?

When can the data we import be deleted?

What we came up with was a way to strip personal information down to a bare minimum when it's transferred from our production database to our analytics database. Our new data pipeline removes information like email addresses, Twitter IDs and device data, as outlined in our privacy policy. By separating our operational and analytics databases, we now store analytics information in one central place, restricted to the small group of senior engineers with clearance to access it.
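As a rough illustration of what this kind of pipeline step does, here is a minimal sketch in Python. The field names and event shape are hypothetical, not TunnelBear's actual schema; the idea is simply that personal fields are dropped and timestamps are coarsened before an event ever reaches the analytics database.

```python
# Hypothetical event fields to strip before loading into analytics.
# These names are illustrative only.
PERSONAL_FIELDS = {"email", "twitter_id", "device_id"}

def anonymize(event: dict) -> dict:
    """Return a copy of the event with personal fields removed
    and the timestamp coarsened to a date (YYYY-MM-DD)."""
    clean = {k: v for k, v in event.items() if k not in PERSONAL_FIELDS}
    if "timestamp" in clean:
        clean["timestamp"] = clean["timestamp"][:10]  # drop time-of-day
    return clean

event = {
    "type": "subscription_paid",
    "plan": "grizzly",
    "email": "user@example.com",
    "timestamp": "2017-11-02T10:15:00Z",
}
print(anonymize(event))
# {'type': 'subscription_paid', 'plan': 'grizzly', 'timestamp': '2017-11-02'}
```

Running every event through a single, reviewable function like this makes it easy to audit exactly which fields can ever leave the production database.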

We’ve got the data, now we need to view it

When it comes to customer information, our preference is to control every step of the process by building our own tools, but sometimes that’s not possible with our internal resources. Luckily for us, there were a few third party analytics options available that looked promising. We reached out to several companies with questions about their service and views on privacy and security. We wanted to learn more about their practices so we asked:

What do you need us to provide as a unique identifier for an account?

Where is our data stored?

Who inside your company can access our account?

Do we own and control the data stored on your servers?

How long is the data stored for?

How do you secure the stored data?

Are you required by law to share the information we store with third parties?

As we worked our way through our vendor list, we found that many vendors didn't have answers to these basic questions. They'd refer us to their privacy policy or terms of service, but couldn't provide further details on how our information would be stored and secured.

We chose Periscope Data because they had solid answers. They gave us the ability to configure their software to act as a display tool for accessing and querying our analytics data, without any need to store it themselves.

What happens when we access data through our analytics database

With our new system in place, we created an information flow that allows us to query large amounts of data quickly and present it in a more readable way. The flow looks like this:

1. Information is stored in our encrypted production database
2. We discuss what specific information we need to migrate to be stored in Redshift
3. Data gets processed, reformatted and error checked
4. Data gets sent to our analytics database
5. We use a SQL query in Periscope Data to find the data we need
6. Periscope Data sorts, compiles and "prints" the result
7. A Periscope Data dashboard lets us view and sort the data in a meaningful, readable way
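The query step above can be sketched in miniature. This example uses Python's built-in SQLite as a stand-in for Redshift, with an illustrative table and aggregate query of the kind a dashboard tool would run; none of the names reflect the real schema.

```python
import sqlite3

# In-memory SQLite database standing in for the Redshift analytics store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_type TEXT, plan TEXT)")

# A few anonymized events, already stripped of personal fields.
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("subscription_paid", "grizzly"),
     ("subscription_paid", "grizzly"),
     ("account_created", None)],
)

# The kind of aggregate trend query a dashboard would issue:
rows = conn.execute(
    "SELECT event_type, COUNT(*) FROM events "
    "GROUP BY event_type ORDER BY event_type"
).fetchall()
print(rows)  # [('account_created', 1), ('subscription_paid', 2)]
```

Because the analytics store only ever holds anonymized events, even broad aggregate queries like this can't be turned around to identify an individual account.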

Our analytics database has access to a much smaller, well-curated set of data than is available in our production database. It contains no identifying information. By using Periscope Data, we've been able to split our data usage based on role inside the company. Our engineers can focus on securing personal information, while teams like marketing and product can focus on spotting bigger trends that will help TunnelBear grow.

Securing your trust, and your privacy

Our security audit was the first in a long line of initiatives that TunnelBear has taken to earn your trust. We're committed to proving that your data is secure when it flows through our network. Integrating a third-party tool into our system requires us to learn about the company's culture, not just how our information is stored and accessed. We're sharing this process so you can understand how we research services in order to find partner companies who share our vision of privacy.

If you have feedback on how we can improve our data privacy practices, we want to hear from you. Get in touch with us at privacy@tunnelbear.com.