Recently, Voltage Security attended the Strata + Hadoop World conference in New York. While there, Sudeep Venkatesh, Voltage's vice president of solutions architecture, spent three days interacting with other attendees. Upon returning from Strata + Hadoop World, Sudeep sat down to answer some questions.

Q: What did Hadoop World tell Voltage about the state of security and Big Data?

S. Venkatesh: It was interesting that among the hundreds of vendors represented at Hadoop World, only three provided security solutions. Among the 200+ breakout sessions listed in the Hadoop World schedule, not a single session dealt with security of any kind. This was in complete contrast to what we were hearing from the hundreds of attendees visiting the Voltage booth, many of whom were from the financial services and healthcare verticals. Their data analysis needs require them to have sensitive information such as customer names, addresses, social security numbers, credit card numbers, and dates of birth in Hadoop. These attendees repeatedly said that several Hadoop projects were stalled because of a lack of security.

Q: What common Big Data security issues did you hear?

S. Venkatesh: In industries such as financial services, healthcare, government, and telecommunications, the sensitive data flowing into Hadoop includes PII (Personally Identifiable Information), PHI (Protected Health Information), and PCI (Payment Card Industry) data. Potentially exposing this data to a wide audience through a Hadoop deployment is not only bad practice from a security perspective, but also violates several industry and government regulations. This was the biggest security issue that we were hearing about at Hadoop World.

Q: What element of data security were attendees the most interested in?

S. Venkatesh: At Hadoop World, attendees were most interested in field-level encryption and tokenization technologies. Technologies such as Voltage Format-Preserving Encryption (FPE) and Voltage Secure Stateless Tokenization (SST) can protect sensitive data elements such as customer names, addresses, social security numbers, credit card numbers, and dates of birth while still retaining their format. This means that the vast majority of analysis can now happen on de-identified data. We have observed that up to 90% of Hadoop jobs can run on de-identified data, without ever needing to access the real data.
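To make the "retains their format" idea concrete, here is a minimal toy sketch of format-preserving de-identification for a social security number. This is an illustration only: it is not Voltage FPE or SST, and the HMAC-per-digit construction shown is not a secure FPE mode (standardized FPE uses constructions such as NIST FF1). It simply shows why a de-identified value can still flow through schemas and analytics that expect the original 3-2-4 layout.

```python
import hashlib
import hmac

def deidentify_ssn(ssn: str, key: bytes) -> str:
    """Toy format-preserving de-identification of an SSN.

    Replaces each digit with a pseudorandom digit derived from an HMAC of
    the whole value, preserving dashes, length, and digit positions.
    Deterministic for a given key, so joins on the token still work.
    """
    digest = hmac.new(key, ssn.encode(), hashlib.sha256).digest()
    out, i = [], 0
    for ch in ssn:
        if ch.isdigit():
            out.append(str(digest[i] % 10))  # substitute a derived digit
            i += 1
        else:
            out.append(ch)  # keep separators like '-' in place
    return "".join(out)

key = b"demo-key"  # hypothetical key, for illustration only
token = deidentify_ssn("123-45-6789", key)
print(token)  # same 3-2-4 digit layout as the input
print(token == deidentify_ssn("123-45-6789", key))  # deterministic: True
```

Because the token keeps the original format, downstream Hadoop jobs that validate, group by, or join on the field can run unchanged on the de-identified data.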

Q: Why is security for Big Data important now more than ever?

S. Venkatesh: The lifecycle of data in Hadoop is very different from that in RDBMS technologies such as Oracle, SQL Server, and MySQL. With RDBMS technologies, data-centric protection can be "bolted on" at a later date, and the sensitive data can be replaced with its encrypted counterpart or with tokens. This is in sharp contrast to HDFS, which is great at storing data but not at editing it. Once sensitive data enters HDFS, it is extremely hard to get rid of it without erasing the entire cluster. Enterprises that plan to store sensitive data in Hadoop should invest in a data-centric encryption or tokenization solution from day one.
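The "from day one" point above amounts to protecting records at ingest, before they are ever written into HDFS, since the cluster cannot easily be scrubbed later. The sketch below illustrates that pattern; the field names are hypothetical, and the `tokenize` helper is a hash-based stand-in for a call to a real FPE or tokenization service, not an actual Voltage API.

```python
import hashlib

# Hypothetical list of fields that must never land in HDFS as cleartext.
SENSITIVE_FIELDS = {"ssn", "card_number"}

def tokenize(value: str) -> str:
    """Stand-in for a tokenization call.

    A production ingest pipeline would invoke an FPE/tokenization
    service here; a plain hash is used only to keep the sketch runnable.
    """
    return hashlib.sha256(value.encode()).hexdigest()[:12]

def protect_record(record: dict) -> dict:
    """De-identify sensitive fields before the record is written out."""
    return {k: tokenize(v) if k in SENSITIVE_FIELDS else v
            for k, v in record.items()}

raw = {"name": "Jane Doe", "ssn": "123-45-6789", "zip": "10001"}
safe = protect_record(raw)          # only `safe` would be written to HDFS
print(safe["ssn"] != raw["ssn"])    # True: cleartext never reaches storage
print(safe["zip"])                  # non-sensitive fields pass through
```

The key design choice is where the protection happens: in the ingest path rather than inside the cluster, so there is never a window in which cleartext sensitive data exists in HDFS and needs to be removed.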