Prior to ProtectWise, organizations wanting to analyze network traffic had to purchase on-premises solutions often costing millions of dollars. As data volumes grew, the performance of these systems would degrade, limiting data retention periods to as little as a few days.

“Without historical data, you can’t analyze what attackers were up to in the early stages of a threat,” says Gene Stevens, cofounder and chief technology officer at ProtectWise. “Empowering security researchers to quickly analyze large stores of historical data helps them understand their adversaries.”

ProtectWise knew it needed an innovative architecture to arrive at a cost-effective solution. “If customers add data feeds or experience surges in traffic, we want to accommodate their needs seamlessly.”

Data limits also have security implications. “If we cap data flows, it would be easy for an attacker to bypass ProtectWise with a DDoS attack by flooding the system,” says Eric Stevens, principal architect at ProtectWise. “We need elasticity on demand to make sure that can’t happen.”

ProtectWise uses Amazon Web Services (AWS) to achieve the massive scale the company requires. “A modest-sized network can produce multiple petabytes of data, and our service is aimed at the world’s largest enterprises,” says Gene Stevens.

Broadly speaking, ProtectWise deals with two types of data. The first type is raw packet data, which is encrypted and optimized using Amazon Elastic Compute Cloud (Amazon EC2) instances with local instance storage, and then sent to Amazon Simple Storage Service (Amazon S3) within milliseconds.

ProtectWise also collects metadata about network packets: billions of items per day, generated by software sensors on customer networks via The ProtectWise Grid. Metadata is written to Apache Cassandra databases accessible from the Apache Solr search platform, all running on Amazon EC2 I3 instances. While the original solution provided high performance and scalability, it was also expensive. “Solr works best using in-memory indexes running on fast hard drives,” says Eric Stevens. That meant paying for petabytes of solid state drive (SSD) storage. ProtectWise needed to find a way to optimize its architecture so it could minimize costs and maintain profitability.

Led by Josh Hollander, director of platform development at ProtectWise, the company developed an innovative solution to increase agility and reduce costs using Amazon EBS to achieve a “stateless” architecture. The company uses Amazon EC2 I3 instances for the most active “hot” shards and Amazon EC2 M4 instances with Amazon EBS volumes for “warm” shards. Using Amazon EBS volumes enables ProtectWise to change instances as needed to handle tasks such as re-indexing the database in the event of a schema change, and then return to a less expensive instance size when the task is complete. The architecture provides a flexible balance of price and performance to meet the company’s dynamic needs.
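
The hot and warm split described above can be pictured with a small sketch. It is illustrative only and is not ProtectWise's code: the seven-day cutoff, the `Placement` type, and the `place_shard` function are assumptions made for this example, and the article does not say whether shard "activity" is measured by age or by some other signal.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical cutoff: shards with events newer than this count as "hot".
HOT_WINDOW = timedelta(days=7)


@dataclass(frozen=True)
class Placement:
    tier: str             # "hot" or "warm"
    instance_family: str  # EC2 family named in the article
    storage: str          # where the shard's data lives


def place_shard(newest_event: datetime, now: datetime) -> Placement:
    """Pick a tier for a shard from the age of its newest event (assumed rule)."""
    if now - newest_event <= HOT_WINDOW:
        # Hot shards stay on I3 instances with local NVMe instance storage.
        return Placement("hot", "i3", "local instance storage")
    # Warm shards move to M4 instances backed by detachable EBS volumes.
    return Placement("warm", "m4", "ebs volume")


now = datetime(2018, 1, 31)
recent = place_shard(datetime(2018, 1, 29), now)
older = place_shard(datetime(2017, 12, 1), now)
print(recent.tier, recent.instance_family)
print(older.tier, older.storage)
```

Because a warm shard's data sits on an EBS volume and not on the instance itself, the instance type behind it can be changed for a heavy task such as re-indexing and changed back afterward.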

ProtectWise also uses low-cost Amazon S3 storage, storing and indexing older metadata in Apache Parquet files (a columnar storage format for the Hadoop ecosystem) on Amazon S3. Tertiary indices remain on Cassandra, enabling rapid in-memory search of archival information. Hollander says, “Instead of Cassandra being the primary system of record, it functions as an index of indexes, helping us find the right files in Amazon S3.”
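
One way to read the "index of indexes" idea is as a lookup from a customer and time bucket to the Parquet files that cover it. The sketch below uses an in-memory dictionary as a stand-in for Cassandra. The `FileIndex` class, the hourly bucketing, and the S3 key layout are all hypothetical; the article does not describe the actual schema.

```python
from collections import defaultdict


class FileIndex:
    """In-memory stand-in for a Cassandra table that points at files in S3."""

    def __init__(self):
        # (customer_id, hour_bucket) -> list of S3 object keys
        self._by_bucket = defaultdict(list)

    def register(self, customer_id: str, hour_bucket: int, s3_key: str) -> None:
        """Record that a Parquet file holds metadata for this customer and hour."""
        self._by_bucket[(customer_id, hour_bucket)].append(s3_key)

    def files_for(self, customer_id: str, start_hour: int, end_hour: int) -> list:
        """Return keys of files covering the inclusive range [start_hour, end_hour]."""
        keys = []
        for hour in range(start_hour, end_hour + 1):
            keys.extend(self._by_bucket.get((customer_id, hour), []))
        return keys


index = FileIndex()
index.register("acme", 100, "metadata/acme/100/part-0.parquet")
index.register("acme", 101, "metadata/acme/101/part-0.parquet")
index.register("other", 100, "metadata/other/100/part-0.parquet")

matches = index.files_for("acme", 100, 101)
print(matches)
```

A query then reads only the files the index returns, so the search never has to scan the whole archive in Amazon S3.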

Using Amazon S3 has significantly reduced costs for ProtectWise. “Our savings per unit of data has gone down an average of 30 percent per month over the last two years,” says Hollander. “Using AWS, we have reduced our storage costs by 95 percent, meaning we are spending $1 for every $20 we would have spent on a traditional system.”

ProtectWise increased performance at the same time. “Our solution built using Amazon S3 and Amazon EBS ingests 50 terabytes of compressed network data and metadata each day, adding up now to many petabytes of stored data,” says Gene Stevens. “We can search all that data with response times of 1 to 3 seconds. Nobody else in the security industry can do this. It blows our customers away.”

Because Amazon EBS is persistent, ProtectWise can dynamically move storage among instances as needed without replicating the data as would be required in traditional clustered databases. “Decoupled storage means we can move data in minutes that would have taken days or weeks using traditional architecture,” says Gene Stevens.
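
Moving storage between instances without copying data comes down to detaching a volume from one instance and attaching it to another. The sketch below shows that sequence, assuming a client object shaped like the boto3 EC2 client (`detach_volume`, `attach_volume`, and the `volume_available` and `volume_in_use` waiters). The `move_volume` helper, the device name, and the IDs are hypothetical, and a small recording stub stands in for the client so the example runs without AWS credentials.

```python
def move_volume(ec2, volume_id: str, target_instance_id: str,
                device: str = "/dev/sdf") -> None:
    """Detach an EBS volume and attach it to another instance, with no data copy."""
    ec2.detach_volume(VolumeId=volume_id)
    # Wait until the volume is free before attaching it elsewhere.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    ec2.attach_volume(VolumeId=volume_id, InstanceId=target_instance_id,
                      Device=device)
    # Wait until the attachment to the new instance is complete.
    ec2.get_waiter("volume_in_use").wait(VolumeIds=[volume_id])


class RecordingEC2:
    """Stub that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def detach_volume(self, **kwargs):
        self.calls.append(("detach_volume", kwargs))

    def attach_volume(self, **kwargs):
        self.calls.append(("attach_volume", kwargs))

    def get_waiter(self, name):
        stub = self

        class _Waiter:
            def wait(self, **kwargs):
                stub.calls.append(("wait:" + name, kwargs))

        return _Waiter()


stub = RecordingEC2()
move_volume(stub, "vol-0123456789abcdef0", "i-0fedcba9876543210")
print([call[0] for call in stub.calls])
```

With a real client, `boto3.client("ec2")` would presumably be passed in place of the stub. In practice the file system should be unmounted before detaching, and the target instance must be in the same Availability Zone as the volume.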

AWS made it easy for ProtectWise to develop the storage architecture and gain value incrementally over the course of a year with a team of only two developers. “AWS supports experimentation at a pace that would be impossible if we were dealing with physical infrastructure,” says Robert Tarrall, director of DevOps at ProtectWise. The same efficiency applies in operations, enabling a team of only six engineers to manage more than 2,000 server instances.

ProtectWise relies on AWS to serve large, data-intensive customers with ease, from on-demand video providers to major sports leagues to oil and gas giants. “By taking advantage of AWS, we have been able to make our business completely viable, so we can be as ambitious as we desire,” says Gene Stevens.