Data Retention and Analytics Platform (DRAP)

With the rise of cybersecurity and the Internet of Things (IoT), we are entering a new age of “machine data.” In this age, applications are expected to process data generated by millions of interconnected devices at unprecedented speeds. Data is analyzed in real time, and sometimes retrospectively, to monitor the network, identify threats, visualize trends, evaluate risks, find causal relationships, and in some cases control devices.

Cybersecurity confronts applications with new challenges in data volume, query complexity, and integration. As threats grow more complex and stealthy, they demand new thinking about how cyber data is persisted and processed. Together, cybersecurity and IoT are defining a new mixed database workload that combines read-only queries with insert-intensive transactions.

At SS8, we have engineered the Data Retention and Analytics Platform (DRAP) to process many different forms of machine data. By building a distributed database engine on columnar storage and a clustered architecture, we’ve made it easy and economical for applications to scale, store, process, and analyze billions of machine-generated data points.

Using DRAP, applications can meet some of the most stringent data processing requirements, including:

Ingesting hundreds of thousands of data points per second

Querying data in real time

Processing data at the edge and in the cloud

DRAP is built on a shared-nothing architecture. In a shared-nothing architecture, compute and storage scale together, as shown in Figure 1. This tight coupling ensures that I/O bandwidth, the key to read performance, is abundant. DRAP goes a step further, combining indexing, partitioning, compression, and columnar storage to avoid I/O wherever possible. When I/O cannot be avoided, DRAP prefetches data into memory and, once it is fetched, keeps it in memory as long as possible.
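To make the shared-nothing idea concrete, here is a minimal Python sketch that assumes records are hash-partitioned on a device key; DRAP's actual partitioning scheme is not described here, and the `Node` and `Cluster` classes are hypothetical. Each node owns its own storage, so every record lands on exactly one node and every scan reads only local data.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical shared-nothing node: it owns its rows and no other node reads them."""
    name: str
    rows: list = field(default_factory=list)

    def scan(self, predicate):
        # Only local data is read, so nodes never compete for each other's I/O bandwidth.
        return [r for r in self.rows if predicate(r)]


class Cluster:
    def __init__(self, node_count):
        self.nodes = [Node(f"node-{i}") for i in range(node_count)]

    def _owner(self, key):
        # Stable hash (Python's built-in hash() is salted per process) so routing is reproducible.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def insert(self, record):
        # Assumption: "device_id" is the partitioning key.
        self._owner(record["device_id"]).rows.append(record)

    def query(self, predicate):
        # Fan the query out; each node scans only its own shard.
        results = []
        for node in self.nodes:
            results.extend(node.scan(predicate))
        return results


cluster = Cluster(node_count=3)
for i in range(9):
    cluster.insert({"device_id": f"dev-{i}", "bytes": i * 100})
hot = cluster.query(lambda r: r["bytes"] >= 500)
```

Adding a node in this sketch adds both storage and scan capacity at once, which is the "compute and storage scale together" property; a real system would also need to rebalance existing data when the node count changes.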

Figure 1: Shared-Nothing Architecture

In the cloud, compute (processors and memory) scales independently of storage (disk and I/O bandwidth), as shown in Figure 2. This independence allows for elasticity: more compute can be added dynamically, with full access to data on a shared disk subsystem. DRAP can also operate in such an environment, where the 1:1 coupling between storage nodes and compute nodes is broken and compute, storage, or both can scale dynamically with the workload.

Figure 2: Elastic Architecture

As data is ingested, DRAP automatically partitions it and creates self-contained data frames, each with its own index and metadata. The data in each partition is highly deduplicated and compressed before storage. Data is organized into columnar structures that achieve over 90% compression in some cases.
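The following Python sketch illustrates the general technique of building self-contained columnar frames; it is not DRAP's storage format. It assumes hypothetical one-hour time buckets, uses dictionary encoding as a simple form of deduplication, and uses `zlib` as a stand-in compressor. Each frame carries its own min/max timestamp metadata so that a query could skip frames outside its time range.

```python
import json
import zlib
from collections import defaultdict

FRAME_SECONDS = 3600  # hypothetical frame width: one hour of data per partition


def dictionary_encode(values):
    """Replace repeated values with small integer codes (a simple form of deduplication)."""
    dictionary, codes, index = [], [], {}
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes


def build_frames(rows):
    """Group rows into time-bucketed frames, stored column by column with per-frame metadata."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row["ts"] // FRAME_SECONDS].append(row)

    frames = {}
    for bucket, bucket_rows in buckets.items():
        columns = {}
        for col in bucket_rows[0]:  # assumes every row has the same columns
            dictionary, codes = dictionary_encode([r[col] for r in bucket_rows])
            payload = json.dumps({"dict": dictionary, "codes": codes}).encode("utf-8")
            columns[col] = zlib.compress(payload)
        frames[bucket] = {
            # Metadata lets a query prune frames without decompressing them.
            "meta": {"min_ts": min(r["ts"] for r in bucket_rows),
                     "max_ts": max(r["ts"] for r in bucket_rows),
                     "row_count": len(bucket_rows)},
            "columns": columns,
        }
    return frames


def read_column(frame, col):
    """Decompress and decode a single column; other columns are never touched."""
    decoded = json.loads(zlib.decompress(frame["columns"][col]))
    return [decoded["dict"][c] for c in decoded["codes"]]


rows = [{"ts": t, "device": f"dev-{(t // 10) % 5}", "status": "OK"}
        for t in range(0, 7200, 10)]
frames = build_frames(rows)
```

Low-cardinality columns such as `status` shrink dramatically under this scheme, while a unique-per-row column such as `ts` gains little from dictionary encoding; that difference is one reason compression ratios vary "in some cases."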

DRAP was built from the ground up with in-database analytics in mind. Developers can define complex algorithms and run them over massive data sets using DRAP’s query engine. DRAP transparently maximizes query parallelism and minimizes internode data transfer, so queries run faster and need smaller in-memory footprints.
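One widely used way to parallelize a query while keeping internode transfer small is two-phase aggregation: each partition reduces its own rows to a compact partial result, and only those partials are shipped and merged. The Python sketch below shows that pattern for an average; it is an illustration under stated assumptions, not DRAP's query engine. Threads stand in for nodes, and the function names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partial_aggregate(partition):
    """Runs next to the data: reduces a whole partition to one (count, total) pair per device."""
    summary = defaultdict(lambda: [0, 0])  # device -> [count, total_bytes]
    for device, nbytes in partition:
        summary[device][0] += 1
        summary[device][1] += nbytes
    return dict(summary)


def merge(partials):
    """Combines per-partition summaries; only these small summaries would cross the network."""
    totals = defaultdict(lambda: [0, 0])
    for partial in partials:
        for device, (count, total) in partial.items():
            totals[device][0] += count
            totals[device][1] += total
    return {device: total / count for device, (count, total) in totals.items()}


def average_bytes_per_device(partitions):
    # Each partition is aggregated in parallel, then the partials are merged once.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_aggregate, partitions))
    return merge(partials)


partitions = [
    [("dev-a", 100), ("dev-b", 300)],
    [("dev-a", 200), ("dev-a", 300)],
    [("dev-b", 500)],
]
result = average_bytes_per_device(partitions)  # {"dev-a": 200.0, "dev-b": 400.0}
```

Note that the partials carry a count and a sum rather than a per-partition average: averages of averages would be wrong when partitions hold different numbers of rows.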

Data retention policies govern how persistent data and records are managed to meet legal and business archival requirements. These policies weigh legal and privacy concerns against economic and need-to-know considerations to determine retention time, archival rules, encryption, and data formats.

DRAP provides configurable data retention policies that can automatically and permanently erase data from the platform based on retention time and available storage space, without impacting query performance. No table locks!
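A common way to erase data without table locks is to apply retention at partition granularity: expired data is removed by dropping whole partitions rather than deleting individual rows, so queries against the retained partitions are unaffected. The Python sketch below assumes that approach and evaluates two hypothetical rules, a time-based cutoff and a storage budget; it only decides which partitions to drop and does not model DRAP's actual policy configuration or the physical deletion.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    partition_id: str
    max_ts: int       # newest timestamp in the partition (seconds)
    size_bytes: int


def apply_retention(partitions, now, retention_seconds, max_total_bytes):
    """Return (kept, dropped). Whole partitions are dropped, so no row-level deletes are needed."""
    # Rule 1 (time): drop partitions whose newest record falls outside the retention window.
    cutoff = now - retention_seconds
    kept = [p for p in partitions if p.max_ts >= cutoff]
    dropped = [p for p in partitions if p.max_ts < cutoff]

    # Rule 2 (space): while over the storage budget, drop the oldest remaining partition.
    kept.sort(key=lambda p: p.max_ts)
    total = sum(p.size_bytes for p in kept)
    while kept and total > max_total_bytes:
        oldest = kept.pop(0)
        total -= oldest.size_bytes
        dropped.append(oldest)
    return kept, dropped


parts = [Partition("p1", 1_000, 40), Partition("p2", 5_000, 40), Partition("p3", 9_000, 40)]
kept, dropped = apply_retention(parts, now=10_000, retention_seconds=6_000, max_total_bytes=50)
```

Here `p1` expires under the time rule, and `p2` is then dropped because the two remaining partitions (80 bytes) exceed the 50-byte budget, leaving only `p3`.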

Database portability makes cloud and edge architectures easier to implement, so we wrote DRAP in Java. As a result, DRAP can run anywhere a JVM is available: in the data center, or at remote sites provided the network latency overhead is negligible. DRAP supports streaming interfaces such as ZMQ, JMS, Kafka, and AWS Kinesis.

Subhra is a principal software engineer at SS8. A key player at SS8, he brings over 10 years of telecommunications experience to the DRAP team. He is passionate about innovation in the cybersecurity industry.