Frequently Asked Questions

What is ENTRADA?

Why use Apache Parquet?

Parquet is a columnar storage format that allows for very efficient encoding and compression of data. DNS data is highly structured, and each column often contains repeating values. Because Parquet stores the values of each column sequentially on disk and can apply run-length encoding, a column that contains a long run of zeroes, for example, only requires storing a single 0 and a count of the number of zeroes. Compared to writing all the zeroes to disk, this saves a lot of bytes.
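
The run-length idea can be illustrated with a minimal Python sketch. This is a simplified illustration only, not Parquet's actual implementation: Parquet uses a hybrid of run-length encoding and bit-packing. The function names `rle_encode` and `rle_decode` and the sample column are made up for this example.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse each run of consecutive equal values into a (value, count) pair."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, count) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]


# Hypothetical column: mostly zeroes, as is common for flag-like DNS fields.
column = [0] * 1000 + [1] + [0] * 500

encoded = rle_encode(column)

# 1501 stored values shrink to 3 (value, count) pairs, and decoding is lossless.
print(len(column), len(encoded), rle_decode(encoded) == column)
```

The longer the runs of identical values in a column, the fewer pairs need to be stored, which is why sequential per-column storage combines well with this kind of encoding.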

Why use Apache Impala?

Impala provides high-performance, low-latency SQL queries on large volumes of data stored on Apache Hadoop. The fast response for queries enables interactive exploration and fine-tuning of analytic queries, rather than long batch jobs traditionally associated with SQL-on-Hadoop technologies.

How fast is ENTRADA?

This all depends on the type of SQL query and the volume of data to be analyzed. For relatively simple queries over a couple of billion rows, expect a result within a few seconds to a couple of minutes (using a small cluster of 4 data nodes). For more detailed questions about performance, please contact us.

Can ENTRADA be scaled out?

Yes. Because ENTRADA is built on top of Hadoop, it is easy to scale out by adding more Hadoop nodes to the cluster, which increases compute and storage capacity. Adding more hard drives to existing nodes is also possible if storage is the bottleneck.

Can I also use other query engines?

Yes, you can also use Apache Spark to query the generated Parquet files.
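
As a rough sketch of what querying the Parquet files from Spark could look like, the Python (PySpark) snippet below registers a directory of Parquet files as a temporary view and counts the most frequent values of one column. The helper names, the view name `dns`, and any path or column name you pass in (for example `qname`) are assumptions for illustration; check the schema of your own Parquet files for the real column names. Table and column names are inserted into the SQL text as-is, so only pass trusted values.

```python
def build_top_values_query(view, column, limit=10):
    """Build a SQL query that counts the most frequent values of one column."""
    return (
        f"SELECT {column}, COUNT(*) AS queries "
        f"FROM {view} GROUP BY {column} "
        f"ORDER BY queries DESC LIMIT {int(limit)}"
    )


def top_values(parquet_path, column, limit=10):
    """Run the query with Spark against Parquet files at parquet_path.

    Requires the pyspark package; it is imported here so that the query
    builder above can be used without Spark installed.
    """
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("entrada-parquet-example").getOrCreate()
    # Expose the Parquet data under a temporary view name (assumed: "dns").
    spark.read.parquet(parquet_path).createOrReplaceTempView("dns")
    return spark.sql(build_top_values_query("dns", column, limit)).collect()
```

Calling `top_values` with the location of the generated Parquet files and a column name returns the matching rows as a list of Spark `Row` objects.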

What network protocols can ENTRADA handle?

Currently only the IP, TCP, UDP, DNS and ICMP protocols are supported.

Can ENTRADA be made highly available?

Yes, ENTRADA is built on top of Hadoop, which has high availability features.

What language is ENTRADA written in?

The ENTRADA components used for converting network data are written in Java. The workflow that ties everything together is implemented with Bash shell scripts.

Who developed ENTRADA?

ENTRADA was started by SIDN Labs, the R&D team of SIDN, the domain name registry for the .nl ccTLD. At SIDN Labs, ENTRADA is used to analyze DNS network data; the SIDN Labs DNS database currently holds over 100 billion rows.