A Technical Overview of Cloudera Altus Analytic DB

A few weeks back, we announced the upcoming beta of Cloudera Altus Analytic DB for cloud-based data warehousing. As promised, the beta is now available and we wanted to spend some time describing the unique architecture.

Architecture of Cloudera Altus Analytic DB

Altus Analytic DB is built on the Cloudera Altus platform-as-a-service foundation, which also supports the Altus Data Engineering service. The architecture of Cloudera Altus rests on a few simple but important premises: customers operating in the cloud want to keep control of their data and keep it secure, while still being able to run analytic services on that data easily. For this reason, Altus Analytic DB, powered by Apache Impala, brings the data warehouse to the data.

A Single Shared Repository of Data in Open File Formats

Cloud object storage, such as Amazon S3 and Azure ADLS, is becoming an increasingly popular way to collect and store large amounts of data in a single location that is both scalable and cost-effective. Altus Analytic DB is modular by design and takes advantage of existing object storage. It operates directly on data in Amazon S3 to support a number of analytic use cases, including exploratory analytics over newly acquired or yet-to-be modeled data as well as data mart or data warehouse use cases on file formats optimized for analytics such as Apache Parquet.

Unlike legacy analytic databases and many other cloud-based data warehouse services where the first step involves copying data into the database, there is no need to copy data into Altus Analytic DB. Users can instead immediately begin operating on the full breadth of data in the object store. Another advantage of storing data in the customer’s account and using open file formats is that it avoids lock-in and keeps data accessible to other applications, processing engines, and services that customers may want to use (or already be using) to operate on their data.
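To make the "no copy" point concrete, here is a minimal sketch of how a table could be declared directly over Parquet files that already sit in S3. It builds an Impala `CREATE EXTERNAL TABLE` statement as a string; the helper name, table name, columns, and bucket path are all hypothetical placeholders, not anything prescribed by Altus.

```python
def external_parquet_table_ddl(table, columns, s3_location):
    """Build an Impala CREATE EXTERNAL TABLE statement for Parquet files in S3.

    Hypothetical helper for illustration; `columns` is a list of
    (name, sql_type) pairs.
    """
    if not s3_location.startswith("s3a://"):
        raise ValueError("Impala addresses S3 data through s3a:// URIs")
    column_sql = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"  {column_sql}\n"
        f")\n"
        f"STORED AS PARQUET\n"
        f"LOCATION '{s3_location}'"
    )


# Placeholder table, columns, and bucket; substitute your own.
ddl = external_parquet_table_ddl(
    "web_logs",
    [("event_time", "TIMESTAMP"), ("user_id", "BIGINT"), ("url", "STRING")],
    "s3a://example-bucket/warehouse/web_logs/",
)
print(ddl)
```

Because the table is external, dropping it removes only the table definition; the Parquet files stay in the bucket and remain readable by other engines.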

Multiple Clusters Over Shared Data

Altus Analytic DB also leverages the cloud’s ability to provide separate but scalable storage and compute resources. This means organizations can now scale compute resources independently of data size. In fact, with Altus Analytic DB, organizations can easily provision multiple compute clusters over the shared data, both to isolate key analytical SQL workloads from one another and to scale compute elastically as demand grows. Customers can choose from a list of optimized instance types and set the number of nodes, allowing them to pick the configuration that best meets the needs of their specific workloads.
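The idea of several independently sized clusters over one copy of the data can be sketched as a small data model. This is only an illustration: `ClusterSpec` is a hypothetical type rather than part of any Altus API, the bucket is a placeholder, and the instance types are ordinary AWS types chosen as examples, not necessarily ones on the Altus list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSpec:
    """Hypothetical description of one compute cluster (not an Altus API type)."""
    name: str
    instance_type: str
    node_count: int
    data_location: str  # shared object-store path the cluster reads

    def __post_init__(self):
        if self.node_count < 1:
            raise ValueError("a cluster needs at least one node")


SHARED_DATA = "s3a://example-bucket/warehouse/"  # placeholder bucket

# Two workloads, each sized on its own, both reading the same data.
clusters = [
    ClusterSpec("prod-reporting", "r4.8xlarge", 20, SHARED_DATA),
    ClusterSpec("adhoc-exploration", "r4.4xlarge", 5, SHARED_DATA),
]

# Compute is sized per workload; storage stays a single location.
total_nodes = sum(c.node_count for c in clusters)
shared_locations = {c.data_location for c in clusters}
print(total_nodes, shared_locations)
```

Resizing or terminating the exploration cluster changes nothing for the reporting cluster, because the only thing the two share is the read path to the object store.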

Through the Altus console, administrators get a holistic view across all these clusters and can terminate them on demand to control resource costs. For those also using Altus Data Engineering, this same console provides visibility and monitoring of data processing jobs. This makes it easy to support a common use case: running ETL jobs with Altus Data Engineering to prepare data, then reporting on it with Altus Analytic DB.

Figure 1: Cloudera Altus Analytic DB Architecture in AWS

Data Security

The Altus Analytic DB service deploys nodes running Apache Impala into an Amazon Virtual Private Cloud (VPC) running in the customer’s own AWS account, giving the customer full control over their data and computing resources. Deploying into a VPC allows customers to isolate the network used by their Altus deployments from the other networks in their AWS account, control and limit access to it via Security Groups, and change or revoke permissions at any time. Data never has to leave the customer’s cloud infrastructure.

Access to data in S3 can be controlled using AWS Identity and Access Management (IAM) and instance profiles. This means that the services running on the data never need to be provided long-lived text-based credentials, so administrators need not worry about S3 credentials being leaked.
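As a rough illustration of what such IAM-based access control looks like, the sketch below builds a read-only policy document for a single S3 prefix. A policy like this would be attached to the IAM role behind an instance profile. The article does not say which permissions Altus itself requires, so treat the actions, the helper name, and the bucket as assumptions made for the example.

```python
import json


def read_only_s3_policy(bucket, prefix=""):
    """Return an IAM policy document granting read-only access to one S3 prefix.

    Hypothetical helper; the exact permissions a real deployment needs
    may differ.
    """
    list_statement = {
        "Effect": "Allow",
        "Action": ["s3:ListBucket"],
        "Resource": [f"arn:aws:s3:::{bucket}"],
    }
    if prefix:
        # Restrict listing to keys under the given prefix.
        list_statement["Condition"] = {"StringLike": {"s3:prefix": [f"{prefix}*"]}}
    return {
        "Version": "2012-10-17",
        "Statement": [
            list_statement,
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*"],
            },
        ],
    }


# Placeholder bucket and prefix.
policy = read_only_s3_policy("example-bucket", "warehouse/")
print(json.dumps(policy, indent=2))
```

Since the nodes obtain temporary credentials through the role, no access key or secret has to be written into cluster configuration.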

Easy Provisioning

Cloudera Altus makes it easy to provision a cluster. In just four steps and a matter of minutes, one can have an Altus Analytic DB cluster up and running.

Name the cluster, then select the software version and the environment (e.g., dev, stage, production)

Choose the instance type and number of nodes

Enter the security credentials

Click “Create Cluster”

Once the cluster is up and running, users can connect to it via JDBC or ODBC.
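For JDBC clients, connecting mostly comes down to pointing the driver at the cluster. The sketch below assembles a connection URL under two assumptions that are not stated in this post: that the client uses the Cloudera JDBC driver for Impala (URL scheme `jdbc:impala://`) and that the cluster listens on Impala's usual client port, 21050. The hostname is a placeholder, and authentication settings are left out.

```python
def impala_jdbc_url(host, port=21050, database="default"):
    """Assemble a JDBC URL for an Impala endpoint.

    Assumes the Cloudera JDBC driver's jdbc:impala:// scheme and Impala's
    default client port; authentication properties are omitted.
    """
    if not host:
        raise ValueError("host must be a non-empty hostname or IP address")
    return f"jdbc:impala://{host}:{port}/{database}"


# Placeholder hostname for a node in the provisioned cluster.
url = impala_jdbc_url("impala.example.internal")
print(url)
```

The resulting string is what a BI tool or SQL client would take in its JDBC connection settings; ODBC clients are configured with the same host and port through a DSN instead.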

Figure 2: Altus Analytic DB Provisioning Page

Benefits for Both Knowledge Workers and IT

This architecture provides a number of benefits for both knowledge workers (data analysts, data engineers, data scientists, etc.) and IT professionals.

For knowledge workers, it means:

Access to all of the data quickly and easily, including the raw data not typically loaded into a data warehouse

Teams can provision their own clusters with just a few clicks and work on the datasets they need, without impacting critical production reporting

Direct access to the data outside of Altus Analytic DB, so non-SQL analysis tools and other applications can work on the same data

For the IT organization, it means:

Data teams get resources quickly and securely on demand (and for as long as needed) without any upfront sizing or planning

All data can reside in a single shared repository, thus eliminating the need for data movement and data silos

IT can empower knowledge workers through a self-service workflow and speed up the time to delivery for the business

Conclusion

We’re excited about the advancements that Altus Analytic DB provides to the world of cloud-based data warehousing, bringing the warehouse to the data. If you’re interested in learning more or trying out the beta, visit https://www.cloudera.com/products/altus/altus-analytic-db.html to sign up for the waitlist.