Case Study

Data Lake

Challenge

Building an enterprise data lake requires a reliable, repeatable, and fully operational data management system that covers the ingestion, transformation, and distribution of data. The system must support varied data types and formats, and it must be able to capture data flows in various ways. The system must do the following:

Benefits of Cask Solution

The company’s non-Hadoop developers were able to build an end-to-end data ingestion system without training, saving time and resources.

Rapid Time to Value

Developers were able to build the data lake and deliver it to customers faster.

CDAP’s ingestion platform established standard conventions for how data is ingested, transformed, and stored, allowing faster onboarding.

Business Agility

Developers provided a self-service platform for the rest of the organization, enabling departments to use data to make better business decisions.

Scalability

CDAP was deployed across eight clusters comprising hundreds of nodes.

Using CDAP, data lake users were able to quickly locate and access datasets, metadata, data lineage, and data provenance. This helped them use their clusters efficiently, supported data governance and auditability, and improved data quality.