hydrator

This summer, as an intern at Cask, I had the opportunity to work on Cask Hydrator. Since its launch in 2015, Cask Hydrator has been a widely used and important CDAP application that helps users easily build and run big data pipelines. I helped evolve Hydrator further by adding Action functionality to it. …

Cask Tracker is a self-service CDAP Extension that automatically captures rich metadata and provides users with visibility into how data is flowing into, out of, and within a Data Lake. Tracker was first introduced in CDAP v3.4. Tracker v0.2 has just been released along with CDAP 3.5 and packs a ton of new features. Dataset …

I am very excited to announce the release of the Cask Data Application Platform (CDAP) version 3.5. The focus for CDAP 3.5 is security, with a number of significant new capabilities added to the platform, in addition to major improvements to the Extensions, Cask Hydrator and Cask Tracker. CDAP 3.5 introduces authorization to the platform with …

Data lakes comprise unstructured, semi-structured, and structured data. Many companies bring data in from relational data stores so that various consumers can use it. In this blog post, we will describe how to easily and quickly build a data pipeline with Cask Hydrator. We will take …

Cask Hydrator lets you easily create ETL pipelines through a simple drag-and-drop user interface. We've found that our users like the simplicity of Hydrator, but often want to create pipelines that are more complex than simple transformations. For example, you may want to remove duplicate data, count how many records satisfy some criteria, …

I am very happy to announce the general availability of our flagship product, the Cask Data Application Platform (CDAP), version 3.4. This release introduces a fresh new look for Cask Hydrator, along with improvements that extend it beyond data ingestion use cases, such as building aggregations and performing data science on the ingested data. The …

I am very excited to announce the release of version 3.3.0 of the Cask Data Application Platform (CDAP). This release includes new functionality and improvements to CDAP Metadata and Cask Hydrator, as well as an improved overall installation experience. It also adds support for CDH 5.5.
CDAP Metadata Improvements: CDAP allows annotating various CDAP …

Front-end engineering has evolved significantly over the past couple of years. We have seen the rise and fall of JavaScript frameworks, and new frameworks are introduced every day. Among the contenders, the Angular framework has become one of the most popular. With its two-way data binding on $scope, you do not have to worry about syncing …

Before every CDAP release, we at Cask hold an internal hackathon where we use CDAP and work on interesting features. A few Cask engineers got together and, wanting to open up the capabilities of Cask Hydrator to more than just Java developers, decided to build a transform that runs user-written Python. Beginning with CDAP release 3.2, the CDAP UI …

We introduced Cask Hydrator to provide an easy way for users to build a data lake. Users can create ETL pipelines by simply choosing a source, one or more sinks, and optional transforms. When we were designing Hydrator, we wanted to make sure that it was easy to use; users should be able to configure …