Vinisha Vyasa is a software engineer at Cask, where she is developing software to enable big-data developers to quickly build data-centric applications. Prior to Cask, she worked on Container Monitoring as a Research Engineer at Ericsson.

We are very happy to announce the general availability of CDAP 4, the fourth generation of Cask’s flagship product. This release builds on what we have learned over the past few years from our users and the community. This post summarizes the major enhancements in CDAP 4: a new and revamped user experience; Cask Market, Cask’s “Big Data App Store”; the new Cask Wrangler extension; Cask Hydrator enhancements; and new platform features and improvements.

New & Revamped User Experience

CDAP 4 contains a completely revamped user experience that focuses on improving speed and productivity by thoughtfully reducing the number of clicks users need to get things done. The new user experience is centered around three major themes:

I. Jump Button and Fast Actions

One of the unique aspects of CDAP is the combination of the platform and multiple extensions, each providing different functionality across the same datasets. The Jump button allows you to “jump” to various parts of the product for a given entity. For example, you can jump to lineage information in Cask Tracker from CDAP search results, or create a pipeline in Cask Hydrator from a dataset you are analyzing in Cask Tracker. Additionally, we provide fast actions that give single-click access to frequently used actions for various entity types. The fast-action icons are customized for each entity. For example, fast actions include exploring a dataset or deleting an application.

II. Global Navigation Refresh and Plus Button

CDAP and its extensions now have a consistent look and feel for navigation. We have added a global “Plus” button so you can reach Cask Market and the Resource Center from anywhere in the product.

III. Cleaner Card Views

The new UI presents card views of all entities, including Applications, Streams, and Datasets. This allows users to view and filter the entities that they are interested in, get a quick snapshot, and instantly access the Jump button and fast actions.

Other usability improvements include a “spotlight search”, easy navigation via keyboard shortcuts, an improved management screen for viewing operational stats of Hadoop ecosystem components, and a splash screen for onboarding new users.

Cask’s “Big Data App Store”, Cask Market

Cask provides an ecosystem of pre-built big data solutions, reusable templates, and plugins via our new big data app store, Cask Market. Within CDAP, users can access the market and deploy pre-built Hadoop solutions and big data applications with easy-to-use guided wizards. Enterprises can create their own internal ecosystem by hosting a private instance of Cask Market, fostering discoverability and reusability in their controlled environment.

Cask Wrangler Extension

One recurring theme across our customers is the pain that data scientists and data engineers experience during data preparation tasks such as loading, cleansing, and transforming data. While the custom transformation operators in Cask Hydrator in our earlier releases solved some of that problem, the new Cask Wrangler offers a simple, interactive approach that makes data preparation not only easier but also more fun. The result of the wrangling process is a set of rules and an output schema, which can be seamlessly integrated into a Cask Hydrator production pipeline.
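To make the idea of "a set of rules and an output schema" concrete, here is a minimal sketch in Python. It is purely illustrative: the rule names (`rename`, `to_int`, `drop`) and the helper functions are hypothetical and are not Cask Wrangler's actual rule syntax or API.

```python
from typing import Callable, Dict, List

# Hypothetical illustration only: none of these names come from Cask Wrangler.
Record = Dict[str, object]
Rule = Callable[[Record], Record]


def rename(old: str, new: str) -> Rule:
    """Rule: move the value stored under `old` to the key `new`."""
    def apply(record: Record) -> Record:
        out = dict(record)
        out[new] = out.pop(old)
        return out
    return apply


def to_int(field: str) -> Rule:
    """Rule: strip surrounding whitespace and parse the field as an integer."""
    def apply(record: Record) -> Record:
        out = dict(record)
        out[field] = int(str(out[field]).strip())
        return out
    return apply


def drop(field: str) -> Rule:
    """Rule: remove a field from the record."""
    def apply(record: Record) -> Record:
        return {k: v for k, v in record.items() if k != field}
    return apply


def wrangle(records: List[Record], rules: List[Rule]) -> List[Record]:
    """Apply every rule, in order, to every record."""
    result = []
    for record in records:
        for rule in rules:
            record = rule(record)
        result.append(record)
    return result


def output_schema(records: List[Record]) -> Dict[str, str]:
    """Derive a field-name -> type-name schema from the first cleaned record."""
    if not records:
        return {}
    return {name: type(value).__name__ for name, value in records[0].items()}


# The ordered rule list is the reusable artifact; the schema describes its output.
rules = [rename("Age ", "age"), to_int("age"), drop("notes")]
raw = [{"name": "Ada", "Age ": " 36 ", "notes": "n/a"}]
clean = wrangle(raw, rules)
schema = output_schema(clean)
```

The point of the sketch is that the ordered rule list and the derived schema are separate from any particular data sample, which is what lets the same rules be replayed later inside a pipeline.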

Cask Hydrator Enhancements

I. Pipeline Preview

One of the most common requests from users is the ability to actively debug pipelines with real, live data. We have introduced a feature to preview data pipelines within Cask Hydrator without deploying them, so you can see the data as it flows through the pipeline while you create, debug, and fine-tune it. This helps users create, enable, and deploy data pipelines correctly, significantly improving time to value.

II. New Pipeline Plugins

CDAP 4 also introduces a variety of new plugins, including a plugin to ingest and process mainframe data, a plugin for Amazon Kinesis, and a plugin to stream files in batch and Spark Streaming. We are also working with our customers to improve the traditional form of ingesting data via MapReduce using JDBC, and have introduced new Cask Hydrator plugins for faster export of data from relational databases like Oracle.

New Platform Features & Improvements

I. Transactional Messaging System

CDAP 4 introduces a foundational transactional messaging system for reliable messaging between different CDAP components and programs. This will enable many upcoming use cases that both the platform and programs need, such as reliably publishing and subscribing to audit log messages for audit trail and lineage computation. ACID transactional guarantees, combined with simple and easy-to-use APIs, ensure that messages can be published and consumed reliably with consistent, exactly-once delivery semantics.
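As a rough illustration of what transactional publishing means, the following Python sketch models a single in-memory topic. It is a toy under stated assumptions, not the CDAP messaging API: the `Topic` and `Transaction` classes and their methods are hypothetical, and the sketch shows only atomic visibility of published messages and per-consumer offset tracking, not durability or isolation across processes.

```python
from typing import Dict, List


class Topic:
    """Hypothetical in-memory topic; only committed messages are visible."""

    def __init__(self) -> None:
        self._committed: List[str] = []
        self._offsets: Dict[str, int] = {}  # consumer name -> next offset to read

    def begin(self) -> "Transaction":
        """Start a transaction that buffers messages until commit."""
        return Transaction(self)

    def poll(self, consumer: str) -> List[str]:
        """Return committed messages this consumer has not yet acknowledged."""
        return self._committed[self._offsets.get(consumer, 0):]

    def ack(self, consumer: str, count: int) -> None:
        """Advance the consumer's offset after it has processed `count` messages."""
        self._offsets[consumer] = self._offsets.get(consumer, 0) + count


class Transaction:
    """Buffers published messages; all become visible on commit, or none do."""

    def __init__(self, topic: Topic) -> None:
        self._topic = topic
        self._pending: List[str] = []

    def publish(self, message: str) -> None:
        self._pending.append(message)

    def __enter__(self) -> "Transaction":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        if exc_type is None:
            # Commit: append the whole batch to the committed log at once.
            self._topic._committed.extend(self._pending)
        # On an exception the batch is simply discarded (rollback).
        self._pending = []
        return False  # never swallow the caller's exception


topic = Topic()

# A successful transaction makes both messages visible together.
with topic.begin() as tx:
    tx.publish("audit: dataset created")
    tx.publish("audit: dataset read")

# A failed transaction leaves no trace in the topic.
try:
    with topic.begin() as tx:
        tx.publish("audit: never visible")
        raise RuntimeError("program failed mid-transaction")
except RuntimeError:
    pass

batch = topic.poll("lineage")
topic.ack("lineage", len(batch))
```

In this toy model a consumer sees each committed message once because its offset advances only when it acknowledges a batch. A real system would also need to tie that offset update to the same transaction as the consumer's own processing.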

II. Operational & Management Stats

CDAP 4 provides greater visibility into the components that CDAP relies on, such as HDFS, YARN, and HBase, by bringing all the relevant operational metrics into a management user interface. This saves administrators from having to switch between multiple UIs to get the stats they need to manage and monitor these components efficiently. These stats can also help to debug issues with CDAP applications. Since Operational Stats are implemented as extensions, CDAP users can now bring in metrics from any system that they depend on and view them in the management screen of the new UI by implementing a simple Operational Stats API. In addition, the stats are published to JMX, so users can also monitor them using external tools such as JConsole and Ganglia.
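The extension model described above can be pictured as a small plug-in interface: each monitored system supplies one implementation, and the platform gathers stats from all of them. The Python sketch below is only an analogy. The actual Operational Stats API is not shown in this post, so the class, property, and method names here are assumptions, and publishing to JMX is not modeled.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class OperationalStats(ABC):
    """Hypothetical extension interface: one implementation per monitored system."""

    @property
    @abstractmethod
    def service_name(self) -> str:
        """Name under which the stats appear in the management view."""

    @abstractmethod
    def collect(self) -> Dict[str, float]:
        """Return the current stat values for this system."""


class ExampleQueueStats(OperationalStats):
    """Made-up extension reporting stats for an imaginary queue service."""

    def __init__(self, depth: int, consumers: int) -> None:
        self._depth = depth
        self._consumers = consumers

    @property
    def service_name(self) -> str:
        return "example-queue"

    def collect(self) -> Dict[str, float]:
        return {"depth": float(self._depth), "consumers": float(self._consumers)}


def gather(extensions: List[OperationalStats]) -> Dict[str, Dict[str, float]]:
    """Collect stats from every registered extension, keyed by service name."""
    return {ext.service_name: ext.collect() for ext in extensions}


report = gather([ExampleQueueStats(depth=3, consumers=2)])
```

Because the collector only depends on the abstract interface, adding a new system means writing one more subclass rather than changing the management view itself.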