Last week, as part of the HDF 3.1 Blog Series, we talked about support for Apache Kafka 1.0 and the powerful HDF integrations including Apache NiFi’s Kafka processors, Apache Ambari for provisioning/management/monitoring and Ranger for access control policies and audit for Apache Kafka.

Today, in this fourth part of the series, we discuss the innovations added to Hortonworks Streaming Analytics Manager, aka SAM, specifically around tooling for developers to test streaming analytics apps.

Customers are Building Streaming Apps Faster with SAM

Last summer, when SAM was unveiled as part of HDF 3.0, the fundamental problem we were trying to solve was helping our customers build streaming analytics apps faster. This addressed the following sentiment expressed by so many of our customers:

“Using NiFi with its rich UI has been a refreshingly delightful experience for us as we build flow management applications. However, we desperately need the same type of experience when building streaming analytics apps. Flow management only gets us halfway there. We need a rich UI to build analytical apps that operate on the stream.”

As our customers have started to use SAM to build streaming analytics apps in verticals ranging from transportation and healthcare to insurance, we are seeing app dev teams and business analysts deliver value to the business faster.

To demonstrate this, let’s build on the trucking company use case that we presented in the last blog. This trucking company wants to build real-time data flow apps to ingest the streams, perform routing, transformation and enrichment, and deliver them to downstream consumers for streaming analytics. In the previous blog, we discussed how Apache MiNiFi, NiFi and Kafka combined can implement the flow requirements of edge data collection, routing, transformation, enrichment and delivery of the streams to downstream consumers for streaming analytics. SAM can then be used to implement the streaming analytics requirements like the following:

The following showcases how SAM implements each of these requirements.

As the above SAM app showcases, building complex streaming analytics apps using constructs like joins across streams, aggregations over time windows, enrichment, normalization and executing machine learning models becomes easier.

SAM’s New ‘Test Mode’

A common challenge that we often hear from app dev teams who specialize in implementing streaming applications is the following:

“It’s difficult to test my streaming analytics apps locally before deploying to a cluster. There needs to be better tooling to help developers with unit and integration testing of streaming apps.”

SAM’s new Test Mode solves this problem by enabling developers to test SAM apps by mocking out sources using test data and stubbing out the destination sinks.
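The idea of mocking sources and stubbing sinks can be sketched in plain Java. This is only an illustration of the concept, not SAM's implementation: `TruckEvent`, `CollectingSink`, `TestModeSketch` and the speed threshold of 80 are all hypothetical names and values chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical event type for illustration; not a SAM class.
final class TruckEvent {
    final String driver;
    final int speedMph;

    TruckEvent(String driver, int speedMph) {
        this.driver = driver;
        this.speedMph = speedMph;
    }
}

// Stubbed sink: records what it receives instead of writing to a real destination.
final class CollectingSink {
    final List<TruckEvent> received = new ArrayList<>();

    void write(TruckEvent event) {
        received.add(event);
    }
}

final class TestModeSketch {
    // Feeds mocked source events through a single filter rule into the stubbed sink.
    static CollectingSink run(List<TruckEvent> mockSource, Predicate<TruckEvent> rule) {
        CollectingSink sink = new CollectingSink();
        for (TruckEvent event : mockSource) {
            if (rule.test(event)) {
                sink.write(event);
            }
        }
        return sink;
    }

    public static void main(String[] args) {
        // Test data stands in for the real stream (assumed values).
        List<TruckEvent> testData = List.of(
                new TruckEvent("driver-a", 55),
                new TruckEvent("driver-b", 92));
        CollectingSink sink = run(testData, e -> e.speedMph > 80);
        System.out.println(sink.received.size() + " event(s) reached the stubbed sink");
    }
}
```

Because the sink only collects events in memory, a test can inspect exactly what the app would have emitted without touching a cluster.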

To showcase SAM’s Test Mode, assume that for the above truck-streaming-analytics-app, we have the following assertions we need to test.

The following demonstrates how to create the test case in SAM to validate the assertions.

When the test case is executed, SAM displays the output at each component/processor as the test data flows through the app. This enables the developer to validate the outputs visually for different test cases. The following is the result of SAM test case execution.

What Do Customers Really Want? Automated Unit Tests, CI & CD.

As the above diagram illustrates, SAM’s Test Mode allows the developer to validate/test visually before deploying to a streaming cluster. The feedback from customers has been that SAM Test Mode is helpful for testing, but what they really want is the following:

JUnit Tests – Be able to write JUnit tests using SAM Test mode to programmatically validate the assertions.

Continuous Delivery (CD) – Deliver new features/improvements to the business in a continuous fashion.

Writing Unit Tests with SAM’s Test Mode REST

SAM addresses each of these needs because every capability in SAM, including Test Mode, is exposed through SAM REST services. Hence, the seven assertions above can be written as a JUnit test using SAM Test Mode’s RESTful services as shown below.
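As a rough sketch of the first step such a test might take, the following builds (without sending) an HTTP POST that asks a SAM server to run a test case, using the standard Java 11 `java.net.http` API. The endpoint path, host and JSON body are not given in this post, so they are passed in as parameters, and the values used in `main` are placeholders rather than SAM's actual REST contract.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class SamTestRunRequest {
    // Builds, but does not send, a POST request for a test run.
    // testRunPath and jsonBody are caller-supplied because the real SAM
    // endpoint and payload are assumptions here, not documented values.
    static HttpRequest build(String baseUrl, String testRunPath, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(baseUrl + testRunPath))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) {
        // Placeholder host, path and body for illustration only.
        HttpRequest request = build(
                "http://sam.example.com",
                "/hypothetical/testrun",
                "{\"testCaseId\": 1}");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

A JUnit test method would send a request like this with `HttpClient`, parse the response, and assert on the events recorded at each stubbed sink.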

Creating CI and CD Pipelines using SAM REST

Most enterprise organizations have standards on continuous integration and delivery pipelines for custom applications to increase software quality and decrease the time to market. One of the fundamental design principles of SAM is to expose all capabilities via REST. This allows customers to easily build CI and CD pipelines for SAM applications.

The CI/CD pipeline can be implemented with SAM REST using Jenkins. The following demonstrates this.
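The gating logic behind such a pipeline can be sketched as a small Java program that a CI server could invoke as a build step. This is an assumption-laden illustration, not the pipeline from this post: the stage names are hypothetical and each stage is a placeholder where real code would call SAM REST.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.BooleanSupplier;

final class PipelineGate {
    // Runs stages in insertion order and returns the name of the first
    // stage that fails; later stages are not executed after a failure.
    static Optional<String> firstFailure(Map<String, BooleanSupplier> stages) {
        for (Map.Entry<String, BooleanSupplier> stage : stages.entrySet()) {
            if (!stage.getValue().getAsBoolean()) {
                return Optional.of(stage.getKey());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical stages; each lambda is a stand-in for a REST call.
        Map<String, BooleanSupplier> stages = new LinkedHashMap<>();
        stages.put("import-app", () -> true);
        stages.put("run-test-cases", () -> true);
        stages.put("deploy", () -> true);

        Optional<String> failed = firstFailure(stages);
        failed.ifPresent(name -> System.err.println("Stage failed: " + name));
        // A nonzero exit code lets a CI server such as Jenkins mark the build as failed.
        System.exit(failed.isPresent() ? 1 : 0);
    }
}
```

Stopping at the first failed stage means the deploy step is never reached unless the test cases pass.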

For more details on each of the CI & CD steps outlined above, see the following artifacts:

The following is the result of a CI/CD Jenkins pipeline execution for the trucking streaming analytics app.

In Summary and What’s Next?

With SAM, it becomes much easier to build streaming analytics apps. With SAM Test Mode, the developer can test/validate the app visually before deploying to the cluster. With SAM REST, teams can build automated unit tests and continuous integration and delivery pipelines to meet the needs of the enterprise. Next week, we will talk about the new NiFi and Atlas integrations that were added in HDF 3.1. Stay tuned!
