Any time now, the Apache Hadoop community will declare the General Availability of Hadoop 2.0, which includes the much-anticipated Apache Hadoop YARN. The YARN-based architecture of Hadoop 2 is the most significant change to Hadoop in the past six years. It enables Hadoop to expand from a single-purpose, batch-oriented data platform based on MapReduce into a truly multi-purpose platform that supports a wide range of data processing approaches. The general availability of YARN, which in 2.0 essentially becomes the Hadoop Operating System, promises to broaden the ways Hadoop is used to process data.

In fact, one of the most common use cases we see emerging from our customers is the antithesis of batch: stream processing in Hadoop. Early adopters are using stream processing to analyze some of the most common new types of data, such as sensor and machine data, in real time.

For some users, this means monitoring a continuous stream of server log data and taking action immediately when a component fails. For others, it means monitoring a stream of market data for signals and acting on them in real time, or powering real-time analytic dashboards. Today, this work runs on dedicated Hadoop clusters.
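To make "taking action immediately" concrete, here is a minimal sketch in plain Python (not Storm's API) of event-at-a-time processing: each log line is inspected as it arrives, and an alert callback fires on a failure line instead of waiting for a batch job to finish. The log line format, the FATAL level marker, and the `watch_logs` function are hypothetical, chosen only for illustration.

```python
from typing import Callable, Iterable


def watch_logs(lines: Iterable[str], alert: Callable[[str, str], None]) -> int:
    """Inspect log lines one at a time and alert on component failures.

    Assumes a hypothetical line format: '<timestamp> <component> <LEVEL> <message>'.
    Returns the number of lines seen (a real stream might never end).
    """
    seen = 0
    for line in lines:
        seen += 1
        parts = line.split(maxsplit=3)
        if len(parts) < 3:
            continue  # skip malformed lines instead of halting the stream
        timestamp, component, level = parts[0], parts[1], parts[2]
        if level == "FATAL":  # hypothetical marker for a component failure
            alert(component, timestamp)  # act on this event right away
    return seen


if __name__ == "__main__":
    sample = [
        "2013-10-01T12:00:00 disk-3 INFO write ok",
        "2013-10-01T12:00:01 disk-3 FATAL controller offline",
        "garbled",
    ]
    watch_logs(sample, lambda comp, ts: print(f"ALERT: {comp} failed at {ts}"))
```

The key design point is that the reaction happens inside the loop, per event, which is the property that distinguishes stream processing from a batch job run over the same logs later.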

A recent customer example of the need for streaming comes from UC Irvine Medical Center. The organization recently deployed a new technology called SensiumVitals® to monitor and transmit patient vital signs every minute. The minute-by-minute snapshots of vital signs (4,320 per patient, per day) are the building blocks for algorithms that ultimately lead to a dramatically reduced average time-to-insight.

“This sensor data enables real-time predictive analytics that can allow caregivers like us to respond before a patient’s vital signs ever cross a dangerous threshold.”
Charles Boicey, UCI’s Informatics Solutions Architect

Enter Storm

A few weeks ago, the Storm project, originally conceived and built by the team at BackType/Twitter to analyze the tweet stream in real time, was accepted into the Apache Incubator. Over the past year or so, it has attracted growing interest as many early adopters have embraced it as their preferred option for streaming analytics in Hadoop. Yahoo!, an unabashed Hadoop trailblazer, picked up Storm late last year and began building out Storm on YARN. At Hadoop Summit this summer, Yahoo! presented a Storm-on-YARN use case in which they achieved five-second analytics windows on streaming data. Broader usage of Storm is well documented on GitHub.

“Many applications use Storm for low-latency processing and Map/Reduce for batch processing while sharing data between Storm and Map/Reduce. By placing Storm physically closer to the data source and/or other components in the same pipeline we can reduce network transfers and in turn the total cost of acquiring the data.”
Andy Feng, Distinguished Architect, Yahoo! (YDN: Storm-YARN Released as Open Source)
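One common way to compute results over short analytics windows, such as the five-second windows in the Yahoo! use case, is a tumbling window: the stream is cut into fixed, non-overlapping intervals and each interval is aggregated on its own. The single-process Python sketch below assumes integer-second timestamps arriving in order; the `tumbling_counts` name and the (timestamp, key) event shape are hypothetical, and this is neither Yahoo!'s implementation nor a Storm API.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def tumbling_counts(
    events: Iterable[Tuple[int, str]], width: int = 5
) -> Iterator[Tuple[int, Counter]]:
    """Count keys per fixed, non-overlapping window of `width` seconds.

    Assumes integer-second timestamps that arrive in non-decreasing order.
    Yields (window_start, counts) when a window closes, then the final window.
    Windows that receive no events are not emitted.
    """
    current_start = None
    counts: Counter = Counter()
    for ts, key in events:
        start = ts - ts % width  # align timestamp to its window boundary
        if current_start is None:
            current_start = start
        elif start != current_start:
            yield current_start, counts  # previous window is complete
            current_start, counts = start, Counter()
        counts[key] += 1
    if current_start is not None:
        yield current_start, counts  # flush the last (possibly partial) window


if __name__ == "__main__":
    stream = [(0, "a"), (1, "b"), (4, "a"), (6, "a"), (12, "b")]
    for start, c in tumbling_counts(stream):
        print(start, dict(c))
    # 0 {'a': 2, 'b': 1}
    # 5 {'a': 1}
    # 10 {'b': 1}
```

In a Storm topology, per-window logic of this kind would typically run inside a bolt, with Storm handling distribution of the stream across the cluster.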

An engineering commitment to deeply integrate Apache Storm with Hadoop

At Hortonworks, we know the fastest path to innovation is the open community, and we have built our entire development model around this principle: every Hortonworks developer is a contributor, and every contribution is made in the open. To that end, we are pleased to announce an engineering commitment to deeply integrate Storm with Hadoop, specifically as a supported component of the 100% Open Source Hortonworks Data Platform.

Availability of Storm with the Hortonworks Data Platform

Hortonworks will make a preview of Storm integration available in Q4 of this year and will include Apache Storm in the Hortonworks Data Platform in H1 2014.

We’re bullish about the possibilities that stream processing brings, and excited to be bringing Storm to HDP. Please drop us a line and let us know how you intend to use Storm!

