Docker Ships HDP Into the Cloud

SequenceIQ is a new Hortonworks Technology Partner and recently achieved HDP and YARN Ready certification for Cloudbreak, SequenceIQ's Hadoop as a Service API. In this guest blog, SequenceIQ Co-founder and CTO Janos Matyas (@sequenceiq) describes provisioning and autoscaling an HDP cluster with Cloudbreak.

In our daily work at SequenceIQ, we provision HDP clusters in many different environments. Whether on an arbitrary cloud provider or on bare metal, we were looking for a common solution to automate and speed up the process. Welcome Docker. This case study shows how easy it is to provision and autoscale an HDP cluster using Cloudbreak.

What is the origin of the name Cloudbreak?

Cloudbreak is a powerful left surf that breaks over a coral reef, a mile southwest of the island of Tavarua, Fiji. Cloudbreak the product is a cloud agnostic Hadoop as a Service API that abstracts provisioning and eases the management and monitoring of on-demand clusters.

The provisioning challenge

As we have discussed in a previous blog post, we use Apache Ambari quite a lot, have built toolsets around it (Ambari Shell, Ambari REST client) and have contributed these back to the Ambari community. While our contributions let us automate most of the HDP provisioning, the infrastructure part was still a missing piece. We needed a way to use the same process, toolset and APIs to provision HDP – literally anywhere. We were among the first Docker adopters and started to use the “containerized” version of the HDP sandbox to ease our development process – and from there it was only a step to a fully functional Docker-based HDP cluster on bare metal, initially merely for development purposes.

We commonly use different cloud providers. After we had “containerized” HDP for bare metal, we came up with the idea of Cloudbreak – the open source, cloud agnostic and autoscaling Hadoop as a service API. While Cloudbreak’s primary role is to launch on-demand Hadoop clusters in the cloud, the underlying technology actually does more. It can launch on-demand Hadoop clusters in any environment that supports Docker – in a dynamic way. There is no predefined configuration needed as all the setup, orchestration, networking and cluster membership are done dynamically.

Here are the components of the solution:

Docker containers – all the Hadoop services are installed and running inside Docker containers, and these containers are shipped between different cloud vendors, keeping Cloudbreak cloud agnostic.

Apache Ambari – to declaratively define a Hadoop cluster.

Serf – for cluster membership, failure detection, and orchestration that is decentralized, fault-tolerant and highly available for dynamic clusters.
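To make "declaratively define" more concrete, here is a minimal sketch of what an Ambari blueprint looks like, built as a Python dictionary and printed as JSON. The blueprint name, the stack version, the host group names and the choice of components are illustrative assumptions, not the blueprints that Cloudbreak ships.

```python
import json


def build_blueprint(name, stack_version, worker_count):
    """Return an Ambari-blueprint-style document describing a small cluster.

    The layout (one master group, one worker group) is an assumption made
    for illustration only.
    """
    return {
        "Blueprints": {
            "blueprint_name": name,
            "stack_name": "HDP",
            "stack_version": stack_version,  # placeholder version
        },
        "host_groups": [
            {
                "name": "master",
                "cardinality": "1",
                "components": [
                    {"name": "NAMENODE"},
                    {"name": "RESOURCEMANAGER"},
                    {"name": "ZOOKEEPER_SERVER"},
                ],
            },
            {
                "name": "worker",
                "cardinality": str(worker_count),
                "components": [
                    {"name": "DATANODE"},
                    {"name": "NODEMANAGER"},
                ],
            },
        ],
    }


blueprint = build_blueprint("example-blueprint", "2.1", 3)
print(json.dumps(blueprint, indent=2))
```

The point of the declarative form is that the document states what the cluster should look like (which components run in which host group) and leaves the installation steps to Ambari.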

Autoscaling and SLA policies

Once we had an open source Hadoop as a Service API that runs HDP in the cloud, we moved forward and wanted an open source, SLA-policy-based autoscaling API that works with Cloudbreak or a Hadoop YARN cluster. Welcome Periscope.

Where does the name Periscope come from?

Periscope is a powerful, fast, thick and top-to-bottom right-hander, eastward from Sumbawa’s famous west coast. Timing is critical, as it needs a number of elements to align before it shows its true colors.

Obviously we have a surfing-related naming theme here!

Periscope the product brings QoS and autoscaling to Hadoop YARN. Built on cloud resource management and YARN schedulers, it allows you to associate SLA policies with applications.

We followed the same route as we did with Ambari: we identified the key components and features that we considered a better fit for the Apache Hadoop YARN codebase and contributed them there. The API lets you configure metric-based alarms and create SLA scaling policies that dynamically adjust the size of your HDP cluster.
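As a rough illustration of how a metric-based alarm can drive a scaling policy, consider the sketch below. It is not Periscope's API: the class names, the fields and the `pending_containers` metric are hypothetical and were chosen only to show the idea of an alarm on a metric triggering a bounded change in cluster size.

```python
from dataclasses import dataclass


@dataclass
class Alarm:
    """Fires when a metric crosses a threshold (hypothetical model)."""
    metric: str
    threshold: float
    comparison: str  # "gt" fires above the threshold, "lt" fires below it

    def fires(self, value: float) -> bool:
        if self.comparison == "gt":
            return value > self.threshold
        return value < self.threshold


@dataclass
class ScalingPolicy:
    """Adds or removes nodes when its alarm fires, within fixed bounds."""
    alarm: Alarm
    adjustment: int  # positive to scale up, negative to scale down
    min_nodes: int
    max_nodes: int

    def target_size(self, current_nodes: int, metrics: dict) -> int:
        value = metrics.get(self.alarm.metric)
        # Leave the cluster alone if the metric is missing or the alarm is quiet.
        if value is None or not self.alarm.fires(value):
            return current_nodes
        proposed = current_nodes + self.adjustment
        # Clamp the result so the policy never leaves the allowed range.
        return max(self.min_nodes, min(self.max_nodes, proposed))


scale_up = ScalingPolicy(
    alarm=Alarm(metric="pending_containers", threshold=10, comparison="gt"),
    adjustment=2,
    min_nodes=3,
    max_nodes=10,
)

print(scale_up.target_size(4, {"pending_containers": 25}))
print(scale_up.target_size(4, {"pending_containers": 5}))
```

The clamp to `min_nodes` and `max_nodes` is a design choice in this sketch: it keeps a noisy metric from growing or shrinking the cluster without limit.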

Besides the policies, we provide a visual monitoring dashboard that collects over 400 metrics from different sources in the cluster (RM, timeline/history server, Metrics2 sinks). End users can drill down to the node or component level to identify problems and view logs, using the default queries or configuring custom ones.

DevOps toolsets, resources

When we start a project, we always approach it from a very strong DevOps perspective. It was the same for Cloudbreak and Periscope, and we have created toolsets to ease and automate HDP cluster provisioning in any environment.

Try it out

We have a hosted version of Cloudbreak where you can create an HDP cluster of arbitrary size, with support for the full stack, on your favorite cloud provider. Give it a try and let us know how it works for you. Provisioning an HDP cluster has never been easier or faster, and there are several ways to do it (UI, REST client, CLI shell, REST calls). Stay tuned, as we will be announcing cool things with the next release and will follow up with a post that goes into deeper technical details.
