Wednesday, February 24, 2016

I've been working more with VMware Big Data Extensions lately, as a couple of customers and I look at providing Hadoop as a Service (HaaS) by leveraging the Serengeti API. So what is Big Data Extensions (BDE), what is the Serengeti API, and why would I use them?

What is it?

BDE is an orchestration layer for deploying and managing Hadoop clusters. It's deployed as an OVA and registered as a plugin in the vCenter web interface. What is unique about BDE is that it lets VMware administrators manage a Hadoop cluster as a single instance while it handles all of the under-the-hood orchestration. It supports both deploying and scaling the cluster. BDE is available to all vSphere Enterprise Plus customers and is supported by VMware. You can get it here:

Why would I use it?

The BDE plugin is preconfigured to manage Hadoop clusters as a single instance, which is great if you are a VMware admin with access to vCenter. But what happens when you need to offer HaaS to data scientists and you don't really want to give them access to vCenter? That's where the Serengeti API comes in: we can use it to call out to BDE from another platform.

If you already leverage vRealize Automation, you are in luck. VMware has pre-built a plugin pack for vRealize Automation and Orchestration to offer HaaS. You can get it from the Solution Exchange here. But what happens if you use another portal? That's where the Serengeti API comes into play.

Dig into the API after the break

Through your portal you need to offer a service that authenticates with the Serengeti API, makes a call to create a new cluster, and passes the cluster specification as JSON. This lets you deploy a virtual Hadoop cluster through any portal that supports making calls to a RESTful API. Here are some examples of leveraging the API from curl to get you started:
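For a portal that scripts in Python rather than shelling out to curl, the same two-step flow (log in, then post the cluster spec) might look roughly like the sketch below. Treat it as a starting point, not a reference: the host name, port 8443, the j_spring_security_check login path, the j_username/j_password form fields, and the /api/clusters path are all assumptions on my part and should be checked against the API documentation for your BDE version.

```python
import json
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

# ASSUMPTIONS (not confirmed by this post): host, port, and both URL paths.
BASE_URL = "https://bde.example.com:8443/serengeti"


def build_login_request(username, password, base_url=BASE_URL):
    """Build a form-encoded login POST; the server is expected to reply with a session cookie."""
    body = urllib.parse.urlencode(
        {"j_username": username, "j_password": password}  # assumed field names
    ).encode("utf-8")
    return urllib.request.Request(
        base_url + "/j_spring_security_check", data=body, method="POST"
    )


def build_create_cluster_request(cluster_spec, base_url=BASE_URL):
    """Build the POST that submits the cluster spec as a JSON body."""
    body = json.dumps(cluster_spec).encode("utf-8")
    return urllib.request.Request(
        base_url + "/api/clusters",  # assumed path
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_cluster(username, password, cluster_spec, base_url=BASE_URL):
    """Log in, then create the cluster, reusing one cookie-aware opener for both calls."""
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar())
    )
    with opener.open(build_login_request(username, password, base_url)) as resp:
        resp.read()  # the session cookie is now stored in the cookie jar
    with opener.open(build_create_cluster_request(cluster_spec, base_url)) as resp:
        return resp.status
```

The request builders are kept separate from the network call so the portal can log or validate exactly what it is about to send. If your BDE server uses a self-signed certificate, the opener would also need an HTTPS handler with a suitable SSL context.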

Example JSON for an Apache Bigtop cluster (also on GitHub here). To use it with a cloud portal, the specific properties of the cluster (node counts, sizes, name) would be variables passed through the portal interface.
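To illustrate the "variables passed through the portal" idea, here is a minimal Python sketch that turns a few portal inputs into a cluster spec. The function name build_cluster_spec is hypothetical, and the field names (distro, nodeGroups, instanceNum, cpuNum, memCapacityMB) and role names are my assumptions modeled on Serengeti-style cluster specs, so compare them with a spec that works in your environment before relying on them.

```python
import json


def build_cluster_spec(name, worker_count, worker_cpu=2, worker_mem_mb=4096):
    """Return a cluster spec dict built from values a portal user would supply.

    Field and role names are ASSUMPTIONS, not taken from this post.
    """
    if worker_count < 1:
        raise ValueError("worker_count must be at least 1")
    return {
        "name": name,
        "distro": "bigtop",
        "nodeGroups": [
            {
                # Fixed-size master group; only the workers are user-tunable here.
                "name": "master",
                "roles": ["hadoop_namenode", "hadoop_resourcemanager"],
                "instanceNum": 1,
                "cpuNum": 2,
                "memCapacityMB": 8192,
            },
            {
                "name": "worker",
                "roles": ["hadoop_datanode", "hadoop_nodemanager"],
                "instanceNum": worker_count,
                "cpuNum": worker_cpu,
                "memCapacityMB": worker_mem_mb,
            },
        ],
    }


if __name__ == "__main__":
    # Print the JSON body a portal would send for a four-worker cluster.
    print(json.dumps(build_cluster_spec("analytics01", worker_count=4), indent=2))
```

Validating the inputs before building the spec keeps obviously bad requests (such as zero workers) from ever reaching BDE.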

About Me

vSpecialist at EMC. VCAP-DCA / VCAP-DCD. I work for EMC but these opinions are my own. My primary focus is virtualization and data center networking. I have a loving wife, two small children, and a home in the burbs.