Big Data, Big Team: The Making of Control-M for Hadoop

In July 2012 I participated in a BMC Big Data Summit that took place at BMC headquarters in Houston, Texas. I met architects from many of the BMC product lines, and we discussed Hadoop and Big Data opportunities along with customer use cases, challenges and needs (I’ve described some of the Workload Automation use cases before in an article published in the Enterprise Systems Media magazine). This is the “behind the scenes” story of the team that made Control-M for Hadoop a reality.

As soon as we had the green light for the project, R&D started the technical research and together we identified the potential content for the first release. With the Big Data market and the Hadoop ecosystem being so dynamic, we wanted to ensure that what we developed actually addressed customer needs. We validated both the potential content and the use cases with customers in North America, EMEA and AP, and adjusted our plans according to their input. Support for HDFS file watching, for example, was added to the content based on such feedback.

We learned what types of challenges companies experienced when using Hadoop-specific schedulers (such as Oozie) and realized that we could deliver immediate value by offering the ability to manage Hadoop batch jobs with the same power and ease as “traditional” enterprise processing. Control-M for Hadoop allows application developers to focus on developing Hadoop programs rather than wasting time writing and debugging wrapper scripts that schedule those programs.
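To give a sense of what such a homegrown wrapper script tends to involve, here is a minimal sketch in Python. It is an illustration only, not taken from any customer script and not a description of how Control-M for Hadoop works. It assumes the standard Hadoop command-line tools (`hadoop jar` to submit a job and `hdfs dfs -test -e` to check whether a path exists) are on the PATH; the function names, retry counts and delays are made up for the example.

```python
import subprocess
import time
from typing import Callable, Sequence

# A runner takes a command line and returns its exit code.
# It is injectable so the logic can be exercised without a Hadoop cluster.
Runner = Callable[[Sequence[str]], int]


def run_command(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def wait_for_hdfs_file(path: str, run: Runner = run_command,
                       attempts: int = 10, delay_seconds: float = 30.0) -> bool:
    """Poll until `hdfs dfs -test -e` reports that the path exists."""
    for attempt in range(attempts):
        if run(["hdfs", "dfs", "-test", "-e", path]) == 0:
            return True
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False


def run_hadoop_job(jar: str, main_class: str, args: Sequence[str],
                   run: Runner = run_command, retries: int = 2) -> int:
    """Submit a job with `hadoop jar`, retrying on a non-zero exit code."""
    cmd = ["hadoop", "jar", jar, main_class, *args]
    exit_code = 1
    for _ in range(retries + 1):
        exit_code = run(cmd)
        if exit_code == 0:
            break
    return exit_code
```

Even this small sketch has to deal with polling, retries and exit codes by hand, and it still lacks alerting, SLA tracking and dependencies on non-Hadoop tasks. That is the kind of plumbing the paragraph above refers to.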

The main challenge that customers shared with us was integrating Hadoop jobs with data integration activities, analytics tasks, and file transfers: the types of integration for which Control-M was specifically designed. Customers also asked for proactive notification of missed SLAs and a self-service offering for application developers. Control-M's ability to easily integrate mainframe and distributed tasks was also a key factor for those customers who shifted data processing from DB2 to Hadoop in order to reduce mainframe processing costs.

We have always been passionate about learning new technologies, but the team’s excitement increased to a new level after the discussions we had with our customers and once we understood the value we would be providing them. We learned that we had customers who had been running Hadoop jobs with Control-M for years, using homegrown wrapper scripts, but who were looking for a tighter, more “native” integration that would reduce effort and risk. We helped them eliminate these scripts by replacing them with jobs defined from a simple and powerful graphical user interface.

We started the research on a couple of single-node Hadoop clusters, but this wasn’t enough for us. We knew that our customers’ Hadoop environments were more complex, and that Control-M for Hadoop must support these environments. We tested various Hadoop distributions in multi-node cluster configurations. Our virtualized infrastructure allowed us to provision Linux instances quickly, and the feedback from our customers helped us configure the Hadoop clusters to be as similar as possible to their environments.

Being involved with the first BMC Big Data initiative was the thing that got me excited the most. Other product lines are now following us by developing additional offerings around Big Data, but we got to be the leading team. We’ve been “playing” with Hadoop for a couple of years now and finally reached a point where the market demand for enterprise support justified the development costs. Now that the first release of Control-M for Hadoop is available and customers are adopting it, we have a larger community to get feedback from and we are already looking into additional use cases for the next release.

I love the idea of new and innovative technology that is mostly batch oriented. Over the years we have heard people say that IT is turning to a completely online-driven approach, but the truth is the exact opposite. It’s like saying that the mainframe is dying…

Analyst predictions on the Big Data market growth encourage us to invest in additional research in Big Data technologies. NoSQL databases and in-memory databases (such as SAP HANA) are now on the table as well, next to the social, mobile and cloud initiatives. We are also witnessing a trend of siloed Big Data exploratory projects turning into enterprise-wide Big Data initiatives. Our customers are looking for tools to support such a shift.

Communication is always a key factor in these types of projects. We all learned the new technology together, shared customer inputs and worked collectively to ensure we met our project deadlines and quality standards. In fact, we were able to complete the project ahead of time. The way all participating teams, including documentation and support, aligned with the dynamic nature of the project was all I could ask for. When I saw the Times Square ad and Control-M for Hadoop on the www.bmc.com front page I couldn’t have been more proud, and I felt privileged to be part of the team.

The feedback from the customers that evaluated pre-GA releases of Control-M for Hadoop helped us design and execute the testing use cases. We made sure that our testing coverage included the same platforms and configurations that those customers use, and they in return now have a much more stable and robust solution that meets their needs, IT standards and configurations. The learning curve for the new technology was relatively short because we had been involved with R&D and product management since the beginning of the project. We participated in the technical research, the discussions with the customers and the release specifications planning. This was truly a team effort. We ended up with a shorter release cycle and eventually a better product.

Working on Control-M for Hadoop let me do two of my favorite things as a solutions marketer – bring a new product to market AND do it in a new market area. I began researching the big data market opportunities well over a year ago. With all of the initial big data processing being batch (scheduled), it was a perfect fit for Control-M and just a matter of the right time.

Reaching out to customers, understanding their needs, and then working with product management, sales, and other stakeholders in the company – it was all great fun, and of course challenging as well.

The best part of the entire effort was working closely with customers to understand their business needs. Every customer I spoke with was using Hadoop and big data to learn more about their business and to make better-informed decisions. Each of them was passionate and excited about the opportunities they were finding to offer better and even personalized service, introduce new products, and improve their business operations. Their excitement was infectious.

Memorable moments? The first meeting with a customer, which was MetaScale, and understanding just what a game-changer big data really is for businesses. The first conversation with a sales rep, who called Hadoop “Hoopla” for the entire conversation, which made me realize how important training would be. And throughout, working with my teammate Joe Goldberg, who was relentless and remarkable the entire time.

Want to Learn More About Big Data and What It Can Do for You?

BMC recently published an authoritative guide on big data automation. It’s called Managing Big Data Workflows for Dummies. Download now and learn to manage big data workflows to increase the value of enterprise data.

Tom Geva is an IT Workload Automation expert. Tom served as a Control-M product manager for more than 10 years and was responsible for translating the workload automation market requirements gathered from customers and analysts into product and business strategies, roadmap and specifications. Tom is deeply involved in the development process of all the Control-M releases. As a Sr. Solution Marketing Manager he frequently speaks at conferences and workload automation events. Tom has 19 years of experience in the IT industry. Prior to joining BMC Software in 2001, Tom spent 4 years as a production control manager in the Israel Defense Forces and was responsible for workload automation and mainframe education services.