Getting a handle on Hadoop

You can think of Hadoop as an ever-inflating pink elephant. It’s either got its own space in which to grow, or it’ll just end up sucking all the air out of the room. It’s always easier to talk about the elephant in the zoo than the elephant in the room, and Hadoop is definitely a zoo full of complex moving parts that can cause just as much damage as an enraged bull elephant, provided we drag this metaphor into the realm of data.

All that data is why you have a Hadoop cluster, after all. Even if you haven’t integrated the system into your day-to-day activities, Hadoop is nothing if not a data lake. It’s a cheap place to put the data. And that, said Mike Tuchen, CEO of Talend, is the single most important thing to remember about Hadoop: its value.

Get someone else to pay for your cluster
This should be quite easy to do, said Tuchen. In fact, he advocated that you should run, not walk, to your CIO/CEO’s office with all of your Hadoop ideas. As an enterprise executive, there is one thing that will always make you look good and get you promoted: cost savings.

Because Hadoop can be an order of magnitude cheaper than systems from traditional ETL and data-warehousing vendors, bringing a cluster online and replacing existing systems can turn you into a rock star, said Tuchen.

“Why care about Hadoop? It’s dramatically cheaper,” he said. “You can take a subset of your data warehouse work and offload it for a dramatically cheaper price. A lot of customers are phrasing it as data process offload and data warehousing. And when you look at it with that lens… if you add up hardware plus software from EMC, NetApp, IBM and you compare it to Hadoop, you’re talking about something that was US$30,000 or $40,000 a terabyte, to $1,000 a terabyte.”
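
To put Tuchen's per-terabyte figures in perspective, here is a rough back-of-the-envelope sketch in Python. The prices are the ones he quoted; the 100TB workload size and the function name are assumptions made up for illustration, and the calculation ignores staffing, migration and operating costs.

```python
# Per-terabyte figures quoted by Tuchen (USD).
TRADITIONAL_PER_TB = (30_000, 40_000)  # low and high ends of the quoted range
HADOOP_PER_TB = 1_000

# Hypothetical amount of warehouse work offloaded; not a figure from the article.
OFFLOADED_TB = 100


def offload_savings(terabytes, traditional_per_tb, hadoop_per_tb=HADOOP_PER_TB):
    """Return (savings in USD, cost ratio) for moving `terabytes` to Hadoop."""
    savings = terabytes * (traditional_per_tb - hadoop_per_tb)
    ratio = traditional_per_tb / hadoop_per_tb
    return savings, ratio


for per_tb in TRADITIONAL_PER_TB:
    savings, ratio = offload_savings(OFFLOADED_TB, per_tb)
    print(f"${per_tb:,}/TB -> {ratio:.0f}x cheaper, "
          f"${savings:,} saved on {OFFLOADED_TB} TB")
```

On those numbers the gap works out to 30 to 40 times, which is where the "order of magnitude" claim comes from.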

Saving that kind of money for your company could just get you that VP position you wanted. But don’t expect this to be an overnight change. Hadoop is still a difficult system to own and operate, and it’s particularly difficult to hire for. That’s why Tip No. 2 is so important.

Train, don’t hire
If you can hire Hadoop developers and administrators, get out there and do it. If you think you know a team you can bring in-house, or if you happen to have an internal expert, put them into your Hadoop project exclusively.

Why? Because it is quite difficult to find Hadoop people. Popular job site Indeed.com shows that Hadoop has grown from a non-existent job market in 2009 to appear in 0.2% of all job postings on the site. Use of the term in postings has grown 225,000% since 2009. By contrast, the term “Java” is included in around 2% of all jobs posted on the site.