Bringing a proof-of-concept project into production is only the beginning. Postproduction, Hadoop differs greatly from other information technologies. Deploy SAP or Salesforce, for example, and the transition typically means a shift into a lower-intensity "maintenance" mode, where less attention and fewer resources are required. With Hadoop, in contrast, delivery of the first production application is just the start of the journey. Trust me: Pressure will soon mount to develop new applications. And these new applications will require integration with new data sources. Your users will want to run more and more exploratory jobs.

In companies experiencing this kind of "success disaster" with Hadoop, keeping up with demand for expansion and new use cases often requires more effort than getting the initial application into production.

IT managers must attend to many areas to ensure the ongoing success of a Hadoop initiative, but here are five challenges you should address proactively:

1. Keeping your software up to date: Hadoop is a rapidly evolving framework. Unfortunately, updating Hadoop software is challenging, especially on heavily used clusters. As a result, many people get stuck on a 3-year-old version and, before you know it, it's a huge effort to even think about upgrading. Although challenging, it's worth instituting a program of regular, incremental updates to the Hadoop software stack. To facilitate these updates, establish a frequent maintenance window for the cluster. Yes, the concept of a maintenance window feels retrograde to many IT organizations, but it's preferable to falling behind the fast-moving Hadoop ecosystem.

2. Scaling your cluster: Going from a half-rack to a full one brings one set of challenges; expanding from one rack to two brings different trials; going from two racks to four ... you get the idea. Each time you grow your cluster, there are new issues. Fortunately, Hadoop scales relatively easily, and it comes with built-in tools for common tasks like rebalancing data across the cluster. Still, the logistics of expanding the physical infrastructure can be thorny because, as your cluster grows, new tuning settings are required, and problems that used to be rare (like failed disks) start to occur regularly. Critical Hadoop software services, such as your NameNode and ResourceManager, may need more resources or reconfiguration as well. Unfortunately, there's no silver bullet for addressing these problems. The best approach is to get ahead of the curve -- plan for expansion well before it becomes critical. One way to achieve this is to add a bit of capacity every quarter, or even every month, on a regular schedule.

3. Getting your security in order: In a successful Hadoop deployment, you'll find more and more users wanting access to the cluster and a corresponding demand for more and more data. You may soon outgrow the simple security and compliance mechanisms that were adequate in the early days and instead be pulled into a world of substantial complexity. Most Hadoop implementations start by using Hadoop's default security mechanisms, which provide no substantive user authentication. This may be OK initially, but over time you'll need to switch to the strong authentication provided by Kerberos. Most organizations wait too long to make this switch and instead tack on workaround measures that reduce productivity and will eventually need to be thrown away. Instead, make the switch as soon as you can: move early and "learn as you grow" with Kerberos.

4. Supporting your users: The devil is in the details, and Hadoop has a lot of details. While Hadoop brings unprecedented power to the fingertips of your employees, it's a rather rough system to use, as you might expect from a system with its roots in the Wild West of Silicon Valley hackers. When a job fails, it can be difficult to tell if the problem is with a user's application code or in the Hadoop cluster itself. Your developers and data scientists can waste valuable time trying to resolve arcane problems that have been solved already. Consider creating a user support system that encourages your community of developers, data scientists and Hadoop administrators to help one another get past the rough edges of Hadoop, and back it with a shared knowledge base so that solved problems stay solved.

5. Keeping tabs on technology: The ecosystem surrounding Hadoop involves more than 15 open source projects, and that ecosystem is evolving rapidly. There's a constant flow of innovation, changes and updates that may impact productivity and ROI. Before deploying any new component, even for a quick evaluation, investigate its track record. Has it stayed current with the latest Hadoop release? Are there sufficient developers committed to the project? You need to be sure that slow-moving components don't prevent you from keeping your core Hadoop software updated.
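
The regular, scheduled expansion suggested under challenge 2 can be roughed out with a few lines of arithmetic. The Python sketch below is only an illustration, not a sizing method: the function names, the 40 TB of usable disk per node, the 70 percent target utilization and the 10 percent quarterly growth rate are all assumptions made up for the example. The replication factor of 3 is the HDFS default. Substitute figures from your own cluster.

```python
import math


def nodes_needed(raw_data_tb, replication=3, usable_tb_per_node=40.0,
                 target_utilization=0.7):
    """Estimate how many data nodes are needed to hold raw_data_tb.

    All defaults except replication are hypothetical planning figures.
    """
    stored_tb = raw_data_tb * replication  # HDFS stores each block `replication` times
    effective_tb_per_node = usable_tb_per_node * target_utilization
    return math.ceil(stored_tb / effective_tb_per_node)


def quarterly_plan(current_data_tb, quarterly_growth, quarters, current_nodes):
    """Return one (quarter, data_tb, nodes_to_add, total_nodes) tuple per quarter."""
    plan = []
    data_tb = current_data_tb
    nodes = current_nodes
    for quarter in range(1, quarters + 1):
        data_tb *= 1 + quarterly_growth          # assumed compound growth
        required = nodes_needed(data_tb)
        to_add = max(0, required - nodes)        # never plan to shrink the cluster
        nodes += to_add
        plan.append((quarter, round(data_tb, 1), to_add, nodes))
    return plan


if __name__ == "__main__":
    # Hypothetical starting point: 100 TB of raw data on 11 nodes.
    for quarter, data_tb, to_add, total in quarterly_plan(100, 0.10, 4, 11):
        print(f"Q{quarter}: {data_tb} TB raw, add {to_add} node(s), total {total}")
```

Running a projection like this every quarter turns "plan for expansion well before it becomes critical" into a purchase list rather than an emergency. It deliberately ignores compute, memory and network limits, which may bind before disk does.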

Hadoop in its postproduction phase can be challenging. Its promiscuous nature gives it a powerful ability to tie disparate systems together and handle all kinds of data -- and that tends to make it a hub of activity for data scientists, software developers and system administrators. Paying attention to these five challenges will go a long way toward ensuring that you reap the benefits of that activity.

Tell us your tips and tricks for keeping Hadoop scaled, secure and up to date.