What's holding back Hadoop?

Hadoop -- the open-source, distributed programming framework that relies on parallel processing to store and analyze both structured and unstructured data -- has been the talk of big data for several years now. And while a recent survey of IT, business intelligence and data warehousing leaders found that 60 percent will have Hadoop in production by 2016, deployment remains a daunting task.

TDWI -- which, like GCN, is owned by 1105 Media -- polled data management professionals in both the public and private sectors, who reported that a shortage of staff expertise and the lack of a clear business case topped their list of barriers to implementation:

Barriers to implementation (percentage of respondents who checked each category)

Inadequate skills or difficulty of finding skilled staff: 42%
Lack of compelling business case: 31%
Lack of business sponsorship: 29%
Lack of data governance: 29%
Security for Hadoop data: 29%
Lack of metadata management: 28%
Excessive hand coding required of Hadoop: 27%
Cost of staffing Hadoop admin/development: 25%
Cost of implementing a new technology: 22%
Difficulty of architecting big data analytic system: 22%
Immature support for ANSI-standard SQL: 19%
Interoperability with existing systems or tools: 19%
Software tools are few and immature: 19%
Enterprise-class manageability: 17%
Not enough information on how to get started: 16%
Slow pace of hand-coded development: 16%
Cannot make big data usable for end users: 13%
Handling data in real time: 13%
Existing user-defined DW architecture: 12%
Poor quality of Hadoop data: 11%
Software tools need higher-level language support: 10%
Hadoop's high operational expenses: 9%
Enterprise-class availability: 9%
Other: 2%

Respondents did, however, see a wide range of uses that justify the deployment effort, including:

HDFS applications (percentage of respondents who checked each category)

Complementary extension of a data warehouse: 46%
Data exploration and discovery: 46%
Data staging for data warehousing and data integration: 39%
Data lake: 36%
Queryable archive for non-traditional data: 36%
Computational platform and sandbox for analytics: 33%
Enterprise data hub (for both new and traditional data): 28%
Business intelligence (reporting, dashboards): 27%
Queryable archive for traditional enterprise data: 19%
Operational data store (ODS): 17%
Repository for content, records management: 17%
Operational application support (apps on Hadoop data): 11%
Don't know: 3%
Other: 1%

And just 6 percent said Hadoop deployments were not in their organization's plans at all:

[Chart: When do you expect to have HDFS in production?]

The full report, which also includes best practices and implementation trends, is available from TDWI.

About the Author

Troy K. Schneider is editor-in-chief of FCW and GCN.

Prior to joining 1105 Media in 2012, Schneider was the New America Foundation’s Director of Media & Technology, and before that was Managing Director for Electronic Publishing at the Atlantic Media Company. The founding editor of NationalJournal.com, Schneider also helped launch the political site PoliticsNow.com in the mid-1990s, and worked on the earliest online efforts of the Los Angeles Times and Newsday. He began his career in print journalism, and has written for a wide range of publications, including The New York Times, WashingtonPost.com, Slate, Politico, National Journal, Governing and many others.

Schneider is a graduate of Indiana University, where his emphases were journalism, business and religious studies.