“As use of the Hadoop stack continues to grow, organizations are asking if it is a suitable solution for data integration. Today, the answer is no. Not only are many key data integration capabilities immature or missing from the stack, but many have not been addressed in current projects.”

He says many companies are turning Hadoop into a data integration platform.

“Gartner is correct in that, Hadoop, by itself, is NOT a data integration platform,” Goldman writes. “However, it can be made into a data integration platform. Lots of companies are investing in making Hadoop based integration easier.”

Informatica did this by porting its Virtual Data Machine onto Hadoop, he adds, giving companies the same integration development environment they use for ETL jobs, with Hadoop as the underlying engine.

Not surprisingly, Informatica is not the only vendor investing in adding full data integration platform capabilities to Hadoop.

“The market in general is moving in this direction so expect to see some exciting capabilities emerging over the next six months,” he states, adding that some companies already use a graphical development environment with Hadoop instead of hand-coding MapReduce jobs. Those companies are able to create code five times faster, he says.

If you’d like to read more about Big Data integration, check out this Big Data integration piece by Richard Daley, industry veteran and co-founder of Pentaho. Daley looks at all the tools in the Hadoop stack and discusses supporting integration for other NoSQL solutions, such as MongoDB, Cassandra and HBase.


@Loraine, the right approach to leveraging Hadoop for ETL isn't to port a runtime engine to Hadoop, since that would require a pretty complex deployment on every single node. Rather, it is to generate native Hadoop code (MapReduce, Pig, Hive, etc.) and let Hadoop do what it's designed for: execute the generated job on its parallel architecture.
