DBTA Hadoop Webcast Now Available on Demand

Database Trends and Applications recently hosted an educational webcast explaining how organizations can use Apache Hadoop to extract business intelligence and business value from large and complex data.

In a recent Unisphere Research study conducted for the Independent Oracle Users Group, nearly two-thirds of respondents reported more than 5 terabytes of data online and 20% reported more than 100 terabytes of data, said DBTA's Wilson, pointing out that as recently as earlier in the past decade, a single-terabyte environment was considered unusual enough to be reported as news. With data stores now growing at breakneck speed, users have begun to search for new ways of integrating information to provide a single view of the business and to deliver actionable information to business decision makers, he noted.

While the ability to deploy and manage Hadoop was in the past restricted to teams of software engineers, Cloudera and Pentaho have teamed up to provide solutions that enable the data-intensive enterprise to leverage Hadoop's distributed storage and processing framework to perform sophisticated analysis and transformations of structured and unstructured data. Using the joint solution, organizations can combine a wide range of data sources and provide end users with powerful BI reports, dashboards, and ad hoc analytics.

A key attribute of Hadoop, said Cloudera's Trajman, is that it provides "a very flexible system" that can store any kind of data without necessarily defining a schema ahead of time, and can then layer a schema on top of that data on a per-query or per-job basis, so that users can look at their data in different ways. It is also affordable and has become broadly adopted, he added.
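The schema-per-query idea Trajman describes can be illustrated with a minimal Python sketch. It does not use Hadoop or any Cloudera API; the sample records, the `read_with_schema` helper, and the two schemas are hypothetical names chosen only for illustration. The raw data is stored once, untyped, and each "query" supplies its own schema at read time.

```python
import json

# Raw records stored as-is, with no schema declared up front
# (hypothetical sample data, not from the webcast).
RAW_LINES = [
    '{"user": "ann", "action": "view", "ms": 120}',
    '{"user": "bob", "action": "buy", "ms": 340, "amount": 19.99}',
    '{"user": "ann", "action": "buy", "ms": 95, "amount": 5.00}',
]


def read_with_schema(lines, schema):
    """Project each raw record onto the fields and types one query asks for.

    `schema` maps a field name to a casting function; fields missing from
    a record come back as None instead of causing a failure.
    """
    for line in lines:
        record = json.loads(line)
        yield {
            name: cast(record[name]) if name in record else None
            for name, cast in schema.items()
        }


# Two different schemas layered over the same stored data.
latency_schema = {"user": str, "ms": int}
revenue_schema = {"user": str, "amount": float}

latency_rows = list(read_with_schema(RAW_LINES, latency_schema))
revenue_rows = list(read_with_schema(RAW_LINES, revenue_schema))

# Only records that actually carry an amount contribute to revenue.
total_revenue = sum(r["amount"] for r in revenue_rows if r["amount"] is not None)
```

The same three stored lines serve both a latency view and a revenue view; neither view had to be anticipated when the data was written.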

Most technologies have some kind of "up to," Trajman noted. They are good up to a certain number of servers, a certain amount of data, a certain amount of RAM. But Hadoop, he emphasized, is one of the unique technologies that really has no "up to." "We have yet to find whether there are any upper boundaries of capabilities for Hadoop," he said, adding, "Everything we have seen points to Hadoop really being the future of this data-intensive processing that we have today."

Hadoop is not trivial to use, said Trajman: the default interface is the command line, and its rich attributes come with some complexity of administration and understanding. For people who are looking for easier use and management of Hadoop, Cloudera provides a suite of management applications and production-level support, which includes authorization management, monitoring, resource management, system lifecycle management, and integration with various enterprise IT applications and databases such as Oracle, Netezza, Teradata, Greenplum, and Aster Data.

According to Nicholson, Pentaho's support for Hadoop started with a number of the business intelligence vendor's existing enterprise edition customers asking for help in leveraging the technology.

"The statistics on how data is growing is really staggering. I have been in the data integration and business intelligence market for over 20 years. We have never seen the growth of data that we are dealing with today," Nicholson said, echoing Wilson's comments.

While Hadoop is infinitely scalable, "Hadoop was never designed to do business intelligence per se," Nicholson noted. Typically, he said, the vast majority of customers that Pentaho talks to need a hybrid model of data.

These customers want to be able to crunch data in Hadoop and then apply that data set to data that they already have in some sort of structured or semi-structured environment, typically within a data mart or existing data warehouse. The "real magic" comes when you put those two analytic data sets together and have the ability to apply modern data integration and BI tools against them to really get some answers that frankly were simply not "get-able" before Hadoop technology came to the forefront, observed Nicholson.
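As a rough illustration of the hybrid model Nicholson describes, the Python sketch below combines the aggregated output of a batch job with structured records of the kind a data mart might hold. It is only an in-memory stand-in, not Pentaho's or Cloudera's tooling; the sample data and the `combine` function are hypothetical.

```python
from collections import defaultdict

# Hypothetical output of a Hadoop job: (customer_id, page_views) pairs
# crunched from raw logs.
hadoop_output = [("c1", 40), ("c2", 5), ("c1", 10), ("c3", 7)]

# Hypothetical structured data already held in a data mart.
data_mart = {
    "c1": {"segment": "enterprise"},
    "c2": {"segment": "smb"},
}


def combine(hadoop_rows, mart):
    """Total the job output per customer, then join it to data mart attributes."""
    totals = defaultdict(int)
    for customer_id, views in hadoop_rows:
        totals[customer_id] += views

    combined = []
    for customer_id, views in sorted(totals.items()):
        attrs = mart.get(customer_id)
        if attrs is None:
            # Inner join: skip customers the data mart does not know about.
            continue
        combined.append(
            {
                "customer_id": customer_id,
                "segment": attrs["segment"],
                "page_views": views,
            }
        )
    return combined


report = combine(hadoop_output, data_mart)
```

Here `report` pairs each known customer's segment with behavior derived from raw logs, which is the kind of combined data set a BI tool would then query.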

Access a replay of the webcast to hear more about how Cloudera and Pentaho improve the manageability of Hadoop deployments and provide application functionality and services for Hadoop-powered analytics. Built on Cloudera's Distribution for Hadoop, the Cloudera and Pentaho solution delivers affordable, scalable storage and analysis together with powerful, flexible data integration and business intelligence capabilities.