Flexible exploration – View data through dimensions such as time, product and geography, and across measures like revenue and quantity.
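The idea of slicing measures by dimensions can be sketched in a few lines of plain Python. This is an illustration only, not Pentaho's API; the record fields and the `rollup` helper are hypothetical:

```python
from collections import defaultdict

# Hypothetical sales records with dimensions (year, product, region)
# and measures (revenue, quantity) -- illustrative data, not Pentaho's model.
sales = [
    {"year": 2023, "product": "Widget", "region": "EMEA", "revenue": 120.0, "quantity": 3},
    {"year": 2023, "product": "Widget", "region": "APAC", "revenue": 80.0,  "quantity": 2},
    {"year": 2024, "product": "Gadget", "region": "EMEA", "revenue": 200.0, "quantity": 5},
]

def rollup(records, dimension, measure):
    """Aggregate one measure along one dimension -- a single OLAP slice."""
    totals = defaultdict(float)
    for row in records:
        totals[row[dimension]] += row[measure]
    return dict(totals)

print(rollup(sales, "region", "revenue"))  # {'EMEA': 320.0, 'APAC': 80.0}
print(rollup(sales, "year", "quantity"))   # {2023: 5.0, 2024: 5.0}
```

Swapping the `dimension` or `measure` argument is all it takes to view the same data from a different angle, which is what an interactive analysis tool does behind the scenes.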

Predictive analytics – Use advanced statistical algorithms such as classification, regression, clustering and association rules to spot trends or challenges ahead of time.
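To make the clustering idea concrete, here is a minimal k-means sketch in pure Python on one-dimensional data. It is an assumption-laden illustration of the technique, not Pentaho's implementation; real deployments would use a statistics library:

```python
# Minimal k-means sketch (1-D points, pure Python) -- illustrative only.
def kmeans(points, centroids, iterations=10):
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = {c: [] for c in centroids}
        for p in points:
            nearest = min(centroids, key=lambda c: abs(c - p))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [sum(v) / len(v) if v else c for c, v in clusters.items()]
    return sorted(centroids)

print(kmeans([1.0, 1.2, 0.8, 9.0, 9.5, 8.5], [0.0, 10.0]))  # [1.0, 9.0]
```

The two centroids converge on the two natural groups in the data, which is how clustering surfaces structure (for example, customer segments) without labeled training data.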

Pentaho Visual MapReduce: Scalable in-Hadoop Execution

Pentaho's Java-based data integration engine works with the Hadoop cache for automatic deployment as a MapReduce task across every data node in a Hadoop cluster, harnessing Hadoop's massively parallel processing power. Pentaho can natively connect to Hadoop in the following ways:
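The MapReduce pattern that gets deployed to the data nodes can be sketched as the classic word count. This simplified single-process sketch only illustrates the map and reduce phases; in Hadoop, many mapper and reducer instances run in parallel across the cluster:

```python
from collections import Counter
from itertools import chain

def mapper(line):
    # Map phase: emit a (key, value) pair for every word in a record.
    return [(word, 1) for word in line.split()]

def reducer(pairs):
    # Reduce phase: aggregate all values that share a key.
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

lines = ["big data", "big cluster"]
pairs = chain.from_iterable(mapper(l) for l in lines)
print(reducer(pairs))  # {'big': 2, 'data': 1, 'cluster': 1}
```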

Multi-Threaded Engine for Faster Execution

The Pentaho Data Integration engine is multi-threaded, with each step in a job executing on one or more threads. This fully leverages the multi-core processors on each data node of the cluster, eliminating the need for specialized multi-threaded programming techniques. In addition, the Pentaho Data Integration engine executes as a single MapReduce task, rather than the multiple tasks typical of machine-generated or hand-coded programs and Pig scripts. As a result, Pentaho MapReduce jobs typically execute many times faster than other methods. The table below compares performing common Hadoop tasks with traditional MapReduce programming skills and with Pentaho's visual interface:
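A step-per-thread pipeline of this kind can be sketched with threads and queues: each transformation step runs on its own thread and streams rows to the next step. The step functions and queue wiring here are illustrative assumptions, not PDI internals:

```python
import queue
import threading

SENTINEL = None  # marks end-of-stream between steps

def step(transform, inbox, outbox):
    # One pipeline step: read rows, transform them, pass them downstream.
    while (row := inbox.get()) is not SENTINEL:
        outbox.put(transform(row))
    outbox.put(SENTINEL)

q1, q2, q3 = queue.Queue(), queue.Queue(), queue.Queue()
threads = [
    threading.Thread(target=step, args=(lambda r: r * 2, q1, q2)),  # step 1: double
    threading.Thread(target=step, args=(lambda r: r + 1, q2, q3)),  # step 2: increment
]
for t in threads:
    t.start()

for row in [1, 2, 3]:
    q1.put(row)
q1.put(SENTINEL)

results = []
while (row := q3.get()) is not SENTINEL:
    results.append(row)
for t in threads:
    t.join()

print(results)  # [3, 5, 7]
```

Because both steps run concurrently, step 2 can process a row while step 1 is already transforming the next one, which is how multi-core hardware gets used without the job author writing any threading code.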

These components can be combined to visually assemble powerful job-flow logic spanning multiple jobs and data sources. Pentaho provides graphical drag-and-drop components for Hadoop ecosystem projects such as Sqoop and Oozie, drastically reducing the time needed to use these powerful bulk-data-load and workflow utilities:

Sqoop – A tool designed for efficiently transferring bulk data between Hadoop and structured data stores, such as relational databases.

Oozie – An open-source workflow/coordination service that manages data processing jobs for Hadoop.
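The kind of job-flow coordination Oozie provides can be sketched as running dependent jobs in order and branching on success or failure. The job names and functions below are hypothetical illustrations of the concept, not Oozie's actual API or workflow format:

```python
# Illustrative workflow runner: execute jobs in sequence, stop on failure.
# Job names and actions are hypothetical, not Oozie constructs.
def run_workflow(jobs):
    log = []
    for name, action in jobs:
        log.append(name)
        if not action():
            log.append("fail")
            break
    else:
        log.append("end")
    return log

jobs = [
    ("sqoop-import", lambda: True),         # bulk-load from an RDBMS
    ("mapreduce-transform", lambda: True),  # process the loaded data
    ("export-results", lambda: True),       # write results back out
]
print(run_workflow(jobs))
# ['sqoop-import', 'mapreduce-transform', 'export-results', 'end']
```

In Pentaho's designer, this sequencing is assembled by dragging and connecting components rather than written as code.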

Deep Support for Hadoop

Pentaho fully supports the leading Hadoop distributions and their native capabilities, such as MapR's high-performance NFS-mountable file system. Several Hadoop distributions are available, both as open-source projects and from commercial providers.