Oracle Big Data Appliance Featuring Hadoop Released

Oracle this week officially released the Oracle Big Data Appliance, its new "engineered system" that tightly bundles servers and software into a unified system.

The system combines full rack configurations of Oracle Sun servers with the Cloudera distribution of the Apache Hadoop software framework, the Cloudera Manager admin and management console, and an open source distribution of the R programming language (for statistical computing and graphics).

Oracle's Executive Vice President of Product Development Thomas Kurian previewed the Big Data Appliance in October at his company's annual OpenWorld conference. Oracle's plan to include a NoSQL database generated a lot of buzz. NoSQL databases, the non-relational, distributed, schema-free, horizontally scalable and often open-source data stores that emerged around 2009, have been getting attention as an effective kind of database for the Web, the cloud, and mobile computing. There are quite a few of them out there: Google, Amazon, Facebook and LinkedIn all have NoSQL databases.

But the Cloudera collaboration to create a system that makes Apache Hadoop work with Oracle's product stack is really the centerpiece of this announcement. The two companies are working together to provide support for the Big Data Appliance, Cloudera's co-founder and CEO Mike Olson said. The combination is "a natural and highly complementary fit," he said in a statement.

Palo Alto, Calif.-based Cloudera is a provider of Hadoop system management tools and support services. Its Hadoop distro, dubbed the Cloudera Distribution Including Apache Hadoop, or CDH, is a data management platform that combines a number of components, including support for the Hive and Pig languages, the Apache ZooKeeper distributed coordination service, the Flume service for collecting and aggregating log and event data, Sqoop for RDBMS integration, the Mahout library of machine learning algorithms and the Oozie server-based workflow engine, among others. CDH is available as a free download.

The Hadoop framework is an increasingly popular, Java-based, open-source platform for data-intensive distributed computing. In a nutshell, it's a system that can analyze a large amount of data in a small amount of time by spreading the work across many machines. At its core, it combines an implementation of MapReduce, the programming model introduced by Google, with the Hadoop Distributed File System (HDFS). MapReduce is a programming model for processing and generating large data sets: a map step turns input records into key-value pairs, and a reduce step combines all the values that share a key. It supports parallel computations over large data sets on clusters of unreliable machines. HDFS is designed to scale to petabytes of storage and to run on top of the file systems of the underlying OS.
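To make the MapReduce model more concrete, here is a minimal single-machine sketch in Python of the classic word-count job. It is not Hadoop code and does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration. A real Hadoop job would run the map and reduce steps in parallel across the cluster and read its input from HDFS.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on an in-memory list of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data appliance", "Big Data Connectors"]))
```

Because each line is mapped independently and each key is reduced independently, both steps can be handed out to different machines, which is what lets the model scale.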

Oracle is offering its Big Data Appliance in full rack configurations of 18 Sun servers. Each rack will provide 864 gigabytes of main memory, 216 CPU cores, and 10 gigabit-per-second Ethernet data center connectivity, among other features. The system scales by linking multiple racks over an InfiniBand network.
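The rack figures above imply a per-server configuration, and they scale linearly as racks are added. The short Python sketch below works through that arithmetic. It assumes memory and cores are spread evenly across the 18 servers, which the announcement does not state explicitly, and the helper names are hypothetical.

```python
# Figures quoted for one full rack of the Big Data Appliance.
SERVERS_PER_RACK = 18
RACK_MEMORY_GB = 864
RACK_CPU_CORES = 216


def per_server(rack_total, servers=SERVERS_PER_RACK):
    """Share of a rack-level resource per server, assuming an even split."""
    return rack_total / servers


def cluster_totals(racks):
    """Aggregate resources for a given number of linked full racks."""
    return {
        "servers": racks * SERVERS_PER_RACK,
        "memory_gb": racks * RACK_MEMORY_GB,
        "cpu_cores": racks * RACK_CPU_CORES,
    }


if __name__ == "__main__":
    print(per_server(RACK_MEMORY_GB), "GB of memory per server")
    print(per_server(RACK_CPU_CORES), "CPU cores per server")
    print(cluster_totals(2))
```

Under the even-split assumption, each server works out to 48 gigabytes of memory and 12 cores.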

In a related announcement, Oracle released its Big Data Connectors, software designed to allow users to integrate data stored in Hadoop and Oracle NoSQL Database with Oracle Database 11g. The software bundle combines the Oracle Loader for Hadoop, the Oracle Data Integrator Application Adapter for Hadoop, the Oracle Direct Connector for Hadoop Distributed File System (HDFS) and the Oracle R Connector for Hadoop.