Roll your own Big Data Appliance

One of the major drawbacks of Oracle Big Data Appliance is that I don’t have one yet. It’s a major drawback, because I’m itching to play with shiny new software on fairly impressive hardware.

The chances of getting my hands on 18 servers, each with 12 cores, 48 GB of RAM and 84 TB of storage, all connected by InfiniBand, are not that great. But I can play with the software, and so can you.

Unlike with Oracle’s Exadata, almost every software component that ships on the Big Data Appliance is also available for download. Some components are completely open source and can be used freely (Hadoop, Oracle NoSQL, open source R), some are available for download but require a license under Oracle’s usual terms (Hadoop Connectors, Enterprise R, Oracle NoSQL), and some are not available in full but have a limited free version (Cloudera Manager).

So, let’s roll our own Big Data Appliance!

Grab a bunch of (virtual) servers to use for your cluster. EC2 instances on Amazon’s cloud are not a bad option.

Oracle connectors are not part of the Big Data Appliance spec, but a separate option. In my opinion, the connectors are actually one of the more exciting things Oracle is doing with Hadoop, so you should at least give Oracle’s Hadoop Loader a go. You can download them here: http://www.oracle.com/technetwork/bdc/big-data-connectors/downloads/index.html But remember that the OTN version is for fun and games only. If you want to use it for serious work, the connector package is licensed at $2,000 per CPU.

Enterprise R is more awesome because instead of reading tons of raw data from Oracle into R, potentially causing network contention on the way, it will translate your R functions to SQL and run them in the database. Kind of like storage offloading, only between the statistical analysis layer and the DB. I haven’t played with this yet, but it sounds interesting.
Again, you can play with what you download from OTN, but if you want to use it for real, it is part of the Oracle Advanced Analytics option for Oracle Database 11g R2. The price for this option is $23,000 per CPU.
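
To make the "run it in the database" idea a bit more concrete, here is a minimal sketch of the difference between pulling raw rows to the client and pushing the computation down as SQL. This is not Enterprise R and not Oracle: it uses Python with the built-in sqlite3 module as a stand-in, and the table name, column name and helper functions are all made up for illustration.

```python
import sqlite3


def build_demo_db(n_rows=1000):
    """Create an in-memory SQLite database with a hypothetical 'measurements' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE measurements (value REAL)")
    conn.executemany(
        "INSERT INTO measurements (value) VALUES (?)",
        ((float(i),) for i in range(n_rows)),
    )
    conn.commit()
    return conn


def mean_client_side(conn):
    """Pull every raw row to the client, then compute the mean locally."""
    rows = conn.execute("SELECT value FROM measurements").fetchall()
    values = [row[0] for row in rows]
    return sum(values) / len(values), len(rows)


def mean_pushed_down(conn):
    """Let the database compute the mean; only one row comes back."""
    rows = conn.execute("SELECT AVG(value) FROM measurements").fetchall()
    return rows[0][0], len(rows)


if __name__ == "__main__":
    conn = build_demo_db()
    local_mean, rows_pulled = mean_client_side(conn)
    db_mean, rows_returned = mean_pushed_down(conn)
    print("client side:", local_mean, "rows transferred:", rows_pulled)
    print("pushed down:", db_mean, "rows transferred:", rows_returned)
```

Both functions return the same mean, but the second one moves a single row instead of the whole table. With a real database on the other side of a network, that difference is where the savings would come from.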

Now that you have your own Big Data non-appliance, you should show it off on OTN: