High Performance Analytics

Marry Big Data analytics to High Performance Computing, and you get this season's buzzword: High Performance Analytics.

It basically consists of parallelized code running on custom hardware, in-database analytics for speed, and cloud computing or high performance computing environments. On an operational level, it means software (as in analytics) partnering with software (as in databases, MapReduce, Hadoop) plus some hardware (mostly HP or IBM). It is considered a high-margin, highly profitable business with a small number of deals compared to, say, desktop licenses.
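In-database analytics means pushing the computation to where the data lives, instead of exporting every row to the analytics tool. Here is a minimal sketch of the idea. It uses Python's built-in sqlite3 as a stand-in for an analytic database such as Greenplum or Teradata, which is an assumption made purely for illustration. The table `obs` and its data are hypothetical. The least-squares slope and intercept are computed from `SUM` aggregates in a single SQL query, so only five numbers leave the database, however many rows it holds.

```python
import sqlite3

# In-memory SQLite stands in for an analytic database (illustration only).
conn = sqlite3.connect(":memory:")

# Hypothetical table: one row per observation, generated from y = 1 + 2x.
conn.execute("CREATE TABLE obs (x REAL, y REAL)")
conn.executemany(
    "INSERT INTO obs VALUES (?, ?)",
    [(float(i), 2.0 * i + 1.0) for i in range(1000)],
)

# The aggregation runs inside the database engine; only five numbers come back.
n, sx, sy, sxx, sxy = conn.execute(
    "SELECT COUNT(*), SUM(x), SUM(y), SUM(x * x), SUM(x * y) FROM obs"
).fetchone()

# Ordinary least squares for y = intercept + slope * x from the sufficient statistics.
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
intercept = (sy - slope * sx) / n
print(slope, intercept)
```

Because the aggregates are simple sums, a parallel database can compute them per node and add the partial results, which is what makes this pattern scale.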

According to HPCwire (a great newsletter for keeping up to date on HPC), SAS Institute has been busy on this front, partnering with EMC Greenplum and Teradata (which also acquired SAS partner Aster Data to gain a much-needed foothold in the MapReduce/SQL space), while Revolution Analytics has been trying out a partnership with IBM (via IBM's acquisition of Netezza).

SAS is considered the undisputed leader in advanced analytics — that according to IDC who, in 2009, pegged the company with a 34.7 percent market share in this category. A subset of business analytics, advanced analytics uses compute-intensive data mining and statistical software techniques to extract complex relationships from databases. For SAS, it’s a half a billion dollar business.

and

In early April, SAS demonstrated the power of high performance analytics at its Global Forum meeting. In the first case, two racks (16 nodes) of Greenplum’s Data Computing Appliance (DCA) were used to run a logistic regression of bank loan defaults across a database with a billion records, applying just a few variables. The regression was able to complete in less than 80 seconds (as compared to 20 hours for an unspecified serial implementation). Another demonstration, this time on a 24-node Teradata platform, used 1,800 variables applied to 50 million observations. In this case, the analysis finished in 42 seconds.
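The article does not say how the regression was split across the 16 Greenplum nodes. One common way to parallelize logistic regression is data-parallel batch gradient descent, and what follows is a sketch of that general technique, not SAS's or Greenplum's documented method. It works because the log-loss gradient is a sum over records. Each node computes the gradient over its own rows (the "map" step), and the partial gradients are added together (the "reduce" step). The function names, learning rate, iteration count and synthetic data are all hypothetical, and the 16 "nodes" are simulated one after another in plain Python.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def partition_gradient(weights, rows):
    """'Map' step: log-loss gradient summed over one partition (one node's rows)."""
    grad = [0.0] * len(weights)
    for x, y in rows:
        # x includes a leading 1.0 so weights[0] acts as the intercept.
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        for j, xj in enumerate(x):
            grad[j] += (p - y) * xj
    return grad


def fit_data_parallel(partitions, n_features, lr=0.5, iters=200):
    """Batch gradient descent: map over partitions, reduce by summing gradients."""
    n = sum(len(rows) for rows in partitions)
    weights = [0.0] * n_features
    for _ in range(iters):
        # On a real cluster each partition would be processed on its own node;
        # here the partitions are simply processed one after another.
        partials = [partition_gradient(weights, rows) for rows in partitions]
        # 'Reduce' step: element-wise sum of the partial gradients.
        total = [sum(col) for col in zip(*partials)]
        weights = [w - lr * g / n for w, g in zip(weights, total)]
    return weights


# Usage with hypothetical synthetic data: true model logit(p) = -0.5 + 2.0 * x.
random.seed(0)
rows = []
for _ in range(2000):
    x = random.uniform(-3.0, 3.0)
    y = 1 if random.random() < sigmoid(-0.5 + 2.0 * x) else 0
    rows.append(((1.0, x), y))

# Split the rows round-robin into 16 partitions, mimicking 16 nodes.
partitions = [rows[i::16] for i in range(16)]
weights = fit_data_parallel(partitions, n_features=2)
print(weights)
```

Only one small gradient vector per node crosses the network each iteration, never the rows themselves. That is why this kind of workload scales well on a shared-nothing appliance.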

You should read the complete article. It is an excellent example of how technology should be written about, with complete details of hardware and software across two platforms, and very little of the lazy copying and pasting from briefings, slide decks and press releases that some other tech journalists are prone to.

An additional resource for keeping track of database technologies is DBMS2, written by Curt Monash. What I really like is that Curt takes the time to climb down from the pundit’s pulpit and explain things coherently and concisely, in terms people like me can understand. You should read the full article; this is just a summary.

Revolution Analytics’ plans in High Performance Analytics seem to consist of first finding partners and then trying to convince them of its enhanced offering. That makes sense in enterprise software deals worth millions: Revolution provides timely support, while the R email help group probably needs a corporate/academic split. The same goes for putting the archives and Stack Overflow content on an R wiki (which is not updated as often as some other parts of the project).

I would probably be happier if Revolution Analytics used Amazon to create a paid AMI on EC2 for its complete stack. That would also be a good offering and a good experiment in providing High Performance Analytics as a service. (Though Amazon recently went down in an extended outage, crashing lots of clouds in US East and West, Amazon’s overall SLA performance continues to be great, and hopefully the soon-to-come Google Storage and Prediction APIs et al. will follow suit.)