HP salivates over the future brontobyte digital universe

Peddling Vertica and Autonomy wares in the yottabyte era

Discover 2012 The amount of digital data that the world is creating and passing around is swelling a lot faster than revenues and profits at Hewlett-Packard and its peers in the traditional IT racket, but you can't blame them for getting excited about trying to capitalize on that data explosion. You can, however, blame them for overdoing it a bit. Or maybe a megabit.

So as HP kicks off its Discover customer and partner conference in Frankfurt, Germany on Tuesday, you can expect a whole lot of hyperbole about big data and something that will come to be known as the brontobyte era. Unless, that is, our children decide to call it the velocibyte era after Velociraptor mongoliensis. Given that big data is as much about speed as it is size and about slicing and dicing it, Velociraptor is perhaps a more appropriate foundation for the term that comes after yottabyte and that represents one billion exabytes.
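For anyone losing track of the prefixes, the arithmetic can be checked in a few lines. This is a minimal sketch that assumes decimal (power-of-ten) units and treats "bronto" as an informal name for 10^27 bytes, which is the value implied by "one billion exabytes"; the table and the `ratio` helper are illustrative names, not anything HP publishes.

```python
# Decimal byte prefixes as powers of ten.
# Assumption: "bronto" is an informal, non-SI name; 10**27 is used here
# because it matches the article's "one billion exabytes".
PREFIX_EXPONENT = {
    "kilo": 3, "mega": 6, "giga": 9, "tera": 12, "peta": 15,
    "exa": 18, "zetta": 21, "yotta": 24, "bronto": 27,
}


def ratio(larger: str, smaller: str) -> int:
    """Return how many `smaller`-bytes fit in one `larger`-byte."""
    diff = PREFIX_EXPONENT[larger] - PREFIX_EXPONENT[smaller]
    if diff < 0:
        raise ValueError(f"{larger} is smaller than {smaller}")
    return 10 ** diff


if __name__ == "__main__":
    print(ratio("bronto", "exa"))    # exabytes in one brontobyte
    print(ratio("yotta", "exa"))     # exabytes in one yottabyte
    print(ratio("bronto", "yotta"))  # yottabytes in one brontobyte
```

On this ladder a brontobyte is a thousand yottabytes, and a yottabyte is a million exabytes.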

HP's eagerness to get its fair slice of the big data racket is what drove it to acquire columnar clustered database maker Vertica back in February 2011 for an undisclosed sum and then unstructured data sifter Autonomy for a whopping $10.7bn during that crazy reorganization that former CEO Leo Apotheker did in August 2011, a month before he was shown the door.

Forget whether the price of the Autonomy deal was sane or not (everyone knows it was nuts anyway), the fact remains that Autonomy was on track to break $1bn in revenues (regardless of how that revenue might have been booked) and it was a player in the big data racket. HP needed something to peddle here alongside Vertica, which handles structured data, and that is distinct from Hadoop, which is good for certain kinds of data munching but which is not generating big bucks for anyone yet.

HP wants to do more than sell servers to a third of the world. We have moved from wrestling with kilobytes in the mainframe era to megabytes in the client/server era to gigabytes in the Internet era. Now, with social networking, cloud computing, and the big data analytics that underpin our every move on the Internet (and increasingly in the real world), we are in the zettabyte era.

Think about it, explains Andrew Joiner, who was CEO at Autonomy before HP ate it and who is general manager of emerging technology and in charge of worldwide marketing and partners for the Autonomy IDOL platform. Every 60 seconds, there are:

98,000 tweets on Twitter

695,000 status updates on Facebook

698,445 Google searches

11 million instant messages sent

168 million emails sent

217 new mobile platform users come online for the first time

1,820 TB of data is created
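To put the last figure in that list on the same scale as the "zettabyte era" talk, it can be annualized. This is a rough sketch only: it assumes the 1,820 TB per minute rate holds constant, that the units are decimal, and that a year has 365 days, none of which HP states. The function names are made up for illustration.

```python
TB_PER_MINUTE = 1_820        # figure quoted in the list above
MINUTES_PER_DAY = 24 * 60
DAYS_PER_YEAR = 365          # assumption: non-leap year
TB_PER_EB = 10 ** 6          # decimal units: 1 EB = 1,000,000 TB
TB_PER_ZB = 10 ** 9          # decimal units: 1 ZB = 1,000,000,000 TB


def tb_per_day(tb_per_minute: int) -> int:
    """Scale a per-minute rate up to a full day."""
    return tb_per_minute * MINUTES_PER_DAY


def tb_per_year(tb_per_minute: int) -> int:
    """Scale a per-minute rate up to a 365-day year."""
    return tb_per_day(tb_per_minute) * DAYS_PER_YEAR


if __name__ == "__main__":
    daily = tb_per_day(TB_PER_MINUTE)
    yearly = tb_per_year(TB_PER_MINUTE)
    print(f"{daily:,} TB/day = {daily / TB_PER_EB:.2f} EB/day")
    print(f"{yearly:,} TB/year = {yearly / TB_PER_ZB:.2f} ZB/year")
```

Under those assumptions the rate works out to roughly 2.6 exabytes a day and a little under one zettabyte a year.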

According to HP, the Federal Bureau of Investigation and the National Security Agency together have collected yottabytes of data – that's 1 million exabytes of data – on people. (That must make for some pretty boring reading for the most part.)

The Brontobyte Era in the world of interconnected digital things

HP's premise is simple, and one that HP's own managers might do well to study. "The companies that use information better are the ones who succeed and have a competitive advantage," says Joiner. And the acquisitions of Vertica and Autonomy, among other software and storage companies, are all aimed at helping HP build the systems that can deal with lots of different types of data moving at high speed and in huge volumes.

HP wants to create systems that can physically manage and access huge volumes of data, put some context around it as well as dice it and slice it, and then help managers act on it. That's where Vertica and Autonomy come in, and both tools are being updated this week at the Discover event.

How HP sees the big data problem

First up is the Vertica 6.1 database, technically known as the Vertica Analytics Platform. "With this release, the software engineers really attacked the kernel because we want to be the fastest database in the industry," says Joiner.

Depending on what data you are chewing through and the current database you are using, HP says that Vertica 6.1 can deliver anywhere from 50X to 1,000X the performance on certain kinds of queries. Vertica 6.1 offers support for more SQL-99 functions and the ones it already supported run faster.

The parallel clustered columnar database also has deeper integration with the open source R statistical analysis programming language. Vertica 6.0 had R support, but with 6.1 there is a pre-built R language pack that comes with the database and it can make use of parameterized and polymorphic R functions to analyze data sets in a number of different formats.

Vertica 6.1 also has a connector for loading data directly from the Hadoop Distributed File System, which is the unstructured data store used to plunk data before Hadoop's MapReduce algorithms tear it to shreds. This HDFS connector uses the Vertica Copy command to move data into itself, which is apparently faster than other methods because it takes the parallel nature of Hadoop and Vertica alike into account to move data from one data store to the other. There are also connectors that link through MapReduce and Pig if you want to go that way.
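The general idea of a loader that exploits parallelism on both sides can be sketched as follows. This is not HP's implementation and does not use any Vertica or Hadoop API; it is a toy illustration in which a set of source files is split across database nodes and each node pulls its share concurrently. Every name here (`assign_round_robin`, `load_on_node`, `parallel_load`, the node and file names) is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def assign_round_robin(files, nodes):
    """Spread source files across loader nodes as evenly as possible."""
    plan = {node: [] for node in nodes}
    for i, path in enumerate(files):
        plan[nodes[i % len(nodes)]].append(path)
    return plan


def load_on_node(node, paths):
    """Stand-in for one node pulling its share of files.

    A real loader would stream each file from the distributed file
    system here; this stub only reports how many files it was given.
    """
    return node, len(paths)


def parallel_load(files, nodes):
    """Run one load task per node at the same time and collect counts."""
    plan = assign_round_robin(files, nodes)
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        results = pool.map(lambda item: load_on_node(*item), plan.items())
        return dict(results)


if __name__ == "__main__":
    # Hypothetical file and node names, for illustration only.
    files = [f"/data/part-{i:05d}" for i in range(10)]
    nodes = ["node-a", "node-b", "node-c"]
    print(parallel_load(files, nodes))
```

The point of the sketch is only that no single node or single stream becomes the bottleneck: the work is divided up front and every node loads at once.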

Vertica 6.1 runs on clusters fired up with Red Hat Enterprise Linux 5 and 6, SUSE Linux Enterprise Server 10 and 11, Oracle Enterprise Linux 6, Debian Linux 5 and 6, and CentOS 5 and 6. On the client side, you can access it using JDBC or ODBC and there are native access methods from Perl, Python, ADO.NET (for Microsoft's .NET Framework), and a plug-in for Informatica's PowerCenter. HP has rolled up an Amazon Machine Image (AMI) of the Vertica 6.1 database, so you can throw it out onto Amazon's EC2 compute cloud if you want instead of on your own iron.

Data in human friendly formats is a big problem

Vertica handles structured data – the kind of transactional information and related data you put into a relational database – but according to Joiner, somewhere on the order of 90 per cent of the data that is used in businesses today is in "human friendly formats" such as a tweet, an email, a blog, a video, or a web page.

That's where Autonomy's Intelligent Data Operating Layer (IDOL) comes in. IDOL is not a content management system or database, but rather is a layer of software that can reach into data stored in over 1,000 different formats through more than 400 different connectors and not only put some context around that data, but provide an application environment for managing access to it.

Autonomy sold IDOL as a raw tool, called the eDiscovery Appliance, as well as creating two main application suites that ran on top of it: Legal & Compliance Performance Suite and Marketing Performance Suite. In both cases, it sifts through the data and forms what Joiner called a "statistical understanding of the information" that provides some context around the information.

The big new thing that HP is doing is pushing its Marketing Performance Suite directly to chief marketing officers as well as to the IT departments that serve them. HP is keen to tap into the increasing role of CMOs at large organizations. By 2016, according to analyst estimates that HP is keeping a close eye on, CMOs will command a bigger IT budget than the traditional IT department that hosts the back-end and front-end systems that run the company.

The Autonomy Marketing Performance Suite is updated today, and now includes "executive scorecard" dashboards that were developed by HP Software for its security and performance monitoring tools. These scorecards are driven by key performance indicators (KPIs in the dashboard lingo) and allow marketeers to see customer satisfaction, lead conversion, online advertising effectiveness, social impact, and search engine optimization stats pulled live from logs and through the IDOL layer.

The advertising optimization app that runs atop the Marketing Performance Suite now has bidding algorithms for social media advertising, starting of course with Facebook. The Autonomy Optimost ad app has dynamic advertising segmentation routines to detect and target new content online and push ads against it in real-time.

The marketing suite now integrates with Autonomy Web Content Management, another app that rides on IDOL, as well as with the Hybris multichannel commerce platform and the Aurasma "augmented reality" mobile browser (think of the invasive direct retinal advertising in Minority Report and you have the basic idea).

The Autonomy marketing app is also now integrated with HP's own Exstream customer communication management tool, which feeds personalized content to customers based on their profiles. Exstream can kick off work on Autonomy and vice versa now.

The Autonomy Legal & Compliance Performance Suite gets the executive scorecards as well, but in this case they watch legal fees, document data volumes on hold and ready for deletion, and the compliance status of documents under review, among other metrics important to law firms.

The WorkSite Mobility 2.5 add-on now lets lawyers and legal departments create and edit briefs and other docs from mobile devices. The legal hold, eDiscovery, and archiving capabilities of the IDOL platform are now integrated and added to the legal suite as well.

IDOL is priced based on the number of data sources, the size of the information stores in the IDOL layer, and the number of applications that hit that data. It can range from as low as $1,000 per month for a setup that gathers social media data from Facebook and Twitter and does sentiment analysis on a 30-day store of their feeds to multi-million dollar IDOL installations that deal with the petabytes of data in a modern lawsuit that has armies of lawyers banging on it. ®