HP aims to shrink big data

Hewlett-Packard wants to help organizations get rid of their useless data, all the information that is no longer needed yet still takes up expensive space on storage servers.

The company’s Autonomy unit has released a new module, called Autonomy Legacy Data Cleanup, that can delete data automatically based on the material’s age and other factors, according to Joe Garber, Autonomy’s vice president of information governance.

Hewlett-Packard announced the new software, along with a number of other updates and new services, at its HP Discover conference, being held this week in Las Vegas.

For this year’s conference, HP will focus on “products, strategies and solutions that allow our customers to take command of their data that has value, and monetize that information,” said June Manley, HP’s director of big data solutions.

The company is pitching Autonomy Legacy Data Cleanup for eliminating no-longer-relevant data in old SharePoint sites and in email repositories. The software requires the new version of Autonomy’s policy engine, ControlPoint 4.0.

HP Autonomy Legacy Data Cleanup evaluates whether to delete a file based on several factors, Garber said. One factor is the age of the material. If an organization has an information governance policy of keeping data for only seven years, for example, the software will delete any data older than seven years. It will also root out and delete duplicate data, as well as data that is not worth saving, such as system files. It can also consider how often employees access the data: Less-consulted data is more suitable for deletion.
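
The factors Garber describes can be sketched as a simple rule check. The Python below is only an illustration of that kind of policy evaluation and is not HP's software or its API: every name in it (FileRecord, RetentionPolicy, deletion_reasons, plan_cleanup) is hypothetical, the seven-year retention period is taken from the example above, and the access threshold is an assumed value. The sketch also simplifies by treating low access as a flag in its own right, whereas the article describes it only as making data more suitable for deletion.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Set


@dataclass
class FileRecord:
    """Hypothetical metadata for one stored item."""
    path: str
    modified: datetime
    content_hash: str          # identical hashes are treated as duplicates
    accesses_last_year: int
    is_system_file: bool = False


@dataclass
class RetentionPolicy:
    """Hypothetical policy; the defaults are assumptions for illustration."""
    max_age: timedelta = timedelta(days=7 * 365)  # "seven years" from the article
    min_accesses_last_year: int = 1               # assumed threshold


def deletion_reasons(record: FileRecord, policy: RetentionPolicy,
                     now: datetime, seen_hashes: Set[str]) -> List[str]:
    """Return every reason this record is a deletion candidate (may be empty)."""
    reasons = []
    if now - record.modified > policy.max_age:
        reasons.append("older than retention period")
    if record.content_hash in seen_hashes:
        reasons.append("duplicate content")
    if record.is_system_file:
        reasons.append("system file")
    if record.accesses_last_year < policy.min_accesses_last_year:
        reasons.append("rarely accessed")
    return reasons


def plan_cleanup(records: List[FileRecord], policy: RetentionPolicy,
                 now: datetime) -> Dict[str, List[str]]:
    """Map each candidate path to its reasons; the first copy of a hash is kept."""
    seen: Set[str] = set()
    plan: Dict[str, List[str]] = {}
    for record in records:
        reasons = deletion_reasons(record, policy, now, seen)
        seen.add(record.content_hash)
        if reasons:
            plan[record.path] = reasons
    return plan
```

The sketch only produces a list of candidates with reasons rather than deleting anything, so an administrator could review the plan before acting on it.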

Administrators can set other controls as well. If used in conjunction with the indexing and categorization capabilities in Autonomy’s Idol data analysis platform, the new software can eliminate clusters of data on a specific topic. “You apply policies to broad swaths of data based on some conceptual analysis you are able to do on the back end,” Garber said.

HP has made a number of other announcements at the conference as well, most of which revolve around big data analysis.

The company is releasing a free version of its Vertica Analytics analytic database. Users can deploy the software to analyze as much as a terabyte of data, at no charge and with no time limit. The company has offered a version of Vertica at no cost before, but that version had only a subset of the software’s features. The full functionality might help organizations test different ways of analyzing their data before embarking on the creation of a new data analysis system, Manley said.

HP also announced a new marketing initiative, called HAVEn, to help organizations think about how to build big data analysis systems. The company says HAVEn stands for HP Autonomy, HP Vertica, HP ArcSight and HP Operations Management, all of which are HP software products that can help harvest and collect data.

The “n” stands for the potential number of systems that could be built from any combination of these software packages, an idea that the company will promote at the conference. For example, the data from HP Operations Analytics, which monitors the operational health of IT systems, could be paired with Vertica, to more quickly pinpoint trouble spots in the system.

HP Technology Services has also expanded its consulting services to help organizations build systems that can do big data analysis. It has added expertise to help organizations with issues around IT strategy and architecture, system infrastructure and data protection, especially when they involve implementing Hadoop.

“Most of our clients are ready to adopt Hadoop but don’t really have the right infrastructure,” Manley said. The new services “enable customers to put a strategy around Hadoop, and to ensure they have the right architecture and infrastructure to get the most out of Hadoop.”

This article was updated on June 12 to correctly identify the HP executive quoted.