Greenplum protégé brings predictive muscle to Exadata

Alpine Data Labs, a predictive analytics startup that incubated within Greenplum (now part of EMC), is expanding its support beyond the Greenplum Database to Oracle’s Exadata appliance and the open-source PostgreSQL database. The expanded support is part of version 2.0 of the company’s Alpine Miner software, its first official release since the company launched in May.

Alpine is trying to distinguish itself from legacy predictive analytics vendors such as SPSS and SAS by running entirely within companies’ analytic databases, and by letting users achieve meaningful results without relying on data scientists.

The company’s timing couldn’t be better: mainstream businesses are catching on to the promise of big data and will need tools that help them analyze it without big investments in capital or personnel. We’re already seeing such startups emerge in the Hadoop space, but the trend hasn’t quite made its way to predictive analytics yet.

Co-founder and CEO Anderson Wong explained that running within massively parallel databases like EMC Greenplum and Oracle Exadata saves users time and money. Users don’t have to operate separate systems for their data warehouse and their predictive analytics, which means less capital investment and no migration of data between systems. And because the database is already built for parallel processing, Miner still processes data quickly.

Wong added that working within the database also lets analysts run predictive models against a company’s entire dataset. Traditionally, predictive analytics requires frequent data movement and recoding as analysts tune their algorithms to produce accurate results.

Alpine also tries to level the playing field in terms of who can perform predictive analytics. Wong said Miner features a visual interface and prepackaged algorithms that let business analysts, not just data scientists, run models. In fact, no coding is required at all, although version 2.0 does let users import or write their own custom predictive algorithms.

In future releases, Wong said, Alpine Miner will be able to work with unstructured data in Hadoop environments. The company is already working with EMC and IBM on integrating Miner with their Hadoop distributions. When that capability comes to market, organizations using the EMC Greenplum HD Data Computing Appliance will be able to centralize their analytic database, Hadoop and predictive analytics within a single system.

Alpine’s history is also noteworthy. Before its acquisition by EMC, Greenplum commissioned employees Wong and Yi-Ling Chen, now Alpine’s co-founder and CTO, to develop a killer app to run atop massively parallel databases. They developed what became Alpine, and although it was initially tuned specifically for Greenplum, it soon became clear that such a narrow focus was too limiting, considering how many companies also store data in Oracle databases and other environments.

So the company emerged as Alpine Data Labs to address the broader market. Wong said it still maintains a tight relationship with EMC Greenplum, though. The two companies have neighboring offices, and EMC was among the investors in Alpine’s $7.5 million Series A funding round. Other initial investors were Sierra Ventures, Mission Ventures and Sumitomo Corporation Equity Asia.