Physical Vs. Virtual: Oracle, Others Redefine Appliances

While IT rushes to virtualize applications, high-end databases are increasingly moving against the tide, from software to hardware.

Big Data, Big Hardware

The market for integrated database systems, known collectively as data warehouse appliances, is booming, driven by companies that need to manage explosive growth in data volumes and by vendors embracing massively parallel processing (MPP), which spreads processing across the many CPU cores of their appliances.
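To make the MPP idea concrete, here is a minimal sketch of the scatter-and-gather pattern: rows are dealt out to several workers, each worker aggregates only its own slice, and the partial results are combined. The function names and the sample rows are hypothetical, and threads stand in for the separate cores or nodes a real appliance would use, so this shows the shape of the technique rather than any vendor's implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_workers):
    """Deal rows round-robin into n_workers slices (a stand-in for data distribution)."""
    return [rows[i::n_workers] for i in range(n_workers)]


def partial_sum(rows):
    """Each worker scans only its own slice and returns a partial aggregate."""
    return sum(amount for _, amount in rows)


def mpp_total(rows, n_workers=4):
    """Scatter slices to workers, then gather and combine the partial results."""
    slices = partition(rows, n_workers)
    # Threads only illustrate the pattern; an appliance would use separate cores or nodes.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum, slices))
    return sum(partials)


# Hypothetical (order_id, amount) rows, invented for illustration.
sales = [(i, i % 10) for i in range(1000)]
total = mpp_total(sales)
```

The answer is the same whether one worker or many scan the data; what changes in a real MPP system is how long the scan takes.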

Spats over terminology also shed some light. Oracle eschews the "warehouse" and "appliance" monikers, saying that a data warehouse is fundamentally no different from any other database and that the word "appliance" is a misnomer. The former claim is dubious in light of Oracle's lack of a dedicated data warehouse offering, but the company might have a point about "appliance." If you think of a data warehouse appliance as a simple plug-and-play device, you'll get a rude awakening given the integration and customization required. Exadata, for instance, has to be configured for either OLAP or OLTP. Still, most vendors of data warehousing appliances say customers can get their products up and running in days, where conventional databases take weeks or months. And as with any dedicated hardware, there's a trade-off between (relative) simplicity and (relative) flexibility. A data warehouse appliance is more flexible and complex than a typical 1U networking box, but it's simpler and less versatile than a full-featured database.

One area where IT can gain an edge with an appliance is in giving business units the power to mine large amounts of new and historical data. Today's increased storage capacity means that companies can save every mouse click or GPS coordinate, but such a trove is valuable only if it's accessible to analytics applications. This is where appliances shine: With software and hardware designed specifically for reading large volumes, data that might previously have been archived to removable media becomes available for analysis.

Most integrated systems target data warehousing. Teradata pioneered the concept, convincing customers that a data warehouse is different enough from a conventional OLTP system to require a separate installation. Netezza and Greenplum had a similar vision, and these startups made inroads by cutting the cost of tightly integrated hardware and software. Retailers use data warehouse appliances to track customer buying patterns, mostly to improve marketing and demand forecasts. Banks use these appliances to detect fraud, phone companies to plan cellular coverage based on calling patterns, and airlines to price fares. Any task that depends on demand forecasting will likely benefit from a data warehouse.

Warehouse vs. OLTP

Key differences:

OLTP processes fresh data, which usually means small data sets and a roughly equal proportion of reads and writes.

Data warehousing usually handles older data, which implies a much larger volume and a far higher proportion of reads vs. writes.

Before building or buying a database system, ask:

What quantity of data is involved?

What type of operations will be performed?

Will it be used for data warehousing or online transaction processing?
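As a rough illustration, the questions above can be turned into a quick profile of an existing workload. The sketch below is only that: the function names are hypothetical, and the two thresholds (90% reads, 10 TB) are arbitrary assumptions chosen for the example, not figures from any vendor or from this article.

```python
def read_fraction(operations):
    """Share of operations that are reads; operations is a list of 'read'/'write' strings."""
    if not operations:
        return 0.0
    reads = sum(1 for op in operations if op == "read")
    return reads / len(operations)


def suggest_workload(operations, data_tb, read_heavy=0.9, large_tb=10.0):
    """Label a workload from its read share and data volume.

    read_heavy and large_tb are illustrative cutoffs, not industry standards.
    """
    frac = read_fraction(operations)
    if frac >= read_heavy and data_tb >= large_tb:
        return "data warehousing"  # large volume, overwhelmingly reads
    if frac < read_heavy and data_tb < large_tb:
        return "OLTP"  # smaller data set, reads and writes more balanced
    return "mixed: test both"


# Hypothetical operation logs, invented for illustration.
history_queries = ["read"] * 95 + ["write"] * 5
order_entry = ["read", "write"] * 50
```

A workload that lands in the "mixed" bucket is exactly the case where the OLTP-versus-warehouse argument matters most.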

Exadata is designed to run any Oracle database, and the company pitches Exadata as the optimum platform for both OLTP and data warehousing--even though it started out as a data warehouse platform, and Oracle still compares it with rivals' data warehouse appliances. Oracle's competitors counter that the design approach that lets Exadata perform both functions has compromised its ability to handle data warehousing.

"It's a diminution of the importance of data warehousing and analytics," says Luke Lonergan, VP and CTO of EMC Data Computing, the division responsible for the Greenplum technology EMC acquired last year. Rivals don't dispute Exadata's ability to handle OLTP, but they do say data warehousing is a fundamentally different problem, pointing to enormous storage requirements and a need for fast reads. Oracle retorts that warehousing is neither that different nor that big a deal. "It's an easier problem," says Tim Shetler, VP of Exadata product management. Still, Oracle is trying to enhance its data warehousing chops by moving aggressively into storage. Its goal is to expand the capacity of Exadata so that a warehouse can scale to analyze data that previously would have been archived to removable media.

So who's right? Data warehousing specialists such as EMC Greenplum, IBM Netezza, and Teradata have scores of reference customers analyzing hundreds of terabytes, even petabytes, of data. Oracle says more than 1,000 customers have deployed Exadata (for both OLTP and OLAP), but we haven't seen it publish data warehousing references breaking into the hundreds of terabytes. Ultimately, the only way to know if an appliance will handle your workloads is through testing. Most would-be data warehouse appliance customers have already outgrown conventional deployments of leading databases such as Oracle, IBM DB2, and Microsoft SQL Server and need the scalability and MPP power of appliances. "Nobody buys Teradata for the fun of it," says Ed White, general manager of Teradata. "They all started with IBM and Oracle."

However, IBM's acquisition of Netezza and Oracle's launch of Exadata mean IT has another option: Migrate to an appliance while sticking with your existing vendor. Netezza is as tightly focused on data warehousing as the other specialist appliance vendors, and Oracle says that almost all Exadata customers were 11g users. Thus, whether one appliance architecture can suffice for both OLTP and data warehousing is really a question that applies only to Oracle customers. If you are one, and you need the next level of performance and scalability, Exadata may be a good solution--provided you don't mind being locked into Oracle and that you can afford it. The lowest-cost Exadata full rack lists for $1 million. If you'd rather let performance, scalability, and price determine your selection, test your data and query loads on a short list of platforms. EMC Greenplum, IBM Netezza, Oracle Exadata, and Teradata are market leaders, but you might also consider Hewlett-Packard, Vertica, Infobright, Microsoft, ParAccel, and SAP's Sybase IQ.
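Testing your own query loads can start with something as simple as the timing harness sketched below. It assumes each candidate platform exposes a Python DB-API 2.0 connection; the `time_queries` and `connect_demo` names, the `clicks` table, and the two queries are all hypothetical, and an in-memory SQLite database stands in for a real appliance so the example runs anywhere.

```python
import sqlite3
import time


def time_queries(connect, queries, repeats=3):
    """Return the best wall-clock time in seconds for each named query."""
    conn = connect()
    try:
        cursor = conn.cursor()
        timings = {}
        for name, sql in queries.items():
            best = float("inf")
            for _ in range(repeats):
                start = time.perf_counter()
                cursor.execute(sql)
                cursor.fetchall()  # force the full result set to be read
                best = min(best, time.perf_counter() - start)
            timings[name] = best
        return timings
    finally:
        conn.close()


def connect_demo():
    """Stand-in platform: an in-memory SQLite database with invented click data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE clicks (user_id INTEGER, page TEXT)")
    rows = [(i % 100, f"/p{i % 7}") for i in range(10_000)]
    conn.executemany("INSERT INTO clicks VALUES (?, ?)", rows)
    conn.commit()
    return conn


# Hypothetical analytic queries; substitute your own workload.
QUERIES = {
    "clicks_per_page": "SELECT page, COUNT(*) FROM clicks GROUP BY page",
    "active_users": "SELECT COUNT(DISTINCT user_id) FROM clicks",
}

timings = time_queries(connect_demo, QUERIES)
```

To compare platforms, pass a different `connect` function for each one and run the same queries against the same data. Toy timings like these say nothing about appliance performance; the value is in running your real data and queries through the same loop.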
