Technical details, ideas and news on data warehousing and big data from the Oracle Team

Data Warehouse Vendor Comparison

I spent some time looking at various vendors and their offerings and tried to compare them with the HP Oracle Database Machine offering from a customer perspective. In other words, what would I look for if I were evaluating my data warehouse infrastructure? Three criteria stand out.

Scalability – Can your infrastructure not just store the required data, but also scale out to service the business with information at the required performance, both today and next year? Can you scale out in terms of users and can your infrastructure handle the needs for these users even if they are working concurrently on these systems?

Agility – Can your system deal with changing requirements, with mixed workloads that include loading data while querying, and returning the right answers? Can you easily switch to real-time data loading for some of your data and support operational needs in your business?

Enterprise Readiness – Does your infrastructure provide the functionality to always keep your business running? Is the infrastructure delivering the security you need, in terms of who can see what data, but also in terms of disaster recovery and protection against fraudulent manipulation of data? Is your system running when you need it, and does the infrastructure deliver maximum availability to run your business 24x7?

Appliances and How They Stack Up Against Our System Criteria

Most, if not all, of these new, small vendors use proprietary hardware platforms and rely on massive hardware to deal with data growth. As a business tactic they then benchmark a single query as proof of their superiority over the incumbent data warehouse database. While claims of being 100 times faster sound impressive, such a “benchmark” is not actually that useful.

First of all, the benchmark is not comparing apples to apples. Often an old system, typically undersized or out of balance, is compared with the latest, heavily over-provisioned appliance hardware.

Second, the response times on the incumbent system are almost always measured under full user and query load: for example, hundreds of users querying the system while the trickle-feed ETL process is running. The so-called super-fast appliance runs only a single query for a single user on static data.
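To see why a single-user number and a loaded-system number are not comparable, here is a minimal sketch. It assumes a deliberately simplified model in which a system splits its scan capacity evenly among all running queries. The function name and every figure are hypothetical, chosen only to illustrate the arithmetic; they are not measurements of any product.

```python
def response_time_s(data_gb, scan_gb_per_s, concurrent_queries=1):
    """Seconds for one full scan when capacity is shared evenly by all queries.

    Simplified model (an assumption): each of the N concurrent queries gets
    1/N of the scan rate, so the time grows linearly with N.
    """
    if concurrent_queries < 1:
        raise ValueError("need at least one running query")
    return data_gb * concurrent_queries / scan_gb_per_s


# Hypothetical figures, for illustration only.
appliance_alone = response_time_s(1000, scan_gb_per_s=10)
incumbent_loaded = response_time_s(1000, scan_gb_per_s=5, concurrent_queries=50)
incumbent_alone = response_time_s(1000, scan_gb_per_s=5)

# The "benchmark" compares a loaded system with an idle one.
headline_speedup = incumbent_loaded / appliance_alone
# Comparing both systems with a single query tells a different story.
like_for_like_speedup = incumbent_alone / appliance_alone

print(headline_speedup, like_for_like_speedup)
```

Under these assumed numbers the headline claim is 100 times faster, while the like-for-like difference is a factor of 2. The remaining factor of 50 comes entirely from the load on the incumbent system.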

Third, most of the current appliances are really one-trick ponies. The trick is to use massive hardware resources to solve a problem with brute force. Long-running data warehouse queries will benefit from this approach, hence the benchmark results. The real problems come into play when the appliance has to ensure that the data is secure, or when a mixed workload has to be run. Now, all of a sudden the appliance starts to run into problems. Write processes create dirty data polluting queries with incorrect results. Locks prevent users from completing their query, or even from starting one. As more and more users come on-line, contention for resources starts to become an issue, quickly reducing the effectiveness of the system.

What is needed for these workloads is a full solution consisting of the brainy database software on top of brawny hardware, not just a lot of hardware.
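The dirty-data problem described above can be shown with a toy example. This is only a sketch: the table is a plain Python list, the "concurrent" writer is simply called in the middle of the scan, and the names scan_total and transfer are hypothetical. It does not represent how any particular database implements consistent reads.

```python
def scan_total(rows, writer=None, snapshot=False):
    """Sum all balances row by row; `writer` fires once, midway through the scan."""
    # A snapshot read works on a private copy taken when the query starts.
    source = list(rows) if snapshot else rows
    total = 0
    for i in range(len(source)):
        if writer is not None and i == 1:
            writer(rows)  # the concurrent write always changes the live table
        total += source[i]
    return total


def transfer(rows):
    """Move 100 from the last row to the first; the true total is unchanged."""
    rows[0] += 100
    rows[-1] -= 100


# The true total is 1000 before and after the transfer.
dirty = scan_total([500, 300, 200], writer=transfer)
consistent = scan_total([500, 300, 200], writer=transfer, snapshot=True)

print(dirty, consistent)  # 900 1000
```

The unprotected scan reads the first row before the credit and the last row after the debit, so it reports a total that never existed in the table. The snapshot scan returns the total as of the moment the query started, without blocking the writer.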

Ease of Deployment

Another angle often claimed as an advantage for an appliance is the ease of deployment. As a customer you purchase a completely configured hardware and software solution, you wheel it into the data center and you start working on it. While this is probably true for most of the appliances to some extent, what comes next is not so pretty.

Your system administrators and the DBAs will all of a sudden have to deal with different hardware, different database software and maybe even a different operating system than the ones they use for all your other systems. As the novelty wears off, these appliances soon become orphans of the data center.

With data centers becoming leaner, these orphans will require a set of new skills and specialists to maintain them. Probably not what your cost-conscious CIO wants to hear.

Cost of Upgrading

While it may be easy to deploy an appliance in your data center initially, once it is time to upgrade (due to that ever-present data growth) you may get stuck with an interesting invoice. Most appliances do not come with perpetual licenses. Most appliances do, however, require a forklift upgrade: the whole machine is replaced, which means that you will need to buy new licenses for the new machine. So you do not just end up buying new hardware, you also re-purchase the software you already own.
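The invoice effect is simple arithmetic. The sketch below assumes a simplified pricing model in which hardware and licenses are both priced per capacity unit; the function names and all prices are hypothetical and do not reflect any vendor's actual price list.

```python
def forklift_cost(target_units, hw_per_unit, license_per_unit):
    """Replace the whole machine: pay hardware and licenses for every unit again."""
    return target_units * (hw_per_unit + license_per_unit)


def incremental_cost(current_units, target_units, hw_per_unit, license_per_unit):
    """Grow the existing system: pay only for the units that are added."""
    added_units = target_units - current_units
    if added_units < 0:
        raise ValueError("target must not be smaller than current capacity")
    return added_units * (hw_per_unit + license_per_unit)


# Hypothetical example: growing from 4 to 6 capacity units.
forklift = forklift_cost(6, hw_per_unit=100_000, license_per_unit=50_000)
incremental = incremental_cost(4, 6, hw_per_unit=100_000, license_per_unit=50_000)
repurchased_licenses = 4 * 50_000  # licenses for capacity you already owned

print(forklift, incremental, repurchased_licenses)  # 900000 300000 200000
```

With these assumed prices the forklift upgrade costs three times as much as growing the existing system, and 200,000 of that is software you had already paid for.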

How Viable Is the Company?

Data warehouse appliance vendors have been popping up like mushrooms. A lot of these vendors are very small, with few real customers. Their research and development, support capabilities and overall viability are in a different league from market leaders like Oracle. In today’s economy, investing a multi-million dollar sum in one of these orphans of the data center carries a serious risk of seeing your critical system end up running on unsupported hardware and software.

Vendor Comparison

When comparing the unique capabilities of Oracle Database 11g and the HP Oracle Database Machine with those of some of the vendors in the data warehouse market [see graph], we see a distinct segmentation. The so-called general-purpose database offerings from Oracle and others deliver a much more universal value proposition. Appliances like Netezza really are one-dimensional offerings, and even in that claimed specialty they fall short of the market leaders.

Hi Jean-Pierre
I have to disagree most strongly with some of your assertions. Whilst Oracle, IBM and Co are certainly enterprise-ready relational databases, they are by no means a match for the specialist data warehouse products on the market these days. Products like Teradata and Netezza undeniably outperform Oracle and Co and are certainly easier to maintain. Your positioning of a product like Teradata as less scalable and less enterprise-ready than Oracle is mind-boggling, given that Teradata scales linearly, which is by no means true of Oracle, and that Teradata is of course enterprise-ready. Your comments sound like a religious argument from someone indoctrinated by the big-name historical players.

Hi Malcolm,
First of all, my apologies for the late approval of the comment; I was on a short holiday break. Anyway, I am trying not to be religious about anything here, but trying to look beyond niche markets. What I mean is that we see the DW market shifting towards more mixed workloads, towards real-time data processing and data management. Netezza is lacking a lot of the capabilities required to deal with these new(er) requirements, and so, quite frankly, are Teradata and Microsoft. The biggest problem these systems have is handling writes and reads at the same time, in other words locking. Running a query while data is being updated will give different results as the data progresses into the tables. Other issues are high availability solutions (and the price of these). With the large data volumes, things like compression are also becoming more and more important. Both Teradata and Netezza have solutions but are certainly not in the leadership there...
So there are a variety of solutions required to be - in my opinion - enterprise ready. Netezza certainly has a long way to go. Teradata is of course further along in some areas but - and yes, size does matter here - lags in others. To me the total picture and the versatility of a platform are important. It is the sum of the parts that makes the solution work, not just a small subset of features that deal with a specific pre-defined workload...
JP

SAP BW is an interesting piece of the puzzle in an SAP account (no one of course has it without having R/3 in place).
The way I look at BW (I think it is now called BI) is as an embedded reporting area for R/3. You'll have it and sometimes even need it to run standard reporting from your ERP system. I do not see BW as a data warehouse or analytic environment. The biggest issue with BW is that there is no 3rd party or applications content in it...
I've seen various ways of getting analytic content out of SAP. The two extremes are: 1) do not use BW ever, 2) use it as the only solution. Number 2 tends to fail as there is not enough in the solution to replace a data warehouse. Number 1 is often a lot of work and so I've seen people use BW as a source for their data warehouse. That way you make use of some of the business logic in the batch loads to BW and move some of that data. Then augment it with data from other systems and create your data warehouse.
There are, specifically for SAP, out-of-the-box solutions available that produce a DW based on R/3 in a snap. Most of these focus on the ETL process and the business logic required to get the data out. That business logic and ETL is the hard part... One such solution is here (http://www.newfrontiers.com/quickstart/2/quickstart+general).
On the performance side, I have not seen any hard numbers. On Oracle, the underlying structures do use a lot of the Oracle techniques to make it go fast, so it should be reasonable. I think, however, that the functional parts of the product should drive the choice in DW technology. And like I said, I'm not convinced BW is the be-all and end-all data warehouse solution...
JP