Big data is not just for the big boys

It may take a while but eventually any good technology embraced by large enterprises trickles its way down to small and mid-sized businesses in some appropriately modified and re-priced form.

It will be no different for modern business analytics tools. The time could be ripe for mid-range customers to start thinking about either modernising their data warehouses or data marts, if they are lucky enough to have any, or coming up with a plan to install a business analytics platform if they don't.

One of the reasons for the success of Microsoft's SQL Server relational database a decade ago is that many of the customers buying the database – as much as a third of all sales by some estimates – wanted the relatively inexpensive SQL Server to set up an online analytical processing (OLAP) server.

That initial OLAP server bundled in SQL Server 2005 opened up a new world of business intelligence.

Fast work

Today’s tools are not only much more sophisticated but also affordably priced for mid-market customers. The level of performance they offer gives these smaller companies what they need to compete in the global marketplace.

Everybody is talking about big data these days, but the term is really a misnomer. Fast data is probably a better term. Companies of all sizes are wrestling with making sense of diverse structured, semi-structured and unstructured data sets to help them make quick decisions.

Dell does not usually get into a market unless it thinks it can make a good profit, particularly from the small and medium businesses that it still peddles a lot of its gear to. It is now cooking up the Quickstart Data Warehouse Appliance, which is based on Dell’s new PowerEdge 12G servers and Microsoft's SQL Server 2012.

Dell says this will be the first data warehouse appliance out the door running SQL Server 2012, which was code-named Denali. It will also depend on Dell's Boomi service for integrating transactional systems and other data sources into Quickstart.

Not much else is known about this appliance, except that it is in beta testing and is due to be launched in the second quarter of this year.

Place your bets

Meanwhile IBM is betting heavily on business analytics as a key driver of revenue over the next five years. The company is beginning to see some traction with various products in the mid range, according to Nancy Kopp, programme director for data warehousing and business analytics at IBM.

Depending on the type of data and analytical applications that hit against it, mid-market customers tend to go with one of two IBM machines right now, in the wake of Big Blue's $1.7bn acquisition of Netezza in September 2010.

In July 2009 IBM launched its Smart Systems, which are clusters of Power or x86 server nodes equipped with operating systems, IBM's General Parallel File System and Tivoli System Automation to manage each node.

Some of the nodes ran Cognos modules, including BI Server, Go Dashboard and BI Samples, and others ran IBM's InfoSphere Warehouse variant of its DB2 database, merging data warehousing and analytics all in one cluster.

IBM gradually fleshed out the boxes and even created an entry machine called the Smart Analytics System 5710. It pairs an IBM System x server with a DS3500 array and the Cognos and InfoSphere software stack, all for a $50,000 price tag, and is configured as an appliance for companies to dump data into and chew on.

"The more bundling and integration you do, the more favours you are doing for the mid-market," says Kopp.

Some mid-range companies have quite large data munching jobs, and for these customers IBM has created the Smart Analytics System 7700. This uses IBM's Power 740 servers, based on the Power7 Risc processors and similar to the nodes used in IBM’s Watson machine, which competed in the TV quiz show Jeopardy! and won.

The server is configured with IBM's AIX Unix variant and InfoSphere Warehouse Enterprise Edition data warehouse plus Cognos business analytics tools for drilling into the data warehouse and extracting reports. The Smartie 7700 uses DS3500 storage arrays to house data.

IBM's Smart Analytics System 7710

There is a variant of this machine called the 7710, designed for data warehouses that are under 10TB in size, which would be particularly useful for mid-range shops. This pairs one Power 740 with three DS3500 arrays and the same InfoSphere and Cognos software stack.

Serious shopping

IBM has not yet bundled SPSS’s predictive analytics tools with the Smart Analytics Systems. These are obvious add-ons and explain why Big Blue paid $1.2bn back in July 2009 to acquire that business intelligence software firm.

And of course, IBM Power Systems shops that prefer the IBM i operating system can get the combination of the DB2 for i database and the DB2 Web Query tool, developed in conjunction with Information Builders, to build data warehouses, execute ad hoc queries and generate reports.

In Europe, a fairly large company might only need an analytics system that would qualify as a mid-range box in the US.

That is why Netezza created a cut-down version of its data warehousing appliance, called Skimmer and sold as the Netezza 100 series.

All Netezza data warehouses are based on IBM's BladeCenter x86 blade servers, but they are goosed for data warehousing and analytics by a special field programmable gate array co-processor.

Netezza created this to speed up the heavily modified PostgreSQL database that runs on top of the iron. (Netezza chose IBM iron long before it was bought by Big Blue.)

The Skimmer machine hit a $125,000 price point for 10TB of user data capacity, which was a bit more than the Smartie 5710 box but considerably less expensive than an entry Netezza 1000 appliance. The Netezza 1000 has more processing oomph and would cost about $200,000 for a similar configuration.
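
To put those list prices on a common footing, they can be turned into a rough cost per terabyte. The sketch below is illustrative only: the function name `price_per_tb` is made up for this example, and it assumes that the "similar configuration" of the Netezza 1000 also means 10TB of user data, which the quoted pricing does not spell out.

```python
def price_per_tb(price_usd: float, capacity_tb: float) -> float:
    """Return list price divided by user data capacity, in dollars per TB."""
    if capacity_tb <= 0:
        raise ValueError("capacity_tb must be positive")
    return price_usd / capacity_tb


# Prices as quoted in the article. The 10TB capacity for the Netezza 1000
# is an assumption based on the phrase "a similar configuration".
skimmer = price_per_tb(125_000, 10)
netezza_1000 = price_per_tb(200_000, 10)

print(f"Skimmer: ${skimmer:,.0f} per TB")
print(f"Netezza 1000 (assumed 10TB): ${netezza_1000:,.0f} per TB")
```

On those assumptions the Skimmer works out at about $12,500 per terabyte, against roughly $20,000 for the entry Netezza 1000.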

There is a possible cloudy future to analytics in the mid range, and IBM could be pointing the way. In February, the vendor completed its $440m cash acquisition of retail analytics software provider DemandTec.

Here's the interesting bit: DemandTec offered its software on private slices of its own cloud, which was backed by Netezza iron. Customers who wanted their iron and software on premise, and had the cash to pay for such a luxury, could bring it inside the corporate firewall.

Google has the answer

But mid-market companies that want to do sophisticated analytics may not want to own the iron so much as run the algorithms against their data.

That is certainly what Google thinks will happen for many customers, which is why it has launched BigQuery. It is in beta testing now and available on an invitation-only basis.

Google says the BigQuery engine will be able to scan billions of rows of data in seconds and scale across terabytes of data and trillions of records – and use an SQL-like query language to kick off the data munching.

And if companies don't want to get their hands dirty sorting out BigQuery, then there will be service providers that sit in front of it, masking the complexity.

Bime me up

We Are Cloud, a startup founded in southern France by Rachel Delacour and Nicolas Raspal, has created a front-end for BigQuery called Bime (pronounced "beam"), a business analytics tool that runs on Amazon's Web Services compute cloud and stores data in Google BigQuery.

The Bime service comes in workgroup, enterprise and premium editions, costing a mere $60, $120, or $240 per month, with ten users and varying features on dashboards, connectors, storage and dataset row counts.

The company has 200 customers, most of them outside France, and the service is available in French, English, Dutch and Chinese, with other languages in the works. It is designed for sharing data and query results through dashboards and other graphical representations.

"Traditional on-premise business intelligence tools are not inherently collaborative or cost effective," Delacour said, introducing the Bime front-end at the recent Structure Data 2012 conference in New York.

"Cloud solutions are, even though they are not necessarily good at delivering performance on all data sets."

It probably beats trying to do business intelligence in Excel, which is what most mid-market customers are still trying to do.