Data Replication is the Cure

To bring the discussion full circle, though, our original focus was on big data and the fact that even with an implementation of an analytical application (especially one developed using Hadoop), there are still scenarios in which data access latency proves to be a performance bottleneck. Luckily, the same data replication principles we discussed last week can be put to use in supporting big data analytics.

When implemented the right way, data replication services can provide extremely rapid access mechanisms for pulling data from one or more sources and creating copies that satisfy different business analytics needs, especially in a mixed workload. Separate replicas can be dedicated to complex queries, since rapid data access reduces the drag of multi-way joins. Data replication also enables faster reporting, thereby accelerating the analyze/review/decide cycle.
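One way to picture the mixed-workload idea is a simple routing rule that sends each class of query to its own replica, so long-running analytical joins never compete with operational lookups. This is only a minimal sketch; the replica names and workload classes are illustrative assumptions, not part of any particular replication product.

```python
# Hypothetical sketch: route each workload class to a dedicated replica
# so analytical queries and operational reads don't contend for the
# same copy of the data. All names here are made-up assumptions.

ROUTES = {
    "operational": "replica-oltp",   # short point lookups
    "analytics": "replica-olap",     # complex multi-way joins
    "reporting": "replica-reports",  # scheduled report extracts
}

def route_query(workload: str) -> str:
    """Pick the replica that serves this workload class."""
    return ROUTES.get(workload, "primary")

print(route_query("analytics"))  # -> replica-olap
print(route_query("unknown"))    # -> primary
```

The point of the sketch is the separation itself: each replica can be tuned (indexes, layout, hardware) for the one workload it serves.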

Data replication can enable real-time capture of transactions into an analytical environment, providing synchronized visibility without increasing demand on production transaction systems. In turn, the replication process eliminates the need for batch extracts and loads into the data warehouse, allowing decisions to be based on current data.
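The real-time capture described above can be sketched as a change-data-capture-style apply loop: each committed change on the source is applied to the analytical copy as it arrives, so the warehouse stays current without a periodic batch extract. The function and data shapes below are illustrative assumptions, not any vendor's API.

```python
# Minimal sketch of CDC-style replication: committed changes stream in
# commit order and are applied one by one to the analytical copy,
# replacing batch extract-and-load. All names are illustrative.

warehouse = {}  # stands in for the analytical copy, keyed by row id

def apply_change(change: dict) -> None:
    """Apply one captured change (insert/update/delete) to the copy."""
    op, key = change["op"], change["key"]
    if op in ("insert", "update"):
        warehouse[key] = change["row"]
    elif op == "delete":
        warehouse.pop(key, None)

# A captured stream of source transactions, in commit order:
for change in [
    {"op": "insert", "key": 1, "row": {"amount": 100}},
    {"op": "update", "key": 1, "row": {"amount": 150}},
    {"op": "delete", "key": 1},
]:
    apply_change(change)

print(warehouse)  # -> {} (the row was inserted, updated, then deleted)
```

Because changes arrive in commit order, the analytical copy converges on the same state as the source, which is what makes "decisions based on current data" possible.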

And from the big data perspective, replication allows data sets to be rapidly cloned and propagated to a number of different targets, which maps nicely to the Hadoop computational model. Pushing data out might mean a combination of partitioning some data sets across nodes and delivering full copies of others, a feat that is easily done by a data replication framework. So in fact, we do have a strategy that can somewhat accommodate the demand for reducing data access latency for big data analysis. I will discuss this topic more on May 23 for the Information-Management.com EspressoShot webinar, Treating Big Data Performance Woes with the Data Replication Cure.
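The fan-out pattern described above, partitioning one data set while fully copying another, can be sketched in a few lines. Here a large "fact" set is hash-partitioned across nodes while a small reference set is broadcast in full to every node, much as a map-side join does in the Hadoop model; the node names and data are made-up assumptions.

```python
# Illustrative sketch: distribute a large data set by partition while
# replicating a small reference data set in full to every target node.
# Node names and sample data are assumptions for the example only.

nodes = ["node-a", "node-b", "node-c"]
facts = list(range(10))          # large data set: partitioned
reference = {"US": "dollar"}     # small data set: fully copied

# Distribute: each fact row lands on exactly one node (hash partitioning).
partitions = {n: [] for n in nodes}
for row in facts:
    partitions[nodes[row % len(nodes)]].append(row)

# Replicate: every node receives its own full copy of the reference data.
copies = {n: dict(reference) for n in nodes}

print(partitions["node-a"])                          # -> [0, 3, 6, 9]
print(all(c == reference for c in copies.values()))  # -> True
```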