pivotalguru.com | greenplumguru.com

Replication

Overview

Replicating data to Greenplum is very simple with Outsourcer.

A challenge in many organizations is keeping a data warehouse or analytics platform up to date with production data. Refreshing terabytes of data on a daily basis is time-consuming and puts a huge burden on source systems.

By using Outsourcer’s Replication feature, you can keep tables up to date in Greenplum by applying only the new changes rather than taking a snapshot of the entire table every time!

How it works

Outsourcer takes an initial snapshot of the table from the source and loads it into Greenplum. When it takes the snapshot, it also creates a log table and triggers in your source system to capture the INSERT, UPDATE, and DELETE commands executed in your SQL Server or Oracle system. These changes are identified, bulk loaded into Greenplum, and then applied through a sophisticated, Greenplum-optimized function. This means only the changes are applied each time you execute the job to load your Greenplum table!
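
As a rough illustration of the snapshot-plus-triggers idea described above, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for the source database. The table, column, and trigger names (customer, customer_log, action, and so on) are invented for this example, and the trigger syntax is SQLite's. It is not the DDL that Outsourcer generates for SQL Server or Oracle; it only shows the general pattern of a log table filled by triggers.

```python
import sqlite3

# Hypothetical log table and triggers; all names are made up for this sketch.
TRIGGER_DDL = """
CREATE TABLE customer_log (
    change_id INTEGER PRIMARY KEY AUTOINCREMENT,
    action    TEXT NOT NULL,   -- I = insert, U = update, D = delete
    id        INTEGER NOT NULL,
    name      TEXT
);
CREATE TRIGGER customer_ins AFTER INSERT ON customer
BEGIN
    INSERT INTO customer_log (action, id, name) VALUES ('I', NEW.id, NEW.name);
END;
CREATE TRIGGER customer_upd AFTER UPDATE ON customer
BEGIN
    INSERT INTO customer_log (action, id, name) VALUES ('U', NEW.id, NEW.name);
END;
CREATE TRIGGER customer_del AFTER DELETE ON customer
BEGIN
    INSERT INTO customer_log (action, id, name) VALUES ('D', OLD.id, NULL);
END;
"""


def capture_demo():
    """Take a snapshot, install triggers, then record later changes."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customer VALUES (?, ?)",
                     [(1, "Ann"), (2, "Bob")])

    # Initial snapshot: the full table as it exists right now.
    snapshot = conn.execute(
        "SELECT id, name FROM customer ORDER BY id").fetchall()

    # From this point on, every change is written to the log table.
    conn.executescript(TRIGGER_DDL)
    conn.execute("INSERT INTO customer VALUES (3, 'Cid')")
    conn.execute("UPDATE customer SET name = 'Bobby' WHERE id = 2")
    conn.execute("DELETE FROM customer WHERE id = 1")

    # Only these logged rows would need to be shipped on the next run.
    changes = conn.execute(
        "SELECT action, id, name FROM customer_log ORDER BY change_id"
    ).fetchall()
    conn.close()
    return snapshot, changes


if __name__ == "__main__":
    snapshot, changes = capture_demo()
    print("snapshot:", snapshot)
    print("changes: ", changes)
```

In this sketch the snapshot holds the two original rows, while the log holds just the three later changes, which is all that a subsequent load would have to move.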

Other Change Data Capture (CDC) Programs

Other Change Data Capture (CDC) methods are half-cooked solutions. These tools read database log files directly instead of using triggers, which on paper looks like a better solution. Unfortunately, these tools are not multi-threaded, so they don’t utilize all of the CPU cores in your server. They also drive a CPU core to 100% while they process the log files.

Another problem with other CDC solutions is the application of the changes. Greenplum is not optimized for Online Transaction Processing (OLTP) activities and performs much better with large, set-based operations. This means you have to develop a solution that takes the files generated by the CDC tool, loads them into Greenplum, and then applies the INSERT, UPDATE, and DELETE steps in an optimized manner.
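
To make "set-based" concrete, the sketch below applies a whole batch of changes with two statements instead of replaying each change one row at a time. It again uses Python's sqlite3 module as a stand-in, with invented names (target, staging, change_id, action). It is one common pattern for this kind of work, not the function Outsourcer actually uses, and a real Greenplum version would use Greenplum's own SQL and loading tools.

```python
import sqlite3


def apply_change_batch(target_rows, change_rows):
    """Apply a batch of logged changes using two set-based statements.

    target_rows: (id, name) rows already in the warehouse table.
    change_rows: (change_id, action, id, name) rows, where action is
                 'I', 'U' or 'D' and change_id gives the order.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE target  (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE staging (change_id INTEGER, action TEXT,
                              id INTEGER, name TEXT);
    """)
    conn.executemany("INSERT INTO target VALUES (?, ?)", target_rows)
    # Bulk load the whole batch of changes into a staging table.
    conn.executemany("INSERT INTO staging VALUES (?, ?, ?, ?)", change_rows)

    # Statement 1: remove every key that the batch touched.
    conn.execute("DELETE FROM target WHERE id IN (SELECT id FROM staging)")

    # Statement 2: insert the latest version of each touched key,
    # skipping keys whose final action was a delete.
    conn.execute("""
        INSERT INTO target (id, name)
        SELECT s.id, s.name
        FROM staging AS s
        JOIN (SELECT id, MAX(change_id) AS last_change
              FROM staging GROUP BY id) AS m
          ON m.id = s.id AND m.last_change = s.change_id
        WHERE s.action <> 'D'
    """)

    result = conn.execute(
        "SELECT id, name FROM target ORDER BY id").fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    existing = [(1, "Ann"), (2, "Bob")]
    batch = [
        (1, "I", 3, "Cid"),
        (2, "U", 2, "Bobby"),
        (3, "D", 1, None),
        (4, "U", 3, "Cyd"),
    ]
    print(apply_change_batch(existing, batch))
```

However many changes the batch contains, the target table is touched by only one DELETE and one INSERT, which is the kind of large operation that suits Greenplum better than thousands of single-row statements.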

Lastly, the CDC programs are expensive and time-consuming to learn and implement.

With Outsourcer, the triggers are executed by each database session, so the work is multi-threaded and spread more evenly through the day, with no CPU core spiking. No coding is needed to use Replication. Last but not least, Outsourcer is free!

Outsourcer provides a free, fast, reliable, and fully cooked method to replicate data from SQL Server and Oracle to Greenplum!