Getting Started with Oracle GoldenGate

February 9, 2011

In this article, we'll look at what takes place within Oracle GoldenGate, that is, how data gets from the source database to the target database. As the diagram below shows, there are four steps to perform in configuring GoldenGate.

Typically, we replicate from one database to another, but in the case of a tutorial (covered later), we'll replicate from one schema to another. That is just as valid as database-to-database replication, as there is no reason why you could not perform ETL-type transformations on data before it is consumed elsewhere (either shipped to another database or viewed by users in reports).

Just like Streams, you have to synchronize your databases (referred to as the source and target when the distinction matters). One way to achieve "source = target" is to restore a backup of the source on the target. Another is to perform an initial load of data, taken from the source and propagated to the target.

GoldenGate offers four ways to do an initial load, but the general guidance is to use native RDBMS tools first (e.g., restore a backup on the target system). Native tools are more tightly coupled with the database than GoldenGate is; GoldenGate's loose coupling, on the other hand, is precisely the property that makes it work well in heterogeneous environments.
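As a sketch of one GoldenGate-native method (a direct load, where a special initial-load Extract reads rows straight from the source tables and hands them to a Replicat task on the target), the parameter file might look like the following. The group names, credentials, host name, and table are hypothetical, not from this article:

```
-- Initial-load Extract, added in GGSCI with:
--   ADD EXTRACT EINI01, SOURCEISTABLE
EXTRACT EINI01
USERID gguser, PASSWORD ggpass
RMTHOST targethost, MGRPORT 7809
RMTTASK REPLICAT, GROUP RINI01
TABLE HR.EMPLOYEES;
```

SOURCEISTABLE tells GoldenGate to read directly from the table rather than the redo logs, which is what distinguishes an initial-load Extract from a change-capture Extract.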

After the data is (relatively) synchronized, the next step is setting up processes to capture changes, pump the data to the target, and then replicate the data. Using Streams' Capture – Propagate – Apply as a framework, GoldenGate's counterpart is Extract – Data Pump – Replicat.

Overall, there is a Manager process on each end (and if replicating onto the same server, just one is needed, as it can be shared by the source and target). As the name implies, the Manager process is responsible for much of the background support processing. Changes to tables are captured by one of two kinds of Extract process. All processes have parameter files, and it is in these parameter files that you define what happens, or of more interest, which tables are being replicated. GoldenGate learns about the tables from the parameter file, and with supplemental logging turned on at the database (and table) level, it mines the redo logs and extracts the changes, i.e., the DML, into what is called a trail file.
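To make this concrete, enabling table-level supplemental logging and defining a primary (change-capture) Extract might look like the sketch below. The user, group, schema, and trail path are hypothetical:

```
-- In GGSCI: log in to the database, then enable supplemental
-- logging for the table being replicated
GGSCI> DBLOGIN USERID gguser, PASSWORD ggpass
GGSCI> ADD TRANDATA HR.EMPLOYEES

-- Primary Extract parameter file (EORA01.prm)
EXTRACT EORA01
USERID gguser, PASSWORD ggpass
EXTTRAIL ./dirdat/aa
TABLE HR.EMPLOYEES;
```

The EXTTRAIL line names the local trail file this Extract writes to; the TABLE clause is how GoldenGate "knows about" the table, as described above.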

You can think of a trail file as a dump file (proprietary format and all that), but more precisely, the trail file holds unstructured data. So far, we have "DML vectors" captured and stored in a trail file. Now we have to get this information over to the target, and that is where a second Extract process comes into play. This process acts as the data pump, the transport mechanism that moves extracted data from the source to the target.
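A data-pump parameter file for this secondary Extract might look like the following sketch; the group name, host, and paths are hypothetical:

```
-- Data pump (secondary Extract) parameter file (PORA01.prm)
EXTRACT PORA01
USERID gguser, PASSWORD ggpass
RMTHOST targethost, MGRPORT 7809
RMTTRAIL ./dirdat/bb
PASSTHRU
TABLE HR.EMPLOYEES;
```

RMTHOST/MGRPORT point at the Manager on the target, RMTTRAIL names the remote trail to write there, and PASSTHRU tells the pump to move the data along without evaluating it (appropriate when no transformation happens in transit).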

Speaking of the trail file for a moment, it's not that hard to see what it contains. GoldenGate supplies a Logdump utility (located in the same directory as the GGSCI executable) that can be used to browse through the file. You can see the before and after images of data (useful for forensics or troubleshooting), and you can embed tokens in the file (much like assigning environment variables to placeholders; think of "USER" in Oracle and how that translates to your name or logon).
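A minimal Logdump session to browse a trail looks like this (the trail file path is hypothetical):

```
Logdump> OPEN ./dirdat/aa000000
Logdump> GHDR ON
Logdump> DETAIL ON
Logdump> NEXT
```

GHDR ON displays each record's header, DETAIL ON shows the column-level data within the record, and NEXT steps through the trail one record at a time.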

The data is sent over the network (plain TCP/IP, entirely outside of Net Services) to the target; having no single point of failure is another feature of GoldenGate. The destination at the target is typically (but doesn't have to be) another trail file. A Collector process assembles the data/statements for the Replicat, and the Replicat process then applies the changes (i.e., SQL statements) to the target.
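A bare-bones Replicat parameter file for the apply side might look like this sketch, assuming the source and target tables have identical structures (group, user, and schema names are hypothetical):

```
-- Replicat parameter file (RORA01.prm)
REPLICAT RORA01
USERID gguser, PASSWORD ggpass
ASSUMETARGETDEFS
MAP HR.EMPLOYEES, TARGET HR2.EMPLOYEES;
```

ASSUMETARGETDEFS is the "tables are identical" shortcut; the MAP clause pairs each source table with its target.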

GoldenGate supports up to 300 processes per instance, and if you are using that many, you will have a bookkeeping/accounting nightmare on your hands: the naming convention for trail files only lets you specify the first two characters (the rest being a six-digit sequence starting with 000000). When a basic "capture change, transport it, and apply it" setup is up and running, you'll see two Extract processes and one Replicat process, in addition to a Manager process (no Manager = nothing happens).

Of course, it is more convenient to take data from A and have it fit nicely into B, so table name, column name, column order, and data type consistency would be nice. Recognizing that, GoldenGate defaults to assuming the tables are identical. If they are similar but not identical, an easy workaround is a column mapping, which acts like a translator between the source and target (for example, source LAST_NAME maps to target LNAME). Otherwise, you can create a source definitions file (a kind of template) and tell GoldenGate, via the process parameter file, to use it for the mapping.
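The LAST_NAME-to-LNAME translation above might be expressed in the Replicat parameter file as follows; the definitions file path and table names are hypothetical:

```
-- Use a definitions file generated on the source by the DEFGEN utility
SOURCEDEFS ./dirdef/hr.def
MAP HR.EMPLOYEES, TARGET HR2.EMP,
COLMAP (USEDEFAULTS, LNAME = LAST_NAME);
```

USEDEFAULTS maps every identically named column automatically, and each target = source pair in COLMAP handles the exceptions.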

At this point, with a better understanding of the processes behind the scenes, running a GoldenGate demo will make better sense. For each process, Extract or Replicat, the build steps are the same: create a parameter file, add the process to GoldenGate (so the Manager knows about it), and then manage it (start or stop it, with the option to have a particular process autostart).
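For a change-capture Extract, those build steps translate into GGSCI commands along these lines (the group name and trail path are hypothetical):

```
GGSCI> EDIT PARAMS EORA01
GGSCI> ADD EXTRACT EORA01, TRANLOG, BEGIN NOW
GGSCI> ADD EXTTRAIL ./dirdat/aa, EXTRACT EORA01
GGSCI> START EXTRACT EORA01
GGSCI> INFO EXTRACT EORA01
```

EDIT PARAMS opens the parameter file, ADD EXTRACT registers the process with the Manager (TRANLOG means it reads the transaction log, starting from now), ADD EXTTRAIL associates the process with its trail, and START/INFO manage and check it.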

The naming convention for parameter files is E or R (Extract or Replicat), followed by three letters representing the where or what (INI for an initial-load Extract, ORA for data from Oracle, PMP for the pump, i.e., the secondary Extract), and finally an identifier (some letters or numbers; with so many possibilities, you will definitely want to document them, both within the parameter file and elsewhere).

In the next article on GoldenGate, I’ll show how you can extract data from a table into a plain, old, regular flat file, as in, “How do you create a comma-separated value file using GoldenGate?” The Logdump utility will also be introduced, and we’ll look at how you can do data transformation along the way.