The first record is a standard trail file header and is of little interest to us. We're going to take a look at the second record. First of all, as you can see, we've got a fully qualified table name stored in there. We can see the inserted value as well. Now, if all that stuff got stored in plain text, that could explain it... let's check it out:
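Here's a minimal sketch of how you can count the occurrences yourself; the trail file path and the table name below are placeholders, not the actual names from my test:

    # Count how many times the fully qualified table name appears
    # verbatim in the trail file (path and name are made up).
    TRAIL = "./dirdat/aa000000"
    NAME = b"SOME_SCHEMA.SOME_TABLE"

    with open(TRAIL, "rb") as f:
        data = f.read()

    print(data.count(NAME))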

Indeed, we've got the fully qualified table name appearing 10 thousand times in our trail file! The longer your schema and/or table names are, the bigger your trail file will be; for example, a 30-byte qualified name repeated 10,000 times adds roughly 300 KB of pure naming overhead. The main reason Oracle Streams (as well as Oracle's redo) uses object identifiers instead is to avoid getting into exactly this kind of trouble. Storing full names makes GoldenGate configuration easier, as it doesn't have to store the mapping information on the target database, but you also pay the price with every row modification that gets captured.

The story doesn't end there, however. Let's take a look at the inserted data. How do you think that got stored?
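One quick way to eyeball the encoding is a crude hex/ASCII dump of the first chunk of the file; again, the trail path is a placeholder:

    # Dump the first 256 bytes as hex plus printable ASCII so the
    # plain-text table name and column values are easy to spot.
    with open("./dirdat/aa000000", "rb") as f:
        chunk = f.read(256)

    for off in range(0, len(chunk), 16):
        row = chunk[off:off + 16]
        hexpart = " ".join(f"{b:02x}" for b in row)
        asciipart = "".join(chr(b) if 32 <= b < 127 else "." for b in row)
        print(f"{off:08x}  {hexpart:<47}  {asciipart}")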

That's right - all numbers got stored in plain text. The fundamental problem here is that the trail file does not store data types; everything just gets converted to strings. This helps when dealing with heterogeneous data sources but also makes the storage of certain data types very inefficient. Again, you have to pay the price somewhere. If you gzip the above trail file, you'll get a whopping 25x compression ratio.
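If you want to measure that yourself, a quick sketch (the trail path is, once more, a placeholder):

    import gzip, os

    SRC = "./dirdat/aa000000"

    # gzip the trail and compare sizes; fine for a test-sized file.
    with open(SRC, "rb") as fin, gzip.open(SRC + ".gz", "wb") as fout:
        fout.write(fin.read())

    print(os.path.getsize(SRC) / os.path.getsize(SRC + ".gz"))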

You may ask yourself how column names are stored. The answer is that there are no column names in the trail file, just column positions. That's why GoldenGate requires the column order to match between source and target databases; otherwise your data will get mapped to the wrong columns. Add the lack of data types and you might be in for some very interesting results. If the column order doesn't match, you'll have to use defgen to get your data across properly.
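For the record, a defgen parameter file is just a few lines along these lines (schema, table and credentials are placeholders); you run it through the defgen utility and point the replicat at the resulting definitions file with SOURCEDEFS:

    -- Hypothetical defgen parameter file (dirprm/defgen.prm)
    DEFSFILE ./dirdef/source.def
    USERID gguser, PASSWORD ggpass
    TABLE some_schema.some_table;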

In a nutshell, watch out for tables/schemas with long names and lots of numeric data.

Page 289 of the reference guide (http://download.oracle.com/docs/cd/E18101_01/doc.1111/e17791.pdf) provides a formula for calculating the TCPBUFSIZE value, but I haven't done any performance tests to see how much real-world impact it has. I'd imagine that on networks with big latencies a larger buffer should provide more benefit, so it depends on the circumstances.
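If I read it right, the formula boils down to the bandwidth-delay product. A quick back-of-the-envelope check with made-up numbers:

    # Bandwidth-delay product with made-up numbers:
    # a 100 Mbit/s link with 40 ms round-trip latency.
    bandwidth_bps = 100_000_000   # bits per second
    rtt = 0.040                   # round-trip time in seconds

    print(int(bandwidth_bps / 8 * rtt))   # 500000 bytes, ~0.5 MB buffer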

GG uses zlib for data compression, so you should see some good benefits, especially in corner cases like the one above.
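A toy illustration of why the repetitive plain-text content above compresses so well (the payload is made up):

    import zlib

    # Fake trail payload: a qualified table name plus a numeric
    # value repeated 10,000 times, mimicking the insert above.
    payload = b"SOME_SCHEMA.SOME_TABLE,12345\n" * 10_000

    print(len(payload) / len(zlib.compress(payload)))   # hundreds of x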

I have a question about Flashback Data Archive tables in GoldenGate replication. Are these tables usually excluded in an extract, like "TABLEEXCLUDE .SYS_FBA_*"? What is the approach for replicating and doing an initial load of Flashback Data Archive tables?

Here is my problem. I exported a user schema using Oracle Data Pump and imported it into the destination database. GoldenGate abended saying some tables don't exist on the destination. I checked, and there were about 200 tables that were not exported because they are FBDA tables and Data Pump just ignores them. So I recreated them on the destination with scripts.

So, how is this done? Should source and destination each have their own FBDA, with these tables neither recreated with scripts on the destination nor replicated? Or do they have to be replicated along with their contents?