Category Archives: Oracle

It is Oracle OpenWorld time once again. Next week I fly to San Francisco to join the largest technology event in the world. For me, this will be the year of Complex Event Processing, Advanced Analytics, and Data Warehousing at OpenWorld.

After a year as a non-speaker (2011), this year I will be presenting on Oracle R Enterprise and ODM, going over several use cases. Here are the session details:

As an embedded option of the Oracle relational database engine, OWB is still my favourite ETL (or, to be correct, ELT) tool. In addition, although Oracle positions ODI as its strategic tool for this purpose, it still keeps investing in OWB, at least in 11g Release 2.

As a consultant in the VLDB domain, one of the most popular questions I hear today is “How can I feed my data warehouse/reporting environment in real time?” Another is “How can I offload reporting activity from my OLTP environment without creating any time gap between reality and what is being reported?” In addition, I am a bit depressed to see people treating CDC (Change Data Capture) products as DR (Disaster Recovery) solutions.

It was two years ago, at an ACED briefing, that Thomas Kurian declared GoldenGate to be Oracle’s strategic real-time integration solution. Since then, I have spent quite a lot of time trying to understand the merits and drawbacks of this product. I have talked to customers who want to use it, who are already using it, and who are suffering from it. Almost all sites are annoyed by the same problems:

It is very hard to configure GoldenGate

It is very hard to monitor and manage GoldenGate

Oracle documentation is still not sufficient for them.

To be honest, it is hard to say that they are wrong.

Last week I read John P. Jeffries’s Oracle GoldenGate 11g Implementer’s Guide, and I can easily say that it is a nice piece of material built to make the reader a successful GoldenGate implementer. Unlike Oracle’s formal documentation, it does not give dictionary-style definitions of GoldenGate concepts such as Extract, Trail File, and Data Pump. Instead, the book is structured in a “Let me show what I define above” fashion. It is full of details that show you how to implement up-and-running GoldenGate systems. However, I will only write about my favorite sections here.

Chapter 6: Configuring GoldenGate for HA explains how to configure GoldenGate on a RAC database. The chapter covers integrating GoldenGate with the clusterware to enable automatic failover. I have seen customer sites still writing custom scripts for this, so this chapter is a good how-to for RAC implementers.

A gentleman… An Oracle expert… An esteemed community figure…
Yes. As a reader of this Oracle-focused blog, you can probably figure out whom I am talking about. Last week (on Thursday), we managed to hold the first Turkey Oracle User Group (TROUG) Day at Bahcesehir University.

As Turkish community members, we discussed the keynote speaker and agreed on one name: Jonathan Lewis.
The problem was that, for such a popular figure in the community, it was almost impossible to find a free slot for our event in Jonathan’s schedule. I sent him the invitation one Sunday afternoon and asked him to attend, and Jonathan simply replied that he would be with us and give the keynote speech. Almost a month after our exchange of emails, Jonathan was on stage in the Bahcesehir University auditorium as our keynote speaker, talking “just about joining two tables”.
It was amazing that we had finally pulled off this user group day and that the man whose books are the masterpieces of my Oracle library was on stage.
Thank you, Jonathan, from me personally and as a founding member of TROUG, for honoring us. I hope to see you again soon…

After migrating more than 120 TB of data to Exadata V2 and delivering 4 Exadata hands-on courses in Europe (Germany, Russia, and Belgium), I heard last week from my friend Zekeriya Besiroglu that an Exadata certification exam had recently been opened. I took the exam last week and passed with a score of 93%.

As a hint for those of you who will take this exam: I/O Resource Manager seems to be the most important topic of the exam, although this is not true for Exadata customers in my region :)

I believe all of us sometimes suffer from the limitations of Oracle’s SH, SCOTT, etc. schemas when we need a sufficiently large playground for our tests. In this post you will find how to create your own TPC-H playground database on Linux.

Download TPC-H Data Generator (dbgen)

TPC, the council behind the TPC-H benchmark, delivers a standardized data generation tool for all benchmarks. You can download this tool from http://www.tpc.org/tpch/default.asp (The version I will be using can be downloaded from http://www.tpc.org/tpch/spec/tpch_2_12_0_b5.zip). This bundle contains a set of C files that are compiled to build dbgen. Copy the zip file into one of your folders and make sure that your Linux environment has the necessary toolchain to compile C code (gcc, make, etc.).
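As a rough sketch of the build step: the dbgen sources I have seen ship a template makefile named makefile.suite in which the compiler, database, machine, and workload settings are left blank. The helper below fills them in with sed. The function name configure_dbgen_makefile is hypothetical, and the values gcc, ORACLE, LINUX, and TPCH are my assumptions for a Linux host targeting Oracle. Check the comments in your own makefile.suite for the values your dbgen version actually accepts.

```shell
#!/bin/sh
# Hypothetical helper: derive a makefile from the makefile.suite template
# by filling in the four settings that the template leaves blank.
# The default values (gcc, ORACLE, LINUX, TPCH) are assumptions; whether
# ORACLE is an accepted DATABASE value depends on the dbgen version.
configure_dbgen_makefile() {
    src="$1"                 # path to makefile.suite
    dst="$2"                 # path of the makefile to write
    db="${3:-ORACLE}"        # DATABASE value, overridable by the caller
    sed -e 's/^CC[[:space:]]*=.*/CC = gcc/' \
        -e "s/^DATABASE[[:space:]]*=.*/DATABASE = $db/" \
        -e 's/^MACHINE[[:space:]]*=.*/MACHINE = LINUX/' \
        -e 's/^WORKLOAD[[:space:]]*=.*/WORKLOAD = TPCH/' \
        "$src" > "$dst"
}

# Typical use from inside the unpacked dbgen directory:
#   configure_dbgen_makefile makefile.suite makefile && make
```

All other lines of the template are copied unchanged, so any version-specific settings in your bundle are preserved.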

Generate Your Playground Data

Now you are ready to create your playground database. For those of you who are not familiar with the TPC-H model, refer to the relational model below to get an idea of what it looks like.

TPC-H Data Model

Use dbgen to generate 4 GB of TPC-H benchmark data. To be able to load the generated files in parallel through Oracle external tables, we will use the file split feature of dbgen, running it once per chunk (remember that this step might take some time and can be parallelized depending on your CPU and I/O capacity).
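A minimal sketch of the eight executions, using the flags explained under DBGEN Options (-s for the scale factor, -S for the chunk number, -C for the total number of chunks, -v for verbosity). The DBGEN and DRY_RUN variables and the generate_chunks function are hypothetical names for this sketch. By default it only prints the commands, so you can check them before spending CPU time.

```shell
#!/bin/sh
# Print (or run) one dbgen invocation per chunk.
# DBGEN and DRY_RUN are hypothetical knobs for this sketch:
#   DBGEN   path to the compiled dbgen binary (assumed ./dbgen)
#   DRY_RUN 1 = only print the commands, anything else = execute them
DBGEN="${DBGEN:-./dbgen}"
DRY_RUN="${DRY_RUN:-1}"

generate_chunks() {
    scale="$1"    # scale factor, roughly the data size in GB
    chunks="$2"   # number of pieces for each large table
    step=1
    while [ "$step" -le "$chunks" ]; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "$DBGEN -s $scale -S $step -C $chunks -v"
        else
            "$DBGEN" -s "$scale" -S "$step" -C "$chunks" -v
        fi
        step=$((step + 1))
    done
}

# Scale factor 4, split into 8 chunks.
generate_chunks 4 8
```

Run it with DRY_RUN=0 from the directory that contains dbgen to actually generate the data. The loop is sequential on purpose; since each invocation handles one chunk, several of them can be started side by side if your CPU and I/O capacity allow it.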

When you are done with all 8 executions, you will have *tbl* files in your current working directory. These are pipe-separated files that you will load into your database.

DBGEN Options

-s 4 specifies a scale factor of 4, meaning that we generate approximately 4 GB of benchmark data. -S 1 instructs dbgen to generate the first of 8 chunks. -C 8 is the total number of files for each large dataset (excluding the nation and region tables). -v sets verbose output for dbgen.

DBGEN Output

In total, the *tbl* files will be approximately 4 GB in size.

[oracle@localhost tpch]$ du -ch *.tbl* | tail -1
4.2G total

It is a good idea to compress all those files with gzip so that they consume minimal disk space and reduce read I/O when CPU power is abundant.
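The compression step could be sketched as follows. The function name compress_tbl_files is hypothetical, and the sketch assumes the generated files sit in a single directory and match the *.tbl* pattern shown above.

```shell
#!/bin/sh
# Compress every dbgen output file (*.tbl*) in a directory with gzip.
# compress_tbl_files is a hypothetical helper name for this sketch.
compress_tbl_files() {
    dir="${1:-.}"    # directory holding the generated files
    for f in "$dir"/*.tbl*; do
        # Skip the literal pattern when nothing matches.
        [ -f "$f" ] || continue
        # Skip files that are already compressed.
        case "$f" in
            *.gz) continue ;;
        esac
        # gzip replaces each file with a .gz version of itself.
        gzip "$f"
    done
}

# Typical use from the directory that holds the files:
#   compress_tbl_files .
```

Because already compressed files are skipped, the function can be rerun safely after generating additional chunks.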

I was in Sweden (in a Japanese spa hotel near Stockholm) from Monday to Wednesday to attend the ORCAN event. Thanks to Patrik Norlander and his friends, the event was really perfect. I gave two presentations and attended presentations by other ACEs and experts.

I spent my time talking with Dan Morgan about a possible Turkey Oracle User Group event, with Joze Senegacnik about Oracle and planes, with Dimitri Gielis about whether APEX 4.0 is mature enough to build large-scale applications, and finally with Luca Canali about recent Oracle Streams projects at CERN.

After I attended the CERN team’s CDC implementation session at UKOUG 2009, the new features of Oracle Streams introduced with 11g caught my attention. While searching for a suitable resource, I came across this extremely helpful resource.

The problem with many Oracle books is that they either paraphrase Tahiti (or My Oracle Support notes, although that is illegal) or are built on pseudo examples that invent a problem just to present a solution. This book is definitely an exception, and it is not for my bookshelf but for my briefcase.

Thanks to Ann L. R. McKinnell and Eric Yen. They start with a warm-up chapter (Chapter 1) explaining the underlying concepts of the streaming idea and its proper usage, the Oracle CDC components, and a brief introduction to XStreams, which is detailed in Chapter 6.

Chapter 2 is for database architects who are responsible for designing the replication system so that it works smoothly for their business. There is an invaluable checklist covering almost everything that should be taken into consideration before starting.

Chapter 3 is a kind of implementation chapter for Chapter 2. To satisfy the checklist given in the previous chapter, this chapter defines the necessary configuration details.

Chapter 4, Chapter 5, and Chapter 6 explain different ways of replication in detail. Keep in mind that you can read Chapter 5 online.

Chapter 7 and Chapter 8 are my favorite ones, and I believe they are the reason why this book is an excellent reference for all implementors. Chapter 7 explains the importance of documentation in a replication environment, how you can automatically generate your environment map, and how you can gather performance data with Oracle utility packages. Chapter 8 is all about troubleshooting in an Oracle Streams environment. I think this is the most important part because people keep on changing what you have implemented. The methodology and toolkit to track, diagnose, and solve a problem in your streaming environment are laid out very clearly in two sub-chapters and 13 bulletins.

To sum up, Oracle 11g Streams Implementor’s Guide is a real niche reference not only for those who are trying to implement an Oracle CDC environment but also for those who wish to understand the essence of replication concepts (almost all of which are the same across products, with slight changes in terminology and in the way they have been implemented).