
What is a container?

An image is a lightweight, stand-alone, executable package that includes everything needed to run a specific piece of software. It can include code, libraries, environment variables and configuration files.

A container is a running instance of an image. It exists in memory and runs in an environment isolated from the host. A container can access host files and ports if this is allowed.

Containers run applications natively on the host kernel. They perform better than virtual machines because VMs access resources through a hypervisor, whereas containers get native access: each one runs as a discrete process, taking no more memory than any other executable.

GoldenGate 12.3.0.1.2 (the December release) has been uploaded to edelivery.oracle.com. If you use GoldenGate Microservices Architecture, it is essential to download the new version because it includes new features and bug fixes.

I couldn’t find instructions on how to upgrade GG 12.3 Initial Release to GG 12.3.0.1.2 December Release, so I chose the following method:

1) Install GG 12.3.0.1.2 into a new directory (for example, /u01/app/oracle/product/gg/12.3_ma/db122_december; this is GG for DB 12c).

Apache Hadoop is designed to handle and process data that typically comes from non-relational data sources, in volumes beyond what relational databases can handle.

Oracle Data Integrator is a transparent and heterogeneous Big Data integration technology based on an open and lightweight ELT architecture. It runs a diverse set of workloads, including Spark, Spark Streaming and Pig transformations, to enable customers to solve their most complex and time-sensitive data transformation and data movement challenges. It is a core component of Oracle Data Integration solutions, integrating seamlessly with the rest of Oracle’s Data Integration and Business Application solutions.

Oracle Data Integrator for Big Data provides the following benefits to customers:

It brings expanded connectivity to various Big Data sources, such as Apache Kafka or Cassandra

It decreases time to value for Big Data projects

It provides a future-proof Big Data integration technology investment

It streamlines and shortens the Big Data development and implementation process

Currently, ODI supports:

Generation of Pig Latin transformations: users can choose Pig Latin as their transformation language and execution engine for ODI mappings. Apache Pig is a platform for analyzing large data sets in Hadoop and uses the high-level language Pig Latin for expressing data analysis programs.

Generation of Spark and Spark Streaming transformations: ODI mappings can also generate PySpark. Apache Spark is a transformation engine for large-scale data processing. It provides fast in-memory processing of large data sets. Custom PySpark code can be added through user-defined functions or the table function component.

Orchestration of ODI jobs using Oozie: users can choose between the traditional ODI Agent and Apache Oozie as the orchestration engine for jobs such as mappings, packages, scenarios or procedures. Apache Oozie allows fully native execution on Hadoop infrastructure without installing an ODI agent for orchestration. Users can use Oozie tooling to schedule, manage and monitor ODI jobs. ODI uses Oozie’s native actions to execute Hadoop processes and conditional branching logic.
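To make the Oozie part more concrete, here is a minimal Python sketch that builds an Oozie workflow definition with a decision node (Oozie’s native conditional branching) and a single action. It is not what ODI actually generates. The workflow name, the HDFS paths, the `fs` action standing in for a mapping step and the `uri:oozie:workflow:0.5` schema version are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed schema version; use the one supported by your Oozie installation.
WF_NS = "uri:oozie:workflow:0.5"


def build_workflow(name: str, condition: str) -> str:
    """Return a minimal workflow.xml (hypothetical layout) as a string."""
    app = ET.Element("workflow-app", {"name": name, "xmlns": WF_NS})
    ET.SubElement(app, "start", {"to": "check-input"})

    # Decision node: Oozie evaluates the EL expression in <case> and
    # follows the first matching transition, otherwise the <default> one.
    decision = ET.SubElement(app, "decision", {"name": "check-input"})
    switch = ET.SubElement(decision, "switch")
    case = ET.SubElement(switch, "case", {"to": "stage-data"})
    case.text = condition
    ET.SubElement(switch, "default", {"to": "end"})

    # An fs action stands in for a real processing step (hypothetical path).
    action = ET.SubElement(app, "action", {"name": "stage-data"})
    fs = ET.SubElement(action, "fs")
    ET.SubElement(fs, "mkdir", {"path": "${nameNode}/tmp/odi_stage"})
    ET.SubElement(action, "ok", {"to": "end"})
    ET.SubElement(action, "error", {"to": "fail"})

    # Kill node reached on error; wf:errorMessage and wf:lastErrorNode are Oozie EL functions.
    kill = ET.SubElement(app, "kill", {"name": "fail"})
    message = ET.SubElement(kill, "message")
    message.text = "Failed: [${wf:errorMessage(wf:lastErrorNode())}]"

    ET.SubElement(app, "end", {"name": "end"})
    return ET.tostring(app, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical condition: only stage data if the input directory exists.
    print(build_workflow("odi_demo", "${fs:exists('/data/in')}"))
```

The decision node is what lets a workflow branch on a runtime condition without an external agent; Oozie itself evaluates the expression when the workflow runs.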

You can use Oracle Data Integrator to design the ‘what’ of an integration flow and assign knowledge modules to define the ‘how’ of the flow through an extensible range of mechanisms. The ‘how’ is the technology that executes the flow, whether it is Oracle, Teradata, Hive, Spark, Pig, etc.
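The split between the ‘what’ and the ‘how’ can be illustrated with a toy Python model. A technology-neutral mapping is handed to interchangeable generator functions that play the role of knowledge modules. This is not ODI’s API: the `Mapping` class, the generator functions and the generated statements are hypothetical. They only show that the same logical flow can be rendered for different execution engines.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Mapping:
    """The 'what': a logical, technology-neutral flow (toy model)."""
    source: str
    target: str
    columns: list[str]
    condition: str


def sql_km(m: Mapping) -> str:
    """A 'how' for a SQL engine: INSERT ... SELECT."""
    cols = ", ".join(m.columns)
    return f"INSERT INTO {m.target} SELECT {cols} FROM {m.source} WHERE {m.condition}"


def pyspark_km(m: Mapping) -> str:
    """A 'how' for Spark: equivalent PySpark DataFrame code, as text."""
    cols = ", ".join(repr(c) for c in m.columns)
    return (f"spark.table({m.source!r}).filter({m.condition!r})"
            f".select({cols}).write.insertInto({m.target!r})")


# Hypothetical registry of 'knowledge modules' keyed by technology.
KNOWLEDGE_MODULES: dict[str, Callable[[Mapping], str]] = {
    "sql": sql_km,
    "spark": pyspark_km,
}


def generate(m: Mapping, technology: str) -> str:
    """Render the same logical mapping with the chosen technology's module."""
    return KNOWLEDGE_MODULES[technology](m)


if __name__ == "__main__":
    flow = Mapping("sales", "sales_big", ["id", "amount"], "amount > 100")
    for tech in KNOWLEDGE_MODULES:
        print(tech, "->", generate(flow, tech))
```

Changing the execution engine means picking a different entry in the registry, while the `Mapping` itself stays untouched.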

Let’s configure Oracle Data Integrator for Cloudera Hadoop. You don’t need to install any components on your Hadoop cluster; a remote connection is enough to manage all jobs on Hadoop.

Many tools use Hadoop as a backend for performing some jobs. For example, we can use Kafka (or HDFS) as a staging area for Oracle Data Integrator or GoldenGate. It is usually better to install a separate node used exclusively by ODI or GoldenGate, because if you install them on a Hadoop node, they will interfere with other workloads. Hadoop is a cluster: each node does its part of the work, and the whole job is not finished until the last node finishes. So the caravan moves at the speed of the slowest camel.
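The ‘slowest camel’ effect can be expressed as simple arithmetic. A distributed job finishes only when its slowest node finishes, so its duration is the maximum of the per-node times, not the average. The Python sketch below uses made-up timings, and the 1.5× slowdown for a node that also hosts ODI or GoldenGate is an assumed figure, not a measurement.

```python
def job_duration(node_seconds: dict[str, float]) -> float:
    """A job completes when the last (slowest) node completes."""
    return max(node_seconds.values())


def with_interference(node_seconds: dict[str, float], node: str, factor: float) -> dict[str, float]:
    """Return a copy in which one node's task takes `factor` times longer."""
    slowed = dict(node_seconds)
    slowed[node] = slowed[node] * factor
    return slowed


if __name__ == "__main__":
    # Hypothetical per-node task times in seconds.
    times = {"node1": 60.0, "node2": 62.0, "node3": 61.0}
    print(job_duration(times))  # 62.0
    # Assume extra client software on node2 slows its share of the work by 1.5x.
    print(job_duration(with_interference(times, "node2", 1.5)))  # 93.0
```

Even though two of the three nodes are unaffected, the whole job takes 93 seconds instead of 62, which is why a dedicated node for client software is preferable.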

Hadoop vendors call such a special node an “Edge” or “Gateway” node. These nodes don’t contain any data and don’t participate in data processing, but they host client software and the Hadoop configuration. Let’s look at how to install such a node. I will use the Cloudera distribution and Cloudera Manager as the management tool.

Why do we need to configure Edge nodes using tools like Cloudera Manager or Ambari? Because the software and configuration must be kept up to date. We shouldn’t have to worry when somebody adds a new Kafka broker or changes a ZooKeeper host; the management tool takes care of that.

The Microservices Architecture (MA) for Oracle GoldenGate is a new REST API Microservices-based architecture that allows you to install, configure, monitor, and manage Oracle GoldenGate services using a web-based UI.

There are really two versions of GoldenGate now: Classic and Microservices. The Classic Architecture has the standard extract, replicat, pump and collector processes and is managed by the classic GGSCI. The Microservices Architecture (MA) has different types of processes and is managed using the Admin Client or the web UI. See the architecture of GoldenGate MA below.

Oracle GoldenGate MA is designed with the industry-standard HTTP communication protocol and JSON data interchange format.

The Classic Architecture was managed using the GGSCI console and had weak authentication and authorization tools. Oracle GoldenGate MA can verify identity using basic authentication and SSL client certificates.

GoldenGate MA processes

Oracle GoldenGate MA uses different types of processes to perform the same tasks as GoldenGate Classic. Let’s talk a little about the new processes:

Service Manager. Something like (and a replacement for) the Manager process. It acts as a watchdog for the other processes.

Administration Server. Something like the GGSCI console. It operates as the central control entity: you use it to create and manage other processes. The key feature of the Administration Server is its REST API, which can be accessed from any HTTP or HTTPS client.

Receiver Server. Something like the collector. It receives trail files from remote servers. However, a single Receiver Server replaces multiple collectors because it is multithreaded. It was designed to be protocol-agnostic, so it supports HTTPS, HTTP, UDT (reliable UDP) and the classic GoldenGate TCP transport. By default it uses HTTPS.

Distribution Server. Something like a pump. But again, it is a multithreaded process that can handle multiple trails at the same time, so it replaces multiple pumps. It also supports multiple protocols: WebSockets for HTTPS-based streaming (which relies on SSL security), UDT, SOCKS5 and HTTP. It also supports passive mode, in which the connection is initiated from the remote side.

Performance Metrics Server. This process collects and stores information from other processes (extracts, replicats, etc.). All GoldenGate processes push information to the Performance Metrics Server, and it is now the only process that writes data to the GoldenGate datastore (Berkeley DB). You can use the Performance Metrics Server to query various metrics, view logs and process statuses, monitor system utilization, etc.

Admin Client. A command-line utility (similar to GGSCI) used to create, configure and manage processes. The Admin Client uses the REST API to accomplish its tasks.
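Because the Administration Server exposes a REST API, any HTTP client can talk to it, not only the Admin Client or the web UI. The Python sketch below only builds a request with HTTP basic authentication using the standard library. The host name, port, credentials and the `/services/v2/extracts` path are assumptions, so check the GoldenGate REST API reference for your release before relying on them.

```python
import base64
import urllib.request


def build_admin_request(host: str, port: int, path: str,
                        user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) an HTTPS GET carrying a basic-auth header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"https://{host}:{port}{path}",
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
        method="GET",
    )


if __name__ == "__main__":
    # All values below are hypothetical.
    req = build_admin_request("gghost", 9011, "/services/v2/extracts", "oggadmin", "secret")
    print(req.get_method(), req.full_url)
    # To actually send it (needs a reachable server and a trusted certificate):
    # import json, ssl
    # with urllib.request.urlopen(req, context=ssl.create_default_context()) as resp:
    #     print(json.load(resp))
```

For SSL client-certificate authentication, the same request could be sent through an `ssl.SSLContext` on which `load_cert_chain` has been called with your certificate and key files.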

Oracle released a new version, GoldenGate 12.3, on 18 August. This is a long-awaited version: its release was postponed two or three times because of some very important new features. See some useful links for GoldenGate 12.3:

Introduction

Currently we see Hadoop becoming part of the Enterprise Data Warehouse family. But family members should be connected to each other. Sometimes we need access to Hadoop from Oracle Database; sometimes Hadoop users need enterprise data stored in an Oracle database.

Hive has a very interesting concept, external tables, which let you define Java classes to access an external database and present its data as a native Hive table.

GoldenGate Cloud Service (GGCS) is part of Oracle’s PaaS portfolio. From a technical perspective, it is just standard GoldenGate deployed on a VM in Oracle Cloud, so the same proven architecture works in the cloud.

GGCS can be used for different use cases, from zero-downtime migration to real-time DWH feeding. More use cases, such as Big Data and data pipeline feeding, are on the way.

So what do you need to use GoldenGate Cloud Service? You should have:

a database instance in the cloud (DBaaS or ExadataCS)

a subscription for GoldenGate Cloud Service

Storage Cloud Service (used for backups)

GGCS is currently available as a non-metered service. If you use the GGCS Non-Metered Service, you pay even when your GoldenGate instance is down.

Soon GGCS will be available as a Metered Service, so it will be possible to pay on an hourly basis. This capability will open up new use cases such as Dev/Test cloud environment synchronization. Imagine you have a database in the cloud for testing purposes that you must periodically (every week or month) synchronize with the production database. You don’t need GGCS running all the time; you can run it for 2 hours every Sunday to apply the captured data. This approach can save a lot of money.
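The savings in the Dev/Test scenario come from billed hours. A quick Python calculation compares an always-on instance with one that runs 2 hours every Sunday. Actual prices are not given here, and metered and non-metered rates differ, so the sketch compares hours only.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def weekly_hours(always_on: bool, hours_per_run: float = 2.0, runs_per_week: int = 1) -> float:
    """Hours an instance runs (and, if metered, is billed) per week."""
    return float(HOURS_PER_WEEK) if always_on else hours_per_run * runs_per_week


if __name__ == "__main__":
    always = weekly_hours(always_on=True)
    sunday_only = weekly_hours(always_on=False)
    share = 100 * sunday_only / always
    print(f"{sunday_only:g} of {always:g} hours per week ({share:.1f}%)")
```

Running only on Sundays uses 2 of 168 weekly hours, roughly 1.2% of an always-on instance’s running time.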