Oracle GoldenGate data transfer and replication technologies, including a new cloud-based service, are making it faster and easier for customers to move data in real-time from diverse sources to where it needs to be – for analytics, mobile and cloud apps. IDN speaks with Oracle’s Jeff Pollock.

Oracle GoldenGate is building on its success in real-time data replication to become a strategic technology for companies looking to deliver data for the real-time digital enterprise.

Today’s Oracle GoldenGate options are opening up opportunities for real-time analytics driven by streaming data. GoldenGate is also breaking through the speed and scale limits of ETL, and can bring legacy technologies into the new era of real-time data.

Upgrades to Oracle GoldenGate’s on-premises version are delivering more features for big data and data transformations. Beyond this, a new Oracle GoldenGate Cloud Service is letting users simplify how they work with on-prem and cloud-based data stores – all at flexible, usage-based pricing.

The result: Customers have a wide range of GoldenGate-enabled options to more quickly and easily move data in real-time from diverse sources to where it needs to be – for analytics, mobile and cloud apps, Oracle vice president of product management Jeff Pollock told IDN.

“Today’s Oracle GoldenGate offerings open up new opportunities for users looking to do real-time analytics. We support those use cases, even while the data is flowing,” Pollock explained. “It all happens in memory, before the data hits disc.”

Oracle GoldenGate’s latest features also break through some of ETL’s (extract, transform and load) traditional limits in speed and scale. “On top of millisecond analytics, Oracle GoldenGate can mix workloads and not just do ETL,” Pollock said.

Oracle GoldenGate, LinkedIn Team Up on ‘Data Bus’ for Real-Time Analytics

One example of how Oracle GoldenGate supports real-time data transfer to Apache Kafka, the high-throughput distributed messaging technology, is already in production.

Oracle’s GoldenGate team is working with social media giant LinkedIn, the creator of Kafka. The two are working to combine GoldenGate and Kafka to deliver a new-gen real-time data architecture called the ‘data bus,’ Pollock said. The ‘data bus’ approach lets enterprises easily create a data pipeline that supports multiple data types, and then lets users seamlessly publish, subscribe, transform and analyze that data in real-time.

In addition, Oracle GoldenGate provides crucial data integrity features for the data bus. “Audit, compliance and HA [high-availability] are so important. GoldenGate provides transactional guarantees that all the data is being delivered reliably to Kafka,” Pollock noted.

Going under the covers, the Oracle GoldenGate for Big Data Kafka Handler allows users to take transaction data directly out of a database in real-time and stream it into a Kafka environment. In turn, Kafka feeds the data into Apache Spark Streaming for ETL.

“You can now do the ETL in real time on a stream – rather than a batch. In this mode, the ETL engine now will be ‘always on.’ Because there is no batch, you don’t have to run on a schedule. The data will be transformed as it arrives (in Spark Streaming). As an output, the data can flow into an enterprise data warehouse, Hadoop or NoSQL environment,” Pollock said.
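The ‘always on’ pattern Pollock describes can be sketched in a few lines of plain Python: instead of accumulating a batch and running a job on a schedule, each change record is transformed the moment it arrives. This is a minimal illustration of the pattern, not GoldenGate or Spark Streaming code; the record fields and transformation are hypothetical.

```python
import json

def transform(record):
    """Apply a simple ETL step to one change record as it arrives:
    normalize field names and derive a combined column."""
    return {
        "id": record["ID"],
        "full_name": f'{record["FIRST"]} {record["LAST"]}'.strip(),
    }

def streaming_etl(source):
    """'Always on' ETL loop: transform each record the moment it
    arrives on the stream, rather than waiting for a batch window."""
    for record in source:
        yield transform(record)

# Simulated stream of change records arriving from the data bus.
incoming = iter([
    {"ID": 1, "FIRST": "Ada", "LAST": "Lovelace"},
    {"ID": 2, "FIRST": "Alan", "LAST": "Turing"},
])

for row in streaming_etl(incoming):
    print(json.dumps(row))
```

In a real deployment the `source` would be a Kafka consumer and the transformed output would land in a warehouse, Hadoop or NoSQL store, as Pollock notes above.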

Beyond super-charging the transfer of database transactions into Kafka, Oracle GoldenGate lets companies support datafeeds for multiple subscribers by leveraging Kafka’s one-to-many publish and subscribe architecture. “Kafka gives us a way to capture data once and push those database transactions to as many subscribers as need them,” Pollock told IDN. “So, it’s a fan-out architecture for consuming data for analytics, without impacting performance.”
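The ‘capture once, fan out to many subscribers’ idea can be illustrated with a toy in-memory publish/subscribe sketch. This is not Kafka’s API; it only shows why the pattern avoids hitting the source system once per consumer — the class and method names are invented for illustration.

```python
from collections import defaultdict

class DataBus:
    """Toy sketch of the fan-out pattern: one captured database
    transaction is delivered to every subscriber of its topic,
    without re-reading the source system for each consumer."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, change_record):
        # Capture once, then push to as many subscribers as need it.
        for handler in self._subscribers[topic]:
            handler(change_record)

bus = DataBus()
analytics, audit = [], []
bus.subscribe("orders", analytics.append)
bus.subscribe("orders", audit.append)

# A single captured change reaches both consumers.
bus.publish("orders", {"op": "INSERT", "order_id": 42})
```

Kafka provides the durable, distributed version of this: producers write each change once, and any number of consumer groups read it independently.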

This Oracle GoldenGate / Kafka approach provides a lot of benefit for bringing real-time data to instant analytics – not just for traditional transaction data, but for logs, streams and unstructured data sets. Pollock provided some details of the architecture and the user experience.

In the ‘data bus’ model, subscribers come into Kafka through a very simple REST API. Further, they can subscribe to just the specific datasets in which they are interested.

“Kafka uses topics, and the pattern we’re seeing so far is one table per topic. With GoldenGate, we can get data out of backend source systems using a non-invasive technology. We take data from the transaction logs, and we push it to make it broadly available through a simple REST-based web service to multiple users, where each can subscribe to any tables they’re interested in,” Pollock said.
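The one-table-per-topic pattern Pollock mentions amounts to a simple routing rule: derive the destination topic from the change record’s schema and table. The sketch below assumes a hypothetical record shape and naming scheme (lower-cased `schema.table`), which is an illustration rather than GoldenGate’s actual convention.

```python
def topic_for_change(change):
    """Route a CDC change record to a Kafka topic using the
    one-table-per-topic pattern. The naming scheme here
    (lower-cased 'schema.table') is an assumed convention."""
    return f'{change["schema"]}.{change["table"]}'.lower()

change = {"schema": "SALES", "table": "ORDERS", "op": "UPDATE"}
print(topic_for_change(change))  # sales.orders
```

A subscriber interested only in the `ORDERS` table then consumes just that one topic, which is how consumers “subscribe to any tables they’re interested in.”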

Oracle GoldenGate’s user-friendly features go beyond APIs. “We don’t just output raw records in a Kafka topic. We also put a JSON formatter into GoldenGate, so we are systematically converting the change record into a JSON format. This makes it very simple for a consumer to take these JSON records and use them in a lightweight manner,” Pollock said. Users can even flag whether they want their data converted into JSON before it goes into Kafka, he added.
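Conceptually, the JSON formatter turns a change record — operation type, table name, and before/after row images — into a self-describing JSON message. The field names below are illustrative, not GoldenGate’s actual wire format.

```python
import json

def format_change_as_json(op, table, before, after):
    """Hypothetical sketch of serializing a CDC change record to JSON
    before publishing it to Kafka. Field names are illustrative,
    not GoldenGate's actual output schema."""
    return json.dumps({
        "op_type": op,      # e.g. "I"/"U"/"D" for insert/update/delete
        "table": table,     # fully qualified source table
        "before": before,   # row image before the change
        "after": after,     # row image after the change
    }, sort_keys=True)

msg = format_change_as_json(
    "U", "HR.EMPLOYEES",
    before={"EMP_ID": 7, "SALARY": 50000},
    after={"EMP_ID": 7, "SALARY": 55000},
)
```

A downstream consumer then needs nothing more than a JSON parser to work with the change — which is the “lightweight manner” Pollock refers to.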

Further, Oracle GoldenGate does CDC (change data capture) with many leading databases, including Oracle, SQL Server, DB2, and databases from Sybase and Tandem.

The non-invasive approach GoldenGate uses is especially useful when users want to retrieve data from sources that are behind a firewall or subject to restricted access. “DBAs don’t typically want people running ad hoc SQL against their source systems,” he added.

As a final touch for IT admins and DBAs, Oracle GoldenGate reinforces data integrity across the data bus. “Audit, compliance and HA [high-availability] are so important. GoldenGate provides replication services and more to ensure all the data is being delivered reliably to Kafka,” Pollock said.

No wonder Pollock said the Oracle / LinkedIn ‘data bus’ approach, “is a reference architecture or blueprint that we feel customers will adopt more widely.”

Oracle Sees Shift in Data Integration, Replication To Power the Real-Time Digital Enterprise

These latest use cases for Oracle GoldenGate illustrate how the data integration sector is changing to address the needs of the ‘act-now’ digital enterprise.

Pollock put the shift this way: “For the past 20-25 years, data integration has been batch or scheduler-driven. But today, now we see a push to a streams-based ETL [model] to cope with needs for real-time data and instant analytics.”

He explained it this way: “Customers are taking event streams from all sorts of endpoints, such as mobile devices, clickstreams and even application exhaust, and mashing these up with database transactions – all in real-time as they are happening. This is not just ‘A to B integration’ or active-active (high availability). Users are placing the data directly on the event stream so that it can be analyzed instantly. GoldenGate is proving to be the crucial bridge to bring database transactions in milliseconds from the DBMS engine into these event stream processing engines.”

This use case is proving so popular, Oracle will further push capabilities this summer, when it opens a beta for ODI doing ETL in Spark Streaming, he added.

On another front, Oracle GoldenGate for Big Data supports real-time streaming to Apache Flume, Hadoop (HDFS), Hive, HBase and Kafka. It can stream both transactional data and log data in real-time without degrading performance at the source system, Pollock said.

Inside Oracle GoldenGate Cloud Service

Oracle GoldenGate Cloud Service is a cloud-based real-time data integration and replication service that provides 100% feature-set overlap with the on-prem version – and more, Pollock said. OGGCS is architected atop Oracle Cloud Service to provide data movement from various databases (across on-premises and cloud) in a fast, easy, seamless and reliable manner.

Flexible Pricing – To more cost-effectively deliver projects with high data volumes, flexible, lower-cost options are available, including monthly leasing options.

“Companies we talk to are looking to do more real-time type business with data. So, opening access to the data while it’s in motion from multiple sources – even raw data – needs to be easier and more flexible,” Pollock said. “You also need to ensure your users are using trusted data. That’s where we’re seeing things moving.”