Friday, June 22, 2012

Replicate from Oracle to MySQL *without* GoldenGate

Intro

Oracle is widely use to support back-end systems.  On the other hand, MySQL is the "go-to" data management solution for the web-facing part of many businesses.  If you have both Oracle and MySQL in-house, you may already also have the need to share data between them.  In this article I'll describe software that my colleagues and I have been working on to move data from Oracle to MySQL in real-time without costing an arm and a leg.

Tungsten to the Rescue!

Latest Tungsten Replicator has many features, most of which are open-source, but the recent one for me is particularly exciting - thanks to the development done by my colleague Stephane Giron in the extraction part, Tungsten Replicator learned to get new transactions from Oracle. This is very powerful, as it opened up a possibility for us to implement replication from Oracle to MySQL. 

Disclaimer

Tungsten licensing differs depending on which direction you need to replicate to:
  • Replicating from MySQL to Oracle is open-source (though we do encourage to use consulting and support services).
  • On the other hand, replicating from Oracle to MySQL needs a reasonably priced closed-source component.
Supported Oracle Versions

Tungsten Oracle extractor requires Oracle Change Data Capture (CDC).

We support the following Oracle versions: 10g Release 5 (10.2.0.5) and 11g.

It works with both Standard Edition (SE) and Enterprise Edition (EE).

Change Data Capture Modes

Tungsten supports two CDC modes:
  1. Synchronous CDC, which uses triggers. As transactions execute DML, CDC triggers are called. In effect, CDC becomes part of the transaction. Note, that triggers are set only on the tables that are replicated.
  2. Asynchronous CDC HotLog, which requires database-level supplemental logging to be enabled. It uses data sent to the redo log files, hence has no direct effect on the original transaction.
Data Type Limitations

In general, Tungsten can extract everything that CDC can provide. However, this means that the following types can't be replicated, as they are not supported by the CDC itself.

Not supported by both Synchronous and Asynchronous CDC:
  • BFILE
  • LONG
  • ROWID
  • UROWID
  • object types (for example, XMLType)
In addtition, Synchronous CDC doesn't support:
  • BLOB
  • CLOB
  • NCLOB
Just to clarify, replication out of Oracle XML DB is not supported. Other than that, all usual suspects are handled.

The Recipe

Here are the steps required to setup replication from Oracle to MySQL:
  1. Choose a database to replicate.
  2. Identify tables that require replication and make sure all their column types are supported.
  3. Create corresponding tables in MySQL.
  4. Get Tungsten.
  5. Execute CDC preparation script that Tungsten provides.
  6. Provision MySQL tables with the initial data to have a consistent starting point.
  7. Run tungsten-installer command, which deploys Tungsten and starts replication.
Transaction Flow

The image below provides a view of how transactions travel from Oracle through Tungsten Replicator to MySQL. In this example, we are replicating data from an Oracle DEMO database:


During the initial setup, Tungsten's CDC preparation script (step 5 above) takes care of creating Oracle CDC publisher (DEMO_PUB) and subscriber (TUNGSTEN) users. This is one of the most complex parts of the setup and it is handled under-the-hood. The script subscribes to all the tables in the database to be replicated (DEMO). In effect, the created publisher user gets a change table for each of the real tables.

Below is an example of a table BIRDS in the DEMO database:



Here's a corresponding change table CT_BIRDS in the DEMO_PUB (note six metadata columns at the top and three user data columns at the bottom):


When transactions hit tables in the DEMO database, the change sets (rows that were changed by the transaction) are saved in the corresponding change-holding-table in the DEMO_PUB database. This happens within a few seconds, depending on the CDC mode being used and configuration peculiarities. Usually, replication from point-to-point happens in less than a minute.

In the mean time, Replicator is monitoring the DEMO_PUB tables as a CDC subscriber user. As soon as there's new data, OracleCDCExtractor component reads it, formats it as a standard Tungsten platform-independent row change structure and saves it in a disk-based Transaction History Log (THL). For example, an UPDATE to our BIRDS table looks like this (you will be familiar with this generic structure if you tried Tungsten Replicator for pure MySQL replication):

SEQ# = 86 / FRAG# = 0 (last frag)
- TIME = 2012-05-31 11:08:44.0
- EPOCH# = 83
- EVENTID = 962670580
- SOURCEID = dwh.example.com
- METADATA = [service=default;shard=DEMO]
- TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent
- SQL(0) =
 - ACTION = UPDATE
 - SCHEMA = DEMO
 - TABLE = BIRDS
 - ROW# = 0
  - COL(1: ID) = 1
  - COL(2: NAME) = The Spirit Swans
  - COL(3: NOTE) = Continuent
  - KEY(1: ID) = 1
  - KEY(2: NAME) = Swans
  - KEY(3: NOTE) = Continuent


As soon as transaction is in the master's THL, even if Oracle stops working, THL serves accumulated transactions to the slave Replicator. As soon as Oracle is back, master Replicator knows which transactions it already extracted and continues from that point. In a similar way, slave Replicator is restart and crash safe too.

When transaction event reaches the slave, it is applied to the underlying MySQL. Applying to MySQL incorporates a few filters. For example, often Oracle uses upper case letters for table names, while MySQL uses lower case - there's a filter to change the case on the fly.

Overall, Replicator does not differentiate between Oracle generated events or MySQL ones, hence many of the Replicator functions work. For example, you could easily add more than one MySQL slave, filter out tables that are not needed, or do some data obscuring, in case you are replicating sensitive data. 

Note about DDL

The SQL dialect differences between different DBMS vendors are major, hence in heterogeneous topologies (like replicating from MySQL to Oracle) we usually recommend (a) setting up Tungsten filters which remove DDL statements from the queue completely or (b) if the DDL statement set is known, write a filter which translates one dialect to the other.

However, in the Oracle to MySQL case DDL statements are not seen at all. This is driven by the fact that CDC doesn't catch them. So, adding, altering or removing a table is done manually and should be added to your application upgrade process.

Anyone interested in an alternative for Oracle to Oracle replication?

I would like to finish this article with an exciting teaser, if you will. If you have been following progress on our work, you already know that Tungsten knows how to apply transactions to Oracle. So, now that we have both Oracle applier and Oracle extractor, it opens up another possibility - to do Oracle to Oracle replication. As the matter of fact, I'm working on this project right now. If you happen to be in need for this, let us know.

Conclusion

We have very briefly reviewed how Tungsten replicates from Oracle to MySQL. For me heterogeneous replication is one of the most exciting database areas to work in and I wanted you to know where to find the "GoldenGate without the Price Tag".

P.S.

If you're planning to visit MySQL Connect 2012 this September, come on over to hear Robert Hodges and me showing more of this in action (read: live demos) in a talk titled "Replicating from MySQL to Oracle Database and Back Again".