Tuesday, June 11, 2013

Replication Renaming Schemas, Tables and Columns

Schema definitions on replication slaves sometimes differ. Ability to rename schemas, tables and columns and still setup replication with comfort is useful for MySQL to MySQL replication, while for heterogeneous topologies like these, it's a must. In this article I'll present a component which I recently developed to enable an easy way of renaming database objects during both (a) initial slave load (provisioning) and (b) real-time replication.

It is called RenameFilter and is part of the latest Tungsten Replicator build. RenameFilter works with all DBMS types that Replicator supports.

Specifying What to Rename

First, let's look at how rename definitions file looks like. It is a CSV file which first three columns specify what to rename, while last three - what into.
# CSV format:
# originalSchema,originalTable,originalColumn,newSchema,newTable,newColumn 
There are two special characters:
* (asterisk) - match all objects (schemas, tables or columns);
               used in the search part (first three columns).
- (dash)     - leave original object name;
               used in the rename part (last three columns).
Let's look into one example in detail:
# Rename schema from schemac to schemac1.
#    Match all tables
#       |         Leave table names intact   
#       |            |
#         |            |
#         |         Leave column names intact
#      Match all columns 
It is quite flexible: it's possible to rename from one specific column in a specific table and schema all the way up to renaming all occurrences anywhere in the database.

Rename during Provisioning

First use case for this is during initial provisioning when you are preparing your slave database.

a.) For homogeneous MySQL to MySQL replication, it's a manual, though straightforward, process. After restoring a backup to the slave, we go inside the database and make needed changes (like rename schema, some tables and columns). While doing this, we log everything to the tables_to_rename.csv in format introduced above. We will then use this file while setting up replication (see one section below).

b.) For heterogeneous topologies, we'll use ddlscan tool. Let's consider a MySQL to Oracle replication topology. Tables are in MySQL master, but Oracle is completely empty without even a schema. We'll need:
  • MySQL DBMS with some tables on the master.
  • Empty Oracle DBMS on the slave.
  • Tungsten Replicator with ddl-mysql-to-oracle.vm template available for ./ddlscan utility (translates MySQL schema dialect into Oracle dialect).
  • CSV file with rename defitinions.
In the CSV we define all MySQL tables and columns that exceed 30 characters and how to rename (shorten) them, tables and columns which start with a number and how to rename these, etc. A trivial example:
Then we provide this CSV to the ./ddlscan:
  -template ddl-mysql-oracle.vm
  -rename tables_to_rename.csv
  -db test
  -out test.sql
In effect, ddlscan working together with the RenameFilter, will transform a MySQL table like this:
describe test.3table;
| Field | Type    | Null | Key | Default | Extra |
| i     | int(11) | NO   | PRI | NULL    |       |
And provide DDL SQL that is Oracle-compatible (note how table was renamed):
DROP TABLE test.freetable;
CREATE TABLE test.freetable
  i NUMBER(10, 0),

Rename during Real-Time Replication

After schema (with some objects renamed) is ready on the slave, we can configure Tungsten Replicator for replication. However, we want all transactions that happen on the master be aware of the renamed objects - eg. if there's a transaction on "3table", it should be transformed and applied into "freetable" on the slave. Luckily, Replicator can use the same rename definitions CSV file we defined during initial provisioning. All we need to do is provide it upon installation:
tungsten-installer -a \
  --master-slave \
  --datasource-user=tungsten \
  --datasource-password=secret \
  --service-name=mysql2oracle \
  --home-directory=$TUNGSTEN_DEPLOY \
  --cluster-hosts=$SLAVE_HOST \
  --datasource-type=oracle \
  --datasource-oracle-service=ORCL \
  --master-host=$MASTER_HOST \
  --svc-applier-filters=dropstatementdata,zerodate2null,rename \
  --property=replicator.filter.rename.definitionsFile=$RENAME \
When Replicator starts-up, you'll see how it parses the definition file and constructs a look-up structure in the trepsvc.log:
INFO  filter.RenameDefinitions Parsing /home/oracle/install/tables_to_rename.csv:
INFO  filter.RenameDefinitions test.3table.* -> -.freetable.-
Succeeding transactions on this table will be transformed accordingly.

More RenameFilter CSV Examples

More examples covering common cases:
# Rename schema for all tables.

# This table moved to a different schema.

# Table renamed and moved to a different schema.

# Rename table in a particular schema.

# Rename each table with a given name across all schemas.

# Rename column of this particular table.
schemaz,tableq,cola,-,-,colaa # Column renaming doesn't change schema/table.

# Rename column of each table with a given name across all schemas.

# Rename column occurring in any table across all schemas.

# Rename column across all tables in the given schema.

A Note on Order of Preference

Order of lines in the CSV file doesn't matter - all of them are read during first load of the filter and they fill a fast in-memory look-up structure. The first three CSV columns (the search path) is used for look-up. With that said, it is possible to define rules that overlap. Consider the following two cases:
# Renamed schema, but one table is left in the original schema
# and some of its columns renamed.
scheman,tabley,*,-,tabley2,- # Leave table in original schema.

# Renamed schema and one table with one column in it.
schemam,tableu,*,schemam2,tableu2,- # Move renamed table to the new schema too.
schemam,tableu,colj,-,-,colj2 # Column renaming doesn't change schema/table.
The order of preference that is followed is straightforward. There are three cases to cover: finding new name for a column, table and schema. Each of these search from the most specific definition to the most generic. To be exact:

a. Order of preference when searching for a new column name:

1. schema.table
2. schema.*
3. *.table
4. *.*

client100,orders,amount,-,-,amount1 # 1. schema.table
client100,*,amount,-,-,amount2      # 2. schema.*
*,orders,amount,-,-,amount3         # 3. *.table
*,*,amount,-,-,amount4              # 4. *.* 
b. Order of preference when searching for a new table name:

1. schema.table
2. *.table

client100,orders,*,-,orders1,- # 1. schema.table
*,orders,*,-,orders2,-         # 2. *.table
c. Order of preference when searching for a new schema name:

1. schema.table
2. schema.*

client100,orders,*,client101,-,- # 1. schema.table
client100,*,*,client102,-,-      # 2. schema.*


We have reviewed how to use the new RenameFilter, that is available in the open-source Tungsten Replicator. RenameFilter is designed to help with various renaming needs during both initial provisioning and following replication, as opposed to developing multiple small filters to handle various rename needs that appear.