Flying Clusters: ddlscan - Utility to Help Analyze and Migrate Database Schemas

Intro

While working on one of the MySQL to Oracle replication projects for Continuent, I needed to implement an open-source utility for transforming MySQL schema to an Oracle dialect (DDL statements that create specific schema on Oracle) to save from otherwise tedious work. This article introduces ./ddlscan tool, which does that and is extensible to do much more.

Ingredients

Here's what you'll need:

Your favorite DBMS with some tables. Currently supported MySQL, Oracle and PostgreSQL.
Latest Tungsten Replicator build. Not even needed to install, enough to untar.
Velocity template of your choice. You will find currently available ones in tungsten-replicator/samples/extensions/velocity/ folder.

Recipe

Mix ./ddlscan with your chosen schema and a template:

$ cd tungsten-replicator/bin
$
$ ./ddlscan -db test -template ddl-mysql-oracle.vm -url jdbc:mysql:thin://localhost/ -user tungsten -pass secret

If you have Replicator installed, it's even simpler - ./ddlscan can read connection credentials from Replicator properties for you (no need for "-url" parameter):

$ ./ddlscan -db test -template ddl-mysql-oracle.vm -user tungsten -pass secret

Result - rendered text file (or stdout). Eg. of ddl-mysql-oracle.vm template:

/*
SQL generated on Wed Feb 27 14:13:21 EET 2013 by ./ddlscan utility of Tungsten

url = jdbc:mysql:thin://localhost/
user = tungsten
dbName = test
*/

CREATE TABLE 3colors /* ERROR: table starts with a number, use rename option and a filter */
(
id NUMBER(10, 0),
color VARCHAR2(1) /* ENUM('R','G','B') */,
enabled NUMBER(1) /* BIT(1) - no constraints for 0 or 1 */,
acolumnthatdoesntfittooracleat NUMBER(10, 0) /* WARN: truncated column name exceeding 30 characters (acolumnthatdoesntfittooracleatall) */

/* ERROR: table must have a primary key! */
);

Architecture

To make it more scientific, here's an overview of how it works:

When you call ./ddlscan it uses Tungsten Replicator libraries and connects to the underlying DBMS through appropriate JDBC driver. There are classes in the Replicator which extract metadata from the requested schema. Tables, columns, keys and so on are identified and then provided as JAVA objects to the Velocity template-generating engine.

Velocity templates are not hard to write. Here's an example of a template that reports tables without primary keys: ddl-check-pkeys.vm - if comments and headers were removed, it's only 3 lines of code (at least, at the time of writing this article). To make it easier to handle the syntax, use some plugin like veloedit.

Finally, user receives a rendered result from applying schema metadata to a template.

Possibilities

By leveraging this simple approach many tasks can be automated. Here's just a few ideas:

The mentioned transformation of schema dialects between different DBMS types.
Automatic trigger generation for trigger-based replication or ETL tools.
Staging tables generation for loading data warehouses (eg. Vertica).
Schema validation (scanning for potential issues).
Automatic code generation. Eg. generating shell scripts to do something with the tables, generating JavaScript filters for Tungsten Replicator.

Conclusion

We have briefly reviewed how ./ddlscan tool works. More articles about specific use cases to come. Happy rendering!

Flying Clusters

Wednesday, February 27, 2013

ddlscan - Utility to Help Analyze and Migrate Database Schemas