Joe Grant (@dba_jedi), Principal Architect
The dream of active/active datacenters is becoming more of a reality. One issue for this type of architecture is how to replicate the databases between different datacenters. For the replication itself, Oracle GoldenGate is one possible solution. However, it is likely that any GoldenGate implementation will have an impact on the application itself. There are several potential issues that need to be addressed when implementing.
A few quick definitions and assumptions:
- Uni-directional replication – one way replication to a standby database
- Bi-directional replication – replication between two databases
- Multi-master replication – replication between three or more databases
- Each datacenter will have its own database. It is important to note that just because the databases support the same application they are separate databases.
- Each datacenter will have its own application servers that will access the local database
- In case of failure, the application servers from datacenter A will not point to the database in datacenter B, but rather some sort of network routing will resolve the issue.
The following points are some of the potential issues that need to be addressed before starting replication. Many of these may affect the application, the database schema and/or development of the application. Keep in mind that the following list is not a comprehensive list of potential application concerns, but is a starting point for the discussion.
Oracle does provide a script that can be run prior to setting up replication that will help identify any potential issues. It can be found on My Oracle Support.
There are numerous restrictions on the datatypes that can be replicated. This is an ever-changing list based on the version of GoldenGate that is being used. These restrictions need to be reviewed and compared to the datatypes currently in use for the application. Future development also needs to take these restrictions into consideration.
One method for improving replication performance is to have multiple processes involved in the replication process. While it is possible to replicate at the schema level, it will likely be better to replicate at the table level so that multiple process can be configured. When replicating at the table level, the tables will need to be grouped together based on referential integrity constraints. Possible table groups for the replication process will become an application concern and should therefore be evaluated.
One possible issue with data replication is the use of triggers. For example, if a trigger generated an additional insert/update to another table that is also being replicated, duplicate rows may be generated on the destination database. The data that fires the trigger on one database will also fire the trigger at the replicated database. The first database will also send along the data generated by the trigger. There are parameters that can be set within the GoldenGate configuration files to either suppress or disable the triggers. The version specific documentation should be reviewed to best decide how to deal with this situation.
When data is replicated between two databases it has to be uniquely identified. One of the simplest solutions is to have a primary key on all tables that are to be replicated. If a table does not have a primary key, a synthetic or composite key will have to be generated. Many times this is accomplished by using all columns in a row to uniquely identify the data. This can greatly affect replication performance.
Another issue is that sequences are frequently used as primary key values. The issue then becomes how to keep different sets of data from using the same value as a primary key in the different databases. The recommended best practice from Oracle in this situation is to use only odd numbers in one database and only even numbers in the other database and then increment the sequence by two. If more than two databases are used, then each database sequence starts at a different value (node 1 – 1, node 2 – 2, node 3 – 3, etc.) and then increment the sequence by the number of databases in the environment.
In any bi-directional or multi-master replication situation there will be conflicts. The same row will be updated in multiple databases before GoldenGate has the opportunity to replicate the data that was changed. It is imperative that GoldenGate is configured to help identify these conflicts and to help resolve them. Some of the strategies involved with conflict detection and resolution rely on the structure of the table and should be planned for as early in the development cycle as possible.
There will be certain conflicts that are allowable and can be ignored. One example is if two users attempt to delete the same record at the same time from different databases. The end result is that the record is deleted and no manual intervention is necessary.
Timestamp Based Resolution
One type of resolution is to simply take the first, or most recent, update that was committed. For this type of resolution to work, the table will require a column that has a timestamp of when the row was last updated, for example LAST_UPD. When the replication for the table occurs, the LAST_UPD columns can be compared and then, depending on configuration, one of them “wins” and the failing record is then placed into an exceptions table (more on this later).
Trusted Source Resolution
In this scenario, one of the databases participating in replication is considered the trusted source and always correct. Any conflicts from other sources are then rejected and placed into an exceptions table. For this type of simple resolution, an additional column that contains a unique identifier for the database is created and then compared at replication time.
Quantitative Resolution involves any value that needs to be updated and accurately tracked such as inventory, account balances, and/or sales information. In this scenario, GoldenGate has to be configured to accurately perform any mathematical operations on the values being updated and replicated.
The resolution types discussed above are considered simple resolution. More complex conflict detection and resolution rules are possible. Please refer to the version specific documentation from Oracle for more on these scenarios.
When a conflict does arise the “losing” transaction needs to be tracked so that GoldenGate, the application, an application administrator, and/or the DBA can take any appropriate action. This is done with the use of exception tables. The details of the transaction are placed into an exceptions table so that the transaction can be reviewed either manually or through application logic. Notification of exceptions can then either be through the use of a trigger any time a new row is added to the exceptions table, or through periodic checking of the table for new rows.
Special care needs to be taken with configuring Data Definition Language (DDL) replication, and Oracle advises against DDL replication in active-active environments. Not all DDL commands can be replicated. The version specific documentation should be reviewed for what is and what is not allowable. The changes need to be carefully planned for prior to issuing any DDL command. In addition to the change to the databases that the DDL causes, it is likely that the GoldenGate configuration files will also have to be updated. This will likely complicate any application releases that require changes to the database.
Even in active/active environments it is important to control how a failure of an application stack is handled. Flapping can still occur and can cause issues with the application data. If a client connection has failed from one datacenter to another at the application layer, it is important to ensure that it is not failed back until such a time that it will not cause transactional consistency issues.
So no matter what the sales guys says, it is not just a simple point database 1 at database 2 and it will all just work. Implementing bi-directional replication can be quite complex, often requiring quite a lot of planning, and will likely require changes to the application. For any third party applications you will need to work with the application vendor.