Storage Snapshots for Database Refreshes
Joe Grant, Principal Architect
One of the less enjoyable jobs for a DBA is the never-ending task of refreshing non-production databases from their production counterparts or from some other "gold" image. There are myriad tools that can help automate this process, but they can be prohibitively expensive. If your storage array is capable of snapshotting a LUN, you can use that capability to automate the refresh process without any additional tooling.
Recently I had the opportunity to work through the process of using SAN snapshots for database refreshes. The client had two datacenters within the same city, with production and non-production in separate datacenters, and used EMC XtremIO storage with EMC RecoverPoint for replication between the datacenters. The database was an Oracle 12c RAC database running on vSphere with VMDKs for storage. ASM was the volume manager for the datafiles, and ASMLib was used to label the disks for ASM.
Even if you do not have this exact setup, don't let that stop you from trying the following process. The real key is that the SAN has some sort of snapshot technology. These procedures can work in a virtualized or physical environment, in one datacenter or several, and with single instance or RAC databases.
The following procedures can serve several different use cases. Here are a few that I worked through in the environment described above:
- Take a snapshot of a production database, mount it to a non-production RAC cluster and then start it as a new instance.
- Take a snapshot of a production database, mount it to a non-production RAC cluster and then start it as a refresh of an existing database.
- Take a snapshot of a non-production database that is already a snapshot of a production database (a snap of a snap), mount it to the same non-production RAC cluster and then start it as a new instance.
As mentioned previously, the environment these procedures were developed in used RAC databases, although they work in a single instance environment as well.
With all three use cases there are configuration options that need to be considered to help reduce the complexity of the refreshes and to help with scripting. These considerations are mostly around ASMLib disk labels, ASM diskgroup names, VMDK placement, database names, and datastore names. While this example involved a virtualized RAC environment, these considerations are valid for physical and single instance environments as well.
When taking a snapshot, the resulting VMDKs retain the same ASMLib labels and ASM diskgroup information as the source database. The disk labels for all disks attached to the same VM have to be unique. Currently the ASMLib labels follow the pattern <ASMDGNAME><ITERATION>; for example, the labels for the four VMDKs of the DATA diskgroup are DATA01, DATA02, DATA03 and DATA04. When the DATA diskgroup name is used on multiple VMs, importing the snapshot can be problematic. However, renaming a diskgroup and changing an ASMLib label are relatively simple tasks. One possible solution is to leave generic diskgroup names and ASMLib labels in production and then, as part of mounting the snapshot disks, rename the ASMLib labels. The non-production diskgroup names and ASMLib labels can be modified to be unique; for example, the diskgroup name could be DATA_DEV, which uses the disk label DATA01_DEV.
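As a sketch of that rename step, a small helper like the following could generate the commands for review before anything is run by hand. The diskgroup names, label suffix, number of disks, and device paths are all assumptions for illustration; check the `renamedg` and `oracleasm renamedisk` invocations against your own versions first.

```shell
#!/bin/sh
# Generate (but do not run) the rename commands for a snapped diskgroup.
# Assumes four member disks labeled <DG>01..<DG>04; device paths are placeholders.
gen_rename_cmds() {
  dg_src=$1; dg_dst=$2; suffix=$3
  # renamedg rewrites the diskgroup name in the ASM disk headers
  # (the diskgroup must be dismounted first)
  echo "renamedg dgname=${dg_src} newdgname=${dg_dst} asm_diskstring='ORCL:${dg_src}*' verbose=true"
  # oracleasm renamedisk relabels each member disk (run as root; -f forces the change)
  for i in 01 02 03 04; do
    echo "oracleasm renamedisk -f /dev/oracleasm/disks/${dg_src}${i} ${dg_src}${i}${suffix}"
  done
}

# Print the commands for the DATA -> DATA_DEV example from the text:
gen_rename_cmds DATA DATA_DEV _DEV
```

Because the helper only prints the commands, a DBA can review and run them step by step, which is safer while the process is still being proven out.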
For VMDK placement, only the disks that make up the database should be snapped; the operating system, Oracle software, and CRS disks should not be. Currently, the operating system, Oracle software, CRS, and database VMDKs are all stored in the same datastore. A configuration that separates the database VMDKs from the others would make the process easier to script: all disks in the database datastore get added to the VM, so there is no need for logic to exclude VMDKs. The following layout could be used, assuming a two-node RAC cluster.
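As an illustration, a layout along these lines keeps the database disks in their own datastore; the datastore and VMDK names here are hypothetical:

```
DS_ORA_OS    node1-os.vmdk, node1-u01.vmdk,       (OS and Oracle software - not snapped)
             node2-os.vmdk, node2-u01.vmdk
DS_ORA_CRS   crs01.vmdk, crs02.vmdk, crs03.vmdk   (CRS/voting disks - not snapped)
DS_ORA_DATA  data01.vmdk ... data04.vmdk,         (database disks - snapped as a unit)
             fra01.vmdk, fra02.vmdk
```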
When the ASM diskgroup disks are added to a RAC cluster, the database name for the database on them will be the same as the production name. As with the diskgroup and ASMLib labels, the database name can be changed as a part of the refresh.
In order to prevent possible confusion and error, a standard for snapshot and datastore names should be created.
The use of ASM is not mandatory for snapshot refreshes to work, but it does make everything a little easier.
You will want to work with your storage vendor on how the snapshots and any potential replication work. If the non-production copy points back to the same blocks that the production node is using there is a possibility that performance on the production node could be impacted. In this use case the snapshot is mounted in a different datacenter off of a replicated copy, so the blocks do not point back to the blocks that the production node is using.
The processes for the use cases above all use well-known procedures for working with Oracle databases; it is just a matter of putting them together. Some of the procedures used are renaming ASM diskgroups, ASMLib disk labels, and databases. The skills needed to perform all of the necessary steps will involve OS administrators, storage administrators, DBAs, and (for virtualized environments) VMware administrators.
For all use cases the beginning is the same: a snapshot of the source database needs to be taken. For this example, the database should be placed in hot backup mode and the following steps followed:
- Place the DB into hot backup mode
- Snap the datastore LUN(s) that contain the DB/ASM disks
- Take the DB out of hot backup mode
- Create a new datastore from the snapshot and mount it to the vSphere hosts
Once a snapshot naming convention is decided upon, a simple script can be developed to accomplish this step.
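A minimal sketch of such a script is below, written as a dry run: each step is printed for review rather than executed. `array_cli` stands in for whatever your storage vendor's CLI provides (XtremIO/RecoverPoint in this case), and the SQL steps would be run through SQL*Plus; only `esxcfg-volume -r` (which resignatures a snapshot LUN so it can be mounted as a new datastore) is a real ESXi command, and even that should be verified for your vSphere version.

```shell
#!/bin/sh
# Dry-run sketch of the four snapshot steps; prints the commands for review.
# "array_cli" is a placeholder for the storage vendor's CLI.
snapshot_refresh_source() {
  db=$1; snap_name=$2
  echo "sqlplus / as sysdba  -- ALTER DATABASE BEGIN BACKUP;"
  echo "array_cli snap --lun ${db}_DATA --name ${snap_name}  # placeholder vendor CLI"
  echo "sqlplus / as sysdba  -- ALTER DATABASE END BACKUP;"
  echo "esxcfg-volume -r ${snap_name}  # resignature the snapshot LUN as a new datastore"
}

snapshot_refresh_source PROD prod_snap_001
```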
In a virtualized environment, a common task for the use cases is to add the VMDKs to the RAC VMs. This can be done either through the vCenter GUI, or through VMware CLI. Since the VMDKs for the database are shared between all VMs in the RAC cluster, the multiwriter flag has to be set, which is an advanced parameter. Advanced parameters cannot be modified through the vCenter GUI for a running VM, however they can be adjusted through the VMware CLI.
While working through the use cases for this project, the VMs were shut down to add the VMDKs through the VMware CLI. For ongoing operations, however, a script should be developed so that the VMDKs can be added to a running cluster. Once the VMDKs were added to the VMs, they were restarted, and during the boot process the operating system and ASMLib recognized the newly added VMDKs.
Either the vSphere administrator or the Linux administrator can perform the process of adding VMDKs, and the task can be scripted. Example scripts for adding shared VMDKs while the VMs are online have been provided; once scripted, it is not necessary to shut down the VMs in order to add the VMDKs.
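For reference, the multi-writer setting is a per-disk advanced parameter. A hypothetical entry for a shared snapshot disk on a second SCSI controller would look like this in the VM's configuration (the datastore and VMDK names are placeholders; the `sharing = "multi-writer"` value is the documented mechanism for shared disks on vSphere 6 and later):

```
scsi1:0.present  = "TRUE"
scsi1:0.fileName = "/vmfs/volumes/DS_ORA_DATA_SNAP/data01.vmdk"
scsi1:0.sharing  = "multi-writer"
```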
For physical environments, simply adjust the zoning and masking so that the snapshots are presented as separate LUNs to the non-production host.
Use Case 1
Take a snapshot of a production database, mount it to a non-production RAC cluster, and then start it as a new instance.
Assumptions:
- There is at least one database running on the non-production cluster
- The non-production database does not have the same database name as the incoming production copy
- The ASM diskgroup names and ASMLib labels for any running non-production databases do not match the production diskgroup and ASMLib names
High level steps:
- Attach the snapshot VMDKs/disks to the VMs
- Scan for the new VMDKs/disks
- Mount the diskgroups
- Add the database to the cluster registry
- Start the database
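The high-level steps can be sketched as another dry run. The diskgroup name, database name, spfile path, and Oracle home below are placeholders, and the `asmcmd`/`srvctl` flags should be confirmed against your Oracle version before use:

```shell
#!/bin/sh
# Dry-run sketch of Use Case 1: print the commands rather than run them.
# All names and paths are placeholders for illustration.
use_case_1() {
  dg=$1; db=$2; ohome=$3
  echo "asmcmd mount ${dg}  # as the grid owner, on each node"
  echo "srvctl add database -db ${db} -oraclehome ${ohome} -spfile +${dg}/${db}/spfile${db}.ora"
  echo "srvctl start database -db ${db}"
}

use_case_1 DATA_DEV dev1 /u01/app/oracle/product/12.1.0/dbhome_1
```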
Use Case 2
Take a snapshot of a production database, mount it to a non-production RAC cluster, and then start it as a refresh of an existing database.
Assumptions:
- There are no additional databases running on the RAC cluster
- The database that will be refreshed is currently configured in the cluster registry
High level steps:
- Complete the preparatory steps for renaming the database (backup the control file to trace, copy and adjust the init file)
- Shut down the non-production database to be refreshed
- Unmount the ASM diskgroups
- Remove the existing VMDKs from the VMs
- Attach the snapshot VMDKs to the VMs and scan for the new disks
- Mount the diskgroups
- Rename the database (NID utility or recreate the controlfile), validate the database configuration and start the database
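The rename step via the NID utility can be sketched the same dry-run way. The database name is a placeholder; NID requires the database to be mounted (not open), and `SETNAME=YES` changes only the database name while leaving the DBID intact. Recreating the controlfile is the alternative when a new DBID is also wanted.

```shell
#!/bin/sh
# Dry-run sketch of the Use Case 2 rename: print the commands for review.
rename_db_cmds() {
  newname=$1
  echo "sqlplus / as sysdba  -- STARTUP MOUNT"
  echo "nid TARGET=/ DBNAME=${newname} SETNAME=YES"
  echo "srvctl modify database -db ${newname} -dbname ${newname}"
}

rename_db_cmds dev1
```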
Use Case 3
Take a snapshot of a non-production database that is already a snapshot of a production database (a snap of a snap), mount it to the same non-production RAC cluster, and then start it as a new instance.
Assumptions:
- The initial snapshot database is dev1 and the secondary snapshot database is dev2.
- There are no other databases running in the RAC cluster.
High level steps:
- Complete the preparatory steps for renaming the database, datafiles, controlfiles and spfile
- Shut down dev1
- Change the ASM diskgroup names and ASMLib disk labels
- Check the dev1 configuration, modify as needed, rename the datafiles and start dev1
- Complete the preparatory steps for renaming the database
- Attach the VMDKs and rescan the ASMLib labels
- Rename the database (NID utility or recreate the controlfile)
- Add the database to the cluster configuration and start the database
Additional Use Cases
While the above use cases may not cover all of your organization's needs over time, the procedures are likely to remain relevant. They can be re-ordered to cover just about every reasonable situation, and if a request arises that they do not cover, there is likely something fundamentally flawed in the requirements of that request.