Michael Stone (@HoBMStone), Lead Architect and CIO
Virtualization brings some intriguing new possibilities to the table for database backup, recovery, and cloning. Below, we scratch the surface of those capabilities and their implications.
Snapshot / Stun Issues
Most of these new capabilities are built around the VMware feature known as VM/VMDK snapshots. Much like SAN snapshots, these allow virtual disks to be frozen quickly at a point in time.
As with SAN technology, VMware snapshots do not by themselves guarantee data consistency; some other mechanism must be in play. This usually means putting the database in hot backup mode, although Windows guests can use the Volume Shadow Copy Service (VSS) as an alternative.
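The hot backup approach only works if the database is reliably taken out of backup mode again. The Python sketch below shows one way to bracket a snapshot: enter backup mode, take the snapshot, and leave backup mode even if the snapshot step fails. The `execute` callable and the `take_snapshot` placeholder are hypothetical stand-ins for your database driver and for whatever triggers the VMware snapshot. Only the two `ALTER DATABASE` statements are real Oracle syntax.

```python
from contextlib import contextmanager
from typing import Callable, Iterator

# Hypothetical: any callable that runs one SQL statement against the database,
# for example a thin wrapper around your Oracle driver's cursor.
SqlExecutor = Callable[[str], None]


@contextmanager
def hot_backup_mode(execute: SqlExecutor) -> Iterator[None]:
    """Keep the database in hot backup mode while the with-block runs."""
    execute("ALTER DATABASE BEGIN BACKUP")
    try:
        yield
    finally:
        # Always leave backup mode, even if the snapshot step raised. Otherwise
        # the data files stay in backup mode and keep logging extra redo.
        execute("ALTER DATABASE END BACKUP")


def snapshot_with_hot_backup(execute: SqlExecutor,
                             take_snapshot: Callable[[], None]) -> None:
    # take_snapshot is a placeholder for whatever triggers the VMware snapshot.
    with hot_backup_mode(execute):
        take_snapshot()


# Dry run: record the statements instead of sending them to a database.
issued = []
snapshot_with_hot_backup(issued.append, lambda: issued.append("<snapshot>"))
print(issued)
```

Backup mode is only needed while the snapshot is being created, not while it is later consolidated. That is why the `with` block wraps the snapshot call and nothing else.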
According to VMware Oracle Database Best Practices (dated May 2016):
“VMware does not recommend that you backup a high transactional, heavy I/O-centric Oracle database using VMware snapshot technology because, during the snapshot removal (consolidation), there is a brief stun moment. No activity is permitted against the virtual machine, which might result in performance issues and service disruptions.”
In fact, we have experienced this firsthand, and the stun can be both lengthy and very disruptive to throughput. The heavier the I/O demand on the guest, the longer the stun. For that reason, we generally recommend making data volumes Independent/Persistent. Independent disks are excluded from VM snapshots, which protects them from this type of disruption. Unfortunately, this also rules out any snapshot-based mechanism, including those described below. At the very least, for production systems, you will want to schedule these backups during low-activity periods, when pauses can be tolerated.
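Because the stun grows with I/O demand, one way to choose a backup window is to look at historical I/O per hour and pick the quietest stretch. The Python sketch below assumes you already have one average I/O figure per hour of the day, for example exported from your monitoring tool. `quietest_window` and the sample numbers are hypothetical. The sketch does not model stun duration itself; it only finds where I/O is lowest.

```python
from typing import Sequence


def quietest_window(hourly_io: Sequence[float], window_hours: int) -> int:
    """Return the start hour of the contiguous window with the lowest total I/O.

    hourly_io holds one average I/O figure per hour (index 0 = midnight);
    windows are allowed to wrap past midnight.
    """
    n = len(hourly_io)
    if not 0 < window_hours <= n:
        raise ValueError("window_hours must be between 1 and len(hourly_io)")
    best_start, best_total = 0, float("inf")
    for start in range(n):
        total = sum(hourly_io[(start + i) % n] for i in range(window_hours))
        if total < best_total:  # strict comparison keeps the earliest window on ties
            best_start, best_total = start, total
    return best_start


# Hypothetical hourly IOPS for one day, midnight through 23:00.
sample_iops = [900, 600, 400, 300, 350, 800, 2000, 4000, 5000, 5200, 5100, 4800,
               4500, 4700, 5000, 5100, 4900, 4200, 3000, 2500, 2000, 1800, 1500, 1200]
start = quietest_window(sample_iops, window_hours=3)
print(f"Quietest 3-hour window starts at {start:02d}:00")  # prints: Quietest 3-hour window starts at 02:00
```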
VMware and Third Party Data Protection
In general, these solutions can take valid, consistent backups of Oracle data, provided you have not excluded the underlying volumes from VMware snapshots and can accept the performance implications described above.
The caveats and methodologies for this type of backup are the same as for any generic snapshot or cloning technique applied to data volumes. The primary concern is obtaining a consistent backup of the data files in order to ensure data integrity.
In its Backup and Replication for Oracle guide, Veeam provides the graphic at right, a simple flow chart of your options. The same decision tree appears to apply to VMware Data Protection as well.
To summarize, data volumes on Windows guests can be backed up using the Volume Shadow Copy Service (VSS), because Oracle on Windows supports that interface. On other platforms, the database must be placed in hot backup mode (ALTER DATABASE BEGIN BACKUP) for the duration of the snapshot or copy operation and taken out of it (ALTER DATABASE END BACKUP) afterward. Both Veeam and VMware provide guest scripts that handle this step automatically.
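The decision logic summarized above fits in a small function. This Python version is our simplified reading of the options, not a reproduction of Veeam's chart. `DataVolume` and `backup_approach` are hypothetical names. The independent/persistent check reflects the earlier point that such disks are excluded from VM snapshots.

```python
from dataclasses import dataclass


@dataclass
class DataVolume:
    guest_os: str                 # e.g. "windows" or "linux"
    independent_persistent: bool  # True if the VMDK is excluded from VM snapshots


def backup_approach(vol: DataVolume) -> str:
    """Return the consistency mechanism for a snapshot-based backup of vol."""
    if vol.independent_persistent:
        # VM snapshots skip independent disks, so snapshot-based tools cannot
        # capture this volume; protect it with RMAN instead.
        return "rman"
    if vol.guest_os.lower() == "windows":
        # Oracle on Windows supports VSS, so the snapshot can be quiesced through it.
        return "vss"
    # Any other guest OS: a guest script brackets the snapshot with
    # ALTER DATABASE BEGIN BACKUP / END BACKUP.
    return "hot_backup_mode"


for vol in (DataVolume("windows", False), DataVolume("linux", False), DataVolume("linux", True)):
    print(vol, "->", backup_approach(vol))
```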
Oracle Recovery Manager (RMAN)
While virtualization brings a number of intriguing mechanisms for backing up your databases, there is usually no substitute for the flexibility and safety offered by native backup and recovery tools. This is particularly true when it comes to production and high performance workloads. For Oracle, this means RMAN.
At House of Brick, we always recommend including RMAN in any backup strategy, whether physical or virtual, because it supports the widest range of recovery scenarios. When paired with an external backup location or librarian, RMAN allows very granular, independent, point-in-time recovery of objects ranging from individual blocks to entire tablespaces. It also supports full restore and recovery of the database for disaster scenarios. In addition, it facilitates cloning the live database to other locations. If backup performance is a concern, there are several ways to reduce backup time. These include multiple channels (parallel threads), incremental backups, and block change tracking, which lets incremental backups read only the blocks that have changed. With high-performance hardware as the backup destination, RMAN strategies can be built around very large databases, even ones with tight backup windows.
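The Python function below sketches how those tuning options fit together by building a nightly RMAN `RUN` block. It schedules a weekly level 0 incremental and level 1 incrementals on the other days, and it allocates several disk channels so files are backed up in parallel. The schedule, channel count, function name, and block change tracking file path are all assumptions for illustration. The RMAN and SQL statements are standard syntax, but check them against your Oracle version before use. Block change tracking is enabled once, for example from SQL*Plus, rather than every night.

```python
import datetime

# One-time setup (hypothetical file path): block change tracking lets level 1
# backups read only changed blocks instead of scanning every data file.
ENABLE_BCT = (
    "ALTER DATABASE ENABLE BLOCK CHANGE TRACKING "
    "USING FILE '/u01/oradata/bct/change_tracking.f';"
)


def rman_backup_script(day: datetime.date, channels: int = 4,
                       level0_weekday: int = 6) -> str:
    """Return the RMAN RUN block for one night's backup.

    Weekly level 0 on level0_weekday (Monday=0 ... Sunday=6),
    level 1 incrementals on all other days.
    """
    if channels < 1:
        raise ValueError("need at least one channel")
    level = 0 if day.weekday() == level0_weekday else 1
    lines = ["RUN {"]
    # Each allocated channel is a separate server session, so RMAN can
    # back up several data files in parallel.
    lines += [f"  ALLOCATE CHANNEL ch{i} DEVICE TYPE DISK;"
              for i in range(1, channels + 1)]
    lines.append(f"  BACKUP INCREMENTAL LEVEL {level} DATABASE PLUS ARCHIVELOG;")
    lines.append("}")
    return "\n".join(lines)


# 5 June 2016 is a Sunday, so this produces a level 0 script with two channels.
print(rman_backup_script(datetime.date(2016, 6, 5), channels=2))
```

Generating the script keeps the level 0/level 1 rotation in one place. The output can then be saved and passed to RMAN as a command file.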
When moving into the virtual world, there are exciting and convenient mechanisms available for making consistent backups of your data volumes. As long as care is taken to implement them properly, they can greatly simplify your backup tasks (you do test recoveries and validate them on a regular basis, don’t you?). These techniques can also be used to clone your databases to other environments with ease, and are a great fit for test and development instances.
However, for your most critical production workloads, especially those with high throughput and performance requirements, the underlying snapshot technology may introduce unacceptable pauses. Relying exclusively on these snapshot-based mechanisms also takes off the table some important recovery capabilities that you would otherwise have with RMAN. So be sure to evaluate your recovery time and recovery point requirements (RTO and RPO) to ensure you can meet expectations.
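One rough way to frame that evaluation is to compare each candidate strategy against your targets. The comparison uses two figures: worst-case data loss (the gap between recovery points) and estimated restore time. The Python sketch below does exactly that. The strategies and all of the minute values are hypothetical placeholders, not measurements. Replace them with figures from your own recovery tests.

```python
from dataclasses import dataclass


@dataclass
class Strategy:
    name: str
    worst_case_data_loss_min: float  # gap between recovery points (RPO exposure)
    estimated_restore_min: float     # time to restore and recover (RTO exposure)


def meets_targets(s: Strategy, rpo_min: float, rto_min: float) -> bool:
    """True if the strategy's worst case fits within both targets."""
    return s.worst_case_data_loss_min <= rpo_min and s.estimated_restore_min <= rto_min


# Hypothetical strategies and numbers, for illustration only.
candidates = [
    Strategy("nightly VM snapshot only", 24 * 60, 30),
    Strategy("RMAN + archived logs every 15 min", 15, 120),
]
for s in candidates:
    print(s.name, meets_targets(s, rpo_min=60, rto_min=240))
```

With a one-hour RPO target, a nightly snapshot can lose up to a day of changes and fails regardless of how fast it restores. Frequent archived log backups bring worst-case loss within the target.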
As always, it’s very difficult to prescribe a one-size-fits-all set of “best practices” for anything in IT. That said, at House of Brick our sensitivity to the consistent performance of critical workloads leads us to always start by making VMDK data volumes Independent/Persistent and relying on RMAN as the cornerstone of production backups in virtual environments. Of course, we would be happy to discuss alternatives and their implications with you, or to schedule time to help you evaluate your specific requirements and options.