Oracle RAC on VMware – Beware Oversubscription
by Cameron Cameron, Senior Consultant
At House of Brick, we’ve been virtualizing Oracle RAC on VMware for over twelve years. The technology is reliable. Of course, prerequisites and best practices exist in a virtual deployment, just like they do with physical hardware.
Last year, I was sent to a client site that was reporting RAC node evictions in its virtual environment. The client was predisposed to believe that the issues had to do with the cluster interconnect, so a VMware NSX expert was also onsite. Suffice it to say that no network issues were found. Within the first hour, I found that minor memory ballooning would occasionally occur, and we determined this to be the cause of the node evictions. The remainder of my visit was spent educating the client on why memory pressure is so bad for Oracle RAC.
A significant reason why VMware is attractive to businesses is its ability to consolidate workloads and oversubscribe CPU and memory resources, so what’s the big deal? Well, it’s true, you can oversubscribe CPU, and sometimes memory, as long as you keep a few things in mind:
- Tier-1 workloads are generally not good candidates for oversubscription
- Oracle RAC is not a good candidate for oversubscription, regardless of whether it’s Tier-1 or other
At House of Brick, we have a set of Oracle on VMware best practices, which we advise our clients to follow, whether we’re assisting them in virtualizing existing workloads or performing health checks. One of those best practices is to use full memory reservations for Oracle Database VMs, or at least to reserve an amount of memory equal to the size of the SGA. For CPU, we acknowledge that it’s generally okay to oversubscribe up to a ratio of roughly 1.5:1.
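To make the reservation and CPU guidance concrete, here is a minimal sketch of the arithmetic behind a health check. It is an illustration only, not a House of Brick tool: the `VM` class, the field names, and the sample figures are all hypothetical, and it assumes the CPU ratio means total vCPUs divided by physical cores on the host.

```python
from dataclasses import dataclass


@dataclass
class VM:
    # All fields are hypothetical inputs you would collect from your own inventory.
    name: str
    vcpus: int
    memory_gb: float       # configured VM memory
    reservation_gb: float  # memory reservation set on the VM
    sga_gb: float          # Oracle SGA size inside the guest


def cpu_ratio(vms, physical_cores):
    """Total vCPUs on the host divided by its physical cores (assumed definition)."""
    return sum(vm.vcpus for vm in vms) / physical_cores


def reservation_findings(vms):
    """Classify each VM's memory reservation against the guidance in the text."""
    findings = []
    for vm in vms:
        if vm.reservation_gb >= vm.memory_gb:
            findings.append((vm.name, "full reservation"))
        elif vm.reservation_gb >= vm.sga_gb:
            findings.append((vm.name, "reservation covers SGA"))
        else:
            findings.append((vm.name, "reservation below SGA"))
    return findings


# Made-up example host with 16 physical cores and three Oracle VMs.
vms = [
    VM("rac1", vcpus=8, memory_gb=64, reservation_gb=64, sga_gb=40),
    VM("rac2", vcpus=8, memory_gb=64, reservation_gb=32, sga_gb=40),
    VM("db3", vcpus=8, memory_gb=32, reservation_gb=24, sga_gb=20),
]
ratio = cpu_ratio(vms, physical_cores=16)
findings = reservation_findings(vms)
```

In this made-up example the host sits right at the 1.5 ratio, and `rac2` is the VM that would be flagged, because its reservation is smaller than its SGA.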
Not all shops choose to follow our advice, however. In the case of Oracle Database VMs, the worst that can happen is a performance impact, which may or may not be noticed. Let me postulate that oversubscription = latency. So why is resource oversubscription so bad for Oracle RAC? The short answer is Oracle Grid Infrastructure.
Consider running a two-node RAC cluster on physical machines, Node-A and Node-B. If you were able to induce a condition on Node-A that caused some kind of latency across the cluster interconnect, what would happen? It’s very likely that Node-A would be evicted from the cluster; in other words, Node-A would be rebooted. A key difference between Oracle Database and Oracle Grid Infrastructure is that Oracle GI has the ability to reboot the machine.
Running Oracle RAC in a VMware environment, in a situation where the VM is not guaranteed to get the memory resources it needs, could result in ballooning. Ballooning on the ESXi host at the client site corresponded to what I labeled as “unpredictable results” in Clusterware. The word “unpredictable” is appropriate because the result of the increased latency in the environment presented in a number of ways:
- Loss of access to a CRS/voting disk
- Unacceptably high network latency
- Failure to communicate with the CSS daemon
Here are examples of errors shown in some of the Clusterware alert logs immediately following ESXi memory ballooning events:
2018-07-29 08:29:24.963 [OCSSD(12804)]CRS-1614: No I/O has completed after 75% of the maximum interval. Voting file ORCL:OCR_DATA_02 will be considered no functional in 4070 milliseconds
2018-07-29 08:29:27.951 [CSSDAGENT(12767)]CRS-1661: The CSS daemon is not responding. Reboot will occur in 5569 milliseconds;….
2018-08-19 04:37:57.859 [OCSSD(9589)]CRS-1612: Network communication with node [nodename] (2) missing for 50% of timeout interval. Removal of this node from cluster in 14.030 seconds
2018-08-19 04:38:20.45 [OCSSD(5141)]CRS-1609: This node is unable to communicate with other nodes in the cluster and is going down to preserve cluster integrity; … The CSS daemon is terminating due to a fatal error;….
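Because these alert-log entries follow a regular layout (timestamp, daemon and PID in brackets, then a CRS code), lining them up against ballooning events is easy to script. The following is a rough sketch, assuming one entry per line in the layout shown above. The helper names and the five-minute window are my own choices, and the ballooning timestamp is a made-up placeholder that you would replace with times taken from your own ESXi performance data.

```python
import re
from datetime import datetime, timedelta

# Matches entries shaped like: "2018-07-29 08:29:24.963 [OCSSD(12804)]CRS-1614: text"
LINE_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"\[(?P<daemon>\w+)\((?P<pid>\d+)\)\]"
    r"(?P<code>CRS-\d+): (?P<msg>.*)"
)


def parse_line(line):
    """Return a dict for a Clusterware alert-log entry, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "time": datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f"),
        "daemon": m["daemon"],
        "pid": int(m["pid"]),
        "code": m["code"],
        "message": m["msg"],
    }


def events_near(events, balloon_times, window=timedelta(minutes=5)):
    """Keep events logged within `window` after any ballooning timestamp."""
    return [
        e for e in events
        if any(timedelta(0) <= e["time"] - b <= window for b in balloon_times)
    ]


sample = [
    "2018-07-29 08:29:24.963 [OCSSD(12804)]CRS-1614: No I/O has completed after "
    "75% of the maximum interval. Voting file ORCL:OCR_DATA_02 will be considered "
    "no functional in 4070 milliseconds",
    "2018-08-19 04:37:57.859 [OCSSD(9589)]CRS-1612: Network communication with node "
    "[nodename] (2) missing for 50% of timeout interval. Removal of this node from "
    "cluster in 14.030 seconds",
]
events = [e for e in map(parse_line, sample) if e is not None]

# Hypothetical ballooning time, NOT taken from the client's data.
balloon_times = [datetime(2018, 7, 29, 8, 27, 0)]
hits = events_near(events, balloon_times)
```

With the placeholder timestamp, only the CRS-1614 entry falls inside the window. A correlation like this is suggestive rather than conclusive, so treat it as a starting point for the investigation, not as proof of cause.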
As I mentioned earlier, the client was predisposed to suspect that the cluster interconnect was the culprit. That is not surprising if they reviewed AWR reports during their forensic research. Resource contention on any RAC node degrades that node's performance, which in turn affects interconnect traffic, so of course the surviving nodes’ AWR reports are going to show latency issues.
Oracle RAC on VMware runs well, and at House of Brick we’ve never encountered a RAC on VMware issue that couldn’t be stabilized, as long as the proper resources were provisioned. For Oracle RAC, this means setting memory reservations for the Oracle VM, or running it on a host that is not oversubscribed.