Jim Hannan (@HoBHannan), Principal Architect
I recently discovered a white paper published by VMware on tuning latency-sensitive workloads:
As part of a performance team that virtualizes business-critical applications, we are always looking for better methodologies. So a white paper written by VMware on how to improve performance for latency-sensitive workloads was obviously of interest.
This blog discusses some of the tweaks and settings introduced in the white paper. For each one, I have also noted whether we would recommend using it. I think it is important to stress two things before considering any of the settings.
First, VMware very clearly states that for most workloads these tweaks are unnecessary. The applications that benefit from these settings are truly latency-sensitive ones with SLAs in the sub-second range.
Second, in some cases we have not had the opportunity to benchmark the settings or examine their results, so care should be taken. At HoB, we believe the best approach is to benchmark in a test environment before implementing in production.
Turn off BIOS Power Management
This is something we hope all customers are doing. The Intel Nehalem has two power management options:
C-states
Intel Turbo Boost
According to VMware, C-states can increase memory latency and are not recommended. Intel Turbo Boost should be left on; according to VMware, it increases the frequency of the processor when the workload needs more power.
Tickless Kernel in RHEL 6
We have watched this from afar for more than five years. In early versions of ESX, the guest would suffer from clock drift, which is when the clock of the OS falls behind. This was common for SMP workloads that were busy doing work, or on an ESX host with constrained resources.
Moving from RHEL 5 to RHEL 6 and its tickless kernel can reduce application latency. The tickless kernel is a better timekeeping mechanism. Additionally, VMware claims it can offer better performance for latency-sensitive applications.
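One way to confirm that a guest kernel was built tickless is to look for CONFIG_NO_HZ=y in its build configuration, which many distributions ship as /boot/config-<kernel version>. The following is a minimal Python sketch under that assumption. The function name is hypothetical, the sample text is made up, and the check says nothing about boot parameters that may override the build setting.

```python
def is_tickless(config_text):
    """Return True if the kernel build config text enables CONFIG_NO_HZ."""
    return any(
        line.strip() == "CONFIG_NO_HZ=y"
        for line in config_text.splitlines()
    )


# Made-up sample; on a real guest, pass in the contents of the kernel
# config file instead (assumed to live under /boot).
sample_config = "CONFIG_HZ=1000\nCONFIG_NO_HZ=y\n"
print(is_tickless(sample_config))
```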
NUMA Node Affinity
NUMA affinity basically assigns a VM to a NUMA node. I would not recommend this until it has been determined that NUMA latency is an issue, which can be monitored with ESXTOP. We say this because each application handles NUMA differently. Oracle, for example, chooses not to use NUMA as of 11g.
To monitor NUMA latency with ESXTOP:
esxtop > m (memory view) > f > g > enter > (capital) V
If N%L < 80 for any VM, then the workload may have NUMA latency issues.
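The 80 percent rule of thumb above can be expressed as a small check. This is only a sketch: the function name is hypothetical and the VM names and N%L readings are made-up sample values that would, in practice, be read off the esxtop screen.

```python
def vms_with_poor_numa_locality(n_pct_local, threshold=80.0):
    """Return the names of VMs whose N%L is below the threshold, sorted by name."""
    return sorted(vm for vm, pct in n_pct_local.items() if pct < threshold)


# Hypothetical readings: VM name -> N%L as displayed by esxtop.
sample = {"oradb01": 97.0, "oradb02": 62.5, "appsrv01": 80.0}
print(vms_with_poor_numa_locality(sample))
```

A VM sitting exactly at 80 is not flagged, matching the "less than 80" wording.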
NIC Interrupt Moderation
Most NICs support a feature called interrupt moderation or interrupt throttling. This basically buffers (or queues) interrupts for the network card so the host CPU does not get overwhelmed. It can be disabled on most NICs to give latency-sensitive applications lower network latency, at the cost of more CPU time spent handling interrupts. VMware recommends this in only the most extreme cases. We agree that this should be used carefully. We would consider it for Oracle RAC workloads that are suffering latency issues on the RAC interconnect.
VMXNET3 Interrupt Coalescing
As of vSphere 5, VMXNET3 supports interrupt coalescing. Interrupt coalescing is similar to a physical NIC's interrupt moderation. The philosophy behind interrupt coalescing is to benefit the entire cluster by reducing the CPU overhead of TCP traffic. However, for latency-sensitive applications such as Oracle RAC, it is best to disable interrupt coalescing.
Go to VM Settings → Options tab → Advanced General → Configuration Parameters and add an entry for ethernetX.coalescingScheme with the value of disabled.
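The X in ethernetX is the index of the virtual NIC (ethernet0, ethernet1, and so on), so a VM with several VMXNET3 adapters needs one entry per adapter. The sketch below prints the entries for a list of NIC indexes in the key = "value" form these parameters take in the VM's configuration file. The function name and the NIC indexes are illustrative assumptions.

```python
def coalescing_disabled_entries(nic_indexes):
    """Build one 'key = "value"' configuration line per virtual NIC index."""
    return [f'ethernet{i}.coalescingScheme = "disabled"' for i in nic_indexes]


# Example: a VM with two virtual NICs (indexes chosen for illustration).
for entry in coalescing_disabled_entries([0, 1]):
    print(entry)
```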
If you have a cluster or host that runs latency-sensitive applications, you can also disable it at the host level with the setting below, repeating the change on each host in the cluster.
Click the host, go to the Configuration tab → Advanced Settings, and set the networking performance option CoalesceDefaultOn to 0 (disabled).
VMXNET3 Large Receive Offload (LRO)
Similar to the feature above, the VMXNET3 feature LRO aggregates multiple received TCP segments into a large segment before delivery to the guest TCP stack.
We recommend that you disable LRO on all Oracle virtual machines.
First, unload the driver:
# modprobe -r vmxnet3
Add the following line to /etc/modprobe.conf (Linux version dependent):
options vmxnet3 disable_lro=1
Next, reload the driver:
# modprobe vmxnet3
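Because the options line should appear only once, the edit to the modprobe configuration can be made idempotent. The following is a minimal Python sketch that works on the file contents as a string. The function name is hypothetical and the sample contents are made up. Writing the result back to the file and reloading the driver still have to be done as root.

```python
LRO_OPTION = "options vmxnet3 disable_lro=1"


def ensure_line(conf_text, line=LRO_OPTION):
    """Return conf_text with `line` appended unless it is already present."""
    lines = conf_text.splitlines()
    if any(existing.strip() == line for existing in lines):
        return conf_text  # already configured, leave the text untouched
    lines.append(line)
    return "\n".join(lines) + "\n"


# Made-up starting contents for illustration.
print(ensure_line("alias eth0 vmxnet3\n"), end="")
```

Running the function a second time on its own output returns the text unchanged.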
Prevent De-scheduling of vCPUs
I have a hard time imagining that heavy workloads need this setting. You can ensure that a virtual machine is never de-scheduled from a pCPU (physical CPU) by configuring the advanced setting monitor_control.halt_desched. The effect is similar to a tight loop in an application whose process never sleeps and just continually executes the loop. VMware describes this setting as the VM becoming the owner of the pCPU (or pCPUs). This indicates to me that the pCPU is not eligible to schedule other workloads; in effect, it becomes dedicated to the “pinned” workload.
Go to VM Settings → Options tab → Advanced General → Configuration Parameters and add monitor_control.halt_desched with the value of false.