House of Brick Principal Architect
You have allocated too many resources to your virtual machines, and now your business-critical server performance is suffering! How can this be? That does not make sense!
In this post, I will demonstrate how allocating too many vCPUs to a virtual machine with a low workload actually hinders performance instead of helping it.
Test Setup
To demonstrate this cause and effect, we used a dedicated HP DL580 G7 server with four 10-core Intel Xeon E7-4850 CPUs at 2.0 GHz and 512GB of RAM. An EMC DMX4 SAN provided the storage underneath the virtual machines. VMware vSphere 5.0 Update 1 was the host hypervisor. We created a single virtual machine, and it was the only VM running on this host. The VM was configured with two vSockets and four vCPUs per socket (eight vCPUs in total), as well as 128GB of vRAM. SQL Server 2008 R2 was installed on the VM and configured according to our best practices.
Dell’s freely available DVDstore (a database benchmarking tool) was used to generate a synthetic workload against our SQL Server testbed.
A 50GB workload was generated and loaded into a new SQL Server database. A DVDstore workload test was performed for one hour. The VM was then reconfigured from 8 vCPUs to 32 vCPUs in a 4×8 configuration, the database was restored, and the test was rerun. Each test reports its result as ‘Orders Placed per Minute.’ For each test, the maximum degree of parallelism for the SQL Server instance, or MaxDOP, was adjusted from one to six (a requirement from the project).
Test Results
| Threads | MaxDOP | 8 vCPUs | 32 vCPUs |
|---|---|---|---|
| 2 | 1 | 19277 | 13589 |
| 2 | 2 | 19251 | 17858 |
| 2 | 3 | 18841 | 17453 |
| 2 | 4 | 15839 | 15640 |
| 2 | 5 | 15953 | 15779 |
| 2 | 6 | 16263 | 16055 |
| 8 | 1 | 76590 | 63910 |
| 8 | 2 | 76592 | 70705 |
| 8 | 3 | 75441 | 69335 |
| 8 | 4 | 57508 | 61412 |
| 8 | 5 | 55021 | 61579 |
| 8 | 6 | 56859 | 61151 |
| 16 | 1 | 152782 | 135484 |
| 16 | 2 | 151462 | 140577 |
| 16 | 3 | 147618 | 136376 |
| 16 | 4 | 86078 | 112365 |
| 16 | 5 | 81383 | 106862 |
| 16 | 6 | 84634 | 101230 |
| 32 | 1 | 298444 | 274629 |
| 32 | 2 | 291692 | 278024 |
| 32 | 3 | 280824 | 272659 |
| 32 | 4 | 108952 | 147444 |
| 32 | 5 | 102808 | 133270 |
| 32 | 6 | 106140 | 124293 |
| 64 | 1 | 487146 | 542351 |
| 64 | 2 | 429131 | 532679 |
| 64 | 3 | 368718 | 515461 |
| 64 | 4 | 113664 | 153877 |
| 64 | 5 | 117480 | 136862 |
| 64 | 6 | 0 | 127634 |
| 100 | 1 | 0 | 0 |
| 100 | 2 | 375301 | 539928 |
| 100 | 3 | 337446 | 480744 |
| 100 | 4 | 0 | 150850 |
| 100 | 5 | 0 | 132887 |
| 100 | 6 | 0 | 127498 |
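The results are pretty clear. At a low volume of work, the SQL Server instance performs slower with more vCPUs assigned to the virtual machine. With 2 threads, the 8 vCPU VM placed more orders than the 32 vCPU VM at every MaxDOP setting (19277 versus 13589 at MaxDOP 1, a drop of roughly 30 percent for the larger VM). As the volume of work grows, the 32 vCPU VM eventually overtakes the 8 vCPU VM in performance: from 8 threads onward when MaxDOP is 4 or higher, and at 64 threads when MaxDOP is 1 through 3.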
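To make the comparison concrete, here is a minimal Python sketch that computes the percentage difference between the two configurations and finds the thread count at which the 32 vCPU VM first pulls ahead. The numbers are copied from the MaxDOP 1 rows of the table above (the 100-thread row is left out because both of its values are 0). The function names `relative_change` and `first_crossover` are made up for this illustration and are not part of DVDstore or any other tool.

```python
# (threads, orders with 8 vCPUs, orders with 32 vCPUs) for MaxDOP = 1,
# copied from the results table. The 100-thread row is omitted (both values are 0).
RESULTS_MAXDOP_1 = [
    (2, 19277, 13589),
    (8, 76590, 63910),
    (16, 152782, 135484),
    (32, 298444, 274629),
    (64, 487146, 542351),
]


def relative_change(orders_8, orders_32):
    """Percent change of the 32 vCPU result relative to the 8 vCPU result."""
    return (orders_32 - orders_8) / orders_8 * 100.0


def first_crossover(rows):
    """Return the lowest thread count where 32 vCPUs beat 8 vCPUs, or None."""
    for threads, orders_8, orders_32 in rows:
        if orders_32 > orders_8:
            return threads
    return None


for threads, orders_8, orders_32 in RESULTS_MAXDOP_1:
    change = relative_change(orders_8, orders_32)
    print(f"{threads:>3} threads: 32 vCPUs vs 8 vCPUs = {change:+.1f}%")

print("32 vCPUs first ahead at", first_crossover(RESULTS_MAXDOP_1), "threads")
```

The same two functions can be pointed at the rows for any other MaxDOP setting to see how the crossover point moves.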
Why?
The answer lies in the overhead of vCPU scheduling at the hypervisor layer. All vCPU activity is scheduled into a runnable queue, even if a vCPU is almost idle. You can see this measured indirectly via the VMware CPU Ready performance counter. As the vCPU count goes up, so does the amount of activity the hypervisor has to schedule through this queue, because even vCPUs that are almost idle still have to get scheduled to run.
However, the priority of the request can decrease if VMware determines that a vCPU is idle, and the overhead of these tasks and queues has a cumulative effect. Now, if all vCPUs are busy, priority is given and these effects become negligible, at least for this VM. The effects of becoming deprioritized in the runnable CPU queue can potentially be felt by other VMs on the same host, however, so keep this in mind and constantly monitor CPU Ready times of all of your mission-critical virtual machines.
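When monitoring CPU Ready, keep in mind that vSphere performance charts report it as a summation in milliseconds per sample interval, which is easier to reason about as a percentage. The sketch below applies the usual conversion (ready milliseconds divided by the interval length, times 100), assuming the 20-second sample interval of the real-time charts. The function name `cpu_ready_percent` and the sample values are hypothetical; none of these numbers were measured during this engagement.

```python
REALTIME_INTERVAL_S = 20  # assumed: sample interval of vSphere real-time charts


def cpu_ready_percent(ready_ms, interval_s=REALTIME_INTERVAL_S, vcpus=1):
    """Convert a CPU Ready summation (ms) into a percentage of the interval.

    Pass the VM's vCPU count when ready_ms is the VM-level total, so the
    result is the average per vCPU rather than the sum across all vCPUs.
    """
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    return ready_ms * 100.0 / (interval_s * 1000.0 * vcpus)


# Hypothetical sample: 1000 ms of ready time in one 20-second interval.
print(cpu_ready_percent(1000))           # one vCPU
print(cpu_ready_percent(1000, vcpus=8))  # averaged across 8 vCPUs
```

Tracking this percentage per VM over time makes it easier to spot when a change in vCPU allocation on one VM starts to affect its neighbors.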
Therefore, baseline and benchmark your workloads to determine their actual resource consumption. Allocate your VM resources appropriately, and you might just see a noticeable jump in performance!
Note: This blog post was taken from an earlier engagement. For the full case study based on that engagement, please refer to this blog post: “SQL Server Performance on Itanium vs. x86 on VMware: A Case Study”.