Right Sizing Your Virtual Machines

House of Brick Principal Architect

You have allocated too many resources to your virtual machines, and now your business-critical server performance is suffering! How can this be? That does not make sense!

In this post, I will demonstrate how allocating too many vCPUs to a virtual machine with a low workload actually hinders performance instead of helping it.

Test Setup

To prove this cause and effect, we used a dedicated HP DL580 G7 server with four 10-core Intel Xeon E7-4850 CPUs at 2.0GHz per core and 512GB of RAM. An EMC DMX4 SAN provided the storage underneath the virtual machines, and VMware vSphere 5.0 Update 1 was the host hypervisor. We created one virtual machine, and it was the only VM running on this host. The VM was configured with two vSockets and four vCPUs per socket (eight vCPUs in total), as well as 128GB of vRAM. SQL Server 2008 R2 was installed on the VM and configured according to our best practices.

Dell’s freely available DVDstore (a database benchmarking tool) was used to generate a synthetic workload against our SQL Server testbed.

A 50GB workload was generated and loaded into a new SQL Server database. A DVDstore workload test was performed for one hour. The vCPU count was then changed from 8 to 32, in a 4×8 configuration (four vSockets with eight vCPUs per socket). The database was restored and the test rerun. The output from each test is reported as ‘Orders Placed per Minute.’ For each test, the maximum degree of parallelism for the SQL Server instance, or MaxDOP, was adjusted from one to six (a requirement from the project).

Test Results

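| Threads | MaxDOP | 8 vCPUs | 32 vCPUs |
|---------|--------|---------|----------|
| 2       | 1      | 19277   | 13589    |
| 2       | 2      | 19251   | 17858    |
| 2       | 3      | 18841   | 17453    |
| 2       | 4      | 15839   | 15640    |
| 2       | 5      | 15953   | 15779    |
| 2       | 6      | 16263   | 16055    |
| 8       | 1      | 76590   | 63910    |
| 8       | 2      | 76592   | 70705    |
| 8       | 3      | 75441   | 69335    |
| 8       | 4      | 57508   | 61412    |
| 8       | 5      | 55021   | 61579    |
| 8       | 6      | 56859   | 61151    |
| 16      | 1      | 152782  | 135484   |
| 16      | 2      | 151462  | 140577   |
| 16      | 3      | 147618  | 136376   |
| 16      | 4      | 86078   | 112365   |
| 16      | 5      | 81383   | 106862   |
| 16      | 6      | 84634   | 101230   |
| 32      | 1      | 298444  | 274629   |
| 32      | 2      | 291692  | 278024   |
| 32      | 3      | 280824  | 272659   |
| 32      | 4      | 108952  | 147444   |
| 32      | 5      | 102808  | 133270   |
| 32      | 6      | 106140  | 124293   |
| 64      | 1      | 487146  | 542351   |
| 64      | 2      | 429131  | 532679   |
| 64      | 3      | 368718  | 515461   |
| 64      | 4      | 113664  | 153877   |
| 64      | 5      | 117480  | 136862   |
| 64      | 6      | 0       | 127634   |
| 100     | 1      | 0       | 0        |
| 100     | 2      | 375301  | 539928   |
| 100     | 3      | 337446  | 480744   |
| 100     | 4      | 0       | 150850   |
| 100     | 5      | 0       | 132887   |
| 100     | 6      | 0       | 127498   |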

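The results are pretty clear. At a low volume of work, the SQL Server instance performs worse with more vCPUs assigned to the virtual machine. With 2 threads and MaxDOP 1, the 32 vCPU VM placed about 30 percent fewer orders per minute than the 8 vCPU VM (13589 versus 19277). As the volume of work grows, the 32 vCPU VM eventually overtakes the 8 vCPU VM: from 8 threads upward at MaxDOP 4 through 6, and at 64 threads for MaxDOP 1 through 3.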
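To make the comparison easier to read, the relative difference between the two configurations can be computed per row. The following is a minimal sketch that copies a handful of rows from the results table; the names `RESULTS`, `relative_change`, and `summarize` are made up for illustration and are not part of the original test harness.

```python
# Orders placed per minute, keyed by (threads, MaxDOP).
# Each value is (8 vCPU result, 32 vCPU result), copied from the results table.
RESULTS = {
    (2, 1): (19277, 13589),
    (2, 4): (15839, 15640),
    (8, 1): (76590, 63910),
    (8, 4): (57508, 61412),
    (32, 1): (298444, 274629),
    (64, 1): (487146, 542351),
}


def relative_change(small_vm: int, large_vm: int) -> float:
    """Percent change in throughput when moving from 8 to 32 vCPUs."""
    return (large_vm - small_vm) / small_vm * 100.0


def summarize(results):
    """Return {(threads, maxdop): percent change}, rounded to one decimal."""
    return {key: round(relative_change(*pair), 1) for key, pair in results.items()}


if __name__ == "__main__":
    for (threads, maxdop), change in sorted(summarize(RESULTS).items()):
        print(f"threads={threads:>3} MaxDOP={maxdop}: {change:+.1f}%")
```

A negative value means the 32 vCPU VM was slower for that row, and a positive value means it was faster.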

Why?

The answer lies in the overhead of vCPU scheduling at the hypervisor layer. Every vCPU must be scheduled through a runnable queue, even if it is almost idle. You can see this measured indirectly via the VMware CPU Ready performance counter. As the vCPU count goes up, the hypervisor has more vCPUs to schedule through this queue, including those that are doing little or no work.

However, the priority of the request can decrease if VMware determines that a vCPU is idle, and the overhead of these tasks and queues has a cumulative effect. If all vCPUs are busy, priority is given and these effects become negligible for this VM. Other VMs on the same host can still feel the effects of being deprioritized in the runnable CPU queue, however, so keep this in mind and constantly monitor the CPU Ready times of all of your mission-critical virtual machines.
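When monitoring CPU Ready, it helps to convert the raw counter into a percentage. vSphere performance charts report CPU Ready as a summation in milliseconds per sampling interval, and the usual conversion divides that value by the interval length. Here is a minimal sketch, assuming the 20-second real-time chart interval; the function name `cpu_ready_percent` and the per-vCPU averaging are illustrative choices, not a VMware API.

```python
def cpu_ready_percent(ready_ms: float, interval_s: float = 20.0, vcpus: int = 1) -> float:
    """Convert a CPU Ready summation value (milliseconds) into a percentage.

    ready_ms   -- CPU Ready summation for the sampling interval
    interval_s -- sampling interval in seconds (assumed: 20 s real-time charts)
    vcpus      -- divide by the vCPU count when ready_ms is a VM-level total,
                  which yields an average per-vCPU figure
    """
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    return ready_ms / (interval_s * 1000.0) / vcpus * 100.0


if __name__ == "__main__":
    # 1600 ms of ready time in a 20 s sample on an 8 vCPU VM
    print(f"{cpu_ready_percent(1600, vcpus=8):.2f}% ready per vCPU")
```

What counts as too high depends on the workload, so compare the result against your own baseline rather than a fixed threshold.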

Therefore, baseline and benchmark your workloads to determine their actual resource consumption. Allocate your VM resources appropriately, and you might just see a noticeable jump in performance!
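One way to turn a baseline into a starting vCPU count is sketched below. This is a hypothetical heuristic and not the method used in the engagement: it takes CPU utilization samples (fractions between 0 and 1) collected at the current vCPU count, picks a high percentile, and sizes the VM so that this level of work would land at an assumed target utilization. The function name, the 95th percentile, and the 70 percent target are all assumptions to adjust for your environment.

```python
import math


def suggest_vcpus(current_vcpus, utilization_samples,
                  target_utilization=0.7, percentile=0.95):
    """Suggest a vCPU count from baseline utilization samples.

    Hypothetical heuristic: scale the busy vCPU equivalent at the chosen
    percentile so it would run at target_utilization.
    """
    if not utilization_samples:
        raise ValueError("need at least one utilization sample")
    samples = sorted(utilization_samples)
    # Nearest-rank percentile, clamped to a valid index.
    rank = math.ceil(percentile * len(samples)) - 1
    idx = max(0, min(len(samples) - 1, rank))
    busy_vcpus = samples[idx] * current_vcpus
    return max(1, math.ceil(busy_vcpus / target_utilization))


if __name__ == "__main__":
    # A 32 vCPU VM that rarely exceeds 10% utilization
    print(suggest_vcpus(32, [0.10] * 20))
```

Treat the result as a first guess to validate with a benchmark run and CPU Ready monitoring, not as a final answer.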

Note: This blog post was taken from an earlier engagement. For the full case study based on that engagement, please refer to this blog post: “SQL Server Performance on Itanium vs. x86 on VMware: A Case Study”.
