vSphere Settings for All Flash Arrays

Jeff Stonacek, Principal Architect

Introduction

The use of all-flash arrays is increasing all the time, and House of Brick has definitely seen an uptick in the number of clients utilizing them. Their popularity is not surprising: prices are getting more competitive, especially when combined with compression technologies that are not possible with spinning-disk arrays. Plus, the performance is mind-blowing.

As with most new technologies, all-flash arrays require changes in operating procedures. The amount of I/O possible with an all-flash array means we need to change how we think about configuring the infrastructure around it. In this blog post, we will explore the I/O settings recommended for vSphere with all-flash arrays.

Path Switching Policy

It is recommended to set the Native Multi-Pathing (NMP) policy for Fibre Channel LUNs to Round Robin. This is a general VMware recommendation for Fibre Channel storage networks. vSphere has a setting that controls Round Robin behavior: the path switching policy. The path switching policy controls how many I/O operations are sent down a single path before switching to the next available path. By default, vSphere sets the path switching policy to 1,000 I/Os.
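
As a quick reference, the policy currently in effect for a LUN can be checked, and changed to Round Robin, from the ESXi shell. This is a minimal sketch; naa.x is a placeholder for an actual device identifier on your host.

# Show the path selection policy and Round Robin configuration for a LUN
esxcli storage nmp device list --device=naa.x

# Set the policy for that LUN to Round Robin
esxcli storage nmp device set --device=naa.x --psp=VMW_PSP_RR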

Both EMC and 3PAR recommend setting the path switching policy in vSphere to 1 for their all-flash arrays. See the following links for details:

http://www.emc.com/collateral/TechnicalDocument/docu5265.pdf
http://h20195.www2.hp.com/v2/getpdf.aspx/4aa4-3286enw.pdf

IBM recommends setting the path switching policy in vSphere to 4 for their all-flash arrays. See the following link for more information:

https://www-304.ibm.com/partnerworld/wps/servlet/download/DownloadServlet?id=j65SvWUgq$wiPCA$cnt&attachmentName=deploying_ibm_flash_system_840_vmware_esxi_environments.pdf&token=MTQ0NTk3ODkxMTg0MA==&locale=en_ALL_ZZ

The setting is made on a per-LUN basis. To make the change, run the following command:

esxcli storage nmp psp roundrobin deviceconfig set --device=naa.x --iops=1 --type=iops
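
To confirm the change took effect, the Round Robin device configuration can be queried for the same LUN (naa.x again being a placeholder):

# Display the current Round Robin configuration, including the IOPS limit
esxcli storage nmp psp roundrobin deviceconfig get --device=naa.x

A shell loop along the following lines is commonly used to apply the setting to many LUNs at once. Treat it as a sketch: as written it matches every NAA device on the host, including local disks, so narrow the grep pattern to your array's NAA prefix before using it.

for dev in $(esxcfg-scsidevs -c | awk '{print $1}' | grep '^naa.'); do
  esxcli storage nmp psp roundrobin deviceconfig set --device=$dev --iops=1 --type=iops
done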


As with all advanced settings, check your vendor's documentation for specific recommendations.
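
Note that the per-LUN command does not apply to LUNs presented after it is run. Several vendors therefore also document adding a SATP claim rule so that new LUNs are automatically claimed with Round Robin and an I/O limit of 1. The example below is a sketch only; the SATP name and vendor string are illustrative and must match what your array actually reports, so confirm both against your vendor's documentation.

# Claim rule example: Round Robin with 1 I/O per path for a given vendor string
esxcli storage nmp satp rule add --satp=VMW_SATP_DEFAULT_AA --vendor=XtremIO --psp=VMW_PSP_RR --psp-option="iops=1"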

Queue Depth

ESXi maintains queues for most hypervisor operations. For I/O, there are two primary queues that concern us: the adapter queue and the LUN queue. The default LUN queue depth depends on the HBA brand. For example, vSphere sets the default queue depth of an Emulex HBA to 32; two queue entries are reserved, so the LUN queue depth shows up as 30. The issue with LUN queue depth is that if the queue gets overrun, I/Os have to wait just to get into the queue.
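
The queue depth a device has been claimed with can be checked from the ESXi shell. As before, naa.x is a placeholder for a real device identifier:

# Show the maximum queue depth (and related queue settings) for a LUN
esxcli storage core device list --device=naa.x | grep -i "Queue"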

The way to tell if queuing is occurring is to monitor the QUED column for the LUNs in esxtop. If queuing is observed during times of heavy I/O, then the queue depth needs to be increased beyond the default.
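
One way to watch for this, assuming shell access to the host, is the esxtop disk device view:

# Start esxtop, then press 'u' to switch to the disk device view
esxtop
# DQLEN is the configured device queue depth, ACTV is the number of I/Os
# currently in flight, and QUED is the number of I/Os waiting for a queue
# slot. A QUED value that stays above zero under heavy load indicates the
# queue is being overrun.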

This matters for all-flash arrays because their latency is so much lower, which makes overrunning the queue far more likely than with a traditional spinning-disk array. Most storage vendors recommend setting the HBA queue depth to its maximum value when dealing with all-flash arrays.

See VMware KB article 1267, Changing the queue depth for QLogic, Emulex, and Brocade HBAs, for a complete description of how to change HBA queue depths.

Related KB article: Setting the Maximum Outstanding Disk Requests for virtual machines (1268)
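
For reference, the changes those KB articles describe look roughly like the following. This is a sketch only: module names and parameter names vary by driver and ESXi version (for example, qlnativefc versus older QLogic drivers), and the 128 value and naa.x device are placeholders, so verify everything against KB 1267, KB 1268, and your HBA vendor's documentation before applying it.

# Emulex: raise the LUN queue depth (a host reboot is required)
esxcli system module parameters set -m lpfc -p "lpfc_lun_queue_depth=128"

# QLogic native driver: raise the maximum queue depth (a host reboot is required)
esxcli system module parameters set -m qlnativefc -p "ql2xmaxqdepth=128"

# Verify the module parameter after the reboot
esxcli system module parameters list -m lpfc

# Per KB 1268, also raise the outstanding disk requests for the device
# (available per device on recent ESXi versions)
esxcli storage core device set --device=naa.x --sched-num-req-outstanding=128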

Results

Once the settings are optimized, the I/O performance from a single VMware virtual machine is amazing. Below are examples of some of the numbers we have seen:

Storage              Storage Network   IOPS      MB/sec   Description
XtremIO              8 Gb FC           146,000   1,160    1 VM, 1 ESXi host
XtremIO              8 Gb FC           169,000   1,340    2 VMs, 1 ESXi host
IBM FlashSystem 840  8 Gb FC           119,648   934      1 VM, 1 ESXi host
NetApp               10 Gb Ethernet    109,000   915      1 VM, 1 ESXi host

Note: The NetApp array in the above example was not an all-flash array, but was fronted by a substantial amount of flash storage for caching. Also, the network adapters were not bonded, so the test was saturating a single 10 gigabit adapter.

In the case of XtremIO, the performance differences from the path switching policy and queue depth settings were as follows:

  • Default PSP (1,000 I/Os per path), default queue depth (30) – 149,000 IOPS
  • Modified PSP (1 I/O per path), default queue depth (30) – 161,000 IOPS
  • Modified PSP (1 I/O per path), modified queue depth (128) – 169,000 IOPS

Clearly, the PSP and queue depth settings have an impact on I/O performance.

Conclusion

All-flash arrays are a game changer when it comes to I/O performance. It is worth researching vendor specific optimal settings for all-flash arrays, as the settings are likely different than those of legacy arrays.

After all, when configured properly, VMware plus flash equals stellar I/O performance.
