Mike Stone (@HoBMStone), CIO & Principal Architect
Scenario: Virtualized 8TB data warehouse, with a daily ETL that adds multiple GB per day. Enterprise storage system with 4 dedicated controllers and 16 fiber paths configured for active/active.
- Oracle is occasionally recording average I/O service times greater than 100ms for specific operations involving ETL insert activity, yet at other times everything appears fine.
- From a SAN perspective, we see relatively few I/Os the majority of the time and generally single-digit latency numbers.
- From vSphere we see average latency approaching 15ms and occasionally as high as 45ms.
- Linux is reporting a high percentage of I/O wait and also seeing latencies that agree with the Oracle AWR reports.
The DBAs insist storage performance is terrible but the SAN admins insist we barely even have the SAN’s attention. The vSphere administrators don’t see any issues with CPU Ready or SCSI queuing; memory and CPU are not over-allocated, and there aren’t any other workloads on the vSphere host.
After a lot of digging into specific time frames and attempting to reproduce the issue in a sandbox environment, we discover that the default storage provisioning for new disks is Thin VMDKs. Aha! We don’t want to do that… Let’s inflate the VMDK on the sandbox to fully allocate the 400GB we requested. Our testing had expanded it to 115GB, so inflating the remaining 285GB should only take a few minutes. Three hours later, we’re all scratching our heads.
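(As an aside, a quick way to see how much of a thin disk is actually allocated is to compare the file’s logical size with the blocks in use on the datastore. The commands below are a sketch from the ESXi shell; the datastore and VMDK names are placeholders. ls reports the full provisioned size, while du reports what has actually been allocated:

# ls -lh /vmfs/volumes/DATASTORE01/oradw-sbx/oradw-sbx-flat.vmdk
# du -h /vmfs/volumes/DATASTORE01/oradw-sbx/oradw-sbx-flat.vmdk

On a 400GB thin disk with roughly 115GB written, the first shows 400G and the second about 115G.)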
Next we verify that when we see bad insert performance in Oracle we are, in fact, extending Thin VMDKs. Sure enough – that’s the case. Now the same VAAI primitive (WRITE_SAME) that helps with zeroing out large blocks of data also facilitates the initialization of newly extended VMDK space. So, even though we’re Thin, we should at least be able to beat laptop performance with an Enterprise SAN!
First, data volumes under Oracle should NEVER have been provisioned Thin. But at this point about half of the volumes are Thin VMDKs, so it is what it is. Unfortunately, read performance is also compromised while a VMDK is being extended, and all of this adds up to EXTREMELY BAD performance for the data warehouse ETL processing. It is inconsistent, too, because some volumes were provisioned Thick and do not have the same problem.
Checking in vSphere, VAAI is listed as “Supported” on all hosts in the cluster. This should have made initializing the 285GB child’s play. Instead, vSphere chugged away for 3 hours. The documentation says that a VAAI status of “Supported” in the vSphere display means it has been enabled and verified by vSphere to be functional on that host. What the documentation doesn’t tell you is that this status does not mean the “primitives” (the specific acceleration directives) are currently enabled. For that, we need to go to the command line to check on the WRITE_SAME primitive. After “--option” we specify the name of whichever primitive we want to know about:
# esxcli system settings advanced list --option /DataMover/HardwareAcceleratedInit
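The companion block primitives can be queried the same way. The option names below are the standard vSphere advanced settings for the XCOPY (Full Copy) and ATS (hardware-assisted locking) primitives; check the VMware documentation for your release if the names differ:

# esxcli system settings advanced list --option /DataMover/HardwareAcceleratedMove
# esxcli system settings advanced list --option /VMFS3/HardwareAcceleratedLocking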
Hmmm… ALL primitives seem to be disabled. That explains a few things!
So, what appears to be happening is that Oracle generates a request to write an 8K block of data (and starts a timer). This translates into a hypervisor request to write to a VMDK that needs to be extended. vSphere now has to extend the VMDK and initialize the storage, which comes to about four logical operations.
Assuming it adds 100MB every time this happens, that means first allocating 100MB of storage and then issuing (say) 100 requests to write 1MB of zeros. Each of those I/O operations takes about 1.5ms, which is what the SAN stats are reporting and why there’s no apparent problem at that level. But by the time the storage has been allocated and initialized, over 150ms has elapsed since Oracle asked for the data to be written.
Finally, the 8K block is written. vSphere aggregates the 100 I/Os into a single operation (initialize 100MB of data) and reports an average of about 45ms for the four operations it has performed, or 180ms total. We see this in the vSphere performance metrics. Oracle, in turn, records about 200ms and then proceeds at full speed until the next time the VMDK has to extend (which may be only a few seconds later). We can see this entire scenario unfolding by analyzing the AWR/Oracle reports, the vSphere performance metrics and the SAN performance metrics.
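One practical way to line those three views up is an esxtop batch capture on the host during an ETL window (a sketch; the 15-second interval, iteration count and output path are arbitrary choices):

# esxtop -b -d 15 -n 240 > /vmfs/volumes/DATASTORE01/esxtop-etl.csv

In the disk columns, DAVG/cmd is roughly what the SAN sees and GAVG/cmd is roughly what the guest sees; in a case like this one, DAVG stays in single digits while KAVG/GAVG spike around the thin-extend events, which is exactly the gap between the SAN stats and the AWR numbers.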
VAAI was introduced to turn the 180ms into about 18ms (10 times faster) by offloading the work to the SAN hardware as a single operation. So, simply turning on the WRITE_SAME primitive should provide a significant boost in performance to the ETL processing.
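Assuming the array really does support it, enabling the primitive is a one-liner on each host in the cluster (1 = enabled, 0 = disabled), followed by the same list command to verify:

# esxcli system settings advanced set --option /DataMover/HardwareAcceleratedInit --int-value 1
# esxcli system settings advanced list --option /DataMover/HardwareAcceleratedInit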
There are a few primitives that were introduced specifically to improve the performance of Thin VMDKs. But of course, the correct answer to finally solving this and maximizing Oracle performance on the vBlock is to inflate the VMDKs from Thin to Eager Zeroed Thick (EZT). In our sandbox environment, this resulted in a 10x performance improvement. However, inflating the 4TB of Thin VMDKs to EZT under the data warehouse without the benefit of VAAI would require about 40 hours of downtime. With VAAI/WRITE_SAME a good estimate is 10 times faster, but the process could still take as long as 4 hours.
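For reference, the offline inflation itself is one command per disk from the ESXi shell, which is what those downtime estimates are based on (the datastore and VMDK names here are placeholders, and the VM has to be powered off while the disk inflates):

# vmkfstools --inflatedisk /vmfs/volumes/DATASTORE01/oradw01/oradw01_2.vmdk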
OK, we’re educated adults here, so there’s got to be a better way to extend them, right? YES! Let’s use Storage vMotion and inflate them on the fly to avoid downtime, albeit with a hit to overall I/O performance. Right, good answer, that’s the ticket. How long is that going to take? Well, with VAAI we instruct the SAN to do it, and there seems to be plenty of capacity for the SAN to complete it with minimal impact, so the duration is probably not much of a factor. However, in testing, things look terrible once again. Oh wait, there are more primitives that affect this, right? You bet: XCOPY and others… and they’re still disabled.
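Those are enabled the same way as WRITE_SAME, again on every host in the cluster (the option names are the standard settings for the XCOPY and ATS primitives):

# esxcli system settings advanced set --option /DataMover/HardwareAcceleratedMove --int-value 1
# esxcli system settings advanced set --option /VMFS3/HardwareAcceleratedLocking --int-value 1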
Bottom line: For common administrative tasks, as well as VM storage performance, always ensure you purchase a VAAI-compliant storage array and enable all primitives.
Note: primitives are enabled by default, and if the “Supported” status shows up, vSphere has verified that they appear to be working.
For the full VAAI reference, refer to this VAAI technical paper from VMware.