House of Brick Principal Architect
Welcome to a series of posts on virtualizing your business-critical SQL Servers. Throughout this series, we will dispel the myths and misconceptions surrounding this topic. We will present our specific best practices for a business-critical SQL Server virtual machine, operating system, and instance. Finally, we will walk through how to prove that, in an apples-to-apples comparison of a physical and a virtual SQL Server, the virtual server performs at least as well.
Look for us at a SQL Saturday near you! This topic is near and dear to my heart, and I present on it frequently.
What is a Business-Critical SQL Server, and Why Virtualize It?
A business-critical SQL Server is just that: a SQL Server that your business absolutely depends on. If this server crashes or loses data, your business could fail. At a minimum, your employees could be left with nothing to do while it is down. These are the systems on which management spends the most resources and takes the most care for high availability and disaster recovery. They absolutely must be running and recoverable.
The benefits of virtualization on these systems are tremendous. Individually, each one of the following benefits should be enough to get an organization excited about virtualization. Together, these benefits revolutionize the way datacenters are architected and managed.
Added Flexibility, Efficiency, and Agility
When virtualized, the application is effectively freed from the underlying infrastructure. VMs can move transparently from resource node to resource node, allowing the underlying resources to handle growth and spikes automatically without disrupting the business. With pre-configured VM templates, an administrator can provision new servers in minutes instead of days.
Improved Disaster Recovery
Because virtual machines are decoupled from the underlying hardware, disaster recovery is much simpler for them than for their physical counterparts. VMs can be continually replicated to a DR location by multiple means, audited and tested there, and failed over and failed back quickly.
Built-in features such as VMware High Availability (HA) and Fault Tolerance (FT) increase application uptime. VMware HA minimizes outages after a hardware failure by automatically restarting the affected VMs on surviving hosts. VMware FT can eliminate the outage from a host failure altogether by keeping a secondary copy of the VM running in lockstep on another host. vMotion and Storage vMotion avoid the downtime normally associated with hardware maintenance by moving running VMs, and their disks, off the hardware being serviced.
Easier Development / Test / QA Environments
Ordinarily, building development, test, and QA environments that match production requires the same sorts of hardware as the production environment, and keeping those environments in sync with production, in both technology and configuration, can be cost-prohibitive. With VMware, entire production stacks can be cloned and placed in a development, test, or QA role, and these systems can be refreshed routinely with just a few clicks. This accelerates the application development lifecycle because developers receive an environment that is effectively identical to production.
One of the obvious benefits of virtualization is server consolidation. Reducing the hardware server count lowers server support and warranty costs, shrinks the hardware footprint in the datacenter, and lowers power and cooling costs. Licensing can also be optimized to save even more capital. That said, when virtualizing business-critical systems, I normally emphasize consolidation less than I do for other tiers of servers.
Why Virtualize Business-Critical Systems?
First of all, ask yourself – why not? The technology has evolved to the point where it is functionally transparent to the stack.
As of 2010, there were more virtual machines on this planet than physical servers. Even though the remaining physical servers are now in the minority, they include the vast majority of business-critical systems. Take a look at the bell curve below.
The vast majority of lower-tier servers have already been virtualized. Businesses are now waiting to cross the chasm to their business-critical systems. This is the area where the vast majority of maintenance capital is spent. This is the area where the lion’s share of productivity and revenue lies. This is the area that is most vital to the business. This is the area where businesses are most cautious about virtualization.
This is the area where virtualization can benefit the organization the most.
Myths and Misconceptions
A number of myths and misconceptions exist around virtualizing business-critical systems. With proper education, planning, and understanding, these can be eliminated, and virtualization can do what it does best: help your organization’s bottom line.
People harbor a tremendous number of misconceptions about performance. I am constantly shocked when I talk to people who insist that virtualization still inflicts a serious performance penalty because of hypervisor overhead. Some hypervisors have more overhead than others, and older versions of VMware vSphere did have noticeable overhead, but with VMware vSphere 5 that overhead has become functionally transparent. In a benchmark of a pre-release version of vSphere 5, storage showed 100 microseconds of latency per I/O, and throughput scaled linearly all the way to one million IOPS. The benchmark ended there only because the team ran out of storage to attach to the testbed. When it takes a benchmarking team to measure your system’s minute virtualization overhead, I declare it functionally transparent.
More often than not, these performance concerns stem from a virtualization trial (or, even worse, a failed production go-live) performed in the past. Poor results understandably leave a bad taste in people’s mouths. However, investigating those trials normally reveals a badly designed, unfair test. For example, the following diagram illustrates a typical virtualization proof-of-concept (POC) system stack.
On the left is an average production system stack. On the right is the virtualization POC system stack.
What is wrong with this picture? The two stacks differ in five dramatic ways:
- The workloads are not the same. The POC has a much heavier workload placed on it.
- The POC has only one storage path.
- The disk configuration is different: RAID-5 versus RAID-10.
- SATA disks are used instead of Fibre Channel drives.
- The SAN has much less storage processor read/write cache.
In my experience, most people butcher a business-critical virtualization POC because the host hardware is dangerously overcommitted and the storage is completely overwhelmed. In this scenario, CPU overcommitment is guaranteed to send CPU Ready times (the time a VM is ready to run but is waiting for a physical CPU) through the roof, which degrades VM performance. Storage is already at a disadvantage because of the disk configuration. RAID-5 suffers a write penalty compared to RAID-10: each random write costs four back-end I/Os (read data, read parity, write data, write parity) instead of two, and the lack of cache only magnifies the difference. SATA disks are also rated for fewer IOPS than Fibre Channel disks.
A business-critical POC built this way is doomed to fail, and the organization then declares that virtualization cannot handle its top-tier systems.
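Both failure modes above can be sized with simple arithmetic before a POC is declared a failure. The Python sketch below converts a vCenter CPU Ready summation (reported in milliseconds per sample; real-time charts sample every 20 seconds) into a per-vCPU percentage, and estimates the front-end IOPS a disk group can sustain from its raw disk IOPS, read/write mix, and RAID write penalty (2 for RAID-10, 4 for RAID-5). The function names, the per-disk ratings of 80 IOPS for 7.2K SATA and 180 IOPS for 15K Fibre Channel, the 16-disk array, the 70% read mix, and the sample VM's ready time are illustrative assumptions rather than measurements from any real POC, and array cache is ignored.

```python
"""Back-of-the-envelope checks for an overcommitted virtualization POC.

Assumptions (illustrative, not from any real POC): the per-disk IOPS ratings,
the 16-disk array, the 70% read mix, and the sample VM's ready time.
"""

# Back-end I/Os consumed by one front-end random write (array cache ignored).
WRITE_PENALTY = {"RAID-10": 2, "RAID-5": 4}

# Assumed rule-of-thumb ratings per spindle; substitute your vendor's figures.
DISK_IOPS = {"SATA 7.2K": 80, "FC 15K": 180}


def cpu_ready_percent(ready_ms: float, interval_s: float = 20.0, vcpus: int = 1) -> float:
    """Convert a vCenter CPU Ready summation (ms per sample) to a per-vCPU percentage.

    The 20 s default matches the real-time chart sample interval.
    """
    return ready_ms / (interval_s * 1000.0) / vcpus * 100.0


def frontend_iops(raw_iops: float, read_fraction: float, write_penalty: int) -> float:
    """Front-end IOPS a disk group sustains for a given read/write mix.

    Each read costs one back-end I/O; each write costs `write_penalty` back-end I/Os.
    """
    cost_per_io = read_fraction + (1.0 - read_fraction) * write_penalty
    return raw_iops / cost_per_io


def array_frontend_iops(disks: int, disk_type: str, raid: str, read_fraction: float) -> float:
    """Front-end IOPS for `disks` spindles of `disk_type` in the given RAID level."""
    raw_iops = disks * DISK_IOPS[disk_type]
    return frontend_iops(raw_iops, read_fraction, WRITE_PENALTY[raid])


if __name__ == "__main__":
    # Hypothetical 2-vCPU VM reporting 4,000 ms of CPU Ready in one 20 s sample.
    print(f"CPU Ready per vCPU: {cpu_ready_percent(4000, 20, vcpus=2):.1f}%")  # 10.0%

    # Same 16 spindles and 70% read mix: production-style vs. POC-style storage.
    prod = array_frontend_iops(16, "FC 15K", "RAID-10", 0.70)
    poc = array_frontend_iops(16, "SATA 7.2K", "RAID-5", 0.70)
    print(f"Production (FC 15K, RAID-10): {prod:.0f} IOPS")  # 2215
    print(f"POC (SATA 7.2K, RAID-5):      {poc:.0f} IOPS")   # 674
```

Under these assumptions, the same 16 spindles deliver roughly 2,215 front-end IOPS as Fibre Channel RAID-10 but only about 674 as SATA RAID-5, a gap of more than 3x before the cache difference is even considered.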
It does not have to be this way.
With an apples-to-apples POC, or with a properly adjusted architecture when the POC equipment does not perform at the level of the production stack, the business-critical POC can succeed and pave the way to full production virtualization.
We always seem to field a lot of questions about the support stances of different vendors. In truth, some vendors are nicer than others when answering this question. Some have written support statements that differ from what their salespeople say. Others have not certified their software for use on a virtual platform and refuse to support it (which tells me they are woefully ignorant of the topic and/or so lazy that they cannot perform simple engineering validation tests, which are guaranteed to pass).
Microsoft has supported and embraced virtualization for years and publishes a support policy for its applications on VMware. Support is provided through the Server Virtualization Validation Program (SVVP). In a nutshell, if the server hardware and hypervisor platform have been validated (and if they are on the VMware HCL, chances are they have been), Microsoft supports it.
You can read VMware’s official customer support statement at http://vmware.com/support/policies/ms_support_statement.html.
You can read Microsoft’s support statement, located in KB897615 at http://support.microsoft.com/kb/897615.
Another misconception is that virtualized database performance degrades progressively as the database grows.
Database size has no impact on performance. Period.
If it works in the physical world, it will work in the virtual world. If you see serious storage performance degradation due to workload size, you have a misconfiguration somewhere in the stack outside of the virtual infrastructure.
The only factors that matter for database performance are execution counts, concurrent connections, and SQL I/O access paths. Space has nothing to do with it. As a database grows, other concerns emerge, such as backup and recovery throughput, disaster recovery operations, and migrations as needed. However, these concerns have nothing to do with virtualization; they exist no matter the platform. On this point, no distinction exists between the physical and virtual environments!
Licensing is one of those fun topics that makes everyone cringe when it comes up. It should not. When done right, virtualization can lower your licensing costs. You could build a dedicated SQL Server cluster and license its physical cores, or license a sub-cluster of an existing vSphere cluster. It all depends on your environment, your agreements with Microsoft, and your server architecture. Never fear licensing: evaluate your environment and determine how much licensing money virtualization could save you.
In the upcoming parts of this series, I will discuss a number of the best practices we follow, which you should be aware of when building your SQL Server virtual machines. These cover specific virtual machine, operating system, and SQL Server instance configuration tweaks. I will also discuss how to prove that a virtual SQL Server performs as well as its physical counterpart in an apples-to-apples comparison.
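As a starting point for that evaluation, the Python sketch below compares two ways of counting SQL Server core licenses: licensing every physical core in a dedicated cluster versus licensing each VM's virtual cores individually. It assumes per-core licensing with a four-core minimum per physical processor and a four-core minimum per VM, the model Microsoft introduced with SQL Server 2012. Editions, Software Assurance requirements for unlimited virtualization, and pricing vary by version and agreement, so treat the constants, the hypothetical function names, and the hypothetical 3-host cluster and VM mix as placeholders to verify against your own licensing terms.

```python
"""Hypothetical SQL Server core-license comparison; verify against your agreement."""
from typing import Iterable

# Assumption: per-core licensing with these minimums (SQL Server 2012-era rules).
MIN_CORES_PER_PROCESSOR = 4  # minimum licenses per physical processor (socket)
MIN_CORES_PER_VM = 4         # minimum licenses per VM when licensing VMs individually


def host_core_licenses(sockets: int, cores_per_socket: int) -> int:
    """Core licenses needed to cover every physical core on one host."""
    return sockets * max(cores_per_socket, MIN_CORES_PER_PROCESSOR)


def cluster_core_licenses(hosts: int, sockets: int, cores_per_socket: int) -> int:
    """Core licenses for a dedicated cluster of identical hosts."""
    return hosts * host_core_licenses(sockets, cores_per_socket)


def per_vm_core_licenses(vcpus_per_vm: Iterable[int]) -> int:
    """Core licenses needed when each SQL Server VM is licensed on its own."""
    return sum(max(vcpus, MIN_CORES_PER_VM) for vcpus in vcpus_per_vm)


if __name__ == "__main__":
    # Hypothetical: 3 hosts x 2 sockets x 8 cores, running twelve 4-vCPU and four 8-vCPU VMs.
    cluster = cluster_core_licenses(hosts=3, sockets=2, cores_per_socket=8)
    per_vm = per_vm_core_licenses([4] * 12 + [8] * 4)
    print(f"License the physical cores: {cluster}")  # 48
    print(f"License each VM:            {per_vm}")   # 80
```

With these hypothetical numbers, licensing the 48 physical cores covers the whole cluster, while licensing the sixteen VMs individually would take 80 core licenses. With fewer or smaller VMs, the comparison can easily flip, which is why the evaluation has to be done per environment.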
Stay tuned, and check back in a couple of weeks for the next part of this series where I discuss the perfect build of a SQL Server virtual machine!