Dave Welch (@OraVBCA), CTO and Chief Evangelist
General Session – The “SMP FT” Session – 9:00AM
It’s Christmas, Easter, and my birthday all rolled into one. Three years after the VMworld 2011 live demo, we finally got SMP Fault Tolerance.
In my January 2013 blog post, I wrote:
“I wonder if VMware’s executive leadership has any idea what it is sitting on with SMP Fault Tolerance (SMP FT). I’m wondering if SMP FT could turn out to be the most disruptive technology anyone has seen in years. SMP FT certainly threatens a massive disruption to the clustered relational database market. I make that prediction due to two key SMP FT features that Oracle RAC can’t touch: approximately single-second failover, and no client disconnect.”
SMP FT is in the vSphere 6.0 public beta that anyone can download.
SMP FT in its public beta release comes with a 10-30% performance drag on the protected production VM. That's an impressive engineering improvement over the throughput drag of up to 45% that VMware published at VMworld 2012 for the SMP FT alpha code of that time. (The 2012 numbers were re-published without alteration in the VMworld 2013 SMP FT deck.)
SMP FT has minimal CPU overhead.
Lest anyone think 4 vCPUs is wimpy capacity, that's enough to handle the ~12,000 peak connections IU's Oracle instance sees during its registration cycle. The IU benchmark was on 2008 chips, and today's 20-30% hyperthreading lift is another bonus. Four vCPUs may simply be what they've decided to go GA with. Don't forget that they demoed 16 vCPUs at VMworld 2013 (minute 37), claiming there were no performance deltas at that scale.
Stuff that’s impressive to me:
- Hot configure of FT via UI or API
- Snapshot-based backup via the API, certified by several vendors
- Auto FT re-protect upon failure
- VMs on either side can still be vMotioned
- Thin provisioning is now ok
- SMP FT can protect multi-VM stacks as FT is a per-VM configuration
- All the CPU hardware assists work (they didn't with single-vCPU FT)
- It’s just as well they’re forcing fully-reserved RAM on both sides
As of VMworld 2012 (U.S.), the VMware engineering team hadn't yet decided whether to design with shared disk or disk replica. By VMworld 2013, disk replica was required. SMP FT in vSphere 6 is architected so that the primary and secondary VMs can keep their storage on separate datastores.
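Fully reserved RAM on both sides plus a disk replica per side means every FT-protected VM roughly doubles its memory and storage footprint in the cluster, which is easy to overlook when sizing. Here's a minimal back-of-the-envelope sketch, assuming exactly one primary and one secondary per protected VM and treating the 4-vCPU figure discussed above as a hypothetical planning limit. `VmSpec` and `ft_footprint` are illustrative names, not a VMware API.

```python
from dataclasses import dataclass

# Hypothetical planning limit based on the 4-vCPU figure discussed above;
# the supported maximum is whatever VMware actually ships, not this constant.
MAX_FT_VCPUS = 4


@dataclass
class VmSpec:
    """Illustrative VM description (not a vSphere API type)."""
    name: str
    vcpus: int
    memory_gb: float
    disk_gb: float  # provisioned size; thin disks may consume less


def ft_footprint(vm: VmSpec, max_vcpus: int = MAX_FT_VCPUS) -> dict:
    """Rough cluster-wide resources consumed by one FT-protected VM."""
    if vm.vcpus > max_vcpus:
        raise ValueError(
            f"{vm.name}: {vm.vcpus} vCPUs exceeds the assumed FT limit of {max_vcpus}"
        )
    return {
        # RAM is fully reserved on the primary AND on the secondary host.
        "reserved_memory_gb": 2 * vm.memory_gb,
        # Disk replica: each side keeps its own copy, possibly on another datastore.
        "disk_gb": 2 * vm.disk_gb,
        # Primary and secondary must run on different ESXi hosts.
        "hosts_required": 2,
    }


# Example: a 4-vCPU, 64 GB Oracle VM with 500 GB of disk.
print(ft_footprint(VmSpec("oradb01", vcpus=4, memory_gb=64, disk_gb=500)))
# -> {'reserved_memory_gb': 128, 'disk_gb': 1000, 'hosts_required': 2}
```

The vCPU limit is a parameter rather than a hard-coded check, since the GA limit may differ from what's in the beta.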
BCO2701.1 – vSphere HA and Best Practices and FT Tech Preview: I missed this session at the show because its title wasn’t searchable via “SMP FT.” The SMP FT discussion is in minutes 32 – 37.
The SMP FT released in the vSphere 6 beta has changed very little since the VMworld 2013 SMP FT sessions, which I highly recommend for understanding how the product has evolved.
Julian Wood gets my nod for the best technical overview I've come across of Tuesday's GA announcement. Quoting Julian's post:
“SMP-FT as it’s called works differently than FT for single CPUs. There is a new fast check-pointing mechanism to keep the primary and secondary in sync. Previously a “Record-Replay” sync mechanism was used but the new fast check-pointing has allowed FT to expand beyond 1 x vCPU. Record-Replay kept a secondary VM in “virtual lockstep” with the primary. With fast check-pointing the primary and secondary VM execute the same instruction stream simultaneously making it much faster.”
SMP FT AND ORACLE LICENSING
Oracle's Michael Timpanaro is a regular Oracle Corporation representative at VMworld. Michael and I have a lot in common: a love for Oracle software, and fluency in Italian. I envy Michael for routinely landing something I've never had: a professional gig in Italy.
Although I didn’t see Michael this year, we stepped out for lunch in 2012 right before two simultaneous sessions: Oracle on VMware Licensing (the session Michael had planned to attend), and BCO2655 VMware vSphere Fault Tolerance for Multiprocessor Virtual Machines—Technical Preview and Best Practices. I said, “Michael, you already know what they’re going to say in the licensing session. Come with me and let me show you something that’s a lot more interesting.”
At the end of the SMP FT session, Michael was ribbing me: "I hope they sell a ton of that, because they're going to have to license the standby side, too!" Well, Michael, you were prophetic, if not technically accurate at the time. The original single-vCPU FT used "Record-Replay," which transmitted the primary's memory block images to the secondary. As such, no actual workload code was executing on the secondary's x86 CPU scheduler. So technically, the single-vCPU FT paradigm under vSphere 5.5 and older involved no Oracle license obligation for an Oracle workload on the standby during normal operations.
On the other hand, the reduced performance drag on the primary under vSphere 6 SMP FT comes at a price for Oracle workloads. With the primary and secondary now executing the same instruction stream, per Oracle contract language, Oracle licensing most definitely applies to the standby.
Beyond the SMP FT Announcement
Moving beyond SMP FT, other interesting announcements for server virtualization:
- Impressive enhancements to stretch vMotion, adding migration across vCenter instances.
- Impressive tooling to push workloads to the hybrid (vCloud Air) cloud, including their NSX security properties.
I confess I really like VMware's new word "hybridity" as a descriptor of a workload's ability to be tooled for hybrid cloud.
As impressive as the hybrid cloud tooling advances are, why am I not giving them a lot of attention? Because I predict that most Oracle DBAs (House of Brick's primary market), some of whom are being reluctantly pushed into virtualization, will demand on-premises deployment for the foreseeable future, if only to hold their peer administrators accountable.
The best summary blog I've seen yet on vVols and everything else in Tuesday's General Session is the one by VMguru.nl.
VAPP1224 “Applications Using Oracle on vSphere Customer Success Stories Panel” – Tuesday 1:00PM
To apply Geoffrey Moore's metaphor from his books "Crossing the Chasm" and "Inside the Tornado," the Business-Critical Application Oracle workload herd is most definitely on the move toward VMware. I'm proud to be a member of the House of Brick team that went from 50% customer representation on Don Sullivan's 2013 panel to 100% representation on the 2014 panel, and did so inside a burgeoning market. All four of the panelists represent massive organizations.
Don, thanks for the shout-out at the beginning of the session. I felt a little uncomfortable with that, since these days I have little if anything to do post-sale with a lot of our customers. I was completely uninvolved post-sale with your panelist enterprises. (Exception: I was the boots on the ground in one of the enterprises for a 1.5-week gig when it became House of Brick's first customer in 1998.)
Of particular importance is what Dan Young said about the happy Oracle DBA: VMware gives them their after-hours lives back.
Also take note of the panelists’ answers to Don’s question: “Do you run RAC?” Dan Young answered flatly: “No. We’re too cheap.” Don’t believe it. If IU had any workloads that needed the HA SLA only RAC can provide today, I’m betting IU would be there with the rest of the panelists.
I call out Dan Young's answer as my latest excuse to make a point: by my estimate, for about two thirds of our customers, RAC was their only viable HA solution a decade ago, and those customers can no longer justify RAC under their Oracle workloads. Single instance on VMware HA is now adequate for that two thirds and drives out so much complexity (and threat of instability) and expense. But the other panelists implied, if they didn't state outright, that if you need RAC, RAC on vSphere is a marriage made in heaven: it allows you to manage RAC's complexity (and instability threat) end-to-end in the product lifecycle without the configuration burden and cost associated with native hardware RAC.
When Don asked what they want from VMware that they don’t already have, three out of four panelists said, “SMP Fault Tolerance.” I had nothing to do with that chorus (well, at least nothing directly to do with it).
This is the one session I attended where I didn’t take notes. And I learned a lot. Every one of these customers already knew most of what they needed to know to be successful with Oracle on vSphere before House of Brick got involved. I like to think we’re the specialty coach on the sidelines guiding these nameplate teams through a critical play in their overall winning game.
VAPP1449 – Extreme Performance Series: Virtualizing SAP HANA – Tuesday 2:30PM
Don't blow past this just because you may not be an SAP or SAP HANA customer. This session is "required reading" regardless. The breadth and depth of its analysis of the workload on vSphere is probably unparalleled in anything I've seen.
This session is by VMware Corp's Bob Goldsand & Todd Muirhead. Bob is a fixture among the VMware Corp Oracle SMEs. I was introduced to Todd at Oracle OpenWorld 2010, when I debriefed him on his impressive Oracle RAC on vSphere performance testing.
On average, vSphere performance was within 4% of native across the four SAP-mandated HANA test suites.
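"Within 4% of native on average" is an average of per-suite gaps, and the sign of each gap depends on whether the suite reports throughput (higher is better) or elapsed time (lower is better). The presenters' exact averaging method isn't covered here, so this is only a sketch of one common way to compute such a figure; the function names are illustrative.

```python
from statistics import mean


def relative_gap(native: float, virtual: float, higher_is_better: bool = True) -> float:
    """Fractional shortfall of the virtual run versus native (0.04 means 4% worse)."""
    if higher_is_better:
        # Throughput-style metric: a lower virtual number is worse.
        return (native - virtual) / native
    # Elapsed-time or latency metric: a higher virtual number is worse.
    return (virtual - native) / native


def average_gap(results: dict[str, tuple[float, float, bool]]) -> float:
    """Mean gap across suites; maps suite name -> (native, virtual, higher_is_better)."""
    return mean(relative_gap(n, v, hib) for n, v, hib in results.values())
```

Normalizing every suite to "fraction worse than native" before averaging keeps throughput and timing suites from cancelling each other out.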
In two years of testing, there were never any errors or corruptions of any kind. This is one of the most important statements of the show. Not because it is new, unique, or surprising; it is none of those. It matters because it is available to appropriately risk-averse DBAs and line-of-business leads who need to understand that VMware does nothing to destabilize a workload.
SAP is the world's ERP granddaddy, which could also mean unparalleled functional breadth and depth. I would also imagine HANA is the world leader in ERP in-memory analytics, both functionally and in terms of market share; I have Oracle Exalytics in mind as I make that assumption. The absence of functional errors is not surprising at all, given that the SAP workload sits two layers above vSphere. But it's a key statement in understanding why it is technically infeasible for vSphere to induce a functional bug into, for example, Oracle workloads. And here we're dealing with an ERP vendor that's friendly to the world's premier compute platform: vSphere.
SAP HANA Production on vSphere 5.5 is in controlled availability; enterprises apply to get into the program. Bob said, "It's almost administrative to get into this program. As of now there is no upper bound on the number of program participants." Acceptance can lead to access to the restricted SAP note. Read: no smoke and mirrors; do it at home without supervision if you like.
Bob's most recent on-site engagement virtualizing a customer's HANA took less than a day. In contrast, it had taken the customer months to stand up HANA with their hardware partners/SIs.
vSphere 5.5 introduces a new per-VM feature called Latency Sensitivity. It increases CPU utilization but improves latency by allowing a VM to exclusively own physical cores and avoid any overhead related to CPU scheduling and contention. You set Latency Sensitivity to High in the vSphere Web Client. The presenters said the feature normally wouldn't be used with databases: "Please do not do this with databases," they implored. (We ought to do a House of Brick blog entry on the feature with respect to database instances.)
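For those who script VM configuration rather than click through the Web Client, the High setting corresponds to the `.vmx` advanced setting `sched.cpu.latencySensitivity = "high"`. Here's a minimal sketch of a guarded helper that returns that setting. The `vm_role` label and function name are hypothetical. The full-memory-reservation check reflects my understanding of the feature's requirements, so verify it against VMware's documentation. The database guard simply encodes the presenters' advice.

```python
def latency_sensitivity_settings(
    vm_role: str, memory_mb: int, memory_reservation_mb: int
) -> dict[str, str]:
    """Return .vmx advanced settings for Latency Sensitivity = High, with guards."""
    if vm_role == "database":
        # The presenters' explicit advice: don't do this with databases.
        raise ValueError("Latency Sensitivity High is not recommended for database VMs")
    if memory_reservation_mb < memory_mb:
        # Assumption: High expects the VM's memory to be fully reserved.
        raise ValueError("Latency Sensitivity High expects memory to be fully reserved")
    # Corresponds to choosing High for Latency Sensitivity in the Web Client.
    return {"sched.cpu.latencySensitivity": "high"}
```

Refusing outright for database VMs, rather than just warning, mirrors how firmly the presenters made the point.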
ORACLE ON VSPHERE SUB-CLUSTER LICENSING
My only beef with the session was a comment by one of the presenters contrasting SAP's sub-cluster licensing with that of another, unnamed software vendor that requires full vSphere cluster licensing. This was an obvious reference to Oracle. VMware colleagues and other Oracle on vSphere pundits: you command far too much respect to do your customer and prospect base the damaging disservice of joining the chorus of Oracle field reps who spread that non-contractual FUD.
VAPP2309 – Virtualizing SAP: Design Guidelines and How They Are Used in EMC IT’s Successful SAP Implementation – 1:00PM
Although I didn't attend this EMC SAP case study, I'm told the presenters got the contractual Oracle sub-cluster licensing privilege right. EMC's Kenneth Paul and VMware's Vas Mitra mentioned using DRS Host Affinity to carve out Oracle on vSphere sub-cluster(s). EMC uses VCS logs to verify that Oracle was confined to certain nodes and never traversed the whole vSphere cluster.
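The audit EMC describes boils down to checking every observed Oracle VM placement against the DRS host group, and the payoff is that only that host group's cores count toward processor licensing. Here's a minimal sketch under stated assumptions, not licensing advice. It assumes processor-metric licensing, the 0.5 Oracle core factor for Intel Xeon, and placement records already parsed into (VM, host) pairs. The VCS log format isn't described here, so `Placement` and the helper names are hypothetical.

```python
import math
from dataclasses import dataclass

# Oracle Processor Core Factor Table value for Intel Xeon processors.
INTEL_CORE_FACTOR = 0.5


@dataclass(frozen=True)
class Placement:
    """One observed (VM, host) placement, e.g. parsed from logs (format assumed)."""
    vm: str
    host: str


def placement_violations(placements, oracle_vms, licensed_hosts):
    """Placements where an Oracle VM ran on a host outside the licensed DRS host group."""
    return [p for p in placements if p.vm in oracle_vms and p.host not in licensed_hosts]


def processor_licenses(host_cores, licensed_hosts, core_factor=INTEL_CORE_FACTOR):
    """Total physical cores on the licensed hosts x core factor, fractions rounded up."""
    cores = sum(host_cores[h] for h in licensed_hosts)
    return math.ceil(cores * core_factor)


# Hypothetical four-host cluster, 16 cores per host, Oracle pinned to two hosts.
host_cores = {"esx01": 16, "esx02": 16, "esx03": 16, "esx04": 16}
licensed = {"esx01", "esx02"}
log = [Placement("oradb01", "esx01"), Placement("oradb01", "esx03"), Placement("web01", "esx04")]

print(placement_violations(log, {"oradb01"}, licensed))
# -> [Placement(vm='oradb01', host='esx03')]
print(processor_licenses(host_cores, licensed), "vs", processor_licenses(host_cores, host_cores))
# -> 16 vs 32
```

The comparison on the last line is the whole argument for sub-cluster licensing in miniature: pinning Oracle to half the hosts halves the processor count, provided the placement audit comes back clean.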