Dave Welch, CTO and Chief Evangelist
I’m finally getting around to blogging on Oracle RAC One Node, coincidentally announced on the second day of VMworld.
The day Oracle announced this (September 1st), I took a quick look at the Oracle-provided collateral. It said, “…you can migrate the instance to another node in the cluster using the Omotion online migration utility with no downtime for application users”. I immediately had a negative reaction to that:
“Omotion” – clever name. Any attempt to piggy-back through implication on VMware’s successful “VMotion” capability?
In a RAC environment, there is no ability to hot-move a client connection. That’s unchanged through 11g. So I guessed that this involved installing the RAC views in the database, temporarily firing up a second RAC instance, connecting to the new instance, then disconnecting from the old.
I told colleagues at VMworld that I was betting this would have some tool that gave you say 30 minutes to get all your connections migrated, prior to killing off any straggler sessions on the original instance.
The Oracle RAC SIG webinar on September 24th pretty much confirmed my hunches. The Omotion utility gives you a default/maximum migration window of (drum roll…) 30 minutes to get your connections moved. And oh by the way, if you should happen to have a long running query that can’t clear out in 30 minutes, tough luck; there’s gonna be “downtime for application users” after all. No big deal? I’d say 90% of our customers’ OLTP systems are hybrid, involving canned or ad-hoc decision support, or both.
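To make the 30-minute window concrete, here is a toy model of what happens to a hybrid workload. It is a minimal sketch, not Omotion's actual logic: the `Session` class, the `minutes_remaining` field, and the sample session names are all hypothetical, invented only for illustration.

```python
from dataclasses import dataclass

MIGRATION_WINDOW_MIN = 30  # default/maximum window cited in the webinar


@dataclass
class Session:
    name: str
    minutes_remaining: float  # hypothetical: time until the session's current work completes


def split_by_window(sessions, window_min=MIGRATION_WINDOW_MIN):
    """Return (drained, killed): sessions that finish inside the window vs. stragglers."""
    drained = [s.name for s in sessions if s.minutes_remaining <= window_min]
    killed = [s.name for s in sessions if s.minutes_remaining > window_min]
    return drained, killed


# Illustrative hybrid workload: short OLTP work plus decision-support queries.
sessions = [
    Session("oltp_order_entry", 0.5),
    Session("canned_report", 12),
    Session("adhoc_dss_query", 95),
]
drained, killed = split_by_window(sessions)
print("drained:", drained)
print("killed:", killed)
```

Anything that lands in the second list is the "downtime for application users" the collateral says you won't have.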
Other than looking up the Oracle RAC SIG webinar abstract paragraph, I’m writing this weeks later without reviewing the webinar ppt. The speaker made the point that clients/middle tiers should pick up the instance migration alert from Oracle Notification Service (ONS), and do their automatic disconnect/reconnect thing. Funny, as a former Oracle University RAC instructor, I’ve been promoting the ONS framework for years and encouraging RAC shops to code to it. The problem is, in all of our travels, it appears to be extremely rare for RAC shops to code to the ONS framework. That I may think they’re missing a huge opportunity doesn’t change the fact that ONS is seriously under-utilized.
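For readers who haven't coded to a notification framework, the pattern the speaker described looks roughly like the sketch below. This is deliberately generic and does not use the real ONS client API or payload format: the `ReconnectingClient` class, the event dictionary keys, and the instance names are all hypothetical.

```python
class ReconnectingClient:
    """Toy client that moves its connection when told its instance is migrating."""

    def __init__(self, instance):
        self.instance = instance
        self.reconnects = 0

    def on_event(self, event):
        # `event` is a hypothetical dict, not the real ONS payload format.
        is_migration = event.get("type") == "planned_migration"
        is_ours = event.get("source") == self.instance
        if is_migration and is_ours:
            # Disconnect from the old instance and reconnect to the target.
            self.instance = event["target"]
            self.reconnects += 1


client = ReconnectingClient("node1_instance")
client.on_event({"type": "planned_migration",
                 "source": "node1_instance",
                 "target": "node2_instance"})
# An event about some other instance is ignored.
client.on_event({"type": "planned_migration",
                 "source": "unrelated_instance",
                 "target": "node3_instance"})
print(client.instance, client.reconnects)
```

The point is that the application has to carry this awareness itself, which is exactly what so few RAC shops do.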
On the one hand, Oracle has declined to support RAC on VMware for years, hiding behind this statement in the published RAC FAQ: “…there are technical restrictions that prevent the certification of RAC in a VMware environment.” On the other hand, I’ve never seen Oracle attempt to go more aggressively after VMware line for line than in the “RAC One Node” webinar. I find it incongruous that they are willing to go into unprecedented detail in their RAC One Node/VMware product comparison, yet have been unwilling to provide any detail whatsoever as to what the alleged RAC-on-VMware problems are. This is all the more frustrating given the two-year-old announcement that RAC will be not only supported but certified on a (still) future Oracle VM release. What is the probability that any vendor’s comparatively challenged Xen platform (judgment call mine; another blog entry for another day) is going to come up with a solution for some as yet unspecified RAC-on-VMware technical restriction that VMware and/or the Linux kernel development team haven’t long since made moot? Since our team did our first customer production-grade RAC-on-VMware install in the spring of 2007, I’d put those odds at close to nil.
So what about the detailed RAC One Node/VMware comparison? I remember feeling some of the line items were spurious, and thought others were given undue emphasis in the talk. Here’s one off the top of my head. They were saying that encapsulating the OS in a VM provides savings for the server hardware guys, but not for OS system administration. I have a lead SA in mind who I’d like to get on the phone, tell him I’m about to hit the record button, and see if he can think of any benefits to system administration on VMware at the OS level only. I’ll start recording first, then give him the question because I want to capture the laughter. I can think of a couple off the top of my head:
Reduced if any need to re-install the OS on new machines due to VM templates
Shops that get away with running very old unsupported Windows versions on VMs, that would never dream of continuing to do it on fully-depreciated unsupported hardware
Not mentioned in the webinar is the isolation benefit that VMware provides, which is difficult to attain in a native environment short of following the long-standing, expensive recommendation of running production RAC instances on dedicated nodes. I’ve always agreed with the recommendation’s technical merits. But finding shops that practice it seems just about as difficult as finding shops that code to ONS. RAC One Node doesn’t help me there. On the other hand, RAC on VMware gives me all the isolation benefits while letting me load up the node with as many other workloads as I can get away with. That blends substantial technical and financial benefits, which is why we’ve been promoting the stack’s virtues for years.
I wonder if the Oracle RAC SIG VMware comparisons were misplaced in terms of audience. I suspect the Oracle RAC SIG audience is largely technicians and technical leads. I’d be surprised if they were swayed by the competitive analysis. Two years ago in our VMware Oracle Solutions Lab at OOW 2007, a steady stream of DBAs told us how widespread VMware Infrastructure was in their shops, including Oracle test and dev instances, except for (at the time) Oracle production. It’s getting harder to find DBAs who aren’t familiar with VMware’s proven operational capabilities. So I wonder if Oracle did more harm than good with this presentation. If I were Oracle and had any hope of staving off the increasing rush of interest in Oracle on VMware by announcing Oracle RAC One Node, I would have attempted to suggest those comparisons in a C-level event, if I put them forth at all.
By the way, the tooling necessary to run Oracle RAC One Node wasn’t in the 11g R2 GA release. In the webinar, they said that they needed more time in QA with the tools. Hmmm.
What about pricing? $10K per processor, except the temporary 2nd RAC One Node migration instance is free subject to the 10-day rule. So where’s the flexibility in that, short of calling it $20K per processor? Imagine VMotion or DRS restricted to 10 cumulative migration days per year. With vSphere 4’s ability to increase a VM’s core count on the fly, I’d rather take a fraction of that $10K per processor, and buy VI 4 Advanced at $2,245 per processor. Run single instance Oracle on that, and I’d have true uninterrupted workload migration for all transaction types, let alone the other features in VI 4 Advanced. At risk of sounding too demanding, at least I’d get a control GUI out of that deal.
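Here is the back-of-the-envelope arithmetic behind that paragraph. It is a sketch under a loud assumption: that blowing past the 10 free migration days means paying for the second instance in full (my “$20K per processor” reading, not a published Oracle pricing formula). The function names and the 4-processor example are hypothetical.

```python
RAC_ONE_NODE_PER_PROC = 10_000    # per-processor figure quoted above
VI4_ADVANCED_PER_PROC = 2_245     # per-processor figure quoted above
FREE_MIGRATION_DAYS_PER_YEAR = 10  # the "10-day rule"


def vi4_fraction_of_rac_one_node():
    """VI 4 Advanced price as a fraction of the RAC One Node price, per processor."""
    return VI4_ADVANCED_PER_PROC / RAC_ONE_NODE_PER_PROC


def rac_one_node_cost(processors, migration_days):
    """ASSUMPTION: exceeding the free days means licensing the second instance in full."""
    instances = 1 if migration_days <= FREE_MIGRATION_DAYS_PER_YEAR else 2
    return instances * processors * RAC_ONE_NODE_PER_PROC


print(vi4_fraction_of_rac_one_node())      # fraction of the $10K figure
print(rac_one_node_cost(4, 10))            # within the 10-day rule
print(rac_one_node_cost(4, 11))            # one day over, under the assumption above
print(4 * VI4_ADVANCED_PER_PROC)           # VI 4 Advanced on the same 4 processors
```

Note this compares only the two line items quoted in this post; the single-instance Oracle license you would still run on top of VI 4 Advanced is not included.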
I’ll finish by commenting on the webinar’s abstract:
“Server virtualization in the data center promises to reduce server footprint while providing flexibility, load balancing and high availability for any application running in a virtual machine. However, adoption for database as been slowed due to limitations in the technology. Oracle now provides Oracle Real Application Clusters-One as an alternative virtualization technology. Oracle Real Application Clusters-One provides the benefits of server virtualization, and more, with easier consolidation, more flexible load balancing, and better high availability features. This session will introduce Oracle Real Application Clusters-One, and explain how you can use it to standardize your environment while increasing flexibility and agility”
“…adoption for database as (sic) been slowed due to limitations in the technology”? It’s our observation that adoption has been slowed not because of technology, but primarily due to mixed messages from Oracle. Oracle Sales has an adversarial stance toward customers running VMware, no doubt due to the license revenue threat. Yet Oracle Support consistently provides quite a different experience. And, since this is a RAC SIG webinar, Oracle’s refusal to detail the RAC on VMware support issue could make this “limitations” statement sound like just more hiding.
I don’t see “…the benefits of server virtualization, and more…” for the reasons I’ve discussed.
I don’t see “…easier consolidation…”. For starters, has Oracle pulled back on the one-RAC-instance-per-node recommendation?
I don’t see “…more flexible load balancing…”. I don’t even see equivalent load balancing. Can RAC One Node automatically move active workloads, let alone long-running workloads, without restrictions and with no connection awareness on the part of the clients?
As for better HA features, yes, we’ve always told everybody that RAC has Cadillac HA features that are superior to VMware HA for shops with explicit SLAs tight enough to justify the significant expenditure, and scale so large that they’re knocked out of the Oracle 10g SE RAC bundle. But that only applied to what I’m going to have to start calling “Real RAC”–the EE RAC line item, or 10g SE RAC bundle, both of which have multiple RAC instances in normal operations. “Real RAC” provides the superior HA solution. RAC One Node can’t provide better HA than VMware HA or IP stack fail-over solutions, because there’s no second hot RAC instance waiting to catch the fail-over football in the event of a production emergency.
The final sentence suggests using RAC One Node to “…standardize your environment while increasing flexibility and agility”. Well…
…the moment I ask any of our single instance Oracle-on-VMware customer shops to scrap that stack and go native with Oracle RAC One Node for purposes of “…increasing flexibility and agility”, I’ll probably be subjected to an impromptu comparative features analysis that you can take to the bank.