The Emotion of VMotion…
A lot has been said about the wonders of workload VM portability.
Within the construct of virtualization, and especially VMware, an awful lot of time is spent on VM Mobility, but as numerous polls and direct customer engagements have shown, the majority of those asked (50% and higher) do not use VMotion. I talked about this in a post titled “The VM Mobility Myth”:
…the capability to provide for integrated networking and virtualization coupled with governance and autonomics simply isn’t mature at this point. Most people are simply replicating existing zoned/perimeterized non-virtualized network topologies in their consolidated virtualized environments and waiting for the platforms to catch up. We’re really still seeing the effects of what virtualization is doing to the classical core/distribution/access design methodology as it relates to how shackled much of this mobility is to critical components like DNS and IP addressing and layer 2 VLANs. See Greg Ness and Lori MacVittie’s scribblings.
Furthermore, Workload distribution (Ed: today) is simply impractical for anything other than monolithic stacks because the virtualization platforms, the applications and the networks aren’t at a point where from a policy or intelligence perspective they can easily and reliably self-orchestrate.
That last point about “monolithic stacks” described what I talked about in my last post “Virtual Machines Are the Problem, Not the Solution” in which I bemoaned the bloat associated with VM’s and general purpose OS’s included within them and the fact that VMs continue to hinder the notion of being able to achieve true workload portability within the construct of how programmatically one might architect a distributed application using an SOA approach of loosely coupled services.
Combined with the VM bloat — which simply makes these “workloads” too large to practically move in real time — if one couples the annoying laws of physics and current constraints of virtualization driving the return to big, flat layer 2 network architecture — collapsing core/distribution/access designs and dissolving classical n-tier application architectures — one might argue that the proposition of VMotion really is a move backward, not forward, as it relates to true agility.
That’s a little contentious, but in discussions with customers and other Social Media venues, it’s important to think about other designs and options; the fact is that the Metastructure (as it pertains to supporting protocols/services such as DNS which are needed to support this “infrastructure 2.0”) still isn’t where it needs to be in regards to mobility and even with emerging solutions like long-distance VMotion between datacenters, we’re butting up against laws of physics (and costs of the associated bandwidth and infrastructure.)
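To put rough numbers on the "laws of physics" point, here is a back-of-the-envelope sketch. The payload sizes and link speeds are made-up assumptions, not measurements, and the `efficiency` parameter is a hypothetical fudge factor. The calculation ignores latency, protocol overhead and the re-copying of memory pages dirtied during a live migration, so real transfers would likely take longer.

```python
def transfer_seconds(payload_gib: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Best-case seconds to push payload_gib (GiB) over a link of link_mbps (megabits/s).

    efficiency is an assumed fraction of the raw link rate that is actually usable.
    """
    if payload_gib < 0:
        raise ValueError("payload_gib must not be negative")
    if link_mbps <= 0:
        raise ValueError("link_mbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    payload_bits = payload_gib * 1024**3 * 8
    usable_bits_per_second = link_mbps * 1_000_000 * efficiency
    return payload_bits / usable_bits_per_second


if __name__ == "__main__":
    # Hypothetical workload sizes (GiB) and link speeds (Mbit/s), for illustration only.
    payloads_gib = [0.5, 8, 40]
    links_mbps = [100, 1_000, 10_000]
    for size in payloads_gib:
        for link in links_mbps:
            secs = transfer_seconds(size, link, efficiency=0.7)
            print(f"{size:>5} GiB over {link:>6} Mbit/s: about {secs:,.1f} s")
```

Even in this idealized model, the time grows linearly with the size of the workload, which is why bloat and distance compound each other.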
While we do see advancements in network-driven policy stickiness with the development of elements such as distributed virtual switching, port profiles, software-based vSwitches and virtual appliances (most of which are good solutions in their own right,) this is a network-centric approach. The policies really ought to be defined by the VM’s themselves (similar to SOA service contracts — see here) and enforced by the network, not the other way around.
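As a minimal sketch of what "defined by the VM, enforced by the network" could look like, consider the toy model below. Every name in it (`WorkloadPolicy`, `VirtualMachine`, `Switch`, `migrate`) is hypothetical and does not correspond to any VMware or vendor API. The only point is that the policy travels with the workload and the network element merely reads and enforces it.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class WorkloadPolicy:
    """Hypothetical 'service contract' carried by the VM itself."""
    zone: str
    allowed_inbound_ports: FrozenSet[int]


@dataclass
class VirtualMachine:
    name: str
    policy: WorkloadPolicy


class Switch:
    """Toy network element that enforces whatever policy an attached VM declares."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._attached: Dict[str, WorkloadPolicy] = {}

    def attach(self, vm: VirtualMachine) -> None:
        # The switch does not own the policy; it copies it from the VM on attach.
        self._attached[vm.name] = vm.policy

    def detach(self, vm: VirtualMachine) -> None:
        self._attached.pop(vm.name, None)

    def permits(self, vm_name: str, port: int, source_zone: str) -> bool:
        policy: Optional[WorkloadPolicy] = self._attached.get(vm_name)
        if policy is None:
            return False  # unknown workload: deny by default
        # Assumed rule: only same-zone traffic to a declared port is allowed.
        return source_zone == policy.zone and port in policy.allowed_inbound_ports


def migrate(vm: VirtualMachine, source: Switch, destination: Switch) -> None:
    """Move a VM between switches; the policy follows because the VM carries it."""
    destination.attach(vm)
    source.detach(vm)
```

In this sketch nothing has to be reconfigured on the destination switch ahead of a move, which is the inverse of the network-centric port-profile approach.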
Further, what isn’t talked about much is something that @joe_shonk brought up, which is that the SAN volumes/storage from which most of these virtual machines boot, upon which their data is stored and in some cases against which they are archived, don’t move, many times for the same reasons. In many cases we’re waiting on the maturation of converged networking and advances in networked storage to deliver solutions to some of these challenges.
In the long term, the promise of mobility will be delivered by a split into four camps which have overlapping and potentially competitive approaches depending upon who is doing the design:
- The quasi-realtime chunking approach of VMotion via the virtualization platform [virtualization architect,]
- Integration distribution and “mobility” at the application/OS layer [application architect,]
- The more traditional network-based load balancing of traffic to replicated/distributed images [network architect,] or
- Moving or redirecting pointers to large pools of storage where all the images/data(bases) live [Ed. forgot to include this from above]
Depending upon the need and capability of your application(s), virtualization/Cloud platform, and network infrastructure, you’ll likely need a mash-up of all four. This model really mimics the differences today in architectural approach between SaaS and IaaS models in Cloud and further suggests that folks need to take a more focused look at PaaS.
Don’t get me wrong, I think VMotion is fantastic and the options it can ultimately deliver are intensely useful, but we’re hamstrung by what is really the requirement to forklift: network design, network architecture and the laws of physics. In many cases we’re fascinated by VM Mobility, but a lot of that romanticization plays on emotion rather than utilization.
So what of it? How do you use VM mobility today? Do you?
/Hoff
Hoff,
Any thoughts on technologies such as FastScale and the potential to reduce VM bloat?
Steve
@Steve Todd
Great point, Steve. I had wished others had brought that up in the other post. FastScale markets its ability to automagically build on-demand JeOS stacks:
With patent-pending Application Blueprint™ technology, FastScale Stack Manager can automatically identify the precise operating system dependencies of your application and build a lightweight server with JeOS, on-demand. Configuring your logical server for JeOS results in a server build that is up to 95% smaller utilizing 75% or less system resources. And with fewer system services deployed, lightweight server builds are more secure by design. Application Blueprinting can also be used as an insurance policy against deploying legacy software stacks with missing file dependencies by comparing the server build to an Application Blueprint — missing files can be added with a simple mouse click.
…that is a good thing. I think it is a step in the right direction but we'll have to see how Cloud providers follow suit in their roll-outs to enable this sort of capability.
/Hoff
Meh, even if you have the kernel and just enough libraries to make your app run you still have a hell of a lot of bloat in there. Worse, much of it is doing funky things to talk to esoteric hardware – the virtual equivalent of which must do funky things in response (think of a read/write passing unnecessarily through IDE!?!). Sure paravirtualisation solves some of that but how many of us are using paravirtualisation (properly) today and even then how much overhead is left?
JeOS still sucks (only marginally less than full OS). It's not about the size of the image – if files go unused then they're benign except for taking up space. And if you remove them and later find that you've got NeOS (not enough OS) then you're up the creek without a paddle.
Sam
@Sam Johnston
Depends on the jumping-off point, no? I talked to an ISV a few months ago that can wrap up a full-featured LAMP stack in 30MB and run your web app. They claimed they could get WinServ2003 down to about 600MB, which would be the other end of things.
But who cares if that's too drastic? Need more? Start over, drop your instance, pull up the bigger one, it's not a big deal. Design considerations and implementation of such can now be done in hours, not weeks. Anyone planning for a product/app stack in the cloud can do it however they like, and fix it whenever they like. That's where I see really interesting changes in how people work.
More to the larger point, who cares? Plenty of cloud to go around, right? It's all free, endless and awesome, no? 😉
The underlying reason for VMotion and the like is increased availability. "Availability of what" should be the question.
VMotion is designed to allow reliability of a particular machine. What we should be concentrating on is reliability of the service, and truly reliable infrastructure will tolerate the loss of a machine.
Matt Simmons:
VMotion is designed to allow reliability of a particular machine. What we should be concentrating on is reliability of the service, and truly reliable infrastructure will tolerate the loss of a machine.
TOTALLY! VM's are (I hope) the first level of reasonable abstraction that will allow us to once again focus on what service oriented architectures were supposed to deliver — SERVICES.
/Hoff
Our customers tend to use vMotion in most deployments, even small ones. The reason for this is a bit ridiculous, in that the single largest driving factor behind purchasing the vMotion license is the need to be able to *patch* the ESX hosts without downtime.
Basically to patch you need the ESX hosts to be in maintenance mode, which means vacating VM's. Which means downtime, unless you've got vMotion.
It's a sad state of affairs when the only business case for buying fancy live migration capability is the need to resolve security and other issues in the host OS.
Ahh, now here's a subject close to my heart. Having come from IBM Power Systems, where mobility was a big deal EXACTLY because you were running one honking great big physical server with potentially hundreds of virtual servers (really!), to Dell, where it's mostly normal x86-style workloads, I've been asking myself exactly this lately. What problem is vMotion trying to solve? It's another layer of complexity, which brings a layer of automation and complexity on top of it in order to make it work, to solve what? Generally application availability issues.
First up, x86 business-class servers are no longer the keep-failing, OS-is-full-of-bugs machines they once were. Both Windows and Linux have contributed to this, as has VMware. Secondly, all the vendors have had to focus on quality of components and design, much more than they used to.
Thirdly, in both the open source and proprietary application landscape, software has significantly settled down and there is less churn. While it's true that most commercial and open source software is on a drum beat of new releases every 12-15 months, users are less and less likely to install them. What they have mostly works and the new features are often less and less compelling.
Result? Platform stability. Where the platform is stable and not failing all the time it becomes more and more interesting to look at workload distribution, equalizing workloads etc. Cloud has become more important in this space, and vMotion is really relatively uninteresting since it only allows you to move the whole workload either to another server, or to the cloud (notionally). Given it introduces a level of complexity that is not really giving anything else except the ability to move a whole VM, I think it won't be long before many users wake up and say they don't need it, or that it's just the emperor's new clothes, another way for IT to sell their services.
I'm going to reply to your Virtual Machines are the problem post via a trackback on my own blog.