Where Are the Network Virtual Appliances? Hobbled By the Virtual Network, That’s Where…
Allan Leinwand from GigaOm wrote a great article asking “Where are the network virtual appliances?” This was followed up by another excellent post by Rich Miller.
Allan sets up the discussion by describing how we've typically plumbed disparate physical appliances into our network infrastructure to provide discrete network and security capabilities such as load balancing, VPNs, SSL termination, firewalls, etc. He then goes on to describe the stunted evolution of virtual appliances:
To be sure, some networking devices and appliances are now available in virtual form. Switches and routers have begun to move toward virtualization with VMware’s vSwitch, Cisco’s Nexus 1000v, the open source Open vSwitch and routers and firewalls running in various VMs from the company I helped found, Vyatta. For load balancers, Citrix has released a version of its Netscaler VPX software that runs on top of its virtual machine, XenServer; and Zeus Systems has an application traffic controller that can be deployed as a virtual appliance on Amazon EC2, Joyent and other public clouds.
Ultimately, I think it prudent for discussion's sake to separate routing, switching and load balancing (connectivity) from functions such as DLP, firewalls and IDS/IPS (security), as lumping them together obscures the real problem: the latter is completely dependent upon the capabilities and functionality of the former. This is what Allan almost gets to when describing his lament about the virtual appliance ecosystem today:
Yet the fundamental problem remains: Most networking appliances are still stuck in physical hardware — hardware that may or may not be deployed where the applications need them, which means those applications and their associated VMs can be left with major gaps in their infrastructure needs. Without a full-featured and stateful firewall to protect an application, it’s susceptible to various Internet attacks. A missing load balancer that operates at layers three through seven leaves a gap in the need to distribute load between multiple application servers. Meanwhile, the lack of an SSL accelerator to offload processing may lead to performance issues and without an IDS device present, malicious activities may occur. Without some (or all) of these networking appliances available in a virtual environment, a VM may find itself constrained, unable to take full advantage of the possible economic benefits.
I've written about this many, many times. In fact, almost three years ago I created a presentation called "The Four Horsemen of the Virtualization Security Apocalypse," which described in excruciating detail how network virtual appliances were a big ball of fail and would be for some time. I further suggested that many of the "best-of-breed" products would ultimately become "good enough" features in virtualization vendors' hypervisor platforms.
Why? Because there are some very real problems with virtualization (and Cloud) as it relates to connectivity and security:
- Most of the virtual network appliances, especially those "ported" from the versions that usually run on dedicated physical hardware (COTS or proprietary), do not provide feature, performance, scale or high-availability parity; most are hobbled or require per-platform customization or re-engineering in order to function.
- The resilience and high-availability options of today's off-the-shelf virtual connectivity do not pair well with the mobility and dynamism of de-coupled virtual machines; VMs are ultimately temporal, and networks don't like the topological instability caused by key components moving or disappearing.
- The performance and scale of virtual appliances still suffer when competing for I/O and resources on the same physical hosts as the guests they attempt to protect.
- Virtual connectivity is generally a function of the VMM (or a loadable module/domain therein). The architecture of the VMM has a dramatic impact upon the architecture of the software designed to provide the connectivity, and vice versa.
- Security solutions are incredibly topology-sensitive. Given the scenario in #1, when a VM moves or is distributed across the pooled infrastructure, unless the security capabilities are already present on the physical host or the connectivity and security layers share a control plane (or at least can exchange telemetry), things will simply break.
- Many virtualization (and especially cloud) platforms do not support protocols or topologies that many connectivity and security virtual appliances require to function (such as multicast for load balancing).
- It's very difficult to mimic the in-line path requirements in virtual networking environments that would otherwise force traffic passing through the connectivity layers (layers 2 through 7) up through various policy-driven security layers (virtual appliances).
- There is no common methodology to express what security requirements the connectivity fabrics should ensure are available prior to allowing a VM to spool up, let alone move (a rough sketch of what such a check might look like follows this list).
- Virtualization vendors who provide solutions for the enterprise have rich networking capabilities natively as well as with third-party connectivity partners, including VM and VMM introspection capabilities. As I wrote about here, mass-market Cloud providers such as Amazon Web Services or Rackspace Cloud have severely crippled networking.
- Virtualization and cloud vendors generally force many security vs. performance trade-offs when implementing introspection capabilities in their platforms: third-party code running in the kernel, scheduler prioritization issues, I/O limitations, etc.
- Many of the basic networking capabilities are being pushed lower into silicon (into the CPUs themselves), which moves virtual appliances even further from the guts that enable them.
- Physical appliances (in the enterprise) exist en masse. Many of them provide highly scalable solutions for the specific functions that Allan refers to. The need exists, given the limitations I describe above, to provide for integration/interaction between them, the VMM and any virtual appliances in order to offload certain functions as well as provide coverage between the physical and the logical.
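To make that gap a bit more concrete, here is a minimal sketch, in Python, of what a pre-placement check might look like: a VM declares the security capabilities it requires, and an orchestrator refuses to spool it up (or move it) onto a host that can't provide them. Every name here (SecurityRequirements, HostCapabilities, placement_allowed) is hypothetical; no platform I'm aware of exposes anything like this today, which is precisely the point.

```python
# Hypothetical sketch: declare a workload's security requirements and verify a
# candidate host advertises them before the VM is spooled up or moved.
# These classes and functions are illustrative only, not a real product API.

from dataclasses import dataclass, field


@dataclass
class SecurityRequirements:
    """Capabilities a VM declares it needs from the connectivity/security fabric."""
    stateful_firewall: bool = True
    ids_inspection: bool = False
    l7_load_balancing: bool = False
    required_zones: set[str] = field(default_factory=set)   # e.g. {"pci", "dmz"}


@dataclass
class HostCapabilities:
    """What a candidate physical host (and its VMM) can actually provide."""
    stateful_firewall: bool
    ids_inspection: bool
    l7_load_balancing: bool
    zones: set[str]


def placement_allowed(req: SecurityRequirements, host: HostCapabilities) -> bool:
    """Return True only if every declared requirement is satisfied by the host."""
    if req.stateful_firewall and not host.stateful_firewall:
        return False
    if req.ids_inspection and not host.ids_inspection:
        return False
    if req.l7_load_balancing and not host.l7_load_balancing:
        return False
    return req.required_zones <= host.zones


# Example: a PCI-scoped web VM may only land on hosts that offer firewalling,
# IDS inspection and membership in the "pci" zone.
web_vm = SecurityRequirements(ids_inspection=True, required_zones={"pci"})
host_a = HostCapabilities(True, True, False, {"pci", "dmz"})
host_b = HostCapabilities(True, False, True, {"dmz"})

print(placement_allowed(web_vm, host_a))  # True  -> spool-up / move permitted
print(placement_allowed(web_vm, host_b))  # False -> block the move, or flag it
```

In practice this sort of logic would have to live in the VMM scheduler or the cloud control plane and be fed by real capability telemetry; the shape of the check, not the code, is the interesting part.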
What does this mean? It means that ultimately, to ensure their own survival, virtualization and cloud providers will depend less upon virtual appliances and add more of the basic connectivity AND security capabilities into the VMMs themselves, as it's the only way to guarantee performance, scalability and resilience and to satisfy the security requirements of customers. New generations of protocols, APIs and control planes will emerge to provide for this capability, but this will drive the same old integration battles we're supposed to be absolved from with virtualization and Cloud.
Connectivity and security vendors will offer virtual replicas of their physical appliances in order to gain a foothold in virtualized/cloud environments, intercept traffic (think basic traps/ACLs) and then interact with higher-performing physical appliance security service overlays or embedded line cards in service chassis. This is especially true in enterprises, but it poses many challenges in software-only, mass-market cloud environments, where what you'll continue to get is simply basic connectivity and security with limited networking functionality. This implies that more and more security will be pushed into the guest and application logic layers to deal with this disconnect.
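A toy sketch, again hypothetical, of that interception/hand-off pattern: the virtual replica applies only coarse ACL-style matching in the virtual network and steers anything interesting toward the higher-performing physical security overlay. The flow fields, rule set and hop names below are illustrative, not any vendor's API.

```python
# Toy sketch of the hand-off pattern: coarse ACL-style matching in the virtual
# layer, with "interesting" flows chained out to a physical security overlay.
# The fields and targets are made up for illustration.

from dataclasses import dataclass


@dataclass
class Flow:
    src_ip: str
    dst_ip: str
    dst_port: int
    protocol: str  # "tcp" / "udp"


# Coarse in-VMM rules: (protocol, dst_port) pairs worth deep inspection.
STEER_TO_PHYSICAL = {("tcp", 80), ("tcp", 443), ("tcp", 25)}


def next_hop(flow: Flow) -> str:
    """Decide whether a flow stays on the fast virtual path or is chained out."""
    if (flow.protocol, flow.dst_port) in STEER_TO_PHYSICAL:
        # Punt to the service chassis / line card for IPS, WAF, SSL offload, etc.
        return "physical-security-overlay"
    # Everything else gets only the basic connectivity the virtual layer offers.
    return "virtual-switch-fastpath"


print(next_hop(Flow("10.0.0.5", "10.0.1.9", 443, "tcp")))   # physical-security-overlay
print(next_hop(Flow("10.0.0.5", "10.0.1.9", 5432, "tcp")))  # virtual-switch-fastpath
```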
This is exactly where we are today with Cloud providers like Amazon Web Services: basic ingress-only filtering with a very simplistic, limited and abstracted set of both connectivity and security capabilities. See "Dear Public Cloud Providers: Please Make Your Networking Capabilities Suck Less. Kthxbye." Will they add more functionality? Perhaps. The question is whether they can afford to do so while limiting the impact that connectivity and security variability/instability can bring to an environment.
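To make "basic ingress-only filtering" concrete, here is roughly what the security-group model looks like from the customer's side: a handful of allow rules and not much else. The sketch below uses the boto3 SDK (which postdates this post) purely for illustration; the group name, ports and CIDRs are made-up examples.

```python
# Minimal sketch of ingress-only filtering with EC2 security groups: a couple of
# allow rules and nothing more (no IDS, no L7 inspection, no topology to speak of).
# Uses boto3 for illustration; names, ports and CIDRs are made-up examples.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Create an example security group and punch two ingress holes in it.
group = ec2.create_security_group(
    GroupName="web-tier-example",
    Description="Illustrative ingress-only filtering",
)

ec2.authorize_security_group_ingress(
    GroupId=group["GroupId"],
    IpPermissions=[
        {  # Allow HTTPS from anywhere.
            "IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
            "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
        },
        {  # Allow SSH only from an example management network.
            "IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
            "IpRanges": [{"CidrIp": "203.0.113.0/24"}],
        },
    ],
)
# That is roughly the extent of the network security policy you get to express.
```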
That said, if you are willing and able to do so, it's certainly achievable to construct a completely software-based networking environment, but these environments require a wholesale re-write of the approach and the stack, along with operational expertise that will be hard to come by for those who have spent the last 20 years working in a different paradigm, and that's a huge piece of this problem.
The connectivity layer, however integrated into virtualized and cloud environments it may seem, continues to limit how and what the security layers can do, and it will for some time, thus limiting the uptake of virtual network and security appliances.
Situation normal.
/Hoff
Related articles
- The Forthcoming Citrix/Xen/KVM Virtual Networking Stack…What Does This Mean to VMware/Cisco 1000v? (rationalsurvivability.com)
- Cloud Providers and Security "Edge" Services – Where's The Beef? (rationalsurvivability.com)
- Oh Great Security Spirit In the Cloud: Have You Seen My WAF, IPS, IDS, Firewall…
- Dear Public Cloud Providers: Please Make Your Networking Capabilities Suck Less. Kthxbye (rationalsurvivability.com)
- The Four Horsemen Of the Virtualization Security Apocalypse (rationalsurvivability.com)
- Xen packages build-your-own-cloud kit (theregister.co.uk)
- I Found the Missing Piece of the Virtualization Puzzle (devcentral.f5.com)
- CohesiveFT Rocks With VPNCubed For vCloud (cloudave.com)
I'm ready to accept the premise that a "physical appliance" is crusty old non-virtual technology and that the appliance is only cool and sexy if it's a VM running on an x86 server. At the end of the day a CPU is processing a packet; what difference does it make whether it's an x86 CPU or a customized ASIC? If performance, scalability, and features matter, a physical appliance will always win. If vApp mobility matters more, then I guess that's a good case for a virtual appliance, of which there are plenty, if not easily self-created.
Cheers,
Brad
@Brad Hedlund
Ah, but there's the rub. You're thinking about Cloud Service Providers (CSPs) like Terremark or Savvis that build their cloud infrastructure atop hardware like unified compute/fabric and use VMware/vCloud. That's a *very* different model from an all-software, abstracted service from someone like AWS…that's the great divide.
@Justin Cormack
I totally agree with you. My "Hamster Security Sine Wave Of Pain" illustrates just this very thing; we'll see some work being put into AppSec/WebAppSec, and a lot of it will likely come from PaaS providers who have the capability to enforce/apply more secure code development by virtue of the fact that they own the dev environment.
Chris, this is spot-on and I agree wholeheartedly. For me at least, this is an area where the most exciting "stuff" is happening in Cloud and Virtualization. Despite the fact that NVA products have been around in one form or another since the early days of VMs, I think we're just now starting to see big vendors get serious about their capabilities. No doubt this has everything to do with the emergence of cloud.
Virtualized network appliances suck for all of the reasons you mention, plus two significant ones that you didn't. Working with vendors, I sense two things: they're terrified of piracy (it's hard to pirate a $50,000 firewall, trivial to copy a VM), and they have been married to the idea of 'selling boxes' for the last 20 years. Software/VMs are often cheaper, and who wants to sell a $10k virtual version of a $40k appliance when you've got a quota to meet? Additionally, when CIOs drop tens or hundreds of thousands, they want (need?) blinkenlights.
Also, in your quote above Allan references the Citrix NetScaler VPX product and notes that it requires XenServer. This was initially true, but it has been available on VMware for a few months now. I'm biased because I teach the thing, but it has a full set of capabilities in a VM (LB, WAF, SSL VPN, etc.). As you mention, you'd want to avoid doing large-scale SSL acceleration on a virtual appliance, but I applaud those guys for releasing a full version (even one for free). I would love for Cisco, Imperva, and F5 to follow suit. After all, I am a geek and need more toys.
Great article,
Dave
@Brad Hedlund
It's unclear to me whether you're just making a statement (rhetorically, of course 😉) or agreeing/disagreeing with me.
The point I’d like to make in response is that in the case of Cloud, we often don’t get the option of being able to use physical appliances, so what then?
/Hoff
Err meant to say “I’m NOT ready to accept the premise” …
Although it will cause a lot of pain, I can't help feeling that pushing some of the security concerns into application stacks (and using OS capabilities effectively to do this too) is a good thing. It will not be easy, though, and much existing software is not very security-aware, relying on the network layer to protect it. Amazon is, after all, developed by people used to writing security-aware, web-scale applications; one suspects the idea of replicating a network-topology-defined security policy was not at the top of their list (although load balancing was). Microsoft might be closer to that, though.
@Hoff
I'm mostly agreeing with your assessment of why virtual appliances have not yet seen the success of physical appliances.
As for the case of Cloud, the physical appliance can be provisioned into many virtual appliances, which are in turn provisioned to a customer like an IaaS offering. Inter-cloud mobility of the appliance IaaS is a bit more challenging; the destination cloud would need similar hardware and a means to accept an API call for dynamic provisioning.
Cheers,
Brad
Aneel responded on his blog (er…)
http://j.mp/aa2C4N
/Hoff
@zerodave
Can you tell us more about your experiences with folks using LB products like VPX in Public Cloud environments…or Zeus? Interested if you’re seeing the same things I am.
/Hoff
Our users (of Zeus Traffic Manager) have had considerable success with our virtual appliances, despite the many good points you have made above. Many of these issues are symptomatic of the differences in maturity between traditional fixed networks and cloud networks, but they are regularly overcome with a little ingenuity on the part of the admin or in the appliance itself.
The virtual environment does add complexities for appliance software. vMotioning a machine or heavy CPU contention on the host can cause a couple of seconds of dead time, which may be enough to trigger alarms within the datacenter. You're exactly right to point out that many clouds do not support protocols like multicast that many appliances rely on for fault tolerance and cluster communications (EC2 support took a lot of re-engineering on our part).
Some vendors artificially hobble or limit their virtual appliances in order to protect their higher-margin hardware appliance business. No one should conclude, however, that virtual appliances are inferior to hardware ones.
Virtual appliances don't have to suck. They can handle multiple gigabits of traffic (>8 Gbit/s and >17,000 SSL TPS in internal tests), they can run effectively (with full fault tolerance) even on clouds with limited networking (e.g. Amazon), and they don't have to be hobbled or artificially limited.
Evaluation is quicker and easier, deployment is simple and, in a cloud or service provider environment, you can get all the functionality you need and just pay for what you use.
@beaker
My last three customers for MPX or VPX have all been giant retailers and level 1 PCI merchants. These are people just beginning to embrace virtualization, and public cloud is still a bit off their radar screen. It can be a difficult enough challenge to get them to deploy a virtual appliance in their QA/Dev environments, so I am afraid I don’t have a good deal of juicy cloud war stories to share (yet).
@owen
In specific cases, I also agree that a virtual appliance doesn't have to suck. Do NVAs have to perform worse than traditional appliances? Absolutely not. They will not always be as efficient at some routines, but with enough processing power this can generally be overcome. The vendors who developed their devices in an ASIC-driven paradigm (F5, Cisco, etc.) are going to have a hell of a fun time recoding their routines to work in a general-purpose computing environment, but platforms like Zeus and NetScaler that were developed for x86/x64 have a significant edge. I can't wait until we get on-board/on-chip crypto and regular expression/pattern-matching in server chipsets. Then the efficiency concerns around NVAs will largely be a thing of the past.
As Chris points out, the real culprit is the lack of a full-featured, standardized virtual network environment, and a lack of introspection to keep up with a dynamic cloud.
Great article. I'm biased as I work at a CSP, but IMHO the main article and the comments so far are spot on. We tend to classify this evolution in phases, the first and simplest being putting "security in the cloud." Vendors have taken their classic products and turned them into virtual appliances. While these are essentially unchanged from their physical counterparts, they can and do perform quite well, especially if the kernel is multi-core aware. We can typically provide the VM with a lot more horsepower than the vendor's similarly priced physical appliance ships with. Additionally, they gain all of the synergies of a traditional VM: the potential to stack lots of them in a small footprint while allowing them to burst to full capacity when traffic spikes, deployment in minutes rather than days or weeks, and easy backup, recovery, cloning, and so on. We've had Checkpoint running in production since '06 and it has performed quite well and continues to do so.
The concerns about revenue recognition for the sales force and about piracy, as noted, are very real, but they are slowly resolving themselves as pressure from customers increases and smaller virtual-only appliance vendors create competition in the space. This challenge for software vendors also applies to the cloud space as a whole as they struggle to define revenue and comp plans for cloud sales vs. traditional capitalized software licensing sales.
The second wave, "securing the cloud," is coming on stronger this year. Efforts like VMware's VMsafe give the security software vendors a shim that allows inspection and control inside the hypervisor. These have the potential to strip away a lot of the complexity required by security devices in a classic topology, as they can inspect things inside the vSwitches, common memory space and network flows. A centralized but multi-tenant approach can knock out a lot of layer 2/3 issues and make sure that the security workload is right-sized for the application workload, which can make a big dent in the cost, or make better security available to more applications, depending on how you look at it. Ideally that type of insertion point would be backed by a beefy, purpose-built security processing device loaded with ASICs to scan and defend cloud workloads without encumbering the application stacks, as they tend to have a hard enough time getting the job done on their own without the infosec cops getting involved. 🙂
To clarify, I meant Checkpoint in production as a virtual workload in the cloud since '06, in case that was confusing.
Additionally, zerodave's comments are on target as well. The hardware vendors who were offering an x86 server version of their software for Linux or Windows when the "cloud" happened were able to ship their virtual editions to us first. Several that had evolved to customized CPUs or had heavy ASIC integration are still on the bench, even today, trying to work that out.
My favorite story on this topic: we had a fairly high-level meeting with some security software product managers back in '07 to try to help them understand the potential for VM and, ultimately, cloud-based versions of their products. They went through a lot of effort to explain to us what a significant shift this would be for their software team, sales force, etc. After about 45 minutes of that, I noticed that their local sales engineer, who had tagged along for the meeting, had a big grin on his face. I asked him what was up; he smiled and turned his laptop around. On the screen were each of the company's products running inside VMware Workstation as VMs. He explained that the entire sales engineering and software dev teams at the company had been doing this for some time, as the company would never approve equipment requests for field demo gear to show customers or for the devs to do software testing.
If you replace the vSwitches with a vFirewall, this becomes moot. Especially if, for example, you have 16 vHosts in which 14 vFirewalls act as vLinecards and 2 act as vSupervisors/vRouteEngines, then who's to say that vMotion et al. can't happen transparently?
If you want to determine the tradeoff you’ll have to make to run on OTS hardware, simply take any entry-level ASIC-based device and compare numbers with a similar non-ASIC model.
Also, for any process that requires an HSM, good luck trying to run that 'in the cloud.'
Hardware will always outperform software; try running software without any hardware. It will not happen. Even if the software were programmed to run in your brain, your brain would still be the hardware. In my opinion, trying to use software to replace hardware is, in most cases, a pointless battle.
Whatever you save in power consumption you have already spent training your IT staff, on software licenses, etc., or on the specific hardware required to run the virtual hardware. It's just a bunch of hype.
I remember when they first came out with DVD playback decoder software. What was the point? It sucked and was choppy and unreliable.
The only reason it works better nowadays is faster, more powerful hardware.