Performance Implications Of Security Functions In Virtualized Environments
In my VirtSec presentations, I walk the audience through the evolution of virtualized security models, describing the configuration and architecture options we have for implementing existing and emerging security solutions, both now and projected out to about three years from now.
I’ll be posting that shortly.
Three of the more interesting things I highlight, the ones that make light bulbs go off in the audience, are:
- The compute (CPU) and I/O overhead added by security software running either in the VMs on top of the guest OSes, in security virtual appliances on the host, or in a combination of both.
- The performance limitations of the current implementations of virtual networking and packet-handling routines due to virtualization architectures and access to hardware.
- The complexity imposed when having to manage and map a number of physical NICs to virtual NICs and configure the vSwitch and virtual appliances appropriately to manipulate traffic flows (at L2 and up) through multiple security solutions, whether from an intra-host perspective, integrated with external security solutions, or both (see the sketch following this list).
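As a rough illustration of that last point, here is the kind of mapping you end up keeping consistent by hand just to force inter-VM traffic through a single inline virtual appliance. Every name and value below is hypothetical and the structure is deliberately simplified; real configurations vary by platform.

```python
# Hypothetical sketch of the plumbing needed to force inter-VM traffic through
# one inline virtual appliance on a single host. All names are invented; the
# point is how many mappings have to stay consistent with one another.

virtual_network = {
    "vswitch_outside": {
        "uplinks": ["vmnic0", "vmnic1"],               # teamed physical NICs
        "port_groups": {"pg_untrusted": {"vlan": 10}},
    },
    "vswitch_inside": {
        "uplinks": [],                                 # isolated, no physical uplink
        "port_groups": {"pg_trusted": {"vlan": 20}},
    },
}

virtual_appliance = {
    "name": "fw_va_01",
    "vnics": {
        "eth0": "pg_untrusted",   # faces the physical network
        "eth1": "pg_trusted",     # faces the protected guests
    },
}

protected_guests = {f"guest{i:02d}": "pg_trusted" for i in range(1, 6)}

# A change to any one of these mappings (an uplink, a VLAN, a port group, a
# vNIC assignment) can silently route traffic around the appliance.
print(len(virtual_network), "vSwitches,", len(protected_guests),
      "guests steered through", virtual_appliance["name"])
```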
I'm going to tackle each of these issues in separate posts, but I'd be interested in speaking with anyone with whom I can compare testing results.
Needless to say, I've done some basic mock-ups and performance testing with some open source and commercial security products in virtualized configurations under load, and much of the capacity I may have gained by consolidating low-utilization physical hosts onto a single virtualized host is eroded by the processing the virtual appliance(s) need to keep up with the load under stress without dropping packets or introducing large amounts of latency.
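A quick back-of-the-envelope model shows the shape of the erosion. Every number below is invented purely for illustration; substitute measurements from your own environment.

```python
# Rough model of how inline virtual-appliance overhead eats consolidation
# headroom. All figures are hypothetical placeholders, not measurements.

HOST_CPU_CAPACITY = 1.0        # normalized CPU of the consolidated host
GUEST_COUNT = 8                # low-utilization servers consolidated onto it
GUEST_AVG_UTIL = 0.06          # each guest averages ~6% of the host CPU
PACKETS_PER_SEC = 120_000      # aggregate inter-VM packet rate under load
CPU_COST_PER_PACKET = 4e-6     # CPU-seconds the appliance burns per packet (assumed)

guest_load = GUEST_COUNT * GUEST_AVG_UTIL
appliance_load = PACKETS_PER_SEC * CPU_COST_PER_PACKET
headroom = HOST_CPU_CAPACITY - guest_load - appliance_load

print(f"Guest load:     {guest_load:.2f} of host CPU")
print(f"Appliance load: {appliance_load:.2f} of host CPU")
print(f"Headroom left:  {headroom:.2f}")
```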
Beware of what this might mean in your production environments. Ever see a CPU pegged due to a runaway process? Imagine what happens when every packet between virtual interfaces gets crammed through a virtual appliance in the same host first in order to "secure" it.
I made mention of this in my last post:
The reality is that, for reasons I've spoken of many times, our favorite ISVs have been a little handicapped by what the virtualization platforms offer up in terms of proper integration against which we can gain purchase from a security perspective. They have to sell what they've got while trying to remain relevant, all the while watching the ground drop out from beneath them.
These vendors have a choice: employ some fancy marketing messaging to make it appear as though the same products you run on a $50,000+ dedicated security appliance will perform just as well in virtual form.
Further, tell you that you'll enjoy just as much visibility, without disclosing the limitations of interfacing with a virtual switch that makes it next to impossible to replicate most complex non-virtualized topologies.
Or, just wait it out and see what happens, hoping to sell more appliances in the meantime.
Some employ all three strategies (with a fourth being a little bit of hope).
This may differ based upon virtualization platforms and virtualization-aware chipsets, but capacity planning when adding security functions is going to be critical in production environments for anyone going down this path.
/Hoff
"when every packet between virtual interfaces gets crammed through a virtual appliance in the same host first in order to "secure" it."
Or – when every packet gets crammed through a *series* of virtual appliances that we've chained together to get us the functionality of the typical stack of dedicated appliances that we currently have.
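A toy calculation makes the cost of chaining concrete. The figures are invented for illustration only; the point is that aggregate throughput collapses to the slowest appliance in the chain while the latencies stack:

```python
# Toy model of serially chained virtual appliances: effective throughput is
# the minimum of the chain, added latency is the sum. Numbers are invented.

chain = [
    # (appliance, max packets/sec it can inspect, added latency in microseconds)
    ("virtual firewall", 180_000, 60),
    ("virtual IPS",       90_000, 250),
    ("virtual AV proxy",  70_000, 400),
]

chain_throughput = min(pps for _, pps, _ in chain)
chain_latency_us = sum(lat for _, _, lat in chain)

print(f"Effective throughput of the chain: {chain_throughput:,} pps")
print(f"Added latency per packet:          {chain_latency_us} us")
```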
And when the accidental 802.1d spanning tree loop inevitably happens, as it does at least once on almost every complex L2 redundant network, what operational access will we have to the virtual appliances and their virtual consoles for troubleshooting? Will we even *know* that the virtual switch is slinging a million packets per second in a circle? Will the box be so badly buried that it takes its own management processes and interfaces down with it as it descends into its death spiral? Where is the wire that we unplug to break the loop? A virtual plug?
Dedicated appliances, especially ones that operate at layer 2 with real ASICs and real 9600,n,8,1 consoles, have interesting properties that can make them quite robust and even somewhat manageable while being subjected to high loads, DoS attacks, L2 problems, etc. Out-of-band serial management is one such property; separation of packet processors from management processors is another.
A Slammer-like incident contained entirely within a fully virtualized stack on a quad-core, quad-socket something-or-other with a hundred virtual servers would be most amusing to watch. (And probably impossible to mitigate short of a complete reboot.) And of course, when you recover the whole mess by disconnecting the *physical* networks and rebooting, will you have enough bits left behind to reconstruct what happened?
As much as I hate to say it, I'd almost rather see a network hardware vendor build a switch fabric in silicon on a network card for my virtual servers. This would give me the network vendors' decades of accumulated networking knowledge burnt into a chipset as the core of my virtual network fabric.
@michael:
"Or – when every packet gets crammed through a *series* of virtual appliances that we've chained together to get us the functionality of the typical stack of dedicated appliances that we currently have."
…is exactly right (and what I meant to communicate).
You're one comment ahead of one of my next posts, as I was about to suggest that we might just see "appliances" collapsed into the form of PCI cards or fatter NICs featuring FPGAs to bring the appliance back into the box. I plan to contrast that with the I/O virtualization solutions I blogged about earlier, where all that processing gets done in a big honkin' switch external to the host.
I (sadly) really like your death spiral analogy.
Thanks for the comment!
/Hoff
Chris,
Couldn't agree more with this post. My two core problems with virtual networks (before we even get to performance) are 1) they're all software and 2) people don't yet know how to manage virtual software switch environments. Michael nailed this idea: you can't drop a tap on a virtual switch or dump the CAM tables b/c the same software that would support a tap is also passing the packets in the first place. So absolutely, when you start adding security functions to this mix, you're going to see big performance problems and a huge bottleneck at the network, both in the form of packets and in the shared resources used to manage those 10 guests all funneling back to that single e1000 ESX driver.
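To put some rough numbers on that funnel (every figure here is an assumption for illustration, not a measurement):

```python
# Quick sanity check on the shared virtual NIC funnel described above. The
# per-guest rates and the path's ceiling are assumptions; measure your own stack.

GUEST_COUNT = 10
PER_GUEST_MBPS = 150       # average traffic each guest pushes (assumed)
DRIVER_CAP_MBPS = 940      # practical ceiling of the shared virtual NIC path (assumed)

offered_load = GUEST_COUNT * PER_GUEST_MBPS
if offered_load > DRIVER_CAP_MBPS:
    overcommit = offered_load / DRIVER_CAP_MBPS
    print(f"Offered {offered_load} Mb/s against a ~{DRIVER_CAP_MBPS} Mb/s path "
          f"({overcommit:.1f}x overcommitted): expect queuing, drops, and latency.")
else:
    print("Offered load fits within the path's capacity.")
```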
Just think about running Snort and doing full packet reassembly on a virtual guest: not only would Snort be competing for compute resources against the other guests and the software switch itself, but it would also be storing these packets in memory shared by both the guest and the host stack. Imagine the buffer overflow exploit potential there: overflow the Snort guest buffer, own the host stack.
My ideal solution would be the best of a few worlds: start with purpose-built hardware to manage the switching (a la ASICs), throw a hypervisor on top of that to manage network calls, and then build security directly into the hypervisor with an off-board management component to monitor for things like overflows to the ASIC's software buffer. If you move just the network brawn into hardware, you gain so much pure line speed and offloaded packet-processing power that any performance losses from security solutions not baked right into the hypervisor would be offset.
I'm working on a post over at thevirtualdc.com on the idea that we should be leveraging the hypervisor as more than just an OS platform tool, specifically so it can become the brains of the virtual OS operation and do things like manage network hardware directly and provide security. I think that could solve a fair number of the problems we've all listed.
Nice post. 🙂
-Alan
Can you provide more information on your testing parameters? It would be interesting to know how much the Snort ruleset was tuned, and what kind of throughput levels you were using.
Also, did you find a "knee of the curve", where performance dramatically dropped off after a certain threshold? Or did performance degrade linearly?
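For what it's worth, here is roughly how I would go looking for that knee in a load sweep. The data points and the slope threshold below are fabricated purely to show the method:

```python
# One simple way to spot a "knee": sweep offered load, record average latency
# (or drop rate), and flag the first point where the slope jumps sharply.
# The data points below are fabricated for illustration.

offered_load_mbps = [100, 200, 300, 400, 500, 600, 700]
avg_latency_ms    = [0.4, 0.5, 0.6, 0.8, 1.1, 4.5, 19.0]

def find_knee(x, y, slope_jump=3.0):
    """Return the x value where the slope first exceeds slope_jump times the
    previous slope; a crude but serviceable knee detector."""
    prev_slope = None
    for i in range(1, len(x)):
        slope = (y[i] - y[i - 1]) / (x[i] - x[i - 1])
        if prev_slope and slope / prev_slope > slope_jump:
            return x[i]
        prev_slope = slope
    return None

knee = find_knee(offered_load_mbps, avg_latency_ms)
print(f"Apparent knee at ~{knee} Mb/s" if knee else
      "No obvious knee; degradation looks roughly linear.")
```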