Even M(o)ore on Purpose-built UTM Hardware
Alan Shimel made some interesting points today regarding what he described as the impending collision between off-the-shelf, high-powered, general-purpose compute platforms and supplemental "content security hardware acceleration" technologies such as those made by Sensory Networks, and the ultimate lack of a sustainable value proposition for these offload systems:
I can foresee a time in the not-too-distant future where a quad-core, quad-processor box with PCI Express buses and globs of RAM delivers some eye-popping performance. When it does, the Sensory Networks of the world are in trouble. Yes, there will always be room at the top of the market for the Ferrari types who demand a specialized HW box for their best-of-breed applications.
Like Alan, I think these multi-processor, multi-core systems with fast buses and large RAM banks will deliver an amazing price/performance point for applications such as security, and more specifically for multi-function security applications such as those used within UTM offerings. For those systems that architecturally rely on cracking packets open in order to inspect them and execute a set of functional security dispositions, the faster you can do this, the better. Point taken.
One interesting point, however, is that boards like Sensory's are really deployed as "benign traffic accelerators," not as catch-all filters. As traffic enters a box equipped with one of these cards, the system's high throughput potential enables a policy-based decision either to send the traffic in question to the Sensory card for inspection or to pass it through uninspected, accelerating it as benign, rather like a cut-through or fast path. That "routing" function is done in software, so the faster you can make that decision, the better your "goodput" will be.
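To make that dispatch decision concrete, here is a minimal sketch in Python. The flow attributes, the policy table, and the function names are hypothetical illustrations of the pattern, not Sensory's actual interface:

```python
# Sketch of the software "routing" decision described above: based on
# policy, a flow is either fast-pathed as benign or handed to the
# inspection offload card. All names and rules here are hypothetical.

from dataclasses import dataclass

@dataclass
class FlowKey:
    proto: str       # e.g. "tcp", "udp"
    dst_port: int

# Hypothetical policy: which flows must be content-inspected.
INSPECT_PORTS = {25, 80, 110, 143}   # say, mail and web traffic

def dispatch(flow: FlowKey) -> str:
    """Return where the flow goes: 'fast_path' or 'offload'."""
    if flow.proto == "tcp" and flow.dst_port in INSPECT_PORTS:
        return "offload"      # send to the pattern-matching card
    return "fast_path"        # accelerate as benign (cut-through)

if __name__ == "__main__":
    print(dispatch(FlowKey("tcp", 80)))   # offload
    print(dispatch(FlowKey("udp", 53)))   # fast_path
```

The point of the sketch is that the branch itself runs on the host CPU; every cycle spent deciding is a cycle not spent forwarding.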
Will this differential in the ability to make this decision and offload to a card like Sensory's be eclipsed by the uptick in system CPU speed, multiple cores, and lots of RAM? That depends on one very critical element and its timing: the uptick in network connectivity speeds and feeds. Feed the box with one or more GigE interfaces, and the probability of the answer being "yes" is fairly high.
Feed it with a couple of 10GigE interfaces, and the answer may not be so obvious, even with big, fat buses. The timing and nature of the pattern/expression matching is very important here. Doing line-rate inspection focused on content (not just headers) is a difficult proposition to accomplish without adding latency. Doing it within context is even harder, so that you don't dump good traffic based on a false positive/negative.
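To put numbers on that: the per-packet time budget at line rate follows directly from standard Ethernet framing (a 64-byte minimum frame plus 8 bytes of preamble and a 12-byte inter-frame gap), and a quick calculation shows how brutally it shrinks at 10GigE:

```python
# Back-of-the-envelope per-packet time budget at line rate.
# 84 bytes = 64-byte minimum frame + 8-byte preamble + 12-byte gap.

WIRE_BYTES_MIN_FRAME = 64 + 8 + 12

def budget_ns(link_gbps: float, wire_bytes: int = WIRE_BYTES_MIN_FRAME) -> float:
    """Nanoseconds available per frame at line rate."""
    bits = wire_bytes * 8
    return bits / link_gbps   # at X Gb/s, one bit takes 1/X ns

for gbps in (1, 10):
    ns = budget_ns(gbps)
    print(f"{gbps:>2} GigE: {ns:7.1f} ns per min-size frame (~{1e3/ns:.2f} Mpps)")
```

That works out to roughly 672 ns per minimum-size frame at GigE and only about 67 ns at 10GigE; not a lot of time in which to crack open content and render a verdict.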
So, along these lines, the one departure point for consideration is that the FPGAs in cards like Sensory's are amazingly well tuned to provide massively parallel expression/pattern matching with the flexibility of software and the performance benefits of an ASIC. Furthermore, the ability to parallelize these operations and feed them into a large hamster wheel designed to perform them not only at high speed but with high accuracy *is* attractive.
The algorithms used in these subsystems are optimized to deliver a combination of scale and accuracy that is not necessarily easy to duplicate by just throwing cycles or memory at the problem, because the "performance" of effective pattern matching is as much about accuracy as it is about throughput. Being faster doesn't equate to being better.
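Sensory's actual algorithms are their own business, but the classic software illustration of the scale point is Aho-Corasick, the multi-pattern matcher used in many IDS engines: a single pass over the byte-stream matches the entire signature set simultaneously, so throughput doesn't degrade as the pattern count grows the way naive per-pattern scanning does. A compact sketch:

```python
# Minimal Aho-Corasick: one pass over the input matches all patterns
# at once. Illustrative only; hardware engines attack the same problem
# with massive parallelism instead.
from collections import deque

def build(patterns):
    """Build the automaton: goto table, failure links, outputs."""
    goto, out = [{}], [set()]
    for pat in patterns:
        s = 0
        for ch in pat:
            if ch not in goto[s]:
                goto.append({}); out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(pat)
    fail = [0] * len(goto)
    q = deque(goto[0].values())          # depth-1 states fail to the root
    while q:
        s = q.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]       # inherit matches from the fail state
            q.append(t)
    return goto, fail, out

def scan(text, goto, fail, out):
    """One pass over `text`; report (offset, pattern) for every match."""
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        hits.extend((i - len(p) + 1, p) for p in out[s])
    return hits

g, f, o = build(["he", "she", "his", "hers"])
print(sorted(scan("ushers", g, f, o)))   # [(1, 'she'), (2, 'he'), (2, 'hers')]
```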
These decisions rely on associative exposures to expressions that are not necessarily orthogonal in nature (an orthogonal classification is one in which no item is a member of more than one group; that is, the classifications are mutually exclusive. Thanks, Wikipedia!) Depending upon what you're looking for and where you find it, you could have multiple classifications and matches, and you need to decide (quickly) whether it's "bad" or "good" and how the results relate to one another.
What I mean is that within context you could have multiple matches that seem unrelated, so flows may require iterative inspection (of the entire byte-stream or of a given offset) based upon what you're looking for and what you find when you do, and may then be re-subjected to inspection somewhere else in the byte-stream, as the sketch below illustrates.
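As a toy illustration of that kind of iterative, context-driven inspection (the patterns, offsets, and correlation rule below are invented for the example, not drawn from any real product):

```python
# A first-pass match doesn't yield a verdict by itself; it triggers a
# second look elsewhere in the byte-stream before deciding. The
# patterns and the 512-byte window are hypothetical.

FIRST_PASS  = b"Content-Type: application/octet-stream"
SECOND_PASS = b"MZ"          # e.g. an executable file header

def inspect(stream: bytes) -> str:
    i = stream.find(FIRST_PASS)
    if i < 0:
        return "good"                      # nothing suspicious matched
    # Match found: re-inspect from a different offset for a second,
    # seemingly unrelated pattern, then correlate the two.
    body = stream[i + len(FIRST_PASS):]
    if SECOND_PASS in body[:512]:          # look near the start of the body
        return "bad"                       # both matches, in context
    return "good"                          # the first match alone is benign

if __name__ == "__main__":
    print(inspect(b"Content-Type: application/octet-stream\r\n\r\nMZ\x90"))
```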
Depending upon how well your security software is architected to distribute, dedicate, or virtualize these sorts of functions across the multiple processors and cores of a general-purpose hardware platform, you might decide that having purpose-built hardware as an assist is a good way to provide context and accuracy while letting the main CPU(s) do what they do best.
Switching gears…
All that being said, signature-only inspection is dead. If in the near future you don't have behavioral analysis/behavioral anomaly capabilities to help provide context in addition to (and in parallel with) signature matching, all the cycles in the world aren't going to help, and looking at headers and NetFlow data alone ain't going to cut it. We're going to see some very intensive packet-cracking/payload and protocol BA functions rise to the surface shortly. The algorithms and hardware required to collapse multi-dimensional problem spaces down to a single binary verdict (anomaly/not an anomaly) will pose an additional challenge for general-purpose platforms. Just look at all the IPS vendors who traditionally provide signature matching scurrying to add NBA/NBAD. It will happen in the UTM world, too.
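For flavor, here's a deliberately tiny sketch of that dimensional collapse: a handful of per-flow behavioral features scored against a learned baseline and reduced to one anomaly/not-anomaly verdict. The features, baseline numbers, and threshold are all made up; real NBA/NBAD engines are vastly more sophisticated:

```python
# Toy behavioral-anomaly check: several flow features collapse into a
# single binary decision. Baseline and threshold are invented.

BASELINE = {                  # per-feature (mean, stddev) learned offline
    "pkts_per_sec":   (120.0, 30.0),
    "bytes_per_pkt":  (512.0, 128.0),
    "new_dsts_per_s": (2.0,   1.0),
}

def is_anomaly(flow: dict, threshold: float = 3.0) -> bool:
    """Flag the flow if any feature strays past `threshold` std devs."""
    score = max(abs(flow[k] - mu) / sd for k, (mu, sd) in BASELINE.items())
    return score > threshold

normal  = {"pkts_per_sec": 140, "bytes_per_pkt": 480, "new_dsts_per_s": 2}
scanner = {"pkts_per_sec": 150, "bytes_per_pkt": 64,  "new_dsts_per_s": 40}
print(is_anomaly(normal), is_anomaly(scanner))   # False True
```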
This isn't just a high-end problem, either. I am sure that someone's going to say "the SMB doesn't need or can't afford BA or massively parallel pattern matching," and that "good enough is good enough" in terms of security for them, but from a pure security perspective I disagree. Need and afford are two different issues.
Following the summary argument regarding Moore's law: as the performance of systems rises and their cost asymptotically approaches zero, accuracy and context become the criteria for purchase. But as I pointed out, speed does not necessarily equal accuracy.
I think you'll continue to see excellent high-performance, low-cost general-purpose platforms delivering innovative software-driven solutions, assisted by flexible, scalable, high-performance subsystems designed to provide functional superiority via offload in one or more areas.
/Chris