Dear Public Cloud Providers: Please Make Your Networking Capabilities Suck Less. Kthxbye
There are lots of great discussions these days about how infrastructure and networking need to become more dynamic and intelligent in order to more fully enable the mobility and automation promised by both virtualization and cloud computing. There are many examples of how that’s taking place in the enterprise.
Incumbent networking vendors and emerging cloud/network startups are coming to terms with the impact of virtualization and cloud, juxtaposed against (and you’ll excuse the term) “pure” cloud vendors and those more traditional (Inter)networking service providers who have begun to roll out Cloud services atop or alongside their existing portfolios.
- On the one hand, we see hardware-based networking vendors adding software-based virtual switching and virtual appliance extensions in order to claw back the networking and security functions which have been abstracted into the virtualization and cloud stacks. This is a big deal in the enterprise, especially for vendors looking to stake a claim in the private cloud space: the evolution of traditional datacenter capabilities, extended with virtualization, that leverages the attributes of Cloud to provide a more frictionless computing experience. Here is where we see innovation and evolution with the likes of converged data and storage networking and unified fabric solutions.
- On the other hand, we see massively-scaled public cloud providers and evolving (Inter)networking service providers who have essentially absorbed the networking layers into their cloud operating platforms and rely on the software functionality embedded within to manifest the connectivity required to enable service. There is certainly networking hardware sitting beneath these offerings, but depending upon their provenance, there are remarkable differences in the capabilities and requirements between them and those mentioned above. Mostly, these providers are really just looking for multi-terabit layer two switching fabric interconnects to which they connect their software-enabled compute platforms. The secret sauce is primarily in software.
For the purposes of this post, I’m not going to focus on the private Cloud camp and enterprise cloud plays, or those “Cloud” providers who replicate the same architectures to serve those customers. Rather, I want to focus on the service providers/Cloud providers who offer massively scalable Infrastructure- and Platform-as-a-Service offerings, as in the second example above, and highlight two really important points:
- From a physical networking perspective, most of these providers rely, in some large part, on giant, flat, layer two physical networks with the actual “intelligence,” segmentation, isolation and logical connectivity provided by the hypervisor and their orchestration/provisioning/automation layers.
- Most of the networking implementations in these environments are severely limited when it comes to providing flexible and extensible networking topologies, which makes for n-Tier application-mapping nightmares for an enterprise looking to move a reasonable application stack to their service.
I’ve been experimenting with taking several reasonably basic n-Tier app stacks which require multiple levels of security, load balancing and message bus capabilities and designing them using several cloud platform providers’ offerings today.
The dirty little secret is that there are massive trade-offs with each of them, mostly due to constraints related to the very basic networking and security functionality offered by the hypervisors that power their services today. The networking is basic. Just the way they like it. It sucks for me.
This is a problem I demonstrated in enterprise virtualization in my Four Horsemen of the Virtualization Apocalypse presentation two years ago. It’s much, much worse in Cloud.
Not supporting multiple virtual interfaces, not supporting multiple IP addresses per instance/VM, not supporting multicast or broadcast capabilities for software-based load balancing (and resiliency of the LB engines themselves)…these are nasty issues that in many cases require wholesale re-engineering of app stacks and push things like resiliency and high availability into uncertain waters.
It’s also going to cost me more.
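To make the load-balancer resiliency point concrete, here’s a minimal sketch (Python, with an arbitrary UDP port standing in for the real VRRP protocol) of the kind of multicast heartbeat that software LB failover schemes lean on. On a cloud network that silently drops multicast and broadcast, the listener below never hears anything, and that failover logic has to be rebuilt some other way.

```python
# Minimal multicast heartbeat sketch -- a UDP stand-in, not real VRRP.
# LB failover schemes assume packets like this actually arrive; many public
# cloud networks drop multicast/broadcast, so on those the listener times out.
import socket
import struct
import sys

GROUP, PORT = "224.0.0.18", 5005   # 224.0.0.18 is the VRRP group; the port is arbitrary

def announce():
    """Active LB node advertises that it still holds the virtual IP."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    s.sendto(b"lb-master-alive", (GROUP, PORT))

def listen():
    """Standby node: if heartbeats stop arriving, it would take over the VIP."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("", PORT))
    mreq = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    s.settimeout(3.0)
    try:
        data, peer = s.recvfrom(1024)
        print("heartbeat from", peer, data)
    except socket.timeout:
        print("no heartbeat -- on a multicast-less cloud network you always end up here")

if __name__ == "__main__":
    listen() if "--listen" in sys.argv else announce()
```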
Sure, there are ways of engineering around these inadequacies, but they require additional levels of complexity, more cost, and additional providers or instances, and they still leave me without many of the introspection options and detective and preventative security controls that I’m used to being able to rely on in traditional networking environments, whether using colocation services or natively within the enterprise.
I’m sure I’ll see comments (public and private) suggesting all sorts of reasons why these are non-issues and how it’s silly to try and replicate the enterprise approach in the cloud. I have 500 reasons why they’re wrong…the Fortune 500, that is. You should also know I’m not apologizing for the sorry state of non-dynamic infrastructure, but I am suggesting that forcing me to re-tool app stacks to fit your flat network topologies without giving me better security and flexible connectivity options simply sucks.
In many cases, people just can’t get there from here.
I don’t want to have to re-architect my app stacks to work in the cloud simply because of a lack of maturity from a networking perspective. I shouldn’t have to. That’s simply backward. If the power of Cloud is its ability to quickly, flexibly, and easily allow me to provision, orchestrate and deploy services, that must include the network, also!
The networking and security capabilities of public Cloud providers need to improve — and quickly. Applications that are not network topology-dependent and only require a single interface (or more specifically an IP address/socket) to communicate aren’t the problem. It’s when you need to integrate applications and/or infrastructure solutions that require multiple interfaces, that *are* topology dependent and require insertion between these monolithic applications, that things break down. Badly.
The “app on a stick” model doesn’t work when enterprises struggle to take isolated clusters of applications (tiers) and isolate/protect them with physical or virtual appliances that require multiple interfaces to do so. ACLs don’t cut it, not when I need FW, IPS, DLP, WAF, etc. functionality. Let’s not forget dedicated management, storage or backup interfaces. These are many of the differences between public and private cloud offerings.
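As a back-of-the-envelope illustration (the tier names and provider constraints below are hypothetical, not any specific vendor’s limits), here’s a sketch that models a segmented n-Tier stack and checks it against a “one NIC, one IP, no inline insertion” instance model. Every tier with a management, backup, or inspection interface fails the check, which is exactly the re-engineering problem described above.

```python
# Hypothetical model of mapping a segmented n-Tier stack onto a cloud whose
# instances get exactly one NIC/one IP and no bump-in-the-wire insertion.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Tier:
    name: str
    nics: List[str]                                              # e.g. ["frontend", "backend", "mgmt"]
    inline_appliances: List[str] = field(default_factory=list)   # FW/IPS/WAF expected in-path

# Assumed provider constraints -- illustrative only.
MAX_NICS_PER_INSTANCE = 1
SUPPORTS_INLINE_INSERTION = False

STACK = [
    Tier("web", ["frontend", "backend", "mgmt"], ["waf"]),
    Tier("app", ["backend", "data", "mgmt"],     ["ips"]),
    Tier("db",  ["data", "mgmt", "backup"],      ["fw"]),
    Tier("mq",  ["backend", "mgmt"],             []),
]

for tier in STACK:
    problems = []
    if len(tier.nics) > MAX_NICS_PER_INSTANCE:
        problems.append(f"needs {len(tier.nics)} interfaces, provider allows {MAX_NICS_PER_INSTANCE}")
    if tier.inline_appliances and not SUPPORTS_INLINE_INSERTION:
        problems.append(f"no way to insert {', '.join(tier.inline_appliances)} in-path")
    status = "re-engineer" if problems else "maps cleanly"
    print(f"{tier.name:4s}: {status} ({'; '.join(problems) or 'ok'})")
```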
I can’t do many of the things I need to do easily in the Cloud today, not without serious trade-offs that incur substantial cost and, given the immaturity of the market as a whole, put me at risk.
If the fundamental networking and security architectures don’t allow for easy portability without massive re-engineering of app stacks, large enterprises are going to turn to niche or evolving (Inter)networking providers who offer them the capability to do so, even if those providers aren’t as massively scalable, or they’ll simply build private clouds instead.
/Hoff
Chris, seems like dynamic networking is much harder in reality than it appears. 🙂 I mean, historically the management software to provision and manage networks is complicated and unreliable. It's one of the reasons so many router/switch jockeys still use telnet to make changes. Even doing something like VLAN steering is delicate.
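For anyone who hasn't lived it, here's a rough sketch of the kind of screen-scraped change the comment is talking about (the host, credentials, interface, and VLAN are made up, and the CLI dialect is generic IOS-style). Note that there's no transaction, no rollback, and no structured error handling, which is exactly why it's delicate.

```python
# Sketch of "VLAN steering over telnet" -- all values are hypothetical.
import telnetlib  # stdlib in Python < 3.13; removed in 3.13

SWITCH = "192.0.2.10"          # hypothetical switch management address
COMMANDS = [
    b"configure terminal\n",
    b"interface GigabitEthernet0/12\n",
    b"switchport access vlan 120\n",   # steer the port onto VLAN 120
    b"end\n",
    b"write memory\n",
]

tn = telnetlib.Telnet(SWITCH, 23, timeout=10)
tn.read_until(b"Username: ")
tn.write(b"admin\n")                   # hypothetical credentials
tn.read_until(b"Password: ")
tn.write(b"s3cret\n")
for cmd in COMMANDS:
    tn.write(cmd)                      # no transaction, no rollback; if a command
                                       # fails you only find out by parsing prompt text
print(tn.read_very_eager().decode("ascii", errors="replace"))
tn.close()
```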
good stuff. Someone should listen. LOL
The main issue here is the continuing need for accessible control-plane APIs from networking vendors – like Cisco, heh – which can then be dynamically orchestrated by the applications and supporting layer-7 infrastructure.
Of course, networking vendors will try and invert this, but layer-7 should be in control of the network, not the other way around.
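As a sketch of what "layer-7 in control of the network" could look like, here is a hypothetical example of an application/orchestration layer asking a made-up vendor control-plane REST API to permit a flow for a freshly provisioned instance. The endpoint, token, and payload schema are invented for illustration; the point is the direction of control: layer 7 asks, the fabric complies.

```python
# Hypothetical call from the application/orchestration layer to a network
# control-plane API. Endpoint, token, and schema are invented -- illustrative only.
import json
import urllib.request

CONTROLLER = "https://netctl.example.com/api/v1/policies"   # made-up endpoint
TOKEN = "REDACTED"                                           # made-up credential

def permit_flow(src_ip: str, dst_ip: str, dst_port: int) -> dict:
    """Ask the (hypothetical) controller to permit a flow for a new instance."""
    payload = json.dumps({
        "action": "permit",
        "src": src_ip,
        "dst": dst_ip,
        "port": dst_port,
        "ttl_seconds": 3600,   # policy expires along with the short-lived instance
    }).encode("utf-8")
    req = urllib.request.Request(
        CONTROLLER,
        data=payload,
        headers={"Authorization": f"Bearer {TOKEN}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)

# e.g. invoked by the provisioning workflow right after an app instance boots:
# permit_flow("10.1.2.34", "10.1.3.10", 5432)
```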
Juniper are getting out front with this kind of thing – but Cisco have the potential to dominate, should they ever get around to actually delivering something in this space – and doing so in a cross-platform fashion.
@Roland Dobbins
Absolutely not meaning to be argumentative, but you missed the point.
Control-plane APIs from networking vendors basically don't factor into Facebook, Google or Amazon in the scenario I described above. These massive-scale public cloud providers abstract networking into the VMM. All the connectivity, (lack of) security and isolation are done there.
The underlying switching hardware (in their view) requires basically two things: (1) high port density and (2) massive throughput. We're back to massive, flat, layer 2 networks.
I specifically made the point to call out the differences between AWS, etc. and providers like Savvis, Terremark, etc., which are more focused on providing parity of service and capability (in both VMM and networking) between private clouds and public cloud offerings…this isn't a question of better/worse, but different.
The choice of the VMM has a huge effect on how, what and where you define and execute the networking component of the stack.
/Hoff
Application aware switching was pitched about back in 97 by Berkeley Networks. It turned out that their claims were a fraud. Then MPLS was pitched about by Cisco as a way of blunting true QoS networking offered by ATM. Then we started hearing about ASPs and huge data centers that would serve-up all sorts of internet-based offerings to the masses. Nobody was calling it cloud back then. Now today we're in full cloud mode and everyone is clamoring for this feature whilst positioning themselves for the mass migration to on-line computing in one form or another.
Service suckability, for lack of a better term, will always exist. At its core, we're talking about a freaking Internet circuit from which one receives some kind of electronic service. Though arguing who is responsible for delivering what is a valid point of discussion, ultimately the customer must decide what they need and who is best able to deliver. Period.
Administrators should understand what capabilities they possess and what a vendor can provide. Ultimately, the LAN guy will have to manage the inside traffic converging on their WAN link. I suppose a customer of considerable worth could insist the vendor assume certain responsibilities, but this has always been the case. It's a business case: is the reward greater than the cost and risk?
Beyond who's responsible for which services, personally, I think there are a lot of other issues that need to be addressed. I'd start with the fact that most existing IT reporting systems suck so badly that management has no idea what IT is doing. A permanent technology fog bank is in place that provides great "suck coverage" for most day-to-day operations.
That's suck you can believe in.
The only reason the choice of VMM matters here is that a more ideal solution for true network virtualization hasn't been realized yet. We're going to see it this year, though. It's already in process at a number of providers. And it's not Cisco-based.
–Randy
@Randy Bias
Two things:
(1) Please define "true network virtualization." and
(2) In terms of VMM networking extensions, are you referring to this, by chance:
http://www.rationalsurvivability.com/blog/?p=865
/Hoff