Cloud Catastrophes (Cloudtastrophes?) Caused by Clueless Caretakers?
Enter the dawn of the Cloudtastrophe…
I read a story today penned by Maureen O’Gara titled “Carbonite Loses Cloud-Based Data, Sues Storage Vendor.”
I expected yet another story about a Cloud Computing service vendor suffering a data breach (or loss) of customer data.
What I found, however, was another hyperbolic illustration of how vendors’ messaging about the Cloud has set expectations for service and reliability that are out of alignment with reality. Take a lack of sound enterprise architecture, proper contingency planning, solid engineering and common sense, add the economic lubricant of the Cloud, stir in a little journalistic sensationalism, and you’ve got CloudWow!
Carbonite, the online backup vendor, says it lost data belonging to over 7,500 customers in a number of separate incidents in a suit filed in Massachusetts charging Promise Technology Inc with supplying it with $3 million worth of defective storage, according to a story in Saturday’s Boston Globe.
The catastrophe is the latest in a series of cloud failures.
…
The widgetry was supposed to detect disk failures and transfer the data to a working drive. It allegedly didn’t.
The story says Promise couldn’t fix the errors and “Carbonite’s senior engineers, senior management and senior operations personnel…spent enormous amounts of time dealing with the problems.”
Carbonite claims the data losses caused “serious damage” to its business and reputation for reliability. It’s demanding unspecified damages. Promise told the Globe there was “no merit to the allegations.”
…
Carbonite, which sells largely to consumers and small businesses and competes with EMC’s Mozy, tells its customers: “never worry about losing your files again.”
The abstraction of infrastructure and the democratization of applications and data that Cloud Computing services can bring do not mean that all services are created equal. They do not make our services or information more secure (or less, for that matter). Just because a vendor brands themselves as a “Cloud” provider does not mean that “their” infrastructure is implicitly any more reliable, stable or resilient than traditional infrastructure, or that proper enterprise architecture as it relates to people, process and technology is in place. How the infrastructure is built and maintained is just as important as ever.
If you take away the notion of Carbonite being a “Cloud” vendor, would this story read any differently?
We’ve recently seen a few failures of Cloud-based services from Google, Microsoft and even Amazon. Most of them were sensationally lumped into the Cloud pile, and most of the stories about them foretell the impending doom of the Cloud…
Want another example of how Cloud branding, the Web2.0 experience and blind faith make for another FUDtastic “catastrophe in the cloud?” How about the well-known service Ma.gnolia?
There was a meltdown at bookmark sharing website Ma.gnolia Friday morning. The service lost both its primary store of user data, as well as its backup. The site has been taken offline while the team tries to reconstruct its databases, though some users may never see their stored bookmarks again.
The failure appears to be catastrophic. The company can’t say to what extent it will be able to restore any of its users’ data. It also says the data failure was so extensive, repairing the loss will take “days, not hours.”
So we find that a one-man shop was offering a service that people liked, and it died a horrible death. Because it was branded as a Cloud offering, it “seemed” bigger than it was. This is where perception definitely was not reality, and now we’re left with a fluffy bad taste in our mouths.
Again, what this illustrates is that just because a service is “Cloud-based” does not imply it’s any more reliable or resilient than one that is not. As enterprises look to move to the Cloud, it’s just as important as ever that they perform as much due diligence on their providers as makes sense. We’ll see a weeding out of the ankle-biters in Cloud Computing.
Nobody ever gets fired for buying IBM…
What we’ll also see is that even though we’re not supposed to care what powers our Cloud providers’ infrastructure, or how, we absolutely will in the long term, and the vendors know it.
This is where people start to freak out about how standards and consolidation will kill innovation in the space, but it’s also where the realities of running a business come crashing down on early adopters.
Large enterprises will move to providers who can demonstrate that their services are solid. They will do so through co-branding that borrows the reputation of their infrastructure providers, coupled with compliance with “standards.”
Big players like IBM see this as an opportunity, and as early as last year IBM introduced a Cloud certification program:
IBM to Validate Resiliency of Cloud Computing Infrastructures
Will Consult With Businesses of All Sizes to Ensure Resiliency, Availability, Security; Drive Adoption of New Technology
ARMONK, NY – 24 Nov 2008: In a move that could spur the rise of the nascent computing model known as “cloud,” IBM (NYSE: IBM) today said it would introduce a program to validate the resiliency of any company delivering applications or services to clients in the cloud environment. As a result, customers can quickly and easily identify trustworthy providers that have passed a rigorous evaluation, enabling them to more quickly and confidently reap the business benefits of cloud services.
Cloud computing is a model for network-delivered services, in which the user sees only the service and does not view the implementation or infrastructure required for its delivery. The success to date of cloud services like storage, data protection and enterprise applications, has created a large influx of new providers. However, unpredictable performance and some high-profile downtime and recovery events with newer cloud services have created a challenge for customers evaluating the move to cloud.
IBM’s new “Resilient Cloud Validation” program will allow businesses who collaborate with IBM on a rigorous, consistent and proven program of benchmarking and design validation to use the IBM logo: “Resilient Cloud” when marketing their services.
Remember the “Cisco Powered Network” program? How about a “Cisco Powered Cloud”? See how GoGrid advertises that their load balancers are F5?
In the long term, expect to start asking your Cloud providers the same question the Capital One credit card commercials ask about the company behind your credit card: “What’s in your wallet?”
/Hoff
Hoff,
Many of the points you've made here are worth repeating and recasting so as to get the message across.
And I agree with you that the componentry from which a service is constructed makes a huge difference. "My service uses the best ingredients…" should often be a necessary criterion for a customer selecting a cloud-based supplier. But, as you also know, it's not sufficient. Nothing guarantees that a service built on even the best cloud-based componentry will be well designed and immune to failure.
Designing services for scalability is still an art, not yet a well-understood craft. In our various adventures into cloud computing, the enterprise customer had best be willing to spend extra dollars and extra attention on alternatives. That attention also has to go to recovery from catastrophic failure caused by unforeseen design flaws or even insufficient attention to process. We are all just learning what datacenter (virtual or conventional) resilience, elasticity and scalability mean. And when you throw in stringent requirements for maintaining service levels, … well, no need to belabor it. Your point: just 'cause it says "cloud" is no guarantee. Look at the ingredients. Vet the design. Assess the folks running the *aaS. Along with this brave new world, have they also considered improved ways to operate, administer and manage the offerings? If not, have they at least contained their (and your) risk of failure by emphasizing the adoption of recognized processes?
This is going to be a hard-learned lesson even for enterprise adopters of "private cloud" or in-house utility computing. After a successful deployment on the in-house cloud, they may figure that scaling out or cloudbursting will be a cakewalk as they take the controls of their "virtual private datacenters."
So, now all we need to do is figure out how to dial back the journalistic hyperbole to which we'll no doubt be subjected at each occurrence of an outage, data breach or commercial failure of a "cloud" player.
= Rich
See the middle of the (free) list at http://www.willemijns.com/backup.htm for paid competitors of Carbonite 😉
I would like to make sure that your readers understand two points regarding Carbonite’s lawsuit against Promise Technology:
1) This event happened over a year ago. We do not say this to minimize the matter. But we do want to point out that this has not happened in a long time and is not an ongoing problem.
2) The total number of Carbonite customers who were unable to retrieve their data was 54, not 7,500.
Here is what happened: The Promise servers that we were purchasing in 2006 and 2007 use RAID technology to spread data redundantly across 15 disk drives so that if any one disk drive fails, you don't lose any data. The RAID software that makes all this work is embedded as "firmware" in the storage servers. In this case, we believe that the firmware on the servers had bugs that caused the servers to crash. Carbonite automatically restarted all 7,500 backups and more than 99% of these were completely restored without incident. Statistically, about 2 out of every 1,000 consumer hard drives will crash every week, so 54 of these customers had their PCs crash before their re-started backups were complete. Since they weren’t completely backed up when their PCs crashed, these customers were unable to restore all of their files from Carbonite. Most of the 54 got some or most of their data back. We took full responsibility for what happened and I did my best to call each of these customers personally to apologize.
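To see how 54 relates to the quoted crash rate, here's a back-of-the-envelope sketch. It uses only the figures in the comment above: 7,500 restarted backups, roughly 2 crashes per 1,000 drives per week, and 54 affected customers. The comment doesn't say how long the restarted backups took to complete. So the sketch works backward and solves for the window the rate would imply, assuming each PC crashes independently with a constant weekly risk. The function names are just for illustration.

```python
import math

CUSTOMERS = 7_500        # restarted backups (figure from the comment)
WEEKLY_RATE = 2 / 1_000  # consumer drive crashes per week (figure from the comment)
AFFECTED = 54            # customers who could not fully restore (figure from the comment)


def expected_crashes(customers: int, weekly_rate: float, weeks: float) -> float:
    """Expected number of PCs that crash at least once within `weeks`,
    assuming independent crashes at a constant weekly probability."""
    p_crash = 1 - (1 - weekly_rate) ** weeks
    return customers * p_crash


def implied_window(affected: int, customers: int, weekly_rate: float) -> float:
    """Invert expected_crashes: solve customers * (1 - (1 - r)**w) = affected for w."""
    return math.log(1 - affected / customers) / math.log(1 - weekly_rate)


if __name__ == "__main__":
    per_week = expected_crashes(CUSTOMERS, WEEKLY_RATE, 1)
    window = implied_window(AFFECTED, CUSTOMERS, WEEKLY_RATE)
    print(f"expected crashes per week: {per_week:.1f}")
    print(f"window implied by {AFFECTED} crashes: {window:.1f} weeks")
```

With those inputs it works out to about 15 expected crashes per week. So 54 corresponds to a backup window of roughly three and a half weeks, under the independence assumption.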
As a result of our problems with the Promise servers, we switched to a popular Dell server that uses RAID6. This is an improved RAID level with dual parity that can survive the simultaneous loss of 2 of the 15 drives; it takes a third concurrent failure before you lose any data. This configuration is in theory 36 million times more reliable than a single disk drive, because the chances of 3 out of 15 drives failing at the same time are almost nil.
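Why is a third simultaneous failure the thing to worry about with RAID6? Here's a rough binomial sketch. It assumes each of the 15 drives fails independently with the same probability p during the window it takes to rebuild a failed drive. The value p = 0.001 is a made-up number, not from the comment. Real arrays also see correlated failures (the firmware bugs described above are exactly that kind of common-mode problem) and read errors during rebuild. Treat the output as an illustration of the shape of the argument, not a check on the 36-million figure.

```python
from math import comb


def prob_at_least(k_min: int, n: int, p: float) -> float:
    """P(at least k_min of n drives fail in the same window).

    Binomial model: each drive fails independently with probability p.
    Summing the upper tail directly avoids the cancellation you'd get
    from computing 1 - P(fewer than k_min failures) when p is tiny.
    """
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


DRIVES = 15    # array width (figure from the comment)
P_FAIL = 1e-3  # HYPOTHETICAL per-drive failure probability during one rebuild window

single = prob_at_least(1, 1, P_FAIL)      # bare drive: any failure loses data
raid5 = prob_at_least(2, DRIVES, P_FAIL)  # single parity: a 2nd concurrent failure loses data
raid6 = prob_at_least(3, DRIVES, P_FAIL)  # dual parity: a 3rd concurrent failure loses data

if __name__ == "__main__":
    print(f"single drive: {single:.2e}")
    print(f"RAID5 x15:    {raid5:.2e}")
    print(f"RAID6 x15:    {raid6:.2e}")
    print(f"single / RAID6 ratio: {single / raid6:,.0f}")
```

The ratio swings wildly with p and with how long a rebuild takes. That is why headline "N times more reliable" numbers depend so heavily on the assumptions behind them.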
So far, Promise has refused to accept responsibility for their equipment’s failures, so now we are suing them to get our money back. The Dell RAID servers have been flawless and we're extremely happy with them.
Dave Friend, CEO
Carbonite, Inc.
@Dave Friend Thanks for clarifying, Dave. Again, my point is that this isn't a "Cloudtastrophe" but rather an issue with infrastructure in general. Were Carbonite not branded as a "Cloud" company, this story would be getting much less attention…