Posted by Stephanie Balaouras on July 24, 2008
This week Amazon’s Simple Storage Service (S3) suffered a major outage that affected several websites that rely on the service. This is actually the second major outage for Amazon S3 this year. As a result of these and other reported outages, some companies will come to question whether they should pursue these new cloud-based services in the future. I agree with Nick Carr: whether you’re a startup looking to rely on the cloud almost exclusively for computing power and storage capacity or a brick-and-mortar company that may want to use SaaS for CRM or an online backup service, these outages should not scare you away from cloud-based services. Outages are inevitable; no one, neither the most sophisticated internal IT shops on Wall Street nor the largest service providers, can offer 100% availability all the time. Amazon threw everything it had at the problem and was able to address the outage in several hours. How well would you be able to execute on your disaster recovery plan if you had a major outage?
Instead of avoiding cloud-based services, organizations need to be savvier about the security and resiliency of the service provider. In fact, your organization may already be pursuing these services. Online backup is becoming a viable alternative to on-premises solutions for PC backup as well as remote office backup. Next will come a number of services related to information management, such as online archiving and online records management, along with more online storage offerings to support low-cost storage. Further down the road, there will also be hosted, multi-tenant Exchange solutions. Get involved in these discussions. Don’t take it for granted that the potential service provider has hardened data centers that meet Tier III or Tier IV classifications (these classifications describe data center site infrastructure and topology; Tier IV is the highest rating), that your data is replicated to another data center, that your data is encrypted in flight and at rest, and that the service provider has strong security measures in place so that administrators can support the infrastructure but not access or even see your organization’s information. Organizations should apply this due diligence consistently before, during and after the contracts have been signed.
And when you ask about SLAs regarding resiliency, keep in mind that there will be some downtime for routine maintenance and that some unplanned downtime is inevitable. Consider a service provider that boasts about 99.9% availability (roughly 8.8 hours of outage per year for a 24x7 service). What is the difference between the following two outage patterns?
· 8 AM to 4 PM on the last Friday of the quarter
· Biweekly outages of 30 min at 4 AM local time
The timing and duration of each outage matter more than the total amount of downtime.
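To make the arithmetic behind that comparison concrete, here is a minimal Python sketch of the yearly downtime budget implied by a 99.9% availability figure, set against the two outage patterns above. The function name and the reading of "biweekly" as every other week (26 outages per year) are assumptions made for illustration; they are not part of any provider's SLA.

```python
# Rough downtime arithmetic for an availability percentage.
# Assumptions: a non-leap year of 365 days, a 24x7 service, and
# "biweekly" read as once every two weeks (26 outages per year).

HOURS_PER_YEAR = 365 * 24  # 8760


def downtime_budget_hours(availability_pct, hours_in_period=HOURS_PER_YEAR):
    """Hours of downtime allowed in the period at the given availability."""
    return hours_in_period * (1 - availability_pct / 100)


budget = downtime_budget_hours(99.9)

# Pattern 1: a single outage from 8 AM to 4 PM on one business day.
single_outage_hours = 8.0

# Pattern 2: a 30-minute outage at 4 AM every other week.
recurring_outage_hours = 26 * 0.5

print(f"99.9% yearly budget: {budget:.2f} h")
print(f"single daytime outage: {single_outage_hours:.1f} h")
print(f"recurring 4 AM outages: {recurring_outage_hours:.1f} h")
```

Under these assumptions the recurring early-morning outages add up to more total downtime than the single daytime outage, yet the single outage at quarter end is likely to hurt the business far more. The totals alone do not capture that, which is why the SLA conversation should cover when outages may occur and how long each one may last.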
Get involved in these discussions, but be careful not to come off as the obstacle or the doomsayer. Quite the opposite: you want to be seen as the enabler. Help the organization understand some of the potential risks, and then help it define its resiliency requirements, security requirements, and risk tolerance. Once the organization knows these, it can more confidently select the right service provider, negotiate the appropriate SLAs, and prepare contingency plans ahead of time for any potential service outages.