As a follow-up to my blog post yesterday, there’s another area worth noting in the resurgence of interest in BC preparedness: standards. For a long time, we’ve had a multitude of industry and government standards on BCM, including the Australian Standards BCP Guidelines, the Singapore Standard for Business Continuity / Disaster Recovery Service Providers (which became much of the foundation for ISO 24762 IT Disaster Recovery), the FFIEC BCP Handbook, the NIST Contingency Planning Guide, NFPA 1600, BS 25999 (which will become much of the foundation for the soon-to-be-released ISO 22301), ISO 27031, etc. There are also standards in other domains that touch on BC, such as the security standards ISO 27001/27002.
And when you come down to it, several of the broad risk management standards, such as ISO 31000, are applicable as well. At the end of the day, the same risk management disciplines underpin BC, DR, security, and enterprise risk management: you conduct a BIA and a risk assessment; decide whether to accept, transfer, or mitigate each risk; develop contingency plans; and keep those plans tested and up to date.
In my most recent research into various BCM software vendors and BC consultancies, as well as input from Forrester clients, BS 25999 appears to be the standard with the most interest and adoption. In the US at least, I attribute part of this to the fact that BS 25999 is now one of the recognized standards for the US Department of Homeland Security’s Voluntary Private Sector Preparedness Accreditation and Certification Program. The other recognized standards are NFPA 1600 and ASIS SPC.1-2009; I’ve heard very few Forrester clients mention the latter as their standard.
During the last 12 to 18 months, there have been a number of notable natural catastrophes and weather-related events. Devastating earthquakes hit Haiti, Chile, China, New Zealand, and Japan. Monsoon floods killed thousands in Pakistan, and a series of floods forced the evacuation of thousands from Queensland. And of course, there was the completely unexpected: ash from the erupting Eyjafjallajökull volcano in Iceland forced the shutdown of much of Western Europe’s airspace. These high-profile events, together with greater awareness and increased regulation, have renewed interest in improving business continuity and disaster recovery preparedness. Last quarter, I published a report on this trend: Business Continuity And Disaster Recovery Are Top IT Priorities For 2010 And 2011.
Each year for the past three years, I've analyzed and written on the state of enterprise disaster recovery preparedness, and I've seen a definite improvement in overall DR preparedness over that period: most enterprises now have some kind of recovery data center; enterprises often use an internal or colocated recovery data center to support advanced DR solutions such as replication and more "active-active" data center configurations; and the distance between data centers is increasing. As much as things have improved, there is still a lot of room for improvement, not just in advanced technology adoption but also in DR process management. I typically find that very few enterprises are both technically sophisticated and good at managing DR as an ongoing process.
When it comes to DR planning and process management, there are a number of standards, including the British Standard for IT Service Continuity Management (BS 25777), other country-specific standards, and even industry-specific ones. British Standards have a history of evolving into ISO standards, and there has already been widespread acceptance of BS 25777 as well as BS 25999 (the business continuity counterpart). No matter which standard you follow, I don’t think you can go drastically wrong. DR planning best practices have been well defined for years, and there is a lot of commonality across these standards; they all recommend the same core disciplines: conduct a BIA and risk assessment, develop and document recovery plans, and keep those plans tested and up to date.
If you still subscribe to fixed-site recovery services using shared IT infrastructure from the likes of HP, IBM BCRS, or SunGard, among others, you will become a dinosaur within the next 1 to 2 years.
These types of shared infrastructure services involve lengthy restores from tape and a recovery time objective of 72 hours, at best. Plus, you'll be lucky if you recover at all because chances are, you've had trouble scheduling a test with your service provider and it's been a LONG time since the last one, if indeed you’ve ever tested.
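A rough back-of-the-envelope calculation shows why tape-based restores stretch into days. All of the figures below (data volume, drive count, throughput) are illustrative assumptions, not numbers from any real provider:

```python
# Back-of-the-envelope restore-time estimate (all figures hypothetical).
data_tb = 20                 # data to restore, in terabytes
drives = 4                   # tape drives available at the shared site
mb_per_sec_per_drive = 80    # sustained restore throughput per drive

total_mb = data_tb * 1_000_000
hours = total_mb / (drives * mb_per_sec_per_drive) / 3600
# Roughly 17 hours of pure streaming -- and that's before tape retrieval
# from the vault, hardware provisioning, OS/application rebuilds, and
# verification, which is how a "72 hours, at best" RTO happens in practice.
```

Even with generous assumptions, the streaming time alone consumes a large fraction of the recovery window; the surrounding logistics consume the rest.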
A 72-hour recovery just doesn't cut it anymore. And frankly, understanding your provider's oversubscription ratio on shared infrastructure to determine the risk of multiple invocations, or attempting to negotiate exclusion zones and availability guarantees, is a time suck. Most companies are either taking DR back in-house or, if they still rely on a DR service provider, using dedicated infrastructure.
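To see why that oversubscription analysis matters, here's a minimal sketch of the math involved. It assumes independent customer invocations, which is a deliberate simplification (regional disasters make invocations correlated, so real-world risk is higher), and every number in it is hypothetical:

```python
from math import comb

def p_multiple_invocations(subscribers: int, p_invoke: float, seats: int) -> float:
    """Probability that more than `seats` of `subscribers` customers
    invoke DR simultaneously, assuming independent invocations each
    with probability `p_invoke` (a simplification: correlated regional
    disasters make the true risk higher)."""
    # P(X > seats) = 1 - sum_{k=0}^{seats} C(n, k) * p^k * (1 - p)^(n - k)
    return 1 - sum(
        comb(subscribers, k) * p_invoke**k * (1 - p_invoke) ** (subscribers - k)
        for k in range(seats + 1)
    )

# Hypothetical example: 40 customers share capacity sized for only 2
# simultaneous recoveries, each with a 1% chance of invoking in a
# given window. Risk comes out just under 1% per window.
risk = p_multiple_invocations(subscribers=40, p_invoke=0.01, seats=2)
```

Even this toy model shows why the exercise is a time suck: the answer hinges on parameters (true subscriber counts, invocation rates, correlation across customers) that the provider rarely discloses and the customer can only guess at.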
TechCrunchIT reported today that a Rackspace data center went down for several hours during the evening due to a power grid failure. Because Rackspace is a managed service provider (MSP), the downtime affected several businesses hosted in the data center.