On Monday, Hurricane Sandy slammed into the East Coast of the United States, flooding entire towns in New York and New Jersey, triggering large-scale power outages and killing at least 17 people. The health and safety of individuals is the first and foremost priority, followed by the recovery of critical infrastructure services (power, water, hospital services, transportation etc.). As these services begin to recover, many business and IT leaders are wondering how they will resume normal operations to ensure the long-term financial viability of the company and the livelihoods of their employees and how they will serve their loyal customers.
Most likely, if you have offices that lie in the path of Hurricane Sandy, you are experiencing some sort of business disruption, large or small. The largest enterprises, especially those in financial services, spend an enormous amount of money on business, workforce and IT resiliency strategies. Many of them shifted both business and IT workloads to other corporate locations in advance of the storm, proactively closed offices and directed employees to work from home or a designated alternate site.
If you are small and medium enterprise and, like many of your peers, you didn’t have an alternate workforce site, robust work-from-home employee capabilities, an automated notification system or a recovery data center, what do you do now? While it’s too late to implement many measures to improve resiliency, there are several things you can do now to help your organization return to normal operations ASAP. Here are Forrester’s top recommendations for senior business technology leaders:
At the recent Disaster Recovery Journal Fall World conference, I gave a presentation of the state of BC readiness. I had some great discussions with the audience (especially about where BC should report), but one of the statistics that really stood out for me and I made it a point to emphasize with the audience, is the state of partner BC readiness.
According to the joint Forrester/Disaster Recovery Journal survey on BC readiness, 51% of BC influencers and decision-makers report that they do not assess the readiness of their partners. If this doesn’t shock you, it should. Forrester estimates that the typical large enterprise has hundreds of third-party relationships – everyone from supply chain partners to business process outsourcers, IT service providers and of course cloud providers. As our reliance on these partners increases so does our risk – if they’re down, it greatly affects your organization’s business performance. And with the increasing availability of cloud services, the number of third parties your organization works with only increases, because now, business owners can quickly adopt a cloud service to meet a business need without the approval of the CIO or CISO and sometimes without the approval of any kind of central procurement organization.
Even among those organizations that do assess partner BC readiness, their efforts are superficial. Only 17% include partners in their own tests and only 10% conduct tests specifically of their critical partners.
During the past three years, you may have noticed that security and risk professionals have added a new term to their lexicon – business resiliency. Is this just an attempt by vendors to rebrand business continuity (BC) and IT disaster recovery (DR) in much the same way that vendors rebranded information security as cybersecurity to make it seem sexier and to sell more of their existing products? Some of it certainly is rebranding. However, like the shift in the threat landscape from lone hackers to well-funded crime syndicates and state sponsored agents that precipitated the use of the term cybersecurity, a real shift has also taken place in BC/DR.
If you look up the term “resiliency” in the dictionary, it’s defined as “an occurrence of rebounding or springing back”. Thus, business resiliency refers to the ability of a business to spring back from a disruption to its operations. Historically, BC/DR focused on the ability of the business to recover from a disruption. Recovery implies that there was in fact a disruption, that for some period of time, business operations were unavailable, there was downtime as the business strove to recover. Resiliency, on the other hand, implies that an event may have affected the business’ operations, perhaps the business operated in a diminished state for some period of time, but operations were never completely unavailable, the business was never down.
The current state of business continuity management (BCM) standards? Abysmal. According to a joint Forrester/DRJ study, 69% of respondents said that British Standard (BS) 25999 did not influence or only somewhat influenced BCM at their company. It’s not much better for NFPA 1600, 70% of respondents said that it did not, or only somewhat, influenced BCM at their company. I find this shocking. BS 25999 is one of the most widely recognized standards for BCM worldwide and NFPA 1600 has been popular in the US for years. In addition, the U.S Department of Homeland Security’s Private Sector Preparedness Program (PS‑Prep) recognizes both of these standards for assessing preparedness. If you’re wondering what standards respondents named in the “Other” category, it was mostly the Federal Financial Institutions Examination Council (FFIEC) and NIST. Not surprising but also a little disheartening, it’s clear that unless compelled to do so, most BC professional would not adopt or follow a BCM standard.
Even if you don’t intend to certify to these standards, they should strongly influence your BCM program. Why? It’s because:
They provide a foundation and a common vocabulary for BCM best practices and processes. This is important if you need to implement BCM across a geographically dispersed enterprise or you have to work with a multitude of global partners on joint preparedness.
In a recent Forrester/DRJ joint survey on BC preparedness, of organizations that have invoked a BC plan in the last five years, 37% said that their BC plans had not adequately addressed communication. In my experience, I’ve found that many organizations:
Don’t appreciate the importance of effective communication. Many organizations focus the content of their BC plans and the goals of their BC exercises on the details of recovery procedures but don’t focus on how they will contact and coordinate response teams, employees, partners, first responders and customers. If you can’t communicate, you can’t respond to anything.
Rely on manual procedures like call lists or email alone. By themselves, manual procedures are unreliable, they don’t scale for organizations with thousands of employees (or citizens) and they don’t provide any kind of reporting.
Underestimate the difficulty of communicating effectively under stress. During the incident is not the time to attempt to craft effective communication messages or look for a secondary mode of communication because your first mode of communication (land lines and email) is no longer available.
A recent RFP for consulting services regarding strategic platforms for SAP from a major European company which included, among other things, a request for historical and forecast data for all the relevant platforms broken down by region and a couple of other factors, got me thinking about the whole subject of the use and abuse of market share histories and forecasts.
The merry crew of I&O elves here at Forrester do a lot of consulting for companies all over the world on major strategic technology platform decisions – management software, DR and HA, server platforms for major applications, OS and data center migrations, etc. As you can imagine, these are serious decisions for the client companies, and we always approach these projects with an awareness of the fact that real people will make real decisions and spend real money based on our recommendations.
The client companies themselves usually approach these as serious diligences, and usually have very specific items they want us to consider, almost always very much centered on things that matter to them and are germane to their decision.
The one exception is market share history and forecasts for the relevant vendors under consideration. For some reason, some companies (my probably not statistically defensible impression is that it is primarily European and Japanese companies) think that there is some magic implied by these numbers. As you can probably guess from this elaborate lead-in, I have a very different take on their utility.
As a follow-up to my blog post yesterday, there’s another area that’s worth noting in the resurgence of interest in BC preparedness, and that’s standards. For a long time, we’ve had a multitude of both industry and government standards on BCM management including Australian Standards BCP Guidelines, Singapore Standard for Business Continuity / Disaster Recovery Service Providers (which became much of the foundation for ISO 24762 IT Disaster Recovery), FFIEC BCP Handbook, NIST Contingency Planning Guide, NFPA 1600, BS 25999 (which will become much of the foundation for the soon to be released ISO 22301), ISO 27031, etc. There are also standards in other domains that touch on BC, security standards like ISO 27001/27002.
And when you come down to it, several of the broad risk management standards like ISO 31000 are applicable. At the end of the day, the same risk management disciplines underpin BC, DR, security and enterprise risk management. You conduct a BIA, risk assessment, then either accept, transfer or mitigate the risk, develop contingency plans, and make sure to keep the plans up to date and tested.
In my most recent research into various BCM software vendors and BC consultancies, as well as input from Forrester clients, BS 25999 seems to be the standard with the most interest and adoption. In the US at least, part of this I attribute to the fact that BS 25999 is now one of the recognized standards for US Department of Homeland Security’s Voluntary Private Sector Preparedness Accreditation and Certification Program. The other standards are NFPA 1600 and ASIS SPC.1-2009. I’ve heard very few Forrester clients mention the latter as their standard.
During the last 12 to 18 months, there have been a number of notable natural catastrophes and weather related events. Devastating earthquakes hit Haiti, Chile, China, New Zealand, and Japan. Monsoon floods killed thousands in Pakistan, and a series of floods forced the evacuation of thousands from Queensland. And of course, there was the completely unusual, when for example, ash from the erupting Eyjafjallajökull volcano in Iceland forced the shutdown of much of Western Europe’s airspace. These high profile events, together with greater awareness and increased regulation, have renewed interest in improving business continuity and disaster recovery preparedness. Last quarter, I published a report on this trend: Business Continuity And Disaster Recovery Are Top IT Priorities For 2010 And 2011.
Each year for the past three years I've analyzed and written on the state of enterprise disaster recovery preparedness. I've seen a definite improvement in overall DR preparedness during these past three years. Most enterprises do have some kind of recovery data center, enterprises often use an internal or colocated recovery data center to support advanced DR solutions such as replication and more "active-active" data center configurations and finally, the distance between data centers is increasing. As much as things have improved, there is still a lot more room for improvement not just in advanced technology adoption but also in DR process management. I typically find that very few enterprises are both technically sophisticated and good at managing DR as an on-going process.
When it comes to DR planning and process management, there are a number of standards including the British Standard for IT Service Continuity Management (BS 25777), other country standards and even industry specific standards. British Standards have a history of evolving into ISO standards and there has already been widespread acceptance of BS 25777 as well as BS 25999 (the business continuity version). No matter which standard you follow, I don’t think you can go drastically wrong. DR planning best practices have been well defined for years and there is a lot of commonality in these standards. They will all recommend:
If you still subscribe to fixed site recovery services using shared IT infrastructure from the likes of HP, IBM BCRS, or SunGard, among others, you will quickly become a dinosaur in the next 1 to 2 years.
These types of shared infrastructure services involve lengthy restores from tape and a recovery time objective of 72 hours, at best. Plus, you'll be lucky if you recover at all because chances are, you've had trouble scheduling a test with your service provider and it's been a LONG time since the last one, if indeed you’ve ever tested.
72 hours recovery just doesn't cut it anymore. And frankly, understanding your provider's oversubscription ratio to shared infrastructure to determine the risk of multiple invocations, or attempting to negotiate exclusions zones and availability guarantees is a time suck. Most companies are either taking DR back in-house or, if they still rely on a DR service provider, they are using dedicated infrastructure.