I've been tackling an interesting challenge recently: how to define a mature business technology resiliency (aka disaster recovery) program. It's something I've been thinking about for years, but it was only a few months ago that I sat down to develop a concrete framework that enterprises could use to benchmark themselves. Yes, I know there are existing frameworks for defining maturity in a business technology resiliency program, but with my model, I was trying to accomplish the following:
Simplicity. Without going overboard, I wanted to put together a model that could be completed within a few hours, rather than something that would take weeks to complete. The tradeoff, of course, is that this model is much less detailed than others. However, with many conflicting priorities, I know that many IT leaders can't take the time to fill out an assessment the length of the last installment of Harry Potter.
Objectivity. One of the benefits I have at Forrester is the ability to address this from a vendor-neutral perspective. I have no ulterior motives with this model and no vendor allegiances that could influence the outcomes.
Process-orientation. I strongly believe that a mature business technology resiliency program is built on a bedrock of repeatable, standardized, and streamlined processes. In the model, you will see there is a section on technology maturity, but the emphasis overall is on the process components.
At the recent Disaster Recovery Journal Fall World conference, I gave a presentation on the state of BC readiness. I had some great discussions with the audience (especially about where BC should report), but one statistic that really stood out for me, and that I made a point of emphasizing with the audience, concerns the state of partner BC readiness.
According to the joint Forrester/Disaster Recovery Journal survey on BC readiness, 51% of BC influencers and decision-makers report that they do not assess the readiness of their partners. If this doesn’t shock you, it should. Forrester estimates that the typical large enterprise has hundreds of third-party relationships – everyone from supply chain partners to business process outsourcers, IT service providers and, of course, cloud providers. As your reliance on these partners increases, so does your risk – if they’re down, it greatly affects your organization’s business performance. And with the increasing availability of cloud services, the number of third parties your organization works with only increases, because business owners can now quickly adopt a cloud service to meet a business need without the approval of the CIO or CISO, and sometimes without the approval of any kind of central procurement organization.
Even among organizations that do assess partner BC readiness, the efforts are superficial. Only 17% include partners in their own tests and only 10% conduct tests specifically of their critical partners.
It should come as no surprise that websites thrive on traffic. So naturally, driving traffic to your site is a strong motivation for any company looking to grow its web presence. Ironically, however, driving traffic to your site can also be a double-edged sword if your infrastructure is not properly prepared to handle the load. This means that, strangely, popularity can actually become a cause of an outage.
Yesterday, popular Internet forum and message board Reddit discovered this firsthand. In an interesting campaign move, President Barack Obama graced the site with his presence by doing an “Ask Me Anything” (AMA) thread, a message thread in which commenters submit questions and the original poster responds. Word about this rare opportunity to send the President of the United States a direct message spread across social media like wildfire, leading to a massive spike in traffic that ultimately brought down Reddit mere minutes into the life of the thread. Current figures show that the site's unique connections and pageviews both more than tripled compared to typical traffic. Eventually the site came back online and the AMA progressed as usual.
During the past three years, you may have noticed that security and risk professionals have added a new term to their lexicon – business resiliency. Is this just an attempt by vendors to rebrand business continuity (BC) and IT disaster recovery (DR) in much the same way that vendors rebranded information security as cybersecurity to make it seem sexier and to sell more of their existing products? Some of it certainly is rebranding. However, like the shift in the threat landscape from lone hackers to well-funded crime syndicates and state-sponsored agents that precipitated the use of the term cybersecurity, a real shift has also taken place in BC/DR.
If you look up the term “resiliency” in the dictionary, it’s defined as “an occurrence of rebounding or springing back”. Thus, business resiliency refers to the ability of a business to spring back from a disruption to its operations. Historically, BC/DR focused on the ability of the business to recover from a disruption. Recovery implies that there was in fact a disruption: for some period of time, business operations were unavailable, and there was downtime as the business strove to recover. Resiliency, on the other hand, implies that an event may have affected the business’ operations, and perhaps the business operated in a diminished state for some period of time, but operations were never completely unavailable; the business was never down.
The current state of business continuity management (BCM) standards? Abysmal. According to a joint Forrester/DRJ study, 69% of respondents said that British Standard (BS) 25999 did not influence or only somewhat influenced BCM at their company. It’s not much better for NFPA 1600: 70% of respondents said that it did not influence, or only somewhat influenced, BCM at their company. I find this shocking. BS 25999 is one of the most widely recognized standards for BCM worldwide, and NFPA 1600 has been popular in the US for years. In addition, the U.S. Department of Homeland Security’s Private Sector Preparedness Program (PS‑Prep) recognizes both of these standards for assessing preparedness. If you’re wondering what standards respondents named in the “Other” category, it was mostly the Federal Financial Institutions Examination Council (FFIEC) and NIST. This is not surprising but also a little disheartening: it’s clear that unless compelled to do so, most BC professionals would not adopt or follow a BCM standard.
Even if you don’t intend to certify to these standards, they should strongly influence your BCM program. Why? Because:
They provide a foundation and a common vocabulary for BCM best practices and processes. This is important if you need to implement BCM across a geographically dispersed enterprise or you have to work with a multitude of global partners on joint preparedness.
In a recent Forrester/DRJ joint survey on BC preparedness, of organizations that have invoked a BC plan in the last five years, 37% said that their BC plans had not adequately addressed communication. In my experience, I’ve found that many organizations:
Don’t appreciate the importance of effective communication. Many organizations focus the content of their BC plans and the goals of their BC exercises on the details of recovery procedures but don’t focus on how they will contact and coordinate response teams, employees, partners, first responders and customers. If you can’t communicate, you can’t respond to anything.
Rely on manual procedures like call lists or email alone. By themselves, manual procedures are unreliable, they don’t scale for organizations with thousands of employees (or citizens), and they don’t provide any kind of reporting.
Underestimate the difficulty of communicating effectively under stress. The middle of an incident is not the time to attempt to craft effective communication messages or to look for a secondary mode of communication because your first mode of communication (land lines and email) is no longer available.
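To make the point about secondary modes of communication concrete, here is a minimal sketch of a notification routine that works through a prioritized list of channels until one succeeds. Everything in it is an assumption for illustration: the channel functions `send_email` and `send_sms` are hypothetical stand-ins rather than any real notification product's API, and the message template is invented.

```python
from typing import Callable, List, Optional, Tuple

# A channel is a (name, send function) pair; the send function returns True on success.
Channel = Tuple[str, Callable[[str], bool]]


def notify_with_fallback(message: str, channels: List[Channel]) -> Optional[str]:
    """Try each channel in priority order; return the name of the first that succeeds."""
    for name, send in channels:
        try:
            if send(message):
                return name
        except Exception:
            # Treat any channel error as a failure and move on to the next channel.
            continue
    return None  # no channel worked; escalate by other means


# Hypothetical stand-ins for real integrations (assumptions, not a real API).
def send_email(message: str) -> bool:
    raise ConnectionError("mail server unreachable")  # simulate the primary mode failing


def send_sms(message: str) -> bool:
    return True  # simulate a working secondary mode


# Message template drafted before the incident, not during it (illustrative wording).
TEMPLATE = "Incident declared at {site}. Report to {rally_point}."

used = notify_with_fallback(
    TEMPLATE.format(site="HQ", rally_point="backup office"),
    [("email", send_email), ("sms", send_sms)],
)
print(used)  # prints "sms"
```

The design choice worth noting is that both the channel order and the message wording are decided ahead of time, so nothing has to be improvised under stress.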
There has been a lot of buzz around using the cloud for disaster recovery lately, and with good reason -- it's a new and compelling approach to fast recovery. However, along with any hype comes a certain amount of confusion, so I set out to get some clarity on what cloud-based disaster recovery really is. The core feature of any cloud-based recovery is the ability to actually recover at the provider's location using its cloud assets. Just copying data there is not true recovery. I also realized that the term "cloud-based disaster recovery" was too broad, and that solutions actually fall into one of three categories:
Do-it-yourself (DIY): Using the public cloud to architect a custom failover solution leveraging the agility and speed of the cloud.
DR-as-a-service (DRaaS): Prepackaged services that provide a standard DR failover to a cloud environment that you can buy on a pay-per-use basis, with varying rates based upon your recovery point objective (RPO) and recovery time objective (RTO). Data is sent using either backups or replication.
Cloud-to-cloud disaster recovery (C2C DR): The ability to fail over infrastructure from one cloud data center to another, either within a single vendor's environment or across multiple vendors.
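Since RPO and RTO drive how these options compare, a small sketch may help show how the two objectives are checked. It assumes the usual definitions: RPO is the maximum tolerable data loss measured in time, and RTO is the maximum tolerable time to restore service. It also assumes that worst-case data loss is roughly the interval between copies. The option names and all numbers are illustrative assumptions, not figures from any provider.

```python
from dataclasses import dataclass


@dataclass
class RecoveryOption:
    name: str
    copy_interval_min: float  # minutes between backups or replication cycles
    failover_time_min: float  # minutes to bring workloads up at the provider


def meets_objectives(option: RecoveryOption, rpo_min: float, rto_min: float) -> bool:
    """Return True if worst-case data loss and recovery time fit within RPO and RTO."""
    # Worst-case data loss is approximated by the time between copies (an assumption).
    return option.copy_interval_min <= rpo_min and option.failover_time_min <= rto_min


# Illustrative options only; the intervals and failover times are made up.
nightly_backup = RecoveryOption("nightly backup", copy_interval_min=1440, failover_time_min=480)
replication = RecoveryOption("frequent replication", copy_interval_min=5, failover_time_min=60)

# Example targets: lose at most 1 hour of data, be back within 4 hours.
RPO_MIN, RTO_MIN = 60, 240

results = {o.name: meets_objectives(o, RPO_MIN, RTO_MIN) for o in (nightly_backup, replication)}
print(results)  # prints {'nightly backup': False, 'frequent replication': True}
```

A check like this is only a first filter; it says nothing about cost, which is where the pay-per-use rates mentioned above come in.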
Right now, the internet probably seems like the Wild West. Hackers are roaming around, seemingly attacking websites on a whim. Most recently, groups like Anonymous, the Jester, and Lulz Security (LulzSec – now supposedly disbanded) have been attacking and successfully taking down websites of all types. Government or corporate, public or private, anybody can seemingly be a target for these attacks. Their reasons for attacking a site range from making a political statement to simply having fun, but whether the attackers are hacktivists or black hat troublemakers, the end result is that hacking is now a real cause of downtime.
Disaster recovery-as-a-service (DRaaS), in my opinion, is one of the most exciting areas I look at. To me, using the cloud for disaster recovery (DR) purposes makes perfect sense: the cloud is an on-demand resource that you pay for as you need it (i.e., during a disaster or testing). Up until now, there haven't been many solutions out there that truly offered DRaaS--replicating physical or virtual servers to the cloud and being able to fail over production to the cloud provider's environment (you can read more about my definition of DRaaS in my recent TechRadar report)--but so far today, we've seen TWO new DRaaS platforms announced from VMware and SunGard! Here's a quick roundup of what was announced today:
VMware. VMware announced at VMworld that it will be making its popular Site Recovery Manager (SRM), a DR automation tool, available as a service through hosting and cloud partners. At launch, participating partners are FusionStorm, Hosting.com, iland, and Veristor. Benefits: Built into the VMware platform. Limitations: VMware-specific.