Each year for the past three years I've analyzed and written on the state of enterprise disaster recovery preparedness, and I've seen a definite improvement in overall DR preparedness over that time. Most enterprises do have some kind of recovery data center; enterprises often use an internal or colocated recovery data center to support advanced DR solutions such as replication and more "active-active" data center configurations; and finally, the distance between data centers is increasing. As much as things have improved, there is still a lot of room for improvement, not just in advanced technology adoption but also in DR process management. I typically find that very few enterprises are both technically sophisticated and good at managing DR as an ongoing process.
When it comes to DR planning and process management, there are a number of standards, including the British Standard for IT Service Continuity Management (BS 25777), other country standards and even industry-specific standards. British Standards have a history of evolving into ISO standards, and there has already been widespread acceptance of BS 25777 as well as BS 25999 (the business continuity version). No matter which standard you follow, I don’t think you can go drastically wrong. DR planning best practices have been well defined for years and there is a lot of commonality in these standards. They will all recommend:
Two years ago, Forrester and the Disaster Recovery Journal partnered to field surveys on a pair of pressing topics in risk management: business continuity (BC) and disaster recovery (DR). The surveys highlight trends in the industry and provide organizations with statistical data for peer comparison. The partnership has been a huge success. In 2007, we examined the state of disaster recovery preparedness; in 2008, we examined the state of business continuity preparedness; and this year, we examine the state of crisis communications and the interplay between enterprise risk management and business continuity.
We decided to focus on crisis communications because, as last year’s study revealed, one of the lessons learned by organizations that had invoked a business continuity plan (BCP) was that they had greatly underestimated the importance and difficulty of communication and collaboration both inside and outside the organization. In any situation, whether a natural disaster, a power outage, a security incident or even a corporate scandal, crisis communication is critical to responding quickly, managing the response and returning to normal operations.
Organizations approach crisis communication differently. In some organizations, crisis communications is a separate team that works together with BC/DR planning teams to embed communication strategies into BCPs/DRPs; in others, BC/DR planning teams do their best to address crisis communication themselves.
Yesterday IBM announced the availability of its new IBM Information Archive appliance, which replaces IBM’s DR550. The new appliance offers significantly greater scale and performance because it’s built on IBM’s General Parallel File System (GPFS). It also provides more interfaces (NAS and an API to Tivoli Storage Manager) and accepts information from multiple sources: IBM content management and archiving software and, eventually, third-party software. Tivoli Storage Manager (TSM) is embedded in the appliance to provide automated tiered disk and tape storage as well as block-level deduplication. TSM’s block-level deduplication will reduce storage capacity requirements, and its disk and tape management capabilities will let IT continue to leverage tape for long-term data retention. All these appliance subcomponents are transparent to the IT end user who manages the appliance; he or she sees just one console for defining collections and the retention policies for those collections.
On a weekly basis, I get at least one inquiry request from either a vendor or an end-user company seeking industry averages for the cost of downtime. Vendors like to quote these statistics to grab your attention and to create a sense of urgency to buy their products or services. BC/DR planners and senior IT managers quote these statistics to create a sense of urgency with their own executives who are often loath to invest in BC/DR preparedness because they view it as a very expensive insurance policy.
BC/DR planners, senior IT managers and anyone else trying to build the business case for BC/DR should avoid the use of industry averages and other sensational statistics. While these statistics do grab attention, more often than not, they are misleading and inaccurate, and your executives will see through them. You'll hurt your business case in the end because you haven't done your homework and your execs will know it.
I saw a study recently that stated the cost of downtime for the insurance industry was $1,202,444 per hour. You might be tempted to grab this statistic and throw it into the next presentation to your C-level exec but what is this statistic really telling you? Do the demographics of the companies in the study match yours? Do you trust the accuracy of the data? Consider the following:
What is the definition of insurance industry in this case? Is it companies that focus solely on insurance or does it include companies that also provide financial advice and monetary instruments to their clients?
Storage-as-a-Service is relatively new. Today the main value proposition is as a cloud target for on-premise deployments of backup and archiving software. If you need to retain data for extended periods of time (one year or more in most cases), tape is still the more cost-effective option given its low capital acquisition cost and removability. If you have long-term data retention needs and you want to eliminate tape, that's where a cloud storage target comes in. Electronically vault that data to a storage-as-a-service provider who can store it at cents per GB. You just can't beat the economies of scale these providers are able to achieve.
If you're a small business and you don't have the staff to implement and manage a backup solution, or if you're an enterprise looking for a PC backup or remote office backup solution, I think it's worthwhile to compare the three-year total cost of ownership of an on-premise solution versus backup-as-a-service.
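As a rough illustration of that three-year comparison, here is a minimal Python sketch. Everything in it is an assumption for illustration only: the cost categories, the flat per-GB monthly pricing model, the names `OnPremiseCosts`, `on_premise_tco` and `service_tco`, and all of the dollar figures. It also ignores data growth, discounting and bandwidth charges, so treat it as a starting template to fill in with your own quotes rather than as a benchmark.

```python
from dataclasses import dataclass


@dataclass
class OnPremiseCosts:
    """Hypothetical cost categories for an on-premise backup solution."""
    upfront_hardware: float    # one-time purchase (servers, disk, tape library)
    upfront_software: float    # one-time backup software licenses
    annual_maintenance: float  # yearly support and maintenance contracts
    annual_staff: float        # yearly staff time spent on backup operations
    annual_media: float        # yearly tape media and offsite vaulting


def on_premise_tco(costs: OnPremiseCosts, years: int = 3) -> float:
    """One-time costs plus recurring costs over the comparison period."""
    upfront = costs.upfront_hardware + costs.upfront_software
    recurring = costs.annual_maintenance + costs.annual_staff + costs.annual_media
    return upfront + years * recurring


def service_tco(protected_gb: float, price_per_gb_month: float,
                setup_fee: float = 0.0, years: int = 3) -> float:
    """Setup fee plus a flat monthly per-GB charge (assumed pricing model)."""
    return setup_fee + protected_gb * price_per_gb_month * 12 * years


# Illustrative figures only; substitute your own quotes and internal costs.
on_prem = OnPremiseCosts(
    upfront_hardware=20_000,
    upfront_software=10_000,
    annual_maintenance=4_000,
    annual_staff=15_000,
    annual_media=2_000,
)
on_prem_total = on_premise_tco(on_prem)
service_total = service_tco(protected_gb=2_000, price_per_gb_month=0.50,
                            setup_fee=1_000)

print(f"On-premise, 3 years:          ${on_prem_total:,.0f}")
print(f"Backup-as-a-service, 3 years: ${service_total:,.0f}")
```

The point of the exercise is less the totals than the line items: staff time and offsite media handling are the on-premise costs that are easiest to leave out, and they are often what tips the comparison for a small business or remote office.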
In May, I blogged about NetApp's announced acquisition of deduplication pioneer Data Domain. The announcement triggered an unsolicited counter-offer from EMC, followed by another counter from NetApp. But after a month of offers, counter-offers and regulatory reviews, EMC ultimately outbid NetApp with an all-cash offer of $2.1 billion. I believe that Data Domain would have been a better fit in the current NetApp portfolio; it would have been easier for NetApp to reposition its current VTL as a better fit for large enterprises that still planned to leverage tape. It's also said that more than half of Data Domain's current employees are former NetApp employees, so there would have been a clear cultural fit as well.
For $2.1 billion, EMC gets Data Domain's more than 3000 customers and 8000 installs, but it also gets a product that, in my opinion, overlaps with its current Quantum-based disk libraries, the DL1500 and DL3000. In Forrester inquiries and current consulting engagements, Data Domain is regularly up against the EMC DL1500 and DL3000. EMC will need to quickly explain to customers how it plans to position its new Data Domain offerings alongside its current DL family, both the Quantum- and FalconStor-based DLs, as well as its broader data protection portfolio that includes NetWorker and Avamar, which also offer deduplication.
2009 was the year we focused on virtualization and consolidation of IT infrastructure to drive down costs. Virtualization and consolidation will remain top initiatives in the second half of 2009 as IT organizations strive to save more by expanding virtualization and driving up the ratio of virtual machines to physical servers. But what’s next? For one, virtualization is changing IT management, processes, and roles, but most organizations have yet to adapt. Second, a lot of initiatives were put on hold in 2009 to focus on projects that had an immediate return on investment. As a result, many organizations put off infrastructure upgrades, postponed ITIL process adoption, and stepped back from process automation. But in order to achieve the next level of IT operational efficiency, we’ll need to reprioritize these initiatives. And by doing so, we’ll be in a better position to selectively leverage web, cloud, and outsourcing services to eliminate some costs completely.
If you want to learn more about these topics, please join my complimentary Webinar, "Transforming IT Infrastructure And Operations in 2010" on July 16th at 11AM EST. You can register for the session by visiting: www.forrester.com/ioassessmentwebinar.
Over the past two months, I've seen an increase in the number of end user inquiries regarding high availability (HA) and, perhaps more importantly, how to measure it. HA means something different depending on whom you're talking with, so it's worth a quick definition. I define HA as:
Focused on the technology and processes to prevent application/service outages at the primary site or in a specific IT system domain.
This is in contrast to disaster recovery or IT service continuity (ITSC) which is about preventing or responding to outages of the entire site.
Why so many inquiries about HA recently? I believe that, due to our increasing reliance on IT as well as the 24x7 operating environment, companies of all sizes and industries are becoming more and more sensitive to application and system downtime. The interest in measurement is driven by the need to continuously improve IT services and justify IT investments to senior management, especially now.
Despite the availability of multiple backup appliances supporting deduplication, Data Domain has continued to win customers at a steady pace. As of March 2009, the company had more than 2,900 customers and had recruited hundreds of value-added resellers. Its proven deduplication technology, integrated replication, and aggressive campaign to eliminate tape garnered it a tremendous amount of mind share and put it on most customers’ short lists. So it comes as no surprise that it was acquired by a major storage vendor.
That it was acquired by NetApp does come as a bit of a surprise. NetApp does have its own successful VTL that supports deduplication. But then again, NetApp didn’t introduce deduplication in its VTL until the fall of 2008 (the last of the major storage vendors to do so), and it typically sells its VTL into its own customer base. With Data Domain, NetApp now owns one of the toughest competitors in the backup appliance market, and the acquisition gives the company a system that it (and the hundreds of NetApp channel partners around the globe) can sell into non-NetApp environments.
The US Centers for Disease Control and Prevention (CDC) has confirmed 64 cases of swine flu in the United States. With other countries, including Canada (6), New Zealand (3), the United Kingdom (2), Israel (2), Spain (2), and now Germany, also confirming cases, the World Health Organization has raised the worldwide pandemic threat level to Phase 4. This means health officials have confirmed that the disease can spread person-to-person and has the potential to cause "community-level" outbreaks. The CDC recommends avoiding travel to Mexico and, if you get sick, staying home from work. Large numbers of employees out sick will impact the business (revenue) and cost your company a lot of money in lost productivity (you still pay employees their salary when they're out).
Stopping the spread of the disease and treating those infected is obviously a health issue, but the swine flu outbreak does have implications for IT professionals in both the short term and the long term. First, if you haven't done so already, you need to find a copy of the bird flu business continuity plan (BCP) that your company developed in 2006 and call a walk-through exercise immediately. And if your responsibility is IT disaster recovery and not necessarily business continuity, don't wait around for someone else to dust off the plan and call the exercise; this is too important to wait. Call your CIO, CISO, COO, and CEO and tell them it needs to be done now. There's a good chance that the plan is out of date and that it hasn't been exercised in a long time.