Over the past two months, I've seen an increase in end user inquiries regarding high availability and, almost more importantly, how to measure high availability (HA). HA means something different depending on whom you're talking with, so it's worth a quick definition. I define HA as:
Focused on the technology and processes to prevent application/service outages at the primary site or in a specific IT system domain.
This is in contrast to disaster recovery or IT service continuity (ITSC) which is about preventing or responding to outages of the entire site.
Why so many inquiries about HA recently? I believe that, given our increasing reliance on IT and the 24x7 operating environment, companies of all sizes and industries are becoming more and more sensitive to application and system downtime. The interest in measurement is driven by the need to continuously improve IT services and justify IT investments to senior management, especially now.
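At its core, measuring availability is simple arithmetic: uptime divided by total scheduled time, usually expressed as a percentage ("nines"). Here's a minimal sketch of that calculation; the function name and the sample downtime figures are my own, for illustration only:

```python
# Illustrative availability math: availability = uptime / total scheduled time.
# The downtime figures below are hypothetical examples, not benchmarks.

def availability(total_hours: float, downtime_hours: float) -> float:
    """Return availability as a fraction of total scheduled time."""
    return (total_hours - downtime_hours) / total_hours

# A 24x7 service is scheduled for 8,760 hours per year.
HOURS_PER_YEAR = 24 * 365

# Each additional "nine" cuts the allowable annual downtime by a factor of ten:
# 99.9% allows roughly 8.76 hours/year; 99.99% allows roughly 52.6 minutes/year.
for target in (0.999, 0.9999):
    allowed = HOURS_PER_YEAR * (1 - target)
    print(f"{target:.2%} target -> {allowed * 60:.1f} minutes of downtime/year")

# Example: 4 hours of unplanned downtime over a year of 24x7 operation.
print(f"Measured availability: {availability(HOURS_PER_YEAR, 4):.4%}")
```

Tracking this per application, rather than per server, is what lets you report service availability in terms the business cares about.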
Despite the availability of multiple backup appliances supporting deduplication, Data Domain has continued to win customers at a steady pace. As of March 2009, the company had more than 2,900 customers and recruited hundreds of value added resellers. Its proven deduplication technology, integrated replication, and aggressive campaign to eliminate tape garnered it a tremendous amount of mind share and put it on most customers’ short lists. So it comes as no surprise that they were acquired by a major storage vendor.
That it was acquired by NetApp does come as a bit of a surprise. NetApp has its own successful VTL that supports deduplication. But then again, NetApp didn't introduce deduplication in its VTL until the fall of 2008 (the last of the major storage vendors to do so), and it typically sells its VTL into its own customer base. With Data Domain, NetApp now owns one of the toughest competitors in the backup appliance market, and it gains a system that it (and the hundreds of NetApp channel partners around the globe) can sell into non-NetApp environments.
The US Centers for Disease Control and Prevention (CDC) has confirmed 64 cases of swine flu in the United States, and as other countries including Canada (6), New Zealand (3), the United Kingdom (2), Israel (2), Spain (2), and now Germany have confirmed cases, the World Health Organization has raised the worldwide pandemic threat level to Phase 4. This means health officials have confirmed that the disease can spread person-to-person and has the potential to cause "community-level" outbreaks. The CDC recommends avoiding travel to Mexico and, if you get sick, staying home from work. Large numbers of employees out sick will impact the business (revenue) and cost your company a lot of money in lost productivity (you still pay employees their salary when they're out).
Stopping the spread of the disease and treating those infected is obviously a health issue, but the swine flu outbreak does have implications for IT professionals in both the short term and the long term. First, if you haven't done so already, you need to find a copy of the bird flu business continuity plan (BCP) that your company developed in 2006 and call a walk-through exercise immediately. And if your responsibility is IT disaster recovery and not necessarily business continuity, don't wait around for someone else to dust off the plan and call the exercise - this is too important to wait. Call your CIO, CISO, COO, and CEO and tell them it needs to be done now. There's a good chance that the plan is out of date and that it hasn't been exercised in a long time.
We all know the appliance and VTL vendors offering dedupe, including COPAN Systems, Data Domain, EMC, Exagrid, FalconStor, HP, IBM (Diligent), NEC, NetApp, Quantum, Sepaton, Sun StorageTek, and others.
And there are existing backup software vendors offering dedupe, including EMC Avamar, Symantec NetBackup PureDisk, and many online backup software vendors, like Asigra. Now add CommVault Simpana 8.0 and IBM Tivoli Storage Manager (TSM) V6.
On Friday, Iron Mountain and Microsoft announced a new partnership. Customers of Microsoft's backup offering, Data Protection Manager (DPM) 2007 Service Pack 1, can electronically vault redundant copies of their data to Iron Mountain's CloudRecovery service. This is welcome news for DPM customers. Customers will continue to back up locally to disk for instant restore, but rather than vault data to tape and physically transport tapes to an offsite storage service provider, they will vault data over the Internet to Iron Mountain. For disaster recovery and long-term retention purposes, you need this redundant copy of your data offsite. By eliminating the physical tape transport, you eliminate the risk of lost or stolen tapes and the need to deploy some kind of tape encryption solution. Microsoft DPM hasn't taken the backup world by storm since its introduction in 2005, but each subsequent release has added critical features and application support. Additionally, because it is often bundled with Microsoft System Center, I expect adoption will increase among small and medium businesses (SMBs) and small and medium enterprises (SMEs).
In my coverage of business continuity and disaster recovery, I talk to both IT infrastructure and operations professionals as well as IT security professionals and I've found that the term "data protection" means something different to each. This comes as no surprise and I think for a long time it didn't really matter because IT operations and security professionals operated in independent silos. But as silos break down and "data protection" is a shared responsibility across the organization, it's important to be specific and to understand who is responsible for what.
But there are new options emerging from governance, risk, and compliance (GRC) vendors. For example, Archer Technologies has added a business continuity management module to its GRC SmartSuite Framework. I recently saw a demo of the offering and found it to be intuitive and comprehensive. It's also closely aligned with the British Standard for Business Continuity Management, BS 25999. I also recently met with MetricStream, which has added a BCM module to its GRC platform. Provided that you've already purchased the core GRC platform from one of these vendors, buying the BCM module is significantly less expensive than buying or subscribing to a tier 1 stand-alone BCM offering. Tier 1 offerings start at US$100K, and average sales prices can be in the hundreds of thousands of dollars. The add-on modules to these GRC platforms will start between US$30K and US$50K.
If you still subscribe to fixed-site recovery services using shared IT infrastructure from the likes of HP, IBM BCRS, or SunGard, among others, you will quickly become a dinosaur in the next one to two years.
These types of shared infrastructure services involve lengthy restores from tape and a recovery time objective of 72 hours, at best. Plus, you'll be lucky if you recover at all because chances are, you've had trouble scheduling a test with your service provider and it's been a LONG time since the last one, if indeed you’ve ever tested.
A 72-hour recovery just doesn't cut it anymore. And frankly, understanding your provider's oversubscription ratio for shared infrastructure to determine the risk of multiple invocations, or attempting to negotiate exclusion zones and availability guarantees, is a time suck. Most companies are either taking DR back in-house or, if they still rely on a DR service provider, using dedicated infrastructure.
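To make the multiple-invocation risk concrete, here's a back-of-the-envelope sketch. The subscriber count and invocation probability are hypothetical assumptions of mine, not figures from any provider, and the independence assumption is generous: in a regional disaster, invocations are correlated, so the real contention risk is higher than this model suggests.

```python
# Back-of-the-envelope model of contention risk on shared DR infrastructure.
# Assumption: each subscriber invokes independently with probability p_invoke
# in a given window. Regional disasters violate independence, which is
# exactly why oversubscribed shared infrastructure is riskier than it looks.

def p_contention(subscribers: int, p_invoke: float) -> float:
    """Probability that two or more subscribers declare a disaster
    in the same window, competing for the same shared configuration."""
    p_zero = (1 - p_invoke) ** subscribers
    p_one = subscribers * p_invoke * (1 - p_invoke) ** (subscribers - 1)
    return 1 - p_zero - p_one

# e.g., 40 subscribers sharing one recovery configuration, each with a
# 1% chance of invoking in a given window
print(f"Contention probability: {p_contention(40, 0.01):.2%}")
```

Even under the optimistic independence assumption, the contention probability at a 40:1 oversubscription ratio is already in the single-digit percents, which is why dedicated infrastructure (or in-house DR) keeps winning this argument.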