100% Uptime, Always-On, Always-Available Services, And Other Tall Tales

We live in a time when customers expect services to be delivered non-stop, without interruption, 24x7x365. Need proof? Just look at the outrage this week stemming from RIM's 3+ day BlackBerry service outage. Yes, this was an unusually long and widespread disruption, but it seems like every week there is a new example of a service disruption whipping social networks and blogs into a frenzy, whether it's Bank of America, Target, or Amazon. I'm not criticizing those who use social media outlets to voice their dissatisfaction over service levels (I've even taken part in it, complaining on Twitter about Netflix streaming being down on a Friday night when I wanted to stream a movie). My point is that now more than ever, infrastructure and operations professionals need to rethink how they deliver services to both their internal and external customers.

Is 100% uptime achievable? No, but there are a lot of advanced technologies that can help you get pretty darn close (I outline some of them in my TechRadar™ report on IT service continuity technologies). In the end, however, technologies alone aren't going to get you to the always-on, always-available dream. One of the most important things a company can do when trying to improve service availability is to start thinking like end customers and measure service performance from their perspective. What do I mean? For example, what is the difference between:

  • An outage of a critical service from 8 AM to 4 PM on the last Friday of the quarter
  • A 20-minute outage of the same service every other Saturday at 4 AM local time

The difference in business impact and customer perception is enormous, BUT both work out to approximately 99.9% availability: the first is 8 hours of downtime a year and the second is about 8.7 hours (26 outages of 20 minutes each), against the 8.76 hours a year that 99.9% allows. Oftentimes measuring pure uptime in terms of "nines" is misleading, which is why I recommend that organizations look at timing and duration of outages in addition to pure uptime. You may not be able to achieve always-on, always-available services, but you can at least strive to make services available when your customers most need them.
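To make the arithmetic concrete, here is a minimal Python sketch that compares the two scenarios above. It is an illustration, not a standard metric: the 365-day year, the Monday-to-Friday 8 AM to 6 PM business window, the example dates, and the helper names `availability` and `business_minutes` are all my assumptions.

```python
from datetime import datetime, timedelta

HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years (assumption)


def availability(downtime_hours: float) -> float:
    """Return yearly availability as a percentage."""
    return 100.0 * (1 - downtime_hours / HOURS_PER_YEAR)


def business_minutes(start: datetime, end: datetime,
                     open_hour: int = 8, close_hour: int = 18) -> int:
    """Count outage minutes that fall inside business hours.

    Business hours are assumed to be Monday-Friday, open_hour to close_hour.
    """
    minutes = 0
    t = start
    while t < end:
        if t.weekday() < 5 and open_hour <= t.hour < close_hour:
            minutes += 1
        t += timedelta(minutes=1)
    return minutes


# Scenario 1: a single 8-hour outage, 8 AM to 4 PM on a Friday.
single_hours = 8.0
# Scenario 2: a 20-minute outage every other Saturday, about 26 per year.
recurring_hours = 26 * 20 / 60

print(round(availability(single_hours), 3))     # 99.909
print(round(availability(recurring_hours), 3))  # 99.901

# Hypothetical example dates: 2011-09-30 is a Friday, 2011-10-01 a Saturday.
friday = business_minutes(datetime(2011, 9, 30, 8, 0),
                          datetime(2011, 9, 30, 16, 0))
saturday = business_minutes(datetime(2011, 10, 1, 4, 0),
                            datetime(2011, 10, 1, 4, 20))
print(friday, saturday)  # 480 0
```

On pure uptime the two scenarios are nearly indistinguishable, yet under these assumptions the Friday outage costs 480 business-hour minutes and each Saturday outage costs none. That gap is what a "nines" figure hides.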

Of course, that's just one small step companies can take to get closer to higher levels of service availability. I'll be discussing this topic in more depth at our upcoming Infrastructure & Operations Forum in Miami next month (also, if you are interested in metrics, I'm running a preconference workshop on metrics for infrastructure and operations departments), and I'll be publishing more research on it in the near future.

I also want to hear from you: Do you feel this pressure to deliver always-on, always-available services? If so, which ones? How do you ensure that your most-critical services remain always on and always available?