How Amazon Ruined My Christmas: A Lesson In Downtime

Here's how Amazon ruined* my Christmas: after devouring a lovely rib roast with a porcini-spinach stuffing (recipe here in case your stomach is now growling), we all curled up on the couch with hot cocoa and turned on Netflix streaming to watch classic Christmas movies (and past Doctor Who Christmas Specials)... only to get an error message. That's right: in case you missed it, Netflix was down on Christmas Eve and Christmas Day for many users in North America because of issues with Amazon's Elastic Load Balancing (ELB) service in the US East region. It's interesting to note that this is at least the third time that issues with the ELB service have caused problems for Netflix, and each time the company has made improvements to prevent it from happening again.

You might be thinking that "ruin" is a strong word to describe what happened to me (and many others) on Christmas Eve, but I use it to illustrate a point: even though this particular outage was probably not the most severe (in duration or number of customers affected), it may well be the most costly for Netflix. Why? Because of TIMING. I've been saying for a while that timing and duration are better indicators of availability performance and business impact than "nines" (99.99%, 99.999%, etc.). If this same outage had occurred just a day or two earlier, the impact would have been significantly different. And unfortunately for Netflix, because of the timing, this is an outage that many customers will remember.
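To make the point about nines concrete, here is a minimal Python sketch. It computes the yearly downtime budget that each availability target allows, then scores the same outage under two different "demand" levels. The `demand_multiplier` values are purely hypothetical illustrations, not real Netflix or Amazon figures; the point is only that a nines figure counts every minute the same, while the business impact does not.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; ignores leap years


def allowed_downtime_minutes(availability: float) -> float:
    """Yearly downtime budget permitted by an availability target such as 0.9999."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def weighted_impact(outage_minutes: float, demand_multiplier: float) -> float:
    """Hypothetical impact score: outage minutes scaled by how heavily users
    were relying on the service at that moment (1.0 = an ordinary day)."""
    return outage_minutes * demand_multiplier


if __name__ == "__main__":
    # Downtime budgets for common availability targets.
    for target in (0.999, 0.9999, 0.99999):
        print(f"{target:.3%}: {allowed_downtime_minutes(target):.1f} min/year")

    # The same 4-hour outage consumes the same share of the yearly budget,
    # but a (hypothetical) 3x holiday demand spike triples its impact score.
    outage = 4 * 60
    print("ordinary day:", weighted_impact(outage, demand_multiplier=1.0))
    print("holiday:     ", weighted_impact(outage, demand_multiplier=3.0))
```

Both outages look identical in a nines report; only a measure that accounts for when the downtime happened distinguishes them.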

I write this not to be punitive towards Amazon or Netflix (or any of the other services that experienced downtime on the 24th/25th), but as a reminder/cautionary tale that:

  • Downtime will happen at the worst possible time. When designing continuity plans, it's prudent to hope for the best, but plan for the worst. Since the universe tends to be cruel and somewhat random, you may experience an outage at the worst possible time. Any calculations on the costs of downtime must account for this.
  • The cloud is not inherently resilient. Netflix is one of the most mature implementations of cloud resiliency that I have seen, and they still experience outages. You, not your cloud provider, are responsible for the resiliency of the applications you deploy in the cloud. If you architect your applications to withstand the loss of systems or sites (Netflix, for example, uses its Chaos Monkey and Chaos Gorilla tools to test this), you will be much better able to withstand failures at the cloud provider.
  • Don't take away my Doctor Who Christmas specials. Seriously, don't do it.

Happy Holidays Everyone!

*For those of you who missed my sarcasm here, Netflix and Amazon did not in fact ruin my Christmas. Instead of watching movies, we did some old-fashioned things like reading, talking, and playing board games.

Comments

I'm liking the way you put this, I did feel as if Netflix took a big chunk out of my holiday plans as well. I appreciate the spin you put on this as a reminder about downtime. With the new year right around the corner I hope it will be a nice reminder to think about your three bullet points (I mean two).

This company is bad news

I think your analysis is spot on. Timing is everything.

It was the last straw for me, and I canceled our membership. In the hope of one day getting a Netflix that employees & customers can actually like (like Apple), here's what they are doing wrong:

1) The product didn't work when I needed it most - Christmas Eve. Techie error message - no apology. Bad, bad, bad.
2) The CEO is an easy guy to hate. Between his practices on laying off employees quickly (fire don't fix), trying to say Facebook posts are public to all investors (get real), and doubling prices thinking no one will care (duh), I'm not a fan. At all. Yes Reed you. You're an ass. Not a good front man for the company. At all.
3) The Board just doubled his compensation for FY13. Toss the BOD as well. Tools.
4) Current titles I want are not stream-able yet on Netflix but are on Amazon. More than happy to PPV at Amazon. Who needs monthly fees for old content?

Rachel’s analysis definitely rings true. Users (you and me included) expect constant availability, regardless of location or device, and nobody is happy when an app doesn’t deliver. But here’s the reality: we’ll continue to see outages until companies understand that the number of possible users at any moment is effectively unlimited. Planning for that scenario is the only way companies can really avoid downtime. While the timing of this Netflix outage was particularly unfortunate, there’s never a good time for an outage.

Paul Campaniello
www.ScaleBase.Com

Thank you for the awesome post... it is full of informative facts!!

Nice Post!

This is a good one! I learned a few things about Amazon that I had no idea about.

Awesome post!! Thanks for sharing this information with us.

Informative

Informative blog...!!
http://www.netdepot.com/

Interesting and true. Another aspect of this matter is how the information itself is provided. Having well-structured, consistent, and relevant information that is easy to find, search, and filter once the webpage or system has responded is also very important!