Cloud Inefficiency - Bad Habits Are Hard To Break

We all have habits we would like to (and should) break, such as leaving the lights on in rooms we are no longer in, and good habits we want to encourage, such as recycling plastic bottles and driving our cars more efficiently. We often don't follow through, because habits are hard to change and the impact often isn't immediate or all that meaningful to us. The same has long been true in IT. But keep up these bad habits in the cloud, and it will cost you - sometimes a lot.

As developers, we often ask the infrastructure & operations (I&O) teams for more resources than we really need so we don't have to go back later and ask for more - too painful and time-consuming. We also often don't know how many resources our code will need, so we might as well take as much as we can get. But do we ever give it back once we learn it is more than we need?

On the other hand, I&O often isn't any better. The first rule we learned about capacity planning was that it's more expensive to underestimate resource needs and be wrong than to overestimate, and we always seem to consume more resources eventually. 

Well, infrastructure-as-a-service (IaaS) and platform-as-a-service (PaaS) clouds change this equation dramatically, and you can reap big rewards if you change with them. Sure, you can ask for as many resources as you want - there's no pain in getting them and no pain in asking for more, either. But once you have them and figure out how much you really need, it pays to give back what you aren't using, because you are paying for what you allocated whether you use it or not.

Cloudyn, a SaaS-based cloud cost management company, knows how much this is costing its enterprise clients, as it uses monitoring capabilities to map the difference between what its clients are paying for and what they are really using. It recently shared with Forrester the latest findings in its CloudynDex metrics report, which aggregates cloud use and cost data from more than 100 of its clients running on Amazon Web Services' (AWS) IaaS cloud (collected randomly and anonymously with its clients' permission). The data is clear proof that we are bringing these bad habits to the cloud. These clients are spending between $12,000 and $2.5 million per year with AWS and throwing away about 40% of that expense. What kind of waste are they incurring?

  • Overallocation of resources. Cloudyn found that the degree of sustained utilization across the 400,000-plus instances being monitored was just 17%. The company said the most common issue was allocating Large or Extra Large instances when a Small or Medium would suffice. This one's easy to find (especially with a cost analysis tool like Cloudyn) and easy to fix. Not surprisingly, Cloudyn also found that the larger the instance, the worse the utilization, with Extra Large instances averaging just 4% utilization. That's worse than the average utilization of physical servers in 2001 - before virtualization. For shame!
  • Static workloads. Cloudyn also found that many client instances were forgotten and left running without doing anything for days, even months, at a time. Cloud vendors will certainly be happy to take your money for this. But really, is it that hard to shut down an instance and restart it when needed?
  • Not using Reserved Instances. The statistics also showed that the average client had persistent use of cloud instances that would have qualified for the discounts that come with AWS Reserved Instances, which can cut bills by up to 70%, yet clients weren't taking advantage of them. This one takes longer to assess, but once you know you will be staying in the cloud for a year or more, there's no excuse not to take advantage of it. Customers using Cloudyn or similar cost-tracking tools that continuously track resource activity are quickly getting wise to this benefit. Cloudyn's data shows a big increase in adoption of Reserved Instances from January to May of 2012.
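The three checks above can be screened for with a simple pass over per-instance usage summaries. This is a minimal sketch under stated assumptions, not Cloudyn's method: the `InstanceUsage` record, the `find_waste` helper, and every threshold (20% sustained CPU, 7 idle days, 6,000 hours a year before considering a reservation) are hypothetical, and a real reserved-instance decision should use the actual break-even point for the instance type and term.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration (not AWS or Cloudyn values).
UNDERUSED_CPU_PCT = 20.0         # sustained average utilization below this suggests downsizing
IDLE_DAYS = 7                    # this many days with no activity suggests stopping the instance
RESERVED_HOURS_PER_YEAR = 6000   # steady use above this suggests evaluating a reservation

HOURS_PER_YEAR = 365 * 24  # 8,760 hours for an always-on instance


@dataclass
class InstanceUsage:
    """Hypothetical per-instance usage summary, e.g. exported from a monitoring tool."""
    instance_id: str
    size: str              # "small", "medium", "large", "xlarge"
    avg_cpu_pct: float     # sustained average CPU utilization over the review period
    days_idle: int         # consecutive days with no meaningful activity
    hours_per_year: float  # hours the instance is expected to run per year
    reserved: bool         # already covered by a reservation?


def find_waste(inst: InstanceUsage) -> list[str]:
    """Return which of the three waste patterns this instance matches."""
    if inst.days_idle >= IDLE_DAYS:
        # A forgotten instance should be stopped, not resized or reserved.
        return ["idle"]
    findings = []
    if inst.avg_cpu_pct < UNDERUSED_CPU_PCT and inst.size != "small":
        findings.append("overallocated")
    if not inst.reserved and inst.hours_per_year >= RESERVED_HOURS_PER_YEAR:
        findings.append("reserved-candidate")
    return findings


fleet = [
    InstanceUsage("i-a", "xlarge", 4.0, 0, HOURS_PER_YEAR, False),
    InstanceUsage("i-b", "medium", 55.0, 30, HOURS_PER_YEAR, False),
    InstanceUsage("i-c", "small", 60.0, 0, 2000, False),
]
for inst in fleet:
    print(inst.instance_id, find_waste(inst))
# i-a ['overallocated', 'reserved-candidate']
# i-b ['idle']
# i-c []
```

Checking idleness first keeps the screen from recommending a reservation for an instance that should simply be shut down.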

Forrester has found a number of other bad habits from cloud users, some of which were noted in our latest Forrester Cloud Playbook report "Drive Savings And Profits With Cloud Economics," such as not configuring load balancing/auto-scaling properly to turn off instances fast enough as demand declined, not leveraging caching enough between application layers or at the edge, and not optimizing packet flows from the cloud back to your data center.
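Scaling in as demand declines is what turns elastic capacity into savings. The sketch below sizes a fleet hour by hour so each instance stays at or below a target utilization, in the spirit of target-tracking auto-scaling, and compares the instance-hours with a fleet kept at peak size all day. The demand curve, the per-instance capacity, the 70% target, and the `desired_instances` helper are all hypothetical; a real auto-scaling policy also needs cooldowns and health checks to avoid thrashing.

```python
import math


def desired_instances(demand_rps: float, capacity_rps: float,
                      target_util: float = 0.7, min_n: int = 1, max_n: int = 50) -> int:
    """Hypothetical helper: instances needed so each runs at or below target_util."""
    needed = math.ceil(demand_rps / (capacity_rps * target_util))
    return max(min_n, min(max_n, needed))  # clamp to the fleet's min/max size


# Hypothetical demand for one day, in requests/second for each hour.
hourly_demand = [100] * 6 + [400] * 4 + [800] * 6 + [400] * 4 + [150] * 4
CAPACITY_RPS = 100  # hypothetical throughput of one instance

scaled = [desired_instances(d, CAPACITY_RPS) for d in hourly_demand]
static_peak = max(scaled) * len(hourly_demand)  # sized for peak, never scaled in

print(sum(scaled), static_peak)  # 144 288
```

With these made-up numbers, the scaled fleet uses 144 instance-hours against 288 for the static one, so at the same hourly rate it costs half as much.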
 
It's understandable why we bring our bad habits with us to the cloud. Heck, simply by using the cloud, we're saving the company money. But don't let the optics of the cloud pull blinders over the real costs. A medium instance at $0.32 per hour sounds so cheap, but when daily consumption leads to $130,000 in annual spend, which was the average for this group of customers, then 40% savings is very, very significant. 
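To make the arithmetic concrete: an always-on instance runs 8,760 hours a year, so the $0.32 hourly rate above works out to roughly $2,800 per instance per year, and 40% of the $130,000 average comes to about $52,000. The short calculation below reproduces those figures; it assumes flat hourly billing with the instance running around the clock and ignores storage, data transfer, and other charges.

```python
HOURLY_RATE = 0.32       # medium instance price quoted in the post, $/hour
HOURS_PER_YEAR = 24 * 365
ANNUAL_SPEND = 130_000   # average annual spend in the Cloudyn sample
WASTE_FRACTION = 0.40    # share of spend Cloudyn found was wasted

per_instance_year = HOURLY_RATE * HOURS_PER_YEAR     # cost of one always-on instance
equivalent_instances = ANNUAL_SPEND / per_instance_year
potential_savings = ANNUAL_SPEND * WASTE_FRACTION

print(f"${per_instance_year:,.2f} per always-on instance per year")  # $2,803.20
print(f"~{equivalent_instances:.0f} always-on medium instances")     # ~46
print(f"${potential_savings:,.0f} potential savings per year")       # $52,000
```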
 
How are you using cloud platforms? With what apps, dev tools, software? Participate in Forrester's latest survey on this topic today. 
 
And for more on the true costs of using the cloud, read Andrew Reichman's report on cloud storage and set a Forrester.com research alert on analyst Dave Bartoletti, whose next report will provide a full breakdown of the cost of the cloud versus in-house deployments. You can learn more about your use of cloud by subscribing to Cloudyn - your cloud consumption data will be automatically correlated with others in the CloudynDex report to help you continually get a clearer picture of real cloud use. We all benefit by learning from others. 

Comments

AWS performance may be part of the reason

Great post -- could not agree more with the overall point. When it comes to the Cloudyn survey, one reason many people use much larger instances than they need on AWS is to get better I/O performance, and to avoid "noisy neighbors" by allocating an instance that fills an entire physical server. Adrian Cockcroft's blog last year had great details on this strategy: http://perfcap.blogspot.com/2011/03/understanding-and-using-amazon-ebs.html So perhaps the use of far larger instances than required is deliberate.

Secondly, Bluelock did some work last year that showed that elastic capacity was a better choice for unpredictable workloads, even if they were relatively static in nature, because of the tendency to over-estimate the resources required (as you point out). One of the advantages of a true hybrid cloud is that if you find something is relatively static, you can simply bring it back to your own data center / private cloud and run it there for less. But if your app is bound to a specific cloud, you don't have that option.

Great observations

Excellent, excellent points and observations.

AWS performance and other considerations

The I/O observation is very true. There may be other reasons:
  • Some applications (MongoDB is one of the most notorious) perform much better on high-RAM instances, so you have no choice but to use the largest available instances.
  • A slave DB must be of the same type as its master to be able to take over, even when it does nothing most of the time.

But putting these use cases aside, a huge percentage of instances is still over-provisioned. We see cases where people launch Extra Large servers for Hadoop processing and use only a fraction of the available I/O. We see high-compute servers just sitting there doing nothing. We saw an RDS Extra Large DB Instance doing an I/O of ~10 kb/hour for about 2 months.

Obviously in many cases you should keep your selected instance sizes. But you should not take the size for granted. And that's what Cloudyn is about: it points you to the possible waste. You know best whether the recommendations make sense or not, but it's worth checking. The savings may be quite rewarding.

Amazon AWS will not be happy to take your "underutilized" money

James S - Great, interesting findings. I invite you to check out more on our site at http://www.newvem.com/cloud-radar

You wrote:

"Cloud vendors will certainly be happy to take your money for this. But, really. Is it really that hard to shut down an instance and restart it when needed? "

I am not sure that I agree with you on that. I believe that Amazon doesn't want its cloud customers to feel that it is taking advantage of that "cloud fog" to extend its business. Consider the following:

1 - Their "Monitor Estimated Charges Using Billing Alerts" feature - http://aws.typepad.com/aws/2012/05/monitor-estimated-costs-using-amazon-cloudwatch-billing-metrics-and-alarms.html

2 - Trusted Advisor tools
http://aws.amazon.com/about-aws/whats-new/2012/01/30/amazon-web-services...

These prove Amazon's efforts to provide analytic capabilities. It is only a matter of priorities: whether you are a small startup or a giant company, you still have priorities. The AWS cloud is developing very fast, and no doubt its first priority is to meet newcomers' demand for compute and storage resources; only then come all the other "important feature requests".

These types of uncontrolled costs put cloud adoption at risk, since companies go to the cloud to eliminate IT costs. Once they get their monthly cloud bill, some of them decide to leave the AWS cloud. Companies like Cloudyn and Newvem help AWS customers adopt the cloud the right way and take control one step at a time. I invite you to read my CloudAve.com post on what cloud management is, which covers more points I think are important for this discussion - http://www.cloudave.com/19762/whats-cloud-management/

Ofir.
@iamondemand