Cloud TCO

By Markus Klems

Intro

What is the Total Cost of Ownership (TCO) of running your business on an external cloud, say Amazon EC2+S3, compared with running it on your own infrastructure? And how do you approach such a calculation? First of all, the answer depends on how heavily you utilize your existing infrastructure. If your data center is underutilized, you waste money: CapEx (depreciation of your IT hardware and cabling) plus OpEx (your IT staff, energy, etc.). If, on the other hand, you leave no spare capacity, you cannot start new projects, and everything will crash in an emergency.

Half a year ago, as part of a lecture at my university, I ran some calculations on this topic. My ideas on how to make such a comparison are only a starting point, and I am well aware that my approach is far from rigorous enough to be published (which is why I have not published it). Still, here are my thoughts.

Methodology to Compare Infrastructure TCO with Cloud Costs

  1. You need a model to calculate the TCO of your infrastructure (or the part of your infrastructure that you are planning to “cloudsource”)
  2. How do you deal with under- and over-utilization? What are the risks and costs involved if something crashes (over-utilization)? What are the costs of running idle servers (under-utilization)?
  3. What would be a comparable amount of cloud resources (provided there is a utility computing model involved)? How do you compare resources?
  4. Now: calculate!
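The four steps above can be sketched as a handful of small functions. This is a minimal illustration under simplifying assumptions, not a validated model: straight-line depreciation for CapEx, over-utilization risk expressed as an expected yearly loss (incidents per year times cost per incident), cloud capacity billed per instance-hour, and the cloud side assumed to carry no overload risk. All function names, parameters, and figures are hypothetical.

```python
def onprem_annual_tco(capex: float, depreciation_years: float, annual_opex: float) -> float:
    """Step 1: yearly TCO = straight-line depreciation of hardware + operating cost."""
    return capex / depreciation_years + annual_opex


def idle_cost(annual_tco: float, utilization: float) -> float:
    """Step 2a (under-utilization): part of the yearly TCO paid for idle capacity."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return annual_tco * (1.0 - utilization)


def expected_overload_cost(incidents_per_year: float, cost_per_incident: float) -> float:
    """Step 2b (over-utilization): crash risk expressed as an expected yearly loss."""
    return incidents_per_year * cost_per_incident


def cloud_annual_cost(instances: int, hours_per_day: float, price_per_instance_hour: float) -> float:
    """Step 3: comparable cloud capacity, billed per instance-hour (utility model)."""
    return instances * hours_per_day * 365 * price_per_instance_hour


def compare(annual_tco, utilization, incidents_per_year, cost_per_incident, cloud_cost):
    """Step 4: both sides next to each other (cloud overload risk assumed to be zero)."""
    return {
        "on_premises": annual_tco + expected_overload_cost(incidents_per_year, cost_per_incident),
        "on_premises_idle_share": idle_cost(annual_tco, utilization),
        "cloud": cloud_cost,
    }


# Hypothetical figures, for illustration only.
tco = onprem_annual_tco(capex=100_000, depreciation_years=4, annual_opex=50_000)  # 75000.0
cloud = cloud_annual_cost(instances=10, hours_per_day=24, price_per_instance_hour=0.10)
print(compare(tco, utilization=0.4, incidents_per_year=0.5,
              cost_per_incident=20_000, cloud_cost=cloud))
```

The hard part is step 3: the instance count and hourly price stand in for "a comparable amount of cloud resources", and choosing them is exactly the open question the list raises.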

Does it Scale?

The critical point seems to be step 2: How do you deal with under- and over-utilized resources? Paco Nathan gets right to the point:

In terms of grids, the organization had been using AWS for some work, but had fallen into a common trap of thinking “Us (data center) versus Them (AWS)”, arguing about TCO, vendor lock-in, etc. Sysadmins calculated a ratio of approximately 3x cost at AWS – if you don’t take scalability into consideration. In other words, if a server needs to run more than 8 hours per day, it starts looking cheaper to run on your own premises than to run it on EC2.
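The 8-hour figure follows directly from the 3x ratio. An owned server costs roughly the same per day whether it runs or idles, while a rented server is paid only for the hours it runs. Assuming "3x" means that one cloud server-hour costs three times an owned server's cost amortized over a full 24-hour day, the break-even runtime is 24 / 3 = 8 hours. A minimal sketch (function names are hypothetical):

```python
def break_even_hours_per_day(cost_ratio: float, hours_in_day: float = 24.0) -> float:
    """Daily runtime above which owning a server is cheaper than renting it.

    Assumes cost_ratio = cloud cost per server-hour divided by the on-premises
    cost per server-hour when the owned server is amortized over a full day.
    """
    if cost_ratio <= 0:
        raise ValueError("cost_ratio must be positive")
    return hours_in_day / cost_ratio


def cheaper_option(hours_per_day: float, cost_ratio: float) -> str:
    """Pick the cheaper side for a server that runs hours_per_day (ties go to the cloud)."""
    if hours_per_day > break_even_hours_per_day(cost_ratio):
        return "on-premises"
    return "cloud"


print(break_even_hours_per_day(3))  # 8.0
print(cheaper_option(12, 3))        # on-premises
print(cheaper_option(6, 3))         # cloud
```

Note that this calculation assumes a constant load and ignores scaling entirely, which is precisely the caveat in the quote.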

I agree with the ratio; it’s strikingly similar to the 3x markup you find buying a bottle of good wine at any decent restaurant. However, scalability is a crucial matter. Scaling up as a business grows (or down, as it contracts) is the vital lesson of Internet-based system architecture. Also, the capacity to scale up failover resources rapidly in the event of an emergency (data center lands under an F4 tornado) is much more compelling than TCO.
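To see why scalability can outweigh the 3x markup, consider a spiky load. On premises, capacity has to be provisioned for the peak and is paid for around the clock; in the cloud, assuming capacity can be added and released instantly, you pay only for the server-hours actually used, even at three times the rate. The demand profile and unit cost below are hypothetical, chosen only to illustrate the effect:

```python
def daily_costs(hourly_demand, onprem_server_hour_cost, cost_ratio):
    """Return (on-premises cost, cloud cost) for one day of a varying load.

    hourly_demand: servers needed in each hour of the day (hypothetical profile).
    On premises: peak capacity, paid for every hour of the day.
    Cloud: only the server-hours used, at cost_ratio times the owned rate
    (assumes instant, unlimited elasticity).
    """
    peak = max(hourly_demand)
    onprem = peak * len(hourly_demand) * onprem_server_hour_cost
    cloud = sum(hourly_demand) * onprem_server_hour_cost * cost_ratio
    return onprem, cloud


# 22 quiet hours at 2 servers, then a 2-hour spike needing 20 servers.
demand = [2] * 22 + [20] * 2
print(daily_costs(demand, onprem_server_hour_cost=1.0, cost_ratio=3.0))  # (480.0, 252.0)
```

Here the owned servers run at only 84 / 480 = 17.5% average utilization, so the cloud wins despite costing three times as much per server-hour.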

That being said, have a look at Theo Schlossnagle’s graphic of Internet traffic spikes:

Is there more to say? Comments are welcome.
