We regularly come across companies where the IT team assesses the cloud providers and comes to the conclusion that owning their own hardware is cheaper than AWS. It’s possible this is true, but only very rarely, and in our opinion only when a company is already operating at Facebook or Google scale.
Let’s look at this post from Deep Value, which states that EC2 is 380% more expensive than an internal cluster.
In the post, Paul Haefele, Managing Director for Technology, sets out to argue scientifically, calculating the cost of his own cluster versus the same cluster in AWS, and his result is stunning: EC2 is 380% more expensive!
A figure of 380% is so high that we immediately question its accuracy. We have seen again and again that cloud infrastructure saves significant money for technology companies of many shapes and sizes. So how can an experienced technology professional arrive at such a disheartening comparison, and why are so many companies nonetheless making the move? Let’s look at some factors in detail.
Paul starts by comparing the cost of a 3TB hard disk with the cost of the same amount of storage in AWS S3, and of course he finds that AWS is more expensive. Beyond the fact that in S3 you pay only for the storage you actually use, he seems to forget that S3 is not just a hard disk: it is a scalable storage service with eleven 9s of durability. If he wants to store more than 3TB, he can’t simply add another disk and be done with it. He will need a distributed file system, additional servers, and the expertise to build such a system and keep it within its uptime and durability targets. In fact, S3 only achieves eleven 9s of durability by spreading objects across more than one data centre.
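To make the shape of that comparison concrete, here is a back-of-envelope sketch. Every price below is an illustrative placeholder, not a figure from Paul’s post and not a current AWS list price; the point is what the raw-disk number leaves out.

```python
# Back-of-envelope sketch with made-up prices (not Paul's figures,
# not current AWS list prices).
DISK_PRICE = 150.0            # hypothetical cost of a 3 TB drive, USD
DISK_LIFETIME_MONTHS = 36     # assume the drive is amortised over 3 years
S3_PRICE_PER_GB_MONTH = 0.03  # hypothetical S3 standard storage rate, USD

capacity_gb = 3 * 1024

disk_monthly = DISK_PRICE / DISK_LIFETIME_MONTHS
s3_monthly = capacity_gb * S3_PRICE_PER_GB_MONTH

print(f"Raw disk: ${disk_monthly:6.2f}/month for {capacity_gb} GB of raw capacity")
print(f"S3:       ${s3_monthly:6.2f}/month, but only for the GB actually stored")
# The raw-disk figure excludes the server it sits in, replication to a
# second location, and the people needed to keep it all running, which is
# exactly what the paragraph above is pointing out.
```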
To be fair, he goes on to describe how he builds a Hadoop cluster with 20 servers and 8 hard disks each, but he still doesn’t include the cost of putting this cluster together. Nor does he consider the cost of maintaining it: hard disks have an average annual failure rate of 4-6% and servers of 2-4%, so with 20 servers and 160 hard disks that works out to roughly one hardware incident every five to eight weeks on average. Having set up several Hadoop clusters ourselves from scratch, if we were doing it again we would definitely use AWS Elastic MapReduce (EMR) and S3, both from a cost perspective and an ease-of-management one.
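The incident estimate follows directly from the failure rates quoted above; here is the quick calculation.

```python
# Expected hardware incidents per year for the cluster described above,
# using the annual failure-rate ranges quoted in the text.
servers = 20
disks = 20 * 8  # 160 disks in total

for disk_afr, server_afr in [(0.04, 0.02), (0.06, 0.04)]:
    incidents_per_year = disks * disk_afr + servers * server_afr
    weeks_between = 52 / incidents_per_year
    print(f"disk AFR {disk_afr:.0%}, server AFR {server_afr:.0%}: "
          f"{incidents_per_year:.1f} incidents/year, "
          f"one roughly every {weeks_between:.0f} weeks")
```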
One point Paul neglects completely is the durability of S3. As mentioned above, S3 offers eleven 9s of durability, while the 20 servers Paul puts together sit in a single data centre. To get beyond roughly four 9s of durability, the data needs to live in two separate locations. Here we have to speculate: do Paul and his company actually need higher durability, or are they fine with an outage (and possible data loss) if their data centre goes down? If they do need it, his costs would clearly rise, because he would have to run the same infrastructure in a second DC. The other option would be to use S3 with reduced redundancy storage, which is around 20% cheaper.
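A very rough sketch of why a second, independent location matters so much for durability: if each site loses a given piece of data with some small annual probability, keeping an independent copy in a second site roughly squares that probability. This assumes complete independence between sites, which is optimistic, and it is not how AWS actually models S3’s eleven 9s; it just illustrates the basic idea. The loss probability below is a hypothetical placeholder.

```python
# Rough durability sketch under an independence assumption (not AWS's
# actual durability model).
annual_loss_single_site = 1e-4  # hypothetical: four 9s of durability

annual_loss_two_sites = annual_loss_single_site ** 2  # both copies lost
print(f"one site:  durability {1 - annual_loss_single_site:.4%}")
print(f"two sites: durability {1 - annual_loss_two_sites:.8%}")
```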
The biggest issue in Paul’s calculation is utilisation. He assumes his cluster of 20 servers runs at 100% utilisation; for storage, that means his 160TB of usable capacity would be completely full all the time. That simply isn’t feasible. In S3, if you add 1TB you pay for exactly that: 1TB. Your data volume will also fluctuate over time, and in S3 you pay only for the gigabyte-hours you actually use. We would imagine that if Deep Value is processing stock trades, then once the raw data has been analysed it can be deleted or archived, and that is far harder to do efficiently in a fixed Hadoop cluster. In short, any spare storage space sitting idle in the physical cluster simply would not be paid for in S3.
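Here is a small illustration of what paying only for stored data means in practice. The monthly usage profile and the per-GB rate are made up; only the structure of the comparison matters.

```python
# Illustrative only: fixed 160 TB of provisioned capacity vs paying for
# the data actually stored each month. All numbers are hypothetical.
S3_PRICE_PER_GB_MONTH = 0.03                 # hypothetical rate, USD
provisioned_tb = 160                         # the fixed Hadoop cluster
monthly_usage_tb = [40, 55, 70, 60, 45, 80]  # hypothetical data actually held

fixed_equivalent = len(monthly_usage_tb) * provisioned_tb * 1024 * S3_PRICE_PER_GB_MONTH
pay_per_use = sum(tb * 1024 * S3_PRICE_PER_GB_MONTH for tb in monthly_usage_tb)

print(f"paying for all 160 TB every month:    ${fixed_equivalent:,.0f}")
print(f"paying only for data actually stored: ${pay_per_use:,.0f}")
```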
For compute utilisation he gives us a bit more information, though it still leaves a lot to speculation. He mentions cluster jobs that “typically take 1-2 hours to run”. The beauty of EC2 is that you pay for compute by the hour, so resources can be scaled up and down with load. The question in our minds is how often these 1-2 hour runs actually happen. Even if Deep Value runs them very regularly, we simply don’t believe the application is running jobs 24 hours a day, 365 days a year. We constantly hear customers say “we are different, our load is pretty flat, cloud doesn’t make sense for us”, and when we look under the covers it rarely is. Having done some work for an algorithmic trading house, we know that historical experiments are run on an ad-hoc basis; the load is quite varied, and we’d be surprised if Deep Value is using its cluster at more than 30% utilisation.
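To show why utilisation dominates the compute comparison, here is a simple sketch. The instance count, hourly rate and job schedule are all hypothetical placeholders, not figures from Paul’s post.

```python
# Illustrative EC2 utilisation comparison with hypothetical numbers.
HOURLY_RATE = 0.50   # hypothetical on-demand price per instance-hour, USD
instances = 100      # hypothetical cluster size

always_on_hours = 24 * 365
on_demand_hours = 2 * 3 * 250  # e.g. three 2-hour runs per business day

always_on_cost = instances * always_on_hours * HOURLY_RATE
on_demand_cost = instances * on_demand_hours * HOURLY_RATE

print(f"cluster running 24/7:           ${always_on_cost:,.0f}/year")
print(f"instances only during job runs: ${on_demand_cost:,.0f}/year "
      f"({on_demand_hours / always_on_hours:.0%} utilisation)")
```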
It’s also important to note that Deep Value isn’t a static operation. What happens when the company needs more compute power? It can hardly add one server at a time; that is inefficient both technically and operationally, so in practice they will be buying servers in blocks of 20 or more. We also question why Deep Value runs clusters of 900 instances on EC2 but is only buying a cluster equivalent to 115 extra-large instances. Again, from experience with algorithmic trading applications, analysing data faster means the staff looking at the results can work more efficiently and inaccuracies are picked up sooner. There will also be times when Deep Value decides to run more, or fewer, experiments. EC2 provides the flexibility for this; a fixed Hadoop cluster does not.
It’s difficult to assess the nature of Paul’s application from the outside, so in this analysis we make some fairly heavy-handed assumptions. We feel comfortable doing so having worked with similar clients in the past, and at the very least we can point out some clear inaccuracies. Our assumptions:
- Reduced durability is enough (otherwise AWS comes out easily cheaper than running a fully mirrored second DC).
- Storage utilisation is 60%. This is about as high as storage utilisation gets in the industry; running any fuller should trigger provisioning of more equipment.
- Compute (EC2) utilisation is 8 hours per business day (a big assumption, but in our experience a very generous estimate).
- $1,000 per week covers both people and actual hardware maintenance. This includes all the extra time spent on monitoring, dashboards, and coordinating staff to swap hardware components, none of which would be necessary in the cloud.
[Cost comparison table: “Build your own” vs. AWS, including a row for maintenance and fixing of hardware failures (included in the storage costs on the AWS side); the original figures are not preserved.]
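As a minimal sketch of how the assumptions above feed into such a comparison, the following puts them into one calculation. Every price is a placeholder, not a figure from Paul’s post and not a current AWS list price; only the structure of the comparison is the point.

```python
# Minimal sketch of the comparison under the assumptions listed above.
# All prices are hypothetical placeholders.
WEEKS_PER_YEAR = 52

# Build-your-own side (hypothetical figures).
hardware_capex = 120_000       # servers + disks, amortised below
amortisation_years = 3
maintenance_per_week = 1_000   # people + spares, as assumed above

own_yearly = hardware_capex / amortisation_years + maintenance_per_week * WEEKS_PER_YEAR

# AWS side (hypothetical figures).
ec2_hourly_rate = 0.50         # per instance-hour
instances = 20
ec2_hours = 8 * 250            # 8 h per business day, as assumed above
s3_price_per_gb_month = 0.03
stored_tb = 96                 # 60% of 160 TB usable, as assumed above

aws_yearly = (instances * ec2_hours * ec2_hourly_rate
              + stored_tb * 1024 * s3_price_per_gb_month * 12)

print(f"build your own: ~${own_yearly:,.0f}/year")
print(f"AWS:            ~${aws_yearly:,.0f}/year")
```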
And this is without further optimisations such as the use of spot instances. Nor does it take into account the work needed to set up the cluster in the first place: sourcing and purchasing decisions, management involvement, and physical installation. It also ignores the spike in investment when another 20 (or more) servers need to be purchased. In Paul’s architecture there is no disaster recovery beyond a single DC; achieving that level of fault tolerance carries a very real added cost, and not providing it carries a real risk. An AWS installation gives you this out of the box, from both a storage and a compute point of view.
We had to make a few assumptions, so we can’t be 100% sure about the real numbers, but the fact remains that building this cluster and installing Hadoop on it was a significant effort for Paul and his team. That time would have been better spent on Deep Value’s core competency of analysing stock data, while leveraging AWS’s world-leading competency in running warehouse-scale compute infrastructure.
Then there is the time Paul has to spend as Managing Director reviewing what his team does: whether they are monitoring storage utilisation, and purchasing and provisioning new hardware before they hit capacity. Getting quotes from suppliers, buying the hardware and then installing it all takes time. Management time is hard to measure, but it trickles away and pulls focus from what a manager should actually be concentrating on. Deep Value’s business is making software that analyses stock trading data and executes trades better than anyone else.
To us this looks like a great application for AWS and a way for Deep Value to save both money and time. Perhaps most importantly, using AWS would reduce the complexity of Deep Value’s IT operation.