My major client over the past year came to me with a storage problem. They had about 1 petabyte of image data to store and were in the process of building their own distributed object store to run on their own hardware. The company was going through a management transition and felt that an outside perspective was needed to evaluate alternatives to building the storage solution in house. The workload combines heavy reads against millions of raw image files with heavy writes to Tokyo Cabinet datastores used for web image storage. The Tokyo Cabinet databases across some 400 “surveys” held approximately 3 billion objects. We looked into three alternatives:
- Build an in-house object store and run it on existing and newly purchased hardware
- Set up a GlusterFS cluster on existing and newly purchased hardware
- Evaluate a cloud provider for storage, processing and web serving, and design the system around it
Without going into the full project details, we decided that a cloud provider, namely AWS, using EC2 (Elastic Compute Cloud) for processing and S3 (Simple Storage Service) for storage, would fulfill the technical requirements, be cost competitive, improve business agility and allow the company to expand globally. Because of the nature of the business operation and the key unique selling points of their product, the cloud option was only viable if all of the infrastructure moved into the cloud. Below I go over some of the cost estimates we made on that basis.
We did extensive cost modeling across the three alternatives. The graphs below show some of the aggregate numbers and cost breakdowns.
Figure 1 shows the aggregate cost comparison for the web serving and processing infrastructure.
Figure 2 shows the cost comparison for the web component infrastructure.
Figure 3 shows the cost comparison for the cluster image processing infrastructure.
Figure 4 shows the cost breakdown by cost category for the web infrastructure.
Figure 5 shows the cost breakdown of the web component on AWS.
Figure 6 shows the colocation and owned-hardware cost breakdown for running the cluster processing infrastructure.
Figure 7 shows the cost breakdown of the cluster compute infrastructure on AWS.
To keep our cost analysis conservative, we took a worst-case view of AWS and a best-case view of colocation and owned hardware. The analysis involved many assumptions, and costs were based on 2011 prices for equipment and AWS. We assumed a 10% cost reduction in both AWS prices and hardware prices. In practice the reduction may diminish over time, but for the purposes of our analysis a straight-line 10% was sufficient.
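To make the price-reduction assumption concrete, here is a minimal sketch of how a straight-line 10% reduction might be applied to a multi-year cost projection. The `CostLine` type, the function names and the cost figures are hypothetical and are not taken from the original model. "Straight line" is read here as a fixed cut of 10% of the base price each year; a compounding 10% cut on the previous year's price is included for comparison, since the text does not say which reading was used.

```python
from dataclasses import dataclass


@dataclass
class CostLine:
    """One hypothetical cost item (e.g. storage, compute) in a projection."""
    name: str
    base_annual_cost: float  # year-0 cost; placeholder figures, not real data


def straight_line_price(base: float, year: int, rate: float = 0.10) -> float:
    """Assumed straight-line model: the price falls by rate * base every year.

    The result is floored at zero so long horizons cannot go negative.
    """
    return max(0.0, base * (1.0 - rate * year))


def compounded_price(base: float, year: int, rate: float = 0.10) -> float:
    """Alternative model: each year's price is rate lower than the previous year's."""
    return base * (1.0 - rate) ** year


def projected_total(lines, years: int, rate: float = 0.10,
                    price_fn=straight_line_price) -> float:
    """Sum every cost line over years 0..years-1 using the chosen price model."""
    return sum(
        price_fn(line.base_annual_cost, year, rate)
        for line in lines
        for year in range(years)
    )


if __name__ == "__main__":
    lines = [CostLine("storage", 100.0), CostLine("compute", 50.0)]
    print(projected_total(lines, 3))
    print(projected_total(lines, 3, price_fn=compounded_price))
```

With these two placeholder lines over three years, the straight-line model totals 405 and the compounding model 406.5 (up to floating-point rounding). The gap grows with the horizon, so a cost model should state which reading of "10% per year" it uses.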
While AWS was cost competitive, the real advantage was a strategic business one centered on flexibility and agility down the track.