Category Archives: amazon web services

Deep Value claims AWS is 380% more expensive – Can this be true?

We regularly come across companies where the IT team assesses the cloud providers and comes to the conclusion that owning your own hardware is cheaper than AWS. It’s possible that this is true, but only very rarely, and in our opinion only when a company is already running at Facebook or Google scale.

Let’s look at this post from Deep Value stating that EC2 is 380% more expensive than an internal cluster.

In this blog post, Paul Haefele, the Managing Director for Technology, sets out to argue scientifically: he calculates the cost of his own cluster against the same cluster in AWS, and his results are stunning. EC2 is 380% more expensive!

380% more expensive is so high that, given our experience, we have to wonder about the accuracy. We have seen again and again that cloud infrastructure usually saves significant money for technology companies of many shapes and sizes. So how can an experienced technology professional arrive at such a disheartening comparison, and why are so many companies making the move anyway? Let’s look at some factors in detail.

Storage costs


Paul starts by comparing the cost of a 3TB hard disk with the cost of the same amount of storage in AWS S3, and of course he finds that AWS is more expensive. Besides the advantage that you only pay for the storage you actually use, he seems to forget that S3 is not just a hard disk: it’s a scalable storage solution with a durability of eleven 9s! If he wants to store more than 3TB, he can’t just add another hard disk and be done. He will need to install a distributed file system, add servers, gather the expertise to build such a system, and ensure its uptime and durability SLA. In fact, S3 can *only* achieve eleven 9s durability by spreading objects over more than one data centre.

To be fair, he goes on to describe how he builds a Hadoop cluster with 20 servers of 8 hard disks each, but he still doesn’t include the cost of putting this cluster together. Nor does he consider the cost of maintaining it: hard disks have an average failure rate of 4-6% and servers an average failure rate of 2-4%. If those are annual rates, 20 servers and 160 hard disks produce on average roughly one hardware incident every five to eight weeks! Having set up several Hadoop clusters from scratch ourselves, if we were doing it again we would definitely choose AWS Elastic MapReduce (EMR) and S3, both from a cost perspective and for ease of management.
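The incident arithmetic can be checked with a short sketch. It assumes the quoted failure rates are annual and that failures are independent; the function names are ours, for illustration only.

```python
# Expected hardware incidents for a cluster of 160 disks and 20 servers.
# Assumption: the 4-6% (disk) and 2-4% (server) figures are annual failure rates.

WEEKS_PER_YEAR = 52


def expected_failures_per_year(units: int, annual_failure_rate: float) -> float:
    """Expected number of failed units per year in a fleet of identical units."""
    return units * annual_failure_rate


def weeks_between_incidents(incidents_per_year: float) -> float:
    """Average gap between incidents, in weeks."""
    return WEEKS_PER_YEAR / incidents_per_year


DISKS, SERVERS = 160, 20

# Optimistic end of both ranges, then pessimistic end of both ranges.
low = expected_failures_per_year(DISKS, 0.04) + expected_failures_per_year(SERVERS, 0.02)
high = expected_failures_per_year(DISKS, 0.06) + expected_failures_per_year(SERVERS, 0.04)

if __name__ == "__main__":
    print(f"incidents per year: {low:.1f} to {high:.1f}")
    print(f"weeks between incidents: {weeks_between_incidents(high):.1f} "
          f"to {weeks_between_incidents(low):.1f}")
```

Under these assumptions the cluster sees between about 7 and 10 hardware incidents a year, each of which needs someone to notice it, order a part and swap it.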

Durability

One point Paul neglects completely is the durability of S3. As mentioned above, S3 offers eleven 9s of durability. The 20 servers Paul puts together are in one single data centre. We know that to get to more than four 9s of durability we need to run our servers in two separate locations. Here we have to speculate: do Paul and his company need higher durability, or are they fine with an outage if their DC is down? If they need higher durability, his costs obviously increase, as he would need to run the same infrastructure in a second DC. The other option would be to use S3 with reduced redundancy storage, which is 20% cheaper.
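To put eleven 9s in perspective, here is a rough sketch of the arithmetic. It assumes the durability figure is an annual per-object probability and that object losses are independent; the object count is illustrative and not taken from Deep Value's post.

```python
from decimal import Decimal

# Eleven 9s of durability, read as an annual per-object survival probability.
DURABILITY = Decimal("0.99999999999")
ANNUAL_LOSS_PROBABILITY = Decimal(1) - DURABILITY  # 1E-11


def expected_objects_lost_per_year(object_count: int) -> Decimal:
    """Expected number of objects lost per year, assuming independent losses."""
    return object_count * ANNUAL_LOSS_PROBABILITY


def years_per_lost_object(object_count: int) -> Decimal:
    """Average number of years between single-object losses."""
    return Decimal(1) / expected_objects_lost_per_year(object_count)


if __name__ == "__main__":
    # Illustrative: ten million stored objects.
    print(years_per_lost_object(10_000_000))
```

With ten million objects, this works out to an expected loss of a single object once every 10,000 years, which is the kind of figure a single-site cluster cannot match.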

Utilisation

This is the biggest issue in Paul’s calculation. He assumes his cluster of 20 servers runs at 100% utilisation. With regard to storage, for example, he assumes that his 160TB of usable storage would be 100% full all the time. This is simply not feasible. In S3, if you add 1TB, you pay for just that 1TB. Your data storage may also fluctuate over time, and in S3 you only pay for the GB-hours you use. I would very much imagine that if Deep Value is processing stock trades, then once the raw data is analysed it can be deleted or “archived”. He can’t do that nearly as efficiently in his Hadoop cluster. Basically, any spare storage space in the physical Hadoop cluster would not be paid for in S3.

For the utilisation of compute power he gives us a bit more information, though it still leaves a lot to speculation. He speaks about cluster jobs that “typically take 1-2 hours to run”. The beauty of EC2 is that we pay for compute power at an hourly rate and can therefore scale resources up and down based on load. The question in our minds is how often these 1-2 hour runs are done. Even if Deep Value is running these tasks very regularly, I simply don’t believe that their application is going to be running jobs 24 hours a day, 365 days a year. We constantly hear customers saying “we are different, our load is pretty flat, cloud doesn’t make sense for us”. When we look under the covers, the load isn’t flat. Having done some work for an algorithmic trading house, I know that they perform their historical experiments on an ad-hoc basis. The load is quite varied, and I’d be surprised if Deep Value are using their cluster at more than 30% utilisation.

Elasticity

It’s also important to note that Deep Value isn’t a static operation. What happens when the company needs more compute power? It can hardly just add one server at a time; that is not efficient from a technical standpoint or an operational one. Generally they’ll be buying servers in blocks of 20 or more. Also, I question why Deep Value are running clusters of 900 instances but only buying a cluster equivalent to 115 extra large instances. Again from experience with algorithmic trading applications, analysing data faster means the staff looking at the results can operate more efficiently and inaccuracies are picked up quicker. Furthermore, there will be times when Deep Value decides to run more or fewer experiments. EC2 provides flexibility for this. A fixed Hadoop cluster will not.

Recalculation

It’s difficult from the outside to assess the nature of Paul’s application, so in this analysis we are making some fairly heavy-handed assumptions. We feel comfortable doing so, having worked with similar clients in the past, and at the very least we can point out some clear inaccuracies.

Assumptions:

  • Reduced durability is enough (otherwise AWS is easily cheaper than running a fully mirrored DC).
  • Utilisation for storage is at 60%. This is usually the highest storage utilisation seen in the industry, and running at a greater utilisation should trigger provisioning more equipment.
  • Utilisation for compute power (EC2) is 8h per business day. (This is a big assumption but, drawing on our own experience, a very generous utilisation estimate.)
  • $1,000 per week for both human resources and actual hardware maintenance. This includes all the extra time spent looking at monitoring and dashboards, managing employees to change hardware components, etc., which wouldn’t be necessary in the cloud.
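As a rough sketch, the two utilisation assumptions translate into the following numbers. The 160TB capacity is the usable storage mentioned earlier; the five-day week and 52-week year are our simplifications, and public holidays are ignored.

```python
# Turn the utilisation assumptions above into concrete quantities.

HOURS_PER_YEAR = 365 * 24  # 8,760 hours


def compute_utilisation(hours_per_business_day: float,
                        business_days_per_week: int = 5,
                        weeks_per_year: int = 52) -> float:
    """Fraction of the year during which compute is actually in use."""
    used_hours = hours_per_business_day * business_days_per_week * weeks_per_year
    return used_hours / HOURS_PER_YEAR


def used_storage_tb(capacity_tb: float, utilisation: float) -> float:
    """Storage actually occupied, which is what a pay-per-use service bills for."""
    return capacity_tb * utilisation


if __name__ == "__main__":
    print(f"compute in use {compute_utilisation(8):.1%} of the year")
    print(f"storage in use: {used_storage_tb(160, 0.60):.0f} TB of 160 TB")
```

Eight hours per business day comes to 2,080 hours a year, or a little under 24% of the hours an owned cluster is powered on and paid for.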

                                        Build your own                AWS

Storage                                 $12,700.00                    $5,926.91
Compute Costs                           (included in storage costs)   $11,229.61
Maintenance and fixing of HW failures   $4,345.00                     -
Sum                                     $17,045.00                    $17,156.52

This is without further optimisations such as the use of spot instances. It also does not take into account the work needed to set up this cluster, including sourcing and purchasing decisions, management involvement, and physical installation. It doesn’t account for the spike in investment when another 20 (or more) servers need to be purchased. In Paul’s architecture, there is no physical DR outside of a single DC. There is most definitely an added cost to achieve that level of fault tolerance, and a risk in not providing it. This is achievable in an AWS installation out of the box, from both a storage and a compute point of view.
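For readers who want to re-check the recalculation, here is a minimal sketch that recomputes the two totals from the line items in the table. The variable names are ours. The final check is an inference rather than something the table states: the maintenance figure matches one month at $1,000 per week, which suggests the table shows monthly costs.

```python
from decimal import Decimal

# Line items copied from the comparison table.
build_your_own = {
    "storage_and_compute": Decimal("12700.00"),
    "maintenance": Decimal("4345.00"),
}
aws = {
    "storage": Decimal("5926.91"),
    "compute": Decimal("11229.61"),
}

build_total = sum(build_your_own.values(), Decimal("0"))
aws_total = sum(aws.values(), Decimal("0"))

# How much more expensive AWS is than the owned cluster, in percent.
aws_premium_percent = (aws_total - build_total) / build_total * 100

# Inference: $1,000 per week over an average month (365 / 7 / 12 weeks).
monthly_maintenance = round(1000 * 365 / 7 / 12)

if __name__ == "__main__":
    print(f"build your own: ${build_total}")
    print(f"AWS:            ${aws_total}")
    print(f"AWS premium:    {aws_premium_percent:.2f}%")
```

Under the stated assumptions the two options come out within about 1% of each other, a long way from 380%.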

Conclusion

We had to make a few assumptions and therefore we can’t be 100% sure about the real numbers, but the fact is that building this cluster and installing Hadoop on it was quite some effort for Paul and his team. This time would have been better spent focusing on Deep Value’s core competencies in analysing stock data, while leveraging AWS’s world leading competency running warehouse scale compute infrastructure.

Then there’s the time Paul has to spend, as a Managing Director at this company, reviewing what his team does: whether they monitor the storage utilisation, and whether they purchase and provision new hardware when they hit capacity. Getting quotes from different companies, buying the hardware and then installing it all takes time. Management time is hard to measure, but it trickles along and takes focus away from the tasks a manager should be concentrating on. Deep Value makes software that analyses stock trading data and executes trades better than anyone else can do it.

For us this looks like a great application for AWS and a way to save money and time for Deep Value. Perhaps most importantly, using AWS will reduce the complexity of Deep Value’s IT operation.

Filed under amazon web services, cloud computing, technology

Project payback

Possibly the most important question for every CIO or technology leader in considering a move to cloud infrastructure is “when will this migration start saving me money in absolute terms”. While Amazon Web Services and others provide calculators to assess the potential savings for infrastructure costs run in their respective clouds, in all the projects we’ve worked on here at Cloud Advantage, we’ve never really found these tools of any real benefit except as guides and verifications on the more complex financial models we so love to build. The questions that must be answered are:

  • When will my investment in a cloud migration project start paying dividends?
  • How much is the initial additional cost of migration?
  • What architectural changes do I need to make?
  • When will I have paid for the migration and start saving real money in my IT operation?
  • I believe cloud could be cheaper, but prove it to me

These are the questions we would like to answer for businesses everywhere, and we’ve had some experience. Recently we’ve been working on a major review of IT infrastructure costs at a client whose annual IT spend is around $AUD9m. This represents about 11% of total annual revenue, which means the company’s expenditure on IT is slightly above the industry standard. A lot of our time is always spent fact gathering and checking, but the interesting work starts when we plug figures into some of our financial and project models. I know I’ve written about this in the past, but in this post I wanted to share an interesting output from one of these models. This diagram shows three key points on a project timeline.

Firstly, it shows that when migration starts, there are some additional costs: for example, additional key talent and the added cost of migrating infrastructure onto cloud while keeping the old system running. At some point these additional costs start to drop off due to reduced headcount and the ending of certain contracts with existing providers. At some point the cloud infrastructure becomes cheaper than the existing operating model, and then, with some time, the cost savings pay back the project costs and the depreciation costs of the existing hardware, depending on how old it is. We have seen this time and time again, and it’s really the crux of what your decision making process should be based on. It’s important to note that the chart on the left is for a client who in 2011 purchased approximately $AUD800,00 in new hardware and then $AUD250,000 in late 2012. Even with the high depreciation cost going forward, the cloud model is still compelling. There is a lot to say about this finding and I’m sure a lot of questions are being raised in the reader’s mind right now.
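The shape of that timeline can be sketched in a few lines. Every number below is hypothetical and chosen only to show the three points (extra cost during migration, crossover, payback); none of it is the client's data, and the function names are ours.

```python
from itertools import accumulate
from typing import Optional, Sequence


def crossover_month(existing: Sequence[float], cloud: Sequence[float]) -> Optional[int]:
    """First month (1-based) in which the cloud path costs less than the existing one."""
    for month, (old, new) in enumerate(zip(existing, cloud), start=1):
        if new < old:
            return month
    return None


def payback_month(existing: Sequence[float], cloud: Sequence[float]) -> Optional[int]:
    """First month (1-based) in which cumulative savings have repaid the migration overspend."""
    cumulative_savings = accumulate(old - new for old, new in zip(existing, cloud))
    for month, saved in enumerate(cumulative_savings, start=1):
        if saved >= 0:
            return month
    return None


# Hypothetical monthly costs in thousands of dollars over two years.
existing_costs = [100] * 24
# Migration overlap first, then costs fall as contracts end and headcount drops.
cloud_costs = [130] * 3 + [120, 110, 100, 90, 80] + [70] * 16

if __name__ == "__main__":
    print("cloud becomes cheaper in month", crossover_month(existing_costs, cloud_costs))
    print("migration is paid back in month", payback_month(existing_costs, cloud_costs))
```

In a real model the existing-cost series would also carry the depreciation of recently purchased hardware, which pushes the payback point later without changing the shape of the curve.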

One of the points made at just about every client we work with is “our workload is different, it doesn’t suit cloud because it’s very steady, constant, flat”. OK, great, let’s take a look at the data then. Inevitably, when we look at the utilisation data, the utilisation patterns across the fleet of servers clearly show most if not all of the following. (These examples of variability are expanded upon very well in this Microsoft Report.)

  • Time of day variability (for example heavy load during drive time owing to heavy radio listening)
  • Consumer load variability (the number of customers visiting a website isn’t predictable or consistent)
  • Multiple resource variability (some resources need high IO, others are CPU intensive and so forth)
  • Industry specific variability (some online retailers see most of their traffic during Christmas)

Which leads me to the chart below on the left. This is another common output of our analysis: by far the vast majority of guests/servers have CPU utilisation of less than 10%. Having said this, it’s important to note that in a virtualised environment the actual hardware CPU utilisation is significantly better than this. Nonetheless, it’s still not possible to size the underlying hardware up and down on demand, and that’s essentially where the benefit of cloud lies.

The final point I’d like to make here is that cloud provides the right tools for the right job. Sometimes this means tools that in the past were prohibitively expensive to all but the largest of technology companies. For example, few companies have the ability to spend the R&D dollars on building a distributed, fault tolerant and massively scalable object store like S3. Or take for example a database with high IOPS requirements. We often see a database like this consuming 95% of the IOPS on a SAN grade storage array while the rest of the VMs are doing very little. This may seem unimportant except that the storage array costs something like $100,000.

It’s an exciting time in our industry, and the future I believe is even more exciting with the variety of offerings for technology companies growing rapidly. I’m very enthusiastic about what all this means for Australian businesses.

Filed under amazon web services, cloud computing

Can Cloud Infrastructure save mid-size companies money?

One of the great things about cloud computing is that it allows access to capabilities that were once only accessible to a large company with a large budget. This makes the cloud a no-brainer for start-ups creating their infrastructure from scratch, but what about the larger companies that have already invested large sums of money in their own infrastructure? Can they save money, and is it wise to take a bearish approach to cloud?

We often have mid-size companies asking us to find the answer to this question for them. We can say: “Yes, it’s possible!”

When we come to a company and start a 2-week analysis to build a business case for a cloud computing model, we often see something like this: The CEO, or another less technical business leader engages us based on a desire to see that his or her company isn’t missing an opportunity or spending too much on IT. Kudos to them!  The IT leader typically says: “Yes, cloud is interesting and we looked already at it, but we are a special case, cloud computing isn’t right for us. We calculated it already and it’s far too expensive.”

This last sentence is where the problem begins. An IT team that spends many hours keeping a traditional system running makes a quick and sometimes half-hearted attempt to put the costs for cloud computing together, but follows a really traditional architecture, probably the same one they currently have. We see this all the time, and often they’re just off track. Here are some common mistakes.

  • Pricing the storage based on the same amount of excess capacity the existing system has. This averages around 40% excess, but is often much worse.
  • Pricing the compute capacity on the capacity required only at peak time.
  • Neglecting the amount of maintenance the existing hardware or virtual environment takes.
  • Forgetting that the existing hardware doesn’t run forever (it sounds trivial, but we have quite often seen that hardware refreshes for the existing system were not considered).

The average utilisation of traditional systems is hardly ever above 60%; studies point rather to the 30% mark. This spare capacity represents inflated infrastructure cost. It’s also important to note the cost of sourcing, securing funding for, negotiating terms on and ordering the hardware, and the cost of hosting, configuring and building the kit, with its inherent roadblocks and challenges. There’s also the cost of maintenance, power, and numerous licensing costs and the like. Another big driver is often the required WAN bandwidth. Buying fixed ISP bandwidth often means waste, and this is often a big win with many cloud providers.
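The peak-pricing mistake is easy to see in a small sketch. The hourly demand profile below is hypothetical (servers needed in each hour of one day) and the function names are ours; the point is only the gap between paying for the peak all day and paying for what each hour needs.

```python
from typing import Sequence


def peak_provisioned_server_hours(demand: Sequence[int]) -> int:
    """Server-hours paid for when capacity is sized for the busiest hour."""
    return max(demand) * len(demand)


def pay_per_use_server_hours(demand: Sequence[int]) -> int:
    """Server-hours paid for when capacity follows demand hour by hour."""
    return sum(demand)


def utilisation(demand: Sequence[int]) -> float:
    """Share of the peak-sized capacity that is actually used."""
    return pay_per_use_server_hours(demand) / peak_provisioned_server_hours(demand)


# Hypothetical day: quiet night, a four-hour peak, a moderate afternoon, quiet evening.
hourly_demand = [2] * 8 + [10] * 4 + [6] * 8 + [2] * 4

if __name__ == "__main__":
    print("sized for peak:", peak_provisioned_server_hours(hourly_demand), "server-hours")
    print("pay per use:   ", pay_per_use_server_hours(hourly_demand), "server-hours")
    print(f"utilisation of peak-sized kit: {utilisation(hourly_demand):.0%}")
```

A cloud estimate that simply multiplies the peak server count by 24 hours reproduces the waste of the existing system instead of removing it.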

When we take a deep dive into the technical side of an IT operation, we regularly see that the larger and more complex the infrastructure is, the greater the savings in the cloud can be.

Obviously there are other key drivers for a cloud move like business flexibility and reduced IT complexity, but cost will always be an important question.  Cloud Advantage is currently helping companies to realize 40-70% savings on their IT projects by leveraging AWS. If you would like us to run a company-specific scenario for you, please contact me at +61 2 8003 5048 or askwil@cloudadvantage.com.au.

Filed under amazon web services, cloud computing, strategy

New AWS Command Line tool and jq

AWS et al have released a new version of their command line tools, most appropriately called awscli. It’s built on python boto and for example can be used as such:

aws ec2 describe-instances

I don’t want to rewrite how you set it up; the best place to start is the GitHub page: https://github.com/aws/aws-cli. You’ll need your AWS access and secret keys, a working environment (I run on OS X), and obviously Python and the associated required packages. I like to work with virtualenv and virtualenvwrapper and here’s a good post on those tools.

What’s interesting about this tool is firstly that it’s built on boto, and this is good for the boto community and Python programmers like me. Actually it’s using the botocore project, which is a new version of the lower level boto functionality. What’s interesting to me personally is that botocore is using the requests package from Kenneth Reitz, which I believe I suggested to Mitch Garnaat a while back (it’s a big *if* whether I was the cause, but I like to think so ;-).

What’s interesting for this post is that it outputs the results in JSON. This brings me to my next interesting tool, jq. The idea of jq is that it’s like writing SQL for JSON. jq is a standalone tool; if you use virtualenv and virtualenvwrapper you can just put your jq binary in $VIRTUAL_ENV_HOME/my_new_virtual_env/bin.

Here you can do some pretty cool stuff which allows you to summarise your EC2 (and other aws) infrastructure. For example, now I can use awscli to do things like:


aws ec2 describe-instances | jq '.reservationSet[].instancesSet[].keyName'
"mykey1"
"mykey2"
"etc"

Above I’m asking for the keys associated with my entire list of instances. With this command below I can just grab the instance ids with their public dns values.

aws ec2 describe-instances | jq '.reservationSet[].instancesSet[] | {dnsName, instanceId}'

{
 "instanceId": "i-98e793e6",
 "dnsName": "ec2-23-23-57-193.compute-1.amazonaws.com"
}
{
 "instanceId": "i-97ed79e6",
 "dnsName": "ec2-23-21-75-169.compute-1.amazonaws.com"
}

You can see that using this approach becomes fairly powerful and the more one gets used to jq and aws commands, the more powerful it becomes. It’d be interesting to see some more examples of how this can work.
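For comparison, here is a minimal Python sketch of the same two queries using only the standard library. The sample document mirrors the key names used in the jq filters above; the instance IDs and DNS names are the ones shown in the output, and pairing them with these key names is an assumption made for illustration. Current AWS CLI releases emit different key names (for example Reservations, Instances, KeyName, InstanceId and PublicDnsName), so adjust the lookups to match what your version returns.

```python
import json

# Sample output shaped like the JSON that the jq filters above walk through.
SAMPLE_OUTPUT = """
{
  "reservationSet": [
    {"instancesSet": [{"instanceId": "i-98e793e6", "keyName": "mykey1",
                       "dnsName": "ec2-23-23-57-193.compute-1.amazonaws.com"}]},
    {"instancesSet": [{"instanceId": "i-97ed79e6", "keyName": "mykey2",
                       "dnsName": "ec2-23-21-75-169.compute-1.amazonaws.com"}]}
  ]
}
"""


def iter_instances(doc):
    """Yield every instance, like '.reservationSet[].instancesSet[]' in jq."""
    for reservation in doc["reservationSet"]:
        yield from reservation["instancesSet"]


def key_names(doc):
    """Equivalent of appending '.keyName' to the jq path."""
    return [instance["keyName"] for instance in iter_instances(doc)]


def dns_by_instance(doc):
    """Equivalent of piping each instance into '{dnsName, instanceId}'."""
    return [
        {"dnsName": instance["dnsName"], "instanceId": instance["instanceId"]}
        for instance in iter_instances(doc)
    ]


if __name__ == "__main__":
    document = json.loads(SAMPLE_OUTPUT)
    print(key_names(document))
    print(json.dumps(dns_by_instance(document), indent=1))
```

In practice you would pipe the CLI output into the script and call json.load(sys.stdin) instead of parsing the embedded sample.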

Filed under amazon web services, technology

Cloudyscripts and copying AMIs to AWS Sydney Region

Well today I finally got to a task I’m doing for Punters Paradise. For the record, Punters Paradise is an amazing site in its user experience, but also knowing their backend architecture, they’ve achieved such good scalability for their resource usage it serves as a reminder of keeping things as purpose built as possible.

I digress. Recently, cloudyscripts informed us that they now support the Sydney region and with a little effort I have it migrating some AMIs from Singapore to Sydney.

Firstly, get the cloudyscripts git repo.

git clone git://rubyforge.org/cloudyscripts.git

Then install the gem.

sudo gem install CloudyScripts

Then take a look at the samples/sample_copy_ami.rb to see how to use copy_ami interface.

cd cloudyscripts

vi samples/sample_copy_ami.rb

Within that file you’ll need to make some changes. Namely, fill in all the required fields.

For example, you need to set these values:

aws_access_key = "MYACCESSKEY" # Your AWS access key
aws_secret_key = "MYSECRET_KEY" # Your AWS secret key

aws_source_endpoint = "ap-southeast-1.ec2.amazonaws.com"
source_ssh_user = "root"
source_ssh_key_file = "/Users/james/.ssh/myKey.pem"
source_ssh_key_name = "myKey"
aws_ami_id = "ami-xxxxxxx" # Your EC2 AMI to Copy

aws_target_endpoint = "ap-southeast-2.ec2.amazonaws.com"
aws_target_region = "ap-southeast-2.ec2.amazonaws.com"
target_ssh_user = "ec2-user"
target_ssh_key_file = "/Users/james/.ssh/mySydneyKey.pem"
target_ssh_key_name = "mySydneyKey"
new_ami_name = "My migrated ami name"
new_ami_description = "this is my migrated ami"

Most importantly, you need to ensure you have a new base AMI to use for the migration in the Sydney region. Inside the sample file, you’ll find a chunk of code that looks roughly like this. See the line I added for the Sydney region.

def self.get_basic_aws_linux_ami(region)
 map = {'us-east-1.ec2.amazonaws.com' => 'ami-23f53c4a', #'ami-09ab6d60', #'ami-8c1fece5',
 'us-west-1.ec2.amazonaws.com' => 'ami-013a6544', #'ami-17eebc52', #'ami-3bc9997e',
 'us-west-2.ec2.amazonaws.com' => 'ami-42f77a72',
 'eu-west-1.ec2.amazonaws.com' => 'ami-f3c3fe87', #'ami-940030e0', #'ami-47cefa33',
 'ap-southeast-1.ec2.amazonaws.com' => 'ami-b4f18be6', #'ami-cec9b19c', #'ami-6af08e38',
 'ap-southeast-2.ec2.amazonaws.com' => 'ami-a1b3249b', #NOTE, this is the line I ADDED
 'ap-northeast-1.ec2.amazonaws.com' => 'ami-8a07b38b', #'ami-96b50097' #'ami-300ca731'
 'sa-east-1.ec2.amazonaws.com' => 'ami-1e34eb03'
 }

Once you’ve done all that, you can basically run the script from the command line:

ruby  -rubygems samples/sample_copy_ami.rb

You’ll see some output that looks like this:


describe images: {"imagesSet"=>{"item"=>[{"rootDeviceType"=>"ebs", "description"=>"2012-11-12_MyServer", "imageLocation"=>"XXXXXXXXXXXXXX/2012-11-12_MyServer", "hypervisor"=>"xen", "architecture"=>"x86_64", "imageType"=>"machine", "isPublic"=>"false", "imageState"=>"available", "imageId"=>"ami-xxxxxx", "virtualizationType"=>"paravirtual", "blockDeviceMapping"=>{"item"=>[{"deviceName"=>"/dev/sda1", "ebs"=>{"volumeSize"=>"10", "deleteOnTermination"=>"true", "snapshotId"=>"snap-xxxxxxx"}}]}, "rootDeviceName"=>"/dev/sda1", "name"=>"2012-11-12_MyServer", "imageOwnerId"=>"123412341234"}]}, "xmlns"=>"http://ec2.amazonaws.com/doc/2011-11-01/", "requestId"=>"19db0e4a-48a2-4c83-b3a3-0ae0f86a1863"}
D, [2012-11-26T15:02:35.663348 #770] DEBUG -- : INPUT PARAM remote_command_handler is set [but not logged]
D, [2012-11-26T15:02:35.663422 #770] DEBUG -- : INPUT PARAM ec2_api_handler is set [but not logged]
D, [2012-11-26T15:02:35.663447 #770] DEBUG -- : INPUT PARAM logger is set [but not logged]
D, [2012-11-26T15:02:35.663470 #770] DEBUG -- : INPUT PARAM target_ssh_username = "ec2-user"
D, [2012-11-26T15:02:35.663501 #770] DEBUG -- : INPUT PARAM target_ec2_handler is set [but not logged]
D, [2012-11-26T15:02:35.663534 #770] DEBUG -- : INPUT PARAM description = "Migrated server"
D, [2012-11-26T15:02:35.663560 #770] DEBUG -- : INPUT PARAM target_key_name = "mySydneyKey"
D, [2012-11-26T15:02:35.663587 #770] DEBUG -- : INPUT PARAM result = {:done=>false, :failed=>false}
D, [2012-11-26T15:02:35.663610 #770] DEBUG -- : INPUT PARAM source_ssh_username = "root"
D, [2012-11-26T15:02:35.663633 #770] DEBUG -- : INPUT PARAM target_ssh_keyfile = "/Users/james/.ssh/mySydneyKey.pem"
D, [2012-11-26T15:02:35.663656 #770] DEBUG -- : INPUT PARAM source_key_name = "myKey"
D, [2012-11-26T15:02:35.663680 #770] DEBUG -- : INPUT PARAM target_ssh_keydata is set [but not logged]
D, [2012-11-26T15:02:35.663703 #770] DEBUG -- : INPUT PARAM name = "MigratedServer"
D, [2012-11-26T15:02:35.663726 #770] DEBUG -- : INPUT PARAM source_ssh_keyfile = "/Users/james/.ssh/myKey.pem"
D, [2012-11-26T15:02:35.663749 #770] DEBUG -- : INPUT PARAM target_ami_id = "ami-xxxxxx"
D, [2012-11-26T15:02:35.663771 #770] DEBUG -- : INPUT PARAM ami_id = "ami-xxxxxx"
D, [2012-11-26T15:02:35.663796 #770] DEBUG -- : INPUT PARAM source_ssh_keydata is set [but not logged]
0: new progress message = Checking parameters...
check if 'xxxxxx' keypair exists

Hopefully, all going well, this will succeed. It’ll take quite a while to copy the data across. Be patient.

If it worked correctly, you should see something like this

0: new progress message = going to delete 'CloudyScripts Opened Security Group' Security Group...
D, [2012-11-26T15:46:04.263844 #792] DEBUG -- : delete Security Group (name: CloudyScripts Opened Security Group)
D, [2012-11-26T15:46:05.414971 #792] DEBUG -- : 'CloudyScripts Opened Security Group' Security Group found.
state change notification: new state = Done (terminated)
== > Results of Copy AMI: true
New AMI ID: ami-xxxxxxxx
done in 2271s

Please post any questions you might have.

PS. Thanks Cloudyscripts! You’re awesome!

Cheers

James

Filed under amazon web services, technology

AWS Sydney Region, I love you!

Last week, AWS announced the launch of their Sydney region, the second in APAC and their 9th globally. For Cloud Advantage, this is very much a dream come true. Often we’re talking to organisations who want to move to cloud but come up against the common issue of high latency to the Singapore or US regions and also the problem of data sovereignty. Now in Australia we mostly solve both those problems. The region was officially launched at the AWS Customer Appreciation Day at the Sydney Westin last week on 13th November by Andy Jassy. What I found most compelling about his talk was the insight into how AWS was born, from someone who was actually there at inception. I’d heard versions of the story before yet Andy’s account was very much the most entertaining and candid.

At Cloud Advantage, we believe cloud computing fundamentally changes the way organisations employ computing and we seek to be at the bleeding edge of what can be done, always looking for radical improvements for all our clients. Ultimately our goal is to see tangible results for customers in the form of cost savings and vastly improved solutions.

I believe the Sydney region launch marks the beginning of the journey into the cloud for many Australian organisations. After the day I was so tired of hearing the word cloud. Like any term which can mean many different things, it makes me cringe when it is used in the wrong context, and it eventually loses its meaning. The true story here is that cloud represents an evolution in the way people use computers: whether it be small companies creating a vastly superior email solution to in-house Exchange by using Google Apps, or a large bank circumventing the burden of provisioning anything with owned infrastructure by running a new project in AWS, or Dropbox all of a sudden providing a more user friendly and technically superior solution to old in-house file servers, or Heroku providing developers with a PaaS that makes them more efficient and that they love using. Cloud infrastructure and services are changing the economics of computing in organisations, reducing the huge overheads of the past and obsoleting the large enterprise installation teams previously required to roll out changes that are now often just a click of a button. This means a huge productivity increase for our workforce, and a far lower cost of ownership for all organisations.

Technically, the Australian region means lower latencies for web applications serving content out of EC2 instances or S3, as well as the ability to ingest large amounts of data much faster through internet connections or AWS Direct Connect, and will make users more likely to attach their existing networks via AWS VPC. Having used EC2 and S3 locally already, I’m seeing far snappier interaction. Being logged into an instance feels more like having the machine on premises, and S3 is able to deliver and receive data far quicker. Several partners I’ve spoken to are excited about serving content locally simply because it means Google will register faster page load times and thus improve their respective search relevance scores.

Pricing in the Sydney region is exactly that of Singapore which more than anything is a relief for most prospective customers. Their fears of a higher price point due to Australia’s climbing electricity and labour cost structures are relieved.

One of the services notably absent from our Sydney region is AWS Glacier. I’m sure they’re working on it; however, AWS will never tell when or if the service will be released. One of our clients recently moved around 300TB of data into Glacier in the Northern California region and says regularly that the service must have been built just for them. This makes me happy! However, Amazon have a tendency to build services that solve common problems at their core, thus enabling vendors to build on top of them to provide enhanced services, and in this case, rest assured, many vendors are scrambling engineering resources to enable their products for Glacier. I remember reading a blog post on Backblaze building their own “storage pod” in order to produce cheaper storage than relying on something like S3. I feel their unique advantage has been severely threatened by Glacier, though their value is still firm as a leading consumer backup solution.

Regardless of this missing service, I personally love the AWS Sydney region. For Cloud Advantage, it injects new excitement and life into our business and means we’ll be able to deliver radically improved solutions to organisations around Australia.

Cheers

James

Filed under amazon web services, info, technology

Cloud Advantage 2.0

I’m pleased to say that I’m joining forces with a business partner I’ve worked with extensively for the past year and whom I trust and respect greatly. Wil Schaffner, formerly CTO of NearMap, is coming on board as a co-founder of Cloud Advantage, and his leadership skills, creative problem solving and business strategic intelligence experience complement my technical know-how and pragmatic approach to cloud implementations. Cloud Advantage is moving quickly to become the premier Amazon Web Services consultancy in Australia, specialists in innovative cloud thinking and strategy, as well as complicated hands-on implementations involving hundreds of servers and thousands of terabytes. There are very few companies who bring the experience and perspective that Cloud Advantage does when it comes to cloud. The reasons are simple.

1. We come from startups.

That means we take a “can do” approach to cloud initiatives. Our goal is to demonstrate value by creating cost savings and business agility as soon as possible. There are too many companies in this space who strive to land the deal or plug into a cushy enterprise contract. I believe one of the key wins for big enterprises in using cloud is that it can offer executives and management the ability to reclaim agility for pieces of their IT operation, often by starting afresh or switching completely. It is a way for them to implement projects from a clean slate and, because of the nature of cloud, the security model, the best practices and the self service nature, avoid some (although certainly not all) of the pitfalls associated with their legacy systems. Just as importantly, it gives them the ability to do it without the risk of hefty capital investments. All of this speaks to the reasons why it’s important that cloud consultants have a lean and agile pedigree as we do: with the advent of cloud computing, many companies are now simply doing it wrong!

2. We have built software.

As a former software engineer, when I started using AWS and EC2, it was literally a dream come true. As anyone who has done development knows, setting up environments, acquiring resources to develop on and trying out new stuff in an isolated way was always a very hard problem to solve pre-cloud. Virtualisation took us some of the way, but the cloud changes it completely. I sincerely believe many of the new services around today quite possibly would not exist were it not for cloud computing (think Pinterest, Instagram, Airbnb and many others), and that we have barely scratched the surface of what this change means to computing globally. However, every day I come across a new consulting company or professional services organisation that wants to be involved in cloud. The problem is that their DNA is old-school thinking about IT, enterprise and software. They want to employ the same solutions but somehow tack on a cloud component, which can mean missing out on the most substantial benefits of moving to cloud. Because we have worked for startups, built software and migrated complete companies into the cloud, we have the ability to see the solution in new ways that actually matter, rather than ways that merely jump through enterprise hoops and tick old boxes.

3. Runs on the board

Cloud Advantage, together with a leading online map provider, has moved a complex web and image processing cluster infrastructure to AWS involving hundreds of servers and more than 500TB of data. This has meant employing almost every technology inside AWS, as well as pushing some services such as S3 to their limits (more here). We loaded 5,000,000,000+ objects into S3 to replace what was a very complicated Tokyo Cabinet service spread across 32 machines in 2 data centres. We have enabled auto-scaling in a web infrastructure which was static. By harnessing the AWS network cost model of paying per TB served per month, we were able to save hundreds of thousands per year over the old contracts with fixed bandwidth stipulations. We did in-house cost comparisons based on real incurred costs and compared them like-for-like to the cloud. This is one of the keys to getting both management and technical buy-in. It’s crucial for a company to be able to say with surety that they have looked at the costs in a conservative way and cloud is still compelling.

Having done this in the past, and learned what a real implementation costs in reality, Cloud Advantage brings this crucial know-how to the table on every new piece of work. Cloud Advantage has also worked on smaller implementations with 5-10 servers. We’ve helped companies do big data infrastructure setup and execution. Both founders have 12+ years of hard-core technology experience. We have worked for startups, enterprises, and medium-sized companies. We’ve led teams, worked as senior management, and had to make board pitches for a move to cloud. We know what companies care about when it comes to IT, but crucially, we’re able to go back to first principles to see the real nature of the problem at hand. It may start with cost, but ultimately it’s about business agility and systems that support that. This is the real value of cloud. Cloud Advantage isn’t just another consulting company dabbling in cloud computing because “we see it as an important channel into the enterprise” or some other strategic alignment. Our company is founded on the premise that cloud computing is a necessary evolution in the way computing resources are utilised. We look at the problem before we look for the right solution, and often it’s not what you’d expect. Cloud has the capacity to reap back some of the lost productivity in IT that we see causing so many executive headaches and risks in the enterprise and even in small business. Our business lives and breathes this because we believe it is a *much* better way to do things, and we love finding new, innovative solutions to problems for our customers. Cloud computing gives us the ability to make huge improvements quickly for companies of all sizes. That’s why we’re Cloud Advantage.


Filed under amazon web services, general, technology

Plan, architect and test for failure in the cloud

With 2 recent outages in AWS’s US East region, I believe it’s worth writing a post about the oft-touted tenet of architecting for failure. If you’re using AWS, your application should be built and tested to survive the failure of an entire AWS availability zone, which is what happened in both of the most recent failures in the US East region. Providers like Heroku should really be providing this sort of resilience. My site, which runs on Heroku, went down (shock horror, being such a precious web service and all!) with both these outages, and I’ve been talking to the Heroku team about this shortcoming. I’ll post back here when I have a better understanding of their fault tolerance strategies.

While Amazon regularly espouses the “architect for failure” principle, few companies fully understand how to do it, and even fewer actually test for such an event. Certainly part of the blame has to lie with AWS for failing to test their redundant power supply setups, especially given the same problem has happened twice in as many weeks. Nonetheless, this is the reason they always build at least two AZs in any one region, and in the case of US-EAST-1, four AZs!

Cloud Advantage specializes in fault-tolerant cloud architectures, but one of the missing pieces is a clear test strategy for new AWS users. Simon Elisha, principal solution architect for AWS here in Australia, provides a great introductory presentation on this sort of thinking. Whilst Netflix admit some amount of disruption of service to customers during the latest AWS failure, their architecture is also renowned for being extremely resilient to failure. Their team was so focussed on designing for failure that they invented Chaos Monkey to simulate a variety of different failure events, and more specifically Chaos Gorilla to simulate an entire AZ failure.
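To make the idea of testing for an AZ failure concrete, here is a minimal sketch in Python. It is not Netflix’s tooling and it calls no AWS API: it simply models a fleet in memory and asks which single zone, if lost, would leave too little capacity to serve traffic. The `Instance` class, the zone names and the capacity figures are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """A hypothetical server record; not an AWS data structure."""
    instance_id: str
    zone: str
    capacity: int  # requests per second this instance can serve


def surviving_capacity(fleet, failed_zone):
    """Total capacity left once every instance in failed_zone is lost."""
    return sum(i.capacity for i in fleet if i.zone != failed_zone)


def zones_that_break_service(fleet, required_capacity):
    """Return the zones whose loss would drop capacity below the requirement."""
    zones = sorted({i.zone for i in fleet})
    return [z for z in zones if surviving_capacity(fleet, z) < required_capacity]


# Illustrative fleet: two instances in one zone, one in another.
fleet = [
    Instance("web-1", "zone-a", 100),
    Instance("web-2", "zone-a", 100),
    Instance("web-3", "zone-b", 100),
]

# Adding one more instance in zone-b spreads capacity evenly.
balanced_fleet = fleet + [Instance("web-4", "zone-b", 100)]

if __name__ == "__main__":
    print(zones_that_break_service(fleet, 150))
    print(zones_that_break_service(balanced_fleet, 150))
```

A check like this only covers capacity planning on paper. A real test would go further and actually take a zone’s worth of servers out of rotation in a staging environment to confirm that the application fails over as expected.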

The truth is that the cloud will be the future for tech infrastructure, but a paradigm (please excuse the use of that word) change is required in application development. A good place to start is with AWS’s whitepaper itself.

Despite what is said by customers and the like regarding this latest disruption, I’m very interested to hear what comes from the AWS team’s postmortem on root cause. When an event like this happens to a provider as big as AWS, I believe the actual downtimes for various web sites and the like are less than initially stated in the media. Secondly, it would be good to see a thorough postmortem from properties like Instagram and Netflix about why their services failed to fail over to other availability zones sooner.

If you’re reading this blog, I’d love the chance to chat to your firm about how to build a robust cloud infrastructure, since I believe that with the right design a better result than your current infrastructure is almost always achievable.

Update:

Finally an AWS postmortem: http://aws.amazon.com/message/67457/


Filed under amazon web services, development, technology