Monthly Archives: April 2012

Lean Cloud Event, Sydney

With so many incredibly boring technology events in any technology professional’s calendar, this event from AWS and Lean Startup with Werner Vogels and Eric Ries was a really good one indeed. The key takeaways for me were the following.

  1. AWS is enabling companies to employ a more “lean” or experimental approach to creating and releasing technology
  2. AWS allows for startups to scale up for success in a completely elastic way, and also scale down, which is a key component of cloud.
  3. Build your technology to be scalable by using small building blocks.
  4. Lean is all about running intelligent experiments (yeah, read the book)
  5. Validated learning. (read the book)
  6. Minimal viable product (MVP – read the book)
  7. Measure metrics that can be tied to levers that the company’s decisions and actions can affect.
  8. While software development techniques such as agile are great for improving team’s ability to build the right software, lean startup takes the step up to the company strategy level and asks how to build the right company with the right product.

I really do believe that Eric Ries has written something extremely valuable for all entrepreneurs. I remember several times in my career working for startups having flashes of “why are we choosing the approach? it seems so arbitrary and such a guess. Surely we should validate this somehow” yet I never thought it right through to completion and I take my hat off to Mr Ries for having really investigated and invented an approach with such merit and thoroughness.

1 Comment

Filed under general, info

Fabric-licious

Recently, working on  migrating large amounts of data to S3, (see S3 learnings), I discovered the fantastic tool, Python Fabric. I used it extensively for the migration work. I had written a rudimentary “manage cluster” boto script which got some serious overloading during this time, but a large bulk of the commands had to be done across 10-40 nodes at once, and Fabric proved ESSENTIAL! When I would push code to github, I could command a group of nodes to update and then re-run the migration code. The list of stuff I could do with a single command is truly awesome! What I wanted to share was how I managed different clusters and sub clusters. The nature of migration meant over time we somewhat manually had to tune the size of the cluster either launching nodes or shutting down completed ones and the like. When I would launch a new subcluster, I’d give that cluster the original “umbrella” tag, say, “import_cluster_1″, (all nodes had this tag) and a sub cluster tag, say “import_sub_cluster_1a”. Then when I wanted to run my fabric commands on just those nodes, I could essentially list all the tags within the entire cluster, make fabric roles for each of them, and then just name the role I wanted my fabric commands to be run with.

# This is my own method of course which simply returns all the instances in
# that region with that "umbrella" tag
host_instances = get_instance_list('us-west-1', 'import_cluster_1', only_running = True)
tags = []
[(tags.extend(i.tags.keys())) for i in host_instances if i.state == 'running']
tags = set(tags)
for t in tags:
    env.roledefs[t] = [i.public_dns_name for i in host_instances if i.state == 'running' and t in i.tags.keys()]
    env.roledefs[t + '_internal'] = [ i.private_dns_name for i in host_instances if i.state == 'running' and t in i.tags.keys()]
print 'Available roles: {0}'.format(', '.join(env.roledefs.keys()))

This gives me great flexibility in the way I command the subclusters and essentially allows me to run any combination of roles I like. The concept of fabric roles combined with tagging in EC2 provides a really powerful way to interact with nodes in a cloud environment.

Leave a Comment

Filed under technology

Maximising S3 throughput and excess capacity

Recently I’ve been working on a major migration from in house managed colocation hardware to Amazon Web Services. The main thrust of this piece of work was to meet the challenge of providing distributed petabyte scale object store and S3 is certainly a great solution for this. The prevailing mythology in house to this point had been that cloud storage was 4 times more expensive than owned hardware. However, the model was fundamentally flawed. Firstly, this calculation was based on very little actual analysis, minimized the cost of developing and/or maintaining what would be an in house developed solution or a 3rd party or open source solution such as GlusterFS. Now, I should say that GlusterFS is a great piece of software, but I do believe that for a real highly active object store, the price of owning rather than “renting” is comparable if not in favour of cloud storage.

The magic expression I’d like to emphasize about S3 and other cloud storage services like it is simple: $ per byte/hours. A customer only pays for the storage used and for the time it is used. Try doing this with an in house data centre. Unless you are “internet scale”, like the facebooks, zyngas and googles of the world, it’s unlikely you can compete with AWS for storage cost and sophistication. Secondly, it’s important to note what S3 is. It’s a highly available, low latency, highly durable, highly scalable object store. My client stores billions of 100k files, which need to be able to delivered to a web application in 10s of milliseconds. Certainly there are companies out there building cheaper “storage” solutions, but solutions such as backblaze ( I love backblaze btw and am a customer for backing up my macbook pro) have a very different use pattern to S3.

We recently migrated approximately 3 billion objects from an in house tokyo cabinet service to S3. I want to point out something about this. The same people at my client who espoused the “cloud is 4x more expensive” myth, also underestimated how much waste and excess capacity was inherent in their system. 5 copies were stored of every single object. That’s fine if it’s just redundancy, but it wasn’t. Really, just 3 copies could have been stored for that purpose. The reason for 5 copies was an operational one. That’s how the process of generating objects, and then pushing them out to the various DCs had evolved. It was also an artifact of a process that grew up in a startup which didn’t have time to build a highly flexible and generalized object store. Therein lies my point. Most startups that want to focus on their business objectives and key value proposition don’t want to build a really awesome distributed data store (eg. dropbox). Also, often the servers that housed the 100TB + of data obviously weren’t 99.9% full. There was, let’s say theoretically 20% unused capacity. (Realistically this number is more like 50%).

The point is, byte/hours is amazing. Let’s take for example our migration effort. We shipped approximately 120TB of data via 3TB disks to Amazon in Seattle. We then had to migrate that from raw tokyo cabinet .tch files into individual objects on S3. So, granted we needed the extra storage to move to S3 in the first place, but this sort of data migration is done all the time in house. So, all of a sudden, we would need an extra 120TB of storage, but I only need it for about 3 weeks. Where can I get such a solution in a tradition DC? I buy myself a JBOD or similar and fill it with 40 x 3TB disks or so, and then keep it around because selling it isn’t prudent. This point really outlines one of the key advantages of cloud. Those that understand it are sitting there saying, “oh, of course”. Those that don’t should be paying attention.

Incidentally, we did face some challenges importing. We ran up to 30 large instances in parallel, each pulling down .tch files with ~25 threads and then wrote objects with approximately 64 threads each. We would roughly see 45MB/sec write rates across a max 10 concurrent instances. This is roughly equivalent to 4Gbps writing. Try doing that with a single JBOD or even a lot of physical machines. This works out to about 5000 puts/sec.  Incidentally, if you are going to be using S3 for heavy throughput, it’ws worth paying attention to this from S3 team, essentially you’ll need to partition your keynames so that S3 can partition the data appropriately. If you’re really pushing S3, don’t take this as advice, take it more as “do this or get throttled”. Getting throttled is one thing in our case of a data import, I’m sure it’s another thing in the case of a live production system. Also, if you’re really pushing the limits of S3, you will see this error a bit:

InternalError: We encountered an internal error. Please try again. (code 500).

The error codes for S3 can be found here. Through out discussions with the S3 team, we were advised to take an exponential backoff approach on this error, meaning on this error we should wait 1, 2, 4, 8, 32… seconds before trying again. This gives the S3 service time to scale out as it reaches partition limits and the like. It’s worth mentioning in relation to the post from the S3 team mentioned earlier that we ended up with a partitioning scheme of [A-Z][a-z][0-9][_-] {2} (essentially two alpha-numeric values with some two extra characters. It turns out to be a mod(64) operation on two independent pieces of meta data for each object).  such that at the root of every keyname in our bucket is something like 0A, 0B, etc etc. This gives us a keyspace of 4096 which is relevant when you consider the blog post above regarding S3 tips and tricks. The downside is that now we have to make 4096 requests to delete a single “set” which would otherwise be stored in a single directory. The trade off is worth it. Also of note, when you start talking about billions of objects, this translates to approximately $10k per billion just in puts! So be aware of this, and make your objects bigger or chunked together if at all possible. Nonetheless, I was able to simply provision 30 large instances, attach a 1TB drive to each one, pull down the tokyo cabinet files to each server, reprocess the data, store it to S3, shutdown the instances when I was done or when we were waiting for data to arrive in the import/export service, and delete the original data once it was processed. This sort of flexibility in computing resources is really incredible and something that many companies don’t realise.

In this blog post I capture the metrics we used for doing a real cost comparison between S3 and “build it yourself” distributed datastore. It’s an interesting read.

1 Comment

Filed under technology