Fabric-licious

Recently, while migrating large amounts of data to S3 (see S3 learnings), I discovered a fantastic tool, Python Fabric, and used it extensively for the migration work. I had written a rudimentary “manage cluster” boto script, which got heavily overloaded during this time, but the bulk of the commands had to be run across 10-40 nodes at once, and for that Fabric proved essential! When I pushed code to GitHub, I could command a group of nodes to update and then re-run the migration code. The list of things I could do with a single command is truly awesome!

What I wanted to share is how I managed different clusters and subclusters. The nature of the migration meant that, over time, we had to tune the size of the cluster somewhat manually, launching new nodes or shutting down completed ones. When I launched a new subcluster, I’d give its nodes the original “umbrella” tag, say “import_cluster_1” (all nodes had this tag), plus a subcluster tag, say “import_sub_cluster_1a”. Then, when I wanted to run my Fabric commands on just those nodes, I could list all the tags within the entire cluster, make a Fabric role for each of them, and name the role I wanted my commands to run against.

from fabric.api import env

# get_instance_list is my own method of course, which simply returns all the
# instances in that region with that "umbrella" tag
host_instances = get_instance_list('us-west-1', 'import_cluster_1', only_running=True)
running = [i for i in host_instances if i.state == 'running']
# Collect every tag key seen on any running instance
tags = set(key for i in running for key in i.tags)
for t in tags:
    # One role addressing the nodes by public DNS, one by private DNS
    env.roledefs[t] = [i.public_dns_name for i in running if t in i.tags]
    env.roledefs[t + '_internal'] = [i.private_dns_name for i in running if t in i.tags]
print('Available roles: {0}'.format(', '.join(env.roledefs.keys())))

This gives me great flexibility in the way I command the subclusters: I can run a command against any combination of roles I like. Fabric roles combined with tagging in EC2 provide a really powerful way to interact with nodes in a cloud environment.
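
To make the tag-to-role mapping easier to try out without AWS credentials, here is a minimal sketch of the same idea as a standalone function. The `Instance` namedtuple, the `build_roledefs` name, and the example hostnames are all hypothetical stand-ins for illustration; the real snippet works on boto instance objects and writes straight into `env.roledefs`.

```python
from collections import namedtuple

# Hypothetical stand-in for a boto instance: only the attributes the
# role-building logic reads.
Instance = namedtuple('Instance', 'state tags public_dns_name private_dns_name')


def build_roledefs(instances):
    """Map each tag key to the DNS names of the running nodes carrying it."""
    roledefs = {}
    for inst in instances:
        if inst.state != 'running':
            continue  # stopped or terminated nodes never become role members
        for tag in inst.tags:
            roledefs.setdefault(tag, []).append(inst.public_dns_name)
            roledefs.setdefault(tag + '_internal', []).append(inst.private_dns_name)
    return roledefs


# Made-up nodes: two running, one stopped, all under the same umbrella tag.
nodes = [
    Instance('running', {'import_cluster_1': '', 'import_sub_cluster_1a': ''},
             'pub-a.example.com', 'priv-a.internal'),
    Instance('running', {'import_cluster_1': '', 'import_sub_cluster_1b': ''},
             'pub-b.example.com', 'priv-b.internal'),
    Instance('stopped', {'import_cluster_1': '', 'import_sub_cluster_1a': ''},
             'pub-c.example.com', 'priv-c.internal'),
]
roledefs = build_roledefs(nodes)
```

In a fabfile, the result could be merged in with `env.roledefs.update(build_roledefs(host_instances))`, and a single subcluster targeted with something like `fab -R import_sub_cluster_1a <task>`, where `<task>` is whatever task the fabfile defines.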

