
New AWS Command Line tool and jq

AWS has released a new version of their command line tools, most appropriately called awscli. It’s built on Python’s boto and can, for example, be used like this:

aws ec2 describe-instances

I don’t want to rewrite the setup instructions; the best place to start is the GitHub page: https://github.com/aws/aws-cli. You’ll need your AWS access and secret keys, some environment setup (I run on OS X), and obviously Python and its associated required packages. I like to work with virtualenv and virtualenvwrapper, and here’s a good post on those tools.
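Before running any commands, it can help to sanity-check that the keys are actually exported. A minimal sketch (the variable names below are the standard AWS ones the tool reads, among other configuration sources):

```python
import os

# awscli picks up credentials from these environment variables
# (among other configuration sources such as config files).
required = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
missing = [name for name in required if not os.environ.get(name)]
if missing:
    print("Set these before using awscli: " + ", ".join(missing))
else:
    print("AWS credentials found in the environment")
```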

What’s interesting about this tool is firstly that it’s built on boto, which is good for the boto community and Python programmers like me. Actually, it uses the boto-core project, a new version of the lower-level boto functionality. Of particular interest to me, boto-core uses the requests package from Kenneth Reitz, which I believe I suggested to Mitch Garnatt a while back (it’s a big *if* whether I was the cause, but I like to think so 😉).

What’s interesting for this post is that it outputs its results as JSON. This brings me to my next interesting tool, jq. The idea of jq is that it’s like writing SQL for JSON. jq is a standalone tool; if you use virtualenv and virtualenvwrapper, you can just put the jq binary in $VIRTUAL_ENV_HOME/my_new_virtual_env/bin.

Here you can do some pretty cool stuff to summarise your EC2 (and other AWS) infrastructure. For example, I can now use awscli to do things like:

aws ec2 describe-instances | jq '.reservationSet[].instancesSet[].keyName'

Above, I’m asking for the key names associated with my entire list of instances. With the command below, I can grab just the instance ids with their public DNS values.

aws ec2 describe-instances | jq '.reservationSet[].instancesSet[] | {dnsName, instanceId}'

{
  "instanceId": "i-98e793e6",
  "dnsName": "ec2-23-23-57-193.compute-1.amazonaws.com"
}
{
  "instanceId": "i-97ed79e6",
  "dnsName": "ec2-23-21-75-169.compute-1.amazonaws.com"
}
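If you don’t have jq handy, the same two extractions can be sketched in plain Python with the standard json module. The payload below is a made-up sample mirroring the reservationSet/instancesSet shape above, with a hypothetical keyName:

```python
import json

# Made-up sample in the same shape as `aws ec2 describe-instances` output
payload = json.loads("""
{
  "reservationSet": [
    {"instancesSet": [
      {"instanceId": "i-98e793e6",
       "dnsName": "ec2-23-23-57-193.compute-1.amazonaws.com",
       "keyName": "my-key"}
    ]}
  ]
}
""")

# Equivalent of: jq '.reservationSet[].instancesSet[].keyName'
keys = [inst["keyName"]
        for res in payload["reservationSet"]
        for inst in res["instancesSet"]]

# Equivalent of: jq '.reservationSet[].instancesSet[] | {dnsName, instanceId}'
pairs = [{"dnsName": inst["dnsName"], "instanceId": inst["instanceId"]}
         for res in payload["reservationSet"]
         for inst in res["instancesSet"]]
```

jq’s one-liner is clearly terser, but this shows there’s nothing magic in the filter: it’s just nested iteration and key selection.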

You can see that this approach is fairly powerful, and the more one gets used to jq and the aws commands, the more powerful it becomes. It’d be interesting to see more examples of how this can work.




Recently, while migrating large amounts of data to S3 (see S3 learnings), I discovered the fantastic tool Fabric. I used it extensively for the migration work. I had written a rudimentary “manage cluster” boto script, which got some serious overloading during this time, but a large bulk of the commands had to be run across 10-40 nodes at once, and Fabric proved essential! When I pushed code to GitHub, I could command a group of nodes to update and then re-run the migration code. The list of stuff I could do with a single command is truly awesome!

What I wanted to share is how I managed different clusters and sub-clusters. The nature of the migration meant that over time we had to manually tune the size of the cluster, launching new nodes or shutting down completed ones. When I launched a new sub-cluster, I’d give it the original “umbrella” tag, say “import_cluster_1” (all nodes had this tag), and a sub-cluster tag, say “import_sub_cluster_1a”. Then, when I wanted to run my Fabric commands on just those nodes, I could list all the tags within the entire cluster, make a Fabric role for each of them, and name the role I wanted my commands to run under.

from fabric.api import env

# This is my own method, which simply returns all the instances in
# that region carrying the "umbrella" tag
host_instances = get_instance_list('us-west-1', 'import_cluster_1', only_running=True)

# Collect every tag present on the running instances
running = [i for i in host_instances if i.state == 'running']
tags = set()
for i in running:
    tags.update(i.tags.keys())

# Build a Fabric role per tag: one for public DNS names, one for internal
for t in tags:
    env.roledefs[t] = [i.public_dns_name for i in running if t in i.tags]
    env.roledefs[t + '_internal'] = [i.private_dns_name for i in running if t in i.tags]

print('Available roles: {0}'.format(', '.join(env.roledefs.keys())))
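The tag-to-role grouping above can be sketched with plain data, independent of boto and live EC2 instances. A minimal example with hypothetical host names and tags:

```python
# Hypothetical instances: each carries the umbrella tag plus a sub-cluster tag,
# mirroring the tagging scheme described above.
instances = [
    {"dns": "host-a", "tags": ["import_cluster_1", "import_sub_cluster_1a"]},
    {"dns": "host-b", "tags": ["import_cluster_1", "import_sub_cluster_1b"]},
    {"dns": "host-c", "tags": ["import_cluster_1", "import_sub_cluster_1a"]},
]

# One role per tag: the umbrella tag covers everything, each sub-cluster
# tag covers just its own nodes.
roledefs = {}
for inst in instances:
    for tag in inst["tags"]:
        roledefs.setdefault(tag, []).append(inst["dns"])
```

Targeting the role “import_sub_cluster_1a” then reaches host-a and host-c, while “import_cluster_1” reaches all three, which is exactly the flexibility the roledefs code above provides against real instances.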

This gives me great flexibility in how I command the sub-clusters and essentially lets me run any combination of roles I like. The concept of Fabric roles, combined with tagging in EC2, provides a really powerful way to interact with nodes in a cloud environment.

