Tuesday, February 5, 2013

Using Bosco to submit to Amazon EC2

A homework assignment for my storage class required running ~30 hours of benchmarks of the btrfs and ext4 filesystems.  I thought this would be an excellent time to test Bosco's ability to submit to Amazon EC2 to parallelize the benchmarks.

Preparing Submission

In order to start instances on Amazon EC2, you first need to sign up.  Go to https://aws.amazon.com/ and sign up in the top right.  After you sign up, you will need your access key and secret key.  These can be found under 'Security Credentials' in the account drop-down menu.  They are in the 'Access Keys' tab, listed as the 'Access Key ID' and 'Secret Access Key'.  Write each value to its own file; you will need both files when you submit EC2 instances.

Screenshot of Amazon Security credentials site


Next, you will need a script to run at startup of the Amazon instance.  When the instance starts up, a service named CloudInit also starts on the instance.  It will interpret the user data file as a shell script, which can set up and start any other services you would like.  My shell script is provided below.

This shell script will install python-boto (Python bindings for S3 storage and EC2) and git onto the instance.  Next, it will download filebenchrunner (the benchmark runner for the homework) and start it.  Most people will probably want to shut down the instance once processing is done; in that case, you can just add a 'poweroff' to the bottom of the script.
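As a rough sketch of what such a user data script might look like, the snippet below writes one to a file named user_data.sh.  The file name, the repository URL, the install path, and the run_benchmarks.sh entry point are all placeholders chosen for illustration, so substitute your own.

```shell
# Write a CloudInit user data script to user_data.sh (nothing is executed here).
# The repository URL and run_benchmarks.sh are placeholders, not the real ones.
cat > user_data.sh <<'EOF'
#!/bin/bash
# Install the Python bindings for AWS (boto) and git.
yum -y install python-boto git

# Fetch the benchmark runner (placeholder URL) and start it.
git clone https://example.com/filebenchrunner.git /opt/filebenchrunner
cd /opt/filebenchrunner || exit 1
./run_benchmarks.sh

# Shut the instance down once processing is done.
poweroff
EOF
```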

Running the Instance

Running an Amazon instance is as easy as running a Bosco job.  First, you must create a Bosco submit file.  Below is the one I used:


A few important things to note.  I specified ec2_spot_price, which is the maximum I am willing to pay per hour for my m1.medium instance.  I bid $0.04 an hour, which is low but reasonable for a medium instance.  You can find the current spot prices either in the AWS console or on the EC2 website.  Spot prices are much lower than the on-demand price of an instance.  For example, for the m1.medium instance, which has 3.75 GB of RAM and 1 core, the spot price is currently $0.013 per hour, while the on-demand price is $0.120 per hour.  That's nearly a 90% discount on an m1.medium.  Of course, you should read up on the downsides of spot instances, such as the fact that Amazon can terminate them at any time without warning.  For my benchmarks that is acceptable, since I can always re-run a benchmark if my instance is terminated.  I needed to run ten 10-minute benchmarks, so after each benchmark I immediately uploaded the resulting data to S3 so that I wouldn't lose any work if the instance was terminated.
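To check the discount for yourself, here is a small calculation using the two m1.medium prices quoted above.  The variable names are only illustrative, and the prices will differ by the time you read this.

```shell
# Prices quoted above for an m1.medium, in dollars per hour.
spot=0.013
on_demand=0.120

# Percentage saved by paying the spot price instead of the on-demand price.
discount=$(awk -v s="$spot" -v o="$on_demand" 'BEGIN { printf "%.1f", (1 - s / o) * 100 }')
echo "Spot discount: ${discount}%"
```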

Also, I used the regular Amazon Linux AMI.  The available AMIs are listed on the Amazon website.  I could just as well have used CentOS, Ubuntu, or any other Linux image for my instance, but I prefer the official Amazon Linux AMI since it provides a very up-to-date OS that feels very similar to a CentOS 6 instance.  For example, it uses yum for repository management and RPMs for installation, and its package versions (except for the kernel) are similar to those in CentOS 6.

I also added a special command, periodic_remove, to terminate the instance if something goes wrong inside it.  Sometimes yum can hang, or the instance may not start up properly.  In those cases, Amazon will not notify you of the problem, and Bosco cannot determine that there is an issue.  Since my benchmarks should not take longer than 100 minutes, I automatically remove the instance after it has been running for 150 minutes, which leaves a little breathing room.
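Putting the settings discussed above together, a minimal sketch of an EC2 submit file might look like the following.  This is an illustration rather than the exact file I used: the file names, the key file paths, the executable label, and the AMI ID are placeholders, and the periodic_remove expression is one possible way to express the 150-minute limit.

```shell
# Write a sketch of an EC2 submit file to ec2_benchmark.sub (nothing is submitted here).
# Paths, the AMI ID, and file names are placeholders.
cat > ec2_benchmark.sub <<'EOF'
universe      = grid
grid_resource = ec2 https://ec2.amazonaws.com/

# For EC2 jobs the executable is only a label shown in condor_q.
executable = filebench_instance
log        = ec2_benchmark.log

# Files holding the access key ID and secret access key.
ec2_access_key_id     = /path/to/access_key_file
ec2_secret_access_key = /path/to/secret_key_file

ec2_ami_id        = ami-XXXXXXXX
ec2_instance_type = m1.medium
ec2_spot_price    = 0.04

# Startup script handed to the instance, and where to save the generated ssh key.
ec2_user_data_file = user_data.sh
ec2_keypair_file   = keyfile.$(Cluster)

# Remove the job if it has been running for more than 150 minutes.
periodic_remove = (JobStatus == 2) && (time() - EnteredCurrentStatus > 150 * 60)

queue
EOF
```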

You may submit the instance with the normal 'condor_submit' command.  The job will move to the Running state when the instance has begun running.

Once the instance has started, you may ssh into it using the unique ssh key that Bosco generates for you.  Its location is specified in the submit file as ec2_keypair_file.  You also need the DNS name of the instance, which is available in the job's ClassAd and can be displayed with:
$ condor_q -run

The command will output the hostname of the EC2 host.  You may connect to the EC2 instance with the following command, replacing XXXXX with the job number and hostname with the address you got from the command above:
$ ssh -i keyfile.XXXXX ec2-user@hostname

Summary

Pros of using Bosco to submit Amazon EC2 Jobs:
  • Simple management of Amazon instance from your workstation.
  • Specify spot price right inside of the job description.
  • Ability to bootstrap the instance easily with user data scripts.
  • Ability to use HTCondor policies to manage the instances, such as the periodic_remove statement above.
Cons:
  • The EC2 universe is only available on Linux builds of Bosco.  You cannot manage EC2 instances on the Mac version of Bosco.
  • Amazon EC2 has hundreds of features, but Bosco only supports basic instance submission and spot pricing.  You will not be able to use the vast majority of them when you are using Bosco to manage your instances.  But if all you need is to run some processing, Bosco is great!

Bosco Download
This work is licensed under a Creative Commons Attribution 3.0 Unported License.

2 comments:

  1. Instead of
    condor_q -format '%s\n' EC2RemoteVirtualMachineName

    you can simply use

    condor_q -run

    for HTCondor >=7.8 (iirc)

    1. You're correct. I updated the post, thanks for the heads up!
