Monday, October 29, 2012

BOSCO v1.1 Features: SSH File Transfer

I am hoping to write about a few of the new features of BOSCO 1.1 before it comes out in December or January.  This is part 1 of that series:

BOSCO v1.1 Feature: SSH File Transfer

What is it?
The SSH File Transfer feature improves the method of staging input and output files to the remote cluster.  In 1.0, files are transferred by starting a daemon on the remote cluster that connects back to the submit host over a random port, which required a lot of open ports on the submit host.

The new SSH File Transfer will limit the number of ports required on the submit host.  BOSCO will now transfer files over a port that is forwarded through the SSH connection it already maintains with the remote cluster.  The transfers are inherently secure since they travel over the SSH connection, and they are also authenticated by the Condor daemons on either end of the connection (remote cluster and submit host).
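
For intuition, the mechanism is similar in spirit to an SSH reverse tunnel: the remote side connects to a port that is really forwarded back to the submit host over the existing SSH connection.  The command below is only a hand-rolled illustration of that idea (the host name and port number are invented), not what BOSCO actually runs under the hood:

    # Conceptual illustration only -- not BOSCO's internal commands.
    # Connections to port 41000 on the remote cluster are carried back
    # over SSH to port 41000 on the submit host.
    ssh -N -R 41000:localhost:41000 user@cluster.example.edu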

This fits into the BOSCO team's goal of lowering the number of ports used by Condor.  Our eventual goal is to use the Shared Port Daemon to reduce BOSCO's requirement on the submit host to a single port.
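
For those curious, the Shared Port Daemon is turned on through the Condor configuration.  The snippet below is only a generic sketch of that knob (the port number is just an example), not something BOSCO sets today:

    # Hypothetical condor_config sketch -- funnels all daemon traffic
    # through a single listening port.
    USE_SHARED_PORT  = True
    SHARED_PORT_ARGS = -p 9618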

Why should I care?
This will greatly reduce the number of ports required if you are only using the universe=grid method of submitting jobs.  In fact, it will reduce the number of open ports on the submit host to 0.  That means no more configuring firewalls for BOSCO (unless you need Campus Factory support; see below).  Additionally, there is no new configuration required for this feature; it 'just works' (famous last words?).
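
For reference, submitting through the grid universe with BOSCO looks roughly like the sketch below; the remote user, host, and batch system are placeholders, and the rest is ordinary Condor submit file syntax:

    # Sketch of a BOSCO grid universe submit file (placeholder user/host).
    universe      = grid
    grid_resource = batch pbs jdoe@cluster.example.edu
    executable    = my_job.sh
    transfer_input_files = input.dat
    output        = job.out
    error         = job.err
    log           = job.log
    queue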

The Campus Factory, which adds features such as fault-tolerant Condor file transfer and transparent multi-cluster support, still requires multiple open ports in the firewall.  Additional effort will be required to change the Campus Factory configuration and daemons to support a single port.  I hope that a single port will be all that is needed by v1.2.

What's Next

Over the next couple of weeks, I hope to write more about upcoming features such as:
  • Multi-Platform support (i.e., the cluster and submit host run different platforms)
  • Mac OS X Support
  • Improved Multi-User support

Thursday, October 18, 2012

The Island Grid: Puerto Rico

In the last post, I discussed my trip to Puerto Rico.  Now that the trip is over and I am back in Lincoln, I would like to share some of its successes.  I had a great time exploring Old San Juan, and I have pictures as well.

Presenting the Holland Computing Center grid

Creating a new User

But on a more relevant note, the Campus Grid in Puerto Rico gained a new user while I was there.  I always feel that computing infrastructure is best built by user demand.  That has certainly been the case at HCC, where we run at 95%+ utilization on our HPC clusters.  I met with Steve Massey to find out how the Island Grid can help him.

Steve Massey is a bioinformatician at the University of Puerto Rico -- Rio Piedras.  His work is an ideal fit for High Throughput Computing: his processing follows the model of running the same executable against many, many protein PDB files.  We talked for a while on Tuesday, both before and after the power was cut to the UPR campus, about how we can move this work onto the UPR campus grid, flock it to UNL, and finally run it on the OSG.  While I was in PR, I worked with Steve to run one of his workflows on the OSG.

I'm not going to pretend to know what is really happening scientifically in the workflow, but it takes as input a set of DNA sequences that were pre-calculated (I believe somewhat random) and a PDB file, and calculates the robustness using an external application, Scwrl4.  The output is a robustness file that lists the robustness of the protein with the DNA sequences.

I was able to run this workflow on the OSG using Nebraska's GlideinWMS interface.  I created a submit.sh script that writes out a simple Condor submit file, and a wrapper.sh script that configures the environment on the worker node.  Both scripts are available on github.  Together, these two components create the workflow.
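
The real scripts are the ones on github; the fragment below is only a rough sketch of the pattern, with invented file names and arguments.  The idea is that submit.sh writes a vanilla universe submit file that ships one PDB file, the precomputed sequences, and a Scwrl4 tarball to the worker node, where wrapper.sh sets up Scwrl4 and runs the robustness calculation:

    #!/bin/bash
    # submit.sh (sketch only, not the github version): write and submit
    # a Condor job for one PDB file.  All file names are placeholders.
    PDB=$1
    BASE=$(basename "$PDB")

    {
      echo "universe   = vanilla"
      echo "executable = wrapper.sh"
      echo "arguments  = $BASE"
      echo "transfer_input_files = $PDB, sequences.txt, scwrl4.tar.gz"
      echo "should_transfer_files   = YES"
      echo "when_to_transfer_output = ON_EXIT"
      echo "output = $BASE.out"
      echo "error  = $BASE.err"
      echo "log    = robustness.log"
      echo "queue"
    } > job.submit

    condor_submit job.submit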

There is still work to be done.  The executable that Steve wrote does not properly detect the length of the amino acid strand, and therefore is not able to properly calculate robustness and/or send it to Scwrl4.

Also, there is another workflow that Steve would like to run.  I hope to continue to work with him to enable these workflows.

Onto the Grid!

In addition to talking to Steve about creating workflows, I also discussed how to integrate his small cluster into the Island Grid in Puerto Rico.  The primary difficulty with adding his cluster is the university's very restrictive firewall: we will be unable to run a central submission host on his cluster, and flocking is out.  But I believe Jose and I have a solution.  Either he can run BOSCO and submit through SSH to Steve's SGE cluster, or install Condor worker nodes on Steve's cluster that report to the primary campus cluster, nanobio.
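
The BOSCO route would look roughly like the command below; the user and host names are placeholders, and it assumes BOSCO support for SGE as the remote batch system:

    # Hypothetical user/host; the last argument names the remote batch system.
    bosco_cluster --add jdoe@sge-cluster.example.edu sge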

I also talked with a Computer Science researcher about putting his Mac Mini cluster on the grid.  He is running Linux on the Mac Minis, so they would be a good fit for the grid.  However, the frontend node of the Mac Mini cluster is not running Linux, so BOSCO will not work: it needs a consistent platform across the frontend node and the worker nodes.  Instead, we will have to install Condor worker nodes on the Mac Minis themselves in order to access them.

Conclusions

I believe that Puerto Rico has a lot to gain from creating an Island (Campus) Grid.  They also have a very energetic administrator, Jose, who can spearhead the implementation in close collaboration with me and the campus grids team.

Monday, October 15, 2012

Building a Campus Grid on an Island

Hello everyone from San Juan, Puerto Rico.

Hanging Out
I accompanied my advisor, David, to the Track 2 external advisory board meeting.  My goals while here are:

  • Work with Steve Massey to enable his workflow to run on the UPR campus grid.
  • Enable HPCf to flock with other local Condor clusters (Steve's and CS's).
  • Continue to work with HPCf to enable gratia accounting on their resources.
  • Enable flocking between HPCf and Nebraska.