Monday, November 25, 2013

OSG at Supercomputing!

Nebraska hosted a booth at the Supercomputing conference this year.
Showing off the Nebraska booth with the Hadoop visualization

The OSG had a small but visible presence at the Supercomputing conference this year.  The Nebraska delegation handed out placards designed by Soichi Hayashi from Indiana.

Nebraska

DOE Booth

Purdue

Clemson

International Science Grid This Week (isgtw.org)

The CRC at Notre Dame

University of Oklahoma 

Education & Research Computing Alliance (sserca.org)


Friday, October 18, 2013

Bosco @ CHEP

This week, I attended the Computing in High Energy Physics (CHEP) 2013 conference in Amsterdam to present a poster on Bosco.

Presenting the Bosco Poster
Throughout the conference there were many references to Bosco in talks.  First and foremost was the CMS talk on using Bosco to access Gordon: "Opportunistic Computing only knocks once: Processing at SDSC".  Several other talks referenced Bosco when discussing access to opportunistic resources.

In addition to the conference, I was also able to explore Amsterdam.  On Wednesday, I visited the Van Gogh museum.

Me with a Van Gogh self portrait
It has been a great trip, but now I must return to reality and continue writing my dissertation.

Tuesday, September 10, 2013

The NSA vs. my grid certificate

Note: Opinions or statements made in this post are solely my own, and are not the opinions of any organization that I work with or for.  Also, I probably got some details wrong... so there's that.

Though I'm not all that concerned about the NSA decrypting my grid certificate, I am interested in how hard it would be.  I want to be clear: I am by no stretch of the imagination a security specialist, but I find these kinds of discussions interesting.  If I say anything incorrect, please leave a comment so I may correct it.

Recent Events

With all of the recent news that the NSA can decrypt "secure" web traffic, I was curious whether they could decrypt my grid traffic (though if they did, they would find it quite boring).

Yesterday I read a great blog post from security researcher Matthew Green on the possible implications for encryption.  In his article, he gave three ways to attack the encryption:
  1. Attack the cryptography. This is difficult and unlikely to work against the standard algorithms we use (though there are exceptions like RC4). However there are many complex protocols in cryptography, and sometimes they are vulnerable.
  2. Go after the implementation. Cryptography is almost always implemented in software -- and software is a disaster. Hardware isn't that much better. Unfortunately active software exploits only work if you have a target in mind. If your goal is mass surveillance, you need to build insecurity in from the start. That means working with vendors to add backdoors.
  3. Access the human side. Why hack someone's computer if you can get them to give you the key?
Will this work on the grid?

Attacking the cryptography

Grid security is based on Public Key Infrastructure (wikipedia).  Each user is issued a certificate with an associated public/private key pair.  Below is a picture of my Mac's keychain view of my certificate:

My Certificate
There are several attributes listed here, but you can see that the key size is 2048 bits.  For the RSA algorithm, the largest key publicly factored so far is 768 bits (wiki), and that took hundreds of computers several months.  NIST and the NSA list 2048-bit keys as the standard for Federal computers until ~2030 (source); therefore, one can reasonably presume that they provide adequate security for the time being.
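
If you are curious about your own certificate, you can check the key size from the command line with OpenSSL (a quick sketch; ~/.globus/usercert.pem is the conventional location for a grid certificate, so adjust the path as needed):

openssl x509 -in ~/.globus/usercert.pem -noout -text | grep "Public-Key"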

Going after the implementation

Most grid software uses the OpenSSL library to validate certificates.  Even if we assume that OpenSSL itself is acceptable (Matthew makes the argument that it may not be), grid software is not known for its robustness.  It is not difficult to imagine that there are unknown security vulnerabilities in the grid software.  In the OSG, the 'gatekeeper' software packages that validate and authorize users include no fewer than:
  • Globus Gatekeeper (uses various libraries underneath, eventually OpenSSL)
  • BeStMan
  • GridFTP (Globus libraries)
  • XrootD
  • HTCondor
  • ...
It may be that not all of these software packages have vulnerabilities; maybe they are all perfect.  But that is unlikely.  As Matthew says: software is a disaster.

Further, much of the communication on the grid is authenticated but not encrypted.  For example, GridFTP traffic is, by default, authenticated but not encrypted.  The same goes for XrootD data: authenticated but not encrypted.  This data could easily be gathered from any point along the network path.
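
To be fair, the tools do support encryption if you ask for it.  A sketch with globus-url-copy (the hostnames and paths here are made up):

# Default: the data channel is authenticated but not encrypted
globus-url-copy file:///tmp/data.root gsiftp://se.example.edu/store/data.root
# Add -dcpriv to request a private (encrypted) data channel, at a performance cost
globus-url-copy -dcpriv file:///tmp/data.root gsiftp://se.example.edu/store/data.root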

Access the human side

The human side may be the easiest to exploit.  Until recently, the DOE issued all user and host certificates for the OSG.  As the DOE is a federal agency, it may be more obligated to work with the NSA than a private company would be (conspiracy-theory hat on).  Recently, a private company, DigiCert, has taken over issuing our certificates.

Of course, the NSA could ask the DOE or DigiCert to issue duplicate certificates and mount man-in-the-middle attacks.  But as Matthew points out, it seems unlikely that the NSA would be interested in such a direct attack.  The unencrypted data could be had much more easily through passive gathering.

But one has to wonder how many places my proxy is currently living, and how many administrative domains I am trusting, implicitly or explicitly.  Just off the top of my head, I know my proxy is at:
  • My Laptop
  • The HCC GlideinWMS VO frontend
  • The 3 OSG GlideinWMS factories
  • The 20+ Globus Gatekeepers that we submit to
  • Hundreds or possibly thousands of worker nodes across the U.S.
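
You can at least inspect the local copy.  A quick check with the standard Globus tooling (assuming a proxy has already been created):

# Print the proxy's subject, its path on disk, and the remaining lifetime in seconds
grid-proxy-info -subject -path -timeleft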

Should we get out our tin foil hats?

Not just yet.  It is unlikely that the NSA, or any state entity for that matter, would be interested in the scientific computing resources on a grid.  Plenty of garden-variety hackers would be interested in the computational power of the grid for Bitcoin mining or botnets, but they are unlikely to have the resources to mount any of the attacks above (except possibly against the software).  The grid software has been hardened over time against these garden-variety hackers, and I am somewhat confident that we can defend against them.

So for the time being, the grid certificates appear to be safe from most threats.  Personally, I'm more worried about the software.

Friday, August 30, 2013

You asked, we delivered!



A few users of BoscoR, and specifically GridR, have requested several major features:
  1. Allow custom packages to be installed on the remote worker node (#3).
  2. Specify custom R binaries for the worker nodes (#4).  Along with custom binaries, we also added logic so that if you upload new binaries to a URL, they will automatically be detected and installed (#7).
  3. Asynchronous updating of batch processing.  This enables results to show up in the result list before the entire batch has completed (#6).
Along with the above-mentioned issues, we have also moved the default download location to be hosted at Indiana rather than Dropbox (hence the earlier blog post).

I hope to write a few posts in the coming weeks with a more in-depth analysis of these significant user-facing improvements, but in the meantime you can read the release notes, which include details on how the new features work and how you can use them.

Today we have released the new version, v0.9.8, which includes all of these changes.

How do you know you are doing something right?

I just received this from Dropbox:

The notification!
An R user in Wisconsin was using GridR so much that he used up my bandwidth limit on Dropbox while downloading the R binaries.  I'm going to put this down as "a good problem to have".

Wednesday, August 28, 2013

CIC 13 Meeting

This week, the Campus Infrastructures Community debuted the OSG Connect service.  Overall, it was a very successful workshop.  I met two users who were very excited about BoscoR, and we had great discussions about using the OSG.

Here are a few pictures from my trip to Duke for your viewing pleasure:
Mats Rynge teaching the basics of High Throughput Computing
Wandering around Duke University
The "Football" stadium at Duke.  I won't make any comparisons to Nebraska's...

Monday, July 29, 2013

Bosco & OSG @ XSEDE13

Last week, several members of the Bosco team attended the XSEDE 13 conference.  We enjoyed the great San Diego weather, and I even caught a little of Comic-Con, which was being held next door when I arrived.  Here are some of the highlights.

OSG Summer School Programming Team Wins First place!

OSG Summer School Programming Team: Zhe Zhang, Travis Boettcher, Matthew Armbruster, Ben Albrecht, Cassandra Schaening
A group of OSG Summer School students formed a team for the student programming challenge.  They were given 10 difficult programming problems to solve and had all day, from 8 a.m. to 4 p.m., to work on them.  On Friday, the results were announced, and the OSG team won first place in the programming challenge!

Carrier Dinner

On Tuesday evening, the XSEDE conference hosted a dinner on the USS Midway anchored not far from the conference.  It was a great evening of food, drinks, and tours!
The USS Midway anchored in San Diego

Bosco @ XSEDE Poster Session

Wednesday night was the XSEDE Poster Session.  We had a great reception.  Many people approached us, asking questions about Bosco and how it could help them.

Derek Weitzel and Miha Ahronovitz


Thursday, July 11, 2013

Creating a native Mac Installer for Bosco

It has been our goal for some time to create a native installer for our supported platforms.  This week, we created a prototype for the first of those platforms, the Mac installer.

The PKG file

The first step in creating the installer is the PKG file.  I installed Bosco the usual way, using the multi-platform installer.  Then I looked through the Bosco files (using a find command) to locate all instances of hard-coded paths.  They were located in the bosco.sh, bosco_setenv, and condor_config files.  I modified those to use the HOME environment variable.
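
For the curious, the hunt for hard-coded paths looks roughly like this (a sketch; it assumes the install landed in ~/bosco and that the baked-in prefix contains your literal home directory path):

find ~/bosco -type f | xargs grep -l "$HOME" 2>/dev/null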

Then I used PackageMaker, from the Auxiliary Tools package available on the Mac developer page (free registration required).  In PackageMaker, I opened the Bosco directory in the home directory.

I unchecked 'Require admin authentication' because Bosco will be installed in the user's home directory.  After these simple steps, I built the installer and tested it; everything worked!

DMG File

A DMG is a disk image that is used for software distribution.  The first step is to create the Bosco.dmg disk image using Mac's built-in Disk Utility.

After mounting the DMG, I copied the Bosco installer into the image.  Then, I ran the command:
bless --folder /Volumes/Bosco --openfolder /Volumes/Bosco

This causes the DMG to automatically open a window when it is mounted.
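
The same flow can also be scripted with hdiutil instead of Disk Utility.  This is a sketch (the image size and file names are assumptions), but it is handy if you want to automate building the DMG later:

# Create and mount a read-write image
hdiutil create -size 50m -fs HFS+ -volname "Bosco" Bosco-rw.dmg
hdiutil attach Bosco-rw.dmg
# Copy in the installer and set the auto-open behavior
cp -R Bosco.pkg /Volumes/Bosco/
bless --folder /Volumes/Bosco --openfolder /Volumes/Bosco
hdiutil detach /Volumes/Bosco
# Convert to a compressed, read-only image for distribution
hdiutil convert Bosco-rw.dmg -format UDZO -o Bosco.dmg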

Where to get this

The Bosco DMG makes it very easy to install Bosco on a Mac.  The DMG is available on Dropbox and will be put into production once a few more minor fixes are in place.


Friday, July 5, 2013

The next step for Bosco: BoscoR

With the 1.1 release of Bosco in January, the Bosco team finally had a usable product for researchers.  It didn't have the best usability, but it was a solid tool to build on.  When planning the next release, 1.2, we decided not to add many new features and instead to focus on usability.  With that focus, we interviewed researchers and looked around at our colleagues to see how most researchers interact with their data on a daily basis, and it was not with Condor or Bosco.  Researchers used applications such as MATLAB, Galaxy, or R.  After investigating the options, we determined that R would be a great first application to integrate with Bosco: it is open source, it has a strong community, and, most importantly, it is heavily used by the researchers around us.

The Initial Steps

Our first step was to look at how people submit R jobs to HTC resources today.  The CHTC does a great job of submitting entire R scripts for Wisconsin researchers.  There is also SubmitR from HUBzero, which integrates a GUI with the submission of R scripts.  Both of these methods had good and bad attributes.

The CHTC method made it very easy to submit complex R scripts with many package dependencies to a very large number of nodes.  But the researcher could submit only from UW machines with the proper scripts installed.  Also, the researcher still had to learn at least a minimal set of HTCondor commands to run their jobs successfully.  The researcher had to leave their environment and learn another: HTCondor.

SubmitR, on the other hand, was very easy to use, with a graphical interface to upload and start jobs.  SubmitR is open source, but deeply integrated with HUBzero; therefore, any submissions through SubmitR would have to be centralized at a small number of locations.  SubmitR also had the same issue: the researcher had to leave their environment to learn another, in this case HUBzero and SubmitR.

We found that, with Bosco installed on the researcher's laptop, we could leverage that locality to create a better interface.  Our next step was to look at possible packages that could submit R jobs to clusters; they are listed on the HPC task view page for R packages.  The only package that fit our requirements (open source, released, and in CRAN) was GridR.

Integrating Bosco with GridR

When first looking at the source of GridR, it can be very intimidating.  It uses asynchronous assignments to variables (which, I would later learn, is frowned upon in the R community).  It submits jobs by forking new processes to do the submission and monitor the jobs.  It uses R on the remote side to load the functions and variables needed to run the processing.

Even though GridR is complicated, it has one great feature: everything is done from inside the R environment.  The researcher simply loads a package and uses a familiar function, apply.  With that function, the researcher can send an R function to a remote cluster for processing, and the result is written asynchronously back into the environment.

When I began looking at modifying GridR, my first step was to find the source repository.  Unfortunately, the package hadn't received an update since 2009, and the source repo was nowhere to be found.  Luckily, another researcher from Harvard had just modified GridR to submit R jobs to a locally installed HTCondor cluster.  By studying his changes, I was able to fork his work and add Bosco submission to GridR.

The modifications to GridR were minimal.  I needed to add a new submission type, bosco, to the package and modify the local submit scripts with the Bosco submit settings.  But GridR requires R to be installed on the remote cluster, so how is Bosco going to get R onto the remote worker node before the job executes?

Several different ideas were floated for installing R on the remote worker node.  We settled on a solution that uses a bootstrap method.  When the remote job starts, instead of directly executing R, it runs a custom bootstrap script written for Bosco submissions.  This bootstrap script determines whether R is already installed; if not, it downloads and installs R, either in the user's home directory or temporarily on the worker node for the duration of the job.  This bootstrapping enables GridR submissions to run successfully anywhere with network access, whether on a campus cluster or on national infrastructure such as the OSG or XSEDE.
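
To give a flavor of the idea, here is a minimal sketch of such a bootstrap script (the URL and file names are placeholders, not the real Bosco bootstrap):

#!/bin/bash
# Use the system R if the worker node already has one
if command -v R >/dev/null 2>&1; then
    echo "Found system R, using it"
else
    # Otherwise download a pre-built R and unpack it for the duration of the job
    wget -q http://example.org/R-binaries.tar.gz
    tar xzf R-binaries.tar.gz
    export PATH=$PWD/R/bin:$PATH
fi
# Run the staged GridR job
R --no-save < job.R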

Current Status

The first round of alpha testers has reported great success running BoscoR.  From installation to submission, all alpha testers successfully ran R jobs on remote clusters without any help from the Bosco developers.  We also got much-needed feedback from the R users on how the interface worked for them.

The most common feedback concerned the use of the apply function for running multiple jobs.  The testers are used to the apply function built into the R language, which runs a function once for every element of a list and returns a list of results.  GridR can do this, but it is not the default behavior of GridR's apply, so it presented an interface inconsistent with the rest of the R language.  The Bosco team is working hard on this, and it will be fixed in the next BoscoR beta release, out next week.

The Bosco team has also submitted the updates to GridR for review by CRAN.  Since GridR has not received an update since 2009, many policies have changed regarding how packages may interact with the user's environment.  As mentioned earlier, asynchronously modifying the user's environment, such as setting a variable to the result of a remotely executed function, is against CRAN policy.  I am confident that this can be worked out, but it will take effort to get the updated GridR into CRAN, and it may take a while.

Closing Thoughts

Integrating with scientific applications and frameworks is the next step for Bosco in increasing usability.  Bosco cannot remain just a tool to be used; it must have interfaces built on top of it that connect with researchers and reach out to them.  Researchers do not want to leave their environment, and BoscoR provides a way to stay in their comfort zone while completing their research at new levels.

Getting BoscoR

You can get BoscoR from the Bosco website.

Wednesday, June 5, 2013

HCC OSG Workshop

Over the last 2 days, HCC has been running a workshop on:

  1. Building an HTC Cluster
  2. Running on an HTC Cluster

Building an HTC Cluster


This part of the workshop was very successful.  We were able to help small universities across Nebraska build their very own HTC clusters.  We went through the installation of Cobbler, Puppet, and Condor.

Building an HTC Cluster

Running on an HTC Cluster

Documentation  - Largely borrowed from the 2012 OSG Summer School

We had many more people for the session on running on an HTC cluster.  This was another very successful day.  The users learned about HTC: how to write a Condor submit file, how to manage workflows, and how to transfer data.  We wanted to focus more on the applications and general concepts of HTC (the 'theory') rather than the particular technologies.  This is why we chose to talk about the portability of an application (R) and data transfer using HTTP.

As always, the conversations between the exercises were the most productive.  Users picking the brains of the experts tends to lead to future collaboration.

HCC OSG Workshop

Friday, May 31, 2013

Bosco at GPN Conference


This week, I presented the Bosco poster at the Great Plains Network annual meeting, and it received a very good reception.  Many people were interested in what Bosco could do for them.  Most of the audience were HPC center admins or directors, so they were looking at how they could use Bosco to help their users utilize their campus clusters.

At the poster presentation
At the conference we heard a lot about networks, as GPN is mostly a networking collaboration.  But we also heard about the Condo of Condos proposal, in which the OSG is well represented (by Miron).  There is a very good webcast describing Condo of Condos on the I2 website.  It feels like it's very early in the planning phase, but I am curious how the OSG will integrate with the Condos.

The weather has been terrible, but the meeting has been great.


Monday, May 20, 2013

Submitting R jobs with Bosco

The Bosco team has been working on integrating with the R statistical processing language.  We have chosen to modify the GridR package for this integration.

How will the R user see Bosco?

The goal of the integration is to simplify submitting processing written in the R language to remote clusters and grids.  The expected steps for the integration are:
  1. Install Bosco
  2. Install the Bosco'ified GridR package into your local R environment.
After installing the two pieces of software above, the user creates an R script that includes the function to be executed on the remote cluster.  The user can send any data as input: lists, tables, or an entire CSV file (already read into an R variable).  The function's output will be automatically imported into the environment when the remote job has completed.

Below is a demo of the GridR package working with Bosco to submit to a campus cluster here at Nebraska.

RStudio IDE showing demo of Bosco + GridR integration
The steps in the demo are:
  1. Load the GridR library
  2. Create the function, in this case simply named 'a', which doubles the value of its argument.
  3. Initialize the GridR integration to talk to Bosco
  4. "Apply" the function.  Run the function 'a', with the input 14, and write the result to the variable "x".  Also, wait for the remote job to complete.
  5. Finally, I printed out the value of x, which is 28, double the 14. 
This is a very simple demo.  You could imagine the function sent to the remote machine parsing a CSV file, or performing more complex operations...

The Bosco team expects to have this integration done and in production by mid-July, in time for the R users meeting.

Bosco Download

Tuesday, April 2, 2013

Reprocessing CMS events with Bosco

Prior to the LHC long shutdown, the CMS experiment increased the trigger rate of the detector, increasing the volume of data coming off of it.  The Tier-0 was unable to process all of the incoming events, so the events were only stored, not processed.  After the run, the experiment wanted to process the backlog of events but didn't have the computing power available to do it.  So they turned to opportunistic computing and Bosco.

The CMS collaborators at UCSD worked with the San Diego Supercomputer Center (SDSC) to run the processing on the Gordon supercomputer.  Gordon is an XSEDE resource and does not run a traditional OSG Globus gatekeeper.  Also, we did not have root access to the cluster to install a gatekeeper.  Therefore, Bosco was used to submit and manage the GlideinWMS Condor glideins on the resource.

Running jobs at Gordon, the SDSC supercomputer


As you can see from the graph, we reached nearly 4,000 CMS processing jobs on Gordon.  4,000 cores is larger than most CMS Tier-2s and as big as a European Tier-1.  With Bosco, overnight, Gordon became one of the largest CMS clusters in the world.

Full details will appear in a paper submitted to CHEP '13 in Amsterdam, and Bosco will be presented in a poster (and paper) as well.  I hope to see you there!

(If I got any details wrong about the CMS side of this run, please let me know.  I have intimate knowledge of the Gordon side, but not so much the CMS side).


Bosco Download

Tuesday, March 5, 2013

Running Quantum Espresso on the OSG

While running Quantum Espresso on the Open Science Grid, we found a number of issues:
  • OpenMPI needs an rsh binary present.  Even if you are using shared memory and OpenMPI never actually invokes rsh, it still looks for the binary and fails if it cannot find it.
  • The chroots used on HCC machines for grid jobs do not support ptys.  OpenMPI has a compile option to turn off pty support.
Once these issues were fixed, we were able to submit QE jobs to the OSG using Condor's partitionable slots on 8 cores.

Preparing Submission

Before submitting our first QE job, we had to compile OpenMPI and QE.  Since we are an HPC center, our OpenMPI was compiled against our InfiniBand stack, so it would always fail on the OSG, where there is no InfiniBand (let alone our particular brand and drivers).
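
For reference, the build looked roughly like this (a sketch; these flag names are from the OpenMPI 1.x era and should be treated as assumptions rather than our exact configure line):

# Build OpenMPI without pty support and without the InfiniBand (openib) BTL
./configure --prefix=$HOME/openmpi-osg --disable-pty-support --without-openib
make
make install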

After compiling, we created compressed archives containing the files required to run QE:
  • bin.tar.gz - includes only the cp.x binary, specific to our run; it could just as well have included the more common pw.x.
  • lib.tar.gz - includes the Intel math libraries and libgfortran.
  • openmpi.tar.gz - includes the entire OpenMPI install directory (the result of make install).
Additionally, we wrote a wrapper script, run_espresso_grid.sh, that unpacks the required files and sets the environment.

#!/bin/bash
# Unpack the binaries, libraries, pseudopotentials, and the OpenMPI install
tar xzf bin.tar.gz
tar xzf lib.tar.gz
tar xzf pseudo.tar.gz
tar xzf openmpi.tar.gz
mkdir tmp

# Point the job's environment at the unpacked binaries and libraries
export PATH=$PWD/bin:$PWD/openmpi/bin:$PATH
export LD_LIBRARY_PATH=$PWD/lib:$PWD/openmpi/lib:$LD_LIBRARY_PATH
# Tell the relocated OpenMPI where its install directory now lives
export OPAL_PREFIX=$PWD/openmpi

# Run cp.x on 8 cores, pointing OpenMPI at the rsh binary shipped with the job
mpirun --mca orte_rsh_agent `pwd`/rsh -np 8 cp.x < h2o-64-grid.in > h2o-64-grid.out

Submission

We used GlideinWMS to submit to the OSG; below is our HTCondor submit file.
universe = vanilla
output = condor.out.$(CLUSTER).$(PROCESS)
error = condor.err.$(CLUSTER).$(PROCESS)
log = condor.log
executable = run_espresso_grid.sh
request_cpus = 8
request_memory = 10*1024
should_transfer_files = YES
when_to_transfer_output = ON_EXIT_OR_EVICT
transfer_input_files = bin.tar.gz, lib.tar.gz, pseudo.tar.gz, openmpi.tar.gz, h2o-64-grid.in, /usr/bin/rsh
transfer_output_files = h2o-64-grid.out
+RequiresWholeMachine = True
Requirements = CAN_RUN_WHOLE_MACHINE =?= TRUE
queue

Note that we pull rsh from the submit machine.  OpenMPI does not actually use rsh to start processes on a shared-memory machine, but it does require that the rsh binary be available.

Acknowledgments

This was done with the tremendous help of Jun Wang.




Tuesday, February 5, 2013

Using Bosco to submit to Amazon EC2

A homework assignment for my storage class required running ~30 hours of benchmarks on the btrfs and ext4 filesystems.  I thought this would be an excellent time to test Bosco's ability to submit to Amazon EC2 and parallelize the benchmarks.

Preparing Submission

In order to start instances on Amazon EC2, you first need to sign up.  Go to https://aws.amazon.com/ and sign up in the top right.  After you sign up, you will need the access key and secret key.  These can be found under 'Security Credentials' in the account drop-down menu, in the 'Access Keys' tab under 'Access Key ID' and 'Secret Access Key'.  Write those values to two files; you will need them when you submit EC2 instances.
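
Something like this works for storing them (the file names and location are your choice; the submit file only needs the paths, and the key values here are redacted placeholders):

mkdir -p ~/.ec2
echo "YOUR_ACCESS_KEY_ID" > ~/.ec2/access_key
echo "YOUR_SECRET_ACCESS_KEY" > ~/.ec2/secret_key
# Keep the credentials readable only by you
chmod 600 ~/.ec2/access_key ~/.ec2/secret_key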

Screenshot of Amazon Security credentials site


Next, you will need a script to run at startup of the Amazon instance.  When the instance starts up, a service named cloud-init also starts on the instance.  It interprets the user data file as a shell script, which can set up and start any other services you would like.  My shell script is provided below.
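The embedded script no longer renders here, so below is a sketch that captures the idea (the repository URL and script names are placeholders, not the originals):

#!/bin/bash
# Install python-boto (Python bindings for S3/EC2) and git
yum install -y python-boto git
# Fetch the benchmark runner for the homework and start it
git clone https://example.com/filebenchrunner.git /opt/filebenchrunner
cd /opt/filebenchrunner
./run.sh
# Uncomment to shut the instance down when the benchmarks finish:
# poweroff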

This shell script installs python-boto (Python bindings for S3 storage and EC2) and git onto the instance.  Next, it downloads filebenchrunner (the benchmark runner for the homework) and starts it.  Most people will probably want to shut the instance down after the processing is done; in that case, you can just add a 'poweroff' at the bottom.

Running the Instance

Running an Amazon instance is as easy as running a Bosco job.  First, you must create a Bosco submit file.  Below is the one I used:


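The embedded submit file no longer renders here, so the following is a reconstruction of its general shape based on the description below; the AMI ID, file names, and paths are placeholders:

universe = grid
grid_resource = ec2 https://ec2.amazonaws.com/
executable = ec2_benchmark_job
ec2_ami_id = ami-XXXXXXXX
ec2_instance_type = m1.medium
ec2_spot_price = 0.04
ec2_access_key_id = /home/user/.ec2/access_key
ec2_secret_access_key = /home/user/.ec2/secret_key
ec2_keypair_file = keyfile.$(Cluster)
ec2_user_data_file = userdata.sh
# Terminate the instance if it is still running after 150 minutes
periodic_remove = (JobStatus == 2) && (CurrentTime - EnteredCurrentStatus > 150*60)
queue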
Some important things to note.  I specified ec2_spot_price, which is the amount I am willing to pay per hour for my m1.medium instance.  I said $0.04 an hour, which is pretty low but reasonable for a medium instance.  You can find all of the current spot prices either in the AWS console or on the EC2 website.  As you can see, spot prices are much, much lower than the on-demand price of an instance.  For example, for the m1.medium instance, which has 1.7 GB of RAM and 1 core, the spot price is currently $0.013 per hour, while the on-demand price is $0.120 per hour.  That's a 90% discount on an m1.medium.

Of course, you should always read about the downsides of using a spot instance, such as the fact that Amazon can terminate it at any time, without warning.  For my benchmarks, that is acceptable: I can always re-run a benchmark if my instance is terminated.  I needed to run ten 10-minute benchmarks, so after every benchmark I immediately uploaded the resulting data to S3 so I wouldn't lose any work if the instance was terminated.

Also, I used the regular Amazon Linux AMI; the AMIs are listed on the Amazon website.  I could just as well have used CentOS, Ubuntu, or any other Linux image for my instance, but I prefer the official Amazon Linux AMI since it provides a very up-to-date OS that feels very similar to a CentOS 6 instance.  For example, it uses yum for repository management and RPMs for installation, and its package versions (except for the kernel) are similar to CentOS 6.

I also added a special command, periodic_remove, to terminate the instance if something goes wrong inside it.  Sometimes yum can hang, or the instance may not start up properly.  In those cases, Amazon will not notify you of the problem, and Bosco will not be able to determine that there is an issue.  Since my benchmarks should not last longer than 100 minutes, I automatically remove the instance after 150 minutes of running (a little breathing room).

You may submit the instance with the normal 'condor_submit' command.  The job will move to the Running state when the instance has begun running.

Once the instance has started, you may ssh into it using the unique ssh key that Bosco generates for you; it is specified in the submit file as ec2_keypair_file.  You also need the DNS name of the instance, which is available in the job's classad:
condor_q -run

The command will output the hostname of the EC2 host.  You may connect to the EC2 instance with the following command, replacing XXXXX with the job number and hostname with the address from the above command:
$ ssh -i keyfile.XXXXX ec2-user@hostname

Summary

Pros of using Bosco to submit Amazon EC2 Jobs:
  • Simple management of Amazon instance from your workstation.
  • Specify spot price right inside of the job description.
  • Ability to bootstrap the instance easily with user data scripts.
  • Ability to use HTCondor policies to manage the instances, such as the periodic_remove statement above.
Cons:
  • The EC2 universe is only available in Linux builds of Bosco; you cannot manage EC2 instances with the Mac version of Bosco.
  • Amazon EC2 has hundreds and hundreds of features; Bosco only exposes simple EC2 instance submission and spot pricing, so you will not be able to use the vast majority of them when managing your instances with Bosco.  But if all you need is to run some processing, Bosco is great!

Bosco Download

Wednesday, January 16, 2013

Rendering with Bosco

Example image rendered using HCC's distributed renderer

At the Nebraska Holland Computing Center, we take pride in eating our own dog food, so I want to highlight one of our uses of Bosco to enable transparent usage of our clusters.

HCC has recently made a push to enable non-traditional users of clusters.  Sure, it's easy to show how physics can benefit from a set of clusters, but what about the media arts?  In this case, we are working with a professor whose students render short movies using Maya.

Maya can utilize another Autodesk product called Backburner, which enables rendering across multiple nodes.  The challenge was to allow Backburner to operate in a shared cluster environment.  The standard way to use Backburner expects nodes dedicated solely to running the Backburner server daemons.  This master-worker pattern fits the traditional HTC model well, so we felt that Bosco would be a great fit for enabling submission to our clusters.


Architecture Diagram of the BOSCO enabled render (Credit: Adam Caprez)
In the architecture shown in the image above, we run a central service, hcc-render, that monitors the Backburner queue and submits the Backburner servers to the Bosco queue.  Bosco then submits a glidein to Tusker, which in turn runs the Backburner server and renders the scene.

In this case, since we are using the Backburner server for the actual processing, the data is handled internally by Backburner.  In practice, this means the data is stored on the Tusker file system, which is mounted by the professor's local machine, which is in turn mounted by the clients.

This is a classic example of deeply integrating HTC into the user's application.  The user only needs to click the 'render' button from within Maya on their workstation, and we handle all the rest automatically.  This is only made possible because Maya has a sane back-end renderer that is designed to run on Linux; this architecture may not work for all commercial applications.

Even though the user isn't directly using Bosco (direct use being a primary goal of Bosco), this is an excellent use of Bosco to enable HTC workflows.

Bosco Download

Monday, January 14, 2013

Bosco 1.1.1 Release

Today I am pleased to announce that Bosco version 1.1.1 has been released.  This is a patch release on top of 1.1, addressing two issues that were affecting users.  The release is available on the Bosco Download Page.

Release notes for the 1.1.1 release are available on the OSG Twiki.

Bosco 1.1 was a major release with many new features. The 1.1 release notes are also available on the OSG Twiki.

On behalf of the Bosco Team,
Derek Weitzel
Dan Fraser
Marco Mambelli
Jaime Frey
Brooklin Gore
Miha Ahronovitz
Bosco Download

Improving Gratia's Web Interface

Over the winter break, I worked on improving the interface that most users use for OSG accounting.  When I returned from break, I worked with Ashu to integrate my changes with some recent changes he had made.  The new interface runs on gratiaweb-itb, and the source for the new web page is hosted on GitHub.

The first thing users will notice is the newly designed interface:
New OSG Accounting Interface
The updated interface brings the style of the website in line with that of MyOSG and OIM (or close to it).  The design stayed close to the original, but the menu on the right has changed significantly.  The first change is simply the style of the menu, but we also added a new category: Campus and Pilot View.

In the Campus and Pilot View, we have some new graphs that show usage by GlideinWMS, Campus users, XSEDE users, and in the future, Bosco users.

Let's run through a quick example.  Let's assume I'm a VO manager and want to see where my VO is running, how many hours it has used, and who is running jobs.

  1. Select the Pilot & Campus Accounting link.
  2. Scroll to the bottom, to the Refine View.
  3. Enter your VO name into the VO text box and hit enter.
This will pull up the custom page that shows usage for only your VO.  For example, if I look at the osg VO:
Usage by the OSG VO.
You can see from the graphs that the osg VO has used ~80,000 CPU hours a day on the OSG.  Also, it is running at over 20 sites.  The sites at the bottom of the graph are listed in order of total hours (I am happy to see Nebraska resources at #3, #6, and #9).

You can also see from the graph that usage at sites depends on the day.  Some days they get significant usage at the MWT2 (UChicago and IU), and some days they run a lot at Nebraska.

The new usage graphs are intended to help users, administrators, and VO managers view their usage.  I hope you find them as useful as we do.

We hope the new webpage is an improvement.  If you have any comments on further improvements, we would love your feedback.

Tuesday, January 8, 2013

Bosco 1.1 Release

In the last few months, I have outlined the new features of Bosco 1.1.
I am happy to announce that today we are releasing Bosco 1.1.  The official release notes are available on the OSG Twiki, and the release includes all of the features outlined above.  Try it out!