Friday, April 20, 2012

Developments at Nebraska

I thought I would do a quick post about recent developments at Nebraska.

Tusker & OSG

New Tusker Cluster
We recently received a new cluster, Tusker.  It is the newest in our line of clusters that prioritize memory per core and cores per node over clock speed.  Accordingly, the cluster has 104 nodes, 102 of which are 64-core, 256GB nodes.

The goal of this cluster is to increase throughput for local user jobs while allowing grid jobs to backfill idle cores.  The current breakdown of local and grid jobs can be found on the HCC monitoring page.

A common complaint among our local users is interference between processes sharing a node.  To address this, we patched Torque to add cgroup support for CPU isolation; memory isolation should go into production in the next few weeks.  This will affect grid jobs by restricting each one to a single core.
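For the curious, the cpuset cgroup controller is the mechanism behind this kind of isolation.  Roughly, a patched pbs_mom does the equivalent of the following for each job; the mount point, job id, and $JOBPID here are illustrative assumptions, not the actual patch:

```shell
# Sketch of per-job CPU isolation with a cpuset cgroup (paths/ids are examples).
# Create a cgroup for the job under the cpuset hierarchy.
mkdir -p /cgroup/cpuset/torque/job.12345

# Pin the job to a single core and a single memory node.
echo 0 > /cgroup/cpuset/torque/job.12345/cpuset.cpus
echo 0 > /cgroup/cpuset/torque/job.12345/cpuset.mems

# Attach the job's process; all of its children inherit the confinement.
echo $JOBPID > /cgroup/cpuset/torque/job.12345/tasks
```

Any processes the job forks stay inside the cgroup, so a grid job cannot spill onto cores reserved for local users.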

Nebraska's goal is to support all OSG VOs and give them equal priority (albeit lower than local users).  All OSG VOs are welcome to run on Tusker.

Nebraska's Contribution to OSG Opportunistic Usage
Opportunistic usage by Site (source)
Nebraska sites have become the largest contributors of opportunistic usage: easily over a quarter of all opportunistic usage is happening at Nebraska.  We hold the #1 (Tusker) and #2 (Prairiefire) spots, and Firefly's combined CEs rank #7.  We are very proud of this contribution, and hope it continues.

Stay tuned, the next year should be exciting...

Monday, April 16, 2012

BOSCO + Campus Factory

Checklist while implementing CF + Bosco integration

For the last several months, the campus infrastructure team has worked on software to help users create larger, more inclusive campus grids.  The goal has largely been to make the software easier to install and expand.

Integrating the Campus Factory (already used on many campuses) with Bosco has been a key goal of this effort.  Last week, I finally integrated the two (Beta Install Doc).  The combination has many benefits for the user over both the current Campus Factory and the current Bosco.

| Feature | Campus Factory | Bosco v0 | Campus Factory + Bosco |
|---|---|---|---|
| Installation | Large installation/configuration instructions.  Install Condor and the campus factory on every cluster. | Install on a central submit node. | Configuration is handled automatically. |
| Adding Resources | Install Condor and the campus factory on every cluster, and configure it to link to the other submit and campus factory clusters. | Run the command `bosco_cluster -add`. | Installation and configuration handled auto-magically. |
| File Transfer | Using Condor file transfer, can transfer input and output. | Manually with scp by the user.  No stderr or stdout from the job. | Using Condor file transfer, can transfer input and output. |
| Job State | Accurate user job state. | Delayed user job state.  No exit codes from the user jobs. | Accurate user job state. |
As you can see from the table above, the combined Campus Factory + Bosco takes the best from both technologies.
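To make that concrete, here is a sketch of what the combined workflow could look like from a user's submit node.  The hostname, username, and file names are made up for illustration, and the exact submit-file attributes may vary by Condor version:

```shell
# Add a remote cluster once; Bosco handles installation and configuration
# (hostname and username below are examples, not real resources).
bosco_cluster -add hcc-user@login.cluster.example.edu

# Then submit as to any Condor pool; Condor file transfer moves input and
# output automatically, and stdout/stderr come back with the job.
cat > job.submit <<'EOF'
universe                = vanilla
executable              = analyze.sh
transfer_input_files    = data.in
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
output                  = job.out
error                   = job.err
log                     = job.log
queue
EOF
condor_submit job.submit
```

Because the Campus Factory presents the remote clusters as an ordinary Condor pool, the user never has to scp files by hand or poll the remote batch system for job state.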