Thursday, November 29, 2012

BOSCO v1.1 Features: Multi-Cluster Support

This is the fourth in the series of posts on features in the 1.1 release of BOSCO.  The previous posts focused on SSH File Transfer, Single Port Usage, and Multi-OS Support, all of which are new in the 1.1 release.  Now I want to talk about a feature that was already in the 1.0 release but is important enough to discuss again: Multi-Cluster Support.  This feature is technically challenging, so I will start with why you should care about Multi-Cluster in BOSCO.

Why do you care about Multi-Cluster?

On a typical campus, each department may have its own cluster.  Physics may have a cluster, Computer Science has one, and Chemistry may have another.  Or a computing center may have multiple clusters reflecting multiple generations of hardware.  In either case, users have to pick which cluster to submit jobs to, rather than submitting to whichever one has the most free cores.

You don't care which cluster you run on.  You don't care how to submit jobs to the PBS Chemistry cluster or the SGE Computer Science cluster, and who wants to learn two different submission methods anyway?  You only care about finishing your research.

BOSCO can unify the clusters by overlaying each of them with an on-demand Condor pool.  That way, you only have to learn the Condor submission method.  The Condor job you submit to BOSCO will then run at whichever cluster has free cores first.

What Is Multi-Cluster?

In BOSCO, Multi-Cluster is the feature that allows submission to multiple clusters with a single submit file.  A user can submit a regular Condor submit file, such as:
universe = vanilla
output = stdout.out
error = stderr.err
Executable = /bin/echo
arguments = hello
log = job.log
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
queue 1

When the job is submitted, BOSCO will submit glideins to each of the clusters that are configured in BOSCO.  The glideins will start on the worker nodes of the remote clusters and join the local pool at the submission host, creating an on-demand Condor pool.  The jobs will then run on the remote worker nodes through the glideins.  This may be best illustrated by a picture I made for my thesis:

Overview of the BOSCO job submission to a PBS cluster

In the diagram, BOSCO first submits the glidein to the PBS cluster.  Second, PBS schedules and starts the glidein on a worker node.  Finally, the glidein reports back to the BOSCO submit host and starts running user jobs.
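
To make that concrete, here is roughly what unifying two clusters looks like from the command line.  The hostnames are made up, the submit file above is assumed to be saved as job.submit, and the exact bosco_cluster option spelling may differ between releases:
$ bosco_cluster --add user@pbs-chem.example.edu pbs
$ bosco_cluster --add user@sge-cs.example.edu sge
$ condor_submit job.submit
$ condor_q

After both clusters are added, the single submit file is all you ever touch; BOSCO decides where the glideins, and therefore your jobs, actually land.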

The Multi-Cluster feature will continue to be an integral part of BOSCO.

The beta release of BOSCO is due out next week (fingers crossed!).  Keep watching this blog and the BOSCO website for more news.



Tuesday, November 6, 2012

BOSCO v1.1 Features: Multi-OS Support

This is part 3 of my ongoing series describing new features in BOSCO v1.1.  Part 1 covered file transfer over SSH, Part 2 covered single port usage.  This post will cover the Multi-OS Support in BOSCO.

What is it?

The Multi-OS feature is intended to allow users to submit to clusters that may not be running the same operating system as the submit host.  This is especially useful when a user's submit host is their personal computer running Debian while the supercomputer in the next building is running Red Hat 6, a common occurrence.

The Multi-OS components follow a basic process in order to operate:
  1. Detect the remote operating system with the findplatform script.
  2. Download (from the cloud?) the appropriate bosco version for the platform.
  3. Transfer the files needed to the remote cluster, including the libraries and binaries for the campus factory's glideins.  The glidein creation takes the most time in this process, as it needs to compress the libraries and binaries before transferring them.
  4. When BOSCO detects jobs idle on the submit host, it will start glideins appropriate for the platform to service the jobs.
The Multi-OS support required modifying the cluster-addition process and adding both a findplatform script and a glidein_creation script.
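
For the curious, the platform detection is conceptually similar to the sketch below.  This is not the actual findplatform script, just an illustration of the idea, and the hostname is a placeholder:
# Sketch only: ask the remote login node what it is, so that the
# matching bosco build can be fetched for it.
$ ssh user@cluster.example.edu 'uname -m; cat /etc/redhat-release 2>/dev/null || cat /etc/issue'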

Why do I care?

It is becoming increasingly common for users' machines and the supercomputers they use to run different operating systems.  When that is the case, it is difficult to install software built on one onto the other.  The Multi-OS feature will greatly simplify the installation of BOSCO on clusters.

Our goal with the Multi-OS support is that users may not even know it is working.  The user just says, "I want to run on this cluster", and BOSCO makes it happen, no matter what operating system is running on the remote cluster.

One of my tests simulated a possible user scenario.  I was running an up-to-date RHEL 6 machine on which I installed BOSCO, and I wanted to submit jobs to a RHEL 5 cluster located in our datacenter.  If I had simply copied over the bosco install from the RHEL 6 submit host, none of the binaries would have worked.  Instead, I used bosco_cluster -a to add the RHEL 5 cluster, and jobs ran seamlessly from the RHEL 6 machine to the RHEL 5 cluster.

The Multi-OS support is available in the latest alphas on the bosco download page.

Thursday, November 1, 2012

BOSCO v1.1 Features: Single Port Usage

Welcome to part 2 of my ongoing series of v1.1 features for BOSCO.  Part 1 was on SSH File Transfer.

This time, I'll talk about a new feature that we didn't plan on implementing at first: using only a single port for all communication.  After a small investigation, we discovered that using a single port is very simple and doesn't interfere with other components.  I talked briefly about it in a previous post.

What is it?

In BOSCO 1.0, the submit host needed a lot of open ports for connections originating from the remote clusters.  This was caused by 2 mechanisms:
  1. File transfer from the BOSCO submit host to the cluster login node before issuing the local submit call (qsub, condor_submit...).  This opens ports on the submit host because the cluster would call out to the submit host to initiate transfers.
  2. Connections for control, status, and workflow management between the cluster worker nodes and BOSCO submit host.  This is the Campus Factory, which gives BOSCO the traditional Condor look and feel.
For BOSCO to function, the submit host needed a large swath of open ports, and as you scaled up, you needed even more.

The file transfers from the submit host to the login node are now being transferred using SSH, see my previous post.

With the new single port usage feature, all control, status, and workflow management connections are routed through HTCondor's shared port daemon on port 11000 (which is hardcoded, but which I picked at random).
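
For the curious, this boils down to the same two HTCondor knobs I used in my earlier post about flocking from behind a firewall; BOSCO just sets them for you, roughly like this (the exact config file they live in depends on your install):
USE_SHARED_PORT = True
SHARED_PORT_ARGS = -p 11000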

Why should I care?

Limiting BOSCO to a single incoming port is very useful for users on systems they do not manage.  The node only needs 1 port open in order to run BOSCO: 11000.  If the system has a firewall, you only have to request that port 11000 be opened, rather than a huge swath of ports.  And if you manage the system yourself, you will be happy that only 1 port needs to be opened to allow BOSCO submissions.

Administrators will like this feature as it is more in line with other applications that they may run.  For example, httpd only requires 1 port, 80.  Now BOSCO is in the same realm, only requiring 1 port, 11000.

Monday, October 29, 2012

BOSCO v1.1 Features: SSH File Transfer

I am hoping to write about a few of the new features of BOSCO 1.1 before it comes out in December or January.  This is part 1 of that series:

BOSCO v1.1 Feature: SSH File Transfer

What is it?
The SSH File Transfer feature improves the method of staging input and output files to the remote cluster.  In 1.0, files were transferred by starting a daemon on the remote cluster that connected back to the submit host over a random port.  This required a lot of open ports on the submit host.

The new SSH File Transfer will limit the number of ports required on the submit host.  BOSCO will now transfer files over a port that is forwarded over the SSH connection that BOSCO maintains with the remote cluster.  The transfers are inherently secure because they go over the SSH connection, and they are also authenticated by the Condor daemons on either end of the connection (remote cluster and submit host).
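
Conceptually, this is the same trick as a manual SSH reverse tunnel.  BOSCO sets it up automatically, but a hand-rolled equivalent (the hostname and port here are purely illustrative) would look something like:
$ ssh -R 12345:localhost:12345 user@cluster.example.edu

Anything on the cluster that connects to localhost:12345 is then carried back over the existing SSH connection to the submit host, so no new inbound ports are needed.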

This fits into the BOSCO team's goal of lowering the number of ports used by Condor.  Our eventual goal is to use the Shared Port Daemon to limit the required ports on the submit host to 1.

Why should I care?
This will greatly reduce the number of ports required if you are only using the universe=grid method of submitting jobs.  In fact, it will reduce the open ports on the submit host to 0.  That means no more configuring firewalls for BOSCO (without campus factory support, see below).  Additionally, there is no new configuration required for this feature; it 'just works' (famous last words?).
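
For context, the universe=grid path means a submit file that points directly at the remote batch system, along these lines (the grid_resource value shown is an example for a PBS cluster; your login, hostname, and batch type will differ):
universe = grid
grid_resource = batch pbs user@cluster.example.edu
executable = /bin/echo
arguments = hello
output = stdout.out
error = stderr.err
log = job.log
queue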

The Campus Factory, which adds features such as fault tolerant Condor file transfer and transparent multi-cluster support, still requires multiple open ports in the firewall.  Additional effort will be required to change the Campus Factory configuration and daemons to support the single port.  I hope that a single port will be all that is needed for v1.2.  

What's Next

Over the next couple weeks, I hope to write more about upcoming features such as:
  • Multi-Platform support (i.e. the cluster and submit host are different platforms)
  • Mac OSX Support
  • Improved Multi-User support

Thursday, October 18, 2012

The Island Grid: Puerto Rico

In the last post, I discussed my trip to Puerto Rico.  Now that my trip is over and I am back in Lincoln, I would like to share some of the successes of the trip.  I had a great time exploring Old San Juan, and I have pictures as well.

Presenting the Holland Computing Center grid

Creating a new User

But on a more relevant note, the Campus Grid in Puerto Rico gained a new user while I was there.  I always feel that computing infrastructure is best built from user demand.  That has certainly been the case at HCC, where we run at 95%+ utilization on our HPC clusters.  I met with Steve Massey to find out how the Island Grid can help him.

Steve Massey is a bioinformatician at the University of Puerto Rico -- Rio Piedras.  His work is an ideal fit for High Throughput Computing: his processing follows the model of running the same executable against many, many protein pdb files.  We talked for a while on Tuesday, both before and after the power was cut to the UPR campus, about how we can enable this work on the UPR campus grid, flocking to UNL, and finally the OSG.  While I was in PR, I worked with Steve to run one of his workflows on the OSG.

I'm not going to pretend to know what is really happening with the workflow, but it takes as input a set of DNA sequences that were pre-calculated (I believe somewhat random) and a pdb file, and calculates the robustness using an external application, Scwrl4.  The output is a robustness file listing the robustness of the protein for the DNA sequences.

I was able to run this workflow on the OSG using Nebraska's GlideinWMS interface.  I created a submit.sh script that writes out a simple Condor submit file, and a wrapper.sh that configures the environment on the worker node.  Both scripts are available on github.  Together, these two components make up the workflow.

There is still work to be done.  The executable that Steve wrote does not properly detect the length of the amino acid strand, and therefore is not able to properly calculate robustness and/or send it to Scwrl4.

Also, there is another workflow that Steve would like to run.  I hope to continue to work with him to enable these workflows.

Onto the Grid!

In addition to talking to Steve about creating workflows, I also discussed how to integrate his small cluster into the Island Grid in Puerto Rico.  The primary difficulty with adding his cluster is the university's very restrictive firewall.  Therefore, we will be unable to run a central submission host on his cluster, and flocking is out.  But I believe that, between Jose and myself, we have a solution.  Either he can run BOSCO and submit through SSH to Steve's SGE cluster, or we can install Condor worker nodes on Steve's cluster that report to the primary campus cluster, nanobio.

I also talked with a Computer Science researcher about putting his Mac Mini cluster on the grid.  He is running Linux on the Mac Minis, so they would be a good fit for the grid.  The frontend node of the Mac Mini cluster is not running Linux, however, so BOSCO will not work, as it needs a consistent platform across the frontend node and the worker nodes.  Instead, we will have to install Condor worker nodes on the machines themselves in order to access the Mac Minis.

Conclusions

I believe that Puerto Rico has a lot to gain from creating an Island (Campus) Grid.  They also have a very energetic administrator, Jose, who can spearhead the implementation in close collaboration with myself and the campus grids team.

Monday, October 15, 2012

Building a Campus Grid on a Island

Hello everyone from San Juan, Puerto Rico.

Hanging Out
I joined my advisor, David, at the Track 2 external advisory board meeting.  My goals while here are:

  • Work with Steve Massey to enable his workflow to run on the UPR campus grid.
  • Enable HPCf to flock with other local Condor clusters (Steve's and CS's).
  • Continue to work with HPCf to enable gratia accounting on their resources.
  • Enable HPCf to Nebraska flocking.

Thursday, September 27, 2012

Flocking to the OSG behind a restrictive Firewall

On many campuses, restrictive firewalls are the rule rather than the exception.  Here at Nebraska, we have the ability to define our own network policies, but some universities and labs have much more restrictive firewall policies.  In this post, I want to talk about how you can still flock to the OSG with only 1 port open to the external world.

What you will need before beginning:
  • A RHEL 5 or 6 machine with root access.
  • The machine needs to have a public IP address, but it does not need a lot of open ports; it actually needs only 1.  Make a note of which port is open.
  • An OSG certificate: I know that most people do not like using certificates.  Actually, no one likes using certificates.  Here is the newer, easier place where you can apply for a new(ish) DigiCert certificate.  I'm not sure if it's in production yet.

First, starting on the machine as root, we will install the OSG repos.  The instructions are on the OSG Twiki.  For RHEL 5, here are the short instructions:
$ rpm -Uvh http://dl.fedoraproject.org/pub/epel/5/i386/epel-release-5-4.noarch.rpm
$ yum install yum-priorities -y
$ rpm -Uvh http://repo.grid.iu.edu/osg-el5-release-latest.rpm

Next we will install Condor and the osg-condor-flock packages on the machine.  Documentation on setting up a Condor flock is also on the OSG Twiki:
$ yum install condor osg-ca-certs -y
$ yum install --enablerepo=osg-development -y osg-condor-flock

After installing these tools, we need to configure condor to use our certificate, and to flock to our glideinwms provider of choice.

Next, we need to configure the host to use your certificate.  In order to do this, we make your certificate the 'host' certificate.  Copy your cert and key to: /etc/grid-security/hostcert.pem and /etc/grid-security/hostkey.pem.
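
Assuming your user certificate and key are in the usual ~/.globus location (adjust the paths if yours live elsewhere), that looks like:
$ cp ~/.globus/usercert.pem /etc/grid-security/hostcert.pem
$ cp ~/.globus/userkey.pem /etc/grid-security/hostkey.pem
$ chmod 444 /etc/grid-security/hostcert.pem
$ chmod 400 /etc/grid-security/hostkey.pem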

Next, we select the glideinwms provider to use.  This is covered in more depth on the OSG Twiki.  You will also need to send your certificate to the OSG gateway provider you have chosen.  Both have pages on the OSG OIM with contact information: OSG Gateway.  HCC Gateway.

Next, we need to set condor to use only a single port, and specify that port. In /etc/condor/config.d/99_osg_flock.conf, add the following lines.
USE_SHARED_PORT = True
SHARED_PORT_ARGS = -p 4080

The port number (given by the argument after -p) can be any arbitrary port.  Additionally, you will need to open that port in the firewall.  In iptables, add lines like:
-A RH-Firewall-1-INPUT -p tcp -m tcp --dport 4080 -j ACCEPT
-A RH-Firewall-1-INPUT -p udp -m udp --dport 4080 -j ACCEPT

And that is all you need.  Start up condor with 'service condor start', and you're on your way to running on the OSG with only 1 port open to the world.

Friday, June 29, 2012

CMS with the Campus Factory


The Campus Factory is usually used by small research groups to expand their available resources to other clusters on the campus.  Of course, that's not always easy for larger VOs, which tend to have more complicated software setups.  This is where the combination of Parrot and CernVM-FS comes in.
Source: http://cernvm.cern.ch/portal/filesystem

CernVM-FS is an HTTP-based file system that serves the software repositories of many CERN-based VOs.  In our case, we used a CernVM-FS server hosted at the University of Wisconsin - Madison (Docs).

Parrot is a program that will capture reads and writes from arbitrary executables and redirect them to remote resources.  For our use, we will redirect reads from the local file system to reads from the CernVM-FS server at UW.

Our T3, as usual, is oversubscribed.  Sending our T3 jobs out onto the grid, much like overflowing Tier 2 jobs, would significantly decrease the time to completion for our CMS users.  But our campus grid does not have the CMS software available everywhere, so we must export the software to the jobs.  For this, we use Parrot and CernVM-FS.
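
At job time the pieces fit together roughly as follows; this is only a sketch, since the exact parrot options and repository string depend on the cctools version and on the UW server, and the wrapper script name is made up:
# Point parrot's CVMFS support at the UW repository, then run the job
# wrapper under parrot so reads of /cvmfs become HTTP requests.
$ export PARROT_CVMFS_REPO='<repository configuration for the UW CernVM-FS server>'
$ parrot_run sh cms_job_wrapper.sh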

Pilot submission of BOSCO


The BOSCO system is depicted in the above graphic.  First, the user submits their job to their local Condor.  This instance of Condor could be tied to local resources that can also run their jobs, but for this picture, we only show the BOSCO resources.  The factory periodically queries the user's Condor and submits pilot jobs to run the user's jobs.  Once the pilots start on the remote system, they begin executing the user's jobs.  The user does not have to specify any special requirements, nor use any special commands, for this system to work.

We used BOSCO to flock jobs from our T3 to our other campus resources.  This process required no user interaction.  As a matter of fact, the user had no idea that her jobs were not running on the T3.  This transparency to the user is the primary goal of the Campus Factory design, and it was clear in this experiment.
Tier-3 Connection to the UNL Campus Grid

We hope to make this a production service in the future.  In the meantime, this is being used as a prototype for what other Campuses can do with BOSCO.

Acknowledgments: Dan Bradley and the ccTools team for the CernVM-FS integration with parrot.  The AAA project for the file infrastructure to enable transparent data access.  And Helena Malbouisson for allowing me to play with her jobs, sending them to other resources.
Modifications to campus factory configs can be found on github.

Thursday, June 28, 2012

Day 4: Open Science Grid Summer School 2012

Yesterday, I taught the class on storage on the OSG.  Since I was teaching, I was unable to write or take any pictures.

Today focuses on actual science on the OSG.  First up was Greg Thain on how actual science runs on the OSG.
Greg teaching rules of thumb on the OSG

This afternoon focused on success stories of running on the OSG.  For example, Edgar Spalding talked about his botany workflow, which ran successfully not only at Wisconsin but on the OSG as well.

Edgar talking about plant genetics
Today is the last day of the summer school, and we are all feeling very exhausted.  It was a very successful summer school.  Many students learned how their science can be done with HTC.  I am starting to see students think in terms of HTC: how can they split their jobs into manageable sizes, what input data would be required, and what output?

Tonight is the final dinner, and then we are done.  I will be driving back to Fermilab, then I will be back in Lincoln next week.

Again, pictures from the summer school can be found here.


Tuesday, June 26, 2012

Video of Igor's Exercise

Igor had a very interesting exercise during the OSG Summer School.

It's a little difficult to explain, but:
Each student at the tables is a worker node.  The people walking around are the 'network', sending jobs to the scheduler which is near the podium.  The students in the stands are 'users'.

The worker nodes can report wrong results, and also mis-represent themselves.  This is a security exercise.


Day 2 of OSG Summer School

We had a great day yesterday at the OSG Summer School.  Not only was the weather great, but the exercises went very well (a testament to Alain).

Monday Evening Work Session
The evening work session was great as well.  We were able to debug some problems that we didn't have time to work on during the day.  Also, we were able to answer questions about the OSG in a much more informal setting.

Today is Igor's day, dealing with Glidein.  So far, the exercises have worked very well.  Blast has been an excellent example for the users.

Igor Presenting Tuesday Morning
Students exercises Tuesday morning

On a side note, parrot is finally working with blast on glidein.  So the Remote I/O talk tomorrow is a GO.

Again, pictures are on a public album on my Google Plus.

Monday, June 25, 2012

Summer School Pictures

I'm putting pictures from the summer school on my Google Plus.


Day 1 of OSG Summer School

I was asked to teach at the OSG Summer School.  I think teaching the next generation of OSG users is a great opportunity.  This is how I learned about the OSG, at the International Summer School for Grid Computing (ISSGC).
Anwar and I learning grid computing in Nice, France

This week we are at the OSG Summer School in warm Madison (not quite the same as Nice).

Alain working the room
Students hard at work on Alain's Exercises
More updates as they happen.

Saturday, June 2, 2012

Installing and Configuring glusterfs on EL6

I'm always interested in the newest technologies.  With the purchase of Gluster by RedHat, I figured it was time to give it a try.  We are always looking for new technology that can lower the operational load on our sysadmins; maybe Gluster is that option.

This guide is heavily based on the administrator's guide for Gluster.

Installation

All of the gluster packages are in EPEL, so first we need to install that repo on our nodes.
$ rpm -Uvh http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-7.noarch.rpm

Then install the glusterfs server:
$ yum install glusterfs-server -y

Then start the server:
$ /etc/init.d/glusterd start

For demo purposes only, flush the firewall:
$ iptables -F

Configuration

And now add the nodes to the gluster system:
$ gluster peer probe i-0000011a
Probe successful
$ gluster peer probe i-0000011c
Probe successful

Now you can check for the nodes with the status command:

$ gluster peer status
Number of Peers: 2

Hostname: i-0000011a
Uuid: 5bdc4f02-4e08-4794-af03-fd624be2d2e0
State: Peer in Cluster (Connected)

Hostname: i-0000011c
Uuid: 248be1ba-c5aa-40d1-90e9-ca95a7e31697
State: Peer in Cluster (Connected)

In this demo, I decided to make a Distributed Replicated volume.  There are many options, but this seemed like the best one I could see.

To create the volume:
$ gluster volume create test-volume replica 3 transport tcp i-00000119:/exp1 i-0000011a:/exp2 i-0000011c:/exp3

Note that I didn't make the /expX directories on any of the nodes; they are automatically made for you.

To start the volume:
$ gluster volume start test-volume

To mount the volume, we don't have to modprobe fuse since it's built into the 2.6.32 kernel that comes with EL6.  You can also use NFS to mount gluster volumes, but I decided to use fuse.
$ mkdir -p /mnt/glusterfs
$ mount -t glusterfs i-0000011a:/test-volume /mnt/glusterfs

YAY! working glusterfs.  To confirm that it is working, I copied in a test file, mounted the test-volume on another node in the test cluster as well, and there was my file!
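
That check amounted to something like this (node names are the ones from this demo):
$ cp /etc/hosts /mnt/glusterfs/testfile
$ ssh i-0000011c 'mkdir -p /mnt/glusterfs; mount -t glusterfs i-0000011a:/test-volume /mnt/glusterfs; ls /mnt/glusterfs'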

Summary

GlusterFS doesn't seem too advanced compared to Hadoop or Ceph.  If I look in the /expX directories, I just see the whole file in there.  In the current release, I believe the closest volume configuration we could have to Hadoop or Ceph is Striped Replicated Volumes.  But that volume type is only supported for use as a MapReduce backend.

I think GlusterFS would be really cool as an OpenStack back end, especially since it's so darn simple.  It's easily recoverable since the files are stored as plain files on disk.  Of course, you would probably want to do striping for the large image sizes of those files.

Overall, I feel this was the easiest of the file systems I have tried out.  Ceph was a little scary with all the configuration needed.  GlusterFS was as simple as just issuing a command to add another server.  Of course, does this mean it'll load-balance the files if a server goes away?  I don't really know how that'll work.


Tuesday, May 29, 2012

Installing your own gratia webpage

In the OSG, we use a technology called Gratia for our accounting.  Every single job and data transfer on the OSG is accounted for in the database.  Of course, this is a lot of data, ~500,000 jobs and 1.2M transfers just today.  Therefore, we have a simple web interface to visualize this data.

Here are the quick instructions for how I set up my own version of it:

  1. Install a newer version of python-cherrypy from rpmforge; the EPEL version is not new enough.
    rpm -Uvh http://rpms.arrfab.net/rpmforge/packages/python-cherrypy/python-cherrypy-3.1.2-1.el5.rf.noarch.rpm 
  2. Install the OSG repos.
  3. Install the Metrics RPM:
    yum install OSG-Measurements-Metrics-Web -y
  4. Copy /etc/DBParam.xml.rpmnew to /etc/DBParam.xml
    cp /etc/DBParam.xml.rpmnew /etc/DBParam.xml 
  5. Now you can edit the DBParam.xml file to point to your own gratia databases.  For example, at Nebraska, we have an instance that points to our own Nebraska gratia server.  This way we can see only local usage.  To use the OSG's databases, you will need to use the read-only account.  Replace all of the ******'s with 'reader'.  In VIM, you can do:
    :%s/\*\*\*\*\*\*/reader/g
  6. The website relies on a set of static graphs that are updated every few hours.  They have to be saved and served by the system's http server.  So install the http server:
    yum -y install httpd
  7. Make the directory for the static graphs to be saved into:
    mkdir -p /var/www/html/gratiastatic
  8. Configure the static graph generator to put the images in this directory and to generate the images from the gratia instance.  You will need to change both the Source and Dest.  The configuration is in /etc/osg_graphs.conf.
  9. Change the static graphs location in the DBParam.xml:
    <staticfilehostname value="http://10.148.2.148/gratiastatic"> </staticfilehostname>
  10. Start the services:
    service httpd start
    service GratiaWeb start
Then, you should have a functioning gratia web instance.
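
A couple of quick sanity checks at this point (the URL path matches the static graph directory created above):
$ service GratiaWeb status
$ curl -I http://localhost/gratiastatic/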

Running private instance of gratia web

Monday, May 21, 2012

In NY for CHEP

Hanging out in Times Square
I'm in New York this week for CHEP 2012.  I'll be writing about my experiences here.  Stay Tuned.

Friday, April 20, 2012

Developments at Nebraska

I thought I would do a quick post about recent developments at Nebraska.

Tusker & OSG

New Tusker Cluster
We recently received a new cluster, Tusker.  It is the newest in our line of clusters that prioritize memory per core and cores per node over clock speed.  The cluster has 104 nodes, 102 of which are 64-core, 256GB nodes.

The goal of this cluster is to enable higher throughput of local user jobs while enabling backfilling of grid jobs.  The current breakdown of local and grid jobs can be found on the hcc monitoring page.

A common complaint among our local users is interference between processes on the nodes.  To address this, we patched torque to add cgroups support for cpu isolation.  Memory isolation should come into production in the next few weeks.  This will affect grid jobs by locking down their usage to only a single core.

Nebraska's goal is to support all OSG VOs and give them equal priority (albeit lower than local users).  All OSG VOs are welcome to run on Tusker.


Nebraska's Contribution to OSG Opportunistic Usage
Opportunistic usage by Site (source)
Nebraska resources have become the largest contributor of opportunistic usage on the OSG.  Easily over 1/4 of opportunistic usage is happening at Nebraska.  We are #1 (Tusker), #2 (prairiefire), and, after adding up Firefly's different CEs, #7.  We are very proud of this contribution and hope it continues.


Stay tuned, the next year should be exciting...

Monday, April 16, 2012

BOSCO + Campus Factory

Checklist while implementing CF + Bosco integration

For the last several months, the campus infrastructure team has worked on software that will help users create a larger, more inclusive campus grid.  The goal has largely been to make the software easier to install and expand.

Integrating the Campus Factory (which is already used on many campuses) with Bosco has been a key goal for this effort.  Last week, I finally integrated the two (Beta Install Doc).  This will have many benefits for the user over both the current Campus Factory and current Bosco.

Feature: Installation
  • Campus Factory: Large installation/configuration instructions.  Install Condor and the campus factory on every cluster.
  • Bosco v0: Install on a central submit node.
  • Campus Factory + Bosco: Configuration is handled automatically.
Feature: Adding Resources
  • Campus Factory: Install Condor and the campus factory on every cluster.  Configure to link to other submit and campus factory clusters.
  • Bosco v0: Run the command bosco_cluster -add.
  • Campus Factory + Bosco: Installation and configuration handled auto-magically.
Feature: File Transfer
  • Campus Factory: Using Condor file transfer, can transfer input and output.
  • Bosco v0: Manually with scp by the user.  No stderr or stdout from the job.
  • Campus Factory + Bosco: Using Condor file transfer, can transfer input and output.
Feature: Job State
  • Campus Factory: Accurate user job state.
  • Bosco v0: Delayed user job state.  No exit codes from the user jobs.
  • Campus Factory + Bosco: Accurate user job state.
As you can see from the comparison above, the combined Campus Factory + Bosco takes the best from both technologies.

Tuesday, March 27, 2012

OSG AHM 2012



This year's all hands meeting was a great success!  There were a few sessions that I really enjoyed.

AHM Pictures

Campus Caucus
There were many user engagement people there.  I believe we reached a consensus that there isn't much of an engagement community.

For example, Wisconsin has a great method for distributing and running MATLAB and R applications on the OSG, but there has been no knowledge transfer to other engagement folks.  I know a few UNL users who have wanted to run MATLAB on our resources.  If we could move an HCC MATLAB workflow to the grid, I believe that would be a great success.

I completely agree that there is no Engagement 'community'.  But I think that's true of most of the OSG.  There have recently been many improvements, though:

  • I think the centralized Jira has helped tremendously.  It's very easy to see what other people have been working on and even the general direction of progress.  Though this only works for OSG 'employees' and OSG projects.
  • The OSG blogs have been a successful way for the technology group to explain what they are working on.  Though I wish they had shorter and more frequent posts.
I hope that the blogs can be a way to spread the OSG Engagement activity.  It's also a great way to point to code and work that is being done.  Also, blog posts shouldn't be limited to things that the author is doing, but could point to what other people are doing.  For example, I knew nothing about the Rosetta people at Wisconsin until my poster was set up next to theirs and I was able to have a conversation.  It would have been great to see some information about what they were doing outside of the once-a-year meeting.


Science on the OSG

I thought this talk by Frank was great.  I felt he had the same sense that we were all feeling: that protein processing is becoming a very large user of the OSG.  We've seen this at HCC with CPASS, CS-Rosetta, and Autodock.

Walltime usage for non-HEP
Frank also pointed out a graph of usage.  At the end of the graph, there seems to be a plateau.  Possibly we are hitting opportunistic resource limits?

Here's an updated usage graph (source):
Walltime usage for non-HEP updated
The thing to note is the explosive growth of the GLOW VO.  Their usage has increased dramatically recently.


OSG in 2017

I really liked seeing what people thought the OSG would look like in 2017.

Chander predicted that people will come to us to use the OSG.  I believe this will take a critical mass of users.  I think we have a good product to sell; we just need publicity.

Chander's comment on data is also important.  But I believe the problem with data isn't necessarily storage; it's access to the data.  Take Dropbox, for example.  For free, they offer very little storage.  The main advantage is that it's accessible from anywhere: laptop, desktop, iPhone, web...  I think a uniform data access method can get us a lot further than distributed storage.

Alain predicted that we will be using more community software.  This will take a large effort to be part of the distributions' communities.  I foresee us contributing packages, patches, and effort to Fedora EPEL and possibly Ubuntu.  I think we are making great strides with the packaging, and I would like us to continue injecting ourselves into the Fedora community.


Nebraska Campus Infrastructure

Of course, my talk is worth looking at.

This post ended up larger than I was hoping for.   Oh well.

Friday, March 16, 2012

Burning the LiveUSBs

In my last post, I talked about the OSG LiveUSBs.  Now that the conference is next week, I have started burning the USBs with the image.

USB Key Piles

I burned 4 USBs at a time using the script below.  Parted didn't work; I never really found out why.  The symptoms were that the USBs would not boot, but they were readable by Macs.  So I scripted fdisk.
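
For anyone curious, "scripting fdisk" just means feeding its interactive answers in on stdin, along the lines of this sketch (not the original script; the device name is a placeholder and the exact answer sequence depends on your fdisk version):
#!/bin/sh
# Outline: new msdos label, one primary partition, bootable flag, write.
DEV=${1:?usage: $0 /dev/sdX}
printf 'o\nn\np\n1\n\n\na\n1\nw\n' | fdisk "$DEV"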

Friday, February 24, 2012

Building an OSG-Client LiveUSB

Nebraska/OSG USB keys to be distributed at OSG-AHM 2012
Since we started using RPMs for OSG software, I've been interested in whether it was possible to make a LiveUSB of the client.  Thanks to the great documentation provided on the Scientific Linux LiveCD page, along with the CentOS LiveCD page, I've created an OSG Client LiveUSB that will be put on the keys.

Desktop of OSG Client Live CD
From the picture, notice links on the desktop to OSG User Docs, OSG LiveCD Docs, and How to get a certificate.

Live image creation
The live image creation was done using the livecd-tools package.  I used a Fedora 16 instance on the HCC private cloud to make SL6 images.  The kickstart file used can be found on github.

What's Installed
The goal of the LiveUSB is to give an easily deliverable demo of the OSG-Client, so only the OSG-Client and Condor are installed.  The LiveUSB has some persistent data storage, but not much.

Tools are installed in order to install the live image, including the OSG-Client components, to the local hard drive.  Researchers can then easily have a node up and running.

Also, people that know how to run virtual machines on their computers can easily create a virtual machine with the OSG-Client from this USB.  Just boot from the USB, and click on the Install to Hard Drive icon on the desktop.

These keys will be distributed to attendees of the OSG All Hands meeting.

I am open to suggestions on what should be on the LiveUSB.  The image is not final yet.

UPDATE:
Link to Current ISO: OSG-SL6.2-x86_64-LiveUSB.iso

Friday, February 17, 2012

Ceph on Fedora 16

I've written before about how to run ceph on Fedora 15, but now I'm working on Fedora 16.

Last time I complained about how much ceph tries to do for you.  For better or worse, now it attempts to do more for you!

For my setup, I had 3 nodes in the HCC private cloud.  First, we need to install ceph.
$ yum install ceph

Then, create a configuration file for ceph.  The RPM comes with a good example that my configuration is based on.  The example is in /usr/share/doc/ceph/sample.ceph.conf.

My configuration: Derek's Configuration

The configuration has authentication turned off.  I found this useful because ceph-authtool (yes, they renamed it since Fedora 15) is difficult to use, and because all of the nodes are on a private vlan only reachable with my openvpn key :)

Then you need to create and distribute ssh keys to all of your nodes so that mkcephfs can ssh to them and configure them.
$ ssh-keygen 

Then copy them to the nodes:
$ ssh-copy-id i-000000c2
$ ssh-copy-id i-000000c3

Be sure to make the data directories on all the nodes.  In this case:
$ mkdir -p /data/osd.0
$ ssh i-000000c2 'mkdir -p /data/osd.1'
$ ssh i-000000c3 'mkdir -p /data/osd.2'

Then run the mkcephfs command:
$ mkcephfs -a -c /etc/ceph/ceph.conf

And start up the daemons:
$ service ceph start

You should then have the daemons running.  If they fail for some reason, they tend to output what the problem was.  Also, the logs for the services are in /var/log/ceph.

To mount the filesystem, find the IP address of one of the monitors.  In my case, I had a monitor at IP address 10.148.2.147.  The commands to mount it are:
$ mkdir -p /mnt/ceph
$ mount -t ceph 10.148.2.147:/ /mnt/ceph

Since you don't have any authentication, it should work without problems.

I've had some problems with the different MDSes, and even had an OSD die on me.  It resolved itself, and I added another OSD to take its place, recreating the CRUSH table.  Since creating this, I have even worked with the graphical interface:

And here's a presentation I did about the Ceph paper.  Note, I may not be entirely accurate in the presentation, so do be kind.

Tuesday, February 7, 2012

Fedora 16 on OpenStack

After following Brian's guide on installing Fedora 15 on OpenStack, I thought I would try my hand at Fedora 16.  There were a few differences.

Filesystem Differences
Brian's guide installed Fedora using LVM.  I installed Fedora without LVM (there's a little checkbox on the partition page of Anaconda).  Without LVM, I could skip the steps of listing the physical and logical volumes to find the start and end of the partition.

Also, Fedora 16 uses a GPT partition table.  The fdisk command cannot read the partition table, so I had to install gdisk (in EPEL).  Running it gives a very similar command interface and output:

$ /usr/sbin/gdisk -l /tmp/fedora16
GPT fdisk (gdisk) version 0.8.1

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present

Found valid GPT with protective MBR; using GPT.
Disk /tmp/fedora16: 20971520 sectors, 10.0 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): A351197B-8233-4811-9B28-69A1DE121AD2
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 20971486
Partitions will be aligned on 2048-sector boundaries
Total free space is 4029 sectors (2.0 MiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048            4095   1024.0 KiB  EF02  
   2            4096         1028095   500.0 MiB   EF00  ext4
   3         1028096        16777215   7.5 GiB     0700  
   4        16777216        20969471   2.0 GiB     8200  


Then, to extract the image:
dd if=/tmp/fedora16 of=/tmp/server-extract.img skip=1028096 count=$((16777215-1028096)) bs=512

SSH Key Differences
Brian's guide instructed you to create an /etc/rc.local.  Fedora 16 sees the introduction of systemd, which no longer executes rc.local.  Instead, it looks for the file /etc/rc.d/rc.local (possibly a symlink to /etc/rc.local?).  This file needs to be executable, and be sure to include the shebang.
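
The rc.local in question does little more than pull the instance's public key from the metadata service; a sketch of such a file (the metadata URL is the standard EC2-style one that OpenStack serves, so the details may differ from the file in Brian's guide) is:
#!/bin/sh
# /etc/rc.d/rc.local -- fetch the injected SSH key at boot
mkdir -p /root/.ssh
chmod 700 /root/.ssh
curl -sf http://169.254.169.254/latest/meta-data/public-keys/0/openssh-key >> /root/.ssh/authorized_keys
chmod 600 /root/.ssh/authorized_keys

Remember to chmod +x the file, as noted above.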

Also, Fedora 16's selinux doesn't label the root file system correctly (BUG), so simply making the .ssh directory does not allow sshd to read it.  To solve the selinux problem, I disabled selinux (bad, bad me).


Common Commands
After installing Fedora 16 into an image and extracting the kernel and ramdisk, there were a few commands that I executed over and over as I debugged the image:

Make the changes to the image:
sudo /usr/libexec/qemu-kvm -m 2048 -drive file=/tmp/fedora16 -net nic -net user -vnc 127.0.0.1:0 -cpu qemu64 -M rhel5.6.0 -smp 2 -daemonize

Extract the partition:
dd if=/tmp/fedora16 of=/tmp/server-extract.img skip=1028096 count=$((16777215-1028096)) bs=512

Start the VM to change the label on the image:
sudo /usr/libexec/qemu-kvm -m 2048 -drive file=/tmp/fedora16 -net nic -net user -vnc 127.0.0.1:0 -cpu qemu64 -M rhel5.6.0 -smp 2 -daemonize -drive file=/tmp/server-extract.img 

Rename the image to something appropriate:
mv /tmp/server-extract.img /tmp/fedora16-extracted.img

Bundle the image for OpenStack:
euca-bundle-image --kernel aki-0000002e --ramdisk ari-0000002f -i /tmp/fedora16-extracted.img -r x86_64

Upload the image to OpenStack:
euca-upload-bundle -b derek-bucket -m /tmp/fedora16-extracted.img.manifest.xml

Register the image (this command completes fast, but openstack takes forever to decrypt and untar the image):
euca-register derek-bucket/fedora16-extracted.img.manifest.xml


Now to build OSG packages for Fedora...  maybe not.


Tuesday, January 24, 2012

Testing a Globus-Free OSG-Software Stack (From EPEL(-testing))

As you may or may not know, there is a massive globus update pending in EPEL that will update globus to the version the OSG distributes.  What this means is much less work for the osg-software team since we will not have to build and support our own builds of globus.

Testing the globus from EPEL while installing some packages from osg repos is not a trivial matter.

  1. Disable the priority of the OSG repo
  2. Exclude globus and related packages that are already in EPEL from the osg repo.
Below is my final file /etc/yum.repos.d/osg.repo

Notice the many excludes in the file; the list may not be complete.

Installation is just:
yum install osg-client-condor --enablerepo=epel-testing

UPDATE!!!!
Testing Results
very good!

I ran 3 tests, all completely successful.
1. globus-job-run against a rpm CE.
$ globus-job-run pf-grid.unl.edu/jobmanager-fork /bin/sh -c "id"
uid=1761(hcc) gid=4001(grid) groups=4001(grid)
2. Condor-G submission
Condor-G Submission worked without problems.  The submission file is below:
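A minimal Condor-G submit description of this kind, using the same CE as test 1 (not necessarily the exact file used here), is:
universe = grid
grid_resource = gt2 pf-grid.unl.edu/jobmanager-fork
executable = /bin/id
output = id.out
error = id.err
log = id.log
queue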
3. And globus-url-copy worked:
$ globus-url-copy gsiftp://pf-grid.unl.edu/etc/hosts ./hosts

Friday, January 20, 2012

Initial EL6 Packages for OSG

Last night I completed initial packages for EL6 support.  Just like for EL5, the first OSG component I created is the osg-wn-client.

The osg-wn-client has a complicated dependency tree.  Easily some of the most difficult packages were from glite.

Just some quick tidbits that made the transition easier:

UUID Differences
uuid.h and the associated library are used by many applications.  In EL5, uuid is provided by the e2fsprogs package.  In EL6, it has its own package, libuuid.  It was common for me to copy this tidbit into a few packages:
gsoap Differences
glite-fts-client and glite-data-delegation-api-c both use gsoap.  In the past, it was common to copy stdsoap2.c from the gsoap distribution and compile it into your program.  Now that gsoap is a regular library, though, programs should be linked against the system's version.  In order to do this, I had to add patches to the Makefiles of both packages to link against the system's gsoap.


What's next?  
The next step is the osg-client.  Since there are no more glite packages for the osg-client, this step should be easier.