Since September was a while ago, I'll keep this short. Most of September was spent figuring out the class/work/research schedule. It had been two years since I'd taken anywhere near a full load of classes, and it's funny how quickly you forget the work a class takes.
During September, I continued to help with the OSG software effort. By the beginning of the month I had handed most of my tasks off to others, but I still contributed here and there, particularly to discussions of software I had built. This was most evident for the Condor build in the OSG repositories.
For the OSG Condor build, I backported the 7.7 builds from the Fedora distribution. There was some discussion about whether the method I chose to build Condor was the best; I would argue it is the best of a bad situation. There is no 'proper' build of Condor for RHEL5. This is very much on purpose: Red Hat won't allow an EL5 build of Condor in EPEL, since it's distributed in their MRG product. Also, the RPMs that the Condor team produces do not conform to the OSG software standards. They are statically linked against a lot of libraries. They don't have source RPMs. And they only recently started putting things in the right locations: /etc, /usr/bin...
But by backporting the Fedora build, we don't get CREAM support. We never had CREAM support in the VDT, but we would still like it, since it's in the 'binary blob' Condor builds. We would need a properly packaged CREAM client in order to add it.
I'm also not a fan of removing Condor from the osg-client. Condor has been part of the OSG client since the very beginning, and Condor-G is the base method for submitting Globus jobs on almost all the systems we use today. I can imagine a user downloading the osg-client and then wanting to run a job. But they can't, because they need to install Condor too (which is easy, but still).
SuperComputing Conference Prep
In September, I started developing and forming my idea for the SuperComputing visualization. If you don't know, I LOVE visualizations, especially when they explain a very complex system in an understandable way. Self-promotion: my YouTube channel.
The first task for SuperComputing was to add GLOW usage to the Google Earth display. While I was at it, I added documentation to the TWiki on how to install and run the OSG Google Earth display.
There were no great ideas for expanding the current Google Earth display. We could add transfers, but we don't have good transfer accounting at the per-VO level. Plus, not many VOs outside of CMS and ATLAS do file transfers. Another idea was to incorporate Globus Online traffic. But again, I don't think the number of transfers, especially to and from OSG sites, is high enough to be worth showing. Maybe one day it will be...
So I turned to something I had wanted to learn for a while: Android. In the course of a weekend, I built an OSG Android app. After demonstrating the app at HCC for a week, I was able to purchase an Android tablet that will be interactively displayed at SuperComputing. The goal of the application is to provide interactive status of the OSG, either at the site level or by VO.
We got Bill from VT running through the Engage submit host. He is flocking to the Engage submit host, and then out to the OSG, all while using whole-machine slots. Additionally, he's flocking to campus factories that are also running whole-machine jobs. He was very quick to get it set up and then start running actual science through the system. His usage is monitored by Gratia; he hasn't been very active lately, so you may need to expand the start and end dates. We should thank the RENCI admins Brad and Jonathan for their help with the setup; they have been super-fast learners of the GlideinWMS system and, maybe more importantly, willing to experiment!
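Mechanically, this style of flocking comes down to a pair of Condor configuration knobs on the two schedds involved. A minimal sketch (the hostnames below are made up for illustration, not the real Engage or VT machines):

```
# On the user's local schedd: send idle jobs onward to the
# Engage submit host when the local pool can't run them.
# (hostname is hypothetical)
FLOCK_TO = engage-submit.example.org

# On the Engage pool side: accept flocked jobs from that schedd.
# (hostname is hypothetical)
FLOCK_FROM = submit.example.edu
```

From there, the GlideinWMS frontend on the Engage side takes over, so the jobs flow out to OSG sites without the user's own configuration ever naming them.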
There were a few issues this month with Gratia collection. It mostly boils down to misunderstandings about what we are accounting, what is possible (practical) to account (not the same thing), and what counts as an 'OSG job'.
We enabled Florida to run with the campus factory. They had it set up before, but something had changed and caused some held jobs. It turned out to be harmless, but it's still a sign of how multiple layers can complicate the system. At least the users only see one: Condor.
When I got back after spending the summer at FNAL, there were a few changes at HCC. First, we had a 'cloud' platform that was just starting to take shape: HCC built an OpenStack prototype on a few nodes. I helped beta test the cloud, figuring out a few bugs, and I'm happy to report that it is now 'just working'. It has been especially useful when I want to try out new software quickly. For example, last week I quickly installed the Ceph distributed file system and found it usable. It has also been very useful for quick install tests of OSG software.
We started using the GOC factory. I'm very happy to have some fault tolerance on our GlideinWMS install. Though UCSD has been very stable in practice, it's nice peace of mind.
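On the frontend side, using a second factory is roughly a matter of listing its collector alongside the first in the frontend configuration. A rough sketch (attributes abridged from memory; treat the exact schema and node names as illustrative, not authoritative):

```xml
<factory>
  <collectors>
    <!-- existing UCSD factory collector -->
    <collector node="gfactory-1.t2.ucsd.edu" ... />
    <!-- GOC factory added as a second entry for fault tolerance -->
    <collector node="glidein.grid.iu.edu" ... />
  </collectors>
</factory>
```

With both entries in place, the frontend can request glideins from either factory, which is where the fault tolerance comes from.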
We have a new user on our GlideinWMS machine: Monica from Brown University, running some neutrino experiments (I'm not a physicist!). For now she's running as HCC, but that may change in the future if her usage increases.
Various grid administration of the HCC site. I've been transitioning off of administering most of the HCC resources that I used to maintain, freeing me up mostly for class work.