Wednesday, October 14, 2015

The StashCache Tester

StashCache is a framework for distributing user data from an origin site to a job.  It uses a layer of caches, along with the high-performance XRootD service, to distribute the data.  It can source data from multiple machines, such as the OSG Stash.
StashCache Architecture (credit: Brian Bockelman AHM2015 Talk)
In order to visualize the status of StashCache, I developed the StashCache-Tester.  The tester runs every night and collects data by submitting test jobs to multiple sites.
The website visualizes the data received from these tests.  It shows three visualizations:
  1. In the top left is a table of the average download speed across multiple test jobs for each site.  The cells are colored, with green being the best and red the worst.  Also, if the test has not been able to run at a particular site for several days, the table shows the last successful test but fades the color as that result gets older.  An all-white background means that the test hasn't run successfully for three or more days.
  2. In the top right is a bar graph comparing the average transfer rates for multiple sites.  The bars make differences between sites easier to see at a glance than the table does.
  3. On the bottom, we have a historical graph showing the last month of recorded data.  You can see that some sites have large peaks of download speeds.  Additionally, some sites are very infrequently tested, such as Nebraska (which is the CMS Tier-2).  Infrequent testing can be caused by an overloaded site that is unable to run the test jobs.
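The coloring of the table in the first visualization can be sketched roughly as follows. This is only an illustration, not the tester's actual code: the function names, the exact RGB values, and the linear interpolation are all assumptions. The only details taken from the description above are that green is best, red is worst, older results fade, and a result that is three or more days old is shown in white.

```python
MAX_AGE_DAYS = 3  # from the post: three or more days old shows as white


def speed_color(speed, slowest, fastest):
    """Map a download speed to an RGB color between red (worst) and green (best).

    The linear red-to-green scale is an assumption made for this sketch.
    """
    if fastest == slowest:
        return (0, 255, 0)
    t = (speed - slowest) / (fastest - slowest)
    t = min(max(t, 0.0), 1.0)
    return (round(255 * (1 - t)), round(255 * t), 0)


def fade_color(rgb, age_days, max_age_days=MAX_AGE_DAYS):
    """Blend a color toward white as the last successful test gets older.

    The linear fade is an assumption; the post only says the color fades
    and is fully white at three or more days.
    """
    fraction = min(max(age_days, 0), max_age_days) / max_age_days
    return tuple(round(c + (255 - c) * fraction) for c in rgb)


if __name__ == "__main__":
    # Hypothetical site averages in MB/s, with days since the last good test.
    results = {"SiteA": (40.0, 0), "SiteB": (10.0, 1), "SiteC": (25.0, 4)}
    speeds = [speed for speed, _ in results.values()]
    for site, (speed, age) in results.items():
        color = fade_color(speed_color(speed, min(speeds), max(speeds)), age)
        print(site, color)
```

Keeping the speed color and the age fade as separate steps means a stale result still shows how good the last successful test was, just more faintly.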
In the future, I want to add graphs comparing the performance of individual caches in addition to the existing site comparisons.  Further, I would like to add many more sites to be tested.

3 comments:

  1. Very cool. Looks like the perfSonar for data caching nodes (DCNs). Definitely need tools like this to provide visibility into caching networks.

  2. Nice! I really haven't played with OSG Stash, but I need to learn about it... before OASIS is replaced by it? (Or will that ever happen?) Please let me know if there is good documentation or any tutorials.

    For OASIS specifically, apart from the raw speed, I'd also like to know the overall reliability and size of the local squid cache (not sure if that applies to Stash), etc. I've also been hit by an IPv6-related issue with the squid cache at some sites, so I had to add them to my do-not-submit list. I am wondering what I need to know in order to use OSG Stash properly (I am hoping it works a bit more transparently than OASIS, so that I don't have to worry about squid, for one thing).

    Replies
    1. Nothing like a late reply.

      I must admit, there is not really great documentation on how to use StashCache. It is used by few people, but those who use it, use it a lot.

      Basically, it works as a system of caches for the Stash filesystem on OSG Connect. You place your file(s) in the Stash space on OSG Connect, then add "+WantsStashCache = true" to the submit file. On the worker node, you can then copy the files from the same path using the 'stashcp' command.

      We really need to add user documentation for StashCache. Also, as far as I know, there are no plans to replace OASIS with StashCache. They are mostly complementary.
