Monday, May 20, 2013

Submitting R jobs with Bosco

The Bosco team has been working on integrating Bosco with the R statistical programming language.  We have chosen to do this by modifying the GridR package.

How will the R user see Bosco?

The goal of the integration is to make it simpler to submit processing written in R to remote clusters and grids.  The expected steps for the user are:
  1. Install Bosco
  2. Install the Bosco'ified GridR package into your local R environment.
After installing these two pieces of software, the user creates an R script that includes the function to be executed on the remote cluster.  The user can send any data as input: lists, tables, or an entire CSV file (already read into an R variable).  The function's output is automatically imported into the local R environment when the remote job has completed.

Below is a demo of the GridR package working with Bosco to submit to a campus cluster here at Nebraska.

RStudio IDE showing demo of Bosco + GridR integration
The steps in the demo are:
  1. Load the GridR library
  2. Create the function, in this case named simply 'a', which doubles the value of its argument.
  3. Initialize the GridR integration to talk to Bosco
  4. "Apply" the function: run the function 'a' with the input 14, write the result to the variable "x", and wait for the remote job to complete.
  5. Finally, print the value of x, which is 28, double the 14.
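The steps above can be sketched in R roughly as follows. This is a minimal sketch rather than the exact code from the demo: the `grid.init` argument values (`service = "bosco.direct"`, `localTmpDir = "tmp"`) are assumptions about the Bosco-modified GridR package, and the `run_remote` flag is a hypothetical guard added here so the sketch also runs on a machine without Bosco installed.

```r
# Step 2: the function to run remotely; it doubles its argument.
a <- function(s) {
  return(2 * s)
}

# Hypothetical guard (not part of GridR): set to TRUE only on a machine
# that has Bosco and the Bosco-enabled GridR package installed.
run_remote <- FALSE

if (run_remote) {
  # Step 1: load the GridR library.
  library(GridR)

  # Step 3: initialize GridR to talk to Bosco.
  # Both argument values are assumptions; check the package documentation.
  grid.init(service = "bosco.direct", localTmpDir = "tmp")

  # Step 4: run a(14) remotely, write the result to the variable "x",
  # and wait for the remote job to complete.
  grid.apply("x", a, 14, wait = TRUE)
} else {
  # Local stand-in for the remote call, so the sketch runs anywhere.
  x <- a(14)
}

# Step 5: print the value of x.
print(x)
```

With `run_remote` left at `FALSE`, the function runs locally, which is a cheap way to check it before sending it to a cluster.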
This is a very simple demo.  You could imagine the function sent to the remote machine parsing a CSV file or performing more complex operations.

The Bosco team expects to have this integration done and in production by mid-July for the R users meeting.

Bosco Download
