The Bosco team has been working on integrating with the
R statistical programming language. We have chosen to modify the
GridR package to provide this integration.
How will the R user see Bosco?
The goal of the integration is to simplify submitting processing jobs, written in the R language, to remote clusters and grids. The expected steps for the user are:
- Install Bosco
- Install the Bosco-enabled GridR package into your local R environment
After installing these two pieces of software, the user creates an R script that includes the function to be executed on the remote cluster. The user can send any data as input: lists, tables, or even an entire CSV file (already read into an R variable). The function's output is automatically imported into the user's R environment when the remote job has completed.
Below is a demo of the GridR package working with Bosco to submit to a campus cluster here at Nebraska.
Figure: RStudio IDE showing demo of Bosco + GridR integration
The steps in the demo are:
- Load the GridR library
- Create the function, in this case named simply 'a', which doubles the value of its argument.
- Initialize the GridR integration to talk to Bosco
- "Apply" the function: run the function 'a' with the input 14, write the result to the variable "x", and wait for the remote job to complete.
- Finally, print the value of x, which is 28, double the input of 14.
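The steps above can be sketched roughly as the R script below. This is only an illustration, not a transcript of the demo: the `grid.init` arguments (`service = "bosco.direct"`, `localTmpDir = "tmp"`) and the exact `grid.apply` call are assumptions about the Bosco-enabled GridR package and may differ in the released version. The `run_remote` flag is a hypothetical guard added here so the script also runs on a machine without Bosco installed.

```r
# The function to execute on the remote cluster: doubles its argument.
a <- function(s) {
  2 * s
}

# Local sanity check before submitting anything remotely.
stopifnot(a(14) == 28)

# Hypothetical guard: set to TRUE only once Bosco and the
# Bosco-enabled GridR package are installed and configured.
run_remote <- FALSE

if (run_remote) {
  library(GridR)

  # Assumed arguments: tell GridR to submit through Bosco and
  # where to keep its temporary files.
  grid.init(service = "bosco.direct", localTmpDir = "tmp")

  # Run a(14) remotely, store the result in the variable "x",
  # and block until the remote job has completed.
  grid.apply("x", a, 14, wait = TRUE)

  print(x)
}
```

With `run_remote` left at `FALSE`, only the local check runs, which is a cheap way to confirm the function behaves as expected before sending it to a cluster.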
This is a very simple demo. You could imagine the function sent to the remote machine parsing a CSV file or performing more complex operations.
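As one hedged illustration of a slightly more complex function, the sketch below summarizes a table that has already been read into an R variable. The function name `summarize_table`, the variable `measurements`, and the inline CSV data are all hypothetical and stand in for a real file. The code runs locally with base R only. Passing the same function and data frame through GridR in place of 'a' and 14 is assumed to work in the same way, but that is not shown in the demo.

```r
# Hypothetical function to run remotely: the mean of each numeric column.
summarize_table <- function(df) {
  numeric_cols <- vapply(df, is.numeric, logical(1))
  colMeans(df[, numeric_cols, drop = FALSE])
}

# Stand-in for reading a real CSV file from disk; the data is made up.
measurements <- read.csv(
  text = "id,height,weight\n1,150,50\n2,170,70\n3,160,60"
)

# Run locally first; the same function and data frame would be the
# inputs handed to the remote job.
result <- summarize_table(measurements)
print(result)
```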
The Bosco team expects to have this integration done and in production by mid-July for the
R users meeting.