With a (not yet approved) pull request, HTCondor-CE can add new resource types by modifying only two files: the routes table and the scheduler attributes customization script. Previously, adding a resource type also required editing a third script, written in Python with very tricky syntax (Python code that emitted ClassAds). In the following examples, I will demonstrate how to use this new feature with GPUs.
The Routes
Each job submitted to an HTCondor-CE must follow a route from the original job to the final job submitted to the local batch system. The HTCondor JobRouter translates the original job into the final job according to rules specified in the router configuration. Crane's GPU route is:
default_remote_cerequirements = "RequestGpus == 1"
This attribute is used in the next section, the local submit attributes script.
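To make the hand-off between the two files more concrete, here is a minimal Python sketch of how an expression like RequestGpus == 1 can be split into name/value pairs, which the submit attributes script then reads from its environment. This is a simplified model, not the CE's actual parser: the function name parse_cerequirements is hypothetical, and it assumes the expression contains only name == value clauses joined by &&.

```python
import re

# Hypothetical helper: a simplified model of turning a remote_cerequirements
# expression into name/value pairs. It only understands `name == value`
# clauses joined by `&&`; anything else is rejected.
CLAUSE = re.compile(r'\s*(\w+)\s*==\s*"?([^"]*?)"?\s*')


def parse_cerequirements(expr: str) -> dict[str, str]:
    pairs: dict[str, str] = {}
    for clause in expr.split("&&"):
        match = CLAUSE.fullmatch(clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        name, value = match.groups()
        pairs[name] = value  # e.g. "RequestGpus" -> "1"
    return pairs


if __name__ == "__main__":
    # The route's default_remote_cerequirements value
    print(parse_cerequirements("RequestGpus == 1"))
```

Each resulting pair, such as RequestGpus=1, is what the script in the next section checks for.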
Local Submit Attributes Script
The local submit attributes script translates the remote_cerequirements into the scheduler language used at the site. For Crane's GPU configuration, the snippet added for GPUs is:
This snippet checks whether the RequestGpus attribute is present in the environment and, if it is, inserts several lines into the submit script: first the SLURM directive requesting a GPU, then a line that sources the module setup script, and finally a line that loads the cuda module.
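The logic just described can be modeled in a few lines of Python. This is an illustration, not the site's actual script: the function name gpu_submit_lines and the module init path /etc/profile.d/modules.sh are hypothetical, and the #SBATCH --gres=gpu:N directive is assumed here as one common way to request GPUs from SLURM.

```python
import os
from typing import Mapping

# Hypothetical path to the site's environment-modules setup script.
MODULE_INIT = "/etc/profile.d/modules.sh"


def gpu_submit_lines(env: Mapping[str, str]) -> list[str]:
    """Return the extra submit-script lines for a job that requests GPUs."""
    gpus = env.get("RequestGpus")
    if not gpus:
        return []  # not a GPU job: add nothing to the submit script
    return [
        f"#SBATCH --gres=gpu:{int(gpus)}",  # ask SLURM for the GPU(s)
        f"source {MODULE_INIT}",            # make the `module` command available
        "module load cuda",                 # load the CUDA module
    ]


if __name__ == "__main__":
    # Print whatever lines apply to the current environment.
    print("\n".join(gpu_submit_lines(os.environ)))
```

Keeping the check on the presence of RequestGpus means non-GPU jobs pass through the script unchanged.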
Next Steps
The next step for using GPUs on the OSG is to use one of the many frontends capable of submitting glideins to the GPU resources at HCC. Currently, the HCC, OSG, OSGConnect, and GLOW frontends can submit to these GPU resources.