GSI support in the cloud scheduler allows users to authenticate with their grid certificates when submitting jobs to the cloud scheduler. The cloud scheduler then uses these credentials to authenticate Nimbus workspace creation and workspace deletion.

Enabling GSI support in the cloud scheduler also places a restriction on each VM: only jobs from the owner of that VM will be started on it. In other words, jobs owned by user B will not be started on a VM owned by user A. The rationale is to prevent other users from accessing a delegated proxy on the VM.

Requirements:

A cloud scheduler codebase with GSI support

If you want to renew your certificate via CDS, use the cloud scheduler codebase from the CDS branch (merged into the dev branch on Sep 13, 2010).

A working CA is required to sign the dummy VM host certificate.

Done. Running on alto.cloud.nrc.ca

A working Globus Toolkit is required on the host running the cloud scheduler.

The user requires a valid grid certificate (X509).

The VM images must have a recent version of the Condor startup scripts (with generic local Condor config support).

Install NEP-52 root CA package

If you already have your own CA that you can use to sign your own X509 certificates, you can install your CA package instead.

For the time being, the NEP-52 CA is hosted on myproxy.cloud.nrc.ca. Send the above certificate request to Andre.Charbonneau@nrc-cnrc.gc.ca.
If you have your own CA that you can use, simply send the request to your CA to be signed.

Install the signed certificate in the VM-host-cert directory created above.

Configure GSI Authentication in Condor

GSI needs to be enabled at the Condor level. This is required in order to be able to authenticate users via their X509 certificates (proxies).
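A sketch of the relevant settings, assuming Condor's standard security macros and a conventional /etc/grid-security layout (paths and values are illustrative, not this site's actual configuration):

```
# Hypothetical fragment for the Condor config (e.g. condor_config.local).
# Accept GSI (alongside filesystem auth) for authentication:
SEC_DEFAULT_AUTHENTICATION         = REQUIRED
SEC_DEFAULT_AUTHENTICATION_METHODS = GSI, FS

# Where the daemons find the host certificate/key and trusted CA files:
GSI_DAEMON_DIRECTORY = /etc/grid-security

# Map X509 DNs to local accounts:
GRIDMAP = /etc/grid-security/grid-mapfile
```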

Configure the nimbus grid-mapfile on the cloud servers

Make sure that the authorized users' DNs are added to the Nimbus grid-mapfile on each cloud server those users are allowed to use.
For example, on alto.cloud.nrc.ca, this is in the following file:

/usr/local/nimbus/services/etc/nimbus/nimbus-grid-mapfile
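Each line of the grid-mapfile maps a certificate DN to a local account. The DN and account name below are made-up placeholders:

```
"/C=CA/O=Grid/OU=cloud.nrc.ca/CN=Jane Doe" janedoe
```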

Testing

Restart the cloud scheduler

Create a user proxy (full legacy). Make sure its lifetime will cover the duration of the job.

$ grid-proxy-init -old [-valid HH:MM]
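You can verify the proxy type and remaining lifetime with grid-proxy-info, also part of the Globus Toolkit:

```shell
$ grid-proxy-info
# Shows the proxy subject, issuer, type (should read "full legacy globus proxy"),
# the proxy path, and the time left before expiry.
```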

Add X509 proxy info to your job description

In order to use GSI authentication, you need to specify your user proxy in your job description. This is done using the x509userproxy ClassAd attribute. For example:

x509userproxy = /tmp/x509up_u20200
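In context, a minimal job description using this attribute might look like the following sketch (the executable and file names are examples only):

```
universe      = vanilla
executable    = my_job.sh
output        = job.out
error         = job.err
log           = job.log
x509userproxy = /tmp/x509up_u20200
queue
```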

Submit the job

$ condor_submit <job-description-file>

Credential renewal

The cloud scheduler implements job credential renewal via a MyProxy server. The idea is simple: the user first puts a long-lived proxy into a MyProxy server prior to submitting a job, and then puts the proxy information in the job description. Periodically, the cloud scheduler scans all jobs' proxy certificates and attempts to renew those which are about to expire.

Note that this proxy renewal feature will only renew proxies that reside on the cloud scheduler. User proxies delegated to the worker nodes by Condor will not be automatically renewed.

To use automatic credential renewal, follow the instructions below:

Configure cloud scheduler to enable credential renewal

# job_proxy_refresher_interval specifies the amount of time, in seconds, between job proxy
# credential expiry checks. To disable proxy refreshing altogether, simply set this
# value to -1
#
# The default value is -1
#job_proxy_refresher_interval: -1
# job_proxy_renewal_threshold determines the amount of time, in seconds,
# prior to proxy expiry date at which a proxy will be refreshed
#
# The default value is 900 (15 minutes)
#job_proxy_renewal_threshold: 900
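For example, to check every hour and renew any proxy within 30 minutes of expiry, the cloud scheduler config could contain the following (the values are illustrative):

```
job_proxy_refresher_interval: 3600
job_proxy_renewal_threshold: 1800
```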

Put a long-lived proxy into a MyProxy server

Prior to submitting one or more long-lived jobs, the user should run a command like the following:
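A plausible invocation, assuming the standard myproxy-init client (the credential name is a placeholder; replace the server with your site's MyProxy host if it is not alto.cloud.nrc.ca):

```shell
$ myproxy-init -s alto.cloud.nrc.ca -l <unique-credential-name>
# -s: the MyProxy server to store the delegated credentials on
# -l: the name under which the credentials are stored on that server
```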

In the above command, replace with the FQHN of your cloud scheduler and with a unique name for your credentials in the MyProxy server. Also, if needed, change alto.cloud.nrc.ca in the above command to the name of the MyProxy server for your cloud scheduler (contact your system administrator if you are not sure which MyProxy server to use).

The default lifetime of the delegated credentials on the MyProxy server is one week. If you want a different lifetime, specify it using the -c command line argument to the myproxy-init command shown above.

Add MyProxy info to job description

The user must put the following information in their job description:

In the above job description attributes, replace with the unique name for your credentials in the MyProxy server. Also, if needed, change alto.cloud.nrc.ca to the name of the MyProxy server for your cloud scheduler.
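Assuming Condor's standard MyProxy submit-description attributes are what the cloud scheduler reads, the entries would look something like this (the credential name is a placeholder, and 7512 is MyProxy's default port):

```
MyProxyHost           = alto.cloud.nrc.ca:7512
MyProxyCredentialName = <unique-credential-name>
```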

Refreshing user proxy on worker node

Condor will automatically sync the files between the submit machine and the worker, so no additional step is required to have the proxy on the worker refreshed (see http://www.cs.wisc.edu/condor/manual/v6.8.0/8_4Development_Release.html). If the user's job has proxy renewal via MyProxy properly configured as per the instructions above, then the renewals should propagate automatically to the worker.

If for some reason this does not work for you, there is a way to do this using the condor_chirp mechanism, as shown below:

Refreshing a user proxy can be done in the job's script. An example of a job script that pulls a fresh proxy is shown below:
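A sketch of such a wrapper, assuming the proxy is spooled under the name x509_proxy on the submit side (an assumption; check your sandbox) and that the job description sets +WantIOProxy = True so condor_chirp is available:

```shell
#!/bin/sh
# Hypothetical job wrapper: runs the real payload and periodically pulls a
# fresh proxy from the submit machine with condor_chirp.

PROXY=x509_proxy            # local name for the proxy in the job sandbox

./real_work &               # the actual job payload (example name)
WORKER=$!

# While the payload runs, re-fetch the proxy every 30 minutes.
# 'condor_chirp fetch <remote> <local>' copies a file from the submit
# machine's spool directory to the execute machine.
while kill -0 $WORKER 2>/dev/null; do
    sleep 1800
    condor_chirp fetch x509_proxy $PROXY && export X509_USER_PROXY=$PWD/$PROXY
done

wait $WORKER
```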

Note that the refreshed proxy on the execute side will differ slightly from the original one put there by Condor when the job started. This can be seen in the output of the openssl x509 command, as shown below:
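The comparison can be made with standard openssl x509 options, run against the proxy file on each side (the path matches the earlier example and will differ per user):

```shell
$ openssl x509 -in /tmp/x509up_u20200 -noout -subject -issuer -dates
# A delegated proxy is a new certificate issued on top of the user's
# credentials, so its subject and validity dates differ from the original.
```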

The reason is that when the job starts, Condor actually 'delegates' the proxy to the execute machine. When we run condor_chirp to fetch the proxy directly from the submit machine, we get the proxy as-is, without any delegation. This is a technicality and should be transparent to the end user.

It is unclear whether the data transfers done by chirp are encrypted. Removing the user proxy before the chirp call does not affect chirp's behavior, so we can conclude that with the default Condor configuration, the user's proxy is not used for authenticating chirp's data transfers. So far, no information about data encryption could be found. This is important to determine because the user proxy is unencrypted. Still investigating... (Andre)

I suspect this has something to do with the cloud scheduler cleaning up the job's spooled files (including the user's delegated credentials!) when the job is no longer in the Running state. The cloud scheduler will probably have to be updated to recognize the 'C' (completed) state and not touch the job's files until the job is actually removed from the queue.