Warning: This document describes an old release. Check here for the current version.

Cloud Guide (2.4)

This page describes a particular configuration of Nimbus
that allows the cloud-client to operate out of the box. If you've never
configured Nimbus before, you should be able to follow this
page conceptually, but it does not replace the
administrator guide, which you will still need
to consult.

This page is for deployers who want to learn about the cloud
configuration and configure the workspace service for it. Cloud users
do not need to read or understand it. If you
are a cloud user just looking to understand how to launch and manage VMs
on an existing cloud, start at the clouds page.

The service must be set up in resource pool mode, controlling any number
of VMM nodes. You may use the workspace pilot to integrate with a
local resource scheduler. An image repository must also be set up; it
hosts the workspace image files for each client. When a client runs a workspace,
the image to use is transferred from the repository to the VMM that will
run it.

For the sake of discussion we will assume that the workspace service and
the file repository are set up on different nodes. This is not strictly
required, but it is the recommended configuration because of the
heavy I/O traffic the repository can experience.

The workspace service should be installed as
normal on the service node and GridFTP
must be installed on the repository node.

The server addresses must be directly reachable from the Internet
or otherwise configured to deal with being NAT'd. The Globus container
(where the workspace service runs) and GridFTP can both be set up for
NAT or other port forwarding situations.

The diagram above depicts the basic setup.

A special workspace client called the "cloud-client" invokes operations
on the service and GridFTP server. A number of defaults are assumed,
which makes this work out of the box (these defaults will be discussed
later).

Files are transferred from the cloud-client to a client-specific
directory on the repository node (manual or other types of
GridFTP-based transfers are also possible if the user is comfortable
using grid tools directly).

The service invokes commands on the VMMs to trigger file transfers
to/from the repository node, VM lifecycle events, and destruction/clean
up.

If the workspace state changes, the cloud-client will report this on
the screen (and in log files) and, depending on the change, might also take
action in response.

Working backwards from the user's cloud-client experience is a
good way to understand how the service needs to be set up.

Here is an abbreviated depiction of a simple user interaction with a cloud,
to give you an idea if you've never used it. This does not depict an
image transfer to the repository node but that is similarly brief.

A grid credential is needed; an embedded grid-proxy-init program is
available if you need to create one.

Some time elapses as the image file is copied to the VMM node. Then
a running notification is printed:

State changed: Running
Running: 'vm-023'

The client picked up your default public SSH key and sent it to
be installed on the fly into the VM's authorized_keys policy
for the root account. After launching, you can therefore use the printed
hostname to log in as root:

$ ssh root@ahostname.cloudurl.edu

You can see an example of a cluster cloud-client deployment on the
one-click clusters page.

A number of things go into making the cloud client work out of the box,
but it is in large part accomplished by giving the user a downloadable
package with a number of default configurations.

These defaults limit functionality options in some cases, but that is the
idea: eliminate decisions that need to be made and set working defaults.
There are avenues left open for experienced users to do more
(for example, by overriding the defaults or even switching over to the
regular workspace client).

In the previous section, the first thing that probably stands out is that
there are no contact addresses being entered on the command line.

The service and repository URLs are derived from a properties file
that is included in the top-level "conf" directory of the cloud-client
package. An example is the cloud.properties file that
is currently distributed for the Nimbus cloud.

Note: How properties files and commandline overrides work is covered
in detail in a later section; it is all designed to be
flexible under the covers. If you don't want to follow the conventions
laid out in this "assumptions" section, you will need to
understand the later section to know how to put together a good
client package or properties file(s) for your users. Continue
reading this section first, though, to get the basic ideas.

There are three main groups of assumptions and defaults. The first is the
contact and identity information of the workspace service and GridFTP
server (see above for the configuration sample where these are specified).
The other two groups make up the rest of this "Assumptions" section:

For GridFTP-based commands (like --list, --delete, and
--transfer) the server to contact is taken from the contact in
the cloud properties file. The X509 identity to verify is also in the
cloud properties file. If that property is missing, identity checks
are based on hostname.

Remember that we are not going to discuss the various ways of
getting options in this "Assumptions" section.

When you transfer a local file, the target of the transfer is the same
filename in your personal repository directory. When you refer to the name
of a workspace to run, this name must correspond to a filename in your
personal repository directory.

We know where the repository comes from but how is that directory derived?

There are two other components to derive the directory used: the configured
base directory property and the hash of the caller's X509
Distinguished Name.

The configured base directory property. The default
configuration for the base directory on the repository node is
"/cloud".

A hash of the caller's X509 Distinguished Name is used as
the subdirectory of the base directory. The algorithm for this
is based on MD5. It produces a string of eight characters, for
example "31ceb17f". The credential being used for the
call is inspected to get the user's DN.
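
To make the naming scheme concrete, here is a minimal sketch of how a
per-user directory could be derived. It is an illustration only: the text
above says the hash is MD5-based and eight characters long but does not
spell out the algorithm, so the sketch simply assumes the first eight hex
digits of the MD5 digest of the DN string. The real cloud-client may compute
the value differently. The function names and the sample DN are hypothetical.

```python
import hashlib
import posixpath


def illustrative_dn_hash(dn):
    """Return an eight-character, MD5-derived tag for a DN.

    ASSUMPTION: truncating the MD5 hex digest is only a stand-in for the
    real algorithm, which this page does not spell out.
    """
    return hashlib.md5(dn.encode("utf-8")).hexdigest()[:8]


def personal_directory(base_dir, dn):
    """Join the base directory with the DN-derived subdirectory name."""
    return posixpath.join(base_dir, illustrative_dn_hash(dn))


if __name__ == "__main__":
    # Hypothetical "Globus style" DN, used only for illustration.
    dn = "/O=Grid/OU=Example/CN=Alice Example"
    print(personal_directory("/cloud", dn))
```

Values produced by this sketch should not be expected to match the
directory names on a real cloud.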

The directories for each user are created by the administrator. Any
(unlikely) hash collisions would be detected at this point. You can
see the hash of any "Globus style" DN with the --hash-print option
of the cloud client. For example:

So with a hypothetical repository hostname "repository.cloudurl.edu",
"/cloud" base directory and DN hash of "a9bad55", the derived GridFTP URL
of the user's "my-workspace" file will be
gsiftp://repository.cloudurl.edu:2811//cloud/a9bad55/my-workspace
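
The derivation can be sketched in a few lines of Python. This is not the
cloud-client's own code; the function name and parameters are hypothetical,
the default port is taken from the example URL above, and the hash is
passed in as a ready-made string.

```python
import os.path


def derive_gridftp_url(host, base_dir, dn_hash, name_or_path, port=2811):
    """Build the gsiftp:// URL of a file in a user's repository directory."""
    # A local path maps to the same filename in the personal directory.
    filename = os.path.basename(name_or_path)
    # The absolute repository path follows a separator slash, which
    # yields the double slash seen after the port.
    base = base_dir.strip("/")
    return f"gsiftp://{host}:{port}//{base}/{dn_hash}/{filename}"


if __name__ == "__main__":
    print(derive_gridftp_url("repository.cloudurl.edu", "/cloud",
                             "a9bad55", "my-workspace"))
```

Passing a local path such as "/tmp/images/my-workspace" gives the same
URL, because only the filename is kept.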

Note that there is a cloud-client option to input any name or local
file path and see what the derived URL is. See the --extrahelp
description of the --print-file-URL option.

As of TP2.2, you can auto-create the user directories using the
cloud-admin program.

The second set of assumptions to cover is how a given image file is going
to actually work. There are many options that you can specify in regular
workspace requests. For example, the memory size, the number of network
interfaces to construct, the pool name(s) to lease network addresses from,
and the partition name the VM is expecting for the base partition.

Some fixed assumptions are made:

There can be only one network interface

The network interface is expecting its address via DHCP

There can be only one partition file, for the root partition,
configured with an ext2/ext3 filesystem. Other filesystems may not
work correctly (this has to do with the cloud's default kernel as well
as its ability to edit the image's files before boot).

The rest of the launch request is filled in by default configurations.
Here they are:

../lib/certs is set as a directory to add to the trusted
X509 certificate directories for identity validations (the client
verifies it is talking to the right servers). Adding the CA cert(s)
of the workspace service and GridFTP host certificates to this
directory ensures that the user will not run into CA (trusted
certificates) problems.

The cloud client program respects settings from three different
places, listed here in the order of precedence:

Commandline arguments - If the client uses one of the
optional flags listed in ./bin/cloud-client.sh --extrahelp,
these values are used. Many things can be overridden this way,
including the service contacts.

User properties file - An example of this was given
above (the cloud.properties
file which is currently distributed for the
Nimbus cloud).

Note that you can include different properties files and have your
users switch between clouds using
./bin/cloud-client.sh --conf ./conf/some-file.

If no --conf argument is supplied, the default file
cloud.properties needs to exist. If you need to change
this in your client distribution for cosmetic reasons, you can
do so by editing the one relevant line at the top of
./bin/cloud-client.sh
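
The order of precedence can be pictured as a layered lookup in which the
first layer that defines a setting wins. The sketch below is an analogy in
Python, not the cloud-client's implementation. The setting names and values
are hypothetical, and the lowest layer is a generic placeholder called
"fallback_values" because it stands in for whatever source comes after the
user properties file.

```python
from collections import ChainMap


def resolve_settings(cli_args, properties_file_values, fallback_values):
    """Return a lookup where earlier layers win over later ones."""
    # Options the user did not pass on the command line are dropped so
    # they cannot mask values from the lower-precedence layers.
    given = {k: v for k, v in cli_args.items() if v is not None}
    return ChainMap(given, properties_file_values, fallback_values)


if __name__ == "__main__":
    # All keys and values below are hypothetical.
    cli = {"service.contact": "other.cloudurl.edu:8443", "memory": None}
    props = {"service.contact": "service.cloudurl.edu:8443",
             "repository.contact": "repository.cloudurl.edu:2811"}
    fallback = {"memory": "256"}
    settings = resolve_settings(cli, props, fallback)
    print(settings["service.contact"])     # commandline value wins
    print(settings["repository.contact"])  # taken from the properties file
    print(settings["memory"])              # taken from the fallback layer
```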

The plugins page discusses the
"groupauthz" plugin, which can enforce many generally useful policies.
One of them in particular is necessary for the cloud configuration
to operate properly: the identity-hash based image subdirectories option
ensures that propagation source paths and unpropagation target paths are
specific to the caller, using the hashing algorithm discussed above.

The workspace-control user account is empowered to run all workspaces,
so specific requests must be authorized before the "enactment"
command is sent out to workspace-control. That work is done on behalf of
the client but, importantly, not as the client.
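
The idea behind the identity-hash based image subdirectories check can be
sketched as a path test: a request is accepted only if the path it names
lies inside the caller's own hash directory. This is a minimal illustration
and not the groupauthz plugin's code; the function name and arguments are
hypothetical.

```python
import posixpath


def path_is_callers_own(path, base_dir, caller_hash):
    """True if path lies inside base_dir/caller_hash on the repository."""
    if not path.startswith("/"):
        return False  # only absolute repository paths are considered
    allowed_root = posixpath.join(base_dir, caller_hash)
    # Collapse "." and ".." segments so "../other-user" tricks are caught;
    # normpath may keep a leading "//", so reduce it to a single slash.
    normalized = "/" + posixpath.normpath(path).lstrip("/")
    return normalized.startswith(allowed_root + "/")


if __name__ == "__main__":
    print(path_is_callers_own("/cloud/31ceb17f/my-workspace",
                              "/cloud", "31ceb17f"))
    print(path_is_callers_own("/cloud/31ceb17f/../a9bad55/my-workspace",
                              "/cloud", "31ceb17f"))
```

Normalizing before comparing matters: without it, a path that climbs out of
the caller's directory with ".." would pass a plain prefix test.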

For the repository node you currently need
GridFTP to handle remote transfers.
Each cloud user's DN must be in the GridFTP grid-mapfile (an access control
list that also maps each DN to a specific unix account). To
prevent users from maliciously overwriting each other's files when talking
to GridFTP directly, each cloud user must currently be mapped to a unique
unix account that belongs to a unique unix group on the repository node.
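
A quick way to audit the one-account-per-user requirement is to scan the
grid-mapfile for accounts that more than one DN maps to. The sketch below
assumes the usual grid-mapfile layout of a quoted DN followed by one or more
comma-separated account names; the function name and the sample DNs and
accounts are hypothetical. It does not check unix group membership, which
would have to be verified on the repository node itself.

```python
import shlex
from collections import defaultdict


def find_shared_accounts(grid_mapfile_text):
    """Map each unix account to its DNs, keeping only shared accounts."""
    dns_by_account = defaultdict(list)
    for line in grid_mapfile_text.splitlines():
        # shlex handles the quoted DN and skips "#" comments.
        fields = shlex.split(line, comments=True)
        if len(fields) < 2:
            continue  # blank, comment-only, or malformed line
        dn, accounts = fields[0], fields[1]
        for account in accounts.split(","):
            dns_by_account[account].append(dn)
    return {acct: dns for acct, dns in dns_by_account.items() if len(dns) > 1}


if __name__ == "__main__":
    sample = '''
    # hypothetical entries
    "/O=Grid/OU=Example/CN=Alice" cloud001
    "/O=Grid/OU=Example/CN=Bob" cloud002
    "/O=Grid/OU=Example/CN=Carol" cloud001
    '''
    for account, dns in find_shared_accounts(sample).items():
        print(account, "is shared by", dns)
```

An empty result means every account in the file is mapped from a single DN.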

If the base directory on the repository node is "/cloud", you will
need to create a directory under it for each DN, named after the DN's hash.
It is recommended that you use
the cloud-admin program for this (see next section).

As of TP2.2, there is a program installed here:
$GLOBUS_LOCATION/share/nimbus-autoconfig/cloud-admin.sh

This program can add new users for you with one command, including creating
the directories with the right hash names.
During its first "add-dn" invocation, you can set up many
default choices including what "sample images" get soft linked to the new
directory, etc.