Welcome to the world of grid computing! This field combines
ideas from high performance computing, web services, networking, and computer
security in order to provide the user with a powerful, virtual super-computer
for running large workloads across heterogeneous systems and clusters. Grid
computing can offer a uniform interface that glosses over the multifarious
details of the distributed computing systems that are brought together to form
the grid. Users can submit their jobs to grid queues that will automatically
parcel out the workload to available execution services. Users’ computing jobs
can rely on data located around the world, brought together by the uniform
filesystem view provided by the grid.

The Genesis II GFFS (Global Federated File System) is the
topic of this reference manual. The GFFS is part of the XSEDE project (http://xsede.org) and provides a globally
accessible grid filesystem as well as grid queues that leverage the high
performance computing infrastructure of the XSEDE project.

Installation: Describes both graphical and command-line versions of installers for
both the grid client and container.

Grid Usage: Surveys the basics of authentication and authorization in the grid, running
jobs on compute resources, exporting local file system paths to the GFFS, and
copying data files into and out of the grid.

Configuration: Discusses the deployment of the root GFFS container, the deployment of
secondary containers, creation of Basic Execution Services to run jobs,
creation of grid queues, and establishing campus bridging configurations.

Management: Covers creating users and groups in the grid, removing users and
groups, stopping and restarting containers, backing up containers, and
restoring containers from a backup.

Testing: Discusses how to create a small bootstrapped grid for testing, how to
run the GFFS test scripts in the GFFS Toolkit, and what results to expect
from the tests.

Appendices: Contains a FAQ and troubleshooting guide. Also provides a
detail-oriented reference for extended deployment issues and other
configuration considerations.

This document is intended for the following classes of users
(also known as personas):

1. XSEDE System Administrators

2. Scientific Users

3. Campus Grid Administrators

4. Grid Testers

5. XSEDE Developers

Membership in a particular user class does not necessarily
limit an individual’s interest in any of the information documented here. That
said, the Installation and Grid Usage chapters will be especially relevant to
the Scientific User. The Configuration and Management chapters will be of more
interest to the XSEDE System Administrators and Campus Grid Administrators.
Finally, the Grid Tester and XSEDE Developer personas each have a chapter
devoted to their particular viewpoint.

This document is a group effort. It incorporates text from
many contributors who, over an extended period of time, wrote various documents
about the Genesis II grid functionality. These contributors include:

Bastian Demuth

Daniel Dougherty

Ashwin Raghav Mohan Ganesh

Andrew Grimshaw

John Karpovich

Chris Koeritz

Duane Merrill

Mark Morgan

Michael Saravo

Karolina Sarnowska-Upton

Salvatore Valente

Vanamala Venkataswamy

Muhammad Yanhaona

Editor: Chris Koeritz

This omnibus document was originally accumulated and edited
for “XSEDE Activity 43 – Genesis II Documentation” during the spring of 2012.
Chris has served as the Omnibus editor for ongoing edits through XSEDE
Increment 5.

The HOME variable is expected
to already exist in the user environment; this points at the home folder for
the current user on Linux and Mac OS X. On Windows, the home directory is
composed of two variables instead: ${HOMEDRIVE}${HOMEPATH}.

The JAVA_HOME variable is used to specify the top-level of
the Java JDK or JRE. This variable is not widely used in the Genesis II
software but may be used in a few specific scripts. If the “java” executable
is found on the application path, then JAVA_HOME is not usually needed.

The GENII_INSTALL_DIR variable is a Genesis II specific
variable that points at the top folder of the Genesis II software
installation. This variable is not needed by the Genesis II Java software,
although it may be relied on by some scripts and is used extensively in this
document.

The GFFS_TOOLKIT_ROOT variable points at the top-level of
the GFFS tool and test scripts within the Genesis II installation package. It
is also not needed by the Java software of Genesis II, but will be relied on
heavily within the provided tool and test scripts. It is established
automatically by the set_gffs_vars script (described below in section B.4.7).

The GENII_USER_DIR variable points at the path where client
and container state are stored. This is also referred to as the “state
directory”. This variable is used within the Genesis II Java software and by
many of the tool and test scripts. The variable is optional in general and
will default to “$HOME/.genesisII-2.0”. However, if a Genesis II client or
container is intended to use a different state directory than the default, then
the variable must be defined before the client or container software is
started. It is recommended that any non-default value for the variable be set
in the user’s script startup file (such as $HOME/.bashrc) to avoid confusion
about the intended state directory.
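For example, a sketch of such a line in $HOME/.bashrc (the directory shown is
only an illustration):

export GENII_USER_DIR=$HOME/.gffs-state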

For users on NFS (Network File System), it is very important
that container state directories (i.e., GENII_USER_DIR) are not stored in an
NFS-mounted folder. Corruption of the container state can result if this
caution is disregarded. To avoid the risk of corruption, set the
GENII_USER_DIR variable to a directory location on a local hard disk.

Throughout the document, we will often reference the “grid” command from Genesis II. It is shown as just “grid” in example commands, which assumes that the grid command
is in the PATH variable. The path can be automatically updated for Genesis II
GFFS by running a script included with the install called “set_gffs_vars”. For
example, this loads the important Genesis II variables into the current bash
environment:

source /opt/genesis2-xsede/set_gffs_vars

The above command assumes an XSEDE
production grid installation of the Genesis II GFFS RPM file; other
installations may have a different install path. The above command loads
GENII_INSTALL_DIR, GFFS_TOOLKIT_ROOT, and other important variables as well as
putting the Genesis II grid command into the PATH. This statement can be added
to .bashrc for automatic execution in each bash shell if desired. There are
many other methods for getting the grid command into the path, including
Environment Module files, or even just adding the environment variables
manually.

To add the GFFS grid command manually, one can set the value
of $GENII_INSTALL_DIR and add it into the PATH variable:
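For example, a sketch assuming an RPM install location and that the grid
command resides at the top level of the install directory:

export GENII_INSTALL_DIR=/opt/genesis2-xsede
export PATH=$GENII_INSTALL_DIR:$PATH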

ACL Access Control List

A security feature that specifies a set of rights for particular
users. Any object stored in the GFFS has three ACLs (one each for read, write,
and execute permissions). Each ACL can have zero or more rights in the set.

BES Basic Execution Services

The component that offers computational resources to a grid. A
BES can accept jobs, run them on some resource, and then provide the job's
results.

CRUD Create Read Update Delete

An acronym for the four most common file
operations.

EMS Execution Management Services

The general category for grid computation services. This is
implemented by the grid's available BES components, which can all be of
different types.

EPI EndPoint Identifier

A short unique pointer (across time and space) to an EPR (see
below). EPIs provide for a simple identity comparison, such that if object A
has an identical EPI to object B, then they are in fact the same object.

EPR EndPoint Reference

A pointer to a web-service, including network location (such as
URL), security policies, and other facts needed for a client to connect to the
service.

Export

To “export” a file system directory structure is to make it
available (subject to access control) to other users in the grid. One exports a
local rooted directory tree, e.g., sourceDir and maps it into a target
directory in the GFFS directory space, e.g., /home/Alice/project1/sourceDir. The
files and directories in “sourceDir” are still accessible using local
mechanisms and are also accessible via the grid.

Export Implementation

The
realization of the export functionality in the Genesis II container, as
implemented in source code.

Export Owner

The local user at the SP or campus who owns the data being
exported.

Export Owner User ID

The Unix user ID (also called ‘account name’) of
the export owner.

FUSE mount Filesystem in Userspace

FUSE is a file system driver for Linux and MacOS that allows users
to define and write their own user space (non-kernel) file system drivers.
Genesis II has a grid-aware FUSE driver that maps the GFFS into the user’s local
file system using a FUSE mount.

Genesis II The Genesis System version 2

A grid computing project developed at the University of Virginia.
Genesis II provides the GFFS component for XSEDE.

Genesis II GFFS Container

The
Genesis II GFFS implements a “Web Services” container architecture. The
container is the process running the Genesis II source code in Java with which
clients interact. The container receives requests to operate on exported data,
and the container‘s export implementation carries out those requests (subject
to authorization).

GFFS Global Federated File System

The filesystem that can link together heterogenous computing
resources, authentication and authorization services, and data resources in a
unified hierarchical structure.

GFFS Container User ID

The
container currently executes with a non-privileged Unix user id. For example, a
normal Unix account named ‘gffs’ might be used to run the GFFS container. In
the remainder of the document, the Unix user that is running the container will
be referred to as “GffsUser”.

GORM Genesis II Omnibus Reference Manual

This reference manual.

GIU Grid Interface Unit

A Grid Interface Unit (GIU) is the hardware component on which the
Genesis II container runs. The required elements of the GIU are defined in the
XSEDE Architecture Level 3 Decomposition document (L3D) in section 8.1.2.

IDP IDentity Provider

A service that can create or authenticate user identities.

L3D XSEDE Architecture Level 3 Decomposition document.

PBS Portable Batch System

A queuing service for job processing on computer clusters. PBS
queues can be linked to Genesis II grid queues.

PKCS#12 Public Key Cryptography Standard Number 12

A file format for storing key-pairs and certificates with password
protection.

PKI Public Key Infrastructure

The general category of all services that rely on asymmetric
encryption where a key owner has two parts to their key: the public part that
can be shared with other users, and the private part that only the owner should
have access to. Using the public key, people can send the owner encrypted
documents that only the owner can decrypt. The owner can also create documents
using the private key that only the public key can decrypt, offering some
proof of the document's origin. With this one essential feature of enabling
communication without giving away private keys (unlike symmetric encryption
algorithms), a number of important authentication schemes have been developed
(such as SSL, SSH, and TLS).

Principle of least privilege

The term was coined by Jerome Saltzer in 1974: “Every program and every
privileged user of the system should operate using the least amount of
privilege necessary to complete the job.” What this means here is that
software that does not need root should not have it, and that if it does need
it, then it should have it for the least amount of time and in the most
encapsulated way (sometimes called privilege bracketing).

RNS Resource Namespace Service

A web services protocol that provides a directory service for
managing EPRs.

Root Squash

Because of
the way many early distributed file systems handled trust and authentication,
processes running as root on file system client hosts may not actually have
root privilege with respect to a network mounted file system. When root squash
is in effect, the network file server squashes, or ignores, requests that
arrive from clients asserting root privileges. This is done to prevent
compromised clients from attacking the file system with root privilege. Note
that root squash can be selectively applied by adding exceptions in
/etc/exports.

SSH Secure SHell

A terminal emulation program that allows users to connect to
remote computers while providing an encrypted communication channel that keeps
their passwords, command history, and so forth private.

SSL Secure Socket Layer

A protocol for connecting to a web service or web site using
encrypted transmissions. This protocol is considered deprecated now in favor
of TLS.

STS Secure Token Service

The STS offers a method for a user to authenticate against a known
service in order to log in to the grid. Configuring an STS is usually a task
for the grid administrator.

Sudo privilege

In Unix there are two important privilege levels, user and root. Root can do
anything. Processes with “user” level privilege have very limited
capabilities. For example, root can read and write any file, change user id to
any user, change file ownership, etc. Users cannot. Sometimes, though, one
wants a user-level process to have enhanced capability without the unlimited
capability of root. This is consistent with the principle of least privilege.
To support a restricted (and temporary) extension of privilege, a user may be
given sudo (“superuser do”) privilege to execute certain commands as either
root or as another user.

TLS Transport Layer Security

A protocol for connecting to a web service or web site using
encrypted transmissions. TLS is the more modern incarnation of SSL.

Trust Store A set of certificates that are “trusted”

A trust store can be a file (or directory) with one or more
certificates that are trusted for a particular purpose. For example, in Genesis
II as of XSEDE Increment 1, there is a trust store in a PFX format file that
contains the certificates of containers that a grid client will trust when
connecting using TLS. If a container presents an identity that is not present in the trust
store and which is not signed by a certificate in the trust store, then no
connection will be made.

UNICORE UNiform Interface to COmputing REsources

The primary EMS for XSEDE is provided by the UNICORE software, an
EU open source grid computing project initially funded by the German Ministry
for Education and Research.

This is in contrast to other security mechanisms, such as
the MyProxy server, which uses proxy certificates for authentication and
authorization. A survey of proxy certificates and related technologies can be
found here:

Genesis II GFFS is a standards-based web-services
application. The Genesis II installers can provide both the client-side and
server-side of the software. The server side of the web service is called the
container. There are interactive installers available for both client and
container, and the interactive installer can function with a graphical user
interface or in console-mode (text only). The former is intended for most
users, while the latter is intended for users who wish to script the install or
who do not have access to graphical capabilities during installation.

Genesis II is also available in RPM and DEB package formats
on Linux. Unlike the interactive installer, these installation packages are
installed by the system administrator once per host. All clients and containers
configured by users utilize the same installation.

Currently, the container installation is available for 32-bit
and 64-bit Linux, and for 32-bit MS-Windows. Client installers are available
for 32-bit and 64-bit Linux, for 64-bit Mac OS X (Intel Platform), and for
32-bit MS-Windows.

The Genesis II GFFS software relies on the Java Runtime
Environment (JRE) and officially supports Oracle Java 8 (aka version 1.8). The
interactive installers include a recent JRE version, but the RPM/DEB packages
do not provide a Java JRE.

The average user who wishes to use the Genesis II container
or client does not need to have administrative access to the computer where the
installation will occur. In general, a user who has a home directory with
write access can just run the installer as their own personal identity, and
there are no special permissions required for running either the container or
the client on one's own computer.

In some institutional or corporate settings, administrators
may prefer to install the software at a single location per computer or per
network. This is also supported by the Genesis II installer (in both
interactive and Linux package formats). The container owner needs to perform
additional tasks to configure Genesis II, which are documented in the sections
below.

A common requirement is for the grid tools to be available
in the user’s application path. One solution for this is a TCL Modules file,
and a sample file is provided in the GFFS Toolkit in the folder called
“tools/genesis_module”. There are many other ways to address the path issue,
including modifying environment variables (per user or per system).

The next dialog allows one to choose the type of
installation, client-only or full GFFS container. Leave it at the default
choice of client-only if you do not need a container installation that will
provide GFFS services of your own.

Once the Genesis II software files are stored in the target
location, the GFFS software will be used to connect to the configured grid. If
the grid connection does not succeed and an error message is printed, please
refer to the FAQ, Section J, for possible solutions.

When the
installation is finished, the completion dialog is displayed.

The console version of the installer is available from the
same install program that does the graphical installs. On Linux, the
command-line version requires passing a '-c' flag to the installer at run time:

bash {installer filename} -c

For MS-Windows, run the installer as an exe instead. This
assumes the user is in the same directory as the installer:

{installer filename}.exe -c

This will begin an interactive install process where prompts
are displayed to request configuration input. The same prompts shown in the graphical
install dialogs are shown on the console instead.

Interactive install process for the grid client

$ bash genesis2-gffs-linux64-v2_7_503.bin -c

Unpacking JRE ...
Preparing JRE ...
Starting Installer ...

This will install Genesis II GFFS on your computer.

OK [o, Enter], Cancel [c]

(hit enter)

Where should Genesis II GFFS be installed?
[/home/fred/GenesisII]

(type a different location or use suggested one, then hit enter)

Please Select the Type of Install to Perform

Installing grid deployment for Freds internal grid

This installation can provide GFFS client-only support or it can function as
a GFFS container. Which would you prefer to install?
Client-Only GFFS [1, Enter], GFFS Client and Container [2]

1 (type 1 for client install or just hit enter)

Extracting files ...

(…filenames flash by…)

Connecting to the Grid (slight pause occurs while connecting…)

Setup has finished installing Genesis II GFFS on your computer.

Finishing installation...

If it is important to automatically script the grid client
installer, one technique that can help is called a 'here document'. The here
document, denoted by the << below, answers the installer prompts using a
list of canned responses. The word ‘eof’ below is used to end the stream of
commands:
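A sketch of that technique, using the client install prompts shown above (the
installer filename and the canned responses are illustrative):

bash genesis2-gffs-linux64-v2_7_503.bin -c <<eof
o
/home/fred/GenesisII
1
eof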

The port number is where the container will reside on the
current host. This port number should not be in use by any other service, and
it must not be blocked by a firewall. The hostname should be a publicly
visible DNS name (or IP address, although DNS names are preferred). This host
must be reachable using that name from potentially anywhere in the world, or
the container will not be able to be linked into a grid.

After the web service configuration, the installer will
attempt to connect to the configured grid. Once this completes, the container
specific configuration continues with a dialog requesting to know which grid
user will own the container.

The user specified must be an existing user in the GFFS grid
in question (the location of which is packaged in the installer). If you do
not currently have a valid grid user, you will need to request one that can own
your container.

The grid user in question will completely “own” the container
and will be given full administrative rights. This allows the user to add,
configure and remove resources on this container. The grid user can also link
the container into the grid’s RNS hierarchy in locations where the user has appropriate
access rights.

After a valid grid user is provided, the installation offers
to generate certificates for the container or to let the user provide her own
certificate. This certificate is used for TLS (SSL) communication by the
container; all outgoing and incoming web service calls use this certificate for
identification and all encryption is done with the associated private key.

If the keypair generating service is used, as depicted, then
a certificate for TLS communication is automatically generated. In that case,
the next dialog asks for the password to protect the generated TLS
certificate.

The default password for the TLS keystore is ‘container’, but
this can be changed as desired. After the TLS keystore and certificate are
generated, the installer finishes with the final dialog.

Alternately, if one chooses not to use the keypair generator,
one must supply a TLS keypair in PFX format that can be used for the
communication. The keypair dialog prompts for the keypair and supports
browsing for it on the local computer.

The console mode installation process for the container is
very similar to the client install. There are just a few more questions to
answer than for the client, all regarding the container configuration.

Interactive container install in console mode

(Installation is shown after container install type is selected and files
have been installed…)

Port Number

By default the XCG container listens for incoming messages on TCP port
18443. You can override this behavior here.

XCG Container Port Number
[18443]

(select a port number and hit Enter)

Specify the hostname or IP address where your container will run.
Hostnames should be globally resolvable via DNS.

Host Name
[]

(type in the world-reachable DNS host name for the computer and hit Enter)

Connecting to the Grid

Owner Information

Please select a user to manage the container

User Name
[]

(enter an XSEDE portal id or other grid user name here)

This service will generate and sign your container keypair with your
supplied credentials

Use Grid Keypair Generating Service?
Yes [y], No [n, Enter]

(choose and hit enter. Remainder assumes keypair was not generated.)

Select path for container keypair (.pfx) to be used for this container
(will be copied)

Keypair Path
[]

(enter the path to a key-pair to use as the TLS key for the container)

Keystore Password
[]

(enter the key-pair and keystore password for the pfx file; these must both
be the same password to use the pfx with the GFFS installer.)

Start Container Service?
Yes [y], No [n, Enter]

(hit Y and then Enter to start the container)

Configuring Container
Preparing GFFSContainer Script
Starting Container Service

Setup has finished installing Genesis II GFFS on your computer.

Finishing installation...

Note that the same approach used for scripting the grid
client (a 'here document') can be used to script this install.

Genesis II installations provide a sample init.d-style
service script in a file called “GFFSContainer”. This file can be deployed on
some Linux systems in /etc/init.d to automatically restart the container. Once
the file is installed, the system administrator must set the script at an
appropriate “run level” for starting on reboot.

Users who wish to automatically start their personal
containers can do so with a “cron job”. This method usually does not require
administrator assistance. The Genesis II installation provides a script called
GFFSContainer which can be used to restart the service if the computer is
restarted or if the service inadvertently stops. The following is an example
cron job that uses the container restart script to launch the container if it
is not already running.
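A sketch of such a crontab entry (assuming an RPM install in
/opt/genesis2-xsede, a non-default state directory, and that the script’s
start action is harmless when the container is already running; the
ten-minute interval is arbitrary):

*/10 * * * * GENII_USER_DIR=$HOME/.gffs-state /opt/genesis2-xsede/GFFSContainer start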

Cron has a different environment than your normal users, and
thus it is important to provide the state directory (GENII_USER_DIR) to the
cron job. Otherwise the default state directory ($HOME/.genesisII-2.0) will be
used.

The installation of one of the Linux-based packages is much
simpler than the interactive process, mainly because the configuration steps
have been moved out to a script-based process. This is necessary because the
RPM and DEB package formats are intended to be installed once per host and
shared by multiple users. In these package formats, all user and container
state must reside in the state directory (unlike the interactive installation,
where some of the configuration can reside in the installation directory).

To install or upgrade the Linux RPM for the Genesis II GFFS,
use sudo (or login as root) to call the rpm installer:

sudo rpm -Uvh genesis2-xsede-2.7.503-1.x86_64.rpm

To install the Linux DEB package for the Genesis II GFFS,
use sudo (or login as root) and run the dpkg program:

sudo dpkg -i genesis2-xsede-2.7.503-1.x86_64.deb

Each of these actions will install the Genesis II GFFS
software to “/opt/genesis2-xsede” by default when using the XSEDE production
grid install package. Installers for other grids follow a similar form: for
example, the European Grid (GFFS.EU) installer installs to
“/opt/genesis2-european” and the XCG installer installs to “/opt/genesis2-xcg”.

To install to a different location when using RPMs, add the “prefix”
flag to the command:

sudo rpm -Uvh --prefix {new-location} genesis2-xsede-2.7.503-1.rpm

To uninstall the RPM or DEB package, use the appropriate
package manager’s removal procedure:

sudo rpm -e genesis2-xsede

or

sudo apt-get remove genesis2-xsede

If needed, the RPM install can be forced to upgrade a
package with identical version information despite already being installed:

rpm -Uvh --force genesis2-xsede-2.7.503-1.rpm

The process of configuring container installations and
converting older installations is documented in the following sections. The
configuration scripts documented below can also be used with interactive
installs (on Linux only), which is especially useful when those are installed
by the root user for host-wide usage.

Multiple containers can be configured on a host using the
system-wide RPM or DEB package for Genesis II. This poses an issue at upgrade
time, since the running containers will become unavailable when the Java jar
files and configuration directories are replaced. The system administrator may
want to institute a procedure for alerting users to shut their containers down
before the installation and to restart the containers again afterwards. An
alternative is to require users to register their container installations in a
way that allows a site-implemented, sudo-based process to automatically stop
all of them before the installation and start them again afterwards. A
mechanism for automating this process may be developed in a future release.

The interactive installers provide what is termed a “Split
Configuration” installation mode, where the container configuration partially
resides in the installation folder itself. In the newer “Unified
Configuration” mode, the client-specific and container-specific configuration
is stored entirely in the state directory. This is a more flexible
configuration, which can operate based on the RPM/DEB packages as well as on
the interactive installer. The following sections describe the procedures used
for managing containers with the Unified Configuration.

In general, the Unified Configuration is the most useful on
Linux when the RPM or DEB package is installed. However, these same approaches
can be used directly on Mac OS X also. On MS-Windows, using a Linux
compatibility layer such as Cygwin is required (see section I.1.3 for more
information on Cygwin).

In all of the Unified Configuration scripts documented
below, the environment variables GENII_INSTALL_DIR and GENII_USER_DIR must be
set. The former variable specifies the install location (such as /opt/genesis2-xsede),
and the latter specifies the state directory for the container (such as $HOME/.genesisII-2.0).
The install directory and the state directory do not need to exist before
running the installer, but the two environment variables must be established.

To configure a new container, first install a version of the
Genesis II GFFS that provides the Unified Configuration (2.7.500+). Run the
configuration script with no parameters to get full help instructions:

bash $GENII_INSTALL_DIR/scripts/configure_container.sh

The instructions provided by the script should be complete,
if a bit terse. This section will explain some finer points of the required
parameters, but the script’s built-in help should be consulted as the most
authoritative and up to date reference.

There are potentially six parameters for the script, and it
requires at least five of these. They are:

1. The container host name. This is the globally visible name at which the new
container can be reached over the internet. It is acceptable for this to be an
IP address, although a textual host name is preferred.

2. The network port number where the container service provides TLS connection
services. This port number must not be blocked by a firewall. The GFFS
container must also be the only user of this port number.

3. An existing grid user who will be given complete control over the new
container. This should be your grid user name, and it can be an XSEDE-style
MyProxy/Kerberos user or a GFFS X509-style user. If you do not have a user
name yet, contact your grid administrator to acquire one. In some rare cases,
the grid administrator may provide a special user name that will own your
container, rather than your own grid user.

4. The keypair file in PFX format (that is, PKCS#12) which holds the TLS
certificate and key-pair that the container will use to communicate over the
internet. This can either be an already existing PFX file or it can be the
word “generate”, which will cause a new TLS keypair to be created by the
grid’s certificate generator (where available).

5. The password on the keystore itself. This password is used to unlock the
keystore and get at the keypair inside it. If parameter 4 was “generate”, then
this password will be used to secure the newly generated keystore.

6. A password for the TLS key within the keystore. This is optional and will
only be necessary when the keystore and key password differ. This parameter is
not used for the “generate” keystore option.
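For example, a hypothetical invocation (assuming the parameters are passed in
the order listed above; the host name, user name, and password are
placeholders):

bash $GENII_INSTALL_DIR/scripts/configure_container.sh \
  mycontainer.example.edu 18443 drake generate containerPwd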

Note that the script will produce diagnostic output during
configuration which can include passwords, so it may be wise to run “clear” or
“cls” in that terminal afterwards.

After running the configuration script with the appropriate
parameters, the container’s configuration will be built in the GENII_USER_DIR
directory. The script prints out a command that will start the container
running. For example, the new container might be started up using the default
RPM package location:

/opt/genesis2-xsede/GFFSContainer start

After launching the container, its output can be watched
with the “tail” command (assuming the default logging location):

tail -f $HOME/.GenesisII/container.log

If that shows no errors, then the container is now
configured and could be linked into the grid provided by the installer (see
Section F.3.1 for more details about linking the container).

Users may want to free themselves from the Split
Configuration mode after they have previously configured a container with the
interactive installer. Typically, this will involve installing an RPM or DEB
package to provide the new installation. The existing container can be
converted into the Unified Configuration mode with a provided script, which
will acquire configuration items from the interactive installation (which must
still exist at conversion time). To see the built-in help for the conversion
script, run the following:

bash $GENII_INSTALL_DIR/scripts/convert_container.sh

This will show the required parameters and some example
execution sequences. This script is considerably simpler than the configure
script (last section), as all of the configuration information should already
exist and just needs to be extracted from the old installation directory.

It is important to back up the container state before the
conversion process, in order to defend against any unexpected problems during the
conversion. Both the installation directory (pointed to by the
GENII_INSTALL_DIR variable) and the state directory (specified by the
GENII_USER_DIR environment variable or residing in the default location of
$HOME/.genesisII-2.0) should be archived. For example, this will create an
archive of both directories, assuming the environment variables are set:

tar -czf container_backup.tar.gz $GENII_INSTALL_DIR $GENII_USER_DIR

The most common way to run the container conversion script is
to migrate an old interactive installation to using the RPM/DEB package
format. It is important to point the GENII_INSTALL_DIR variable at the newer
install location before running the convert script, e.g.:

export GENII_INSTALL_DIR=/opt/genesis2-xsede

bash $GENII_INSTALL_DIR/scripts/convert_container.sh $HOME/GenesisII

The script will produce diagnostic output during the
conversion which can include passwords, so it may be prudent to run “clear” or
“cls” in that terminal afterwards.

It is possible to convert to a Unified Configuration even if
there is only one installation of the newer interactive installer (e.g., if the
old installation was upgraded in place). In this situation, pass the current $GENII_INSTALL_DIR
as the parameter to the script.
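For example (a sketch; here the same installation directory serves as both the
old and new location):

bash $GENII_INSTALL_DIR/scripts/convert_container.sh $GENII_INSTALL_DIR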

After the conversion script has run successfully, the
container’s configuration will be unified under the state directory. The older
interactive installation can be removed, and the container will rely on the new
package location for the GFFS software.

During the execution of the script, you will be offered a
chance to create a copy of your deployment folder from the old installation.
This is only necessary if you have manually modified the deployment, or if the
deployment is non-standard. This is the case when upgrading a source-based container
to use an RPM, which is further documented in section D.8.7.

After converting a container to the Unified Configuration,
it is sometimes necessary to adapt to changes in the installation location.
This may occur if the container was initially converted from an older
interactive install to the newer interactive install, but then later the RPM
install is used instead. The install also might need to change locations due
to organizational or hardware changes.

In these cases where there is no other configuration change
required for the container, the location can be fixed with the “update_install_location”
script. Running the script prints out the built-in help:

bash $GENII_INSTALL_DIR/scripts/update_install_location.sh

This is a very simple script. The GENII_INSTALL_DIR should
point at the new install location, and the older location is passed on the
command line. Below is an example of switching to the RPM package as the new
installation source, after having previously relied on the interactive
installation to support the container’s Unified Configuration.

# if the old installation is still active, stop that container…
$GENII_INSTALL_DIR/GFFSContainer stop

# update the installation directory variable to the new path…
export GENII_INSTALL_DIR=/opt/genesis2-xsede
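# finally, run the location update script, passing the old install path on
# the command line (the old path shown here is illustrative)…
bash $GENII_INSTALL_DIR/scripts/update_install_location.sh $HOME/GenesisII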

A GFFS deployment provides the information needed to connect
to a grid, such as the grid location on the internet and the associated
certificates for that grid. Occasionally some characteristics of the grid deployment
are updated, and these are pushed out in a new deployment package or in a new
installer. For containers with a Split Configuration mode that are set up by
interactive installers, this usually poses no problem, as the installer can
update the deployment when the new version is installed. But containers with a
Unified Configuration are more independent from the installation directory and
are not automatically updated to the latest deployment. This is a consequence
of the RPM/DEB installation model, where the root user installs the package,
but many other users can base their container on the installed package. These
types of containers require a deployment update in order to use the latest grid
deployment.

If you have just updated your Genesis II installation by
using a new Linux package or installer (on any supported platform), then it is
important to update your container’s state directory by following the steps
below. As always with the unified configuration model, it is crucial that the
GENII_USER_DIR variable is set to the container state directory before managing
the container. Given the appropriate GENII_USER_DIR, the “deployment updater”
script does not take any parameters on the command line, and it can be started
using these commands:
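A sketch of those commands (the updater script’s name is an assumption based
on the naming of the other scripts in the scripts folder; check your
installation for the actual name):

export GENII_USER_DIR=$HOME/.genesisII-2.0
bash $GENII_INSTALL_DIR/scripts/update_deployment.sh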

The script will automatically find the deployment
information in the installation directory and update the container state
directory in GENII_USER_DIR to reflect the latest deployment information from
the system-wide installation.

If one is using a specialized deployment, then the current
deployments folder can be pointed at by the “$GENII_DEPLOYMENT_DIR” variable.
If that variable is not set, then the deployments folder falls back to the
default of “$GENII_INSTALL_DIR/deployments”. The use of a GENII_DEPLOYMENT_DIR
variable is uncommon but useful if one’s deployments are not located under the
GFFS installation directory.

There are two methods for using a different deployment than
the deployment provided by the Genesis II install package.

The first method is to set the variable GENII_DEPLOYMENT_DIR
in the environment before starting the container. This causes the container to
use that folder as the root of the deployments hierarchy, rather than the
default of $GENII_INSTALL_DIR/deployments.
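For example (the path is illustrative):

export GENII_DEPLOYMENT_DIR=$HOME/my-deployments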

The second method is to store the specialized deployment
hierarchy in a folder called “deployments” under the container’s state directory
(in $GENII_USER_DIR). If the container finds a folder named “deployments” in
its state directory at start-up, then it will use that one instead of the one
stored in the installation directory.

The order of precedence for finding the deployment folder is
first to check the GENII_DEPLOYMENT_DIR variable, then to look for
“deployments” in the container state directory (GENII_USER_DIR), and finally to
look for deployments under the GENII_INSTALL_DIR.

The unified configuration mode for the installer provides a
method for overriding values that were previously always provided by the
installed deployment. This allows all of a container’s unique information to
be managed in the container’s own state directory.

The unified configuration adds these files and directories
to the state directory:

installation.properties

certs/

webapps/

wrapper/

deployments/ (optional)

D.8.6.1. installation.properties file

The installation.properties file provides override values
for configuration properties that are otherwise provided by the “configuration”
directory of a deployment. This includes the files “security.properties”,
“server-config.xml” and “web-container.properties”. The following is an
example of a real “installation.properties” file for a container that relies on
the installed deployment:

Note that there will be significantly fewer fields if the
container installation carries its own “deployments” folder in the state
directory. In that case, the security properties come from the deployments
folder rather than the installation.properties file.

As the above shows, the installation.properties file is formatted
as a Java properties file and provides “name=value” definitions of variables.
Each of the above entries corresponds to a setting that would otherwise have
come from the deployment’s configuration files.

Generally this file should not be hand-edited, but that is
always an option if additional overrides are needed or if values must be
corrected to adapt to changes.

D.8.6.2. certs directory

This directory is used to store container specific
certificates and Kerberos keytab files for authentication and authorization.
It has a structure mirroring the “security” folder from the installed
deployment, and thus can contain a “default-owners” and a
“trusted-certificates” directory.

The container configuration and conversion scripts
automatically store the container’s certificate files in PFX format in this
directory when using the unified configuration mode.

D.8.6.3. webapps directory

This directory supports the Apache Axis web services
software and provides a storage place for temporary files.

D.8.6.4. wrapper directory

Used by the Java Service Wrapper for the container’s service
management. This provides the wrapper configuration file in “wrapper.conf”.
It also is the location where the service wrapper will track the container’s
active process id in “GFFS.pid”.

D.8.6.5. deployments directory

If a directory called deployments is found in the state
directory, and there is no GENII_DEPLOYMENT_DIR environment variable established,
then this folder is used as the deployments folder, rather than the default of
$GENII_INSTALL_DIR/deployments. The convert_container script offers to create
this directory (as a copy of the previous installation’s deployments folder)
during conversion.

Converting a container that is built from Genesis II source
code is a special case of the conversion process in Section D.8.2. This usually only applies to the bootstrap container for a grid, or to experimental
containers used by developers. For these cases, the conversion script should
perform the proper actions, but there are a few important choices to make
during this process.

To convert the source-based container, follow the steps
described above in Section D.8.2 to convert the source folder from “split
configuration” to “unified configuration”, but with the following additions:

1. If the source-based container is still running when executing the
convert_container script, then the script will show text regarding “There are
still Java processes running…” If the script finds any of these processes,
then answer “Y” to the question of whether to shut them down. This will only
stop Java processes that are detected as running Genesis II containers or
clients. Care should be taken if the same user account is running more than
one Genesis II container; in that case, stop the source-based container
manually.

2. When the convert_container script asks whether to copy a specialized
“deployments” folder, tell it to do so by answering “Y”. This is crucial for a
root container's specialized deployment to be preserved and is also needed in
cases when the deployment generator was used to create the deployments folder.

Both of these choices can be automated by using optional
flags to the convert_container script, as in the following script execution
example (replace the path for {genesis2-trunk} with your container’s source
code location):

The “stop” phrase will cause any Genesis II Java processes to
be stopped. The “depcopy” phrase causes the deployments folder to be copied
from the installation directory into the container state directory.

After the conversion is successful, the source code should no
longer be needed to run the container, and it can be removed.

This section describes how to get computational work done
with a grid based on Genesis II GFFS software (such as the XSEDE and XCG
grids). It is assumed that the grid is already configured, and that the user
has already been issued a grid user account by the grid administrator.

Genesis II has built-in help available for most commands. The
command “grid help” prints a list of the available commands. Additionally, each
individual grid command has a short help description for usage and also a
longer man-page style description.

Running the grid command requires having previously loaded
the appropriate Genesis II environment variables, as per section B.4.7. The required command looks like this (using the bash shell on any supported
platform):

source /opt/genesis2-xsede/set_gffs_vars

After the grid command is available in the path, the built
in help can be accessed:

In the grid, a user's capabilities are based on who they are
and what they've been granted permission to do. Authentication is the process
that a user goes through to show who they are, at least in terms of an identity
that the grid will accept. This proof is limited; the user has merely
presented a certificate or a valid login that the grid recognizes. It is not
proof that the user actually is a particular person; it just proves that she
possesses the credentials associated with that person.

On the other hand, authorization is the full set of
capabilities that specify what a particular identity is allowed to do. In the
case of the GFFS, the user's authorization is specified by access control lists
on resources that the user has the right to use in some particular manner. For
example, the user may have authorization to submit a compute job to a
particular queue.

The following sections detail the processes of grid
authentication and grid resource authorization.

Genesis II uses what is termed a “credentials wallet” to
store user identity for grid operations. The wallet contains all the
identities that a user has “authenticated” with the grid using a supported
protocol, such as by providing a username and password, or by logging into a
Kerberos domain.

Users may require a collection of identities for their work,
rather than just one. For example, the user may have allocations at a
Supercomputing Center as well as having a local campus identity. The
credentials wallet allows the user to present all of her valid identities with
a single login.

A grid client instance that is not connected to a grid
container initially has no identity at all. As part of making the secure
connection to a grid container, the client creates a self-signed certificate to
represent its own identity. Upon attempting to connect to a grid container,
the grid client examines the identity of the container and compares it with the
client's own “trust store”. The trust store is a set of server certificates
that the grid administrator has instructed the client to “trust”. Trust here
just means that the client will connect to containers that identify themselves
via one of those certificates, and it will not connect to any containers that
are not in the trust store. More details about the trust store are available
in the section on GFFS Deployments.

# show the initial certificate on a client that has never
# authenticated as a user before.
grid whoami

When the client has no previously cached identity, this
command shows just the certificate that the grid client created to represent
its side of the secure TLS connection. This is an example of the “whoami” output
for a client in that state.

Once the client has decided to trust the container (and
possibly, based on configuration, the container has made a similar decision to
trust the client), the secure TLS connection is made and services can be
requested by the grid client. The first of the requested services is generally
a login request, because the client must authenticate as an identity of some
sort to obtain any authorization for grid resources. Different methods for
logging in are discussed in the next section.

Genesis II supports a variety of authentication mechanisms,
including username & password, Kerberos, MyProxy, InCommon, and direct use
of a key-pair. Each of these methods may be appropriate for a different
reason. Thanks to the credentials wallet, the user does not need to pick just
one approach, but can attain whatever collection of identities is needed
to get the work done.

Although it may seem counter-intuitive to log out before
having logged in, this can be done and is not a null operation; logging out
always clears at least the self-signed client certificate. If the user had
previously authenticated to any grid identities, those identities are dropped
as well.

# log out of all identities.
grid logout --all

It is possible to log out of just one identity by specifying
its “alias” on the command-line. Identities each have a unique alias name, and
the alias is shown in the whoami listing.
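For example, a sketch (the option name is an assumption; {alias} stands for an
alias from the whoami listing):

grid logout --pattern={alias}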

The grid's identity provider (IDP) supports standard
username and password authentication for users to log in to the grid. The
username and password in question must already have been set up by the grid
administrator. To log in with a grid user identity, use:

grid login --username={drake}

In a graphical environment, this will pop up a dialog for
filling in the password. In a console environment, there will be a prompt
asking for the password at the command shell.

Note that the password can be included in the login command
if it is absolutely required. This may be needed for scripting a grid login,
but it is not generally recommended because the password will be visible in
script files or in command history:
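For example, a sketch of a scripted login (both values are placeholders):

grid login --username={drake} --password={password}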

For users to log in using a Kerberos STS, the STS must
already have been created according to the instructions in the section “Using a
Kerberos STS”. Once the Kerberos STS exists, users can log in with the
following command:
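For example, a sketch (the STS path is a placeholder for wherever the Kerberos
STS was linked by the grid administrator):

grid login {/path/to/kerberos-STS} --username={drake}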

In some cases, user identity may need to come from a
key-pair stored in a file. This is often the case when a user needs to
authenticate as a grid administrator. It is also possible that a key-pair will
be issued by a resource owner to control access to the resource. In order to
obtain authorization on that resource, merely being logged in as a known grid
user would not suffice and the user must add the key-pair credentials to the
wallet.

To authenticate using a
keystore file (such as a PKCS#12 format PFX file):

# using a keystore on a local disk.
grid keystoreLogin local:{/path/to/keyFile.pfx}

# or using a keystore in the grid.
grid keystoreLogin grid:{/home/drake/keyFile.pfx}

The xsedeLogin command is a special
purpose login for users of the XSEDE grid. It authenticates to the XSEDE
Kerberos server and the XSEDE MyProxy server in order to obtain both types of
identities for grid services. It is very similar to the simple login command.
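For example, a sketch (with an XSEDE portal ID as the placeholder user name):

grid xsedeLogin --username={portal-id}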

In the case of the XSEDE-style login, there is no
self-signed certificate for the client. The client's identity is instead
dependent on the Kerberos authentication using a real XSEDE portal ID for
login.

The iclogin command uses the Enhanced Client or Proxy (ECP)
protocol to authenticate to an InCommon identity provider (IDP), and then
uses that authentication to acquire grid credentials. Any of the previous STS
types may be the target of an InCommon login, as long as it has been set up
according to the section “Setting up an InCommon STS” (Section G.1.11).

Once the InCommon STS link exists, users can log in with the
following command:

grid iclogin

There are five parameters to log in using InCommon:

1. The URL of the IDP's ECP service endpoint,

2. The user id and

3. The password for the user at that identity provider, and

4. (optional) An SSL public/private keypair and

5. (optional) An associated SSL certificate signing request (CSR).

In a graphical environment, dialogs will be displayed to
retrieve these parameters. In a console environment, the user will be prompted
in the command shell. Alternatively, all of these parameters, or any subset may
be specified at the command line, such as follows:
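A sketch of such a command line (these option names are assumptions for
illustration only; consult the built-in help for the actual flags):

grid iclogin --username={drake} --password={password} \
  --idp-url={https://idp.example.edu/idp/profile/SAML2/SOAP/ECP}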

If the user does not wish to specify an existing SSL
keypair, a new keypair and CSR will be generated by the client. If the user
does specify a keypair file, he may also choose to provide a CSR as well or
have one generated which contains the provided public key.

The iclogin tool uses the InCommon authentication service at
CILogon.org to generate an authentication request for the provided or generated
CSR, forwards the request to the selected IDP with the provided credentials for
a signed assertion of identity, and then returns the assertion to CILogon.org
to retrieve an X.509 certificate. As in the xsedeLogin, the self-signed session
certificate is discarded, and the certificate from CILogon.org becomes the
current client session certificate. Finally, the iclogin tool contacts the STS
corresponding to the InCommon credentials provided to acquire additional grid
identity certificates, which are delegated to the CILogon.org session
certificate.

Upon authentication, the user may perform all actions she is
“authorized” to perform. In Genesis II, authorization is implemented using a
technique called Access Control Lists. Every resource in the Genesis II GFFS
has three access control lists which are called Read, Write, and Execute ACLs.
Each type of ACL can have from zero to an arbitrary number of grid identities
listed. This associates the decision-making information about whether a
resource is accessible onto the resource itself, rather than associating it
with a user or a group (as might be done in a capability model rather than an
ACL model).

There are a few generally applicable attributes for the
Read, Write and Execute ACLs, but specific resources can vary how these ACLs
are interpreted. In general though, Read access grants a user identity the
right to see a resource. Without Read access, the user cannot even list the
contents of that resource in the GFFS.

Generally speaking, Write access often is considered to
grant administrative access to the resource. For example, a queue that lists a
user X in its Write ACL is granting user X the right to completely control the
queue, even to the extent of removing queued jobs of other users or changing
the properties of the queue.

The general interpretation of the Execute ACL is to make a
resource available to a user for whatever primary purpose the resource
provides. For example, a user with Execute access on a queue is allowed to
submit jobs to it, and to cancel her own jobs. That user cannot however manage
the jobs of other users or change the attributes of the queue.

Genesis II provides two ways to display the ACL lists for a
resource: the console grid client and the graphical client UI. The graphical
client provides a summary of the permissions for user ids, whereas the console
client displays the full authorization data (including the EPIs that uniquely
describe user identities in the ACLs).

You can show the authorization information for any resource
in the GFFS using the grid authz command.
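For example, a sketch using a placeholder path:

grid authz {/path/in/grid}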

Note that this particular resource allows “everyone” to read
it. This is often the case for top-level GFFS folders and other assets that
are part of the “grid commons” available to all users. Also of interest are
the EPIs (listed after urn:ws-naming:epi:) that
uniquely specify a particular grid identity.

To use the client-ui for viewing ACLs, launch the client (grid client-ui) and navigate to the file or directory of
interest in the RNS Tree. Once an item has been selected (by left-clicking
with the mouse), the ACL pane on the right will show the Read, Write and
Execute permissions for that resource.

For ByteIO files and RNS directories in the GFFS, the write
permission simply indicates that a user can change the contents of the file or
directory. The execute permission is not really used internally for files and
directories, but could be set for use within FUSE mounts (to make a grid file
executable when mounted on a Linux filesystem).

Having write permission on BES resources indicates that the
user is an administrator of the BES. Having execute permission gives the user
the ability to directly submit jobs to the BES. Queues also need execute
permission on the BES before they can successfully submit jobs to it.

Having write permission on an IDP or other STS object in the
GFFS indicates that the user is an administrator of that particular entry (but
not necessarily of the server providing security services). Having execute
permission enables a user to behave as “a member” of an IDP, which is
especially relevant for users being members of groups.

Data files that feed into computational results are an
integral component of any grid computing software. Genesis II provides a
variety of methods for specifying the locations of data files. Most jobs can
rely on stage-in and stage-out files that are available via the GFFS. This
section describes a number of methods for loading data into, and retrieving data
from, the GFFS.

The need to access data files arises when a user's job needs
input files for computation and when the job produces output files. There are
three main approaches for copying resources in and out of the GFFS: using the
command-line grid client, using the graphical grid client, and using a FUSE
mounted filesystem.

Similar to cp in the UNIX operating system, the grid’s cp
command can copy the contents of multiple source files and directories to a
target location. The source files can be any mix of local and grid locations.
The target must be a directory, unless the source is a single file to copy to
another location.

# copy a file from the local filesystem.
grid cp local:/home/drake/File1.txt grid:/home/drake/File2.txt

# copy a grid file to a local file.
grid cp grid:/home/drake/File2.txt local:/home/drake/File1.txt

# copy a folder from the local filesystem to the grid.
grid cp -r local:/home/drake/myDir grid:/home/drake/newPlace

# copy a folder from the grid to the local filesystem.
grid cp -r grid:/home/drake/myDir local:/home/drake/newPlace

Note that many commands, such as cp, assume the “grid:” prefix if it is not provided. For local paths, the “local:” prefix (or its synonym “file:”) must be used.

The grid client-ui tool has recently been updated with a variety of methods for copying data files, including drag&drop functionality. These may be helpful for users more familiar with graphical user interfaces.

To copy files into the grid with the client-ui, first start
the GUI:

grid client-ui

When the graphical client is running, a window similar to
the one below is displayed. The window shows a view of the grid filesystem
(labeled as RNS Space) and a view of the ACLs for the object currently focused
in the tree.

The client-ui supports dragging and dropping files into the
grid using the standard file browser application for the user’s operating
system. On Windows, Windows Explorer (explorer.exe) is the recommended
browser, and on the Mac, the Finder is recommended. For Linux, the Nautilus or
Konqueror applications can be used for file browsing.

Once the file browser has been opened, one performs drag and drop copying by dragging the file or directory of interest out of the file browser and into the grid tree (in the RNS Space tab of the client-ui) at the desired location. A progress dialog will open and show progress as the files and directories are copied.

The grid client-ui can also copy files to the operating
system's file browser via drag&drop. In this case, the user drags the file
or directory of interest from the RNS tree view in the client-ui into the desired
folder in the file browser.

There is an important caveat for dragging files out of the grid. The drag&drop protocol requires that the drop occur only once all the files to be dropped are available locally. In the case of the grid’s client-ui, making the files available locally involves copying them to a temporary location in the local filesystem. Once copied, the files can be dropped into the desired location.

This impacts the behavior for drag and drop significantly.
The user must wait until the icon changes to the operating system’s “drop okay”
icon before letting go of the mouse. If the contents to be dropped are
sizeable, then the copy process can take quite a while, and the user must hold
the mouse button down that entire time. In the case of larger transfers, it is
recommended to use the “Save To” technique from the next section instead of
drag&drop.

Due to the potential for large data files to cause
unacceptable delays in a drag&drop operation, the grid client provides
another method to copy files and directories in and out of the grid. This
feature is used by right-clicking on a grid path (e.g. a directory) that is to
be copied and selecting either the “Copy to Local File System From GFFS” or the
“Copy From Local File System to GFFS” option. The former will open a directory
browser for the local file system. The user selects the target location and
hits “save”. When copying to the GFFS, a GFFS directory browser is opened and the user selects the target location in the GFFS. When the target location is
selected, a dialog opens and shows the copy operation’s progress.

The advantage of this feature is that the contents do not
need to be copied locally before the operation can be started, unlike
drag&drop. The user simply selects where the data files should be saved,
and the client-ui manages the copying process after that point.

Directory Operations

When a directory is highlighted, the following options are available from the drop-down Directory Menu:

FUSE is a method for mounting the grid filesystem onto a
local path, so that a portion of the grid namespace is available on the user's
computer. This enables the user to copy data to and from the mounted grid
directory as if it were present in the local filesystem.

Creating a FUSE mount is detailed in the next section. But
using a FUSE mounted GFFS to copy data files is very simple. Assuming the grid
has been mounted at /home/drake/gridfs, the
following will copy a directory tree in or out of the grid:

# copy a directory hierarchy up into the grid.
cp -r {/a/directory/tree/} {/home/drake/gridfs/home/drake/newDir}

Note that when the gridfs is mounted at the root folder of
the grid, the extra /home/drake path is necessary to get down to the user's
home directory.

# copy a hierarchy down from the grid to the local filesystem.
cp -r {/home/drake/gridfs/home/drake/toCopy} {/local/path/for/directory}

Note that the commands above use just cp
and not grid cp, because in these cases the
operating system’s native copy command is used.

The GFFS provides a feature called “exports” for sharing
data into the grid. Exports allow data to reside on one’s own machine, but be
shared with other users and used as staging data for job processing. This may
be very helpful for large data sets, where one does not want to make a
secondary copy of the data; the original data can be served on demand within
the grid.

A simple export command to share a path under one’s local
home folder might resemble this:
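# a sketch of the export command; the exact options may differ,
# so check "grid help export" for the current usage.
grid export --create {/resources/xsede.org/mason.iu.xsede.org/containers/mason-gffs} \
    local:/home/xd-fred/myData /home/xsede.org/fred/mason-data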

In the above, my local path on the Mason machine is “/home/xd-fred/myData”. This folder will show up in the GFFS at the path “/home/xsede.org/fred/mason-data”. This relies on a container that is already established at Mason and which is linked into the grid at “/resources/xsede.org/mason.iu.xsede.org/containers/mason-gffs”.

The GFFS exports feature is supported
by two different web services with varying properties and is a fairly large
topic. The exports feature is covered in detail in Appendix M.

Genesis II provides a technique for mounting a portion of the grid namespace onto a local computer. This relies on the FUSE subsystem, which allows user-space drivers to manage filesystems rather than requiring the kernel to do so. FUSE enables the user to copy files in and out of the mounted directory as if it were simply another directory in the local filesystem.

To fuse mount the top level of the GFFS onto a local path:

grid fuse --mount local:{/local/path} &

This makes the root folder of the GFFS available as the
local path specified.

To fuse mount a specific folder in the GFFS locally, use the
“sandbox” flag.

grid fuse --mount --sandbox={/path/in/grid} local:{/local/path} &

The “--sandbox=X” portion of the command specifies where the
fuse mount should be rooted in the GFFS RNS tree.

After the fuse mount is created, the user can copy files
using the /local/path. Most file and directory operations provided by
the operating system can be used on the contents of the path.

E.3.3.1. How FUSE Mounts Are Different From Unix Filesystems

The FUSE mounted grid filesystem does not behave exactly
like a standard Unix filesystem. It does support most standard operations (copying
files & directories, deleting them, and so forth), but there are a few
caveats described in the next sections.

One important distinction is that the Genesis II FUSE
filesystem does not currently support overwriting a directory with a move (mv)
operation. Due to the GFFS representation of files and directories as EPRs,
the meaning of substituting out an RNS folder in that way is not well defined.
Genesis II requires that a directory can only be moved onto a target location
in a FUSE mount if that location does not already exist. This may require some
special treatment in scripts using FUSE such that the existing directory is
deleted before a directory with the same name is moved into that location.
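For example, a script working in a FUSE mount (paths illustrative) would first delete the existing target:

# mv onto an existing directory fails in a FUSE mount, so remove the target first.
rm -rf /home/drake/gridfs/home/drake/results
mv newResults /home/drake/gridfs/home/drake/results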

The standard Unix filesystem feature of symbolic links does
not operate as expected inside of FUSE mounts. This is due to the basic
difference in mechanisms providing the filesystem between the Unix local
filesystems and the mounted grid filesystem. Links do exist in the grid, but
they are an entirely different creature from the filesystem symbolic links.

Due to that implementation difference, making a link from the FUSE client side between grid asset A and grid asset B will not work. Linking local asset A into grid asset B also will not work, because the grid does not interpret a symbolic link properly in the FUSE mount. It is possible, however, to link from grid asset A in a FUSE mount to a local filesystem asset B; asset B will remain usable as long as the FUSE filesystem is mounted.

Another important distinction between Genesis II FUSE
filesystems and the standard Unix filesystem is that not all permission
attributes are used. In the standard filesystem, permission attributes are
usually structured as User/Group/Other triples of Read/Write/eXecute ACL
settings (e.g. rwx|rwx|rwx for user|group|other). These control what the user
owning the file can do to it, what other members of the file's group can do to
it, and what the general populace can do to the file.

In Genesis II FUSE, the “group” RWX is not used at all; the group portion of ls listings will always show as '---'. This is due to Genesis II's interpretation of groups, which differs from the Unix interpretation: group access control is managed uniformly with user access control in Genesis II.

The “other” portion of the permissions is also slightly different. Genesis II uses the other permissions to describe the rights for “everyone” on the file, which is quite similar to the Unix interpretation. But Genesis II only allows the permissions to be changed if the user who mounted the grid with FUSE has write permission on the file, whereas merely being the file's owner enables changing permissions in Unix file systems. Because of this difference, users should never take away their own write permissions on their files and directories in FUSE mounts, or they will lose the ability to give the write permissions back again.

Many compute jobs can rely directly on the GFFS for staging data files. However, there are cases where the data must remain at its original location rather than being copied to or exported from the GFFS. For these cases, the grid’s job-tool application supports other stage-in and stage-out server types: web servers, ftp servers, and ssh-based servers (using either the scp or sftp protocol). All of these types also support data file stage-out except web servers, which support stage-in only.

More information about creating JSDL files is available in
the section on Submitting Jobs.

The Genesis II software offers a number of methods for
issuing commands to the grid. One method is to run the grid client program
(called “grid”) and enter commands manually or via a script. Another method to
issue commands is to write an XScript file with grid commands in an XML format.

There are quite a few commands available to users in the
grid client. A list of the available commands can be printed by issuing the
command grid help. Many of the commands will be
familiar to Unix and Linux users, but some are very specific to the Genesis II
grid.

Before discussing the various ways commands may be executed
through the Genesis II client interface, it is important to understand the
distinction between local resources and grid resources. The grid client can
perform many analogous commands on grid resources (like ByteIO and RNS
services) and local resources (files and directories). For example, the cat command,
which is used to output the contents of a ByteIO resource, can also output the
contents of a local file. Similarly, using the ls
command on an RNS service will list the RNS entries contained by that service,
while that same ls command used on a local
directory will list that directory's contents.

Distinguishing between grid and local resources is
accomplished by prefacing the path of the resource with a prefix to denote its
location.

For resources on the local system, preface the path (in the local file system) with local: or file:, as in the following example:

ls local:/home/localuser

This will cause the ls tool to list the contents of the directory /home/localuser on the local file system. The prefixes local: and file: are interchangeable; that is, they have the same semantic meaning, and users may use either according to preference.

For resources in the grid namespace (the GFFS), preface the
RNS path with grid: or rns:,
as in the following example:

ls grid:/home/griduser

This will cause the ls tool to
list the contents of the RNS entry /home/griduser
in the grid namespace. As with the local equivalents, the prefixes grid: and rns: are
interchangeable; that is, they have the same semantic meaning, and users may
use either or both according to preference.

Some commands available to the grid client require multiple
arguments, and in such cases it may be useful to mix grid and local resource
prefixes. For example, suppose the user wishes to copy a file example.txt from the local file system into the grid,
creating a new ByteIO resource with the contents of that file. The cp command can be invoked for this purpose as follows:
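cp local:/home/localuser/example.txt /home/griduser/example-grid.txt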

This will instruct cp to copy the contents of /home/localuser/example.txt on the local file system
into a grid ByteIO resource named example-grid.txt
listed in the RNS resource /home/griduser. The
semantics of the command will adjust to reflect the locations of the source and
destination provided.

Note that the default is the grid namespace, i.e., /home and rns:/home are
equivalent.

One of the features of the grid client is the ability to
invoke the client to execute a single grid command and then exit without
further user interaction. For example, from the local command line, the user
may enter

grid ls /home/griduser

This will start the grid client, execute the command ls /home/griduser, and then print the results of the
command to the screen and return to the local command line prompt. If the
command requires user interaction, the standard input and output streams will
work in the standard way; this means that the standard input can be redirected
to a file using the local operating system's existing semantics.

This feature is particularly helpful for performing multiple
non-interactive commands in succession through scripting on the local command
line. The user may write useful scripts, which can invoke commands on both the
local system and in the grid, in whatever scripting dialect is already available.
Take the following example, written for Linux's bash:

#!/bin/bash
# example.sh: scripting local and grid commands
echo "This is a local command"
for I in {1..5}; do
    str="This is grid command number $I"
    grid echo "$str"
done
echo "End of script"

In the example script, the local operating system is
instructed to print a message, then to loop over the values 1 to 5, assigned to
the variable I. For each of these loop iterations, a string variable str is composed, and a grid
command to echo the contents of that variable is invoked. Finally, the local echo command is used to signal the end of the script.

In this fashion, command-line scripting may be employed to
create arbitrarily complex series of commands, mixing local and grid commands
as needed.

The XScript scripting language is an XML-based scripting
language developed by the Virginia Center for Grid Research (then the Global
Bio Grid research group) at the University of Virginia for use with Genesis
II. Originally the language was designed to support only minimal capabilities
– enough to get the project started until something better could be developed –
but it has since grown into a more sophisticated and fully featured language in
its own right. Today, the XScript language supports many of the language
features that are expected from a real programming language, including loops,
conditionals, and exceptions.

XScript is used to script commands from within the
grid client, as opposed to the previous section which discussed running scripts
that repeatedly invoked the grid client to execute commands. This section will provide an overview of the features and use of XScript; complete documentation of XScript is available in the Documentation section of the Genesis II wiki (at http://genesis2.virginia.edu/wiki/Main/XScriptLanguageReference).

In XScript, every XML element (other than the root document
element) represents a single language statement. These statements may or may
not themselves contain other statements depending on the element type in
question. For the most part, those statements which can support inner
statements are the language feature elements such as conditionals and loops,
while those that cannot generally represent simple statement types like echoes,
grid commands, and sleep statements.

In XScript, every XML element falls into one of two
categories. The first category is for language elements and uses the first
namespace shown in the figure below, abbreviated as gsh. The second category
is for Genesis II grid commands and uses the second namespace shown in the
figure, abbreviated as geniix. We will use the first of these, gsh, as the default
namespace for all XML in this section and thus assume that the root element of
all XScript scripts looks like the following:
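<!-- a sketch: the actual namespace URIs come from the XScript reference,
     so placeholders are shown here -->
<script xmlns="[gsh namespace URI]"
        xmlns:geniix="[geniix namespace URI]">
    ...
</script>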

XScript has been designed to include most of the control
flow structures used in modern programming languages. There are also command
elements common to many scripting languages, such as “echo” and “sleep”. The
following is a list of the basic control elements and commands available in
XScript. Note that this list is subject to change as the language matures or additional
features are added.

For usage of a specific element, or the particular semantics
of its use, see the external documentation on the Genesis II wiki.

The simplest form of statement in an XScript script is a
grid command. Grid commands are identified by belonging to the geniix namespace. Any time an XML element exists in
this namespace, the XScript engine attempts to find a grid command with the
same name as the element's local name. If it finds such a command, the
statement is assumed to represent that command, otherwise an exception is
thrown. Parameters (command-line arguments to the grid command) are indicated
with XScript param elements. Below we show example
grid commands in the XScript language for the grid commands ls (list the
contents of a RNS directory) and cp (copy files/resources to another location).
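For example, the following sketch (paths illustrative) invokes ls and cp as elements in the geniix namespace, with param elements supplying the command-line arguments:

<geniix:ls>
    <param>/home/griduser</param>
</geniix:ls>

<geniix:cp>
    <param>local:/home/localuser/example.txt</param>
    <param>/home/griduser/example-grid.txt</param>
</geniix:cp>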

Every attribute value and text content node of an XScript
script can include a reference to a variable. If included, the value of this
variable will be inserted at run time as a macro replacement. Further,
variables are scoped by their statement level. This makes it possible to write
scripts that contain multiple variables of the same name without additional
variable definitions interfering with outer definitions.

Variables in XScript documents are indicated by surrounding
the variable name with ${ and }. Thus, to indicate the value of the NAME variable, the string ${NAME}
should appear anywhere that text was expected (such as for an attribute value
or as the text content of an appropriate XScript statement).

Arrays are also supported in the XScript language, though at
the time of the writing of this document, only for accessing parameters passed
in either to the script itself, or to functions. The length of an array in
XScript is indicated with the ${ARRAY_VARIABLE}
expression syntax, while the elements inside of the array are indicated with
the ${ARRAY_VARIABLE[INDEX]} syntax. Thus, to echo
all elements of the ARGUMENTS array, the following
XScript code can be used:

...

<for param-name="i" exclusive-limit="${ARGUMENTS}">
    <echo message="Argument ${i} is ${ARGUMENTS[${i}]}."/>
</for>

...

Arguments passed in to the script as well as those passed in
to functions are contained in the ARGV array
variable (for command-line arguments passed in to the script, the first element
is the name of the script file itself).

Below is a complete example XScript script. The
functionality of the script is trivial, but the file is syntactically correct,
and provides a concrete example of some of the concepts discussed previously in
this section. The script takes a single argument from the command line, which
it compares to a set of switch
cases, and then executes a different grid command based on that input (along
with a few echo
statements for good measure). Note the if test at the outset to determine if a
command-line argument was provided. We will call this example file example.xml.

Before we describe how to execute a script, a word about
Genesis II's script handling is in order. Genesis II supports multiple
scripting languages through the use of the Java Scripting API. In order to
differentiate between the various scripting languages, Genesis II uses filename
extensions to determine the correct language to use when running scripts. Thus,
to run a JavaScript script, the filename must end
in the .js
extension. Similarly, to run an XScript script
file, the filename must end with the .xml
filename extension.

To execute a script within the Genesis II client, use the script command, passing in the path to the script
and any parameters to the script. For example, if the example script above were
located at the RNS path /home/griduser/example.xml,
the following command would launch the script with an input parameter of who:
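grid script /home/griduser/example.xml who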

The main point of any grid software is to provide a means
for processing computational jobs on the compute resources that are available
in the grid. This is true for Genesis II also; many features are provided for
creating jobs in JSDL, sending them to a grid queue or BES, and managing the
jobs while queued. This section discusses the basics of creating a job and
submitting it for processing.

The purpose of a JSDL file is to specify a compute job in
terms of the executable that the job should run, the resources that it will
consume in terms of memory and CPU, and any special requirements for processor
type or other attributes. The JSDL specification requires that the file be
stored in XML format with particular elements and attributes for specifying job
attributes. This makes it fairly difficult and unpleasant to write JSDL files
from scratch. One common way to generate a new JSDL file is to change an
existing well-formed JSDL file to fit the purpose under consideration.

A better way to generate a JSDL file is to use the Genesis
II JSDL file creation tool to specify the job's requirements. This is available
as a standalone install called the Grid Job Tool (located at http://genesis2.virginia.edu/wiki/Main/GridJobTool).
This provides versions for most common operating systems. Alternatively,
the job-tool is also provided by the Genesis II client installation, and can be
executed this way:

grid job-tool

It can also be executed by
right-clicking on an execution service such as a BES or Grid Queue and
selecting “create job”.

From within the client-ui RNS Tree view, select the
directory where the JSDL project file should be located, or select the
execution container (BES or queue) where the job should be executed. Right
click on that location and select 'Create Job'. The tool has provisions to
give the job a name and description. Any arguments that the executable or
script needs for running the job can be provided in the first tab (under Basic
Job Information).

In the data tab, the data to be staged in/out can be provided (see figure below). It is worth noting here that staging data files in and out is usually done via the GFFS, so BESes that do not support the GFFS may need to use stage types other than grid: paths (such as data files on a local file system or web server). These can also be specified in the data tab.

The other major component for the job-tool is the resources
tab, where any specific expectations of the job in terms of hardware
configurations and preferred operating system can be specified. This is
depicted in the figure below.

The qsub command is used to
submit a new job to a queue for processing. Although jobs may be submitted to
a BES (and bypass a queue), submitting to queues is recommended since it allows
better resource allocation and job handling.

# submit a job to the queue, with a job description file.
qsub {/queues/queuePath} local:/path/to/job.jsdl

The qsub command returns a job ticket number after successfully submitting the job. This ticket number can later be used to query the job, kill it, and so forth.

The qkill command allows grid
users to terminate any managed job (not already in a final state) that they
previously submitted. To kill one job in the queue, use:

grid qkill {/queues/theQueue} {jobTicket#}

The ticket here is obtained when a job is submitted using any one of the recommended methods.

The qreschedule command is used to return an already-running
job back to the queue and ensures it is not rescheduled on the same BES. The
slot count for this resource must be manually reset later. This command is
useful when the Queue consists of BESes which interface to a queuing system
like PBS. A job may be in the Running state on the grid, but in a Queued state
on the back-end PBS. Such a job can be moved to an alternate BES where it can
be executed immediately. To reschedule a job:

grid qreschedule {/queues/theQueue} {jobTicket#}

Both qkill and qreschedule have variants that allow multiple
job tickets to be killed or rescheduled with one command.

The queue manages all jobs that are submitted to it from the
time that they are submitted until the time that they are executed, or have
failed, or are cancelled by the user. Even jobs in the final states of
FINISHED, CANCELLED, or FAILED are held onto by the queue until they are
cleaned up. The process of cleaning a no-longer active job out of the queue is
called 'completing' the job. Completing a job performs the garbage collection
of removing the job from the queue.

# Removes all jobs that are in a final state
# (i.e., FINISHED, CANCELLED, or FAILED) from the grid queue.
grid qcomplete {/queues/queuePath} --all

# Removes a specific job from the queue, where the ticketNumber
# is the job-identifier provided at queue submission time.
grid qcomplete {/queues/queuePath} {ticketNumber}

After the client-ui has been launched, the “Queue Manager”
can be opened to control jobs in the queue or to change the queue's
characteristics given sufficient permissions. The figure below shows the client-ui
about to launch the queue manager on a selected queue:

In the first tab, called Job Manager, the queue manager
shows the current set of jobs that are in the selected queue. The jobs can be
in a number of non-final states, such as QUEUED and EXECUTING, or they may be
in a final state, such as FINISHED or FAILED.

The second tab of the Queue Manager, called Resource
Manager, shows the resources associated with the queue. The view presents what
is known about the BES resource, in terms of the operating system and other
parameters. This tab can only be modified by a user with permissions on the
queue, and the “Max Slots” is the only part of the tab that is modifiable. The
number of slots controls how many concurrent jobs the resource is expected to
handle, and the queue will allow at most that many jobs onto that particular
resource. An example Resource Manager is shown below:

To control jobs that are in the queue, look at the Job
Manager window again. When a job is selected in that view (with a
right-click), a context menu for controlling that specific job is displayed. This
is shown in the figure below:

Using the choices available, a user can stop the job with
“End Jobs”, clear up finished jobs with “Remove Jobs”, and examine the “Job
History” for the job. Job History brings up the following window with information
about the job selected:

A user can also submit jobs by copying the jsdl files into
the “submission-point” directory under the queue. This is an extremely simple
method for job submission, and the jobs submitted this way still show up in the
qstat command.
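For example, a sketch using the grid cp command (queue path illustrative):

# copying a JSDL file into the submission-point directory submits it as a job
grid cp local:/home/drake/job.jsdl {/queues/queuePath}/submission-point/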

Another method to run a job is to submit the job directly to
the BES. This is a helpful method for testing jsdl files as they are being
developed, or when the user is sure that the BES supports the requirements of
the job:

grid run --jsdl={local:/home/drake/ls.jsdl} {/bes-containers/besName}

The above command is synchronous and will wait until the job has run.

There is an asynchronous variant that allows job status notifications to be stored into a file in the grid namespace. Note that this feature is currently only available for the Genesis II BES and is not supported on the UNICORE BES. The user can check on the status of the job by examining the status file. This is an example of an asynchronous direct submission to the BES:
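# a sketch of asynchronous submission; the status-file flag name is an
# assumption here - check "grid help run" for the exact option.
grid run --jsdl={local:/home/drake/ls.jsdl} --async-name={/path/to/jobName} {/bes-containers/besName}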

In the above, the command returns immediately after
submission. The job’s status is stored in the file specified by the grid path /path/to/jobName. Eventually this file should list the
job as FINISHED, FAILED or CANCELLED depending on the circumstances.

To run an MPI job, the JSDL file needs to specify that the job requires MPI and multiple processors. The job executable needs to have been compiled with an MPI library (e.g., MPICH, MVAPICH, OpenMPI).

When using the Genesis II JSDL file creation tool, these job
requirements can be specified under the “Resource” tab (depicted in the figure
below). The “Parallel Environment” field permits selection of the MPI library
(e.g., MPICH1, MPICH2) that the executable was compiled with. The “Number of
Processors” field lets the user specify how many total processes are needed to
run the job. The “Process per Host” field lets the user specify how many of
these processes should be run per one node.

If the user manually creates a JSDL file, the JSDL SPMD
(single program multiple data) Application Extension must be used to define the
requirements of the parallel application in JSDL. Please consult the
specification document for details. The SPMD application schema essentially
extends the POSIX application schema with four elements: NumberOfProcesses,
ProcessesPerHost, ThreadsPerProcess, and SPMDVariation. The NumberOfProcesses
element specifies the number of instances of the executable that the consuming
system must start when starting this parallel application. The ProcessesPerHost
element specifies the number of instances of the executable that the consuming
system must start per host. The ThreadsPerProcess element specifies the number
of threads per process. This element is currently not supported by the Grid Job
Tool. The SPMDVariation element defines the type of SPMD application. An
example of a parallel invocation using the “MPICH1” MPI environment is provided
below.
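<!-- a sketch of the SPMD elements; the namespace and variation URIs are
     abbreviated here - consult the SPMD specification for the exact values -->
<spmd:SPMDApplication xmlns:spmd="[jsdl-spmd namespace URI]">
    <!-- POSIX-style Executable/Argument elements go here, per the SPMD schema -->
    <spmd:NumberOfProcesses>16</spmd:NumberOfProcesses>
    <spmd:ProcessesPerHost>8</spmd:ProcessesPerHost>
    <spmd:SPMDVariation>[MPICH1 variation URI]</spmd:SPMDVariation>
</spmd:SPMDApplication>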

A user can manage their grid credentials by clicking on the Credential Management button in the client-ui window and selecting the appropriate option (Login, Logout, or Logout all). Clicking on Credential Management->Login->Standard Grid User opens a separate window prompting for a username, password, and grid path. This logs you into the grid using your grid credentials (not the same as grid xsedeLogin); refer to Figure 24. If you select the Credential Management->Login->Local keystore tab, you can log in using a keystore file: select the keystore file (usually in .pfx form) from your local file system and enter its password. You can also log in with a username/password token by selecting the Credential Management->Login->Username/password tab.

You can log out of the grid by selecting the Credential Management->Logout option, which lets you select which credentials to log out of. This is helpful if you have multiple credentials in your credential wallet and want to log out of a specific credential. Refer to Figure 26.

E.6.3.1. RNS Space (Left Panel)

Here the grid namespace is presented as a tree structure, with the root of the namespace represented by '/' and the other sub-directories below it. You can browse the tree by clicking on the toggle symbol next to a resource, and you can select a resource simply by clicking on it. Clicking on a resource highlights it, and the security information in the right panel changes accordingly. You can also view the Resource Properties and EPR Display of the resource in the right panel. You need at least 'Read' permission on a resource to view its security and other information; without it you will get an error in the error panel below (e.g., No Authorization info for target path: {grid resource name}). Launch the grid client-ui, log in as a grid user, and then browse the RNS tree (highlighted with a red box in Figure 27) by clicking on the toggle symbol next to the root directory '/' (and then descend, expanding toggle symbols). This expands the tree; you can now browse to your grid home directory or any other grid resource that you have (at least read) permissions on. You can also collapse the tree (if already expanded) by clicking on the toggle symbol next to the grid resource. If you try to browse a resource without read permission on it, you will get an error message in the Error Messages box (highlighted with a blue box) in Figure 27.

E.6.3.2. Right Panel

Here you will find three tabs: Security, Resource Properties, and EPR Display. These are highlighted with a green box in Figure 27.

Security tab: This is selected by default when you first open the client-ui. This tab displays the read/write/execute ACLs for the selected resource. More information on grid ACLs can be found in section E.2.3. If you grant read/write/execute ACLs on a resource and refresh the client-ui, the new permissions will be shown in the respective ACL text boxes.

There is also a Username/Password Token sub-panel, which is used to issue username/password access on the selected resource to users. These users may or may not have a grid account; all they need is the Genesis II client installed and the username/password information to access that resource (provided, of course, that they can browse to that part of the tree structure).

You can give permissions to everyone (grid and non-grid users) on a selected resource by dragging and dropping the Everyone icon onto the relevant ACL box, i.e., the read/write/execute text box. You can grant access to individual grid users in two ways: using the grid chmod command in the grid shell, or using the drag-and-drop method in the UI. In the client-ui window, select the resource you want to grant permissions on; you need write (or admin) permission on that resource to be able to grant R/W/X access to other users. Locate the 'Tear' icon on the left panel (at the top right corner; it looks like a torn piece of paper), left-click on it, and drag it while holding the button down. This creates another window showing the tree structure; browse to the /users directory in the new window and select the username you need by left-clicking. Now drag that username and drop it onto the read/write/execute text box in the main client-ui window. In Figure 29, the tear icon, the grid resource (hello file), the username (/users/andrew) in the new window, and the write ACL text box are highlighted. If you select the resource in the RNS tree again, it should now have the new credentials listed in the corresponding credential text box in the right panel.
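The grid shell alternative is a one-line chmod; a sketch follows (the path to the hello file is illustrative, and the exact syntax is best checked with grid help chmod):

# grant the user /users/andrew read and write access on the "hello" file
grid chmod /home/drake/hello +rw /users/andrew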

Dragging ACLs to trash

Browse the RNS tree structure and select the resource (file/directory) whose ACLs you want to modify. Then, in the security panel on the right, select the ACL entry from read/write/execute and, still holding the mouse button, drag it to the recycle bin and release.

E.6.3.3. Menus

File Menu: Drops down to present multiple selection options; most options here are intuitive. Selecting File->Preferences opens another frame where you can set your client-ui and shell preferences. After every change, make sure you refresh the client-ui by selecting View->Refresh. The options include the font size and style of the client-ui's shell window. The File->Quit option quits the client-ui window. Selecting File->Make Directory creates a new directory, and File->Create New File creates a new file in the grid namespace. To change the shell font, select File->Preferences->Shell, select the font style, change the font size (up arrow to increase, down arrow to decrease), and click OK. Launch a grid shell (Tools->Launch Grid Shell) and type a grid command (e.g., grid ls); you will see the changes in font style and size in this grid shell window. Refer to Figure 31.

You can view a resource’s security ACL information at a low, medium, or high level. Select File->Preferences->Security->HIGH, then refresh the client-ui by selecting View->Refresh (or pressing F5). Refer to Figure 32. Select any grid resource that you have at least read permission on; in the right panel's ACL text box you can now see the ACL information in more detail.

If you select the low level, the ACL box will just list the users in the ACL. If you select medium, you will see additional information, such as the type of resource and more detail on users' ACLs. By selecting the high level, you can see still more information about the ACLs, such as identity type, resource type, DN, etc. This is shown in Figure 33.

Select File->Preferences->Resource History and set the job history information level to the desired option: Trace, Debug, Information, Warning, or Error. Selecting the Trace option provides the maximum information about a job; if you just want to see errors or warnings, select those options instead. This is useful when a user wants to debug jobs after submitting them to a queue resource. Select a queue resource in the grid RNS namespace, select Jobs->Queue Manager, select a job from the list of jobs that you submitted, then right-click and select the Job History option. In the new window, the Minimum Event Level will be set to Trace (or whichever option you selected earlier).

To configure the XML display, select File->Preferences->XML Display and choose whether to view grid resource information as a flat XML file or as a tree structure. In Figure 34, the Resource Properties are displayed as a tree structure; if you selected File->Preferences->XML Display->Display XML as a tree, the information will be displayed as shown.

To create a new file or directory, select a directory in the RNS tree where you have write permission (e.g., /home/joe). Selecting the File->Create New File option pops up a new window prompting for a file name. After entering the file name (e.g., abc.txt), click the OK button. The new file should appear in the RNS directory you selected (e.g., /home/joe/abc.txt). Refer to Figure 35.

View Menu: The View->Refresh option can be used to refresh the client-ui window. Click this option after you make changes to the client-ui (e.g., changing preferences), or after you create/delete/move/copy files or directories in the grid namespace, so that the changes are reflected in the UI. You can also refresh a particular resource by highlighting it and pressing F5, although this may depend on how the F5 key is configured.

Jobs Menu: From this menu, the user can create a new job, view the existing jobs in a queue resource, or view the saved job history of a job from the queue. The Jobs->Queue Manager and Jobs->Create Job options are inactive until you select a queue resource from the tree structure in the left panel. Information on how to create a job, submit it, and check its status in the queue manager can be found in section E.5. You can also create a job using “grid job-tool”, which opens a new window where you can enter your job information. Most fields in the job-tool are intuitive.

To create a job, select the Jobs->Create Job option. In the new window, create a new job or open an existing one. Submitting a job from this window submits it to the selected queue resource. Refer to Figure 36 for an example job. Saving a new job as a project will save the file on your local file system with the .gjp file extension.

Figure 37 below shows how you can enter a project number (such as an allocation number you would get on Kraken) or some other XSEDE-wide project number. In the job-tool, click on the 'Job Projects' text box and you will get a pop-up; click on the '+' sign and you will get another pop-up. Enter the project number, click 'ok', then click 'ok' again, and your project number will appear in the main job-tool window. Also, if you forget to enter a necessary field, such as the executable name or a data file, you will get a warning/error in the bottom pane of the job-tool window.

In the Basic Job Information tab, the Job Name can be any meaningful name you want to give your job. The Executable is the executable file that your job runs. This can be a system executable like /bin/ls, a shell script, an MPI program executable, or any other form of executable that can be run (including a Java class file, a C/C++ executable, Matlab, NAMD, etc.). The Arguments list holds any arguments your executable may need; here it is the '-l' option for /bin/ls (essentially /bin/ls -l). You can add arguments by clicking the '+' button in the arguments frame. You can also pass environment variables to your job by clicking the '+' button in the Environment frame. If you decide to delete one or more arguments or environment variables after adding them, select that argument/environment variable and click the '-' button in the respective frame.

You can save the job's output and error information to files either in the grid namespace (using the grid protocol) or using other protocols (scp/sftp, ftp, or mailto); this is configured in the Data tab of the job-tool. To save the standard output and standard error from a job, enter the file names in the Standard Output and Standard Error text boxes (refer to Figure 38). Then, to save these files to the grid or other locations, add the files in the Output Stages section and select the appropriate Transfer Protocol and corresponding Stage URI path. Note that the file names you enter in the Standard Output and Standard Error text boxes should match the Filename area in Output Stages, though the names can differ in the Stage URI area. The '+' and '-' buttons are used for adding or deleting an entry. Similarly, you can stage in files needed to execute your program in the Input Staging frame.

Once a queue resource is selected, you can select Jobs->Queue Manager to view the jobs you submitted to the queue and to manage resources (if you are the owner of those resources). Selecting Jobs->Queue Manager opens a new window displaying your job and resource information. Selecting Jobs->View Job History opens a file browsing frame displaying your machine's local file system (the machine where the Genesis II client software is running); you must have saved a job's history beforehand to be able to select and view it.

To view jobs, the Queue Manager, and job history, select the queue resource in the RNS tree and select Jobs->Queue Manager. This opens a Queue Manager window listing the jobs you submitted, or the jobs you have permission to view, on that queue. Refer to Figure 39.

To see the job history of a particular job, select a job in the Queue Manager window and right-click on it. Select the Job History option and you will get a new window with the job history for that job. Here you can select a different level of history information from the Minimum Event Level menu (Trace, Debug, Information, Error, or Warning). This can also be set via the File->Preferences->Resource History tab. Refer to Figure 40.

Parameter Sweep Job

To create and submit a parameter sweep job, open the job-tool either by clicking on a queue resource or by typing job-tool in the grid shell. This brings up the job tool shown below.

By default, the “Grid Job Variables” tab is disabled. To add a parameter sweep variable, just use ${var_name} (a $ sign, followed by an open curly brace, the variable name, and a close curly brace) in any of the following fields in the job-tool.

Once you specify ${var_name} in any one of the above locations, the 'Grid Job Variables' tab will be activated and you can define var_name to be an integer, a double, or a string. You can also specify the starting value, end value, and step value (interval) for your variable.

After submitting the job, the actual values for ${var_name} are substituted for var_name in all the places (job name, arguments, file name) specified in the job. The screen shot of the queue shows the job name with the actual integer values substituted (from i=1 to i=20). For the above example, the output files generated will be /home/xsede.org/vana/ls-out-1.txt, /home/xsede.org/vana/ls-out-2.txt, …, /home/xsede.org/vana/ls-out-20.txt.

Tools Menu: Selecting Tools->Launch Grid Shell opens a shell window where you can run grid shell commands like ls, cat, cp, etc. Refer to Figure 45. You can also invoke the grid shell directly on the command line using grid shell. The UI shell interface supports tab completion, whereas the command-line shell interface does not. More information on grid commands can be found in section E.4.

To launch a shell and list your home directory, log in to the grid using your grid credentials and launch a grid shell from the Tools->Launch Grid Shell option. Run grid pwd to make sure you are in your home directory (by default you will be in your home directory after you log in to the grid). Run the grid ls command; this should list all the files and directories in your grid home directory.

This method can be used for copying data files using the GUI client. You can select a particular file or directory in the left panel tree structure to copy out of the grid, then simply drag it (while holding the mouse button) and release the button to drop the file/directory onto your local computer's file system. The reverse also works: you can select a file/directory from your local machine and drop it into your grid namespace. For this you will need appropriate permissions on that grid resource, i.e., write permission to copy files. Refer to section E.3.1.2 for a detailed explanation.

This helps you set up the GUI to open files with a particular application. Note that the client-ui has recently been updated to use the launching capabilities of the operating system. In most cases, the default behavior is sufficient to edit and open assets in the grid. For situations where the default is not sufficient, this section documents how to override the default applications.

The file called .grid-applications.xml should go in the user's local file system home directory. This file lists the programs to launch for certain MIME types, extending the basic launching support in the client-ui. On Windows XP, the home directory is "c:\Documents and Settings\myUserName", and on Windows 7 it is "c:\Users\myUserName". Note that this file currently uses short names for the first argument, which should be a program name; if your PDF editor or Word document editor is not on the path, you will need to put the full path to the appropriate editor's executable. The file called .mime.types should also go into the user's home directory. This gives Java an association between file extensions (like .DOC) and the MIME type that will be reported for files of that type.

To open a PDF file in the grid namespace, create the .grid-applications.xml and .mime.types files and copy them to your $HOME directory (or the equivalent location on Mac and Windows). Launch the client-ui, browse the grid RNS space, and select a PDF file you wish to open. Double-click on the file; it will open in the Acrobat viewer.

The GFFS “grid” command is implemented in Java and loads
several libraries at startup. Thus it can take a few seconds to start “grid”
on some platforms. This leads to fairly annoying slowness when repeatedly
running the grid command at the command line or when using it within scripts,
if one invokes “grid” for every separate command.

One can also start the grid command once, leave it running,
and enter commands into the same grid prompt to avoid repeatedly waiting for
Java to load. This approach works fine, but there is now a command called
“fastgrid” that can make even separate invocations of the “grid” command very
speedy.

The fastgrid script is available in the Genesis II “bin”
directory of the installation. Once the “set_gffs_vars” script has been
loaded, fastgrid can be invoked in place of the normal “grid” command, for
example:
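# load the GFFS environment first (the script's location is assumed
# to be the top of the installation), then use fastgrid like grid.
source $GENII_INSTALL_DIR/set_gffs_vars
fastgrid ls /home/xsede.org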

Fastgrid is implemented by starting the regular “grid”
client in the background, and passing the user’s commands to that background
grid process via named pipes. The same “grid” process is used for all
subsequent command invocations, and the status of each command is gathered from
the named pipe to return as the fastgrid command’s exit value.

The --quitServer (or -q) flag can be passed to
fastgrid to terminate that “real” grid client’s background process.

Fastgrid can also be run on a command stream of multiple
lines by passing the --stdin (or -s) flag to it. For example,
this invocation relies on the Unix “here document” to pass several commands to
fastgrid:
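# the command stream is read from standard input (paths illustrative)
fastgrid --stdin <<eof
cd /home/xsede.org/fred
ls
eof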

This section describes how to create a distributed grid
using Genesis II components. The main components of such a grid are: (1) the
GFFS, which provides the file-system linking all the components together, (2)
the Grid Queues, which support submitting compute jobs to the computational
elements in the grid, and (3) the BESes, which represent each computational
element. Each of these services lives inside a container, which is a Genesis
II installation that provides one or more services to the grid via a
web-services interface.

Every GFFS grid has one “root container” that provides the
root of the GFFS file system, similar to the traditional Unix file system root
of “/”. The remainder of GFFS can be distributed across other containers which
are then “linked” into the root container. Usually, the root container serves
all of the top-level folders such as /home, /users and /resources.

This chapter will describe the overall structure for the
GFFS filesystem and will provide steps for building a new grid, starting with
the root container.

There is no definite requirement for any particular
structure of the GFFS. It starts as a clean slate, with only the root node
('/'). All of the top-level directories are defined by convention and
generally exist to provide a familiar structure around the grid resources.

This section will describe the purpose of each of these
directories. Note that most of these are created by the GFFS root container
deployment, which is described later in this chapter.

Stores the RNS resolvers for the grid that enable fail-over and replication.

/groups/xsede.org

Stores group identities for the grid.

/home/xsede.org

Stores the home folders for users.

/resources/xsede.org/queues

Stores the queues that are available within the grid.

/users/xsede.org

Stores the user identities in a convenient grid-wide location.

Other grids can use the XSEDE namespace design for their
structure, but the portions of the namespace that mention “xsede.org” are
replaced by a more locally appropriate name. For example, the new XCG (Cross-Campus
Grid) namespace at the University of Virginia has folders for
/users/xcg.virginia.edu and /resources/xcg.virginia.edu and so forth. The
European GFFS grid has “gffs.eu” in those second tier names. This approach
supports federating multiple grids within the same structure; for example, the
XSEDE grid can provide a link to /resources/xcg.virginia.edu within the XSEDE
grid in order to reach the resources of the XCG grid.

Assuming a Genesis II container is installed (either via the
interactive installer or using the unified configuration scripts, both
discussed in section D), then this container can be “linked” into the grid. A
container that is linked in the grid can then be used for services, based on
the path where it has been linked.

Linking a container into the grid requires knowing the
“service URL” for that container. The interactive installer creates a
“service-url.txt” file in the install folder for the new container, whereas the
unified configuration model creates a “service-url.txt” file in the container’s
state directory (pointed at by GENII_USER_DIR). The contents of this file look
similar to this:
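https://myhost.example.org:18443/axis/services/VCGRContainerPortType

(The host and port above are illustrative.) A sketch of linking the container into the grid and then verifying the link follows; the option names are best checked with grid help ln:

# link the container at a chosen grid path, using the URL from service-url.txt
grid ln --service-url="https://myhost.example.org:18443/axis/services/VCGRContainerPortType" /containers/MyContainer

# verify: listing the new path should show the container's services
grid ls /containers/MyContainer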

If instead only the name “MyContainer” is printed out, then
the link has failed. This can be due to a number of reasons, such as the
container not having been started or the container port being blocked by a
firewall.

The deployment of the GFFS requires two major components: a set of containers that are deployed on a host or set of hosts, and a deployment
configuration package that enables a grid client or container to connect to the
GFFS. A “grid deployment package” is a directory of configuration items that is
required to connect to an existing grid as a client. This package is also
required for configuring a Genesis II Container as a server that allows secure
connections. The deployment package is constructed when building the root
container. The client is provided a limited version of this package which does
not contain any of the private keys used by the root container.

There is a “default” deployment shipped with the source code
that contains a basic set of configurations necessary to run Genesis II. A new
deployment is created when “bootstrapping” a grid that inherits the “default” deployment.
This enables the basic security configuration of “default” to be extended to
provide a secure grid.

Below are instructions to create the “Bootstrap” container
that serves as the root of the RNS namespace and the primary source of GFFS
services. Secondary containers (i.e., not the GFFS root) are created using an
installer that contains the deployment package produced during the Bootstrap
configuration process. Using the installer enables new containers to be
deployed very quickly.

Note that the following steps for the Bootstrap Container assume
that the grid administrator is working with the Genesis II software as source
code, rather than via an installer. When using the Genesis II installer, these
steps are not required for setting up clients or secondary containers. Building
the installer requires some working knowledge of Install4j, an Install4j
license, and the root container’s deployment package (created below). If you would
like an installer built for your grid, it is recommended to contact xcghelp@cs.virginia.edu for
assistance.

The deployment generation process requires a copy of the Genesis
II source code (see Section H.2 if you need to obtain the source code and
Section H.1 about installing Java and other prerequisites). These steps use the
GFFS Toolkit for the root container deployment, especially the deployment generator tool (see Section I for more information about the GFFS Toolkit). The source code includes a copy of
the GFFS Toolkit (in a folder called “toolkit”).

F.2.1.1. Configuration Variables for Bootstrapping

The deployment generator uses the same scripting support as
the GFFS Toolkit, although it requires a smaller set of configuration items. This
section will describe the critical variables that need to be defined for the
bootstrapping process.

The first choice to be made is which namespace the grid
should support. In the description of the process below, we will assume the use
of the XSEDE production namespace for bootstrapping the grid. This step copies
the example configuration file for the XSEDE namespace into place as the
configuration file for the GFFS Toolkit:
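For example (a sketch; the example file’s name and its location within the
toolkit folder may differ in a given checkout):

cp $GENII_INSTALL_DIR/toolkit/examples/gffs_toolkit.config-xsede $GENII_INSTALL_DIR/toolkit/gffs_toolkit.config

With the configuration file in place, the following environment variables need
to be defined: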

·GENII_INSTALL_DIR: point this at the location of the Genesis II
source code.

·GENII_USER_DIR: set this if you want to store the grid state in a
different location than the default. The default state directory is “$HOME/.genesisII-2.0”.

·JAVA_HOME: specifies the top-level of the Java JDK or JRE. This
is required during deployment for running the keytool program, which is not always
on the application path.

·NEW_DEPLOYMENT: set this to the intended name of the root
container’s deployment. This name should be chosen carefully as a unique and
descriptive name for bootstrapping the root container. For example, it could
be called “xsede_root” for the root container of the XSEDE grid. It should not
be called “default”, “current_grid” or “gffs_eu” which are already in use
within the installer or elsewhere.

Important: For users on NFS (Network File System), it is
critical that container state directories (pointed at by the GENII_USER_DIR
variable) are not stored in an NFS-mounted folder; corruption of the container
state can result if this caution is disregarded. Instead, GENII_USER_DIR
should point at a folder on local storage.

Modify the new gffs_toolkit.config
for the following variables:

·DEPLOYMENT_NAME: ensure that the chosen NEW_DEPLOYMENT from above
is also stored in the configuration file for this variable.

The other variables defined in the gffs_toolkit.config
can be left at their existing values (or can remain commented out) when
generating the new grid deployment.

The remainder of the chapter will refer to
GENII_INSTALL_DIR, GENII_USER_DIR and NEW_DEPLOYMENT as variables defined in
the bash shell environment. It is most convenient to load the required environment
variables from a script rather than retyping them in each session; often it makes
the best sense to add the variables to the user’s shell startup script, such as
$HOME/.bashrc. Here are some example script commands that set the required
variables:
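(The values below are illustrative and should be adjusted for the local host.)

export GENII_INSTALL_DIR=$HOME/GenesisII
export GENII_USER_DIR=$HOME/.genesisII-2.0
export JAVA_HOME=/usr/lib/jvm/default-java
export NEW_DEPLOYMENT=xsede_root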

The six main steps to create the root container for GFFS
are: (1) set up the trust store for the deployment, (2) generate key-pairs for
the various identities needed for a grid container, (3) start up the GFFS root
container, (4) create the root of the RNS name space, (5) archive the
deployment, and (6) package the deployment for others. These steps are
documented in the following sections.

Prerequisites for Generating a Deployment

·These procedures assume that the Genesis II code has been
acquired and is already compiled. To build the Genesis II code, refer to Section
H.3 on “Building Genesis II from Source on the Command Line” to compile the
codebase. It is very important that the unlimited JCE jars are installed on
any machine running the GFFS; refer to section H.1 for more information.

·In the following steps, it is crucial that no user state
directory exist before the GFFS container creates it. If you have $HOME/.genesisII-2.0,
then delete it beforehand. (Or if $GENII_USER_DIR points at a different state
directory, be sure to delete that.)

·The user state directory must not be stored on an NFS file system.
One should point the GENII_USER_DIR at a directory on a local file system.

The basic GFFS security configuration for the root container
is established in the deployment generator. This involves setting up a
resource signing keypair, a TLS keypair, an administrative keypair and the
container’s trust store.

The first configuration feature is the “override_keys”
folder, which allows the deployment to be built with a pre-existing “admin.pfx”
and/or “tls-cert.pfx” file. These files should be in PKCS#12 format with
passwords protecting them. If “admin.pfx” is present in “override_keys”, then
it will be used instead of auto-generating an administrative keypair. If
“tls-cert.pfx” is present, then it will be used for the container’s TLS keypair
rather than being auto-generated. The passwords on these PFX files should be
incorporated into the “passwords.txt” file discussed in a later section.

The next trust store component is the “trusted-certificates” directory in the deployment_generator. This should be populated with the
most basic CA certificates that need to be present in the container’s trust
store. The CA certificate files can be in DER or PEM format. Any grid
resource whose certificate is signed by a certificate found in this trust store
will be accepted as a valid resource within the GFFS. Also, the GFFS client
will allow a connection to any TLS certificate that is signed by a certificate
in this trust store. For example:
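(The certificate file name below is a placeholder for any PEM or DER format CA
certificate.)

cp campus-ca-cert.pem $GFFS_TOOLKIT_ROOT/tools/deployment_generator/trusted-certificates/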

The third component of the GFFS trust store is the
“grid-certificates” directory, where the bulk of well-known TLS CA certificates
are stored for the grid. This directory will be bound into the installation
program for the GFFS grid, but at a later time, the automated certificate
update process may replace the installed version of those certificates for
appropriate clients and containers. The “grid-certificates” directory can be
populated from the official XSEDE certificates folder when building an XSEDE
grid as shown:
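(This assumes the official XSEDE certificates are installed at
/etc/grid-security/certificates, as on standard XSEDE hosts.)

cp /etc/grid-security/certificates/* $GFFS_TOOLKIT_ROOT/tools/deployment_generator/grid-certificates/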

The deployment generator will use the given configuration to
create the complete trust store. This includes generating a resource signing
certificate (“signing-cert.pfx”) for the grid which is built into the trust
store file (“trusted.pfx”). If not provided, the deployment generator will
also automatically create the root container’s TLS certificate (“tls-cert.pfx”)
and administrative certificate (“admin.pfx”). The trusted-certificates and
grid-certificates folders are included verbatim rather than being bound into
trusted.pfx, which permits simpler certificate management later if changes are
needed.

Building an XSEDE-compatible GFFS root requires additional
steps. Because the XSEDE grid uses MyProxy authentication (as well as
Kerberos), the deployment generator needs some additional configuration to
support it.

MyProxy Configuration

To authenticate MyProxy logins, an appropriate
“myproxy.properties” file must reside in the folder
“deployment-template/configuration” in the deployment generator. Below is the
default myproxy.properties file that is compatible with XSEDE’s myproxy servers;
it is already included in the configuration folder:
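(The property keys below are a best-effort sketch and should be checked
against the actual file shipped in deployment-template/configuration; the host
and port shown are the standard XSEDE MyProxy settings.)

myproxy.host=myproxy.xsede.org
myproxy.port=7512
myproxy.lifetime=43200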

A directory called “myproxy-certs” should also exist under
the deployment generator. This directory should contain all the certificates
required for myproxy authentication. The provided configuration template
includes a myproxy-certs directory configured to use the official XSEDE MyProxy
server; this should be replaced with the appropriate CA certificates if the
grid is not intended for use with XSEDE MyProxy.

The deployment can be created automatically using the script
“populate-deployment.sh” in the deployment_generator folder. Do not do this step
unless it is okay to completely eliminate any existing deployment named $NEW_DEPLOYMENT (which will be located under
$GENII_INSTALL_DIR/deployments).

Edit the passwords specified in “passwords.txt”. These
passwords will be used for newly generated key-pairs. These passwords should
be guarded carefully.

Edit the certificate configuration in “certificate-config.txt” to match the internal certificate
authority you wish to create for the grid. The root certificate created with
this configuration will be used to generate all container “signing” certificates,
which are used to create resource identifiers inside of containers. Container
TLS certificates can also be generated from that root certificate, or they can
be provided manually (and their CA certificate should be added to the trust
store as described above). Consult the sections “Container Network Security”
and “Container Resource Identity” for a discussion of TLS and signing
certificates.

The next step generates the necessary certificate files and
copies them into the deployment. Again, this step will *destroy* any
existing deployment stored in the $GENII_INSTALL_DIR/deployments/$NEW_DEPLOYMENT
folder.

The container port number is at the installer’s discretion,
but it must be reachable through any firewalls if grid clients are to connect
to it. The hostname for the GFFS root must also be provided, and this should
be a fully qualified DNS hostname. The hostname must already exist in DNS
records before the installation.
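A sketch of the generation step follows; the exact arguments accepted by the
script may differ, so check its usage output first:

cd $GFFS_TOOLKIT_ROOT/tools/deployment_generator
bash populate-deployment.sh $NEW_DEPLOYMENT {hostname} {port}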

These steps can be used to get the GFFS root container
service running. They actually will work for any container built from source:

cd $GENII_INSTALL_DIR

bash runContainer.sh &>/dev/null &

Note that it can take from 30 seconds to a couple of minutes before
the container is finished starting up and is actually online, depending on the
host. A container is ready to use once the log file mentions the phrase “Done
restarting all BES Managers”. The log file is located by default in $HOME/.GenesisII/container.log.
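Once the container is online, the root of the RNS namespace is created by
running the generated bootstrap script through the grid client. The invocation
below is a sketch (the exact command may vary by version; the script file
matches the cleanup step that follows):

grid script local:$GENII_INSTALL_DIR/deployments/$NEW_DEPLOYMENT/configuration/bootstrap.xml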

Once the bootstrap process has succeeded, it’s important to
clean up the bootstrap script, since it contains the admin password:

rm deployments/$NEW_DEPLOYMENT/configuration/bootstrap.xml

At this point, a very basic grid has been established. The
core directories (such as /home/xsede.org and /resources/xsede.org) have been created. Any standard
groups are created as per the definition of the namespace; for example, the XSEDE
bootstrap creates groups called gffs-users for normal users and gffs-admins for
administrators. However, there are no users defined yet (besides the
administrative keystore login).

The steps above also generate a crucial file called “$GENII_INSTALL_DIR/context.xml”. The context.xml file needs to be made available to grid
users before they can connect clients and containers to the new root container.
For example, this file could be uploaded to a web-server, or it could be manually
given to users, or it could be included in a new Genesis II installation package.

The grid administrator can make a package from the root
container's deployment generator directory that other grid clients and
containers can use to connect to the grid. The package will provide the same
trust store that the root container uses, and it provides new containers with a
TLS certificate that will be trusted by grid clients:

cd $GFFS_TOOLKIT_ROOT/tools/deployment_generator

bash package-deployment.sh

This will create a file called deployment_pack_{datestamp}.tar.gz
in the user’s home folder. This archive can be shared with other users who
want to set up a container or a grid client using the source code. The package
includes the container’s context.xml file, the
trust store (trusted.pfx and other directories),
and the admin certificate for the grid.

It is preferred to use a Genesis II installer for all other
container and client installations besides the root (bootstrap) container. The
above deployment package should be provided to the person building the installer.
The package building script uses the deployment package to build an installer
that can talk to the new root container.

There is an administrative certificate provided by the installation
package for grid containers. Changing the admin certificate has wide-ranging
effects: it controls who can remotely administer the grid container and it
changes whether operations can be performed on the container by the grid’s
administrator (such as accounting data collection). Changing the admin
certificate for a grid container should not be undertaken lightly.

A related concept to the administrative certificate for a
container is the “owner” certificate of the container. The owner is a
per-container certificate, unlike the admin certificate that is usually
distributed by the installation program. The owner certificate can also be
changed from the choice that was made at installation time.

Clients whose credentials contain either the admin or owner certificate
are essentially always given permission to perform any operation on any of that
grid container’s services or on grid resources owned by the container.

For the discussion below, we will refer to the container’s
security folder as $SECURITY_FOLDER. It will be explained subsequently how to
determine where this folder is located.

The grid container admin cert is located in $SECURITY_FOLDER/admin.cer. The .cer file ending here corresponds to a
DER-format or PEM-format certificate file. Replacing the admin.cer file changes the administrative keystore for
the container.

The container owner certificate is instead located in $SECURITY_FOLDER/owner.cer, and can also be in DER or PEM format.

The owner and admin certificates are also commonly stored in
the $SECURITY_FOLDER/default-owners directory. The
default-owners directory is used to set default access control for a grid
resource during its creation when no user security credentials are present.
This is a rather arcane piece of the Genesis II grid container and is mostly
used by the grid container during certain container bootstrapping operations.

However, if either certificate is to be changed, then it
makes sense to change default-owners too. Otherwise some resources created
during container bootstrapping will be “owned” (accessible) by the original
certificates. Because of this, if you wish to change the admin or owner certificate
for a grid container, it is best to prevent the grid container from starting during
installation and to immediately change the admin.cer
and/or owner.cer files before starting the grid container for the first time.

If the container has inadvertently been started already but
still has no important “contents”, then the default-owners can be changed after
the fact. The container should be stopped (e.g. GFFSContainer stop) and the
GENII_USER_DIR (by default stored in $HOME/.genesisII-2.0) should be erased to
throw out any resources that had the prior administrator certificate associated
with them. Again, only do this if there is nothing important installed on this
container already! Once the admin.cer and/or owner.cer file is updated,
restart the container again (e.g. GFFSContainer start).

If the container has been inadvertently started but does
have important contents, then the ACLs of affected resources and services can
be edited to remove the older certificate. The easiest method to edit ACLs is
to use the client-ui (documented in Section E.6) to navigate to the affected
resources and drag the old credential into the trash bin for ACLs it is present
in.

Occasionally, a user certificate that owns a container may
become invalid or the administrative certificate may need to be swapped out. To
swap the certificates into the proper location, we need to resolve the
SECURITY_FOLDER variable to a real location for the container. This has become
more difficult than in the era when there were only interactive Genesis II
installations, because containers can be configured differently when using a
host-wide installation of the software. To assist the grid maintainers, a new
tool called “tell-config” has been added that can report the security folder
location:

grid tell-config security-dir

Given that one has located the proper SECURITY_FOLDER (and
has set a shell variable of that name), these steps take a new certificate file
($HOME/hostcert.cer) and make that certificate both the administrator and owner
of a container:
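# make the new certificate the container administrator:
cp $HOME/hostcert.cer $SECURITY_FOLDER/admin.cer
cp $HOME/hostcert.cer $SECURITY_FOLDER/default-owners/admin.cer

# make the new certificate the container owner:
cp $HOME/hostcert.cer $SECURITY_FOLDER/owner.cer
cp $HOME/hostcert.cer $SECURITY_FOLDER/default-owners/owner.cer

(The file names used inside default-owners are an assumption; confirm the
existing names in that directory before overwriting them.)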

If only
the admin or only the owner certificate needs to be updated rather than both,
then just perform the appropriate section of commands from above.

F.2.4.XSEDE Trust Store Customization

When a GFFS container or client is deployed on a host that
supports the official XSEDE CA certificates, it is desirable to use the
official certificates directory rather than the static copy of the certificates
provided by the install package. This affects two configuration items: the
myproxy certificates and the grid’s TLS certificates.

To use the official certificates location for MyProxy,
update the “security.properties” file in the container’s deployment
configuration folder. The full path to the file should be
$GENII_INSTALL_DIR/deployments/current_grid/configuration/security.properties
for most installations. Editing this file requires root or sudo permissions if
the installation is system-wide (e.g. installed from the RPM). For the root
container or other containers with specialized deployments, the path will be
based on the active deployment name, such as
$GENII_INSTALL_DIR/deployments/xsede_root/configuration/security.properties.
The active deployment folder can be shown by running “grid tell-config active-deployment-dir”.

A helper script called “use_official_trust_store.sh” has
been developed and is available in “$GFFS_TOOLKIT_ROOT/tools/xsede_admin”.
This script performs the necessary edits on the security.properties file given
that the GENII_INSTALL_DIR variable is set. Run it without any flags to cause
it to point the security.properties at the official certificate locations:

bash $GFFS_TOOLKIT_ROOT/tools/xsede_admin/use_official_trust_store.sh

After this script is run, the remainder of the edits below
are not needed, but the container should be restarted so that it will start
using the modified trust store (see section G.2.2 regarding restarting a
container). There is a periodic trust store refresh also for containers and
clients, but restarting the application will use the new trust store
immediately. This command restarts the local container:

$GENII_INSTALL_DIR/GFFSContainer restart

These re-configuration steps are required again after an RPM
install is upgraded, since the deployment’s security.properties file will be
replaced with the default version.

If for some reason using the script is not appropriate (or
if one has an older installation without the script), the trust store
modification can also be performed manually with the following steps. Modify security.properties
to change this entry from the default:
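(The property key below is a sketch; verify the exact name against the
comments in the shipped security.properties. The default relative value names
the deployment’s myproxy-certs folder, and the replacement points at the
official certificates location.)

# default:
gffs.security.myproxy-certificates.location=myproxy-certs
# changed to the official host-wide location:
gffs.security.myproxy-certificates.location=/etc/grid-security/certificates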

Similarly, the TLS trust store can be configured to use the
official XSEDE CA certificates. The grid-certificates folder is defined in the
same security.properties file, where the original entry looks like this:
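(Again a sketch of the entry; the relative default value enables the automatic
certificate update process described in section F.2.6.2, while pointing it at
an absolute path disables that process.)

# default:
gffs.security.grid-certificates.location=grid-certificates
# changed to the official host-wide location:
gffs.security.grid-certificates.location=/etc/grid-security/certificates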

To create a new deployment from scratch, a directory should
be created under “$GENII_INSTALL_DIR/deployments”
with the name chosen for the new deployment. That directory should be populated
with the same files that the populate-deployment.sh script puts into place.
The deployment should inherit from the default deployment.

There are a few important requirements on certificates used
with Genesis II:

·Signing certificates (set in security.properties in
'resource-identity' variables) must have the CA bit set. Container TLS
certificates do not need the CA bit enabled.

·Clients can only talk to containers whose TLS identity is in
their trust stores (i.e., the CA certificate that created the TLS certificate
is listed).

·When acting as a client, a container also will only talk to other
containers whose TLS certificates are in its trust store.

The deployment directory consists
of configuration files that specify the container's properties and trusted
identities. For an interactive install using the Split Configuration model (see
Section D.8.6), these files are rooted in the deployment directory provided by
the installer: $GENII_INSTALL_DIR/deployments/{DEPLOYMENT_NAME}. The following
files and directories can usually be found in that folder (although changing
properties files can change the name expected for particular files):

·security/owner.cer: The owner certificate for this container.

oThis is the certificate for a grid user who has complete control
over the container.

oOwnership can be changed by swapping out this certificate and
restarting the container.

·security/default-owners: A directory holding other administrators
of this container.

oCan contain DER or PEM encoded .cer certificates.

oAny .cer file in this directory is given default permissions on
services created by this container.

·security/signing-cert.pfx: Holds the container’s signing key that
is used to create resource identifiers.

·security/tls-cert.pfx: Holds the TLS certificate that the
container will use for all encrypted connections.

·security/trusted.pfx: Contains certificates that are members of the
container’s trust store.

oThis file encapsulates a set of certificates in PKCS#12 format.

·security/trusted-certificates: A directory for extending the
container’s trust store.

oCertificate files can be dropped here and will automatically be
part of the container trust store after a restart.

oThis is an easier-to-use alternative to the trusted.pfx file.

·security/grid-certificates: Similar to trusted-certificates, this
is a directory that extends the container’s trust store.

oThese certificates are part of the automatic certificate update
process.

oIn the XSEDE grid, this directory often corresponds to the
/etc/grid-security/certificates folder.

·security/myproxy-certs: Storage for myproxy certificates.

oThis directory is the default place myproxyLogin and xsedeLogin
use as the trust directory for myproxy integration.

·configuration/myproxy.properties: Configuration of the
myproxy server.

oThis file is necessary for the myproxyLogin and xsedeLogin commands.

Overrides for the above locations in the Unified
Configuration model (for more details see Section D.8.6):

·$GENII_USER_DIR/installation.properties: Property file that
overrides some configuration attributes in the Unified Configuration model.
These include certain elements from security.properties,
web-container.properties and server-config.xml.

·$GENII_USER_DIR/certs: Local storage of container
certificates in Unified Configuration.

oThe certificate (CER) and PFX files for a container with the
Unified Configuration are stored here (unless the container uses a specialized
deployment folder, see below).

oThe “grid-certificates” folder can be located here, and overrides
the “security” folder of the deployment.

oA “local-certificates” folder can be stored here to contain
additional elements of the container’s trust store.

The Genesis II clients and containers will process
Certificate Revocation Lists (CRLs) according to the official certificates
directory provided by XSEDE. Genesis II will use CRL files if they are found
in the “grid-certificates” trust store folder (see section F.2.2.1). This folder can be pointed at an absolute path, such as the official XSEDE certificates
directory (see section F.2.4). The CRL files must end in the characters “.r0”
to be recognized as CRL files, and they are expected to be in PEM format as
encoded by the fetch-crl tool (http://linux.die.net/man/8/fetch-crl).

The CRL files found in the configured grid-certificates
folder will be loaded and used to block connections to containers that are
found to be running one of the revoked certificates. This applies to both a
Genesis II client connecting to a container, and also to a container connecting
to another container for services.

F.2.6.1.Certificate Package Uploader

On official XSEDE hosts, the grid-certificates configuration
should be pointed at the official location, which relies on regular updating of
the CRL lists using the fetch-crl tool. On non-XSEDE hosts, the
grid-certificates will initially be provided by the install package, but can be
updated automatically using a copy of the certificates package from
within the GFFS grid. The certificates package can be built using the “upload_grid_certs.sh”
script provided by the install package. This script creates a copy of the
certificates and CRL files found in the official location
(/etc/grid-security/certificates) and uploads that package to the GFFS grid (in
grid:/etc/grid-security/certificates/grid-certificates-X.tar.gz, where X is
replaced by a timestamp). Example usage of the script:

bash $GFFS_TOOLKIT_ROOT/tools/xsede_admin/upload_grid_certs.sh

The script requires that the logged-in grid user has
permission to write the new certificate file as well as permission to create
the /etc/grid-security/certificates folder if it does not already exist. The
simplest way to obtain the proper rights is to log in as a member of the
gffs-admins group, or to request that a grid administrator enable the
permission for the particular grid user that will run the upload process.

The upload script can be added to a cron job in order to
regularly update the certificate package in the grid. Here is an example cron
file that runs the upload script every day at 3am:
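(A sketch; the keystoreLogin credential shown is an assumption and should be
replaced with whatever login method the chosen grid user actually uses, and
GENII_INSTALL_DIR and GFFS_TOOLKIT_ROOT must be defined in the cron
environment.)

0 3 * * * $GENII_INSTALL_DIR/grid keystoreLogin --password={password} local:$HOME/uploader.pfx && bash $GFFS_TOOLKIT_ROOT/tools/xsede_admin/upload_grid_certs.sh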

This cron job logs in as a grid user with appropriate
permissions for running the upload script and then runs the upload script.
Afterwards, a new copy of the certificates package should be stored in the
grid, and grid clients will periodically update their own copy of the
grid-certificates as described in the next section. If any errors occur during
the upload process, messages will be printed to the console and the script’s exit
code will be non-zero.

F.2.6.2.Automated Certificate Download and Update

The Genesis II client will periodically check for the
presence of a new certificate package file in the grid, and if it is found, the
client downloads that file locally and updates the state directory’s copy of
the grid-certificates folder (in $GENII_USER_DIR/grid-certificates). This
folder automatically overrides the shipped version of the grid-certificates in
order to use the latest CRL lists.

When the grid-certificates configuration is pointed at an
absolute path, the certificate update process will not be performed by the
client. This allows the containers and clients to use the official
certificates in a local filesystem, as described in section F.2.4. If the grid-certificates configuration is left as a relative path (by default it is just
set to “grid-certificates”), then the automatic certificate update process is
enabled.

The file “$GENII_USER_DIR/update-grid-certs.properties”
tracks the last runtime of the update process and the last package file that
was used. To force an update of the grid-certificates for the client, remove
that file and run a new instance of the Genesis II client. The state
directory’s copy of the grid-certificates in
“$GENII_USER_DIR/grid-certificates” will have a recent timestamp after the
certificates have been updated successfully. One can also examine the client
log (in $HOME/.GenesisII/grid-client.log) to see information from the update
process.

It is important for the grid-certificates to also be kept up
to date on Genesis II containers, if they are not running on official XSEDE hosts.
Due to the implementation differences between Genesis II clients and containers,
the automated certificate update processing used in the client code cannot be
re-used in the container. However, if the grid client updates the local
certificates folder, then a container running as the same Unix user can take
advantage of this; that is because the state directory is shared between the
client and container running on the same account, and the container handles any
CRL files found just as the client does. The following cron job uses the grid client
to regularly update the grid-certificates for the container:
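(Assuming the container account’s grid client already holds a valid login and
that GENII_INSTALL_DIR is defined in the cron environment.)

0 5 * * * $GENII_INSTALL_DIR/grid ls / &>/dev/null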

This job just runs the client every day at 5am to list the
root directory of the GFFS. As a side effect, the client also tests whether
there is a new certificates package and updates its local copy of the
grid-certificates if a new package is found. The container will then begin
using the new grid-certificates, which include the latest CRL files, once its
process is restarted; running containers also periodically reload the trust
store (every 4 hours by default).

The notion of a container is a simple idea at its root. A
container encapsulates the services that a particular Genesis II installation
can provide on the web. In other service models, the container might have been
referred to as being a “server”. However, in the field of web services, any
program that can host web services may be called a “container”. This includes
application servers like “tomcat”, “glassfish” and other programs that can host
web services.

In the case of Genesis II, commodity container features are
provided by the Jetty Application Server and Apache Axis 1.4, which route
requests from clients to the specific services in Genesis II. But in general
terms, we refer to any separate Genesis II install that can provide web
services to the grid as a “container”. If a Genesis II installation just uses
containers and provides no services of its own, then it is referred to as a
“client”.

Grid containers based on Genesis II have a representation in
the GFFS as “resource forks”. The resource fork provides a handle for
manipulating the actual internals of the container, but the resource fork
resembles a normal file or directory. The notion of resource forks provides an
easy to use mapping that represents the capabilities of the container (and
other resource types) within the GFFS filesystem.

The top-level resource fork is VCGRContainerPortType, which
provides access to the container itself. Once a container is linked into the
grid via the VCGRContainerPortType, the other services can be viewed under its
Services folder. This command shows the basic step for linking a container
into the grid. The target location where the container resides must be
writable by the user creating this link:
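
# link the container into the grid; the service URL shown is a placeholder
# and should be taken from the container's service-url.txt file:
grid ln --service-url="https://mycontainer.example.edu:18443/axis/services/VCGRContainerPortType" /home/xsede.org/fred/MyContainer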

# show the services available on the
container:
grid ls /home/xsede.org/fred/MyContainer/Services

The ‘ln’ command links the container into the grid, at which
point the owner or administrator of the container can make the container’s
services available to other users.

As an example of a container service, the X509AuthnPortType
service on a container is where basic X509 grid identities are created. A user
on the XSEDE grid (with appropriate permissions) can list the directory
contents for that port-type in the resource fork for the primary STS container by
executing:
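(The container path below is a placeholder for wherever the primary STS
container is linked in the namespace.)

grid ls /resources/xsede.org/containers/{stsContainer}/Services/X509AuthnPortType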

However,
these user entries are not simple files that can be copied to someplace else in
the grid and used like normal files are used. They only make sense in the
context in which the container manages them, which is as IDP entities. That is
why we create a link to the user identity in the /users/xsede.org
folder (see the Creating Grid Users section for more details) rather than just
copying the identity; the link maintains the special nature of the identity,
whereas a copy of it is meaningless.

There are several resource forks under the container's
topmost folder. Each item's name and purpose are documented below. Note that
visibility of these items is controlled by the access rights that have been
set, and not every user will be able to list these folders.

resources This
is a directory that lists all the resources, categorized by their types, that
have been created in this container. The directory is accessible only to the
administrator of the container.

filesystem-summary.txt This
file reports the current free space of the container's filesystem.

Services This
directory holds all of the port-types defined on Genesis II containers. Each
port-type offers a different type of service to the grid.

container.log This
is a mapping for the file container.log under the GENII_INSTALL_DIR. Note that if that is not where the
container log is being written, then this GFFS entry will not be accessible.
This is mainly a concern for containers that are built from source, as the
graphical installer sets up the configuration properly for the container log.

Within the Services directory, one will see quite a few port
types:

ApplicationDeployerPortType Deprecated;
formerly part of preparing an application to use in the grid (e.g. unpacking
from zip or jar file, etc).

BESActivityPortType OGSA
port-type for monitoring and managing a single activity that has been submitted
into a basic execution service.

CertGeneratorPortType A
certificate authority (CA) that can generate new container TLS certificates for
use within a grid. Used in XCG, not expected to be used in XSEDE.

EnhancedRNSPortType The
Resource Namespace Service (RNS) port-type, which provides directory services
to the grid. In addition to the standard RNS operations, this port-type
supports a file creation service (createFile) that creates a file resource in
the same container that the RNS resource resides in.

ExportedDirPortType Port-type
for accessing and managing a directory that lies inside an exported root. It
is not needed when using the light-weight export mechanism.

ExportedFilePortType Port-type
for accessing a file inside an exported directory. Like the
ExportedDirPortType, it is not needed for the light-weight export.

ExportedRootPortType Port-type
for exporting a directory in local file-system in the GFFS namespace. This
port-type is extended by its light-weight version, LightWeightExportPortType,
and we recommend using that for exports instead of ExportedRootPortType.

FSProxyPortType Similar
to the LightWeightExportPortType, but allows any
filesystem-like subsystem that has a Genesis II driver to be mounted in the
grid. Examples include ftp sites, http sites, and so forth.

GeniiBESPortType The
BES (Basic Execution Service) provides the capability to execute
jobs submitted to the container.

GeniiPublisherRegistrationPortType An extension of
the WS-Notification web service allowing data providers for subscriptions to
register with the grid. Interoperates with GeniiResolverPortType,
GeniiSubscriptionPortType and GeniiWSNBrokerPortType.

GeniiPullPointPortType Unimplemented.

GeniiResolverPortType A
service that provides WS-Naming features to the grid. Clients can query the
location of replicated assets from a resolver.

GeniiSubscriptionPortType Port-type
for managing a subscription for web-service notifications. All Genesis II
resources extend the WSN NotificationProducer port-type and can therefore
publish notifications. This port-type is used to pause, resume, renew, and
destroy any subscription for a Genesis II resource's notifications.

GeniiWSNBrokerPortType Deprecated;
implements a form of subscription forwarding.

LightWeightExportPortType Port-type
for exposing a directory in local file-system into the GFFS namespace. Users
with appropriate credentials can access and manipulate an exported directory
just like a typical RNS resource. The port-type also supports a quitExport
operation that detaches the exported directory from the GFFS namespace.

PipePortType A
non-standard port-type for creating a unidirectional, streamable ByteIO
communications channel. Once the pipe is created, the client can push data
into one end and it will be delivered to the other end of the pipe. This is a
less storage-intensive way to transfer data around the grid, because there does
not need to be any intermediate copy of the data stored on a hard-drive or
network location.

QueuePortType A
meta-scheduler port-type for submitting and managing jobs in multiple basic
execution services (BES). The user can submit, query status, reschedule, kill,
etc. one or more activities through this port-type, and can also configure how
many slots of individual BESes will be used by the meta-scheduler.

RExportDirPortType An
extension of the ExportedDirPortType that supports
replication.

RExportFilePortType An
extension of the ExportedFilePortType that
supports replication.

RExportResolverFactoryPortType This port-type
creates instances of the RExportResolver
port-type.

RExportResolverPortType A
port-type whose EPR is embedded into an exported directory or file's EPR to
support resolving to a replica on failure.

RandomByteIOPortType Port-Type
for accessing a bulk data source in a session-less, random way. The user can
read and write blocks of data starting at any given offset. In other words,
the port-type exposes a data-resource as a traditional random-access file in a
local file-system.

StreamableByteIOPortType Port-type
for accessing a bulk data source via a stateful session resource. It
supports the seekRead and seekWrite operations.

TTYPortType An
earlier implementation of the PipePortType that was used at one time for
managing login sessions.

VCGRContainerPortType The
container port-type provides the top-level handle that can be used to link
containers into the GFFS. This port-type represents the container as a whole.
When this port-type is linked into the grid, users can see the container
structure under that link (including, eventually, the VCGRContainerPortType).

WSIteratorPortType Port-type
for iterating over a long list of aggregated data, instead of retrieving it in
its entirety in a single SOAP response. This port-type is used in conjunction
with other port-types, such as when listing the entries of an RNS resource or
the jobs in a queue. Its interface has exactly one operation: the iterate
operation.

X509AuthnPortType This
port-type is the Identity Provider (IDP) for the container. New identities can
be created under this resource fork, and existing identities can be listed or
linked from here.

When a user looks at the contents of the GFFS, the real
locations of the files and directories are hidden by design. The EPR for files
(ByteIO) and directories (RNS resources) can be queried to find where they
really reside, but usually this is not of interest to users during their daily
activities in the grid. It is much more convenient to consider the files as
living “in the grid”, inside the unified filesystem of the GFFS.

However, this convenient view is not always sufficient, and
a user may need to be very aware of where the files really reside. For
example, streaming a large data file from the root container of the GFFS to a
BES container that is half-way around the world is simply not efficient. In
the next section, we describe how to store files on whatever container is
desired. But first, it is important to be able to determine where the files
really reside. For example, a user is given a home directory when joining a
grid, but where does that home directory really live?

To determine which container is providing the storage
location for a directory, use the following command:

# Show the EPR for a
directory:
grid ls -e -d {/path/to/directory}

This will produce a lengthy XML description of the EPR, and
included in that will be an xml element called ns2:Address
that looks like the following:
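(A simplified sketch; the real element carries additional namespace
attributes, and {unique-id} stands in for the resource's identifier.)

<ns2:Address>https://server.grid.edu:18230/axis/services/EnhancedRNSPortType?genii-container-id={unique-id}</ns2:Address>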

This provides several useful pieces of information. The
hostname server.grid.edu in this example is the
host where the container really lives. The port number after the colon (18230
in this example) is the port where the container provides its web services. In
addition, the unique id of the RNS resource itself (the directory being
queried) is shown.

A file's presence in a particular directory in the GFFS
(i.e., an RNS path) does not necessarily mean that the file actually resides in
the same container as the directory. That is because files linked into the
directory from other containers are still actually stored in the original
location. To show which container a file or directory is really stored on, use
the following command to display the item’s EPR:

# Show the EPR for a
file.
grid ls -e {/path/to/file}

This produces another EPR dump, which will again have an ns2:Address entry:
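(Again illustrative; for a file, the address typically names the
RandomByteIOPortType service on the container where the file’s data actually
lives.)

<ns2:Address>https://data.grid.edu:19080/axis/services/RandomByteIOPortType?genii-container-id={unique-id}</ns2:Address>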

By default, the mkdir command
will create directories using the GFFS root container as the storage location.
It is desirable to store files on other containers to reduce load on the root
container. It is also faster to access data files when they are closer
geographically. To create a directory on a particular container, use the
following steps. Afterwards, any files or directories stored in the new folder
will be physically stored on the {containerPath} specified:
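(A sketch; {containerPath} is the linked location of the target container, and
the --rns-service option directs mkdir to create the directory on that
container’s RNS service.)

grid mkdir --rns-service={containerPath}/Services/EnhancedRNSPortType /home/xsede.org/fred/new-dir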

All connections between GFFS containers use TLS (Transport
Layer Security) to encrypt the SOAP communications and avoid exposing critical
data on the network. The basis of the TLS connection is a certificate file
called the “TLS certificate”, which is configured in a container’s deployment
in the “security.properties” file.

The TLS certificate represents the container’s identity on
the network. Incoming connections to a container will see that certificate as
“who” they are connecting to. When the container makes outgoing connections,
it will use this certificate as its outgoing identity also.

In the case of some grids, the TLS certificates can be
created automatically for a container at installation time using the grid’s
Certificate Generator. This is handled automatically by the Genesis II GFFS
installer.

Other grids may have stricter security requirements, such
that they provide their own TLS certificates from a trusted CA. The installer
can support such a container when the user is already in possession of an
approved TLS certificate; there is an install dialog for adding the
certificate. If the certificate is not yet available, the user can go ahead
and generate a temporary TLS certificate, and replace that later with the
official certificate when available.

The TLS certificate for a container can be replaced at any
time. After switching to a different TLS certificate and updating the
configuration for the container, one must restart the container to cause the
new TLS certificate to take effect.

A GFFS grid client will only connect to a container if the
TLS certificate of the container is known to the client, by its presence in the
client’s trust store. This ensures that the container is intentionally part of
the grid, rather than being from some unknown source. GFFS containers also
follow this restriction when they act as clients (to connect to other
containers for services).

Grid clients for a given grid will automatically trust the
TLS certificates generated by the grid’s Certificate Generator. If specific
TLS certificates are used for each container, then each of the CA certificates
that created the TLS certificates must be added to the installation’s trust
store. Once those CA certificates are present, grid clients and containers
will then allow connections to be made to the affected container. Further
information on configuring the TLS certificate is available in Section F.2.5 as well as in the internal documentation in the deployment’s
security.properties file.

Besides the TLS certificate described in the last section,
there is another type of certificate used by containers. This certificate is
called the “Signing Certificate” and it is used for generating resource
identifiers for the assets owned by a container.

The signing certificate is always created by the grid’s
Certificate Generator. It is an internal certificate that will not be visible
at the TLS level, and so does not participate in the network connection
process. Instead, the Signing certificate is used to achieve a
cryptographically secure form of GUID (Globally Unique IDentifier) for each
resource in a container. Each resource has a unique identity generated for it
by the container using the Signing certificate. This allows a container to
know whether a resource was generated by it (e.g., when the resource is owned
by the container and “lives” inside of it), or if the resource was generated by
a different container.

All of the Signing certificates in a grid are “descended
from” the root signing certificate used by the Certificate Generator, so it is
also clear whether a resource was generated inside this grid or generated
elsewhere.

The resource identifiers created by a container’s Signing
certificate are primarily used in SAML (Security Assertion Markup Language)
Trust Delegations. Each resource can be uniquely identified by its particular
certificate, which allows a clear specification for when a grid user has
permitted a grid resource to act on her behalf (such as when the user delegates
job execution capability to a Queue resource, which in turn may delegate the
capability to a BES resource).

The Signing certificate thus enables containers to create
resources which can be described in a standardized manner, as SAML assertions,
in order to interoperate with other software, such as UNICORE EMS services.

It is possible to restrict the ability of grid users to
create any files on a container. It is also possible to permit file creations
according to a quota system. Either approach can be done on a per-user basis.

All files stored in a grid container (that is, “random byte
IO” files) are located in the $GENII_USER_DIR/rbyteio-data folder. Each grid
user’s name is the first directory component under the rbyteio-data folder,
allowing individualized treatment of the user’s ability to create files.

F.3.6.1.Blocking user ability to create files on a container:

If a user is to be disallowed from storing any byteio
type files on the container, then it is sufficient to change the user’s data
file folder permission to disallow writes for the OS account running the
container.

For example: The container is running as user “gffs”. The
user “jed” is to be disallowed from creating any files in that container. The
user’s random byte IO storage folder can be modified like so:

chmod 500 $GENII_USER_DIR/rbyteio-data/jed

To enable the user to create files on the container again,
increase the permission level like so:

chmod 700 $GENII_USER_DIR/rbyteio-data/jed

F.3.6.2.Establishing quotas on space occupied by user files:

Limits can be set on the space occupied by a user’s random
byte IO files, enabling the sysadmin to prohibit users from flooding the entire
disk with their data. The following is one approach for establishing a
per-directory limit for the user’s data files.

Assuming that a user named “jed” is to be given a hard quota
limit of 2 gigabytes, the following steps will restrict jed’s total file usage
using a virtual disk approach:
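The sketch below uses a loopback-mounted disk image; the paths and the
filesystem type are illustrative, and the container should be stopped before
mounting over its storage folder:

# create a 2 gigabyte disk image and put a filesystem on it:
dd if=/dev/zero of=/opt/quotas/jed.img bs=1M count=2048
mkfs.ext4 -F /opt/quotas/jed.img

# mount the image over jed's random byte IO folder and restore ownership:
sudo mount -o loop /opt/quotas/jed.img $GENII_USER_DIR/rbyteio-data/jed
sudo chown gffs:gffs $GENII_USER_DIR/rbyteio-data/jed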

The Genesis II GFFS containers rely on the Apache Derby
Embedded Database implementation for their database support. Much of the time
the database engine is trouble free, but occasionally it does need
maintenance. This section covers topics related to the management of the GFFS
database.

The Derby software provides another way to set CLASSPATH,
using shell scripts (UNIX) and batch files (Windows). This section shows how
to set CLASSPATH explicitly and also how to use the Derby scripts to set it.

Change directory now into the DERBY_INSTALL/bin directory.
The setEmbeddedCP.bat (Windows) and setEmbeddedCP (UNIX) scripts use the
DERBY_INSTALL variable to set the CLASSPATH for Derby embedded usage.

You can edit the script itself to set DERBY_INSTALL, or you
can let the script get DERBY_INSTALL from your environment. Since you already
set DERBY_INSTALL, you don't need to edit the script, so go ahead and execute
it as shown below:
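(UNIX shown; the script must be sourced so that the CLASSPATH change affects
the current shell. The jar locations in the output are illustrative.)

cd $DERBY_INSTALL/bin
. setEmbeddedCP
echo $CLASSPATH
/opt/derby/lib/derby.jar:/opt/derby/lib/derbytools.jar: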

The output on your system will probably be somewhat
different from the output shown above, but it should reflect the correct
location of jar files on your machine and there shouldn't be any errors. If you
see an error like the one below, it means your class path is not correctly set:
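(Illustrative; the classic symptom is a NoClassDefFoundError when running one
of the Derby tools.)

java org.apache.derby.tools.sysinfo
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/derby/tools/sysinfo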

The Genesis II system provides a queuing feature for
scheduling jobs on a variety of different types of BES services. The queue
matches the job requirements (in terms of number of CPUs, required memory,
parameters for matching types of service required, and other factors) with a
BES that is suited to execute the job. When other jobs are already executing
on the necessary resources, the queue keeps the job waiting until the resources
become available. Queues also provide services to users for checking on their
jobs' states and managing their jobs while in the queue.

A queue in the GFFS generally does not do any job processing
on its own. It does all of the processing via the BES resources that have been
added to the queue. The following shows how to create the queue itself; later
sections describe how to add resources of different types to the queue:
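(A sketch using the create-resource grid command; {containerPath} is the
linked location of the container that will host the queue, and the queue path
is illustrative.)

grid create-resource {containerPath}/Services/QueuePortType /queues/grid-queue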

The computational elements in the grid are represented as
Basic Execution Services (BES) containers. These will be discussed more in the
next section, but assuming that a BES is already available, it can be added as
a resource on a grid queue with the following steps. Once added as a resource,
the queue can start feeding appropriate jobs to that BES for processing.
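A sketch of those steps, assuming the BES is linked at {besPath}; the
slot-configuration command and its arguments should be verified against the
grid client’s help output:

# make the BES available as a resource of the queue:
grid ln {besPath} /queues/grid-queue/resources/{besName}

# allow the queue to run up to 4 simultaneous jobs on that BES:
grid qconfig /queues/grid-queue {besName} 4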

To configure a Genesis II BES on a Linux machine, the grid administrator
should first install a Genesis II container on that machine. Usually this machine
will be a submit node on a cluster, and it should have a batch job submission system
(such as UNICORE, PBS, SGE, etc.) set up on the cluster. Once the BES is configured,
it will talk to the underlying batch job submission system and submit users’ jobs
to the nodes on the cluster. The grid administrator can also configure attributes
specific to that machine while setting up the BES.

The native BES type provided by Genesis II is a fork/exec
BES. This type of BES simply accepts jobs and runs them, and offers no special
functionality or cluster support. It offers a very basic way to build a
moderately-sized computation cluster, if needed.

Adding a BES service requires that a Genesis II container
already be installed on the host where the BES will be located. To create a
BES on that container, use the following steps:
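(A sketch following the same create-resource pattern; a construction
properties file can also be supplied for cluster-backed BES types, as
described in later sections.)

grid create-resource {containerPath}/Services/GeniiBESPortType /bes-containers/MyBES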

In the previous example, the BES was created on a machine to
run jobs submitted by grid users. These jobs execute on the local machine
(Fork/Exec) or on the system's compute nodes through the local queuing system
interface (PBS, Torque, etc). From the local machine's standpoint, all of these
jobs come from one user; that is, all of the local processes or job submissions
are associated with the same local uid. The (local) user that submits the jobs
is the same (local) user that owns the container.

This situation can lead to a security vulnerability:
depending on the local configuration of the disk resources, if a (grid) user
can submit jobs to the BES to run arbitrary code as the same (local) user as
the container, that job will have access to all of the state (including grid
credentials, files, etc) stored on the local file system.

To protect the container and other grid resources from the
jobs, the BES may be configured to run the jobs as a unique local user account,
which has limited permissions within the file system and execution
environment. This account can be configured to have access to only the files and
directories specifically for that job, and thereby protect the container and
the local operating system. This is accomplished using Linux's built-in command
“sudo”, which changes the effective user for the process as it runs.

In the following, we will assume that the container was
installed and runs as the user “besuser”. The effective user for running jobs
will be “jobuser”. Setting up the local system (sudo, users, etc) requires
administrative rights on the local machine, so it is assumed that “besuser” has
the ability to execute administrative commands. All commands that require this
permission will start with “sudo” (e.g: sudo adduser
jobuser). Some aspects of configuration are common among any deployment
using this mechanism, while other aspects depend on the type of BES (Fork/Exec
or Native Queue) or the capabilities of the local operating system. These are
described below.

To enable execution of jobs as a unique user, first that
user must exist on the local operating system. It is recommended that a new
user is created specifically for the task of running jobs, with minimal
permissions on system resources.

Within the local operating system, create “jobuser”:

sudo adduser jobuser

Set jobuser’s default umask to enable group access, by adding
“umask 0002” to $HOME/.bashrc or similar. This will
ensure that any files created by a running job can be managed by the container
once the job has terminated.

Grant jobuser access to any shared resources necessary to
execute jobs on the current system, such as job queues, or shared file-systems.

When a BES container sets up the working directory for a
job, the files it creates/stages in are owned by the besuser. The jobuser must
have access permissions to these files to successfully execute the requested
job. There are two mechanisms by which the system may grant these permissions:
groups or extended access control lists (Extended ACLs).

Extended ACLs are the preferred method for extending file
permissions to another user, and are available in most modern Linux
deployments. They provide a file’s owner the ability to grant read, write, and
execute permissions on a per-user basis for each file. Compare this to Linux
Groups, where every user in the group receives the same permissions.

In the following commands, we assume the Job state directory
for the BES will be the default location at $GENII_USER_DIR/bes-activities.
If it is configured to be located somewhere else in the file system, adjust the
commands below accordingly.

If the BES is to be configured using Extended ACLs:

# Set the default access on the Job state directory and its children,
# so permission propagates to new job directories:
sudo setfacl -R --set d:u:besuser:rwx,d:u:jobuser:rwx $GENII_USER_DIR/bes-activities

Once the Job state directory has been configured, either
with groups or Extended ACLs, the ability to execute jobs as the jobuser must
be granted to the besuser. This is accomplished using Linux's built-in “sudo”
command. To enable a user to use “sudo”, an administrator must add an entry
into the “sudoers” file. This entry should limit the set of commands that the
user may execute using “sudo”, or no actual security is gained by creating
another user. It is recommended that “sudo” only be granted for the specific
commands required to launch the job.

If the BES is to be a Fork/Exec flavor BES, the ability to
run with “sudo” should be granted only to the job process wrapper executable.
This executable is included with the Genesis II deployment and is located in
the Job state directory. The executable used depends on the local operating
system, but the filename will always begin with “pwrapper”. To grant “sudo” ability
for this executable, add an entry like the following to the file
“/etc/sudoers”:

besuser ALL=(jobuser) NOPASSWD: {jobStateDir}/{pwrapperFile}

Where {jobStateDir}
is the full path to the Job state directory, and {pwrapperFile} is the filename of the process
wrapper executable. Note that “sudo” does not dereference environment
variables, so the full path must be specified in the entry. For example, if the
Job state directory is located at “/home/besuser/.genesisII-2.0/bes-activities”
and the operating system is 32-bit Linux, the process wrapper executable will
be “pwrapper-linux-32”, the “sudoers” entry should be:
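besuser ALL=(jobuser) NOPASSWD: /home/besuser/.genesisII-2.0/bes-activities/pwrapper-linux-32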

Once “sudo” has been granted to besuser, it may be necessary
to restart the operating system before the changes take effect.

Once the local configuration is complete, a BES should be
created which utilizes the “sudo” capability. This is accomplished by specifying
a “sudo-pwrapper” cmdline-manipulator type in the construction properties for
the new BES. An example construction properties file is included below, which
we will call sudo-pwrapper.xml.
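A rough sketch of such a file follows. Only the “sudo-pwrapper” manipulator
type and the “target-user” and “sudo-bin-path” elements are described by this
text; the surrounding element names are assumptions and should be checked
against a construction properties file shipped with the software:

<construction-parameters>
  <cmdline-manipulators>
    <manipulator-variation name="sudo-pwrapper" type="sudo-pwrapper">
      <target-user>{jobuser}</target-user>
      <sudo-bin-path>{sudo}</sudo-bin-path>
    </manipulator-variation>
    <call-chain>
      <manipulator-name>sudo-pwrapper</manipulator-name>
    </call-chain>
  </cmdline-manipulators>
</construction-parameters>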

Note that there are two parameters in the above example that
require system-specific values: the “target-user” element which is shown with
value {jobuser} and the “sudo-bin-path” element
which is shown with value {sudo}. {jobuser} should be the user name of the account under
which the jobs will execute (the “jobuser” in all of the examples provided),
and {sudo} should be the absolute path to the sudo
executable (e.g. “/bin/sudo”).

If the BES is to be a Native Queue flavor BES, the ability
to run with “sudo” should be granted only to the queue executables, e.g.
“qsub”, “qstat”, and “qdel” on PBS-based systems. To grant “sudo” ability for
these executables, add an entry like the following to the file “/etc/sudoers”:

Where {bin-path} is the full
path to the directory where the queuing system executables are located, and {qsub}, {qstat}, and {qdel} are the filenames of the queuing system
executables for submitting a job, checking a job's status, and removing a job
from the queue, respectively. Note that “sudo” does not dereference environment
variables, so the full path must be specified in the entry. For example, if the
queue executables are installed in the directory “/bin”, and the native queue
is PBS, the “sudoers” entry should be:

besuser ALL=(jobuser) NOPASSWD: /bin/qsub, /bin/qstat, /bin/qdel

Once “sudo” has been granted to besuser, it may be necessary
to restart the operating system before the changes take effect.

Once the local configuration is complete, a BES should be
created which utilizes the “sudo” capability. This is accomplished in the
construction properties for the BES by prefacing the paths for the queue
executables with the sudo command and parameters to indicate the jobuser. An
example snippet from a construction properties file is shown below.
Substituting these elements for the corresponding “pbs-configuration” element
in the cons-prop.xml shown above will result in a
construction properties for a sudo-enabled variant of the native-queue BES,
which we will call sudo-native-queue.xml.
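A hedged sketch of the substituted element is shown here; the element names inside “pbs-configuration” are assumptions, and only the pattern of prefacing each queue executable with the sudo command and its target-user parameter is significant:

<pbs-configuration>
  <qsub>{sudo} -u {jobuser} {bin-path}/qsub</qsub>
  <qstat>{sudo} -u {jobuser} {bin-path}/qstat</qstat>
  <qdel>{sudo} -u {jobuser} {bin-path}/qdel</qdel>
</pbs-configuration>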

Note that there are three parameters in the above example
that require system-specific values: {jobuser}, {bin-path}, and {sudo}. {jobuser}
should be the user name of the account under which the jobs will execute (the
“jobuser” in all of the examples provided), {bin-path}
should be the absolute path to the directory where the queue executables are
located (same as the “sudoers” entry above), and {sudo}
should be the absolute path to the “sudo” executable (e.g. “/bin/sudo”).

One of the strengths of the Genesis II GFFS software is its
ability to connect heterogeneous resources into one unified namespace, which
provides access to the full diversity of scientific computing facilities via a
standardized, filesystem interface. This section describes how to link
resources into the GFFS from other sources, such as the UNICORE BES
implementation of EMS and PBS-based queues for job processing.

To set up the BES wrapper on a machine that will submit to a
queuing system, the user should know properties for the cluster configuration
such as memory, number of cores on each node, the maximum slots that can be used
to submit jobs on the cluster, and any other relevant options. These properties
need to be specified in the construction-properties file which is used while creating
a BES resource. Also the grid administrator should have already installed a Genesis
II container on the head or submit node of PBS or similar job submission system.
A sample construction properties file is below, which we will call cons-props.xml:

In the above construction-properties file, the element <ns2:nativeq shared-directory>
specifies the shared directory where the job state of all the jobs submitted to that
BES will be stored. A unique directory is created for each job when the job is
scheduled on the BES, and it is destroyed when the job completes. This path should be
visible to all the nodes on the cluster and hence should be on a cluster-wide shared
directory.

To configure scratch space on a BES, a special file called ScratchFSManagerContainerService.xml
specifying the path to the scratch space should be created in the deployment’s cservices
configuration directory ($GENII_INSTALL_DIR/deployments/$DEPLOYMENT_NAME/configuration/cservices).

When users submit jobs that stage-in and stage-out files, the
BES download manager downloads these files to a temporary download directory. If
it is not explicitly configured while setting up the BES, it is created in the
container’s state directory as $GENII_USER_DIR/download-tmp. The container state
directory is usually on a local path, but the download directory should be on a
shared directory, like the Job directory and the Scratch directory. Also, if the
download directory and the scratch directory are not on the same partition, the BES
may not copy or move the stage-in/stage-out files properly between the download and
scratch directories. It is highly advised that they be on the same partition.

To configure the download directory, the path should be specified
in a special file called DownloadManagerContainerService.xml
that is located in the deployment's cservices directory ($GENII_INSTALL_DIR/deployments/$DEPLOYMENT_NAME/configuration/cservices).

A BES can be configured with specific matching parameters so
that jobs which need those properties are directed to it. For example, some clusters
may support MPI, while some clusters are 32-bit compatible and others are 64-bit
compatible. If a job needs certain requirements to be met, it specifies those
requirements in its JSDL, and the queue will match the job to BESes where those
attributes are available. To set a matching parameter, use the grid
“matching-parameters” command.

For example, to add or specify that a particular BES supports
MPI jobs, run this command on the queue:
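# A hedged example; the exact argument syntax is an assumption, and the
# command's built-in help shows the real usage:
grid matching-parameters /queues/theQueue/resources/{besName} "add(supports-mpi, true)"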

The main consideration for adding a PBS Queue to a Genesis
II Queue is to wrap the PBS Queue in a Genesis II BES using a construction
properties file. The example cons-props.xml above shows such a file for a PBS
Queue with MPI capability. Given an appropriate construction properties file
for the system, these steps will create the BES and add it as a resource to the
queue.
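A hedged sketch of those steps appears here; the grid paths are placeholders, and the construction-properties option name is an assumption:

# Create the BES resource on the container using the construction properties.
grid create-resource --construction-properties=local:cons-props.xml \
    {containerPath}/Services/GeniiBESPortType /bes/pbs-bes
# Link the new BES into the queue as a resource.
grid ln /bes/pbs-bes /queues/theQueue/resources/pbs-bes
# Allow the queue to submit jobs to the new resource.
grid qconfigure /queues/theQueue pbs-bes {jobsMax}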

F.6.3.1.UNICORE Interoperation with XSEDE Approved Certificates

The following assumes that the Genesis II Queue container is
using an approved XSEDE certificate for its TLS certificate(s) and that only
XSEDE MyProxy-based users need to be supported. This also assumes that the
UNICORE gateway, UNICORE/X and TSI components are already installed and
available for use.

Note that the maximum SOAP header size may need to be
modified, if the grid users sending jobs to UNICORE will be in a substantial
number of groups. When the maximum size is too small, there will be complaints
in the UNICORE gateway log. The header size can be adjusted by changing the line
in the gateway/conf/gateway.properties file as follows (below permits
approximately a 400 kilobyte SOAP header):

gateway.soapMaxHeader=409600

Additionally, the user configuring the UNICORE BES must be logged
in with a grid account that permits creating the necessary links in the GFFS
(such accounts include the keystore login for the grid, grid administrator
accounts, or groups created for this purpose by the grid administrators).

Add the CA certificate of all valid TLS identities for SAML
credentials into the UNICORE/X directory-based trust store. This should
include the MyProxy CA certificates, at the very least:

A simpler method than the above can be used to point the UNICORE/X
trust store at all valid XSEDE CA certs. This involves editing “unicorex/conf/uas.config”
to add this line:

genii.trusted.dir=/etc/grid-security/certificates

Given that the container’s own TLS certificate is an
official XSEDE-approved certificate, the UNICORE gateway trust store should already
allow connections from the container. However, the UNICORE/X component
requires one of the above methods to trust the XSEDE user identity certificates
before the queue can successfully submit jobs on behalf of XSEDE users.

XSEDE users are mapped to local operating system users as
part of UNICORE authentication. To enable a new XSEDE user, add the XSEDE
portal certificate into the grid map file. (This may already have been done on
official XSEDE hosts.)

# edit the grid-mapfile and add a line similar to this (the grid-mapfile is
# usually found in /etc/grid-security/grid-mapfile):
"/C=US/O=National Center for Supercomputing Applications/CN={XSEDE_NAME}" {UNIX_USER_NAME}

The {UNIX_USER_NAME} above is
the identity that will be used on the local system for running jobs. This
should be the same user that installed the UNICORE software. The {XSEDE_NAME} above is the XSEDE portal user name for
your XSEDE MyProxy identity. This information can be obtained by
authenticating with xsedeLogin with the grid
client and then issuing a whoami command:

# user and password
will be prompted for on console or in graphical dialog:
grid xsedeLogin
# alternatively they can both be provided:
grid xsedeLogin --username=tony --password=tiger

# try this if there are weird problems with console version of login:
unset DISPLAY
# the above disables a potentially defective X windows display; try logging
# in again afterwards.

# finally… show the XSEDE DN.
grid whoami --oneline

Acquire the CA Certificate that generated the certificate being
used for the UNICORE Gateway and for UNICORE/X (this can be one certificate, or
two if they are generated separately by different CAs). Add that into the
trusted-certificates directory on the Genesis Queue container. Repeat this
step on any containers or clients where you would like to be able to directly
connect to the UNICORE BES. If all users will submit jobs via the queue, then
only the queue container needs to be updated:

# Example with
unicore container using u6-ca-cert.pem and a Genesis
# deployment named ‘current_grid’. In reality, this may involve more than
# one host, which would lead to a file transfer step and then a copy.
cp $UNICORE_INSTALL_DIR/certs/u6-ca-cert.pem \
$GENII_INSTALL_DIR/deployments/current_grid/security/trusted-certificates

Alternatively, since the UNICORE TLS certificate is assumed
to be generated using XSEDE CA certificates, the following step is
sufficient (rather than copying individual certificates):

The unicorex-tls-cert is the
certificate used by the UNICORE/X container for TLS (aka SSL) communication.
Note that this is different from the UNICORE CA certificate in the last step;
this should be the actual TLS certificate and not its CA. Also, be sure to
provide the UNICORE/X TLS certificate rather than the Gateway TLS certificate
(if these are different); otherwise trust delegations cannot be extended by UNICORE/X
(in the uas-genesis component) and grid stage-in and stage-out will not work in
submitted jobs. The bes-url has the following form (as documented in the
UNICORE manual installation guide, at http://www.unicore.eu/documentation/manuals/unicore/):
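# approximately (per the UNICORE manuals; the res= value may be named
# differently at a given site):
https://{gateway-host}:{gateway-port}/{SITE-NAME}/services/BESFactory?res=default_bes_factory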

The resource-name above can be chosen freely, but is often
named after the BES that it links to.

Adjust the number of jobs that the queue will submit to the
BES simultaneously (where jobsMax is an integer):

grid qconfigure {/queues/theQueue} {resource-name} {jobsMax}

To remove the UNICORE BES at a later time, it can be
unlinked from the queue by calling:

grid unlink {/queues/theQueue}/resources/{resource-name}

F.6.3.2.UNICORE Interoperation with Non-XSEDE Certificates

The steps taken in the previous section are still necessary
for setting up a UNICORE BES when one does not possess XSEDE-approved
certificates. However, to configure security appropriately to let users and
GFFS queues submit jobs, there are a few additional steps required for
non-XSEDE grids.

For each TLS identity that will connect directly to the BES,
add the CA certificate that issued the certificate into the Gateway and UNICORE/x
trust stores. If there are multiple certificates in the CA chain, then each
should be added to the trust stores.

Once the users’ CA certificates and the queues’ CA
certificates have been added, the UNICORE BES can be configured as described in
the prior section, and then it should start accepting jobs directly from users
as well as from the queue container.

F.6.3.3.Debugging UNICORE BES Installations

If there are problems inter-operating between the GFFS and UNICORE,
then it can be difficult to determine the cause given the complexity of the
required configuration. One very useful technique is to increase logging on the UNICORE
servers and GFFS containers involved.

For UNICORE, the “debug” level of logging provides more
details about when connections are made and why they are rejected. This can be
updated in the gateway/conf/logging.properties
file and also in the unicorex/conf/logging.properties
file. Modify the root logger line in each file to enable DEBUG logging as
follows:

log4j.rootLogger=DEBUG, A1

For the Genesis II GFFS, the appropriate logging
configuration files are in the installation directory in lib/production.container.log4j.properties
and lib/production.client.log4j.properties. For
each of those files, debug-level logging can provide additional information
about job submissions by changing the rootCategory line accordingly:
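# For example, if the original rootCategory line reads:
#   log4j.rootCategory=INFO, LOGFILE
# raise the level to DEBUG while keeping the appender list as found in the file:
log4j.rootCategory=DEBUG, LOGFILE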

Create a Native Queue BES using a construction properties
file that specifies the MPI types supported by the
cluster along with the syntax for executing MPI jobs and any special
commandline arguments. The “manipulator-variation” structure under the
“cmdline-manipulators” structure specifies the MPI related details for the
cluster. The “supported-spmd-variation” field gives the SPMD type as per the
specification. The “exec-command” field specifies the execution command for running
MPI jobs on the cluster. The “additional-arg” field specifies any additional
command-line arguments required to run an MPI job on the cluster. An example
construction properties file for the Centurion Cluster is provided below.
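A hedged sketch of the MPI-related portion of such a file follows; the SPMD variation URI comes from the JSDL SPMD specification, while the exec-command and additional-arg values are illustrative for an MPICH-style cluster and the exact schema may vary:

<cmdline-manipulators>
  <manipulator-variation name="mpi" type="mpi">
    <supported-spmd-variation>http://www.ogf.org/jsdl/2007/02/jsdl-spmd/MPI</supported-spmd-variation>
    <exec-command>mpiexec</exec-command>
    <additional-arg>-np {number-of-processes}</additional-arg>
  </manipulator-variation>
</cmdline-manipulators>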

Educational institutions may wish to participate in the XSEDE
grid to share computational resources, either by utilizing the resources XSEDE
already has available or by adding resources from their campus computing
clusters to the XSEDE grid for others’ use. There are a few requirements for
sharing resources in this way, and they are described in the following
sections.

One primary requirement for using XSEDE resources is to
obtain an XSEDE portal ID. The portal ID can be obtained from the XSEDE
website at http://xsede.org. Once the ID is
obtained, the user’s grid account needs to be enabled by a grid admin. The
XSEDE ID can then be used to log into the XSEDE grid.

A grid user can create files and directories within the
GFFS, which is required for adding any new resources to the grid. Further, the
XSEDE grid account enables the grid user to be given access to existing XSEDE
grid resources.

Another primary requirement for campus bridging is to link
the campus identity for a user into the XSEDE grid. After the campus user has
obtained an XSEDE grid account, she will have a home folder and a user identity
within the XSEDE grid. However, at this point the XSEDE grid has no connection
to the user’s identity on campus. Since the campus identity may be required
to use the campus resources, it is important that the user’s credentials wallet
contain both the campus and XSEDE identities.

For example, campus user identities may be managed via a
Kerberos server. By following the instructions in the section on “Using a
Kerberos STS”, an XSEDE admin has linked the STS for campus user “hugo” at
“/users/hugo”. Assuming that the user’s XSEDE portal ID is “drake” and that
identity is stored in “/users/drake”, the two identities can be linked together
in the XSEDE grid with:

# give drake the right to use the hugo identity.
grid chmod /users/hugo +rx /users/drake

The XSEDE user drake will thus automatically attain the
identity of the campus user hugo when drake logs in. After this, drake will
seamlessly be able to utilize both the XSEDE grid resources as drake and the
campus resources as hugo.

Campus researchers may wish to share their local compute
resources with others in the XSEDE grid. In order to do this, the campus user
should wrap the resource as a BES service and link it to the grid as described
in the section on “How to Create a BES with Construction Properties”. That
resource can then be added to a grid queue or queues by following the steps in
the section “Linking a BES as a Queue Resource”.

Assuming that the BES is successfully linked to a grid
queue, users with rights on the grid queue should be able to send compute jobs
to the linked campus resource automatically. If it is desired to give an
individual user the privilege to submit jobs directly to the BES, this can be
done with the “chmod” tool. For example, the user “drake” could be given
access to a newly-linked PBS-based BES as follows:
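# The BES path below is illustrative for a newly-linked PBS-based BES.
grid chmod /bes/pbs-bes +rx /users/drake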

Campus researchers may wish to use the resources already
available in the XSEDE grid. At its simplest, this is achieved by adding the
user to a grid group that has access to the queue possessing the desired
resources. The user can also be given individual access to resources by using
chmod, as detailed in the last section.

This situation can become more complex when the resources
are governed by allocation constraints or other jurisdictional issues. This
may require the user to obtain access through consultation with the resource
owners, or to take other steps that are generally beyond the scope of this
document.

Once a grid configuration has been established and the
queuing and computational resources are set up, there are still a number of
topics that come up in day-to-day grid operation. These include managing
users and groups, performing backups and restores on container state, grid
accounting and other topics. These will be discussed in the following
sections.

User and group identities exist in the GFFS as they do for
most filesystems. These identities can be given access rights on grid resources
using some familiar patterns from modern day operating systems. It is useful
to keep in mind that a user or a group is simply an identity provided by an IDP
(Identity Provider) that the grid recognizes. IDPs can be provided by Genesis
II, Kerberos and other authentication servers.

Creating and managing user identities in the GFFS requires
permissions on the {containerPath} in the
following commands.

It is much more serious to remove a group than a simple
user, because groups can be used and linked in numerous places. This is
especially true for resources, which administrators often prefer to control
access to using groups rather than users. But in the eventuality that a group
must be removed, here are the steps:

Unlink any users from the group:

grid unlink /users/{userName}/{groupName}

Omitting the removal of a group link from a user's directory
may render the user unable to log in if the group is destroyed.

# Clear up any access control lists that the group was involved in.
grid chmod {/path/to/resource} 0 /groups/{groupName}

# Remove the group identity from the /groups folder.
grid unlink /groups/{groupName}

It is often necessary to change a user's password after one
has already been assigned. For the XSEDE logins using Kerberos and MyProxy,
this cannot be done on the Genesis II side; the user needs to make a request to
the XSEDE administrators (for more information, see http://xsede.org).
But for standard Genesis II grid IDP accounts, the password can be changed
using the following steps:

First remove the existing password token using the grid
client, started with:

grid client-ui

Navigate to the appropriate user in the /users folder, and remove all entries that are marked as
(Username-Token) in the security permissions.

(Alternatively, this command would normally work for the
same purpose, but currently there is a bug that prevents it from removing the
existing username & password token:

The grid administrator can create an STS based on Kerberos
that will allow users to use the Kerberos identity as their grid identity.
This requires an existing Kerberos server and an identity on that server. To create
an STS for the grid that uses the server, do the following:
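# A hedged sketch; the Kerberos-related flags of the idp tool are assumptions,
# so consult the tool's built-in help for the actual option names:
grid idp --kerbRealm={REALM_NAME} --kerbKdc={kdc.host.name} \
    {containerPath}/Services/KerbAuthnPortType {kerberosUserName}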

# User can then log in like so...
grid login {containerPath}/Services/KerbAuthnPortType/{kerberosUserName}

The first step created an STS object in the GFFS under the
specified Kerberos service and user name. This path can then be relied upon
for logins as shown. Linking to a /users/kerberosUserName
folder (as is done for IDP logins) may also be desired. See the next section
for a more complete example of how an XSEDE login is created using both
Kerberos and MyProxy.

This procedure is used to create user identities that are
suitable for use with the XSEDE grid. Users of this type must log in using the
“xsedeLogin” command. It is necessary for the user's account to be enabled on
both the XSEDE Kerberos server (which requires an XSEDE allocation) and the
XSEDE MyProxy server.

To create an XSEDE compatible user as an administrator,
follow these steps (if there is no administrator for Kerberos Users yet, see
the end of this section):

The process for removing an XSEDE compatible user is
identical to the process for removing a standard grid user (Section G.1.5), except for the last step. For an XSEDE compatible user, the last step is:

By itself, authenticating to a Kerberos KDC for a user is
not enough to ensure that the user is properly vetted. Kerberos authorization
to a service principal is also needed for the STS to fully authenticate and
authorize the user against the Kerberos realm.

This needs to be configured on each container that
participates as a Kerberos STS. For a small grid, this may only be the root
container of the RNS namespace, or a complex grid may have several STS
containers. Each of these STS containers must have a separate service
principal created for it, and the container must use the keytab corresponding
to that service principal. Both the service principal and the keytab file must
be provided by the realm’s Kerberos administrator.

Once
the keytab and service principal have been acquired, the container owner can
set up the container to use them by editing the “security.properties” file
found in the deployment’s “configuration” folder. This assumes the container
is using the “Split Configuration Model” provided by the interactive installer
(see below for configuring RPM installs with the “Unified Configuration
Model”). The keytab file should be stored in the deployment’s “security” folder,
rather than the “configuration” folder. The security.properties file has internal
documentation to assist with configuration, but this section will go over the
important details.

When using
an RPM or DEB install package in the “Unified Configuration Model”, the file
“$GENII_USER_DIR/installation.properties” should be edited instead of the
deployment’s “security.properties”. The storage folder for keytab files based
on an RPM or DEB install is “$GENII_USER_DIR/certs” instead of the deployment’s
security folder.

When
requesting a service principal from the Kerberos realm’s administrator, it is
recommended to use the following form:

gffs-sts/STS_HOST_NAME@KERBEROS_REALM_NAME

This
naming convention makes it clear that the service in question is “gffs-sts”, i.e.
the GFFS Security Token Service. It includes the hostname of the STS container as
well as the Kerberos realm in which the service principal is valid. An example
of a “real” service principal is below:

gffs-sts/KHANDROMA.CS.VIRGINIA.EDU@TERAGRID.ORG

This is
the service principal that a testing machine uses to authenticate to the
TERAGRID.ORG realm maintained by XSEDE.

The
Kerberos administrator will also provide a keytab file for this service
principal. It is crucial that this keytab file be used on just a single STS
host. This file does not participate in replication of Kerberos STS within the
GFFS, and it should not be copied off machine or replicated by other means.

The
container’s security.properties file (or installation.properties) records the
container’s Kerberos authorization configuration in two lines per STS host.
One specifies the service principal name, and the other the keytab file. Each
entry has the Kerberos realm name appended to the key name, making them unique
in case there are actually multiple Kerberos realms being used by the same
container.

The
key name “gffs-sts.kerberos.keytab.REALMNAME” is used to specify the keytab
file. The keytab should be located in the “security” folder of the deployment.

The key
name “gffs-sts.kerberos.principal.REALMNAME” is used to specify the service
principal name for the realm.

Here is
another real-world example for the “khandroma” service principal (lines have
been split for readability, but these should each be one line in the
configuration and should not contain spaces before or after the equals sign):
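# (the keytab file name below is illustrative)
gffs-sts.kerberos.keytab.TERAGRID.ORG=khandroma.keytab
gffs-sts.kerberos.principal.TERAGRID.ORG=gffs-sts/KHANDROMA.CS.VIRGINIA.EDU@TERAGRID.ORG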

The
name of the keytab file is provided without any path; the file will
automatically be sought in the same deployment’s “security” folder (or state
directory “certs” folder for RPM/DEB install).

Testing
the Kerberos configuration can be done by creating an XSEDE compatible user
(see prior section) and attempting to log in as that user. This requires a
portal account with XSEDE and an XSEDE allocation. Warning: it may not be
appropriate to test the Kerberos authentication yet if setting up an
XSEDE-style grid; testing should not be done until after the STS migration
process has occurred.

It may
be interesting to note that even after divulging all this critical security
information about the khandroma container in the discussion above, no breach of
security has been accomplished. This is true because the keytab for this
service principal has not been provided, and one will not be able to
successfully authenticate to this service principal without it.

If a
keytab is accidentally divulged, that is not a total calamity, but it is
important to immediately stop that container from authorizing the Kerberos
realm affected by the exposed keytab file and to request a new keytab from the
Kerberos administrator. Once the new keytab is deployed in the container,
normal authorization can resume. After the Kerberos administrator has
generated the new keytab, the older one will no longer authorize properly and
so the security risk has been mitigated.

As described in the section “Logging in with InCommon”
(Section E.2.2.6), the iclogin tool allows a user to log in using credentials
for an InCommon IDP. In order to accommodate this tool, a link must be
established between the InCommon identity and another existing grid identity
which has access permissions on the intended resources. The target identity may
be any of the existing STS types (Kerberos, X509, etc).

The first step is to determine the InCommon identity's
Certificate Subject, as follows:

Navigate a browser to https://cilogon.org,
and log on with the InCommon credentials. For example, the user might select
the ProtectNetwork identity provider, and then click the “Log On” button. This
will redirect the browser to that IDP's login page. The user will then provide
appropriate credentials for that IDP to login. The browser will then redirect
to the CILogon page. At the top of the page is listed the Certificate Subject
for the current user. For example, for an example user “inCommonTester” this
string might be:

/DC=org/DC=cilogon/C=US/O=ProtectNetwork/CN=IC Tester A1234

This information may also be retrieved from an instance of the
user’s certificate, if the administrator has been provided with a copy for this
purpose; downloading the certificate from the CILogon page will also place a copy
of it in the current local directory.

Assuming the administrator is currently logged in to his own
grid credentials, the next step is to add execute permissions to the target
credentials for the CILogon certificate. Using the example certificate subject
above, and an example XSEDE STS at “/users/xsede.org/xsedeTester”, the
administrator would run the following command:
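# A hedged sketch; whether chmod accepts the certificate subject directly is an
# assumption, and a local copy of the user's certificate may be required instead:
grid chmod /users/xsede.org/xsedeTester +rx \
    "/DC=org/DC=cilogon/C=US/O=ProtectNetwork/CN=IC Tester A1234"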

The user may now authenticate using the iclogin tool and
their InCommon IDP's credentials.

Note that, at this time, the STS link must be in the
“/users/incommon.org” directory, and must be named with the InCommon IDP
username used to log in. The iclogin tool assumes this location when looking up
the grid credentials once the IDP authentication is complete. A more robust
solution for linking InCommon identities with grid credentials is in development.

The grid containers are important assets that the grid
administrator must keep operating, even in the face of hardware failures. Thus it
is important to have backups of each container’s run-time state, especially for
those containers that hold critical assets for the GFFS. It is especially important
that the root container be backed up, because there really is no grid without it.
The following sections discuss how to stop a container, how to back it up and
restore it, and how to start the container running again. The backup procedure
should be performed regularly on all critical containers.

The grid container process does not have a shutdown command
as such, but it responds to the control-C (break) signal and stops operation.
There are many different methods that would work to cause the container to shut
down. The easiest case is when the Genesis II Installation Program was
used to install the container, but for source-based installs we also document
how to use the Linux command-line tools to shut the container down and how to
use a script in the GFFS Toolkit to stop the container.

Archiving the data from the root GFFS container can take
hours, or even days, depending on the amount of data stored on the root. This
may need to be taken into account for scheduling the container down time.

G.2.3.1.Automated Container Backup

The backup process has been automated in a script available
in the GFFS Toolkit (documented in section I), located in $GFFS_TOOLKIT_ROOT/library/backup_container_state.sh.
The container should manually be stopped before running the script, and
manually restarted afterwards. For example:
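# A hedged invocation; the script's actual parameters are documented in its
# header, and the backup file name here is illustrative:
bash $GFFS_TOOLKIT_ROOT/library/backup_container_state.sh $HOME/container-backup.tar.gz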

G.2.3.2.Manual Container Backup

The procedure below describes how to save a snapshot of a Genesis
II container's run-time state. This includes all of its databases, which in
turns contain the RNS folders and ByteIO files that live on the container.
These steps should work with any container.

When backing up the root GFFS container's data, note that
this can be a truly huge amount of data. If users tend to rely on storing
their data files in their home folder, and that folder is located on the root
GFFS container, then the administrator is backing up all of those data files
when the root container is backed up. This is one reason it is recommended to
share the load for home folders by storing them across other containers (see
the section on “Where Do My Files Really Live” for more details).

To backup a container, use the following steps. Note that
it is expected that GENII_USER_DIR is already set
to the right location for this container:
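A minimal sketch, assuming a simple tar archive of the state directory (the archive name is illustrative):

# Run only after the container has been safely stopped.
cd $(dirname $GENII_USER_DIR)
tar -czf $HOME/container-state-backup.tar.gz $(basename $GENII_USER_DIR)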

G.2.4.1.Automated Container Restore

The restore process above has been automated in a script
available in the GFFS Toolkit (documented in section I), located in $GFFS_TOOLKIT_ROOT/library/restore_container_state.sh.
The restore script relies on the backup having been produced by the corresponding
backup_container_state.sh script. The container should manually be stopped
before running the restore script, and manually restarted afterwards.

There are two restoration scenarios that may be encountered;
either the container data has been trashed, or the installation itself has been
trashed. The first situation, where only the container data needs to be
restored, is taken care of by the basic restoration process:
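# A hedged invocation; the script's actual parameters are documented in its
# header, and the backup file name here is illustrative:
bash $GFFS_TOOLKIT_ROOT/library/restore_container_state.sh $HOME/container-backup.tar.gz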

If the installation itself has been damaged, then additional
steps may be needed. Note that this should only ever be a concern for an
interactive installation with the “Split Configuration” model; for the RPM
installation or Unified Configuration installation, the above process is
sufficient. But the split configuration approach stores some configuration
data for the container in the installation directory, and more steps are needed
to completely restore both the damaged data and configuration.

Before the “split configuration” restore is attempted, a
healthy installation of the appropriate version of Genesis II GFFS should be
installed. This installation does not need to be configured identically to the
container being restored, as the configuration information will be put back into
place in the next steps. Once the installation is available, these steps
should perform a full repair of the configuration:

For deployments other than XCG or XSEDE, the actual
deployment name of “current_grid” may differ. The real deployment name will be
visible in the deployments folder of the install.

G.2.4.2.Manual Container Restore

When the grid container has been backed up and saved at an
external location, the grid administrators are protected from catastrophic
hardware failures and can restore the grid to the state of the last backup.
This section assumes that the administrator is in possession of such a backup.

First, stop the container as described in the section “How
to Safely Stop a Grid Container”.

# Make a temporary folder for storing the state.
mkdir $HOME/temporary
cd $HOME/temporary

# Clean up any existing run-time state and recreate the state directory.
rm -rf $GENII_USER_DIR
mkdir $GENII_USER_DIR
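# Retrieve the backup archive and unpack it into the state directory location
# (a hedged sketch that assumes a tar backup like the one described in the
# backup section; {backup-location} is a placeholder):
cp {backup-location}/container-state-backup.tar.gz $HOME/temporary
cd $(dirname $GENII_USER_DIR)
tar -xzf $HOME/temporary/container-state-backup.tar.gz

Afterwards, restart the container as usual.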

Replication in the GFFS can be used for fault tolerance and
disaster recovery. For example, replication can be used to create a fail-over
system, where the loss of services of a crucial container does not necessarily
mean that the grid is down. Replication can also be used to create a backup
system that automatically copies assets that are modified on one container onto
a container at a different physical location, ensuring that even the total
destruction of the first container's host does not lead to data loss.

This section describes how to set up replicated files and
directories, and how to create the resolvers that are used to locate replicated
assets.

USE CASE: The user is creating
a new project. The project starts with an empty home directory, such as /home/project. The project’s home directory should be
replicated.

In this case, run these commands:

mkdir /home/project

resolver -p /home/project /containers/backup

replicate -p /home/project /containers/backup

The “resolver” command defines a “policy” that whenever a
new file or subdirectory is created under /home/project,
that new resource will be registered with a resolver in /containers/backup.

The “replicate” command defines a “policy” that whenever a
new file or subdirectory is created under /home/project,
that new resource will be replicated in /containers/backup.

That’s it. Whenever a file or directory is created,
modified, or deleted in the directory tree in the first container, that change
will be propagated to the backup container. Whenever a security ACL is modified
in the first container, that change will be propagated too. If the first
container dies, then clients will silently fail-over to the second container. If
resources are modified on the second container, then those changes will be
propagated back to the first container when possible.

USE CASE: The project
already exists. There are files and directories in
/home/project. These resources should be replicated, as well as any new
resources that are created in the directory tree.

In this case, simply add the -r option
to the resolver command:

resolver -r -p /home/project /containers/backup

replicate -p /home/project /containers/backup

The “resolver” command registers all existing resources with
a resolver, and it defines the policy for new resources. The “replicate”
command replicates all existing resources, and it defines the policy for new
resources.

USE CASE: The user wants
to replicate a handful of specific resources. No new replication policies
(or auto-replication of new resources) are desired.

In this case, omit the -p
option:

resolver /home/project/filename /containers/backup

replicate /home/project/filename /containers/backup

This case is only useful for certain unusual setups
involving hard links or other rarities.

In general, if fail-over is enabled for some file, then it
should also be enabled for the file’s parent directory. In other words, the
directions for replicating an existing directory hierarchy should be used.

USE CASE: The user wants
to create a resolver for replicated files and directories. Or the user
wants to give other users access to a resolver, so that those users can create
new replicas that can be used for failover.

In this case, create a resolver resource using the create-resource command:
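# A hedged sketch; the resolver service name may differ by Genesis II version,
# and the target path is a placeholder:
grid create-resource {containerPath}/Services/GeniiResolverPortType {/path/for/new/resolver}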

USE CASE: The user is
configuring a new grid using Genesis II and would like the top-level
folders to be replicated, including the root folder (/) and the next few levels
below (/home, /users, /groups, etc.).

Adding a replicated root makes the most important top-level
folders available through the resolver. Should the root GFFS container be
unavailable, each of the items in the replicated folders is still available
from the mirror container. Currently only grid administrators may add
replication in the RNS namespace for the XSEDE grid.

Prerequisites for Top-Level Folder Replication

·These steps should be performed on a separate client
installation, not on a container, to isolate the new context properly.

·On the separate client install, remove the folder pointed
at by $GENII_USER_DIR, which will start the process with a completely clean
slate. This is shown in the steps below.

·This section assumes that the root
container has already been deployed, and that a mirror container (aka root
replica) has been installed, is running, but is not yet configured.

·The user executing the following commands requires administrator
permissions via an admin.pfx keystore login. Note that if the admin
certificates for the root and replica containers are distinct, then one should
login with the keystore file for both the root and the replica container. Only
the applicable keystore logins for the containers involved should be performed;
do not login as an XSEDE user or other grid user first. For example:
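# The password for each keystore will be prompted for if not supplied.
grid keystoreLogin local:admin.pfx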

This example sets up replication on the top-level grid folders
within the XSEDE namespace. Note that this example uses
the official host names for XSEDE hosts (e.g. gffs-2.xsede.org) and the default
port (18443). These may need to vary based on your actual setup:

# GENII_INSTALL_DIR
and GENII_USER_DIR are already established.
# This action is being taken on an isolated client install that points at the
new grid;
# do not run this on the root or root replica container!
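# Start from a completely clean state directory, per the prerequisites above.
rm -rf $GENII_USER_DIR
# Run the top-level replication script from the GFFS Toolkit. The script name
# below is a hypothetical placeholder; the real script resides alongside the
# STS migration script in $GFFS_TOOLKIT_ROOT/tools/xsede_admin.
bash $GFFS_TOOLKIT_ROOT/tools/xsede_admin/{replication-setup-script}.sh \
    gffs-1.xsede.org 18443 gffs-2.xsede.org 18443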

# If no errors
occurred, the new replication-aware context file is stored in:
# $HOME/replicated-context.xml

Note: allow the containers 5-10 minutes to finish
replicating before shutting any of them down.

After replication has finished (and all containers seem to
be in a quiescent state), it is important to backup both the root and the
mirror container data (see section G.2.3 for backup instructions).

The replicated-context.xml file created by the above steps needs
to be made available to grid users within an installation package. It is
especially important to use this installation package for all future container
installs. Submit the file to UVa Developers (xcghelp@cs.virginia.edu)
for binding into an updated version of the grid’s installer program. Installations
of the GFFS that are created using the new installer will automatically see the
replicated version of the root.

Testing Basic Replication

It is important to validate the grid’s replication behavior before
attempting to use any replicated resources. The new replication configuration
can be tested with the following steps:

·On the host of the root container, stop the root container
process:

bash $GFFS_TOOLKIT_ROOT/library/zap_genesis_javas.sh

·On a different host than the root, use the grid client and log
out of all identities (the remainder of the steps also use this client host):

grid logout --all

·List the root folder in RNS (/). If this does not show the
top-level replicated folders, then something is wrong with the replication
configuration:

grid ls /

·If the above test succeeds, try a few other publicly visible
folders:

grid ls /users
grid ls /groups

Neither of the above commands should report any errors.

If the commands above work as described, then basic RNS replication
is working properly. This is assured by having shut down the root container;
the only source of RNS records that is still active in the grid is the mirror container.

G.2.5.6.Replicating User (STS) Entries

The replicated STS feature is used similarly to the
replicated RNS & ByteIO feature. Suppose Joe represents a Kerberos
or X509-Certificate STS resource created for a user Joe. In addition, assume
that Joe has access to a group Group1 (so Group1 should be
a sub-directory under Joe in the global namespace). Suppose Joe
and Group1 reside in arbitrary containers and we want to replicate them.
Then the sequence of steps for replication should be as follows.

1.Associate a resolver with the two resources. This step is not required
if the users folder already has a resolver established:
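A hedged sketch using placeholder container paths (the command forms follow the resolver section earlier in this chapter):

resolver -r -p /users/xsede.org/Joe /containers/{resolverContainer}
replicate -p /users/xsede.org/Joe /containers/{replicaContainer}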

If we want only Joe to be replicated but not Group1
then we drop the -r flag (indicating “recursion”) from the resolver command.

Note that the container hosting the resolver and the
container hosting replicas are different in the above example, but they do not
have to be different containers. However, neither of them should be the primary
container where Joe or Group1 are stored, as that defeats the
goal of replication.

To replicate the entire /users/xsede.org users hierarchy,
use similar steps:

1.Associate a resolver with the users hierarchy. Skip this step if a
resolver already exists on the users hierarchy:
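Again as a hedged sketch with placeholder container paths:

resolver -r -p /users/xsede.org /containers/{resolverContainer}
replicate -p /users/xsede.org /containers/{replicaContainer}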

G.2.5.7.Serving User and Group Identities from a Replicated STS Container

Initially all user and group identities will be stored on
the Root container. The authentication processing for the grid can be migrated
to a different container, possibly one reserved for managing STS resources such
as users. The following sections describe how to accomplish the move in the
context of the XSEDE namespace, where there are two STS containers (a primary
and a secondary which replicates the primary).

Prerequisites for Migrating to an STS Container

·Ensure the root container’s top levels are already replicated
(see section G.2.5.5 if that has not already been done).

·The steps for migrating the STS should be executed on an
administrative client host that has been installed with the replication-aware
installer (produced in section G.2.5.5).

·The primary and secondary STS containers must be installed before
executing the migration process (section D.4 or D.5).

·The user executing the following commands requires administrator
permissions via an admin.pfx keystore login. One should login with the admin
keystore for the root, the root replica, and the STS containers. Only the
applicable keystore logins for the containers involved should be performed; do
not login as an XSEDE user or other grid user first.

Steps for Migrating to a New Primary STS Container

This section brings up both STS containers and configures
them before any replication is added:

·Run the script below with the host name and port number for the two
STS servers. In this example, we use the official hostnames. Test systems
should use the appropriate actual hostnames instead.

# variables
GENII_INSTALL_DIR, GENII_USER_DIR and GFFS_TOOLKIT_ROOT have
# already been established.
# This action is being taken on an isolated client install that points at the
new grid.

# Run the STS migration script. The hostnames below are based on the official
# XSEDE hosts and should be modified for a test system.
bash $GFFS_TOOLKIT_ROOT/tools/xsede_admin/migrate_to_sts_containers.sh \
sts-1.xsede.org 18443 sts-2.xsede.org 18443

# If no errors
occurred, then the two STS containers are now handling any new
# user account creations and will authenticate users of these new accounts.

Add an XSEDE User and Test Replication

The GFFS Toolkit provides scripts for adding users to the
namespace and for adding the users to groups. Add an XSEDE MyProxy/Kerberos user
for testing as follows (this step requires being logged in as an account that
can manage other XSEDE accounts, such as by using the administrative keystores
for the grid):
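# The script name below is a hypothetical placeholder; the actual user-creation
# scripts reside in the GFFS Toolkit:
bash $GFFS_TOOLKIT_ROOT/tools/xsede_admin/{create-user-script}.sh {myPortalID}
# Then log in as the new user and verify the credentials:
grid xsedeLogin --username={myPortalID}
grid whoami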

The “whoami” command should print out the actual XSEDE user
name that was configured above, and should also show group membership in the
“gffs-users” group.

Allow the containers a couple of minutes to finish
replicating before shutting any of them down. Once the user and groups have
been replicated, the soundness of the replication configuration can be tested with
the following steps:

·On the host of the root container, stop the root container
process:

bash $GFFS_TOOLKIT_ROOT/library/zap_genesis_javas.sh

·Run the above command on the primary STS container also, to stop
that container.

·On an administrative client install with replication enabled
(i.e., not on the primary containers), use the grid client and log out of all
identities, then log in to the new user again:

grid logout --all
grid xsedeLogin --username={myPortalID}
grid whoami

If the login attempt above works and whoami still shows the
details of the appropriate user and groups, then the STS replication configuration
is almost certainly correct. Having shut down the primary STS container, the
only source of authentication active in the grid is the secondary STS
container. Similarly, all RNS requests must be served by the mirror container,
since the root RNS container is down.

There are four cases that completely test the failover
scenarios. The above test is listed below as test 1. To ensure that
replication has been configured correctly, it is advisable to test the
remaining three cases also:

1. Root down & sts-1 down

2. Mirror down & sts-1 down

3. Root down & sts-2 down

4. Mirror down & sts-2 down

If these steps are successful, then the new primary and
secondary STS containers are now responsible for authentication and
authorization services for the grid. Any new STS identities will be created on
the STS container rather than on the root container. Even if the root
container is unavailable, users will still be able to log in to their grid
accounts as long as one root replica and one STS container are still available. (Any
other required login services, such as MyProxy or Kerberos, must also still be
available.)

The Genesis II software supports client-side caching of GFFS
resources to reduce the impact on the containers that actually store the
files. By enabling this feature on a container, an administrator can allow
users to cache files on their own hosts rather than always accessing the files
on the container. This actually benefits both the administrator and the user,
because the administrator will see fewer remote procedure calls requesting data
from their containers and the users will see faster access times for the
resources they are using frequently.

To enable the subscription based caching feature, it is
necessary to add a permission on the port-type that implements the cache
service:
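# A hedged sketch; the cache service's port-type name varies by version and is
# shown as a placeholder here:
grid chmod {containerPath}/Services/{cacheServicePortType} +rx --everyone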

After the port type is enabled, a grid client’s most
frequently used files will automatically be cached in memory. If the
container's version of the file changes, the client is notified via its
subscription to the cached data item, and the cached copy is automatically
updated.

The Genesis II codebase offers some accounting features to
track usage of the grid queue and the number of processed jobs. This section
describes how to set up the accounting features and how to create a web site
for viewing the accounting data.

This information is intended for local grid administrators
(such as the XCG3 admins) and is not currently in use by the XSEDE project.

·Create a central database; we use MySQL (installed and maintained by the CS
department).

·Create the following tables in the database: xcgaccoutingrecords,
xcgbescontainers, xcgcredentials, xcgareccredmap, and xcgcommandlines
(information on the table columns is found in Appendix A).

·Link the database into the grid name space.

·Create two usernames to access this central DB: one with write privileges and
the other with read-only privileges. The username with write privileges will be
used to run the accounting tool, and the username with read-only privileges
will be used to get the web GUI statistics.

·Set up a web server (e.g. the webserver provided by the CS department) that is
able to talk to the central database.

There are several pieces to
collecting and processing job accounting data for the XCG.

Raw job accounting data is kept on the home container of the
BES that runs the job. It will stay there forever, unless someone signals that
the container can delete accounting information up to a specified accounting
record ID (we call this “committing” the records). We use a grid command line
tool named “accounting” to collect accounting records, put the collected
data into a database (currently this is the department’s MySQL database), and
to commit all records collected on the container so that the container can
delete them from its local database. In order to support our on-demand online
accounting graphs, the raw accounting data collected from the containers must
be processed into a format that the graphing pages can use.

The raw accounting data is placed into 5 related tables by
the accounting collection tool. In order to make the data easier to process
for usage graphs, a stored procedure named procPopulateXCGJobInfo
crunches all of the data in the 5 raw accounting data tables and stores them in
2 derived “tmp” tables for use by the accounting graph php pages on the web
site.

The vcgr database currently contains 15 tables, only about
half of which relate to the new way of doing accounting.

Accounting-related tables set by the accounting collection tool

xcgaccoutingrecords
Each row holds the raw accounting information
for a single XCG job.

xcgbescontainers
Each row holds information about a BES container that has
had accounting data collected for it. The key for the BES record in this table
is used to match accounting records in the xcgaccoutingrecords table to
the BES container it ran on. Records in this table are matched during the
accounting record collection process based on the BES’s EPI, so re-rolling a
BES will cause a new BES entry to appear in this table. The besmachinename
field in this table is populated by the machine’s IP address when it is first
created and this is used by our accounting graphs as the name of the BES.
However, the besmachinename field can be manually updated to put in a
more human friendly name and to tie together records from two BES instances
that have served the same purpose. This is something I do periodically to make
the usage graphs more readable and informative.

xcgcredentials
Contains a list of every individual credential
used by any job. Multiple jobs using the same credential will share an entry
in this table. Since the relationship between jobs (xcgaccoutingrecords)
and credentials (xcgcredentials) is many-to-many, the xcgareccredmap
table provides the relationship mapping between them. The credentialtype
field is set to NULL for new entries, but can be set to values “Client”,
“Service”, “User” or “Group” manually. I occasionally manually edit this field
to set the proper designation for new entries.

xcgcommandlines
Contains each portion of a job’s commandline – one entry
per argument (including the executable argument). The accounting tool did not
work properly at first and only recorded the last argument for jobs. This was
fixed sometime after initial rollout of the new accounting infrastructure.

In order to easily support reasonably fast creation of a
wide range of usage graphs, we use a stored procedure to create two tables to
store pre-processed denormalized information about jobs.

tmpXCGJobInfo This table
contains 1 row per job with pretty much everything in it that we can run a
report against. This includes our best guess at the job’s “owner” in human
friendly terms, the bes container’s name, various run time information, and
information about the day, week, month, and year of execution.

tmpXCGUsers This
table is an intermediate table used by the stored procedure that creates the
tmpXCGJobInfo table. It really can be deleted as soon as the stored procedure
finishes – not sure why it isn’t…

The denormalization process deletes and re-creates the tmpXCGJobInfo
and tmpXCGUsers tables from data in the raw accounting tables.
Denormalization is done by running the procPopulateXCGJobInfo
stored procedure.

N.B. Besides denormalizing the data so that there is a
single row per job, it also tries to figure out which user “owns” the job and
stores its best guess in the username field of the tmpXCGJobInfo table. This
field is used by the usage graphs to display/filter who ran a job.

The algorithm for doing so is imperfect, but must be
understood to properly understand the usage graph behavior. A job’s owner is
determined as the credential with the lowest cid (credential id) associated
with the job that is designated as a “User” (it may also require that the
credential has the string X509AuthnPortType in it). It then assumes that the
credential is in the format of those we mint for ordinary XCG users and
extracts the owner’s name from the CN field of the credential.

This process will only work if these conditions are met:

·The job has at least one credential minted as a normal user by
XCG in the usual format.

·The credential has been manually marked as a “User” credential in
the xcgcredentials table.

Jobs whose users employed a different type of credential (e.g.
username/password or another outside credential), jobs that were run only by an
admin (whose credential has a different format), and jobs whose entries in the
xcgcredentials table have not been manually updated to type “User” will all be
labeled as owned by “unknown”.

Overall, there are two main procedures: 1) collection of
the raw data and 2) processing of the data for the online graphs.

1) The grid tool “accounting” is used to collect raw
accounting data and store it in the CS department database. The tool takes
several options/arguments:

·--collect: tells the tool to do the actual data collection. NOTE: unless the
“--no-commit” flag is specified, the --collect flag will commit all accounting
records successfully collected from the target container(s).

Typical use (as admin – other users will not have permission
to collect or commit accounting data from most/all containers):

accounting --collect --recursive /containers /accounting/CS-MySQL

The tool targets grid containers, not BES containers.
Even though the accounting records do identify which actual BES container each
job was associated with, the tool collects all of the data for all the BESes a
container hosts at once.
1.We
can use the directory /containers recursively
because we try to maintain /containers such that
it has all of our useful containers in it and no other extraneous entries. This
helps simplify the process significantly.

2.We
use the RNS path /accounting/CS-MySQL as the target database. Mark set up this
RNS entry with the proper connection information for the vcgr database on the
department server to help the collection process. The tool can handle
extracting the connection info from the EPR contained by the RNS entry.

3.There
will be a prompt for the password for the vcgr_mmm2a account on the CS
department database server.

4.The
tool will collect data from each container in turn. Note exceptions – there
are sometimes entries left for containers that are no longer present in the /containers directory.

2) To process the data for the online graphs:

·Log into the department’s PHP MySQL web front end using a browser. At the
login screen, use the vcgr_mmm2a account and password to log in.

·Go to the vcgr database: just click on the vcgr database entry on the left
side of the screen.

·Get an SQL window: click the SQL tab and a text area will appear where SQL
statements can be entered.

·Run the stored procedure procPopulateXCGJobInfo (the procedure definition is
found in the section below called Database Table Structure for Accounting). In
the SQL text area, type “call procPopulateXCGJobInfo” and click the Go button.
It will run for several minutes, and the screen will go blank when it is done.
The procPopulateXCGJobInfo stored procedure will erase the old tmpXCGJobInfo
and tmpXCGUsers tables and will repopulate them after crunching whatever data
is in the raw accounting tables. GUI statistics can be looked up at
http://vcgr.cs.virginia.edu/XCG/2.0/stats/usage_graphs/

The statistics gathered from the grid can be displayed in a
web site that supports queries based on operating system and time ranges. An
example implementation is provided within the GFFS toolkit, which is bundled
with the GFFS client installer and which is also available at the svn repository:

These files are an example only, and would need to be
configured appropriately for the site's Ploticus installation location, the usage
graph site's location on the web server, and the login information for the
statistics database.

This implementation uses PHP
and the ploticus application (http://ploticus.sourceforge.net/doc/welcome.html)
to build graphs per user request. The figure below shows the running site,
with a form for requesting accounting info using the available filters. Given
a query with a given date range and a daily report, the output might resemble the
Usage Graph in the next figure.

The Genesis II software fully supports grid federation,
where resources can be shared between multiple grids. This enables researchers
to connect to a low-latency grid that is geographically convenient while still
sharing data and BES resources with researchers on other grids. The XSEDE
namespace provides a convenient method to achieve “grid isomorphism”, where the
locations of other grids’ resources can be found at the identical location in
RNS regardless of which grid one is connected to.

For example, the XSEDE Operations Grid is a Genesis II GFFS
grid that is maintained by the XSEDE project. The Cross-Campus Grid (XCG) is
also a Genesis II GFFS grid, but it is maintained by the University of
Virginia. Despite these grids being within very different administrative domains, users on the XCG grid can log into their accounts and access their home directories on the XSEDE grid. This is accomplished by linking parts of the XSEDE grid into the XCG namespace structure.

The interconnections from XCG to XSEDE were created by the
XCG administrator. Each “foreign” grid is given a well-defined location in the
/mount directory where the remote grid is linked. For the XSEDE grid, the
top-level (/) of the grid has been linked into /mount/xsede.org. Listing the
contents of that folder shows the root of XSEDE’s grid; note that this command
is executed on an XCG grid client, not an XSEDE grid client:
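A command along the following lines performs that listing; its output (the XSEDE grid’s top-level folders) is omitted here, since the exact entries depend on the XSEDE namespace:

grid ls /mount/xsede.org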

This is the same list of folders one sees if one is
connected to the XSEDE grid and lists the top-level of RNS, but in this case,
it is visible via the link in the XCG grid.

To gain fully isomorphic grid folders, one makes links for
each of the major items in the foreign grid under the appropriate folders in
one’s own grid. For example, XCG has a folder for its local users called
/users/xcg.virginia.edu, but it also has a folder called /users/xsede.org for
the remote users that live in the XSEDE grid. From the XSEDE grid’s
perspective, it would have a link for /users/xcg.virginia.edu that connects to
the XCG grid. Using the foreign path, one can authenticate against the XSEDE
grid’s STS for a user even though one is connected to the XCG grid. This
provides for fine-grained access control across the multiple grids, and ensures
that the user can acquire whatever credentials are needed to use the remote
grid’s resources.

Similarly, the home folders of the XSEDE grid are available
in the XCG grid, as /home/xsede.org. This allows a person who is connected to
the XCG to access their remote files and directories that reside in their XSEDE
grid home directory. Using this capability, researchers can submit jobs that
stage files in and out from any grid that they have access to, and can share
their data with other researchers on any of these interconnected grids.
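For instance, a researcher connected to the XCG could copy a file out of their XSEDE home directory to the local machine; the user name and file name below are placeholders:

grid cp /home/xsede.org/jsmith/results.dat local:results.dat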

Making a non-local grid available on one’s home grid is
achieved by the following steps. In the actions below, we are using the
concrete example of linking the XSEDE grid into the XCG grid, but any two grids
could be linked in this manner. To achieve grid isomorphism, it is important
to pick an appropriate name for the foreign grid and for that name to be used
consistently across all federated grids. Otherwise, a path on grid X may be
named differently on grid Y, which will lead to many problems with creating
grid jobs that run seamlessly on either of the two grids (since expected
stage-out paths may not be there without consistent naming).

1. Acquire the root EPR of the foreign grid. This can be accomplished when actually connected to the foreign grid, using its installer or grid deployment. The step below is assumed to be running from the XSEDE grid client:

grid ls -ed / | tail -n +2 | sed -e 's/^\s//' > xsede_context.epr

2. The above creates a context file that can be used to link the foreign grid. This step must be performed using the grid client for one’s local grid (which is about to be augmented with a link to the foreign grid):

grid ln --epr-file=local:xsede_context.epr /mount/xsede.org

3. Test the new link by listing its contents. It should show the top-level folders of the foreign grid:

grid ls /mount/xsede.org

4. If the prior step is unsuccessful, then it is possible the local grid does not trust the remote grid. To establish trust between the grids, the CA certificate of the remote grid’s TLS certificate(s) should be added to the local grid’s trust store. Below, it is assumed that “current_grid” is the specific deployment in use in the local grid and that “remoteCA.cer” is a CA certificate that issued the remote grid’s TLS certificates. Adding more than one certificate is fine, and the certificates can be in either DER or PEM format:
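A minimal sketch of that step, assuming the standard deployment layout under GENII_INSTALL_DIR (the trust store location may differ in customized deployments):

cp remoteCA.cer $GENII_INSTALL_DIR/deployments/current_grid/security/trusted-certificates/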

5. After the remote grid can be listed successfully at its location in /mount, the remote hierarchies can be added to the local grid. These links should continue the naming convention established for the mount, so that isomorphism is maintained between the grids:
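For example, links for the remote users and home hierarchies might be created as follows; the exact source paths depend on how the foreign grid arranges its namespace:

grid ln /mount/xsede.org/users/xsede.org /users/xsede.org
grid ln /mount/xsede.org/home/xsede.org /home/xsede.org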

6. Listing each of the new folders should show the appropriate type of resources. With a successful top-level mount, this step should always succeed.

This procedure can be repeated as needed to federate other
grids alongside one’s own grid. A grid structured isomorphically is a joy for
researchers to use, since all paths are precisely arranged and named in a way
that makes their true home clear. Jobs can be executed on any BES or queue
that the researcher has access to, and the staging output can be delivered on
any of the connected grids that the researcher desires. In addition, one’s
colleagues on other grids can provide access to their research data in a
regular and easy to understand structure, even when the grids are in completely
different countries and administrative domains.

This section focuses on building the XSEDE GFFS components from the Genesis II source code. Support for basic EMS (via the fork/exec BES) is included in the Genesis II source; building the UNICORE EMS is not addressed here. This section may be useful to hard-core users who want to run their container from source and to developers who want to fix bugs or add features to the Genesis II software.

Note that at this time, development of the Genesis II
software is only supported in Java. The Genesis II components can be
controlled by a variety of methods (grid client, Xscript, client-ui), but the
Genesis II software is extended by writing new Java classes or modifying
existing ones.

The configuration of Java can be quite confusing to a
neophyte, but going very deeply into that process is beyond the scope of this
document. This section does explain the basics of setting up Java for building
and running Genesis II. It is expected that a Genesis II developer has prior
training in Java, but normal users of Genesis II should not need Java
proficiency.

Some of the Genesis II scripts rely on the JAVA_HOME
variable being set. This should point to the top directory for the JDK being
used to build Genesis II. For example, if the Java JDK is installed at
/usr/lib/jvm/java-8-oracle, then the JAVA_HOME variable could be set with:

export JAVA_HOME=/usr/lib/jvm/java-8-oracle

To determine what version of java is in the path, run:

java -version

If that does not show the appropriate version, the PATH
variable may need to be modified. For example:

export PATH=$JAVA_HOME/bin:$PATH

Enabling Strong Encryption in the JRE

Building and running GFFS containers requires that full-strength JCE security be enabled for the Java Runtime Environment (JRE). Otherwise, the JRE will not allow the certificates to be generated with the necessary key length. The unlimited-strength JCE jars are available at: http://www.oracle.com/technetwork/java/javase/downloads/index.html

As an example, after downloading the unlimited JCE zip file
for Oracle Java 8 on Linux, the security jars might be updated with these
steps:
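A sketch of those steps, assuming the downloaded archive is named jce_policy-8.zip and that JAVA_HOME points at the JDK (archive and folder names vary between Java versions):

unzip jce_policy-8.zip
cd UnlimitedJCEPolicyJDK8
sudo cp local_policy.jar US_export_policy.jar $JAVA_HOME/jre/lib/security/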

Ensure that the GENII_INSTALL_DIR has been set to point at
the source code location, that Java is installed with unlimited JCE encryption,
and that JAVA_HOME is set (see section H.1).

To perform the main build of the Genesis II trunk, change to the source code location and run “ant -Dbuild.targetArch=32 build” (for 32-bit platforms) or “ant -Dbuild.targetArch=64 build” (for 64-bit platforms). If neither targetArch flag is provided, then a 64-bit build is assumed.
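For example, the memory available to ant can be raised via the ANT_OPTS variable before building; the heap sizes shown here are illustrative rather than required values:

export ANT_OPTS="-Xms512m -Xmx768m"
ant -Dbuild.targetArch=64 build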

The ANT_OPTS above are required because the web-services
build requires more memory than the default amount allocated by ant.

It is important to rebuild the source code on the target
machine, rather than using a build from someone else, to ensure that any
embedded script paths are regenerated properly.

After building the source code, one needs a grid to test against. If you have an existing grid and the necessary deployment information, then that is sufficient. But if you want to test on an isolated grid that is under your control, consult the GFFS Toolkit chapter I.2, “How to Bootstrap a Miniature Test Grid”, for details on setting up a local grid for testing.

Eclipse is an integrated development environment for Java
and other languages, and many developers prefer to manage their coding process
with Eclipse. These instructions should assist an Eclipse developer to become
comfortable with building and debugging the Genesis II codebase.

Download the newest version of the Eclipse IDE for Java
Developers from http://www.eclipse.org/. The eclipse projects for Genesis II currently rely
on features found in the “Mars” version of Eclipse. As well as providing an
excellent software development environment, Eclipse can be used for debugging
with dynamic code injection, for call-hierarchy searches, and for Java code auto-formatting.

There is a plugin called Subclipse which integrates Eclipse with SVN, providing GUI features for diffing workspaces, files, and so on.

The Genesis II team has had success using a Java profiler
called “YourKit Profiler” which can be integrated with Eclipse.

When first running Eclipse, the user will be asked to select
a workspace. Do not specify a path that contains spaces (this just generally
makes life easier, although it may not be strictly necessary).

Note that use of Subclipse is deprecated; using basic console “svn” commands to manage the checked-out source code is sufficient for most purposes.

Subclipse is a useful add-in for Eclipse that provides
subversion repository support. To obtain Subclipse, go to http://subclipse.tigris.org/. Click on "Download and Install". Follow
the instructions to install the Subclipse plugin in Eclipse. The best version
of the SVN client to use with our SVN server is version 1.6.x.

If Eclipse complains about not finding JavaHL on Linux, then
it may be that /usr/lib/jni needs to be added to the Java build path in
Eclipse. This article has more information about this issue: Failing
to load JavaHL on Ubuntu

· Add additional tags for “to-do” items to Eclipse’s list. This causes all of the to-do / fix-it notes in the GFFS code to show up under the “Tasks” tab. Open the Eclipse settings using the “Window | Preferences” menu item. Within the settings, navigate to the “Java | Compiler | Task Tags” setting. Add the following tags to the list along with their priorities:

future: Low
hmmm: High

H.4.3.1. Projects to Load in Eclipse

There is a main trunk project for Genesis II called GenesisII-trunk.
Once you have downloaded the Genesis II project source code, you can load this
using Eclipse’s “Import Existing Projects into Workspace” choice. Browse to
the folder where the trunk resides in the “Select root directory” field. Enable
the option “Search for nested projects”. Disable the option to “Copy projects
into workspace”. Select “Finish” to complete the project import. This should
now show several projects in the package explorer.

Loading the project will cause Eclipse to build its
representation of the Java classes. This will fail if an ant build has not
been done before (see above section for building from the command line). Once
an ant build has been done, select the “Project | Clean” menu item and clean
all projects; this will cause Eclipse to rebuild the classes.

H.4.3.2. Setting the “derived” Type on Specific Folders

Eclipse will search for classes in any directory that is listed in a project. This is sometimes irksome, as it will find matches for X.class as well as X.java, but X.class is a compiled class output file and is not useful for reading or setting breakpoints. Fortunately, Eclipse also provides an approach for forcing it to ignore file hierarchies: Eclipse ignores any folder that has the “derived” flag set on it. This flag can be applied to a directory by right-clicking on it and selecting “Properties”. The resulting pop-up window will show a Resource tab for the folder, with a check-box for the derived attribute.

Note that each developer must set their own “derived” attributes on folders, since these attributes are not stored in the project file (they live in the user’s workbench).

It is recommended that any project’s generated file folders
be marked as derived, which includes the following folders:

The “libraries” folder is not generated, but its contents
are already provided by other projects.

After marking all of the folders that contain generated .class files as derived, future searches for classes in Eclipse should only match .java files. This can be tested with the “Open Resource” command (Ctrl-Shift-R).

To build Genesis II, we use Ant. The two Ant targets that are most often used are build and clean.
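Both targets are invoked from the top of the source tree, for example:

ant build   # compile the tree and generate the web-service stubs
ant clean   # remove the generated build artifacts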

The “ant build” target performs the following activities:

1. Creates directories for generated sources.

2. Normalizes our extended form of WSDL (GWSDL) into proper service WSDL.

3. Runs Axis WSDL2Java stub generation on our service WSDL to create the Java stub classes used by client tools and the data-structure classes for representing operation parameters within both client and server code.

4. Copies the generated .wsdd files into the "deployments/default/services" directory, so that the Axis web application can find the Java classes that implement the port types.