4.4. Executable Discovery (Transformation Catalog)

The Transformation Catalog maps logical transformations to physical
executables on the system. It also records additional information about
each transformation, such as the system it is compiled for and the
profiles or environment variables that need to be set when the
transformation is invoked.

Pegasus currently supports a text-formatted Transformation Catalog:

Text: A multi-line, text-based Transformation Catalog (DEFAULT)

This guide describes the format of the multi-line text-based TC.

4.4.1. MultiLine Text based TC (Text)

The multi-line text-based TC is the new default TC in Pegasus. This
format allows you to define transformations using the attributes
described below.

The file is read and cached in memory. Any modification, such as adding
or deleting an entry, updates the in-memory copy and then the underlying
file. All queries are run against the in-memory representation. The file
sample.tc.text in the etc directory contains an example.

tr - A transformation identifier, normally of the form
Namespace::Name:Version. The Namespace and Version are optional.

pfn - URL or file path for the location of the executable. The pfn is a
file path if the transformation is of type INSTALLED, and generally a
URL (file:///, http://, or gridftp://) if it is of type STAGEABLE.

site - The site identifier
for the site where the transformation is available

type - The type of transformation: whether it is installed
("INSTALLED") on the remote site or is available to stage
("STAGEABLE").

arch, os, osrelease, osversion - The architecture, operating system, OS
release, and OS version of the transformation. osrelease and osversion
are optional.

arch can have one of the following values: x86, x86_64, sparcv7,
sparcv9, ppc, aix. The default value for arch is x86.

os can have one of the following values: linux, sunos, macosx. The
default value for os, if none is specified, is linux.

Profiles - One or more profiles can be attached to a transformation for
all sites or to a transformation on a particular site.
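
The attribute rules above can be illustrated with a minimal Python sketch. It is not part of Pegasus: the helper names parse_transformation_id and resolve_platform are hypothetical, and the code only assumes what the list states (optional Namespace and Version, the listed arch and os values, and their defaults).

```python
from typing import NamedTuple, Optional, Tuple

# Allowed values and defaults as listed in the text above.
ALLOWED_ARCH = {"x86", "x86_64", "sparcv7", "sparcv9", "ppc", "aix"}
ALLOWED_OS = {"linux", "sunos", "macosx"}


class TransformationId(NamedTuple):
    namespace: Optional[str]
    name: str
    version: Optional[str]


def parse_transformation_id(identifier: str) -> TransformationId:
    """Split 'Namespace::Name:Version' into parts (hypothetical helper)."""
    namespace = None
    version = None
    rest = identifier
    if "::" in rest:
        # Everything before the first '::' is the optional namespace.
        namespace, rest = rest.split("::", 1)
    if ":" in rest:
        # Everything after the remaining ':' is the optional version.
        rest, version = rest.split(":", 1)
    if not rest:
        raise ValueError(f"missing transformation name in {identifier!r}")
    return TransformationId(namespace, rest, version)


def resolve_platform(arch: Optional[str] = None,
                     os_name: Optional[str] = None) -> Tuple[str, str]:
    """Apply the documented defaults and reject values not listed above."""
    arch = arch or "x86"
    os_name = os_name or "linux"
    if arch not in ALLOWED_ARCH:
        raise ValueError(f"unsupported arch: {arch}")
    if os_name not in ALLOWED_OS:
        raise ValueError(f"unsupported os: {os_name}")
    return arch, os_name


print(parse_transformation_id("example::keg:1.0"))
print(parse_transformation_id("keg"))
print(resolve_platform())  # prints ('x86', 'linux')
```

A check like this could be run over identifiers before they are written into a catalog file; Pegasus performs its own parsing, which may differ in edge cases.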

To use this format of the Transformation Catalog, set the following
properties:

pegasus.catalog.transformation=Text

pegasus.catalog.transformation.file=<path to the transformation catalog file>
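
As a rough illustration, the following Python sketch checks that a properties file sets both values. The helper names read_properties and check_tc_properties are hypothetical, the path /path/to/tc.text is a placeholder, and the parser is an assumption that handles only simple key=value lines and # comments rather than the full Java properties syntax.

```python
def read_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_tc_properties(props: dict) -> list:
    """Return a list of problems with the two catalog properties."""
    problems = []
    if props.get("pegasus.catalog.transformation") != "Text":
        problems.append("pegasus.catalog.transformation should be Text")
    if not props.get("pegasus.catalog.transformation.file"):
        problems.append("pegasus.catalog.transformation.file is not set")
    return problems


sample = """\
# placeholder path, replace with the real catalog location
pegasus.catalog.transformation=Text
pegasus.catalog.transformation.file=/path/to/tc.text
"""
print(check_tc_properties(read_properties(sample)))  # prints []
```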

4.4.1.1. Containerized Applications in the Transformation
Catalog

Users can specify the container in which their application should run
in the Transformation Catalog, using the multi-line text-based format
described in this section. An optional attribute named container refers
to the container to be used for the application.

tr example::keg:1.0 {
    # specify profiles that apply for all the sites for the transformation
    # in each site entry the profile can be overridden
    profile env "APP_HOME" "/tmp/myscratch"
    profile env "JAVA_HOME" "/opt/java/1.6"
    site isi {
        # environment to be set when the job is run in the container
        # overrides env profiles specified in the container
        profile env "HELLo" "WORLD"
        profile env "JAVA_HOME" "/bin/java.1.6"
        profile condor "FOO" "bar"
        pfn "/path/to/keg"
        arch "x86"
        os "linux"
        osrelease "fc"
        osversion "4"
        # INSTALLED means pfn refers to path in the container.
        # STAGEABLE means the executable can be staged into the container
        type "INSTALLED"
        # optional attribute to specify the container to use
        container "centos-pegasus"
    }
}
cont centos-pegasus {
    # can be either docker or singularity
    type "docker"
    # URL to image in a docker|singularity hub OR
    # URL to an existing docker image exported as a tar file or singularity image
    image "docker:///rynge/montage:latest"
    # optional site attribute to tell pegasus on which site the tar file
    # exists. useful for handling file URLs correctly
    image_site "optional site"
    # mount information to mount host directories into container
    # format src-dir:dest-dir[:options]
    mount "/Volumes/Work/lfs1:/shared-data/:ro"
    # environment to be set when the job is run in the container
    # only env profiles are supported
    profile env "JAVA_HOME" "/opt/java/1.6"
}

The container itself is defined using the cont entry. Multiple
transformations can refer to the same container.
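
Because several transformations may point at one container, it can be useful to confirm that every container attribute names a cont entry that actually exists. The Python sketch below does this with two regular expressions. It is only an approximation that assumes the layout used in the example above (one attribute per line, cont NAME followed by an opening brace); it is not the parser Pegasus uses, and undefined_containers is a hypothetical helper name.

```python
import re

# 'cont NAME {' at the start of a line defines a container.
CONT_DEF = re.compile(r'^\s*cont\s+([\w.-]+)\s*\{', re.MULTILINE)
# 'container "NAME"' at the start of a line references a container.
CONT_REF = re.compile(r'^\s*container\s+"([^"]+)"', re.MULTILINE)


def undefined_containers(catalog_text: str) -> list:
    """Return referenced container names that have no cont entry."""
    defined = set(CONT_DEF.findall(catalog_text))
    referenced = set(CONT_REF.findall(catalog_text))
    return sorted(referenced - defined)


SAMPLE = '''
tr example::keg:1.0 {
    site isi {
        pfn "/path/to/keg"
        type "INSTALLED"
        container "centos-pegasus"
    }
}
cont centos-pegasus{
    type "docker"
    image "docker:///rynge/montage:latest"
}
'''

print(undefined_containers(SAMPLE))  # prints []
```

If the cont entry were renamed or removed, the function would return ['centos-pegasus'] for this sample.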

cont - A container identifier.

image - URL to an image in a Docker or Singularity hub, or URL to an
existing Docker image exported as a tar file or to a Singularity image.
An example of a Docker hub URL is docker:///rynge/montage:latest, and an
example of a Singularity hub URL is shub://pegasus-isi/fedora-montage.

image_site - The site
identifier for the site where the container is available

Profiles - One or more profiles can be attached to a container. For
containers, only env profiles are supported.
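
The example catalog also uses a mount attribute, whose value follows the format src-dir:dest-dir[:options] given in its comments. A minimal Python sketch of splitting such a value is shown below. The helper name parse_mount is hypothetical, and the sketch assumes that neither directory path contains a colon.

```python
from typing import NamedTuple, Optional


class Mount(NamedTuple):
    src: str
    dest: str
    options: Optional[str]


def parse_mount(spec: str) -> Mount:
    """Split 'src-dir:dest-dir[:options]' into its parts."""
    parts = spec.split(":")
    # Expect two or three fields, with non-empty source and destination.
    if len(parts) not in (2, 3) or not parts[0] or not parts[1]:
        raise ValueError(f"expected src-dir:dest-dir[:options], got {spec!r}")
    options = parts[2] if len(parts) == 3 else None
    return Mount(parts[0], parts[1], options)


print(parse_mount("/Volumes/Work/lfs1:/shared-data/:ro"))
# prints Mount(src='/Volumes/Work/lfs1', dest='/shared-data/', options='ro')
```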

Note

Containerized Applications can only be specified in the
transformation catalog, not via the DAX API.

4.4.2. TC Client pegasus-tc-client

We need to map our declared transformations (preprocess, findrange, and
analyze) from the example DAX above to a simple "mock application" named
"keg" ("canonical example for the grid"), which reads input files
designated by arguments, writes them back onto output files, and
produces on STDOUT a summary of where and when it was run. Keg ships
with Pegasus in the bin directory. Run keg on the command line to see
how it works.

Now we need to map all three transformations onto the "keg" executable.
We place these mappings in our File transformation catalog for site
clus1.

Note

In earlier versions of Pegasus, users had to define entries for Pegasus
executables such as transfer, replica client, dirmanager, etc. on each
site as well as on site "local". This is no longer required. Pegasus
versions 2.0 and later automatically pick up the paths for these
binaries from the environment profile PEGASUS_HOME set in the site
catalog for each site.

A single entry needs to be on one line. The above example is
just formatted for convenience.

Alternatively, you can use pegasus-tc-client to add entries to any
implementation of the transformation catalog. The following example
shows the addition of the last entry in the File based transformation
catalog.

The Pegasus project is supported by the National Science Foundation under the OAC SI2-SSI program, grant #1664162. Pegasus also receives support from the Department of Energy, the National Institutes of Health, Defense Advanced Research Projects Agency, and the USC Information Sciences Institute.