UNICORE Workflow System manual


The UNICORE Workflow System provides advanced
workflow processing capabilities using UNICORE Grid resources.
Its main components are the Workflow Engine and the
Service Orchestrator. While the Workflow Engine provides high-level
control constructs (for-each, while, if-then-else, etc.), the
Service Orchestrator contains a powerful, extensible resource
broker and deals with the execution of single UNICORE jobs.

1. Installing and setting up the UNICORE workflow servers

This chapter covers basic installation of the workflow system
and integration of workflow services into an existing UNICORE
Grid.

As a general note, the workflow services are organized into two
UNICORE/X instances termed "workflow server" and "servorch server".
General UNICORE configuration concepts (such as gateway
integration, shared registry, attribute sources) fully apply,
and you should refer to the UNICORE/X manual for details.

1.1. Prerequisites

Java 8 (JRE or SDK) or later

An existing UNICORE installation with Gateway, Shared Registry
and one or more UNICORE/X target systems.

(for production use) Two server certificates, one for the Workflow
engine and one for the Service orchestrator.

For storing workflow input and output data you need one of

a "global storage" service (see below)

a StorageFactory service

If you want to use the RESTful API for submitting workflows, you’ll
also need Unity.

1.2. Updating from previous versions

Please refer to the separate release notes!

Note

On Windows, please stop and uninstall the services before updating!
Uninstalling works by executing

workflow\bin\uninstall.bat
servorch\bin\uninstall.bat

In any case you need to replace the jar files and the wrapper.conf
files for workflow and servorch by the new versions.

1.3. Installation

The workflow system is available either as a bundle containing
workflow engine and service orchestrator, OR as separate Linux
packages (deb or rpm) for workflow engine and service orchestrator.

The basic installation procedure is completely analogous to the
installation of the UNICORE core servers.

If you downloaded the workflow system bundle, either use the graphical
installer, or untar the tar.gz, edit configure.properties and run
configure.py

Graphical installer: during installation, you will be asked for
the parameters of your UNICORE installation.

Using the tar.gz bundle: please review the configure.properties file
and edit the parameters to integrate the workflow services into your
existing UNICORE environment. Then call ./configure.py to apply
your settings to the configuration files. Finally use ./install.py to
install the workflow server files to the selected installation directory.

If using the Linux packages, simply install using the package manager
of your system.

1.4. Setup

After installation, there are some manual steps needed to integrate the
new servers into your UNICORE installation.

Gateway: edit gateway/conf/connections.properties and add the connection
data for the workflow server(s). For example,

WORKFLOW = https://localhost:7700
SERVORCH = https://localhost:7701

XUUDB: if you chose to use an XUUDB for workflow and service orchestrator,
you might have to add entries to the XUUDB to allow users
access to the workflow engine. Optionally, you can edit the GCID used
by the workflow/servorch servers, so that existing entries in the XUUDB
will match.

Registry: if the registry is setup to use access control (which is the default),
you need to allow the workflow and servorch services to register themselves in
the Registry. The exact procedure depends on how you configured your Registry,
please cross-reference the section "Enabling access control" in the Registry
manual. If you’re using default certificates and the XUUDB, the required entries
can be added as follows.
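As a sketch, using the XUUDB admin client, the entries might be added like this. The GCID "REGISTRY", the role "server" and the certificate paths are assumptions for a default-style setup; check the Registry manual for the exact role and GCID your Registry expects:

```
# add the workflow and servorch server certificates to the XUUDB
# (GCID, paths and role are placeholders; adapt to your installation)
xuudb/bin/admin.sh add REGISTRY workflow/conf/workflow.pem nobody server
xuudb/bin/admin.sh add REGISTRY servorch/conf/servorch.pem nobody server
```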

1.5. Workflow data storage

For storing workflow data (i.e. input/output files needed by the workflow tasks)
a storage service instance has to be available. Currently there are two options,
using a storage factory or using a shared storage instance. In fact, if
multiple options are available at runtime, users using the UNICORE Rich Client (URC)
can choose one when they submit their workflows.

Storage Factory

This is the "best" way to store workflow data. Each workflow will
store its data on its own storage service instance, making management
of these data simpler. The clients (UCC and URC) allow you to choose
the storage factory that should be used.

Single shared storage

The workflow system can use a single shared normal UNICORE storage
service instance for storing files shared between workflow tasks.

Note

While this is simple to set up, it can create a bottleneck in your
system, because there is no automated cleanup of workflow data.

The storage to be used can be configured on any UNICORE container
running StorageManagement and FileTransfer services. For example, one of the
target systems can be used for this purpose.

Please refer to the "Configuring shared storage services" section in
the UNICORE/X manual to learn how to set up a shared storage.

1.6. Verifying the installation

If you use the UNICORE Rich Client, you should see the workflow service in the Grid Browser view,
and you should be able to submit workflows to it.

Using the UNICORE commandline client, you can
check whether the new servers are available and accessible:
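For example, with a correctly configured UCC preferences file (command names and options can vary between UCC versions, so consult the UCC manual):

```
# connect and list the services visible in the shared registry;
# the workflow and servorch services should appear in the output
ucc connect
ucc system-info -l
```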

1.7. RESTful API

Since version 7.6.0, the Workflow server allows workflow submission
and management using a RESTful API. Please refer to the section on
"RESTful services" in the UNICORE/X manual (7.6 or later), since the
configuration is exactly the same.

Note

If you want to submit workflows (as opposed to just checking their
status) you must set up Unity for authentication, because this is
currently the only way to get delegation, i.e. to allow the workflow
engine to work on your behalf.

2. Configuration of the Workflow server

This chapter covers configuration options for the Workflow server.
Since the Workflow server is running in the same underlying
environment (UNICORE Services Environment, USE), you can find
basic configuration options in the UNICORE/X manual.

NOTE

The configuration files in the distribution are commented, and contain
example settings for all the options listed here.

Depending on how you installed the server, the files are located in

/etc/unicore/workflow (Linux package)

<basedir>/workflow/conf (standalone installer)

2.1. Workflow processing

Some details of the workflow engine’s behaviour can be configured.
All these settings are made in uas.config.

2.1.1. Limits

To avoid too many tasks submitted (possibly erroneously) from a
workflow, various limits can be set.

workflow.maxActivitiesPerGroup limits the total number
of tasks submitted for a single group (i.e. (sub-)workflow).
By default, this limit is 1000, i.e. a maximum of 1000 jobs can
be created by a single group. Note that it is not possible to
limit the total number of jobs for a whole workflow; the limit
applies only to individual parts of the workflow (such as loops).

workflow.forEachMaxConcurrentActivities limits
the maximum number of tasks in a for-each group that can be active at
the same time (default: 20).
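Both limits are plain properties in uas.config; a sketch using the default values mentioned above:

```
workflow.maxActivitiesPerGroup=1000
workflow.forEachMaxConcurrentActivities=20
```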

2.1.2. Resubmission

The workflow engine will (in some cases) resubmit failed tasks
to the service orchestrator. To switch off resubmission
completely, set

workflow.resubmitDisable=true

To change the maximum number of resubmissions (default: 3), set

workflow.resubmitLimit=3

2.1.3. Disabling tracing

To disable sending messages to the tracer component, set

workflow.tracing=false

2.1.4. Cleanup behaviour

This controls the behaviour when a workflow is removed (automatically or
by the user). By default, the workflow engine will remove all child jobs,
but will keep the storage where the files are.
This can be controlled using the two properties workflow.cleanupJobs
and workflow.cleanupStorage.
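For example, in uas.config (these values match the defaults in the property reference in section 2.5):

```
# remove child jobs, but keep the workflow storage
workflow.cleanupJobs=true
workflow.cleanupStorage=false
```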

2.2. XNJS settings

The workflow engine uses the XNJS library for processing workflows.
The XNJS has a separate configuration file, which is controlled using
the following property

workflow.xnjsConfiguration=conf/xnjs.xml

The number of threads used by the workflow engine for processing
can be controlled in the xnjs.xml file. Note that this does
not control the number of parallel activities, since all XNJS
processing is asynchronous. The default number (4) is usually
sufficient.
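As a sketch, the worker thread count is a property inside xnjs.xml. The element structure and property name below (XNJS.numberofworkers) follow the usual shipped example file, but should be verified against the comments in your own xnjs.xml:

```xml
<eng:Properties>
  <!-- number of XNJS worker threads (assumed property name) -->
  <eng:Property name="XNJS.numberofworkers" value="4"/>
</eng:Properties>
```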

What is more important is the data directory where the XNJS will store
its state. This should be on a fast (local) filesystem for maximum
performance. Shared (NFS) directories should not be used.

2.3. Location mapper

The location mapper provides a crucial service: it is used to
obtain "abstract names" for files, i.e. clients and server
components can define names that refer to actual files stored
on some storage without having to deal with the actual file
locations.

The location mapper uses its own database for storing these
mappings, which can be either H2 or MySQL. The database configuration
is done in wsrflite.xml using a set of property values named
org.chemomentum.dataManagement.locationManager.*
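As a sketch, properties in wsrflite.xml are set via <property> elements. The sub-key below (…locationManager.dbType) is purely hypothetical, so check the commented defaults in your wsrflite.xml for the actual key names:

```xml
<!-- hypothetical sub-key, shown for illustration only -->
<property name="org.chemomentum.dataManagement.locationManager.dbType" value="h2"/>
```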

2.4. Tracer

The (optional) tracer service stores timestamps for activities associated
with any given workflow, for example submission time, workflow to service
orchestrator submission, job submissions, etc. It is used on the clients
to show time profile data to the user. The URC contains a nice user interface
for interacting with this trace data.

This data is stored in a H2 database, which stores its data on the
filesystem (in the usual persistence directory).
Currently no other database is supported.

2.5. Property reference

A complete reference of the properties for configuring the Workflow server
is given in the following list.

workflow.cleanupJobs ([true, false], default: true)
    Whether to remove child jobs when the workflow is destroyed.

workflow.cleanupStorage ([true, false], default: false)
    Whether to clean up the workflow storage when the workflow is destroyed.

workflow.forEachMaxConcurrentActivities (integer >= 1, default: 100)
    Maximum number of concurrent for-each iterations.

workflow.maxActivitiesPerGroup (integer >= 1, default: 1000)
    Maximum number of workflow activities per activity group.

workflow.pollingInterval (integer >= 1, default: 600)
    Interval in seconds for (slow) polling of job states from the service orchestrator.

workflow.resubmitDisable ([true, false], default: false)
    Whether to disable automatic re-submission of failed jobs.

workflow.resubmitLimit (integer >= 1, default: 3)
    Maximum number of re-submissions of failed jobs.

workflow.tracing ([true, false], default: true)
    Whether to send trace messages to the tracer service.

workflow.xnjsConfiguration (string, default: conf/xnjs.xml)
    XNJS configuration file.

3. Servorch server

This chapter covers configuration options for the Servorch server.
Since the Servorch server is running in the same underlying
environment (UNICORE Services Environment, USE), you can find
basic configuration options in the UNICORE/X manual.

Additional servorch server configuration is performed in the
uas.config file. Advanced re-configuration, such as adding new
brokering strategies, can be done in the set of Spring configuration
files in servorch/conf/spring.

NOTE

The directory containing the Spring config files is controlled by
the property servorch.springConfig in uas.config.

Depending on how you installed the server, the config files are
located in

/etc/unicore/servorch (Linux package)

<basedir>/servorch/conf (standalone installer)

3.1. Data directories

By default, runtime data is placed into the "data" subdirectory in the
service orchestrator directory. To change this, several properties are available.

The usual UNICORE data directory is set in wsrflite.xml in the
persistence.directory property (default: "data")

The local indexes created by the resource broker are placed into
the directory configured in conf/spring/attributeCache.xml, by
default this is set to "data/brokering/attributes".

3.2. Preferred file transfer protocol

If you want to change the preferred protocol, you may set

servorch.outcomesProtocol=BFT

The default "BFT" will work with any UNICORE installation.
If available, "UFTP" will provide more performance.

3.3. Job processing

A number of properties control how jobs are processed by the
service orchestrator.

servorch.jobSupervisors controls the number of threads
that act as "job supervisors". These threads are used for resource
brokering, job submission, status polling and storing job outcomes.
The default is "10".

servorch.jobUpdateInterval controls the number of
milliseconds between two job status polls. The default is "5000".

servorch.jobFirstUpdateInterval is the delay in
milliseconds between job submission and first status check.
The default is "5000".

servorch.outcomesUpdateInterval is the number of
milliseconds between status polls while transferring files.
The default is "5000".
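All four settings go into uas.config; a sketch with the defaults listed above:

```
servorch.jobSupervisors=10
servorch.jobUpdateInterval=5000
servorch.jobFirstUpdateInterval=5000
servorch.outcomesUpdateInterval=5000
```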

3.4. Resource checking and attribute gathering interval

The service orchestrator periodically updates its internal information
about available sites and their resources. The update interval
is controlled in the file conf/uas.config

servorch.siteUpdateInterval controls the number of
milliseconds between two site refresh calls. The default is "20000".

3.5. Property reference

A complete reference of the properties for configuring the Servorch server
is given in the following list.

servorch.dbClearOnStartup ([true, false], default: false)
    Whether to clear the persisted state when starting.

servorch.jobFirstUpdateInterval (integer >= 1, default: 5000)
    Number of milliseconds to wait after job submission before the first status check.

servorch.jobSupervisors (integer >= 1, default: 10)
    Number of threads that act as job supervisors. These threads are used for resource
    brokering, job submission, status polling and storing job outcomes.

servorch.jobUpdateInterval (integer >= 1, default: 5000)
    Number of milliseconds between two job status polls.

servorch.numParallelTransfers (integer >= 1, default: 5)
    Maximum number of parallel data transfers.

servorch.outcomesProtocol (string, default: BFT)
    Protocol to use for storing outcomes.

servorch.outcomesUpdateInterval (integer >= 1, default: 5000)
    Number of milliseconds between two progress checks while storing outcomes.

servorch.siteUpdateInterval (integer >= 1, default: 20000)
    Number of milliseconds between site status polls.

servorch.tracing ([true, false], default: true)
    Whether to send trace messages to the tracer service.

4. The "simple workflow" workflow description language

4.1. Introduction

This chapter provides an overview of the "simple workflow" XML
dialect that is used to describe workflows. It will allow
you to write workflows "by hand", i.e. without using the graphical
UNICORE Rich Client. These can be submitted, for example, using
the UNICORE commandline client (UCC).

The workflow language is an XML dialect; the corresponding
XML schema can be found in the UNICORE SourceForge code repository.
After presenting all the constructs individually, several complete
example workflows are given in the examples section.

Here and in the following, we use a simple notation to denote XML elements
and their multiplicity, where "*" denotes zero or more occurrences and
"?" denotes zero or one occurrence of a given element. In the next sections
the elements of the workflow description will be discussed in detail.

NOTE

The Id attribute is used in many workflow elements, and must be an identifier
string that is UNIQUE within the workflow.

4.2.1. Documentation

The Documentation element allows adding meta-information to
the workflow description; it is ignored by the processing engine.

"ModifyVariable" allows to modify a workflow variable. An option named "variableName"
identifies the variable to be modified, and an option "expression" holds the
modification expression in the Groovy programming language syntax. See also the variables
section later

"Split": this activity can have multiple outgoing transitions. All transitions with matching
conditions will be followed. This is comparable to an "if() … if() … if()" construct
in a programming language.

"Branch": this activity can have multiple outgoing transitions. The transition with the
first matching condition will be followed. This is comparable to an "if() … elseif() … else()"
construct in a programming language

"Merge" merges multiple flows without synchronising them

"Synchronize" merges multiple flows and synchronises them

"HOLD" stops further processing of the current flow until the client explicitely
sends continue message.
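As a sketch, a ModifyVariable activity using the "variableName" and "expression" options described above might look like this (the exact element and attribute names are defined by the workflow XML schema):

```xml
<!-- increment the workflow variable C by one (Groovy expression) -->
<s:Activity Id="incrementC" Type="ModifyVariable">
  <s:Option name="variableName">C</s:Option>
  <s:Option name="expression">C + 1</s:Option>
</s:Activity>
```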

4.2.3. Subworkflows

The workflow description allows nested sub-workflows, which have the same
formal structure as the main workflow.

4.2.4. JSDL activities

The processing of the JSDL activity can be influenced using Option sub-elements.
Currently the following options can be used:

IGNORE_FAILURE if set to "true", the workflow engine will ignore any
failure of the task and continue processing as if the activity had completed
successfully. NOTE: this has nothing to do with the exit code of the actual UNICORE
job! Failure means, for example, that data staging failed, or that the service
orchestrator did not find a matching target system for the job.

MAX_RESUBMITS set to an integer value to control the number of times the activity
will be retried. By default, the workflow engine will re-try three times (except in those
cases where it makes no sense to retry).
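As a sketch, these options are attached to a JSDL activity like this (element names assumed from the patterns in this manual; the JSDL job definition itself is omitted):

```xml
<!-- ignore task failure and retry at most twice -->
<s:Activity Id="job1" Type="JSDL">
  <s:Option name="IGNORE_FAILURE">true</s:Option>
  <s:Option name="MAX_RESUBMITS">2</s:Option>
  <!-- JSDL job definition goes here -->
</s:Activity>
```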

4.2.5. Transitions and conditions

The basic flow of control in a workflow is handled using Transition elements.
These reference "From" and "To" activities (or subflows) and may have conditions
attached. If no condition is present, the transition is followed unconditionally;
otherwise, the condition is evaluated and the transition is followed only if the
condition matches (i.e. evaluates to true).

The From and To attributes denote Activity or SubWorkflow Id’s,
and the Id attribute has to be workflow-unique.

An activity can have outgoing (and incoming) transitions. In general,
all outgoing transitions (where the condition is fulfilled) will be
followed. The exception is the "Branch" activity, where only the
first matching transition will be followed.
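As a sketch, a conditional transition between two activities could look like this (the structure is assumed from the schema; the exitCodeEquals() function is one of the pre-defined condition functions described below):

```xml
<!-- follow this transition only if job1 exited with code 0 -->
<s:Transition Id="job1-to-job2" From="job1" To="job2">
  <s:Condition>
    <s:Expression>exitCodeEquals("job1", 0)</s:Expression>
  </s:Condition>
</s:Transition>
```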

The Expression used in a condition is string-valued. The workflow engine offers some
pre-defined functions that can be used in these expressions.
For example, you can use the exit code of a job, or check for the existence of a file,
within these expressions.

eval(expr) Evaluates the expression "expr" in Groovy syntax, which must evaluate to a boolean.
The expression may contain workflow variables

exitCodeEquals(activityID, value) Compares the exit code of the Grid job associated
with the Activity identified by "activityID" to "value"

exitCodeNotEquals(activityID, value) Checks that the exit code of the Grid job associated
with the Activity identified by "activityID" is different from "value"

fileExists(activityID, fileName) Checks that the working directory of the Grid job associated with
the given Activity contains a file "fileName"

fileLengthGreaterThanZero(activityID, fileName) Checks that the working directory of the Grid job
associated with the given Activity contains the named file, which has a non-zero length

before(time) and after(time) check whether the current time is before or after the given time
(in "yyyy-MM-dd HH:mm" format)

fileContent(activityID, fileName) Reads the content of the named file in the working directory of
the job associated with the given Activity and returns it as a string.

4.3. Using workflow variables

Workflow variables need to be declared using a DeclareVariable element
before they can be used.
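As a sketch (the child element names Name, Type and InitialValue are assumptions based on typical usage; check the XML schema for the exact structure):

```xml
<!-- declare an integer workflow variable C, initialised to 0 -->
<s:DeclareVariable Id="declareC">
  <s:Name>C</s:Name>
  <s:Type>INTEGER</s:Type>
  <s:InitialValue>0</s:InitialValue>
</s:DeclareVariable>
```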

When modifying a variable via the ModifyVariable activity, the option named
"expression" contains an expression in Groovy syntax (which is very close to Java).

The workflow engine will replace variables in JSDL data staging sections and
environment definitions, allowing variables to be injected into jobs. Examples of this
mechanism are given in the examples section.

4.4. Loop constructs

Apart from graphs constructed using Activity and Transition elements, the workflow
system supports special looping constructs, for-each, while and repeat-until,
which allow complex workflows to be set up very easily.

4.5. While and repeat-until loops

These allow looping over a part of the workflow while (or until) a condition is met.
A while loop looks like this:
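A hedged sketch only (the precise element nesting is defined by the XML schema; the eval() condition function is described in the transitions section):

```xml
<!-- repeat the body while the variable C is less than 5 -->
<s:SubWorkflow Id="whileLoop" Type="WHILE">
  <s:Condition>
    <s:Expression>eval(C &lt; 5)</s:Expression>
  </s:Condition>
  <s:Activity Id="body" Type="JSDL">
    <!-- job definition and a ModifyVariable activity updating C go here -->
  </s:Activity>
</s:SubWorkflow>
```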

Semantically, the repeat-loop will always execute the body at least once, since the
condition is checked after executing the body, while in the "while" case, the condition will
be checked before executing the body.

4.6. For-each loop

The for-each loop is a complex, yet powerful feature of the workflow system, since it allows
parallel execution of the loop body, and different ways of building the different iterations.
Put briefly, one can loop over variables (as in the "while" and "repeat-until" case), but
one can also loop over enumerated values and (most importantly) over file sets.

The IteratorName attribute allows to control how the "loop iterator
variable" is to be called.

4.6.1. The ValueSet element

Using ValueSet, iteration over a fixed set of strings can be defined.
The main use for this is parameter sweeps, i.e. executing the same job multiple
times with different arguments or environment variables.
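As a sketch (the Value child element is an assumption; consult the schema):

```xml
<!-- run the loop body once for each of these parameter values -->
<s:ValueSet>
  <s:Value>200</s:Value>
  <s:Value>400</s:Value>
  <s:Value>600</s:Value>
</s:ValueSet>
```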

4.6.2. The FileSet element

The Base element defines the base of the filenames, which will be resolved at runtime
and complemented according to the Includes and/or Excludes elements.
The recurse attribute controls whether the resolution should be done
recursively in any subdirectories. The indirection attribute is explained below.

For example, to recursively collect all PDF files (but not the file named "unused.pdf")
in a certain directory on a storage:
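A sketch using the elements described above (the storage URL is a placeholder for a real storage address):

```xml
<s:FileSet recurse="true">
  <s:Base>https://mysite:7700/SITE/services/StorageManagement?res=home#/pdf/</s:Base>
  <s:Includes>*.pdf</s:Includes>
  <s:Excludes>unused.pdf</s:Excludes>
</s:FileSet>
```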

4.6.3. Iterator variables

The following variables are set, where ITERATOR_NAME is the loop iterator name
defined in the SubWorkflow as shown above.

ITERATOR_NAME is set to the current iteration index (1, 2, 3, …)

ITERATOR_NAME_VALUE is set to the current full file path

ITERATOR_NAME_FILENAME is set to the current file name (last element of the path)

4.6.4. Indirection

Sometimes the list of files that should be looped over is not known at workflow design time,
but will be computed at runtime. Or you may simply wish to list the files in a separate file,
rather than putting them all in your workflow description. The indirection attribute on a
FileSet allows just that.
If indirection is set to true, the workflow engine will load the given file(s) in the fileset
at runtime, and read the actual list of files to iterate over from them.
As an example, you might have a file filelist.txt containing a list of UNICORE SMS files and
logical files:
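For illustration, such a file could simply contain one file address per line (the addresses below are placeholders):

```
https://site-a:7700/SITE-A/services/StorageManagement?res=home#/data/file1.pdf
https://site-b:7700/SITE-B/services/StorageManagement?res=home#/data/file2.pdf
```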

4.6.5. Chunking

Chunking allows grouping sets of files into a single iteration, for example for
efficiency reasons. The number of files in a chunk can be controlled; alternatively,
the size of the chunk in kbytes can be set.

If required, the chunk size can also be computed at runtime using the expression given in
the ComputeChunksize element. In the expression, two special variables may be used.
The TOTAL_NUMBER variable holds the total number of files iterated over, while
the TOTAL_SIZE variable holds the aggregated file size in kbytes.
The script must return an integer-valued result. The IsKbytes element is used
to choose whether the chunk size is interpreted as data size or as number of files.

For example:

To choose a larger chunksize if a certain total file size is exceeded:
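As a sketch (the exact nesting of the chunking elements is defined by the schema), a Groovy expression using the TOTAL_SIZE variable might look like this:

```xml
<!-- use chunks of 2048 kbytes if more than 50 MB in total, else 512 kbytes -->
<s:ComputeChunksize>
  if (TOTAL_SIZE > 50*1024) return 2048; else return 512;
</s:ComputeChunksize>
<s:IsKbytes>true</s:IsKbytes>
```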

The optional FilenameFormat element allows controlling how the individual files (which are staged
into the job directory) are named. By default, the index is prepended, i.e. "inputfile"
would be named "1_inputfile" to "N_inputfile" in each chunk. The pattern can use the
variables {0}, {1} and {2}, which hold the index, the file name without extension, and
the file extension, respectively. For example, if you have a set of PDF files, and you
want them to be named "file_1.pdf" to "file_N.pdf", you could use the pattern

<s:FilenameFormat>file_{0}.pdf</s:FilenameFormat>

or, if you prefer to keep the existing extensions, but append an index to the name,

<s:FilenameFormat>{1}{0}.{2}</s:FilenameFormat>

4.7. Examples

This section collects a few simple example workflows. They
are intended to be submitted using UCC.

4.7.1. Simple "diamond" graph

This example shows how to use transitions for building simple workflow graphs.
It consists of four "Date" jobs arranged in a diamond shape, i.e. "date2a" and "date2b"
are executed roughly in parallel. A "Split" activity is inserted to divide the
control flow into two parallel branches.

Here we use the "Branch" activity to make sure only the first matching
transition is followed.

4.7.3. While loop example using workflow variables

The next example shows some uses of workflow variables in a while loop.
The loop variable "C" is copied into the job’s environment.
Another possible use is to use workflow variables in data staging sections,
for example to name files.