Adaptive Workflows with Syrup

Syrup is an adaptive Workflow system with a difference. Like any other
Workflow solution, Syrup can be used to describe the tasks, procedural
steps, required input and output information and tools needed for each step
in a business process [1]. To be able to do this, Syrup provides five basic
concepts: Tasks, Links, Workflows, Workers and the WorkSpace.
Additionally, the minimal core is build in such a way that integrating Syrup
into an existing infrastructure poses a minimal challenge to developers.

Syrup can overcome the von Neumann bottleneck that stops traditional
software systems from scaling [2]. It does this by strictly separating the
specification, identification and execution phase of Workflows in a
distributed setup. Although the phases are explicitly separated, this
doesn't mean that each phase can't be expressed in terms of one another. For
example, Workflows can be dynamically compiled by other Workflows in
Syrup. This Lisp-like flexibility gives developers a chance to maximize
concurrency and the distribution of Workflows even further [3].

Note that Syrup doesn't follow the more complex standards such as Wf-XML,
BPML and XPDL [4]. Instead it provides the basic building blocks to
implement any of the standards - and hopefully more.

Example Workflow

The following example shows a simple mail filter to give a first taste of what
Workflows are about.

All mails of 'myname' are read by
ReadMail. These mails are pushed through and filtered on the
keywords 'problem, error and help'.
The mails that contain these keywords are pushed
through SendMail and send to
'problems@anotherhost.com'.

While this example may be considered not very realistic, it is already very
close to a real implementation. Indeed Workflow solutions can help bridge the gap between
specification and implementation, not only at the initial phase but also
throughout the whole development cycle. Ideally, Syrup could serve as an
intermediate (visual) language that can be talked by both developers and
the business [5].

Basic concepts

To see how to build Workflows with Syrup, the basic building blocks are
introduced first:

Task

A Task is a static description of how work must be done. It can describe a
calculation, transformation or any other programmable function. Tasks
are managed by Syrup and executed by Workers.
One or two inputs are given as parameters before a Task can be executed.
These input values are determined at runtime and they are typically
produced by other Tasks.
After execution, the Worker may have produced one or two outputs based
on the Task's static description and inputs. At the same time, the Task's
inputs may have been consumed so that the Task will be ready for a next round of
execution.

Tasks are graphically depicted as
filled squares with a certain color and label. The label
describes the function and the color indicates the state of a
Task:

The green color indicates the state
EXECUTABLE.

The blue color indicates the state
WAITING.

The grey color indicates the state DONE.

A Task also has a fixed type,
indicated by its boundary line that is either dotted or solid.

Type OR is indicated by the dotted
boundary line.

Type AND is indicated by the solid
boundary line.

Tasks can turn from WAITING into EXECUTABLE and vice
versa during the execution of a Workflow. In contrast, when Tasks are
DONE they will stay DONE forever. A Task is
WAITING when its inputs are not available or when it is prohibited
to generate outputs. Finally, when a Task is in state
EXECUTABLE it may consume inputs and generate outputs when
executed.

Tasks of type OR are ready for execution if either of the two
inputs is available. Type AND Tasks can only execute if
both inputs are available. The availability of inputs and the
prohibition for Tasks to generate outputs is fully determined by
Links.

Link

A Link is a connection between the output of one Task (the source) and the input
of another (the destination). Another possibility for a Link is to have only
one Task at its ends. This type of Link will typically mark the
initialization or finalization of a Task or even a whole Workflow.
Links can be in two states: they are either FULL with information or
they are EMPTY. It is only when the destination Tasks of Links are
executed, that Links can turn from FULL to EMPTY.
Consequently, a Task can only turn into EXECUTABLE when all
its outgoing Links are EMPTY.
After execution, a Task may have consumed input - clearing its (full) ingoing
Links - and it may have produced outputs - filling its (empty)
outgoing Links. Following this pattern for each executable Task, it
becomes clear that Links manage the data flow between Tasks [6].

Links are graphically depicted as lines
connecting squares. The arrowhead shows the direction of the Link
- from one Task to another Task.

The 'from' side of a Link can be either the first output (white
diamond) or the second output (black diamond). The 'to' side
of a Link can be either the first input (white arrow) or the
second input (black arrow).

The label indicates that the Link is FULL with data.
This can be either the data itself or a reference to the data
(i.e. an URL).

Workflow

Workers execute Tasks to consume inputs and produce outputs, but they can
also do something more special: create new Workflows.
A Workflow is essentially a network of Tasks and Links between them.
New Workflows can dynamically be added during the run of a Workflow or
adapted to meet new business requirements. Adaptive Workflows can be
regarded as the most powerful feature of Syrup.
The following example shows this feature in more detail:

1: The Launch Task is ready to take a XML
Workflow description and expand the description into a Syrup
Workflow.
The XML description declares the Tasks ('d' and 'f') and
the bindings and links. The references to Tasks (d-1, d-2, f-1
and f-2) contain suffixes -1 and -2 to specify which input or
output is taken (first or second).

2: The Launch Task has been executed
and expanded to a new Workflow. The Launch Task's old inputs
and outputs are now bounded to the new Duplicate and Flip Task.
The surrounding box 'Launch' indicates that the Launch Task
became the parent of the new Tasks.

3: The Duplicate Task has duplicated
the '1' input to both its outputs.

4: The Flip Task has flipped the '1'
input to output '0'.

Worker

The Worker is the entity that does the actual work described by a Task. Workers are needed because
Syrup is only concerned with the administration and coordination of Tasks
and not with their execution.
Workers use Syrup to know what Tasks need to be executed and in what order.
Workers also deal with the efficient use of computational resources and the
scheduling of Tasks.

Besides interacting with Syrup, Workers may decide to interact with other
systems while executing Tasks (databases for instance). They can even decide to collaborate with
other Workers, using only the 'directory' service of Syrup. By using this single service,
efficient Peer to Peer (P2P) solutions could be build on top of Syrup without
much effort.

WorkSpace

This is the top-level container holding the Tasks managed by Syrup - the
ultimate parent of all Tasks. It is the WorkSpace that enables Workers to
query Tasks, mark them for execution or commit results. The interaction
between Workers and the WorkSpace is what makes a well behaved system.
Because the Workspace holds and manages all the Tasks, most of the time it can
be regarded synonymous to 'Syrup' itself.

Determinism

Syrup enforces that Tasks executed by Workers behave in a deterministic way -
output should solely depend on the given input (pure functions) [7]. This
property is essential for Syrup to ensure that faulty executions, due to
crashed Workers, can be rolled back safely.

If this was not so, Syrup would be held responsible for actions based on
the internal state of Workers. Because Syrup cannot possibly know
everything about the internal state of Workers, this functional
restriction can be justified.

Non-Determinism

If Syrup were purely deterministic, how then can it react to external events?
External, real world events are notorious for their non-determinism
- they just happen.

Syrup provides two special inputs to receive external events and two
special outputs to generate external events. In fact, it is only via these
special inputs and outputs that the Syrup system can be bootstrapped, for
instance, to receive Workflows from the external world.

The two ingoing Links of the WorkSpace can
be filled to signal events from the external world
(represented by the non-labeled box). The two outgoing Links can be
consumed to publish Syrup events to the external world. Only
specialized Workers are allowed to do this.

There is another, more subtle kind of non-determinism. This is
introduced by type OR Tasks that may depend on the
relative timing between two inputs (comparable to
race-conditions) [8].
For example, consider a Task that has the following behavior: if the
earliest available input is the second, the output will be 1. Likewise, if
the earliest input is the first, the output will be 0.

Design and Implementation

Environment

The default implementation is done in Java, but is not restricted to Java
alone. The implementation is also able to control Perl, shell scripts or C++
executables in a distributed environment. This makes Syrup ideally suited
for developing system integrations [9].

Distribution

Remember that Syrup itself does not execute Tasks. It only manages the flow
between Tasks and the creation of new Workflows. However, Syrup does track
all the Workers that have requested the execution of Tasks so that Workers
will cooperate nicely.

Syrup can track thousands of distributed Workers that execute Tasks
concurrently, making distributed computation a reality. It is also
designed in such a way that crashed Workers, network failures or other
exceptions do not influence the outcome of a Workflow [10].

Persistence

Tasks are managed with the aid of a persistent backing store so that no Task or
execution result is ever lost during the lifetime of a Workflow.
Persistence is desirable when Workflows take weeks or even months to
complete or when (partial) failure is not an option.

The default implementation uses (My)SQL [11] to store and retrieve
Tasks. But other, potentially more distributed stores such as JavaSpaces [12]
could be used to achieve even higher performance.

Scheduling

In the UNIX environment, the cron utility is primarily used to schedule
jobs on a single host. Note however, that cron doesn't prevent the loss
of scheduled jobs when the machine is temporary offline [13].
Alternatively Syrup could replace cron so that jobs can be scheduled
reliably on a single host but also across multiple hosts.

Example crontab

Consider the replacement of cron entries by Syrup. The following
approach could be taken:
For every cron entry, create a Workflow that includes the
following two Tasks:

a Task that can track time (for every workday ....) and signals when
a certain deadline has passed.

a Task that runs the shell command that would have normally be run by
cron. After it receives the time signal from the first, it executes
the shell command.

The following diagrams show the replacement of the following cron
entry: '00 13 * * 1-5 /usr/bin/zip /backup/backup.zip /tmp/*'.

The Alarm has signaled that the
deadline has passed.

The Shell Task has run
'/usr/bin/zip /backup/backup.zip /tmp/*'.

After execution, it produced stdout on the first
output and stderr on the second output.

For Workflows to be executed by a Worker, a cron entry (can be on different
hosts) has to be added to start the Worker and to run it frequently and
reliably throughout the day. Each time a Worker is launched by the cron
scheduler, it will fetch Tasks from the Workspace and execute them. By
adding more and more Workers to the crontab, higher levels of
parallelism can be achieved without much effort.

Syrup processing

Tasks can be executed by Workers, but who runs the Workers?

As the previous cron example showed, Workers are typically run by a
concrete operating system but could - in principle - be humans as well.
Workers are build to run forever (at least during the lifetime of their
hosts) possibly having executed thousands of Tasks that were taken
one-by-one from a WorkSpace.

Although Workers are considered to be the active Agents that have the
ability to process something, they are not only concerned with
executing Tasks. In fact, Workers have to first find a WorkSpace to be
able to do something useful. And after the WorkSpace has been found, a
Worker has to query it for Tasks to be executed. So, besides execution,
Workers have to do some additional processing in order to find and
retrieve Tasks from a WorkSpace.

Query and select

Idle Workers can start querying the WorkSpace for Tasks waiting to be
executed. When such Tasks are found, they are carried out in a specific
order - based on the Worker's internal scheduling policy.

Before execution of any single Task, a Worker must notify the WorkSpace
that it wants to do so. This notification has two effects:

First, the Worker identifies itself as a 'reachable' resource by
creating a unique (communication) address where the Worker can be
reached. During execution, the Worker is expected to respond to
any request send to that address.

Second, the communication address is stored in the WorkSpace. So,
for each Task that is in progress, there will be an associated and
known address.

Given an address, other Workers are able to check whether the executing
Worker is still active. If the Worker is not responding to requests,
other Workers may decide to take a turn in executing the Task.
The preferred address type that is used among Workers is an URL
referring to an HTTP address. The choice of using the HTTP protocol is
convenient because HTTP requests are synchronious and are thus more
responsive when compared to the email protocol for example.
From the examples that will follow, it will become clear that
Tasks can be in one additional state: EXECUTING. This state
is indicated by the green fill color of the EXECUTING Task and
a self-referencing dotted edge. The corresponding label states the
URL address of the Worker executing the Task.

1: There are 2 Shells being
executed by 2 different Workers concurrently. The label
of the dotted line indicates the (HTTP) address of each
Worker.

The first Shell will sleep for 30 seconds and then
output 'slept 30 seconds' to stdout.

The second Shell will sleep for 40 seconds and then
output 'slept 40 seconds'.

2: The first Shell has finished
while the second is still in progress. Additionally the
Concatenation of the strings 'slept 30 seconds' and ' -- '
has been done.

Note that Workers that execute small Tasks (~ 1
second) such as Concat will publicize their address for
the same short period of time (~ 1 second).

It is only during the execution of medium to big Tasks
that addresses will be visible to other Workers.

3: The second Shell is finished
and produced 'slept 40 seconds' on stdout.

Execution and failure

As previously mentioned, Syrup is resilient towards network failures
or power outages. Still there is one type of failure that is difficult to
catch, in particular the case in which Workers may have crashed or are
otherwise blocked. This problem is closely related to identifying
(from the outside) which processes are still doing something useful or
which of them are in some kind of infinite loop.

For example, a Worker preempted (put to the background) by the operating
system is essentially doing nothing but hasn't exactly failed either.
The Worker just has to wait for another time-slice to continue. The
difficulty is that the (preempted) state of one Worker is hard to assess
by others. In fact, the only way to check the state of a Worker is by
sending it a request. When no reply is received within a certain amount
of time, the associated Worker is considered to be crashed and will be
removed from the Workspace as an active Agent.

This scheme has one drawback. A Worker could possibly still have been
operational but was somehow not able to respond to requests (for
example, because of network failure). And when such a 'disconnected'
Worker tries to commit the execution result to the Workspace there will
be an error because some other Worker has taken control over the same
Task (while it was disconnected). In this rare case a Task may have been
executed twice (or more) by different Workers after subsequent
failures (although only one result is committed to the WorkSpace). In
general, distributed systems cannot prevent redundant executions
without incurring great communication costs. Syrup also doesn't
prevent redundant executions. Instead it adopts another algorithm
that tries to keep redundant executions to an absolute minimum [14].

When Tasks are to be executed multiple times however, it is important to
know that they behave like pure functions (see Determinism). The
alternative - impure functions - should be avoided because they
typically modify state outside the realm of Syrup (like modifying
database records or transferring money). Such stateful programs
usually require that their modifications be applied once (and only
once) to ensure overall correctness.

In contrast, the proper and exclusive use of pure functions will always
yield the same outcome regardless of how many times they are executed.

Temporal data

If state cannot be avoided it is best to use temporal databases.
Temporal doesn't mean they are thrown away after use. Instead they have
the unique property that state cannot be changed in time but
only added (with explicit timestamps). Practice has shown
that if Syrup and temporal databases are married, they are able to
produce fully accountable, easily distributable and very scalable
systems [15].

Controlling complexity

It is the dream of the author that, when dealing with increasingly more
complex systems, solutions like Syrup and temporal databases can give
programmers and business people more insight and less to worry about.
And if not, Syrup could at least make the building and running of
software more fun.