Makeflow is a workflow engine for large scale distributed computing.
It accepts a specification of a large amount of work to be performed,
and runs it on
remote machines in parallel where possible. In addition, Makeflow is
fault-tolerant, so you can use it to coordinate very large tasks that may run
for days or weeks in the face of failures. Makeflow is designed to be similar
to Make, so if you can write a Makefile, then you can write a
Makeflow.

Makeflow makes it easy to move a large amount of work from one facility to another. After writing a workflow, you can test it out on your local laptop,
then run it at your university computing center, move it over to a national
computing facility like XSEDE, and
then again to a commercial cloud system. Using the (bundled) Work Queue system, you can
even run across multiple systems simultaneously. No matter where you run your tasks, the workflow language stays the same.

Makeflow is used in production to support large scale problems
in science and engineering. Researchers in fields such as bioinformatics,
biometrics, geography, and high energy physics all use Makeflow to compose
workflows from existing applications.

Makeflow can send your jobs to a wide variety of services,
such as batch systems (HTCondor, SGE, PBS, Torque),
cluster managers (Mesos and Kubernetes), cloud services (Amazon EC2)
and container environments like Docker and Singularity.
Details for each of those systems are given in the Batch System Support section.

Makeflow is part of the Cooperative Computing Tools, which is easy to install.
In most cases, you can just build from source and install in your home directory like this:
git clone https://github.com/cooperative-computing-lab/cctools cctools-source
cd cctools-source
./configure
make
make install
Then, set your path to include the appropriate directory. If you use bash, do this:
export PATH=${HOME}/cctools/bin:$PATH
Or if you use tcsh, do this:
setenv PATH ${HOME}/cctools/bin:$PATH
For more complex situations, consult the CCTools installation instructions.

The Makeflow language is very similar to Make. A Makeflow script consists
of a set of rules. Each rule specifies a set of target files to create,
a set of source files needed to create them, and a command that
generates the target files from the source files.

Makeflow attempts to generate all of the target files in a script. It
examines all of the rules and determines which rules must run before others.
Where possible, it runs commands in parallel to reduce the execution time.

Here is a Makeflow that uses the convert utility to make an
animation. It downloads an image from the web, creates four variations of the
image, and then combines them back together into an animation. The first and
the last task are marked as LOCAL to force them to run on the controlling
machine.
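
A sketch of such a workflow (the image URL, file names, and swirl parameters are illustrative):

CURL=/usr/bin/curl
CONVERT=/usr/bin/convert
URL=https://ccl.cse.nd.edu/images/capitol.jpg

capitol.montage.gif: capitol.jpg capitol.90.jpg capitol.180.jpg capitol.270.jpg capitol.360.jpg
	LOCAL $CONVERT -delay 10 -loop 0 capitol.jpg capitol.90.jpg capitol.180.jpg capitol.270.jpg capitol.360.jpg capitol.270.jpg capitol.180.jpg capitol.90.jpg capitol.montage.gif

capitol.90.jpg: capitol.jpg
	$CONVERT -swirl 90 capitol.jpg capitol.90.jpg

capitol.180.jpg: capitol.jpg
	$CONVERT -swirl 180 capitol.jpg capitol.180.jpg

capitol.270.jpg: capitol.jpg
	$CONVERT -swirl 270 capitol.jpg capitol.270.jpg

capitol.360.jpg: capitol.jpg
	$CONVERT -swirl 360 capitol.jpg capitol.360.jpg

capitol.jpg:
	LOCAL $CURL -o capitol.jpg $URL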

(Note that Makeflow differs from Make in a few subtle ways;
you can learn about those in the Language Reference below.)

To try out the example above, copy and paste it into a file named
example.makeflow. To run it on your local machine:

makeflow example.makeflow

Note that if you run it a second time, nothing will happen, because all of
the files are built:

makeflow example.makeflow
makeflow: nothing left to do

Use the --clean option to clean everything up before trying it again:

makeflow --clean example.makeflow

If you have access to a batch system like Condor, SGE, or Torque,
or a cloud service provider like Amazon, you can direct Makeflow to run your jobs there by using the -T option:

makeflow -T condor example.makeflow
or
makeflow -T sge example.makeflow
or
makeflow -T torque example.makeflow
or
makeflow -T amazon example.makeflow
...
To learn more about the various batch system options, see the Batch System Support section.

Most batch systems require information about what resources
each job needs, so as to schedule them appropriately. You can convey
this by setting the variables CORES, MEMORY (in MB), and DISK (in MB)
ahead of each job. Makeflow will translate this information as needed
to the underlying batch system. For example:
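
CORES=4
MEMORY=2048
DISK=4000

output.txt: input.dat simulate.exe
	./simulate.exe input.dat > output.txt

(The rule and file names are illustrative; jobs following these assignments will be submitted requesting 4 cores, 2048 MB of memory, and 4000 MB of disk.)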

A variety of tools are available to help you monitor the progress of a workflow as it runs.
Makeflow itself creates a transaction log (example.makeflow.makeflowlog) which
contains details of each task as it runs, tracking how many are idle,
running, complete, and so forth. These tools can read the transaction
log and summarize the workflow:

makeflow_monitor reads the transaction log and produces a continuous
display that shows the overall time and progress through the workflow:
makeflow example.makeflow & makeflow_monitor example.makeflow.makeflowlog

makeflow_graph_log will read the transaction log, and produce a timeline
graph showing the number of jobs ready, running, and complete over time:
makeflow example.makeflow & makeflow_graph_log example.makeflow.makeflowlog example.png

makeflow_viz will display the workflow in graphical form,
so that you can more easily understand the structure and dependencies.
(Read more about Visualization)

In addition, if you give the workflow a "project name" with the -N
option, it will report its status to the catalog server once per minute. The makeflow_status command will query the catalog
and summarize your currently running workloads.

A few key bits of advice address the most common problems encountered when using Makeflow:

First, Makeflow works best when it has accurate information about each task that you wish
to run. Make sure that you are careful to indicate exactly which input files each task
needs, and which output files it produces.

Second, if Makeflow is doing something unexpected, you may find it useful to turn on
the debugging stream with the -d all option. This will emit all sorts of detail
about how each job is constructed and sent to the underlying batch system.

When debugging failed tasks, it is often useful to examine any output produced.
Makeflow automatically saves these files in a makeflow.failed.$ruleno directory for each failed rule.
Only the specified outputs of a rule will be saved.
If the rule is retried and later succeeds, the failed outputs will be automatically deleted.

Finally, Makeflow was created by the Cooperative Computing Lab at the University of Notre Dame.
We are always happy to learn more about how Makeflow is used and assist you if you
have questions or difficulty.

For the latest information about Makeflow, please visit our web site and subscribe to our mailing list for more information.

Makeflow supports a wide variety of batch systems.
Use makeflow --help to see the current list of supported systems.
Generally speaking, simply run Makeflow with the -T option
to select your desired batch system. If no option is given,
then -T local is assumed.

If you need to pass additional parameters to your batch system,
such as specifying a specific queue or machine category,
use the -B option to Makeflow, or set the BATCH_OPTIONS
variable in your Makeflow file. The interpretation of these options
is slightly different with each system, as noted below.

To avoid overwhelming a batch system with an enormous number of idle jobs,
Makeflow will limit the number of jobs sent to a system at once.
You can control this on the command line with the --max-remote
option or the MAKEFLOW_MAX_REMOTE_JOBS environment variable.
Likewise, local execution can be limited with --max-local
and MAKEFLOW_MAX_LOCAL_JOBS.
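
For example, to allow at most 100 jobs in the batch queue and 4 running locally at once (a sketch):

makeflow -T condor --max-remote 100 --max-local 4 example.makeflow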

By default, Makeflow executes on the local machine.
It will measure the available cores, memory, and disk
on the local machine, and then run as many jobs as fit
in those resources. (Of course, you must label your jobs
with CORES, MEMORY, and DISK appropriately.)
You can put an upper limit on the resources used with the
--local-cores, --local-memory, and --local-disk
options to Makeflow. Also, the total number of jobs running
locally can be limited with --max-local.

Use the -T condor option to submit jobs to the HTCondor batch system. (Formerly known as Condor.)

Makeflow will automatically generate a submit file for each job.
However, if you would like to customize the Condor submit file,
use the -B option or BATCH_OPTIONS variable
to specify text to add to the submit file.

For example, if you want to add Requirements and Rank lines to
your Condor submit files, add this to your Makeflow:
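
BATCH_OPTIONS = Requirements = (Memory > 1024)

(The Requirements expression here is illustrative; a Rank expression can be supplied the same way.)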

For SGE, PBS, and Torque, Makeflow will automatically generate qsub commands.
Use the -B option or BATCH_OPTIONS variable
to specify text to add to the command line.
For example, to specify that jobs should be submitted to the devel queue:
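
makeflow -T sge -B "-q devel" example.makeflow

or, in the Makeflow file itself (a sketch; queue names vary by site):

BATCH_OPTIONS = -q devel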

Makeflow can be used with Apache Mesos. To run Makeflow with Mesos,
select the batch mode via -T mesos and pass the hostname and
port number of the Mesos master to Makeflow with the --mesos-master option.
Since the Makeflow-Mesos scheduler is based on the Mesos Python2 API, the path to the Mesos
Python2 library should be included in the $PATH, or you can specify a
preferred Mesos Python2 API via the --mesos-path option. To successfully
import the Python library, you may need to indicate dependencies through
the --mesos-preload option.

For example, here is the command to run Makeflow on a Mesos cluster
that has the master listening on port 5050 of localhost, with a user
specified python library:
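
makeflow -T mesos --mesos-master 127.0.0.1:5050 --mesos-path /path/to/mesos/python2/site-packages example.makeflow

(The library path is illustrative; point --mesos-path at your own Mesos Python2 installation.)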

Makeflow can be configured to run jobs remotely on Amazon Web Services.
For each job in the workflow, a new virtual machine will be created,
the job will run on the virtual machine, and then the virtual machine
will be destroyed.

You will need to do some one-time setup first:

Create an account with Amazon Web Services, and test it out by creating and deleting some virtual machines manually.

Add the aws command to your PATH, run aws configure and
enter your AWS Access Key ID, your AWS Secret Access Key,
and your preferred region name. (The keys can be obtained from the AWS
web page by going to your profile menu, selecting "My Security Credentials"
and then "Access Keys". You may need to create a new access key.)

Before proceeding, test that the command-line tool is working by entering:
aws ec2 describe-hosts
which should display:
{
"Hosts": []
}
Now you are ready to use Makeflow with Amazon.
Before running a workflow, use the makeflow_ec2_setup command,
which will set up a virtual private cluster, security group, keypairs, and
other necessary details, and store the necessary information into my.config.
Give the name of the default Amazon Machine Image (AMI) you wish to use as the
second argument:
makeflow_ec2_setup my.config ami-343a694f
Then, run makeflow with the -T amazon option
and --amazon-config my.config to use the configuration you
just created. You can run multiple workflows using a single configuration.
makeflow -T amazon --amazon-config my.config example.makeflow ...
When you are done, destroy the configuration with this command:
makeflow_ec2_cleanup my.config

Makeflow selects the virtual machine instance type automatically
by translating the job resources
into an appropriate instance type of the c4 or m4 category.
For example, a job requiring
CORES=2 and MEMORY=8000 will be placed into an m4.large instance type.
If you do not specify any resources for a job,
then it will be placed into a t2.micro instance,
which has very limited performance. To get good performance,
you should label your jobs with the appropriate CORES and MEMORY.

You can override the choice of virtual machine instance type,
as well as the choice of virtual machine image (AMI) within
the Makeflow on a job-by-job basis by setting the
AMAZON_INSTANCE_TYPE and AMAZON_AMI
environment variables within the Makeflow file. For example:
export AMAZON_INSTANCE_TYPE=m4.4xlarge
export AMAZON_AMI=ami-343a694f

If you have a system similar to Torque, SGE, or PBS that submits
jobs with commands like "qsub", you can inform Makeflow of those commands
and use the cluster driver. For this to work, it is assumed that
a shared filesystem (such as NFS) is available across all
nodes of the cluster.

To configure a custom driver, set the following environment variables:

BATCH_QUEUE_CLUSTER_NAME: The name of the cluster, used in debugging messages and as the name for the wrapper script.

BATCH_QUEUE_CLUSTER_SUBMIT_COMMAND: The submit command for the cluster (such as qsub or msub)

BATCH_QUEUE_CLUSTER_SUBMIT_OPTIONS: The command-line arguments that must be passed to the submit command. This string should end with the argument used to set the name of the task (usually -N).

BATCH_QUEUE_CLUSTER_REMOVE_COMMAND: The delete command for the cluster (such as qdel or mdel)

These will be used to construct a task submission for each makeflow rule that consists of:
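
Roughly, each rule is dispatched with a command of the following form (a sketch; the exact wrapper invocation depends on your Makeflow version):

<submit command> <submit options> <task name> <wrapper script named after the cluster> "<rule command>"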

As described so far, Makeflow translates each job in the workflow
into a single submission to the target batch system. This works well
as long as each job is relatively long running and doesn't perform
a large amount of I/O.
However, if each job is relatively short, the workflow may run very
slowly, because it can take 30 seconds or more for the batch system to
start an individual job. If common files are needed by each job,
they may end up being transferred to the same node multiple times.

To get around these limitations, we provide the
Work Queue system.
The basic idea is to submit a number of persistent "worker" processes
to an existing batch system. Makeflow communicates directly with the
workers to quickly dispatch jobs and cache data, rather than interacting
with the underlying batch system. This accelerates short jobs and
exploits common data dependencies between jobs.

To begin, let's assume that you are logged into the head node
of your cluster, called head.cluster.edu.
Run Makeflow in Work Queue mode like this:
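
makeflow -T wq example.makeflow

Then, submit a number of worker processes to your batch system using the bundled scripts. For example, to submit ten workers to HTCondor or SGE (a sketch; adjust the hostname, port, and worker count for your site):

condor_submit_workers head.cluster.edu 9123 10
sge_submit_workers head.cluster.edu 9123 10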

Or, you can start workers manually on any other machine you can log into:

work_queue_worker head.cluster.edu 9123

Once the workers begin running, Makeflow will dispatch multiple tasks to
each one very quickly. If a worker should fail, Makeflow will retry the work
elsewhere, so it is safe to submit many workers to an unreliable system.

When the Makeflow completes, your workers will still be available, so you
can either run another Makeflow with the same workers, remove them from the
batch system, or wait for them to expire. If you do nothing for 15 minutes,
they will automatically exit.

Note that condor_submit_workers and sge_submit_workers
are simple shell scripts, so you can edit them directly if you would like to
change batch options or other details. Please refer to the Work Queue manual for more details.

Makeflow listens on a port to which the remote workers connect. The
default port number is 9123. Sometimes, however, that port might not be
available on your system. You can change the port via the -p option. For example, if you want the master to listen
on port 9567, you can run the following command:
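
makeflow -T wq -p 9567 example.makeflow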

If you don't like using port numbers, an easier way to match workers to masters is to use a project name. You can give each master a project name with the -N option.

makeflow -T wq -N MyProject example.makeflow

The -N option gives the master a project name called 'MyProject',
and will cause it to advertise its information
such as the project name, running status, hostname and port number, to a catalog server.
Then a worker can simply identify the workload by its project name.

You can also give multiple -N options to a worker. The worker will learn from
the catalog server which of the specified projects are running and
randomly select one to work for. When that project is done, the worker
repeats this process. Thus, the worker can move on to a different master without
being stopped and restarted with the new master's hostname and port. An example of
specifying multiple projects:

work_queue_worker -N proj1 -N proj2 -N proj3

In addition to setting a project name, the -N option also triggers automatic reporting to the
designated catalog server. The server address and port are taken from the environment variables CATALOG_HOST
and CATALOG_PORT. If either variable is not set, then the addresses "catalog.cse.nd.edu,backup-catalog.cse.nd.edu" and port 9097 will be used.

It is also easy to run your own catalog server, if you prefer.
For more details, see the Catalog Server Manual.

We recommend that anyone using the catalog server also set a password.
To do this, select any password and write it to a file called
mypwfile. Then, run Makeflow and each worker with the
--password option to indicate the password file:
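
For example (a sketch using the project name from above):

makeflow -T wq -N MyProject --password mypwfile example.makeflow
work_queue_worker -N MyProject --password mypwfile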

Makeflow can interoperate with a variety of container technologies,
including Docker, Singularity, and Umbrella. In each case, you name
the container image on the command line, and then Makeflow will apply
it to each job by creating the container, loading (or linking)
the input files into the container, running the job, extracting
the output files, and deleting the container.

Note that container specification is independent of batch system selection.
For example, if you select HTCondor execution with -T condor and
Docker support with --docker, then Makeflow will submit HTCondor
batch jobs, each consisting of an invocation of docker to run the job.
You can switch to any combination of batch system and container technology that you like.

If you have the Docker container environment installed on your cluster,
Makeflow can be set up to place each job in your workflow into a Docker container.
Invoke Makeflow with the --docker argument followed by the container image to use.
Makeflow will ensure that the named image is pulled into each Docker node, and then execute the task within that container. For example, --docker debian
will cause all tasks to be run in a container created from the debian image.

Alternatively, if you have an exported container image, you can use the exported image via the --docker-tar option.
Makeflow will load the container into each execution node as needed. This allows you to use a container without pushing it to a remote repository.

In a similar way, Makeflow can be used with the Singularity container system.
When using this mode, Singularity will take in an image, set up the container, and run the command inside it.
Any needed input files will be provided by Makeflow, and any created files will be delivered back by Makeflow.

Invoke Makeflow with the --singularity argument, followed by the path to the desired image file.
Makeflow will ensure that the named image will be transferred to each job, using the appropriate mechanism
for that batch system.

Makeflow allows the user to specify the execution environment for each rule
via its --umbrella-spec and --umbrella-binary options. The
--umbrella-spec option allows the user to specify an Umbrella
specification, while the --umbrella-binary option allows the user to specify
the path to an Umbrella binary. In this mode, each rule will be converted
into an Umbrella job, and the specified Umbrella specification and binary will
be added to the input file list of the job.

To run each makeflow rule as an Umbrella job, the --umbrella-spec option
must be specified. However, the --umbrella-binary option is optional:
when it is specified, the specified umbrella binary will be sent to the
execution node; when it is not specified, the execution node is expected to
have an umbrella binary available through the $PATH environment
variable.

Makeflow also allows the user to specify the umbrella log file prefix via
its --umbrella-log-prefix option. The name of each umbrella log file is the
prefix followed by the rule id. The default prefix is the name of the Makeflow
file followed by ".umbrella.log".

Makeflow also allows the user to specify the umbrella execution mode via
its --umbrella-mode option. Currently, this option can be set to the
following three modes: local, parrot, and docker. The default value of the
--umbrella-mode option is local, which first tries to use the
docker mode, and falls back to the parrot mode if docker is not
available.

You can also specify an Umbrella specification for a group of rules in the
Makeflow file by putting the following directives before the rules you want to apply
the Umbrella spec to:

.MAKEFLOW CATEGORY 1
.UMBRELLA SPEC convert_S.umbrella

In this case, the specified Umbrella spec will be applied to all the
following rules until a new ".MAKEFLOW CATEGORY ..." directive is declared. All
the rules before the first ".MAKEFLOW CATEGORY ..." directive will use the
Umbrella spec specified by the --umbrella-spec option. If the
--umbrella-spec option is not specified, these rules will run without being
wrapped by Umbrella.

Makeflow allows a global wrapper command to be applied to every rule in the workflow. This is particularly useful for applying troubleshooting tools, or for setting up a global environment without rewriting the entire workflow. The --wrapper option will prefix a command in front of every rule, while the --wrapper-input and --wrapper-output options will specify input and output files related to the wrapper.

A few special characters are supported by wrappers. If the wrapper command or wrapper files contain two percent signs (%%), then the number of the current rule will be substituted there. If the command contains curly braces ({}), the original command will be substituted at that point.
Square brackets ([]) are the same as curly braces, except that the command is quoted and escaped before substitution.
If neither specifier is given, Makeflow appends /bin/sh -c [] to the wrapper command.

For example, suppose that you wish to apply the shell builtin command time to
every rule in the workflow. Instead of modifying the workflow, run it like
this:

makeflow --wrapper 'time -p' example.makeflow

Since the preceding wrapper did not specify where to substitute the command, it is equivalent to

makeflow --wrapper 'time -p /bin/sh -c []' example.makeflow

This way, if a single rule specifies multiple commands, the wrapper will time all of them.

The square brackets and the default
behavior of running commands in a shell were added because Makeflow allows a rule to run multiple commands. The curly braces simply perform text substitution, so, for example:
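
makeflow --wrapper 'time -p {}' example.makeflow

applied to a rule whose command is cmd1 ; cmd2 expands to time -p cmd1 ; cmd2, so only the first command is timed. (The wrapper and commands here are hypothetical, chosen only to illustrate the substitution.)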

By default, Makeflow does not assume that your cluster has a shared
filesystem like NFS or HDFS, and so, to be safe, it copies all necessary dependencies for each job. However, if you do have a shared filesystem, it can
be used to efficiently access files without making remote copies.
Makeflow must be told where the shared filesystem is located, so that it
can take advantage of it. To enable this, invoke Makeflow with the
--shared-fs option, indicating the path where the shared
filesystem is mounted. (This option can be given multiple times.)
Makeflow will then verify the existence of input and output files in these
locations, but will not cause them to be transferred.

For example, if you use NFS to access the /home and
/software directories on your cluster, then invoke makeflow like this:
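
makeflow --shared-fs /home --shared-fs /software example.makeflow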

After a job completes, Makeflow checks that the expected output files
were actually created. However, if the output files were written via
NFS (or another shared filesystem), it is possible that the outputs
may not be visible for a few seconds. This is due to caching and buffering
effects in many filesystems.

If you experience this problem, you can instruct Makeflow to retry
the output file check for a limited amount of time, before concluding
that the files are not there. Use the
--wait-for-files-upto option to specify the number of seconds
to wait.

It is often convenient to separate the logical purpose of
a file from its physical location. For example, you may
have a workflow which makes use of a large reference database called
ref.db which is a standard file downloaded from a common data
repository whenever needed.

Makeflow allows you to map logical file names within the workflow
to arbitrary locations on disk, or downloaded from URLs.
This is accomplished by writing a "mount file" in which each line
lists a logical file name and its physical location.
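
For example, a mountfile might look like this (the paths and URL are illustrative):

curl        /usr/bin/curl
ref.db      https://example.org/data/ref.db
data/1.txt  /opt/shared/datasets/1.txt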

Then, simply execute Makeflow with the --mounts option
to apply the desired mountfile:

makeflow --mounts my.mountfile example.makeflow ...

When the --mounts option is set, Makeflow parses each line of the mountfile before execution, copies each specified dependency from the location given in the source field into a local cache, and then links the target to the cached item. Makeflow also records in its log the location of the local cache and the details (source, target, filename under the cache directory, and so on) of each dependency specified in the mountfile.

To clean up a makeflow together with the local cache and all the links created due to the mountfile:

makeflow -c example.makeflow

By default, makeflow creates a unique directory under the current working
directory to hold all the dependencies introduced by the mountfile.
This location can be adjusted with the --cache-dir option.

To clean up only the local cache and the links created due to the mountfile:

makeflow -ccache example.makeflow

To limit the behavior of a makeflow to the current working directory, the target field should satisfy the following requirements:

the target path should not exist;

the target path should be relative;

the target path should not include any symbolic links; (for example,
if the target path is a/b/c/d, then a, b, and
c should not be symbolic links.) This prevents the makeflow from
leaving the current working directory through symbolic links.

the target path should not include double dots (..), so that the
makeflow stays within the current working directory. For example, a/../b
is an invalid target; however, a/b..c/d is a valid target.

the target path should be written exactly as it is used in the Makeflow file.
For example, if the makefile refers to the path ./mycurl, the target path
must also be ./mycurl, not mycurl.

The source field can be a local file path or an HTTP URL. When a local file path is specified,
the following requirements should be satisfied:

the source path should exist;

the source path can be relative or absolute;

the source path can include doubledots (..);

the source path should refer only to regular files, directories, and symbolic links;

the source path should not be an ancestor directory of the target, to avoid an infinite loop;

To limit the behavior of a makeflow to the current working directory, the cache_dir should satisfy the following requirements:

the cache_dir path should be a relative path;

the cache_dir path should not include any symbolic links; (for example,
if the cache_dir path is a/b/c/d, then a, b, and
c should not be symbolic links.) This prevents the makeflow from
leaving the current working directory through symbolic links.

the cache_dir path should not include double dots (..), so that the
makeflow stays within the current working directory. For example, a/../b
is an invalid cache_dir; however, a/b..c/d is a valid cache_dir.

As the workflow execution progresses, Makeflow can automatically delete
intermediate files that are no longer needed. In this context, an intermediate
file is an input of some rule that is the target of another rule. Therefore, by
default, garbage collection does not delete the original input files, nor
final target files.

Which files are deleted can be tailored from the default by appending files
to the Makeflow variables MAKEFLOW_INPUTS and MAKEFLOW_OUTPUTS.
Files added to MAKEFLOW_INPUTS augment the original input files that
should not be deleted. MAKEFLOW_OUTPUTS marks final target files that
should not be deleted. However, unlike MAKEFLOW_INPUTS, the files
specified in MAKEFLOW_OUTPUTS do not automatically include all output files. If
MAKEFLOW_OUTPUTS is not specified, then all files not used in subsequent
rules are considered outputs. It is considered best practice to always specify
MAKEFLOW_INPUTS/OUTPUTS to clearly specify which files are considered
inputs and outputs and allow for better space management if garbage collection
is used.

Makeflow offers two modes for garbage collection: reference count, and on
demand. With the reference count mode, intermediate files are deleted as soon
as no rule has them listed as input. The on-demand mode is similar to reference
count, except that files are deleted until the space used on the local filesystem
falls below a given threshold.

There are several ways to visualize both the structure of a Makeflow
as well as its progress over time.
makeflow_viz can be used to convert a Makeflow into
a file that can be displayed by Graphviz DOT tools like this:
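
For example, a sketch of the usual steps (check makeflow_viz --help for the exact flags in your version):

makeflow_viz -D dot example.makeflow > example.dot
dot -Tgif < example.dot > example.gif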

Makeflow allows users to archive the results of each job within a specified archive directory. This is done using the --archive option, which by default creates an archiving directory at /tmp/makeflow.archive.$UID. Both files and jobs are stored as the workflow executes. Makeflow will also check whether a job has already been archived into the archiving directory; if so, the outputs of the job will be copied to the working directory and the job will skip execution.

makeflow --archive example.makeflow

To only write to the archiving directory (and ensure that all nodes are executed), pass --archive-write. To only read from the archive and use the outputs of any archived job, pass --archive-read. To specify a directory to use as the archiving directory, give an optional argument as shown below:
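
makeflow --archive=/path/to/archive example.makeflow

(The directory is illustrative; if the optional argument is omitted, the default /tmp/makeflow.archive.$UID is used.)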

Makeflow provides a tool to collect all of the dependencies for a
given workflow into one directory. By collecting all of the input files and
programs contained in a workflow it is possible to run the workflow on other
machines.

Currently, Makeflow copies all of the files specified as
dependencies by the rules in the makeflow file, including scripts and data
files. Some of the files not collected are dynamically linked libraries,
executables not listed as dependencies (python, perl), and
configuration files (mail.rc).

To avoid naming conflicts, files which would otherwise have an identical
path are renamed when copied into the bundle:

All file paths are relative to the top level directory.

The makeflow file is rewritten with the new file locations and placed in the top level directory.

Files originally specified with an absolute path are placed into the top level directory.

The Makeflow language is very similar to Make, but it does have a few important differences that you should be aware of.

Get the Dependencies Right

You must be careful to accurately specify all of the files that a rule
requires and creates, including any custom executables. This is because
Makeflow requires all of this information to construct the environment for a
remote job. For example, suppose that you have written a simulation program
called mysim.exe that reads calib.data and then produces an
output file. The following rule won't work, because it doesn't inform Makeflow
what files are needed to execute the simulation:
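
output.txt:
	./mysim.exe calib.data > output.txt

(The mysim.exe command line here is illustrative.)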

However, the following is correct, because the rule states all of the files
needed to run the simulation. Makeflow will use this information to construct
a batch job that consists of mysim.exe and calib.data and
uses it to produce output.txt:
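
output.txt: mysim.exe calib.data
	./mysim.exe calib.data > output.txt

(The same illustrative command, now with mysim.exe and calib.data declared as dependencies.)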

Note that when a directory is specified as an input dependency, it
means that the command relies on the directory and all of its contents.
So, if you have a large collection of input data, you can place it
in a single directory, and then simply give the name of that directory.

No Phony Rules

For a similar reason, you cannot have "phony" rules that don't actually
create the specified files. For example, it is common practice to define a
clean rule in Make that deletes all derived files. This doesn't make
sense in Makeflow, because such a rule does not actually create a file named
clean. Instead use the -c option as shown above.

Just Plain Rules

Makeflow does not support all of the syntax that you find in various
versions of Make. Each rule must have exactly one command to execute. If you
have multiple commands, simply join them together with semicolons. Makeflow
allows you to define and use variables, but it does not support pattern rules,
wildcards, or special variables like $< or $@. You simply
have to write out the rules longhand, or write a script in your favorite
language to generate a large Makeflow.

Local Job Execution

Certain jobs don't make much sense to distribute. For example, if you have
a very fast running job that consumes a large amount of data, then it should
simply run on the same machine as Makeflow. To force this, simply add the word
LOCAL to the beginning of the command line in the rule.
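
For example (an illustrative rule):

summary.txt: bigdata.out postprocess.exe
	LOCAL ./postprocess.exe bigdata.out > summary.txt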

Rule Lexical Scope

Variables in Makeflow have global scope, that is, once defined, their value
can be accessed from any rule. Sometimes it is useful to define a variable
locally inside a rule, without affecting the global value. In Makeflow, this
can be achieved by defining the variables after the rule's requirements, but
before the rule's command, and prepending the name of the variable with
@, as follows:
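
out.2: in.2 mysim.exe
	@MEMORY=8000
	./mysim.exe in.2 > out.2

(An illustrative rule: MEMORY is set to 8000 for this rule only, leaving the global value unchanged.)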

Environment Variables

Environment variables can be defined with the export keyword inside a workflow. Makeflow will communicate explicitly named environment variables
to remote batch systems, where they will override whatever local setting is present. For example, suppose you want to modify the PATH for every job in the makeflow:
export PATH=/opt/tools/bin:${PATH}
If no value is given, then the current value of the environment variable is passed along to the job:
export USER

When the underlying batch system supports it,
Makeflow supports "Remote Renaming" where the name
of a file in the execution sandbox is different than
the name of the same file in the DAG.
This is currently supported by the Work Queue, Condor, and Amazon batch systems.

For example, local_name->remote_name indicates that the file local_name is called remote_name
in the remote system. Consider the following example:
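
b.out: a.in myprog
	LOCAL ./myprog a.in > b.out

c.out->out: a.in->in1 b.out myprog->prog
	./prog in1 b.out > out

(A sketch consistent with the description that follows; the commands themselves are illustrative.)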

The first rule runs locally, using the executable myprog and the
local file a.in to locally create b.out. The second rule
runs remotely, but the remote system expects a.in to be named
in1, c.out to be named out, and so on. Note that we
did not need to rename the file b.out. Without remote file renaming,
we would have to create either a symbolic link, or a copy of the files with the
expected correct names.

Nested Makeflows

One Makeflow can be nested inside of another by writing a rule
with the following syntax:
output-files: input-files
MAKEFLOW makeflow-file [working-dir]
The input and output files are specified as usual, describing the files
consumed and created by the child makeflow as a whole. Then, the MAKEFLOW keyword introduces the child makeflow specification, and an
optional working directory in which the makeflow will be executed.
If not given, the current working directory is assumed.

Makeflow has the capability of automatically setting the cores, memory,
and disk space requirements to the underlying batch system (currently this only
works with Work Queue and Condor). Jobs are grouped into job categories,
and jobs in the same category have the same cores, memory, and disk
requirements.

Job categories and resources are specified with variables. Jobs are assigned
to the category named in the value of the variable CATEGORY. Likewise, the
values of the variables CORES, MEMORY (in MB), and DISK (in MB) describe the
resource requirements for the category specified in CATEGORY.

Jobs without an explicit category are assigned to default. Jobs in
the default category get their resource requirements from the value of the
environment variables CORES, MEMORY, and DISK.

Resources Specified
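
For example (a sketch with illustrative categories, programs, and file names):

CATEGORY="preprocess"
CORES=1
MEMORY=512
DISK=1024

stage1.out: stage1.exe input.dat
	./stage1.exe input.dat > stage1.out

CATEGORY="simulate"
CORES=8
MEMORY=4096
DISK=8192

stage2.out: stage2.exe stage1.out
	./stage2.exe stage1.out > stage2.out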

Makeflow supports a JSON workflow format in addition to the Make-inspired one.
To use the JSON workflow format, invoke Makeflow as
makeflow --json input.json
An example JSON workflow is available here.
This workflow is equivalent to the convert example above.
In general, the Make-style format is easier for humans to read, while the JSON format is simpler to parse and generate.
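
For illustration, a minimal JSON workflow with a single hypothetical rule might look like this:

{
    "rules": [
        {
            "command": "./simulate.exe input.dat > output.txt",
            "inputs": [ "simulate.exe", "input.dat" ],
            "outputs": [ "output.txt" ]
        }
    ]
}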

Makeflow can also evaluate a workflow as JX.
JX is a superset of JSON, so this evaluation step will leave normal JSON unchanged.
Invoking Makeflow with the --jx option will evaluate the input workflow from JX to JSON before parsing,
allowing for more compact workflow representations.
The JSON example above was evaluated from this JX workflow.

All Objects are assumed to have String keys.
All keys are optional unless otherwise specified.

A workflow is encoded as a JSON object. Makeflow interprets the following keys.
All workflow-level keys are optional, though a workflow with no production rules does nothing.

"categories": Object<Category>

"default_category": String

"environment": Object<String>

"rules": Array<Rule>

Each rule can specify a category for resource requirements.
The "categories" field defines the categories used in the Makeflow (see the Categories section for details).

"default_category" is the category used if none is specified for a rule.
If not provided, the default category is "default".
This string value SHOULD match one of the keys of the "categories" object.
If there is no corresponding category defined, default values will be filled in.

"environment" specifies environment variables to be set when executing all rules.

"rules" is an unordered collection of rules comprising the Makeflow.
Each entry corresponds to a single rule in the traditional Make style.
The contents of each Rule are discussed in the corresponding section.

Categories

A Category is encoded as a JSON object with the following keys.

"environment": Object<String>

"resources": Resources

"environment" gives environment variables to set when executing rules in this category.
These values override workflow-level environment variables.

Rules

A Rule is encoded as a JSON object with the following keys.

"inputs": Array<File>

"outputs": Array<File>

"command": String

"local_job": Boolean

"category": String

"resources": Resources

"allocation": String

"environment": Object<String>

"inputs" and "outputs" specify the files required and produced, respectively, by the rule.
Makeflow will ensure that all inputs either exist or are created before executing the rule.
After execution finishes, Makeflow will ensure that all outputs really exist.

Every rule requires a "command"
which is a string command to run in a shell.

If "local_job" is specified, Makeflow will only execute the rule locally, i.e. on the host where Makeflow is running.
Local jobs are not scheduled on remote resources such as batch systems.

"category" specifies the category of a rule for resource allocation purposes.

"resources" specifies the specific resource requirements of this rule.
These values override those specified in the rule's category.