GC3Apps provides two scripts to drive execution of applications
(protocols, in Rosetta terminology) from the Rosetta
bioinformatics suite.

The purpose of grosetta and gdocking is to
execute several concurrent runs of minirosetta or docking_protocol
on a set of input files, and collect the generated output. These runs
are performed in parallel using every available GC3Pie resource, up to
a limit that can be configured with the -J command-line option. You
can control how many runs should be executed and select which output
files you want from each one.

Note

The grosetta and gdocking scripts are very
similar in usage. In the following, whatever is written about
grosetta applies to gdocking as well; the
differences will be pointed out on a case-by-case basis.

In more detail, grosetta does the following:

1. Reads the session (specified on the command line with the
   --session option) and loads all stored jobs into memory.
   If the session directory does not exist, one will be created with
   empty contents.

2. Scans the input file names given on the command line, and generates
   a number of identical computational jobs, all running the same
   Rosetta program on the same set of input files. The objective is
   to compute a specified number P of decoys of any given PDB file.

   The number P of wanted decoys can be set with the
   --total-decoys option (see below). The option
   --decoys-per-job sets the number of decoys that each
   computational job computes; this should be an estimate based on
   the maximum allowed run time of each job and the time taken by the
   Rosetta protocol to compute a single decoy.

3. Updates the state of all existing jobs, collects output from
   finished jobs, and submits new jobs generated in step 2.

4. Prints a summary table of all known jobs. (To control
   the amount of printed information, see the -l command-line
   option in the Introduction to session-based scripts section.)

5. If the -C command-line option was given (see below), waits
   the specified number of seconds, and then goes back to step 3.

The program grosetta exits when all jobs have run to
completion, i.e., when the wanted number of decoys have been
computed.
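The update, report, wait cycle described above can be pictured with a toy loop. This is only a sketch under stated assumptions, not grosetta's implementation: `manage_session`, `NEXT_STATE` and the `NEW` starting state are hypothetical names invented for the illustration, job states advance by one step per pass instead of being queried from a real resource, and the printed line is not grosetta's actual report format.

```python
import time

# Toy state progression. NEW is a made-up starting state for this sketch;
# SUBMITTED and TERMINATED are the states named in this document.
NEXT_STATE = {
    "NEW": "SUBMITTED",
    "SUBMITTED": "TERMINATED",
    "TERMINATED": "TERMINATED",
}


def manage_session(jobs, interval=0, sleep=time.sleep):
    """Advance every job once per pass until all are TERMINATED.

    `jobs` maps a job name to its state and is updated in place.
    Returns the number of passes that were needed.
    """
    passes = 0
    while True:
        # Step 3 (simulated): update the state of all existing jobs.
        for name in list(jobs):
            jobs[name] = NEXT_STATE[jobs[name]]
        passes += 1

        # Step 4 (simulated): print a one-line summary.
        done = sum(1 for state in jobs.values() if state == "TERMINATED")
        print(f"TERMINATED {done}/{len(jobs)}")

        # Exit once every job has run to completion.
        if done == len(jobs):
            return passes

        # Step 5: wait the requested number of seconds, then repeat.
        sleep(interval)
```

Passing `interval=0` makes the loop run without pausing, which is convenient for trying the sketch out; a real polling interval would be the value given to -C.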

Execution can be interrupted at any time by pressing Ctrl+C.
If the execution has been interrupted, it can be resumed at a later
stage by calling grosetta with exactly the same
command-line options.

The gdocking program works in exactly the same way, with
the important exception that gdocking uses a separate
Rosetta docking_protocol program invocation per input file.

The 1st argument is the flags file, containing options to pass to
every executed Rosetta program;

then follows any number of input files (copied from your PC to the execution site);

then a literal colon character :;

finally, you can list any number of output file patterns (copied
back from the execution site to your PC); wildcards (e.g., *.pdb)
are allowed, but you must enclose them in quotes. Note that:

you can omit the output files: the default is "*.pdb" "*.sc" "*.fasc"

if you omit the output file patterns, omit the colon as well
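To make the argument layout concrete, here is a minimal sketch of how such a command line could be split into its three parts. It is not grosetta's actual parser: `split_arguments` and `DEFAULT_OUTPUT_PATTERNS` are hypothetical names, and falling back to the defaults when a colon is present but no pattern follows it is an assumption of this sketch.

```python
# Default output patterns, as listed in the text above.
DEFAULT_OUTPUT_PATTERNS = ["*.pdb", "*.sc", "*.fasc"]


def split_arguments(args):
    """Split positional arguments into (flags_file, inputs, patterns).

    Layout: FLAGS_FILE INPUT... [ ':' PATTERN... ]
    """
    if not args:
        raise ValueError("a flags file is required as the first argument")

    flags_file, rest = args[0], list(args[1:])

    if ":" in rest:
        # Everything before the colon is an input file,
        # everything after it is an output pattern.
        colon = rest.index(":")
        inputs, patterns = rest[:colon], rest[colon + 1:]
    else:
        inputs, patterns = rest, []

    # Assumption of this sketch: no patterns means "use the defaults".
    return flags_file, inputs, patterns or list(DEFAULT_OUTPUT_PATTERNS)
```

For instance, `split_arguments(["flags", "1bjpA.pdb", ":", "*.pdb"])` yields the flags file `flags`, one input file, and the single pattern `*.pdb`. On a real shell the pattern has to be quoted so that it reaches the program unexpanded.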

Example 1. The following command-line invocation uses
grosetta to run minirosetta on the molecule files
1bjpA.pdb, 1ca7A.pdb, and 1cgqA.pdb. The flags file
(1st command-line argument) is a text file containing options to pass
to the actual minirosetta program. Additional input files are
specified on the command line between the flags file and the PDB
input files.

$ grosetta flags alignment.filt query.fasta query.psipred_ss2 boinc_aaquery03_05.200_v1_3.gz boinc_aaquery09_05.200_v1_3.gz 1bjpA.pdb 1ca7A.pdb 1cgqA.pdb

You can see that the listing of output patterns has been omitted,
so grosetta will use the default and retrieve all
*.pdb, *.sc and *.fasc files.

There will be a number of identical jobs being executed as a result
of a grosetta or gdocking invocation; this
number depends on the ratio of the values given to options -P
and -p:

-P NUM, --total-decoys NUM

Compute NUM decoys per input file.

-p NUM, --decoys-per-job NUM

Compute NUM decoys in a single job (default: 1). This
parameter should be tuned so that the running time of
a single job does not exceed the maximum wall-clock
time (see the --wall-clock-time command-line option in
Introduction to session-based scripts).

If you omit -P and -p, they both default to 1, i.e.,
one job will be created (as in Example 1 above).
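The relation between the two options is plain arithmetic: the number of jobs is the total number of decoys divided by the decoys per job, rounded up, with the last job taking the remainder. The following sketch illustrates that arithmetic only; `split_decoys` is a hypothetical helper, not part of grosetta, and the values 5 and 2 are illustrative.

```python
def split_decoys(total_decoys=1, decoys_per_job=1):
    """Return a list with the number of decoys assigned to each job."""
    if total_decoys < 1 or decoys_per_job < 1:
        raise ValueError("both values must be positive integers")

    full_jobs, remainder = divmod(total_decoys, decoys_per_job)
    plan = [decoys_per_job] * full_jobs
    if remainder:
        # The last job computes whatever is left over.
        plan.append(remainder)
    return plan


print(split_decoys())      # [1]
print(split_decoys(5, 2))  # [2, 2, 1]
```

With both options left at their default of 1 the plan contains a single job; asking for 5 decoys at 2 per job gives three jobs, the last of which computes only one decoy.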

Example 2. The following command-line invocation will run 3 parallel
instances of minirosetta, each of which generates 2 decoys (except the
last one, which only generates 1 decoy) of the molecule described in
file 1bjpA.pdb:

In this example, job information is stored into session
SAMPLE_SESSION (see the documentation of the --session option
in Introduction to session-based scripts). The command above creates the jobs,
submits them, and finally prints the following status report:

Note that the status report counts the number of jobs in the
session, not the total number of decoys being generated. (Feel
free to report this as a bug.)

Calling grosetta over and over again will result in the same jobs
being monitored; to create new jobs, change the command line and raise
the value for -P or -p. (To completely erase an existing
session and start over, use the --new-session option, as per
session-based script documentation.)

The -C option tells grosetta to continue running until
all jobs have finished running and the output files have been
correctly retrieved. On successful completion, the command given in
Example 2 above would print:

The three jobs are named 0--1, 2--3 and 4--5 (you could
see this by passing the -l option to grosetta); each of
these jobs will create an output directory named after the job.

In general, grosetta jobs are named N--M with
N and M being two integers from 0 up to the value specified with
option --total-decoys. Jobs generated by gdocking are
instead named after the input file, with a .N--M suffix
added.
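As an illustration of the naming scheme, the sketch below reproduces the three names quoted above (0--1, 2--3 and 4--5) from 5 total decoys at 2 decoys per job. It is inferred from those names only and is not taken from the grosetta source: `job_names` is a hypothetical helper, and the `prefix` argument stands in for whatever part of the input file name gdocking actually uses.

```python
def job_names(total_decoys, decoys_per_job, prefix=None):
    """Generate N--M job names, one per block of decoys.

    The upper bound M is simply N + decoys_per_job - 1, which is how
    the name 4--5 arises for the last job in the example above.
    """
    names = []
    for start in range(0, total_decoys, decoys_per_job):
        name = f"{start}--{start + decoys_per_job - 1}"
        if prefix is not None:
            # gdocking-style name: input file name plus a .N--M suffix.
            name = f"{prefix}.{name}"
        names.append(name)
    return names


print(job_names(5, 2))  # ['0--1', '2--3', '4--5']
```

Calling `job_names(5, 2, prefix="1bjpA")` would give names such as `1bjpA.0--1`; whether gdocking keeps the file extension in the prefix is not stated here, so treat that form as a guess.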

For each job, the set of output files is automatically retrieved and
placed in the locations described below.

Note

The naming and contents of output files differ between
grosetta and gdocking. Refer to the
appropriate section below!

The minirosetta.static.log file contains the output log of the
minirosetta execution. For each of the S_*.pdb files above, a
line like the following should be present in the log file (the file
name and number of elapsed seconds will of course vary!):

The -s example option tells grosetta to store
information about the computational jobs in the example.jobs
directory.

The -C 120 option tells grosetta to update job state
every 120 seconds; output from finished jobs is retrieved and new jobs
are submitted at the same interval.

The -P 1 and -p 1 options set the total number of decoys to
compute and the maximum number of decoys that a single computational
job can handle. These values can be arbitrarily high (however, the -p
value should be such that the computational job can actually compute
that many decoys in the allotted wall-clock time).
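A rough way to pick a value for -p is to divide the wall-clock time allotted to a job by the time one decoy takes, leaving some headroom. The sketch below shows that estimate; `max_decoys_per_job` and its `safety_factor` parameter are assumptions made for the illustration, and the time per decoy has to be measured for your own protocol and inputs.

```python
def max_decoys_per_job(wall_clock_seconds, seconds_per_decoy, safety_factor=0.8):
    """Estimate how many decoys fit in one job's wall-clock time.

    Only a fraction `safety_factor` of the allotted time is budgeted,
    to leave room for start-up, file transfer and slower-than-usual runs.
    At least 1 is always returned, since a job computes at least one decoy.
    """
    if seconds_per_decoy <= 0:
        raise ValueError("seconds_per_decoy must be positive")
    usable_seconds = wall_clock_seconds * safety_factor
    return max(1, int(usable_seconds // seconds_per_decoy))


# 8 hours of wall-clock time, 25 minutes per decoy:
print(max_decoys_per_job(8 * 3600, 25 * 60))  # 15
```

If a single decoy takes longer than the usable time, the function still returns 1; in that case the wall-clock limit itself needs to be raised.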

The above command will start by printing a status report like the
following:

Status of jobs in the 'example.csv' session:
SUBMITTED 1/1 (100.0%)

It will continue printing an updated status report every 120 seconds
until the requested number of decoys (set by the -P option) has
been computed.

In GC3Pie terminology, when a job is finished and its output has been
successfully retrieved, the job is marked as TERMINATED:

We now show how one can obtain the same result by calling
grosetta multiple times (there could be hours of
interruption between one invocation and the next one).

Note

This is not the typical mode of operating with grosetta,
but may still be useful in certain settings.

Create a session (1 job only, since no -P option is given); the
session name is chosen with the -s (short for --session)
option. Take care to reuse the same session name
in subsequent commands.

Now we call grosetta again, and request that 3 decoys be
computed starting from a single PDB file (--total-decoys 3 on
the command line). Since we are submitting a single PDB file, the
3 decoys will be computed all in a single run, so the
--decoys-per-job option will have value 3.

Note that 3 jobs were submitted: grosetta interprets the
--total-decoys option globally, and adds one job to compute the
2 missing decoys from the file set from step 1. (This is currently
a limitation of grosetta.)

From here on, one could simply run grosetta -C 120 and let it
manage the session until completion of all jobs, as in the example
"Manage a set of jobs from start to end" above. To illustrate
several command-line options of grosetta, we instead show how to
manage the session through repeated separate invocations.

The next step is to monitor the session, so we add the command-line
option -l, which tells grosetta to list all the jobs
with their status. Also note that we keep the -s example
option to tell grosetta that we would like to operate on
the session named example.

All non-option arguments can be omitted: as long
as the total number of decoys is unchanged, they’re not needed.