OOMMF 2D Micromagnetic Solver Batch System

The OOMMF Batch System (OBS) provides a scriptable interface
to the same micromagnetic solver engine used by
mmSolve2D, in the
form of three Tcl applications
(batchmaster, batchslave, and
batchsolve) that provide support for complex job scheduling.
All OBS script files are in the OOMMF distribution directory
app/mmsolve/scripts.

Unlike much of the OOMMF package, the OBS is meant to be
driven primarily from the command line or a shell (batch) script.
OBS applications are launched from the command line using the
bootstrap application.

Whether or not to explicitly call exit at bottom of batchsolve.tcl.
When launched from the command line, the default is to exit after
solving the problem in file. When sourced into another script,
like batchslave.tcl, the default is to wait for the caller script
to provide further instructions.

-interface <0|1>

Whether to register with the account service
directory application, so
that mmLaunch can provide
an interactive interface. Default = 1 (do register), which will
automatically start the account service directory and
host service directory applications as necessary.

-start_paused

Pause solver after loading problem.

-end_paused

Pause solver and enter event loop at bottom of batchsolve.tcl
rather than just falling off the end (the effect of which will
depend on whether or not Tk is loaded).

-restart <0|1>

Determines solver behavior when a new problem is loaded. If 1, then
the solver will look for basename.log
and basename*.omf files to
restart a previous run from the last
saved state (where basename is the "Base Output
Filename" specified in the input problem specification). If these
files cannot be found, then a warning is issued and the solver falls
back to the default behavior (equivalent to -restart 0) of
starting the problem from scratch. The specified -restart
setting holds for all problems fed to the solver, not just
the first.

file

Immediately load and run the specified MIF 1.1
file.
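
The -restart lookup described above amounts to a file-existence check. The following Tcl sketch illustrates the idea only. The proc name CanRestart is hypothetical, and this is not the code used in batchsolve.tcl, just an approximation of the documented behavior.

```tcl
# Hypothetical helper, not part of OOMMF: report whether the files needed
# for "-restart 1" appear to be present for a given base output filename.
proc CanRestart {basename} {
    # The log file from the previous run must exist ...
    if {![file exists "$basename.log"]} {
        return 0
    }
    # ... together with at least one saved magnetization state.
    # Note: glob treats characters such as * ? [ ] in basename as patterns.
    set states [glob -nocomplain -- "${basename}*.omf"]
    return [expr {[llength $states] > 0}]
}

if {[CanRestart taskA]} {
    puts "restart files found"
} else {
    puts "warning: no restart files, starting from scratch"
}
```

A check of this kind mirrors the fallback described for -restart 1: when either kind of file is missing, the run starts from scratch.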

The input file named by file should contain a
Micromagnetic Input Format 1.1 problem
description, such as produced by
mmProbEd. The batch solver
searches several directories for this file, including the current
working directory, the data and
scripts subdirectories, and parallel directories relative to the
directories app/mmsolve and
app/mmpe in the OOMMF distribution. Refer to the
mif_path variable in batchsolve.tcl for the complete list.

If -interface is set to 1 (enabled), batchsolve registers
with the account service directory
application, and mmLaunch will be able to provide an interactive
interface. Using this interface, batchsolve may be controlled in
a manner similar to mmSolve2D. The
interface allows you to pause,
un-pause, and terminate the current
simulation, as well as to attach data display applications to monitor
the solver's progress. If more interactive control is needed,
mmSolve2D should be used.

If -interface is 0 (disabled), batchsolve does not register,
leaving it without an interface, unless it is sourced into another
script (e.g., batchslave.tcl) that
arranges for an interface on the behalf of batchsolve.

Use the -start_paused switch to monitor the progress of
batchsolve from the very start of a simulation. With this
switch the solver will be paused immediately after loading the
specified MIF file, so you can bring up the interactive interface
and connect display applications before the simulation begins. Start the
simulation by selecting the Run command from the interactive
interface. This option cannot be used if -interface is disabled.

The -end_paused switch ensures that the solver does
not automatically terminate after completing the specified
simulation. This is not generally useful, but may find application
when batchsolve is called from inside a Tcl-only wrapper
script.

Note on Tk dependence: If a problem is loaded that uses a
bitmap mask file, and
if that mask file is not in the PPM P3 (text) format, then
batchsolve will launch any2ppm to convert it into
the PPM P3 format. Since any2ppm
requires Tk, at the
time the mask file is read a valid display must be available. See the
any2ppm documentation for details.

Output
The output may be changed by a Tcl wrapper
script, but the default
output behavior of batchsolve is to write tabular text data and
the magnetization state at the control point for each applied field
step. The tabular data are appended to the file basename.odt, where basename is the
"Base Output Filename" specified in the input
MIF 1.1 file. See the routine GetTextData in
batchsolve.tcl for details, but at present the output consists of
the solver iteration count, nominal applied field
B, reduced average
magnetization m, and total
energy. This output is in the ODT file format.

The magnetization data are written to a series of OVF (OOMMF Vector
Field) files,
basename.fieldnnnn.omf, where nnnn
starts at 0000 and is incremented at each applied
field step. (The ASCII text header inside each
file records the nominal applied field at that step.) These files are
viewable using mmDisp.
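
When post-processing a run it can be handy to build these file names in a script. A minimal Tcl sketch, assuming the four-digit zero-padded pattern described above (FieldFileName is a hypothetical helper, not part of OOMMF):

```tcl
# Hypothetical helper: build the name of the magnetization file written
# at a given applied field step.
proc FieldFileName {basename step} {
    # %04d zero-pads the step number to four digits.
    return [format "%s.field%04d.omf" $basename $step]
}

puts [FieldFileName taskA 0]
puts [FieldFileName taskA 12]
```

Because the step number is zero-padded, a plain lsort of the file names also orders them by field step.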

The solver also automatically appends the input problem specification
and miscellaneous runtime information to the log file
basename.log.

Programmer's interface
In addition to directly launching batchsolve from the command
line, batchsolve.tcl may also be sourced into another Tcl script
that provides additional control structures. Within the scheduling
system of OBS,
batchsolve.tcl is sourced into batchslave, which provides
additional control structures that support scheduling control by
batchmaster.
There are several variables and routines
inside batchsolve.tcl that may be accessed and redefined from such
a wrapper script to provide enhanced functionality.

Global variables

mif

A Tcl handle to a global mms_mif
object holding the problem description defined by the input
MIF 1.1 file.

Refer to the source code and sample scripts for details on manipulation
of these variables.

Batchsolve procs
The following Tcl procedures are designed for external use and/or
redefinition:

SolverTaskInit

Called at the start of each task.

BatchTaskIterationCallback

Called after each iteration in the simulation.

BatchTaskRelaxCallback

Called at each control point reached in the simulation.

SolverTaskCleanup

Called at the conclusion of each task.

FindFile

Searches the directories specified by the global variable
search_path for a specified file. The default
SolverTaskInit proc uses this routine to locate the requested
input MIF file.

SolverTaskInit and SolverTaskCleanup accept an arbitrary
argument list (args), which is copied over from the args
argument to the BatchTaskRun and BatchTaskLaunch procs in
batchsolve.tcl. Typically one copies the default procs (as needed)
into a task script, and makes appropriate
modifications. You may (re-)define these procs either before or after
sourcing batchsolve.tcl.

Overview
The OBS supports complex scheduling of multiple batch jobs
with two applications, batchmaster and batchslave.
The user launches batchmaster and provides it with
a task script. The task script is a
Tcl script that describes the set of tasks for batchmaster
to accomplish. The work is actually done by instances of
batchslave that are launched by batchmaster.
The task script may be
modeled after the included simpletask.tcl or multitask.tcl sample scripts.

The OBS has been designed to control multiple sequential and
concurrent micromagnetic simulations, but
batchmaster and batchslave are completely general
and may be used to schedule other types of jobs as well.

host

specifies the network address for the master to use (default is localhost),

port

is the port address for the master (default is 0, which
selects an arbitrary open port).

When batchmaster is run, it
sources the task script. Tcl commands in the task script
should modify the global object $TaskInfo
to inform the master what tasks to perform and
optionally how to launch slaves to perform those tasks.
The easiest way to create a task script is to modify one of the
included example scripts. More detailed instructions are in
the Batch task
scripts section.

After sourcing the task script, batchmaster launches all the
specified slaves, initializes each with a slave initialization script,
and then feeds tasks sequentially from the task list to the slaves.
When a slave completes a task it reports back to the master and is given
the next unclaimed task. If there are no more tasks, the slave is shut
down. When all the tasks are complete, the master prints a summary of
the tasks and exits.

When the task script requests the launching and controlling of jobs off
the local machine, with slaves running on remote machines, then the
command line argument host must be set to the local machine's
network name, and the $TaskInfo methods AppendSlave and
ModifyHostList will need to be called from inside the task script.
Furthermore, OOMMF does not currently supply any methods for launching
jobs on remote machines, so a task script which requests the launching
of jobs on remote machines requires a working
rsh command or
equivalent.

The name of an optional script to source (which actually performs the
task the slave is assigned), and any arguments it needs.

In normal operation, the user does not launch
batchslave. Instead, instances of batchslave are
launched by batchmaster as instructed by a task script.
Although batchmaster may launch any slaves requested
by its task script, by default it launches instances of
batchslave.

The function of batchslave is to make a connection to
a master program, source the auxscript and pass it the
list of arguments aux_arg .... Then it receives commands
from the master, and evaluates them, making use of the
facilities provided by auxscript. Each command is typically a
long-running one, such as solving a complete micromagnetic problem.
When each command is complete, the batchslave reports back to
its master program, asking for the next command. When the master
program has no more commands batchslave terminates.

Inside batchmaster, each instance of batchslave is
launched by evaluating a Tcl command. This command is called
the spawn command, and it may be redefined by the task script
in order to completely control which slave applications are launched
and how they are launched. When batchslave is to be launched,
the spawn command might be:

The Tcl command exec is used to launch subprocesses. When
the last argument to exec is &, the subprocess runs in
the background. The rest of the spawn command should look familiar
as the command line syntax for launching batchslave.

The example spawn command above cannot be completely provided by
the task script, however, because parts of it are only known
by batchmaster. Because of this, the task script should
define the spawn command using "percent variables", which are
substituted by batchmaster. Continuing the example, the task
script provides the spawn command:

batchmaster replaces %tclsh with the path to tclsh,
and %oommf with the path to the OOMMF bootstrap application.
It also replaces %connect_info with the five arguments from --
through $password that provide batchslave
the hostname and port where batchmaster is waiting for
it to report to, and the ID and password it should pass back.
In this example, the task script instructs batchslave to source the
file batchsolve.tcl and pass it the arguments -restart 1.
Finally, batchmaster always appends the argument & to
the spawn command so that all slave applications are launched in the
background.

The communication protocol between
batchmaster and batchslave is evolving and is not
described here. Check the source code for the latest details.

Batch Task Scripts

The application batchmaster
creates an instance of a BatchTaskObj object with
the name $TaskInfo. The task script uses
method calls to this object to set up tasks to be performed. The only
required call is to the AppendTask method, e.g.,

$TaskInfo AppendTask A "BatchTaskRun taskA.mif"

This method expects two arguments, a label for the task (here "A") and
a script to accomplish the task.
The script will be passed across a
network socket from
batchmaster to a slave application, and
then the script will be interpreted by the slave. In particular, keep
in mind that the file system seen by the script will be that of the
machine on which the slave process is running.

This example uses the default batchsolve.tcl procs to run the
simulation defined by the taskA.mif MIF 1.1
file. If you want to make changes to the MIF problem specifications on the fly, you will need to modify the default
procs. This is done by creating a slave initialization script, via the
call

$TaskInfo SetSlaveInitScript { <insert script here> }

The slave initialization script does global initializations, and also
usually redefines the SolverTaskInit proc; optionally the
BatchTaskIterationCallback, BatchTaskRelaxCallback and
SolverTaskCleanup procs may be redefined as well. At the start of
each task SolverTaskInit is called by BatchTaskRun (in
batchsolve.tcl), after each iteration
BatchTaskIterationCallback is executed, at each control
point BatchTaskRelaxCallback is
run, and at the end of each task SolverTaskCleanup is called.
SolverTaskInit and SolverTaskCleanup are passed the arguments
that were passed to BatchTaskRun. A simple SolverTaskInit
proc could be

This proc receives the exchange constant A
for this task on the argument list, and makes use of the global
variables mif and basename. (Both should be initialized in
the slave initialization script outside the SolverTaskInit proc.)
It then stores the requested value of A in the
mif object, sets up the base filename to use for
output, and opens a text file to which tabular
data will be appended. The handle to this text
file is stored in the global outtextfile, which is closed by the
default SolverTaskCleanup proc. A corresponding task script could
be

$TaskInfo AppendTask "A=13e-12 J/m" "BatchTaskRun 13e-12"

which runs a simulation with A set to
13e-12 J/m.
This example is taken from the multitask.tcl sample script. (For
commands accepted by mif objects, see the file mmsinit.cc.
Another object that can be gainfully manipulated is solver, which
is defined in solver.tcl.)
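
A proc matching the description above might look like the following sketch. It is not the listing from multitask.tcl. The mif subcommand names SetA and SetOutBaseName, the stand-in FakeMif proc, and the output name pattern are assumptions made for illustration; consult mmsinit.cc for the commands that mif objects actually accept.

```tcl
# Stand-in for an mms_mif handle so the sketch runs in a plain tclsh.
# The subcommand names SetA and SetOutBaseName are assumptions.
proc FakeMif {cmd value} {
    global mifState
    switch -- $cmd {
        SetA           { set mifState(A) $value }
        SetOutBaseName { set mifState(out) $value }
        default        { error "unknown mif command: $cmd" }
    }
}

# Globals that the slave initialization script would set up.
set mif FakeMif
set basename taskA

proc SolverTaskInit {args} {
    global mif basename outtextfile
    set A [lindex $args 0]              ;# exchange constant for this task
    set outbasename "$basename-A$A"     ;# one set of output files per value
    $mif SetA $A
    $mif SetOutBaseName $outbasename
    # Append mode, so tabular data accumulates in the same file. The
    # default SolverTaskCleanup is described as closing this handle.
    set outtextfile [open "$outbasename.odt" a]
}
```

Calling SolverTaskInit 13e-12 in this sketch records the value in the stand-in object and opens taskA-A13e-12.odt in the current directory for appending.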

If you want to run more than one task at a time, then the
$TaskInfo method AppendSlave will have to be invoked. This
takes the form

$TaskInfo AppendSlave <spawn count> <spawn command>

where <spawn command> is the command to launch the slave
process, and <spawn count> is the number of slaves to launch
with this command. (Typically <spawn count> should not be
larger than the number of processors on the target system.) The default
value for this item (which gets overwritten with the first call to
$TaskInfo AppendSlave) is

1 {Oc_Application Exec batchslave -tk 0 %connect_info batchsolve.tcl}

The Tcl command Oc_Application Exec is supplied by OOMMF and provides access to the same application-launching capability
that is used by the OOMMF
bootstrap application. Using a <spawn command> of
Oc_Application Exec instead of exec %tclsh %oommf
saves the spawning of an additional process.
The default <spawn command>
launches the batchslave
application, with connection information provided by batchmaster, and
using the auxscript batchsolve.tcl.

Before evaluating the <spawn command>, batchmaster
applies several percent-style substitutions useful in slave
launch scripts: %tclsh, %oommf, %connect_info, %oommf_root, and
%%. The first is the Tcl shell to use, the second is an absolute
path to the OOMMF bootstrap program on the master machine, the third
is connection information needed by the batchslave application, the
fourth is the path to the OOMMF root directory on the master machine,
and the last is interpreted as a single percent.
batchmaster automatically appends the argument
& to the
<spawn command> so that the slave applications
are launched in the background.
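
The percent-style substitution can be pictured with a short Tcl sketch built on string map. This is an illustration under stated assumptions, not the batchmaster implementation: the proc name ExpandSpawnCommand, the paths, and the connection values are all hypothetical.

```tcl
# Hypothetical helper: expand percent variables in a spawn command template.
# values is a flat key/value list with keys tclsh, oommf, oommf_root and
# connect_info.
proc ExpandSpawnCommand {template values} {
    array set v $values
    # Order matters for [string map]: at each position the first matching
    # key in the list wins, so %% and %oommf_root must precede %oommf.
    set map [list \
        %%            % \
        %tclsh        $v(tclsh) \
        %oommf_root   $v(oommf_root) \
        %oommf        $v(oommf) \
        %connect_info $v(connect_info)]
    set cmd [string map $map $template]
    # A trailing & makes exec run the slave in the background.
    return "$cmd &"
}

# Made-up values for illustration only.
set values [list \
    tclsh        /usr/bin/tclsh \
    oommf        /opt/oommf/oommf.tcl \
    oommf_root   /opt/oommf \
    connect_info "-- localhost 4567 0 secret"]

puts [ExpandSpawnCommand \
    {exec %tclsh %oommf batchslave -tk 0 %connect_info batchsolve.tcl -restart 1} \
    $values]
```

Listing %% first means that a literal %%oommf in a template comes out as %oommf rather than being expanded.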

To launch batchslave on a remote host, use rsh
in the spawn command, e.g.,

This example assumes tclsh is in the execution path on the remote
machine foo, and OOMMF is installed off of your home directory.
In addition, you will have to add the machine foo to the host
connect list with

$TaskInfo ModifyHostList +foo

and batchmaster must be run with the network interface specified
as the server host (instead of the default localhost), e.g.,

tclsh oommf.tcl batchmaster multitask.tcl bar

where bar is the name of the local machine.
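
Putting these pieces together, a task script for one remote slave might read as follows. This is a sketch, not one of the distributed sample scripts. The path oommf/oommf.tcl is a placeholder for the OOMMF bootstrap script under the remote home directory, and FakeTaskInfo is a hypothetical stand-in that is used only when the fragment is run outside batchmaster.

```tcl
# Under batchmaster, $TaskInfo already exists. This guard supplies a
# recording stand-in so the fragment can also be run in a plain tclsh.
if {![info exists TaskInfo]} {
    proc FakeTaskInfo {method args} {
        global calls
        lappend calls [linsert $args 0 $method]
    }
    set TaskInfo FakeTaskInfo
}

# Allow connections from the remote machine foo.
$TaskInfo ModifyHostList +foo

# One slave on foo, started through rsh. The path to oommf.tcl is a
# placeholder and must match the remote installation.
$TaskInfo AppendSlave 1 {exec rsh foo tclsh oommf/oommf.tcl \
    batchslave -tk 0 %connect_info batchsolve.tcl}

# A task for the slave to run.
$TaskInfo AppendTask A "BatchTaskRun taskA.mif"
```

Since the first call to AppendSlave overwrites the default slave entry, this fragment runs tasks only on foo; a second AppendSlave call would be needed to keep a slave on the local machine.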

This may seem a bit complicated, but the examples in the
next section should make things clearer.

Sample task scripts

The
first sample task script is a simple example that runs the
3 micromagnetic simulations described by the MIF 1.1 files
taskA.mif, taskB.mif and taskC.mif. It
is launched with the command

tclsh oommf.tcl batchmaster simpletask.tcl

This example uses the default slave launch script, so a single slave is
launched on the current machine, and the 3 simulations will be run
sequentially. Also, no slave initialization script is given, so the
default procs in batchsolve.tcl are used. Output will be magnetization
states and tabular data
at each control point, stored in
files on the local machine with base names as specified in the MIF files.

The
second sample task script builds on the previous example by
defining BatchTaskIterationCallback and
BatchTaskRelaxCallback procedures in the slave init script.
The first is set up to write tabular data every 10 iterations, while the
second writes tabular data on each control point event. The data is
written to the output file specified by the Base Output Filename
entry in the input MIF files. Note that there is no magnetization
vector field output in this example. This task script is launched in the
same way as the previous example, with the name of this task script in
place of simpletask.tcl.
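
The division of labor between the two callbacks can be sketched in plain Tcl. The iteration counter, the WriteRow proc, and the args parameter lists are stand-ins assumed for illustration; in a real slave init script the callbacks would obtain the iteration count from the solver and write through the batchsolve.tcl output routines.

```tcl
# Stand-ins so the sketch runs in a plain tclsh (hypothetical names).
set iteration 0
set rows {}
proc WriteRow {why} {
    global iteration rows
    lappend rows [list $iteration $why]
}

# Write a table row on every 10th iteration.
proc BatchTaskIterationCallback {args} {
    global iteration
    if {$iteration % 10 == 0} { WriteRow iteration }
}

# Write a table row whenever a control point is reached.
proc BatchTaskRelaxCallback {args} {
    WriteRow controlpoint
}

# Drive the callbacks by hand: 25 iterations, then one control point.
while {$iteration < 25} {
    incr iteration
    BatchTaskIterationCallback
}
BatchTaskRelaxCallback
puts $rows
```

In this run rows are recorded at iterations 10 and 20, plus one at the control point reached at iteration 25.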

The
third task script is a more complicated example
running concurrent processes on two
machines. This script should be run with the command

tclsh oommf.tcl batchmaster multitask.tcl bar

where bar is the name of the local machine.

Near the top of the multitask.tcl script several Tcl variables
(RMT_MACHINE through A_list) are defined; these are used
farther down in the script. The remote machine is specified as
foo, which is used in the $TaskInfo AppendSlave and
$TaskInfo ModifyHostList commands.

There are two AppendSlave commands, one to run two slaves on the
local machine, and one to run a single slave on the remote machine
(foo). The latter changes to a specified
working directory before
launching the batchslave application on the remote machine. (For
this to work you must have rsh configured properly. In the future
it may be possible to launch remote commands using the OOMMF account
server application, thereby lessening the reliance on system commands
like rsh.)

Below this the slave initialization script is defined. The Tcl regsub command is used to place the task script defined value of
BASEMIF into the init script template. The init script is run on
the slave when the slave is first brought up. It first reads the base
MIF file into a newly created mms_mif instance. (The MIF file
needs to be accessible by the slave process, irrespective of which
machine it is running on.) Then replacement SolverTaskInit and
SolverTaskCleanup procs are defined. The new SolverTaskInit
interprets its first argument as a value for the exchange constant
A. Note that this is different from the default
SolverTaskInit proc, which interprets its first argument as the
name of a MIF 1.1 file to load. With this task
script, a MIF file is read once when the slave is brought up, and then
each task redefines only the value of A for the simulation (and
corresponding changes to the output filenames and data table header).

Finally, the Tcl loop structure

foreach A $A_list {
$TaskInfo AppendTask "A=$A" "BatchTaskRun $A"
}

is used to build up a task list consisting of one task for each value
of A in A_list (defined at the top of the task script). For
example, the first value of A is 10e-13, so the first task
will have the label A=10e-13 and the corresponding script is
BatchTaskRun 10e-13. The value 10e-13 is passed on by
BatchTaskRun to the SolverTaskInit proc, which has been
redefined to process this argument as the value for A, as
described above.

There are 6 tasks in all, and 3 slave processes, so the first three
tasks will run concurrently in the 3 slaves. As each slave finishes
it will be given the next task, until all the tasks are complete.
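
The hand-off of six tasks to three slaves can be modeled with a toy Tcl simulation. It is only a sketch of the scheduling idea, not code from batchmaster: the proc SimulateDispatch, the task labels, and the task durations are made up for illustration.

```tcl
# Toy model of the dispatch order: tasks are handed out in list order, each
# to whichever slave becomes idle first. tasks is a flat list of
# label/duration pairs.
proc SimulateDispatch {tasks slaveCount} {
    # freeAt($i) is the time at which slave $i is next idle.
    for {set i 0} {$i < $slaveCount} {incr i} { set freeAt($i) 0 }
    set schedule {}
    foreach {label duration} $tasks {
        # Pick the slave that becomes idle earliest (lowest index on ties).
        set best 0
        for {set i 1} {$i < $slaveCount} {incr i} {
            if {$freeAt($i) < $freeAt($best)} { set best $i }
        }
        lappend schedule [list $label $best $freeAt($best)]
        set freeAt($best) [expr {$freeAt($best) + $duration}]
    }
    return $schedule
}

# Six tasks with made-up durations, three slaves.
set tasks {A 5 B 3 C 4 D 2 E 6 F 1}
foreach entry [SimulateDispatch $tasks 3] {
    puts "task [lindex $entry 0] -> slave [lindex $entry 1] at t=[lindex $entry 2]"
}
```

With these durations the first three tasks start at once, and each later task starts on whichever slave frees up first.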