supervisor

MODULE

supervisor

MODULE SUMMARY

A Behaviour for Supervision of Processes.

DESCRIPTION

A supervisor is a process that supervises
child processes. A child can be another supervisor or a
worker process. A supervisor is always linked to its
children. This structure is used to build a supervision
tree, a hierarchy of processes that structures an
application for fault tolerance.

The basic idea of a supervisor is that it keeps its children
alive. If a child terminates abnormally, it is restarted. There
are three basic restart strategies for supervisors:
one-for-one, one-for-all, and rest-for-one.

If a child in a one-for-one
supervisor dies abnormally, it is restarted.

If a child in
a one-for-all supervisor dies, the supervisor shuts down all of
the other children and then restarts all children. This strategy
can be used when there are dependencies among the
children.

If a child in a rest-for-one supervisor dies,
all children started after the faulty child are shut
down, then restarted. The children started before the faulty
child are not affected.
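
The three basic strategies differ only in which children are
restarted when one child dies. The following is a minimal sketch
of that selection, not the supervisor's own code. The module name
strategy_sketch and the function affected/3 are hypothetical, and
children are represented simply by their names in start order.

```erlang
-module(strategy_sketch).
-export([affected/3]).

%% affected(Strategy, Failed, Children) -> [Name]
%% Children lists the child names in the order they were started.
%% The result lists the children that are restarted when Failed dies.
affected(one_for_one, Failed, _Children) ->
    %% Only the faulty child is restarted.
    [Failed];
affected(one_for_all, _Failed, Children) ->
    %% Every child is shut down and restarted.
    Children;
affected(rest_for_one, Failed, Children) ->
    %% The faulty child and all children started after it.
    lists:dropwhile(fun(Name) -> Name =/= Failed end, Children).
```

For children started in the order a, b, c, a failure of b affects
only b under one_for_one, all three under one_for_all, and b and c
under rest_for_one.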

A fourth restart strategy, simple-one-for-one, is a variant of
the ordinary one-for-one. It should be used for dynamic
processes of the same type, for example processes that each
represent a call. Compared to one-for-one, it has less
overhead when starting dynamic children.

Each child can be one of three types: permanent,
transient, or temporary. A permanent child is
always restarted when it dies. A transient child is restarted if
it dies abnormally, and a temporary child is never restarted.
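
The rule for the three child types can be written as a small
decision function. This is an illustrative sketch only. The
module restart_type_sketch and the function should_restart/2 are
hypothetical, and the sketch assumes that the exit reasons normal
and shutdown count as normal termination.

```erlang
-module(restart_type_sketch).
-export([should_restart/2]).

%% should_restart(RestartType, ExitReason) -> boolean()
%% Assumption: normal and shutdown are the non-abnormal exit reasons.
should_restart(permanent, _Reason) -> true;    % always restarted
should_restart(temporary, _Reason) -> false;   % never restarted
should_restart(transient, normal) -> false;    % normal exit: no restart
should_restart(transient, shutdown) -> false;  % orderly shutdown: no restart
should_restart(transient, _Reason) -> true.    % abnormal exit: restart
```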

The supervisors have a built-in mechanism to prevent situations
where a child dies, is restarted by the supervisor, only to die
again for the same reason, is restarted again, and so on. It
limits the number of restarts which can occur in a given time
interval. This is determined by the values of two parameters,
MaxR and MaxT. If more than MaxR restarts
are performed in the last MaxT seconds, then the
supervisor shuts down all the children which it supervises and
then dies.
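
The restart limit can be pictured as a sliding window over the
times of recent restarts. The sketch below is one possible way to
express the rule and is not taken from the supervisor
implementation. The module intensity_sketch and the function
add_restart/4 are hypothetical, times are plain integers in
seconds, and treating a restart exactly MaxT seconds old as still
inside the window is an assumption.

```erlang
-module(intensity_sketch).
-export([add_restart/4]).

%% add_restart(Now, Restarts, MaxR, MaxT) -> {ok | shutdown, Recent}
%% Restarts holds the times (in seconds) of earlier restarts.
%% Recent holds the restarts, including the one at Now, that fall
%% within the last MaxT seconds.
add_restart(Now, Restarts, MaxR, MaxT) ->
    Recent = [T || T <- [Now | Restarts], Now - T =< MaxT],
    case length(Recent) > MaxR of
        true -> {shutdown, Recent};   % more than MaxR restarts: give up
        false -> {ok, Recent}         % still within the limit
    end.
```

With MaxR = 2 and MaxT = 10, restarts at seconds 0 and 1 are
allowed, a third restart at second 2 exceeds the limit, and a
restart at second 20 is allowed again because the earlier ones
have left the window.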

An instance of the supervisor behaviour can be debugged using
the module sys.
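
As a brief illustration of debugging through sys, the sketch
below starts a supervisor without children, switches tracing on
and off, and fetches its status. The module debug_sup and the
function demo/0 are hypothetical names chosen for this example.

```erlang
-module(debug_sup).
-behaviour(supervisor).
-export([start_link/0, init/1, demo/0]).

start_link() ->
    supervisor:start_link(?MODULE, []).

%% A supervisor with no children, enough to show the sys calls.
init([]) ->
    {ok, {{one_for_one, 1, 5}, []}}.

%% Starts the supervisor (linked to the caller), toggles tracing
%% of system events and returns the status reported by sys.
demo() ->
    {ok, Pid} = start_link(),
    ok = sys:trace(Pid, true),
    Status = sys:get_status(Pid),
    ok = sys:trace(Pid, false),
    Status.
```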

Use start_child/2 to dynamically add a child to a
supervisor. The start function Start must return
{ok, Pid} | {ok, Pid, Info} | ignore | {error, Reason}. If
ignore is returned, the supervisor ignores the child
and returns {ok, undefined}. The start function is
executed by the supervisor process. The Pid it returns
must be linked to the caller, that is, to the supervisor. The
supervisor uses this link to monitor and control the child.
If {ok, Pid, Info} is returned from the start
function, the same is returned from this function. The
Info is not interpreted in any way by the supervisor.
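
A minimal sketch of adding children dynamically follows. The
module dyn_sup and its functions are hypothetical, and the worker
is a bare linked process used only to keep the example
self-contained. The child specification uses the tuple form
{Name, Start, Restart, Shutdown, Type, Modules}.

```erlang
-module(dyn_sup).
-behaviour(supervisor).
-export([start_link/0, init/1, add_worker/2, add_ignored/2]).
-export([start_worker/0, start_ignored/0]).

start_link() ->
    supervisor:start_link(?MODULE, []).

%% Starts without children; they are added later with start_child/2.
init([]) ->
    {ok, {{one_for_one, 3, 10}, []}}.

%% Start function run by the supervisor process, so spawn_link/1
%% links the new process to the supervisor.
start_worker() ->
    {ok, spawn_link(fun() -> receive stop -> ok end end)}.

%% Start function that declines to start a process.
start_ignored() ->
    ignore.

add_worker(Sup, Name) ->
    Spec = {Name, {?MODULE, start_worker, []},
            permanent, 1000, worker, [?MODULE]},
    supervisor:start_child(Sup, Spec).

add_ignored(Sup, Name) ->
    Spec = {Name, {?MODULE, start_ignored, []},
            temporary, 1000, worker, [?MODULE]},
    supervisor:start_child(Sup, Spec).
```

Here add_worker/2 is expected to return {ok, Pid}, and
add_ignored/2 returns {ok, undefined} because the start function
returned ignore.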

Name is an internal name, which is used by the
supervisor to identify its children.

Modules is used for the code change procedure. It
should be dynamic if the modules that the child uses
can change dynamically at runtime, for example a
gen_event process. (Note that this refers to the
names of the modules rather than the implementation of the
module.) Otherwise, it should be a list of the modules that
implement the child. This information is used by
the release handler to find all processes which execute a
module. For example, if the child is a gen_server,
Modules is a list with the name of the callback
module as its only element.
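
The two cases for Modules might look as follows in child
specifications. The module modules_sketch, the function
child_specs/0, and the child names my_server and my_events are
hypothetical; my_server is assumed to be a gen_server callback
module that exports start_link/0.

```erlang
-module(modules_sketch).
-export([child_specs/0]).

child_specs() ->
    [%% gen_server child: Modules names the callback module only.
     {my_server, {my_server, start_link, []},
      permanent, 2000, worker, [my_server]},
     %% gen_event child: handler modules can be added and removed
     %% at runtime, so Modules is dynamic.
     {my_events, {gen_event, start_link, [{local, my_events}]},
      permanent, 2000, worker, dynamic}].
```

The syntax of such a list can be checked with
supervisor:check_childspecs/1 before it is used.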

The Shutdown value infinity must be used with
care. The supervisor tries to shut down the child by calling
exit(Child, shutdown) and waits for the child to
terminate. If the child does not terminate, the supervisor
will hang forever. infinity should be used for
children which themselves are supervisors, but it is not
allowed for workers. This is to make sure that the system
can be shut down without hanging forever.
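
The effect of the Shutdown value can be illustrated with a
simplified stand-in for the shutdown step. This is a sketch under
assumptions, not the supervisor's code: the module
shutdown_sketch and the function stop_child/2 are hypothetical,
and killing the child unconditionally once an integer timeout
expires is an assumption not described in the text above. With
Timeout set to infinity the receive never times out, which is the
hang the paragraph warns about.

```erlang
-module(shutdown_sketch).
-export([stop_child/2]).

%% stop_child(Pid, Timeout) -> {stopped, Reason} | {killed, Reason}
%% Timeout is a number of milliseconds or the atom infinity.
stop_child(Pid, Timeout) ->
    Ref = erlang:monitor(process, Pid),
    exit(Pid, shutdown),
    receive
        {'DOWN', Ref, process, Pid, Reason} ->
            {stopped, Reason}
    after Timeout ->
        %% Assumption: the child did not stop in time, so it is
        %% killed unconditionally.
        exit(Pid, kill),
        receive
            {'DOWN', Ref, process, Pid, KillReason} ->
                {killed, KillReason}
        end
    end.
```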

If the supervisor is a simple_one_for_one
supervisor, this function should be called as
start_child(Supervisor, ExtraStartArgs). It starts a
new child of the same type and calls the child's start
function as apply(M, F, A ++ ExtraStartArgs). M,
F, and A are returned from the supervisor's
init function. The new child does not get a unique
name by which it is identified in the supervisor. Therefore,
the functions terminate_child/2,
delete_child/2 and restart_child/2 cannot be
used for a simple_one_for_one supervisor. When a
temporary child dies for any reason or a
transient child dies normally, the child is removed
from the supervisor. Compare this with an ordinary
supervisor, where the child specification remains until
delete_child/2 is called. No progress report is
generated when the child is started. This is to reduce
overheads.
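
A minimal sketch of a simple_one_for_one supervisor for call
processes follows. The module call_sup, its functions, and the
argument default_opts are hypothetical. The single child
specification supplies A = [default_opts], and each call to
start_child/2 appends the extra arguments to it.

```erlang
-module(call_sup).
-behaviour(supervisor).
-export([start_link/0, init/1, start_call/2, start_call_proc/2]).

start_link() ->
    supervisor:start_link(?MODULE, []).

%% Exactly one child specification. No child is started here.
init([]) ->
    Spec = {call, {?MODULE, start_call_proc, [default_opts]},
            temporary, 1000, worker, [?MODULE]},
    {ok, {{simple_one_for_one, 5, 10}, [Spec]}}.

%% Starts one more child; [CallId] plays the role of ExtraStartArgs.
start_call(Sup, CallId) ->
    supervisor:start_child(Sup, [CallId]).

%% Invoked as apply(call_sup, start_call_proc, [default_opts] ++ [CallId]).
start_call_proc(Opts, CallId) ->
    {ok, spawn_link(fun() -> call_loop(Opts, CallId) end)}.

%% Answers info requests with its arguments until told to hang up.
call_loop(Opts, CallId) ->
    receive
        {From, info} ->
            From ! {self(), {Opts, CallId}},
            call_loop(Opts, CallId);
        hangup ->
            ok
    end.
```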

terminate_child/2 terminates a child. The child is not removed
from the supervisor's set of children. This means that it can be
restarted explicitly by calling restart_child/2, or
restarted implicitly if the supervisor has to restart all
children.

restart_child/2 starts a child that has terminated and was not
restarted according to its restart specification. This can be a
temporary child that has terminated, or a child that was
terminated explicitly by calling the function
terminate_child/2.
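
The sketch below terminates a named child and then restarts it
explicitly. The module ctl_sup, the child name worker1, and the
function cycle/1 are hypothetical, and the worker is a bare
linked process used only for illustration.

```erlang
-module(ctl_sup).
-behaviour(supervisor).
-export([start_link/0, init/1, start_worker/0, cycle/1]).

start_link() ->
    supervisor:start_link(?MODULE, []).

init([]) ->
    Worker = {worker1, {?MODULE, start_worker, []},
              permanent, 1000, worker, [?MODULE]},
    {ok, {{one_for_one, 3, 10}, [Worker]}}.

start_worker() ->
    {ok, spawn_link(fun() -> receive stop -> ok end end)}.

%% Stops worker1, then starts it again from the child
%% specification that the supervisor still holds.
cycle(Sup) ->
    ok = supervisor:terminate_child(Sup, worker1),
    {ok, NewPid} = supervisor:restart_child(Sup, worker1),
    NewPid.
```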

The supervisor's init function returns a supervisor
specification. ChildSpec is as previously defined for
the start_child/2 function. MaxR is the
maximum number of restarts that can be performed within
MaxT seconds.

When the restart strategy is simple_one_for_one, the
list of child specifications must be a list with one element
only. This child is not started during the initialization
phase, but all children are started dynamically. Each
dynamically started child is of the same type, which means
that all children are instances of the initial child
specification. New children are created with a call to
start_child(Supervisor, ExtraStartArgs).

If a child start function returns ignore, the child
is kept in the supervisor's list of children. The child can
be restarted explicitly by calling restart_child/2.
The child is also restarted if the supervisor is
one_for_all and performs a restart of all children,
or if the supervisor is rest_for_one and performs a
restart of this child. The supervisor start-up fails and
the supervisor terminates if a child start function returns
{error, Reason}.

The init function can return ignore to inform the parent,
especially if the parent is another supervisor, that this
supervisor is deliberately not started, for instance because
of configuration data.
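
A minimal sketch of a supervisor that declines to start follows.
The module optional_sup and the Enabled argument are
hypothetical; Enabled stands in for a value read from
configuration data.

```erlang
-module(optional_sup).
-behaviour(supervisor).
-export([start_link/1, init/1]).

%% Enabled is a boolean, assumed to come from configuration.
start_link(Enabled) ->
    supervisor:start_link(?MODULE, Enabled).

%% Returning ignore makes start_link/1 return ignore as well,
%% and no supervisor process remains.
init(false) ->
    ignore;
init(true) ->
    {ok, {{one_for_one, 3, 10}, []}}.
```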

System Events

The supervisor behaviour generates the same system events as
the gen_server behaviour. System events are handled by the
sys module.