A variable object holds a data array and a VariableNode object of
a computational graph. If the variable is constructed by the user, the node
is a root and holds no parent. If the variable is constructed by a
FunctionNode object (i.e., by calling functions under
chainer.functions or user-defined functions), or by using operators
(see the list below), the node holds a reference to its parent, called
creator_node.
This reference is used in backpropagation to backtrack the graph.
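As a minimal sketch of this chaining (using NumPy arrays on the host)::

    import numpy as np
    import chainer
    import chainer.functions as F

    x = chainer.Variable(np.array([1.0, 2.0], dtype=np.float32))
    print(x.creator_node)   # None: a user-constructed variable is a root

    y = F.exp(x)            # constructed by a FunctionNode
    print(y.creator_node)   # the Exp node that created y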

Users can disable (resp. enable) this chaining behavior by calling
no_backprop_mode() (resp.
force_backprop_mode()).
In the former context, a variable never creates a computational graph,
whereas in the latter context, it is forced to create one.
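For example, a minimal sketch of toggling the chaining behavior::

    import numpy as np
    import chainer
    import chainer.functions as F

    x = chainer.Variable(np.array([1.0], dtype=np.float32))

    with chainer.no_backprop_mode():
        y = F.exp(x)                  # no graph is created
        print(y.creator is None)      # True

        with chainer.force_backprop_mode():
            z = F.exp(x)              # graph creation is forced
            print(z.creator is None)  # False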

When an integer array is included in the slices, it only supports types
that are supported by CUDA's atomicAdd.
The supported types are numpy.float32, numpy.int32,
numpy.uint32, numpy.uint64, and numpy.ulonglong.
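As a minimal host-side sketch of such indexing (the dtype restriction
presumably matters on the GPU, where the backward pass accumulates scattered
gradients atomically)::

    import numpy as np
    import chainer

    x = chainer.Variable(np.arange(12, dtype=np.float32).reshape(3, 4))
    y = x[np.array([0, 2])]        # integer array included in the slices
    y.grad = np.ones_like(y.array)
    y.backward()                   # gradients are scattered back by addition
    print(x.grad[0], x.grad[2])    # rows 0 and 2 receive ones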

This method adds the gradient of a given variable to the gradient of
this variable. The accumulation is performed even across the host and
different devices. If this variable has uninitialized data/grad arrays,
this method initializes them with the shape of the given variable and
then accumulates the gradient.
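A minimal sketch of the accumulation (both variables on the host here; the
same call also works across devices)::

    import numpy as np
    import chainer

    a = chainer.Variable(np.zeros(3, dtype=np.float32))
    b = chainer.Variable(np.zeros(3, dtype=np.float32))
    a.grad = np.array([1.0, 2.0, 3.0], dtype=np.float32)
    b.grad = np.array([10.0, 10.0, 10.0], dtype=np.float32)

    a.addgrad(b)     # accumulates b's gradient into a's
    print(a.grad)    # [11. 12. 13.]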

On backprop,
FunctionNode.backward()
is called on each FunctionNode object appearing in
the backward graph starting from this variable.
The backward graph is represented by backward
references from variable nodes to their creators, and from function
nodes to their input variable nodes. The backprop stops at all root
nodes. Some function nodes set None as the gradients of some inputs,
in which case further backprop does not take place at those inputs.

This method uses grad as the initial error array. Users can
manually set a gradient array before calling this method.
If the shape of data is () (i.e., it is a scalar) and
grad is None, then this method automatically uses
1.0 as the initial error. This is useful when starting backprop from
a scalar loss value.
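For instance, starting backprop from a scalar loss requires no explicit
initial gradient::

    import numpy as np
    import chainer
    import chainer.functions as F

    x = chainer.Variable(np.array([1.0, 2.0], dtype=np.float32))
    loss = F.sum(x * x)   # scalar: shape ()
    loss.backward()       # grad is None, so 1.0 is used as the initial error
    print(x.grad)         # [2. 4.]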

If True, the gradient arrays of all
intermediate variables are kept.
Otherwise, the grad attributes of
intermediate variables are set to None at appropriate
times during backprop, which may reduce the maximum memory consumption.

In most model-training scenarios, the purpose of backprop
is to compute the gradients of parameters, not of all variables,
so it is recommended to set this flag to False.
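The effect of the flag on an intermediate variable, as a minimal sketch::

    import numpy as np
    import chainer
    import chainer.functions as F

    x = chainer.Variable(np.array([1.0], dtype=np.float32))
    h = x * 2             # intermediate variable
    loss = F.sum(h * h)

    loss.backward(retain_grad=True)
    print(h.grad)         # [4.]; with retain_grad=False this would be None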

enable_double_backprop (bool) – (Added in v3.0) If True,
the computational trace of the whole backpropagation procedure is
recorded in the computational graph, so that one can further do
backpropagation from the resulting gradients. Note that
enabling it results in larger memory consumption, since the
gradients w.r.t. intermediate variables that are required for the
second gradient computation must be stored.
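A minimal sketch of a second derivative obtained this way::

    import numpy as np
    import chainer

    x = chainer.Variable(np.array(3.0, dtype=np.float32))
    y = x * x * x                 # y = x**3

    y.backward(enable_double_backprop=True)
    gx = x.grad_var               # dy/dx = 3*x**2 = 27, kept as a Variable

    x.cleargrad()
    gx.backward()                 # backprop again, from the gradient itself
    print(x.grad)                 # d2y/dx2 = 6*x = 18.0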

loss_scale (float) – Loss scaling factor. Loss scaling is a useful
technique to mitigate the vanishing-gradient issue that tends to
happen when a low-precision data type such as float16 is used
during training. If you set a loss scaling factor, the gradients
of loss values are multiplied by the factor before backprop
starts. The factor is propagated through all gradients in the
computational graph during backprop. The gradients of
parameters are divided by the factor just before the parameters
are updated.
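A minimal sketch of the scaling step only; dividing the parameter gradients
back down before the update is handled on the optimizer side when loss
scaling is configured there::

    import numpy as np
    import chainer
    import chainer.functions as F

    x = chainer.Variable(np.array([1.0, 2.0], dtype=np.float32))
    loss = F.sum(x * x)

    loss.backward(loss_scale=16.0)   # initial error 1.0 is multiplied by 16
    print(x.grad)                    # [32. 64.] == 16 * [2. 4.]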

This method copies the data array from the given variable to this variable.
The copy is performed even if the arrays reside on different devices,
including across the host and a GPU device. If this variable has an
uninitialized data array, this method initializes it with the data array
of the given variable. Similarly, if the given variable has an
uninitialized data array, this method initializes it with the data array
of this variable (self). If both are uninitialized, this method
does nothing.
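A minimal host-only sketch (the same call also copies across devices)::

    import numpy as np
    import chainer

    src = chainer.Variable(np.array([1.0, 2.0, 3.0], dtype=np.float32))
    dst = chainer.Variable(np.zeros(3, dtype=np.float32))

    dst.copydata(src)    # copies src's data array into dst
    print(dst.array)     # [1. 2. 3.]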

After this method completes, intermediate variable nodes and functions
that are no longer referenced from anywhere are deallocated by
reference-count GC. Also, this variable itself deletes the reference to its
creator function from its node, i.e., the node becomes a root in the
computation graph. This means that backprop after unchaining stops at
this variable. This behavior is useful for implementing truncated BPTT.
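A minimal sketch of cutting the graph, in the spirit of truncated BPTT::

    import numpy as np
    import chainer
    import chainer.functions as F

    x = chainer.Variable(np.array([1.0], dtype=np.float32))
    h = F.exp(x)           # older part of the graph
    y = F.sum(h * h)       # newer part of the graph

    h.unchain_backward()   # h becomes a root: the graph behind it is cut
    y.backward()
    print(h.grad)          # the gradient reaches h ...
    print(x.grad)          # ... but not x: None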

Note that using this attribute directly is discouraged; use
array instead. With array, you can catch errors
earlier when your code mixes up Variable and ndarray, because
ndarray has a .data attribute but no .array attribute.
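A minimal sketch of why array fails fast while data does not::

    import numpy as np
    import chainer

    x = chainer.Variable(np.ones(3, dtype=np.float32))
    print(x.array is x.data)   # True: both refer to the same underlying array

    raw = np.ones(3, dtype=np.float32)
    print(type(raw.data))      # memoryview: the mix-up would go unnoticed
    # raw.array                # AttributeError: the mix-up is caught at once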