A method for setting up an optimizer (introduced in the previous beta) has been reverted to avoid breaking backward compatibility; instead, we have introduced simpler syntactic sugar. See #4141 for details.


This is the release of v4.0.0b2. See here for the complete list of solved issues and merged PRs.

In this release, you can set up an optimizer with a simpler syntax. In previous versions, the code would be written as

optimizer = chainer.optimizers.SGD()
optimizer.setup(model)

We now also allow it to be written more concisely as

optimizer = chainer.optimizers.SGD(link=model)

The link argument should be specified as a keyword argument; otherwise, some optimizers could wrongly interpret it as a hyperparameter (e.g. lr). We will enforce passing it as a keyword argument starting from the next release.
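For example (an illustrative sketch; SGD's first positional parameter is the hyperparameter lr):

optimizer = chainer.optimizers.SGD(model)       # wrong: would be interpreted as SGD(lr=model)
optimizer = chainer.optimizers.SGD(link=model)  # correct: pass the link by keyword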

We introduced a check for mixed use of CuPy arrays and NumPy arrays in the outputs returned from functions. Although such mixing has always been forbidden, functions that do it may have worked without any errors so far. With the introduction of this check, those functions may start raising errors.
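A hypothetical sketch of a function that violates this rule (it requires CuPy; the class name MixedOutputs is illustrative):

import cupy as cp
import chainer

class MixedOutputs(chainer.FunctionNode):

    def forward(self, inputs):
        x, = inputs
        # One output stays on the GPU while the other is copied to the host;
        # mixing CuPy and NumPy arrays in the outputs is what the new check rejects.
        return cp.exp(x), cp.asnumpy(cp.exp(-x))

x = chainer.Variable(cp.arange(3, dtype=cp.float32))
MixedOutputs().apply((x,))  # may have slipped through before; now reported as an error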

Known Issues

Grouped convolution/deconvolution does not work in CPU mode with NumPy 1.9 (#4081). This issue is planned to be resolved in the next release.

Double backward support for many functions (see the v3.0.0rc1 release notes for the list of almost all functions that support double backward; the remaining ones are listed below).

As for backward compatibility, most users of v2.x are not affected by the introduction of the new-style function class FunctionNode, because the conventional Function is still supported in v3 (and in future versions). Even if you are using custom functions written with Function, you can continue running the same code with Chainer v3.0.0. You need to rewrite such custom functions only when you want to use the new features added to new-style functions, e.g. double backprop.

Backward compatibility of the overall APIs is slightly broken, though most users will not be affected. See the above release notes for the details of the broken compatibility.

Examples of grad of grad in Chainer

Usage of the grad function

You can calculate gradients of any variables in a computational graph w.r.t. any other variables in the graph using the chainer.grad function with the enable_double_backprop=True option.
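For example, a minimal sketch that computes a second-order gradient of y = x**3:

import numpy as np
import chainer

x = chainer.Variable(np.array([2.0], dtype=np.float32))
y = x * x * x
# First-order gradient; keep the graph of the backward computation so that
# the result can be differentiated once more.
gx, = chainer.grad([y], [x], enable_double_backprop=True)
# Second-order gradient: d2y/dx2 = 6x.
ggx, = chainer.grad([gx], [x])
print(ggx.data)  # [12.]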

The loss function of WGAN-GP

WGAN-GP (which stands for Wasserstein GAN with Gradient Penalty [1]) is one example of a GAN that uses gradients of gradients to compute its loss. It penalizes the gradient norm to enforce the Lipschitz constraint. The gradient norm is computed at a random interpolation x_hat between a generated point x_tilde and a real example x. The loss including the penalty term is then further differentiated w.r.t. the trainable parameters of the model, so it actually performs a double backward pass for the discriminator. The code below shows how to implement it using the backward() method with the enable_double_backprop=True option.
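A minimal sketch of the penalty term, assuming D is the discriminator link and x_real / x_fake are float32 batches of shape (B, C, H, W); the coefficient lam and all names are illustrative:

import numpy as np
import chainer
import chainer.functions as F

def gradient_penalty(D, x_real, x_fake, lam=10.0):
    b = x_real.shape[0]
    # Random interpolation x_hat between real examples and generated points.
    eps = np.random.uniform(0.0, 1.0, (b, 1, 1, 1)).astype(np.float32)
    x_hat = chainer.Variable(eps * x_real + (1.0 - eps) * x_fake)
    # Backprop through the discriminator while keeping the graph of the
    # backward computation, so that x_hat.grad_var is itself differentiable.
    D.cleargrads()
    F.sum(D(x_hat)).backward(enable_double_backprop=True)
    g = x_hat.grad_var
    norm = F.sqrt(F.sum(g * g, axis=(1, 2, 3)))
    # Penalize deviations of the per-example gradient norm from 1.
    return lam * F.sum((norm - 1.0) ** 2) / b

Adding this penalty to the discriminator loss and backpropagating it differentiates through the gradient computed above, i.e. the double backward pass. Note that the backward() call inside the sketch also fills the parameter gradients of D, so clear them again before the actual update.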

The performance of numerical_grad is improved (#2966). It now numerically checks a randomly chosen directional derivative instead of the full gradient. This change reduces the number of forward computations required for the numerical gradient from linear in the input dimensionality to a constant.

Make double backprop support optional in Variable.backward() (#3298). To enable double backprop, you have to explicitly pass enable_double_backprop=True. Note that when you do not need double backprop, it is better to leave this option off: backward() then skips constructing the computational graph of the backpropagation, which saves performance overhead (especially memory consumption).
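A minimal sketch of the option:

import numpy as np
import chainer

x = chainer.Variable(np.array([3.0], dtype=np.float32))
y = x * x * x
y.backward(enable_double_backprop=True)  # keep the graph of the backward pass
gx = x.grad_var                          # dy/dx = 3x**2, itself a Variable
x.cleargrad()                            # avoid accumulating into the first-order gradient
gx.backward()
print(x.grad)                            # d2y/dx2 = 6x -> [18.]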

You can write your own function node by implementing a subclass of FunctionNode. As a simple example, consider an elementwise multiplication function (which is already provided by this beta version).
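A minimal sketch of such a definition (the class name MulFunc and the helper mul are illustrative):

import chainer
from chainer import utils

class MulFunc(chainer.FunctionNode):

    def forward(self, inputs):
        x, y = inputs
        # Inputs are not retained by default; request them for use in backward.
        self.retain_inputs((0, 1))
        return utils.force_array(x * y),

    def backward(self, indexes, grad_outputs):
        # backward is written with Variables, so the returned gradients can
        # themselves be backpropagated (double backward).
        x, y = self.get_retained_inputs()
        gz, = grad_outputs
        # Here gradients for all inputs are returned; alternatively, return
        # only those selected by indexes to skip unnecessary computation.
        return gz * y, gz * x

def mul(x, y):
    # The forward computation is invoked via apply(), not __call__().
    z, = MulFunc().apply((x, y))
    return z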

There are three main differences from the conventional definition using Function.

The indexes argument (target_input_indexes by its full name) is added to backward. It indicates the set of inputs for which gradients are required. There are two ways to return gradients from backward: gradients for all inputs, or gradients only for the inputs selected by indexes. In the latter case, you can skip computing the gradients of inputs not listed in indexes.

The backward method implements its computation on top of Variable instead of ndarray, so that the resulting gradients can be further backpropagated. grad_outputs is a tuple of Variables, and the new get_retained_inputs() and get_retained_outputs() methods return tuples of Variables corresponding to the retained inputs/outputs. Note that inputs are not retained by default (which is also different from Function).

The forward computation is invoked by the apply() method instead of the __call__() operator.

There is also a variant of the backward() method named backward_accumulate(), which additionally accumulates the computed input gradients into existing ones. It enables us to improve performance in some cases, as sketched below.
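A minimal sketch extending the MulFunc example above (the eager accumulation shown is only illustrative):

class MulFunc(chainer.FunctionNode):
    # forward() and backward() as defined above...

    def backward_accumulate(self, target_input_indexes, grad_outputs, grad_inputs):
        # grad_inputs holds the existing gradient (or None) for each input
        # selected by target_input_indexes; accumulate into it directly.
        x, y = self.get_retained_inputs()
        gz, = grad_outputs
        gxs = (gz * y, gz * x)
        return tuple(
            gxs[i] if g is None else gxs[i] + g
            for i, g in zip(target_input_indexes, grad_inputs))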

This change also brings the following related changes.

A new class FunctionAdapter provides an implementation of the FunctionNode interface on top of the Function interface. It can be used to convert a Function into a new-style function node. Note that this does not mean the converted function supports differentiable backprop; to support that, the implementation must be rewritten with FunctionNode directly.

Function.__call__ is updated so that users do not need to change their implementations of custom Function definitions; it automatically creates a FunctionAdapter object, lets the adapter wrap the Function object itself, and inserts the adapter object (which implements FunctionNode) into the computational graph.

Currently, only elementwise addition and multiplication (+ and *) and F.identity (which exists just for testing purposes) support differentiable (and economical) backprop. We are planning to widen the set of functions with differentiable backprop support in upcoming releases.

Note that this change breaks the object structure of the computational graph; now FunctionNode objects act as function nodes in the computational graph, and Function is just an object referenced by a FunctionAdapter object (which implements FunctionNode).

New features

When using Trainer, any exception raised during training is now shown immediately, before entering the finalization procedures. This helps users find the cause of the error without waiting for the finalization, which sometimes hangs (especially when using multiprocessing). (#2216)

Support a mask pattern shared among examples within each batch in F.simplified_dropconnect (#2534, thanks @fukatani!)

L.Classifier is extended so that users can feed multiple input features. The argument to be treated as the ground-truth labels is specified by the label_key option. Keyword arguments are also supported. (#2834)
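A minimal sketch with two input features (the model below and the key name 't' are illustrative):

import numpy as np
import chainer
import chainer.functions as F
import chainer.links as L

class TwoFeatureNet(chainer.Chain):

    def __init__(self):
        super(TwoFeatureNet, self).__init__()
        with self.init_scope():
            self.fc = L.Linear(None, 10)

    def __call__(self, x1, x2):
        # Concatenate the two feature arrays and classify into 10 classes.
        return self.fc(F.concat((x1, x2), axis=1))

clf = L.Classifier(TwoFeatureNet(), label_key='t')
x1 = np.zeros((5, 3), dtype=np.float32)
x2 = np.zeros((5, 4), dtype=np.float32)
t = np.zeros(5, dtype=np.int32)
loss = clf(x1, x2, t=t)  # 't' is taken as the labels; the rest are fed to the model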