A method for setting up an optimizer (introduced in the previous beta) has been reverted to avoid breaking backward compatibility; instead, we have introduced simpler syntactic sugar. See #4141 for details.


This is the release of v4.0.0b2. See here for the complete list of solved issues and merged PRs.

In this release, you can set up an optimizer with a simpler syntax. In previous versions, the code would be written as

optimizer = chainer.optimizers.SGD()
optimizer.setup(model)

We now also allow it to be written more concisely as

optimizer = chainer.optimizers.SGD(link=model)

The link argument should be specified as a keyword argument; otherwise, some optimizers could wrongly interpret it as a hyperparameter (e.g. lr). We will enforce passing it as a keyword argument from the next release.

We introduced a check for mixed use of CuPy and NumPy arrays in the outputs returned from functions. Although such mixing has always been forbidden, functions that did it may have run without any errors. With the introduction of this check, those functions may now start raising errors.

Known Issues

Grouped convolution/deconvolution does not work in CPU mode with NumPy 1.9 (#4081). This issue is planned to be resolved in the next release.

Double backward support for many functions (see the v3.0.0rc1 release notes for the list of most functions that support double backward; the remaining ones are listed below).

As for backward compatibility, most users of v2.x are not affected by the introduction of the new-style function class FunctionNode, because the conventional Function is still supported in v3 (and in future versions). Even if you are using custom functions written with Function, you can continue running the same code with Chainer v3.0.0. You need to rewrite such custom functions only when you want to use features added for new-style functions, e.g. double backprop.

The backward compatibility of the overall APIs is slightly broken, though most users are not affected. See the above release notes for the details of broken compatibility.

Examples of grad of grad in Chainer

Usage of the grad function

You can calculate gradients of any variables in a computational graph w.r.t. any other variables in the graph using the chainer.grad function with the enable_double_backprop=True option.
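
For example, the following sketch (illustrative values only, not taken from the original post) computes a second-order gradient of a simple expression:

import numpy as np
import chainer

# Illustrative sketch: second-order gradient of y = x**3 at x = 2.
x = chainer.Variable(np.array([2.0], dtype=np.float32))
y = x ** 3

# First-order gradient dy/dx; enable_double_backprop=True keeps the graph of
# this backward computation so that it can be differentiated again.
gx, = chainer.grad([y], [x], enable_double_backprop=True)

# Second-order gradient d2y/dx2 = 6x.
ggx, = chainer.grad([gx], [x])
print(gx.data, ggx.data)  # [12.] [12.]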

The loss function of WGAN-GP

WGAN-GP (which stands for Wasserstein GAN with Gradient Penalty [1]) is one example of a GAN that uses gradients of gradients when computing the loss. It penalizes the gradient norm to enforce the Lipschitz constraint. The gradient norm is computed at a random interpolation x_hat between a generated point x_tilde and a real example x. The loss including the penalty term is then further differentiated w.r.t. the trainable parameters of the model, so the discriminator actually performs double backward. The code below shows how to implement it using the backward() method with the enable_double_backprop=True option:
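
(The exact code from the original post is not reproduced here; the following is a rough sketch in which the toy discriminator, the random batches, and the penalty weight lam are placeholders.)

import numpy as np
import chainer
import chainer.functions as F
import chainer.links as L

# Placeholder discriminator and data; only the gradient-penalty part is shown.
discriminator = L.Linear(None, 1)
x = np.random.randn(4, 3, 8, 8).astype(np.float32)        # real examples
x_tilde = np.random.randn(4, 3, 8, 8).astype(np.float32)  # generated examples
lam = 10.0                                                 # penalty coefficient

# Random interpolation between real and generated points.
eps = np.random.uniform(0, 1, size=(4, 1, 1, 1)).astype(np.float32)
x_hat = chainer.Variable(eps * x + (1 - eps) * x_tilde)

# Backprop D(x_hat) down to x_hat while recording the backward graph so that
# the penalty term stays differentiable w.r.t. the model parameters.
y = discriminator(x_hat)
y.grad = np.ones_like(y.data)
y.backward(enable_double_backprop=True)

grad_norm = F.sqrt(F.batch_l2_norm_squared(x_hat.grad_var))
penalty = lam * F.mean_squared_error(grad_norm, np.ones_like(grad_norm.data))
# Adding `penalty` to the Wasserstein loss and calling backward() on the total
# discriminator loss then performs the double backward.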

The performance of numerical_grad is improved (#2966). It now performs a numerical check of a randomly chosen directional derivative instead of the full gradient check. This change reduces the number of forward computations run for the numerical gradient to a constant, independent of the input dimensionality.

Make double backprop support optional in Variable.backward() (#3298). To enable double backprop, you have to explicitly pass enable_double_backprop=True. When you do not need double backprop, it is better to leave this option off; backward() then skips constructing the computational graph of backpropagation, which saves the performance overhead (especially memory consumption).
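
As a small illustration (not from the original post), the following sketch differentiates a gradient obtained with this option:

import numpy as np
import chainer

# Illustrative sketch: double backward of y = x**2 at x = 3.
x = chainer.Variable(np.array([3.0], dtype=np.float32))
y = x * x
y.backward(enable_double_backprop=True)  # also records the graph of the backward pass

gx = x.grad_var   # dy/dx as a Variable that is still part of the graph
x.cleargrad()
gx.backward()     # second backward: d2y/dx2 is accumulated into x.grad
print(x.grad)     # [2.]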

You can write your own function node by implementing a subclass of FunctionNode. The following is a simple example of writing an elementwise multiplication function (which is already provided by this beta version):
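
(The original code block is not reproduced here; the following is a reconstruction based on the description of the differences listed below.)

from chainer import function_node

class MulFunction(function_node.FunctionNode):

    def forward(self, inputs):
        x, y = inputs
        # Inputs are not retained by default, so retain the ones backward needs.
        self.retain_inputs((0, 1))
        return x * y,

    def backward(self, indexes, grad_outputs):
        # `indexes` lists the inputs whose gradients are required. The
        # computation below is done with Variables, so the resulting
        # gradients can be backpropagated again.
        x, y = self.get_retained_inputs()
        gz, = grad_outputs
        grads = []
        if 0 in indexes:
            grads.append(gz * y)
        if 1 in indexes:
            grads.append(gz * x)
        return tuple(grads)

def mul(x, y):
    # apply() runs the forward computation and inserts this node into the graph.
    z, = MulFunction().apply((x, y))
    return z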

There are three main differences from the conventional definition using Function.

The indexes argument (target_input_indexes is the full name) is added. It indicates the set of inputs for which gradients are required. There are two ways for backward to return gradients: gradients for all inputs, or gradients only for the inputs selected by indexes. In the latter case, you can skip computing the gradients for inputs not listed in indexes.

The backward method implements its computation on top of Variable instead of ndarray, so that the resulting gradients can be further backpropagated. The grad_outputs argument is a tuple of Variables, and the new get_retained_inputs() and get_retained_outputs() methods return tuples of Variables corresponding to the retained inputs/outputs. Note that inputs are not retained by default (which also differs from Function).

The forward computation is invoked by the apply() method instead of the __call__() operator.

There is also a variant of the backward() method named backward_accumulate(), which also accumulates the computed input gradients into existing ones. It can improve performance in some cases.

This change also comes with the following changes.

A new class FunctionAdapter provides an implementation of the FunctionNode interface on top of the Function interface. It can be used to convert a Function into a new-style function node. Note that this does not mean the converted function supports differentiable backprop; to support it, the implementation must be rewritten directly with FunctionNode.

Function.__call__ is updated so that users do not need to update their implementation of custom Function definitions; it automatically creates a FunctionAdapter object, lets the adapter wrap the Function object itself, and inserts the adapter object (which implements FunctionNode) into the computational graph.

Currently, only elementwise addition and multiplication (+ and *) and F.identity (which exists just for testing purposes) support differentiable (and economical) backprop. We are planning to widen the set of functions with differentiable backprop support in upcoming releases.

Note that this change breaks the object structure of the computational graph; now FunctionNode objects act as function nodes in the computational graph, and Function is just an object referenced by a FunctionAdapter object (which implements FunctionNode).

New features

When using Trainer, any exceptions raised during training are now shown immediately, before entering the finalization procedures. This helps users learn the cause of the error without waiting for the finalization, which sometimes hangs (especially when using multiprocessing) (#2216)

Support a mask pattern shared among examples within each batch in F.simplified_dropconnect (#2534, thanks @fukatani!)

L.Classifier is extended so that users can feed multiple input features. The argument that should be treated as the ground-truth labels is specified by the label_key option. Keyword arguments are also supported. (#2834)
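
As an illustration (the two-input predictor below is a made-up example, not code from the release), label_key='t' makes the keyword argument t the label while the remaining arguments are passed to the predictor:

import numpy as np
import chainer
import chainer.functions as F
import chainer.links as L

class TwoInputNet(chainer.Chain):
    def __init__(self):
        super(TwoInputNet, self).__init__()
        with self.init_scope():
            self.l = L.Linear(None, 10)

    def __call__(self, x1, x2):
        return self.l(F.concat((x1, x2)))

# 't' is picked out as the ground-truth label; x1 and x2 go to the predictor.
model = L.Classifier(TwoInputNet(), label_key='t')
x1 = np.random.randn(4, 5).astype(np.float32)
x2 = np.random.randn(4, 5).astype(np.float32)
t = np.array([0, 1, 2, 3], dtype=np.int32)
loss = model(x1, x2, t=t)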

Others


This is the second major version. See the list of solved issues and merged PRs for the complete set of changes (the list only shows the differences from v2.0.0b1; see the Release Notes section below for the differences from v1.24.0).

Announcements

CuPy has been separated from Chainer into an independent package.

This means you need to install CuPy separately if you want to enable GPU support in Chainer.

As explained in the Contribution Guide, we have changed the development and release cycle. Main development will continue on the master branch, which will correspond to the next pre-releases of v3 (including alpha, beta, and RC). Maintenance of v2 will be done on the v2 branch.

If you want to send a pull request, please send it to the master branch unless you have a special reason.

Release Notes

It should be noted that these release notes contain only the differences from v2.0.0b1. See the release notes of v2.0.0a1 and v2.0.0b1 to confirm the full set of changes from v1.

Tests


This is a minor release. See the list of solved issues and merged PRs for the complete set of changes.

Announcements

This is the final regular release of Chainer v1.x. No further changes will be made to Chainer v1 except for critical bug fixes.

We will soon merge the current _v2 branch into master. It is expected that many PRs targeting the current master will become obsolete (i.e., they will conflict with the v2 source tree).

We have decided to postpone the release of v2.0.0 to May 30. We will work hard to finish the planned changes and documentation, so please look forward to the release!

We apologize that we could not follow, for v2, the compatibility-breaking steps that we declared in our compatibility policy. In particular, many APIs that will be partially changed in v2 do not emit any warnings in v1.24.0.

Instead, we are preparing an upgrade guide that lists which parts of existing user code should be updated for compatibility with v2.0.0. We believe this upgrade guide will help all users update their code properly.

New features

Summary

MultiprocessParallelUpdater is added. It is an updater for Trainer that accumulates the gradients computed by multiple processes using multiprocessing and NCCL.
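
As a rough sketch (the toy model, the data split, and the device ids below are placeholders; actually running it requires multiple GPUs and NCCL), the updater takes one iterator per worker process:

import numpy as np
import chainer
from chainer import training
import chainer.links as L

model = L.Classifier(L.Linear(None, 10))
optimizer = chainer.optimizers.MomentumSGD()
optimizer.setup(model)

dataset = chainer.datasets.TupleDataset(
    np.random.randn(100, 5).astype(np.float32),
    np.random.randint(0, 10, size=100).astype(np.int32))
# Each worker process gets its own iterator over a share of the data.
part0, part1 = chainer.datasets.split_dataset(dataset, 50)
train_iters = [chainer.iterators.SerialIterator(part, batch_size=16)
               for part in (part0, part1)]

updater = training.updaters.MultiprocessParallelUpdater(
    train_iters, optimizer, devices={'main': 0, 'gpu1': 1})
trainer = training.Trainer(updater, (10, 'epoch'))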

The reduce option is added to loss functions. By passing reduce='no', the loss function skips aggregating the loss values over the data in the mini-batch and returns per-example losses instead.
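
For example (random inputs for illustration), softmax_cross_entropy returns per-example losses when reduce='no' is passed:

import numpy as np
import chainer.functions as F

x = np.random.randn(4, 10).astype(np.float32)
t = np.array([0, 1, 2, 3], dtype=np.int32)
loss_mean = F.softmax_cross_entropy(x, t)               # scalar (mean over the batch)
loss_each = F.softmax_cross_entropy(x, t, reduce='no')  # per-example losses, shape (4,)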

Many differentiable functions and links are added. In particular, depthwise convolution and spatial transformer networks are supported.
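
As a small illustration (random input; the layer parameters are arbitrary), a depthwise convolution applies channel_multiplier filters to each input channel separately:

import numpy as np
import chainer.links as L

conv = L.DepthwiseConvolution2D(3, 2, 3)  # in_channels=3, channel_multiplier=2, ksize=3
x = np.random.randn(1, 3, 8, 8).astype(np.float32)
y = conv(x)  # 3 * 2 = 6 output channels, spatial size 6x6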