Microsoft Cognitive Toolkit (CNTK)


The Microsoft Cognitive Toolkit (https://cntk.ai) is a unified deep learning toolkit that describes neural networks as a series of computational steps via a directed graph. In this directed graph, leaf nodes represent input values or network parameters, while other nodes represent matrix operations upon their inputs. CNTK allows users to easily realize and combine popular model types such as feed-forward DNNs, convolutional nets (CNNs), and recurrent networks (RNNs/LSTMs). It implements stochastic gradient descent (SGD, error backpropagation) learning with automatic differentiation and parallelization across multiple GPUs and servers. CNTK has been available under an open-source license since April 2015. It is our hope that the community will take advantage of CNTK to share ideas more quickly through the exchange of open source working code.

News

Project changelog

2018-09-17. CNTK 2.6.0

Efficient group convolution

The implementation of group convolution in CNTK has been updated. The updated implementation moves away from creating a sub-graph for group convolution (using slicing and splicing), and instead uses the cuDNN7 and MKL2017 APIs directly. This improves both performance and model size.

As an example, for a single group convolution op with the following attributes:

Input tensor (C, H, W) = (32, 128, 128)

Number of output channels = 32 (channel multiplier is 1)

Groups = 32 (depthwise convolution)

Kernel size = (5, 5)

The comparison numbers for this single node are as follows:

|                    | GPU exec. time (ms, 1000-run avg.) | CPU exec. time (ms, 1000-run avg.) | Model size (KB, CNTK format) |
|--------------------|------------------------------------|------------------------------------|------------------------------|
| Old implementation | 9.349                              | 41.921                             | 38                           |
| New implementation | 6.581                              | 9.963                              | 5                            |
| Speedup/savings    | approx. 30%                        | approx. 65-75%                     | approx. 87%                  |
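As a rough illustration, the depthwise case above can be written with the Convolution layer's groups argument. A minimal sketch, assuming the CNTK 2.6 Python API:

```python
import cntk as C

# Sketch of the depthwise group convolution benchmarked above:
# input (C, H, W) = (32, 128, 128), 32 output channels (channel
# multiplier 1), groups = 32, kernel (5, 5). With groups == input
# channels this is a depthwise convolution.
x = C.input_variable((32, 128, 128))
conv = C.layers.Convolution((5, 5), num_filters=32, groups=32, pad=True)(x)
print(conv.shape)  # (32, 128, 128): spatial size preserved by padding
```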

Sequential Convolution

The implementation of sequential convolution in CNTK has been updated. The updated implementation creates a separate sequential convolution layer. Unlike the regular convolution layer, this operation also convolves over the dynamic (sequence) axis, and filter_shape[0] is applied to that axis. The updated implementation supports broader cases, such as stride > 1 on the sequence axis.

For example, consider a sequential convolution over a batch of one-channel black-and-white images. The images have the same fixed height of 640, but each has a width of variable length; the width is represented by the sequence axis. Padding is enabled, and the strides for both width and height are 2.
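A minimal sketch of this example, assuming the SequentialConvolution layer introduced in this release:

```python
import cntk as C
from cntk.layers import SequentialConvolution

# One-channel images of fixed height 640; the variable-length width is
# carried on the dynamic (sequence) axis. filter_shape[0] applies to the
# sequence axis; padding is on and both strides are 2.
f = SequentialConvolution(filter_shape=(3, 2), num_filters=5,
                          pad=True, strides=(2, 2), activation=C.relu)
x = C.sequence.input_variable(640)
h = f(x)
print(h.shape)  # (5, 320): 5 filters, height halved; width halved on the sequence axis
```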

Operators

depth_to_space and space_to_depth

There is a breaking change in the depth_to_space and space_to_depth operators. They have been updated to match the ONNX specification; specifically, the permutation for how the depth dimension is placed as blocks in the spatial dimensions, and vice versa, has been changed. Please refer to the updated documentation examples for these two ops to see the change.
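For orientation, a small usage sketch of the two ops (shapes only; the exact element permutation now follows the ONNX spec):

```python
import cntk as C

# depth_to_space rearranges depth channels into spatial blocks:
# (4, 2, 2) with block_size=2 becomes (1, 4, 4); space_to_depth inverts it.
x = C.input_variable((4, 2, 2))
y = C.depth_to_space(x, block_size=2)
z = C.space_to_depth(y, block_size=2)
print(y.shape, z.shape)  # (1, 4, 4) (4, 2, 2)
```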

Tan and Atan

Added support for trigonometric ops Tan and Atan.
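A minimal sketch, assuming these ops are exposed as cntk.tan and cntk.atan in the Python API:

```python
import numpy as np
import cntk as C

x = C.input_variable(3)
data = np.array([[0.0, 0.5, 1.0]], dtype=np.float32)
print(C.tan(x).eval({x: data}))   # elementwise tangent
print(C.atan(x).eval({x: data}))  # elementwise arctangent
```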

ELU

Added support for alpha attribute in ELU op.
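A minimal sketch, assuming the alpha keyword on the Python elu op:

```python
import numpy as np
import cntk as C

# ELU with an explicit alpha: x for x >= 0, alpha * (exp(x) - 1) for x < 0.
x = C.input_variable(3)
y = C.elu(x, alpha=0.5)
print(y.eval({x: np.array([[-1.0, 0.0, 1.0]], dtype=np.float32)}))
```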

Convolution

Updated the auto-padding algorithms of Convolution to produce symmetric padding on a best-effort basis on the CPU, without affecting the final convolution output values. This update increases the range of cases covered by the MKL API and improves performance, e.g. for ResNet50.

Default arguments order

There is a breaking change in the arguments property in the CNTK Python API. The default behavior has been updated to return arguments in Python order instead of C++ order, so they are returned in the same order as they are fed into ops. If you still wish to get arguments in C++ order, you can override the global option. This change should only affect the following ops: Times, TransposeTimes, and Gemm (internal).
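A small sketch of the new default behavior (hypothetical variable names):

```python
import cntk as C

# The `arguments` of an op now come back in Python (feed) order.
a = C.input_variable(1, name='a')
b = C.input_variable((1, 1), name='b')
t = C.times(a, b)
print([arg.name for arg in t.arguments])  # ['a', 'b']
```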

Bug fixes

Updated doc for Convolution layer to include group and dilation arguments.

Added improved input validation for group convolution.

Updated LogSoftMax to use a more numerically stable implementation.

Fixed an incorrect gradient value in the Gather op.

Added validation for 'None' nodes in Python clone substitution.

Added validation for the padding channel axis in convolution.

Added a CNTK native default lotusIR logger to fix the "Attempt to use DefaultLogger" error when loading some ONNX models.

Other bug or minor fixes:

Updated the LRN op to match the ONNX 1.2 spec, where the size attribute has the semantics of diameter, not radius. Added validation for cases where the LRN kernel size is larger than the channel size.

Updated Min/Max import implementation to handle variadic inputs.

Fixed possible file corruption when resaving on top of an existing ONNX model file.

.NET Support

The Cntk.Core.Managed library has officially been converted to .NET Standard and supports .NET Core and .NET Framework applications on both Windows and Linux. Starting with this release, .NET developers can restore CNTK NuGet packages using the new .NET SDK-style project file with the package management format set to PackageReference.

CPU inference performance

Accelerates some common tensor ops for float32 inference on Intel CPUs, especially for fully connected networks.

This can be turned on or off via cntk.cntk_py.enable_cpueval_optimization() / cntk.cntk_py.disable_cpueval_optimization(), as sketched below.
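```python
import cntk as C

# Toggle the accelerated float32 CPU inference path around evaluation.
C.cntk_py.enable_cpueval_optimization()
# ... run model.eval(...) on an Intel CPU ...
C.cntk_py.disable_cpueval_optimization()
```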

1BitSGD incorporated into CNTK

The 1BitSGD source code is now available under the CNTK license (MIT license) under Source/1BitSGD/.

The 1bitsgd build target was merged into the existing gpu target.

New loss function: hierarchical softmax

Thanks to @yaochengji for the contribution!

Distributed Training with Multiple Learners

The Trainer now accepts multiple parameter learners for distributed training. With this change, different parameters of a network can be learned by different learners in a single training session. This also facilitates distributed training for GANs. For more information, please refer to Basic_GAN_Distributed.py and cntk.learners.distributed_multi_learner_test.py.
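A minimal sketch of a single training session with two learners, each owning a disjoint subset of parameters (a hypothetical toy model, not the GAN example from the scripts above):

```python
import cntk as C

x = C.input_variable(2)
y = C.input_variable(1)

hidden = C.layers.Dense(4, activation=C.relu)
out = C.layers.Dense(1)
z = out(hidden(x))
loss = C.squared_error(z, y)

lr = C.learning_parameter_schedule(0.1)
# Each learner updates only its own layer's parameters; together the
# learners must cover all trainable parameters of the network.
learners = [C.sgd(hidden.parameters, lr),
            C.adam(out.parameters, lr, momentum=C.momentum_schedule(0.9))]
trainer = C.Trainer(z, loss, learners)
```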


2018-01-31. CNTK 2.4

Highlights:

Moved to CUDA 9, cuDNN 7 and Visual Studio 2017.

Removed Python 3.4 support.

Added Volta GPU and FP16 support.

Better ONNX support.

CPU performance improvements.

More OPs.

OPs

top_k operation: in the forward pass it computes the top (largest) k values and corresponding indices along the specified axis. In the backward pass the gradient is scattered to the top k elements (an element not in the top k gets a zero gradient). A short sketch of top_k and ones_like appears after this list of ops.

zeros_like and ones_like operations. In many situations you can just rely on CNTK correctly broadcasting a simple 0 or 1 but sometimes you need the actual tensor.

depth_to_space: Rearranges elements in the input tensor from the depth dimension into spatial blocks. Typical use of this operation is for implementing sub-pixel convolution for some image super-resolution models.

space_to_depth: Rearranges elements in the input tensor from the spatial dimensions to the depth dimension. It is largely the inverse of DepthToSpace.

Fixed a bug in group convolution. The output of the CNTK Convolution op will change for groups > 1. A more optimized implementation of group convolution is expected in the next release.

Better error reporting for group convolution in Convolution layer.
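A short sketch of two of the new ops above (assuming the cntk.top_k and cntk.ones_like Python ops):

```python
import numpy as np
import cntk as C

x = C.input_variable(5)
data = np.array([[1., 3., 2., 5., 4.]], dtype=np.float32)

# top_k has two outputs (the k largest values and their indices), so
# eval returns a dict keyed by output variable.
print(C.top_k(x, 2).eval({x: data}))  # values [[5, 4]], indices [[3, 4]]

# ones_like (and zeros_like) materialize an actual tensor of x's shape.
print(C.ones_like(x).eval({x: data}))  # [[1. 1. 1. 1. 1.]]
```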

Halide Binary Convolution

The CNTK build can now use optional Halide libraries to build the Cntk.BinaryConvolution.so/dll library, which can be used with the netopt module. The library contains optimized binary convolution operators that perform better than the Python-based binarized convolution operators. To enable Halide in the build, please download a Halide release and set the HALIDE_PATH environment variable before starting a build. On Linux, you can use ./configure --with-halide[=directory] to enable it. For more information on how to use this feature, please refer to How_to_use_network_optimization.

To set up the build and runtime environment on Linux using Docker, please build an Ubuntu 16.04 Docker image using the Dockerfiles here. For other Linux systems, please refer to the Dockerfiles to set up the dependent libraries for CNTK.