framework::OperatorWithKernel: inherits from OperatorBase and describes an operator with computation kernels.

Operators fall into two groups: operators with kernels and operators without. An operator with kernels inherits from OperatorWithKernel, while one without inherits from OperatorBase. This tutorial focuses on implementing operators with kernels. In short, an operator includes the following information:

| Information | Where is it defined |
| --- | --- |
| OpProtoMaker definition | `.cc` files; a backward Op does not need an OpProtoMaker interface. |
| Op definition | `.cc` files |
| Kernel implementation | Kernel methods shared between CPU and CUDA are defined in `.h` files; CPU-specific kernels live in `.cc` files, while CUDA-specific kernels are implemented in `.cu` files. |

New operator implementations are added to the directory paddle/operators, with file names of the form `*_op.h` (if applicable), `*_op.cc`, and `*_op.cu` (if applicable). The system uses this naming scheme to automatically build operators and their corresponding Python extensions.

Let's take the matrix multiplication operator, MulOp, as an example to walk through writing an operator with kernels.

Implementing C++ Types

Defining ProtoMaker

Matrix multiplication can be written as $Out = X * Y$, meaning that the operation takes two inputs and produces one output.
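A ProtoMaker for MulOp declares the two inputs and one output, roughly as follows. This is a sketch: the exact constructor signature of OpProtoAndCheckerMaker has varied across PaddlePaddle versions, and the descriptive strings are illustrative.

```
// Sketch of a ProtoMaker for MulOp (signatures may differ across versions).
class MulOpMaker : public framework::OpProtoAndCheckerMaker {
 public:
  MulOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker)
      : OpProtoAndCheckerMaker(proto, op_checker) {
    AddInput("X", "The first input of mul op");
    AddInput("Y", "The second input of mul op");
    AddOutput("Out", "The output of mul op");
    AddComment(R"DOC(
Mul Operator.

The equation is: Out = X * Y
)DOC");
  }
};
```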

Note: AddAttr<AttrType>("scale", "...").SetDefault(1.0); adds the scale constant as an attribute and sets its default value to 1.0.

Defining Operator

The following code defines the interface for MulOp:

```cpp
class MulOp : public framework::OperatorWithKernel {
 public:
  using framework::OperatorWithKernel::OperatorWithKernel;

 protected:
  void InferShape(const framework::InferShapeContext &ctx) const override {
    auto dim0 = ctx.Input<Tensor>("X")->dims();
    auto dim1 = ctx.Input<Tensor>("Y")->dims();
    PADDLE_ENFORCE_EQ(dim0.size(), 2,
                      "input X(%s) should be a tensor with 2 dims, a matrix",
                      ctx.op_.Input("X"));
    PADDLE_ENFORCE_EQ(dim1.size(), 2,
                      "input Y(%s) should be a tensor with 2 dims, a matrix",
                      ctx.op_.Input("Y"));
    PADDLE_ENFORCE_EQ(
        dim0[1], dim1[0],
        "First matrix's width must be equal with second matrix's height.");
    ctx.Output<Tensor>("Out")->Resize({dim0[0], dim1[1]});
  }
};
```

The InferShape interface needs to be overridden. InferShape is a const method and cannot modify the Op's member variables; its const argument, framework::InferShapeContext &ctx, can be used to extract input, output, and attribute information. Its job is to:

1. validate and error out early: check input data dimensions and types;

2. configure the shape of the output tensor.
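The two responsibilities above can be sketched in plain, framework-free C++. The function below is an illustrative stand-in for MulOp's InferShape, not a Paddle API: it validates the two input matrices and derives the output shape.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Illustrative stand-in for MulOp::InferShape.
std::vector<int64_t> InferMulShape(const std::vector<int64_t>& x_dims,
                                   const std::vector<int64_t>& y_dims) {
  // 1) Validate and error out early.
  if (x_dims.size() != 2 || y_dims.size() != 2)
    throw std::invalid_argument("inputs must be 2-D tensors (matrices)");
  if (x_dims[1] != y_dims[0])
    throw std::invalid_argument(
        "first matrix's width must equal second matrix's height");
  // 2) Configure the output tensor's shape: (m, k) * (k, n) -> (m, n).
  return {x_dims[0], y_dims[1]};
}
```

For example, multiplying a 2x3 matrix by a 3x4 matrix yields a 2x4 output, while mismatched inner dimensions raise an error before any computation runs.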

Usually, the OpProtoMaker and Op definitions are written in `.cc` files, which also contain the registration methods introduced later.

Defining OpKernel

MulKernel inherits from framework::OpKernel and takes the following template parameters:

typename DeviceContext denotes the device context type. When different devices, namely CPUDeviceContext and CUDADeviceContext, share the same kernel implementation, this template parameter needs to be added; if they do not share the kernel, it must not be added. OnehotCrossEntropyOpKernel is an example of a non-shared kernel.

To ease the writing of an OpKernel's Compute method, and to reuse code across devices, the Eigen unsupported Tensor module is used to implement the Compute interface. To learn how the Eigen library is used in PaddlePaddle, please see the usage document.
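The computation MulKernel performs is an ordinary matrix product. A framework-free sketch of the same math in plain C++ (the function name and row-major layout are illustrative choices, not Paddle's implementation, which delegates to Eigen):

```cpp
#include <cassert>
#include <vector>

// Row-major matrix product Out = X * Y: X is m x k, Y is k x n, Out is m x n.
// This is the math MulKernel's Compute carries out on a device.
std::vector<float> MatMul(const std::vector<float>& x,
                          const std::vector<float>& y,
                          int m, int k, int n) {
  std::vector<float> out(m * n, 0.0f);
  for (int i = 0; i < m; ++i)
    for (int j = 0; j < n; ++j)
      for (int p = 0; p < k; ++p)
        out[i * n + j] += x[i * k + p] * y[p * n + j];
  return out;
}
```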

This concludes the forward implementation of an operator. Next, the operator and its kernel need to be registered in a `.cc` file.

The definition of the corresponding backward operator, if applicable, is similar to that of a forward operator. Note that a backward operator does not include a ProtoMaker.

Registering Operator and OpKernel

In `.cc` files, register the forward and backward operator classes and the CPU kernel.
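A sketch of what that registration looks like, mirroring the CUDA registration shown below. The macro names follow the historical Paddle API and the backward class name MulOpGrad is illustrative; both may differ across versions.

```
// Sketch: register the forward op, its backward op, and the CPU kernels.
namespace ops = paddle::operators;
REGISTER_OP(mul, ops::MulOp, ops::MulOpMaker, mul_grad, ops::MulOpGrad);
REGISTER_OP_CPU_KERNEL(mul,
    ops::MulKernel<paddle::platform::CPUDeviceContext, float>);
REGISTER_OP_CPU_KERNEL(mul_grad,
    ops::MulGradKernel<paddle::platform::CPUDeviceContext, float>);
```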

Note that if the CUDA kernel is implemented using the Eigen unsupported module, the macro definition #define EIGEN_USE_GPU is needed at the top of the `.cu` file, before the header includes, for example:

```cpp
// if using the Eigen unsupported module, define this before including headers
#define EIGEN_USE_GPU

namespace ops = paddle::operators;
REGISTER_OP_CUDA_KERNEL(mul,
    ops::MulKernel<paddle::platform::CUDADeviceContext, float>);
REGISTER_OP_CUDA_KERNEL(mul_grad,
    ops::MulGradKernel<paddle::platform::CUDADeviceContext, float>);
```

Compilation

Run the following commands to compile.

```bash
# maybe you need to rerun cmake
make mul_op
```

Python Binding

The system will automatically bind the new operator to Python and link it to the generated library.

Unit Tests

Unit tests for an operator include

comparing a forward operator's implementations on different devices,

comparing a backward operator's implementation on different devices, and