Prerequisites

Introduction

In the previous tutorial, we went through the steps to implement an operator using C++ in the MXNet backend. In this tutorial, we will cover how sparse operators are implemented in the backend. Specifically, we will practice adding CSRNDArray support to the forward function of the quadratic operator.

Implementation

A Sparse Operator Example

Let's consider the quadratic function f(x) = ax^2+bx+c when x is a CSRNDArray. Notice that if the input x is sparse and c is 0.0, the output is also sparse. If c is non-zero, the output is dense. In the MXNet frontend, the operator works like this:

Note that the statement z = mx.nd.quadratic(x, a=1, b=2, c=3) generates a warning message saying that a dense operator is used because the sparse operator doesn't support this case. If you are not familiar with the storage fallback mechanism, please revisit the tutorials for CSRNDArray and RowSparseNDArray.
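To see why the value of c decides whether the output stays sparse, note that f(0) = c, so every zero entry of x maps to c. The following is a minimal pure-Python sketch of that observation. It does not use MXNet: plain nested lists stand in for NDArrays, and the helper names quadratic and count_zeros are made up for illustration.

```python
# Illustrative sketch only: nested Python lists stand in for NDArrays.
def quadratic(x, a, b, c):
    """Apply f(x) = a*x^2 + b*x + c to every element of a 2-D list."""
    return [[a * v * v + b * v + c for v in row] for row in x]

def count_zeros(m):
    """Count the entries equal to zero in a 2-D list."""
    return sum(1 for row in m for v in row if v == 0)

x = [[0.0, 1.0],
     [2.0, 0.0]]

y = quadratic(x, a=1.0, b=2.0, c=0.0)  # zeros stay zero because f(0) = c = 0
z = quadratic(x, a=1.0, b=2.0, c=3.0)  # zeros become 3, so the result is dense

print(y, count_zeros(y))  # [[0.0, 3.0], [8.0, 0.0]] 2
print(z, count_zeros(z))  # [[3.0, 6.0], [11.0, 3.0]] 0
```

With c = 0 the zero pattern of the input carries over to the output, which is what allows the output to reuse the input's sparse layout.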

In this tutorial, we will implement the forward function of the sparse quadratic operator. The storage type of the output depends on the inputs:

quadratic('csr', a, b, 0.0) outputs 'csr'

otherwise, outputs 'default'

To implement this, we first register the storage type inference property of the operator, from which the operator infers the output storage type based on operator arguments and input types. Then we implement the forward function for the case where c is 0.0 and x is a CSRNDArray.

The next steps will go into detail on how to create a sparse operator in C++:

Understand the FComputeEx and relevant NDArray interfaces in the backend.

Define storage type inference functions in quadratic_op-inl.h.

Define the forward function in quadratic_op-inl.h.

Register the sparse operator using nnvm in quadratic_op.cc and quadratic_op.cu for CPU and GPU computing, respectively.

Write a unit test for the sparse operator implemented.

Now let's walk through the process step by step.

The FComputeEx and Relevant NDArray Interfaces in Backend

Before we dive into the details of relevant interfaces, here are two differences between dense and sparse operators:

Dense operators only handle dense inputs and outputs. Sparse operators support various combinations of storage types.

For dense operators, memory for inputs and outputs is pre-allocated based on their shapes. With sparse representations, however, memory for sparse inputs and outputs depends on the number of non-zero elements they have, which is only known at runtime.
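The second difference can be made concrete by counting stored values. The sketch below assumes the CSR layout, with one data value and one column index per non-zero element plus an indptr array of length rows + 1. Two matrices of the same shape need the same dense storage but different sparse storage. The function names are illustrative and are not MXNet APIs.

```python
def dense_storage(shape):
    """Number of values a dense array stores: fixed by the shape alone."""
    rows, cols = shape
    return rows * cols

def csr_storage(matrix):
    """Lengths of (data, indices, indptr) for a CSR copy of a 2-D list."""
    nnz = sum(1 for row in matrix for v in row if v != 0)
    return nnz, nnz, len(matrix) + 1

m1 = [[0, 1, 0], [0, 0, 0]]
m2 = [[4, 1, 0], [0, 7, 9]]

print(dense_storage((2, 3)))  # 6, the same for m1 and m2
print(csr_storage(m1))        # (1, 1, 3)
print(csr_storage(m2))        # (4, 4, 3)
```

The dense size is known from the shape before any value is computed, while the CSR sizes can only be determined once the values are available.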

With these differences in mind, let's review the FCompute interface introduced in the previous dense operator tutorial:

Notice the FCompute interface includes TBlobs, which don't include data structures that could be used to query storage types of inputs, nor manipulate auxiliary arrays like indices and indptr. Therefore, instead of the FCompute interface, sparse operators are registered with the following FComputeEx interface:

Note that the vectors of TBlobs are replaced with vectors of NDArrays. Now, let's go through a few important methods in the NDArray class.

In the Python frontend, there are three types of NDArrays, namely mx.nd.NDArray, mx.nd.sparse.RowSparseNDArray and mx.nd.sparse.CSRNDArray. In the C++ backend, however, all of them are represented by the mxnet::NDArray class. The storage_type() method indicates the storage type of the NDArray:

On the other hand, from Python one can inspect the auxiliary arrays of a sparse NDArray via RowSparseNDArray.indices, CSRNDArray.indices and CSRNDArray.indptr, and the actual data array via RowSparseNDArray.data and CSRNDArray.data.

In the backend, auxiliary arrays such as indices and indptr are retrieved by the aux_data method, while the actual data array is retrieved by the data method.
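For readers who want a reminder of what the data, indices and indptr arrays hold, here is a small pure-Python sketch that builds them from a dense matrix. It only illustrates the CSR layout; the function dense_to_csr is hypothetical and is not how MXNet performs the conversion.

```python
def dense_to_csr(matrix):
    """Return (data, indices, indptr) for a 2-D list, in CSR order."""
    data, indices, indptr = [], [], [0]
    for row in matrix:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)    # the stored non-zero values
                indices.append(col)   # column index of each stored value
        indptr.append(len(data))      # offset in data where the next row starts
    return data, indices, indptr

data, indices, indptr = dense_to_csr([[0, 1, 0],
                                      [2, 0, 3],
                                      [0, 0, 0]])
print(data)     # [1, 2, 3]
print(indices)  # [1, 0, 2]
print(indptr)   # [0, 1, 3, 3]
```

Row i of the matrix owns the slice data[indptr[i]:indptr[i + 1]], so an empty row shows up as two equal consecutive indptr entries.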

Storage Type Inference

Storage type inference is the process of deducing storage types of NDArrays in neural networks from operator arguments, and deciding whether to dispatch to the FCompute or FComputeEx interface. Let's take a look at the following example. Given an input CSRNDArray called x, you invoke the quadratic operator like this: output = mx.nd.sparse.quadratic(x, a=1, b=2, c=0). Before calculating the output values, MXNet infers the storage type of output to be csr, and dispatches to the FComputeEx operator implementation, following the storage type inference rules you defined.

For our quadratic operator, the storage type inference function is the following. Let's go through it line by line.

Lines 4-5: in_attrs is a vector containing all input storage types. out_attrs is a vector containing all output storage types.

Lines 6-7: We check the number of inputs and that of outputs. Both should be equal to 1.

Line 8: We get QuadraticParam from attrs. It contains the argument c, whose value is used later to decide if the output is sparse.

Lines 9-10: The storage type of the input is stored in the local variable in_stype. The reference to the output storage type is stored in the local variable out_stype.

Line 11: Initialize the return value dispatched to false.

Lines 12-15: If the input is dense, try to assign dense storage to the output storage type and assign kFCompute to dispatch_mode. The function storage_type_assign() first attempts to assign kDefaultStorage to out_stype. If the assignment to out_stype is successful (i.e. out_stype was either not defined, or was already assigned kDefaultStorage previously), storage_type_assign() sets dispatch_mode to kFCompute and returns true. If the assignment to out_stype is not successful, dispatch_mode keeps its old value and false is returned.

Lines 16-19: If dispatch_mode is not defined, the input storage type is "csr" and c is 0.0, try to assign csr storage to the output storage type and assign kFComputeEx to dispatch_mode.

Lines 20-22: If dispatch_mode is still not defined, infer dense storage for the output and dispatch to storage fallback mode. The dispatch_fallback() function first attempts to assign kDefaultStorage to all out_attrs. If the assignment is successful, it returns true; otherwise, it returns false.

Line 23: Return the value of dispatched. If dispatched is false, an exception will be thrown by MXNet.
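The decision logic described above can be modelled in a few lines of Python. This is a sketch of the rules only, not the C++ function from quadratic_op-inl.h: the storage types and dispatch modes are represented by plain strings, and the names try_assign and infer_storage_type are assumptions made for illustration.

```python
UNDEFINED = "undefined"  # stands in for an unset storage type or dispatch mode

def try_assign(out_attrs, stype):
    """Assign stype to every output unless one already holds a different type."""
    if any(s not in (UNDEFINED, stype) for s in out_attrs):
        return False
    out_attrs[:] = [stype] * len(out_attrs)
    return True

def infer_storage_type(c, in_attrs, out_attrs):
    """Return (output storage type, dispatch mode) for the quadratic operator."""
    assert len(in_attrs) == 1 and len(out_attrs) == 1
    in_stype = in_attrs[0]
    mode = UNDEFINED
    # dense input -> dense output, regular FCompute path
    if in_stype == "default" and try_assign(out_attrs, "default"):
        mode = "FCompute"
    # csr input with c == 0 -> csr output, sparse FComputeEx path
    if (mode == UNDEFINED and in_stype == "csr" and c == 0.0
            and try_assign(out_attrs, "csr")):
        mode = "FComputeEx"
    # anything else -> dense output via storage fallback
    if mode == UNDEFINED and try_assign(out_attrs, "default"):
        mode = "fallback"
    if mode == UNDEFINED:
        raise RuntimeError("storage type inference failed")
    return out_attrs[0], mode

print(infer_storage_type(0.0, ["csr"], [UNDEFINED]))      # ('csr', 'FComputeEx')
print(infer_storage_type(3.0, ["csr"], [UNDEFINED]))      # ('default', 'fallback')
print(infer_storage_type(0.0, ["default"], [UNDEFINED]))  # ('default', 'FCompute')
```

The three branches are tried in order, and each later branch runs only if no dispatch mode has been chosen yet, which mirrors the "if dispatch_mode is not defined" checks in the walkthrough.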

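Forward Function

The notes below walk through the forward function defined in quadratic_op-inl.h. The line numbers refer to the listing of that function and start again from 1.

Lines 1-6: inputs is a vector of input NDArrays (only one input tensor for the quadratic operator). outputs is a vector of output NDArrays (only one for the quadratic operator). xpu, attrs, ctx and req each hold the same thing introduced in the dense operator tutorial.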

Lines 7-9: Verify that each vector has the expected size. Otherwise, stop and print an error message.

Line 10: Get operator parameters, the input storage type and the output storage type respectively.

Lines 13-18: If both the input storage type and the output storage type are "csr" and c is 0.0, invoke the "csr" implementation. Otherwise, an exception will be thrown with detailed information about the unimplemented operator arguments.

Lines 20-25: Function definition for the "csr" implementation of the quadratic operator.

Lines 26-28: Declare a few namespaces used in the current function scope. Note that csr::kIdx is used to access the indices array among the auxiliary arrays, while csr::kIndPtr is used to access the indptr array.

Lines 29-30: Check the provided req of the operator. If req is kNullOp, no work is required. Since the output of this operator is a "csr" NDArray, whose memory has to be allocated at runtime, only kWriteTo is allowed. Both kAddTo and kWriteInplace are usually not supported when the output is sparse.

Line 31: Get the stream of the context for serializing asynchronous executions.

Lines 32-35: Before we access the data, indices and indptr arrays to compute the result, we first check if these arrays are empty. If so, we set the output to be zeros. The storage_initialized() method returns true if a sparse NDArray contains at least one element in its data and indices arrays; it returns false otherwise.

Line 36: Get the number of elements stored in the input and store it in variable nnz. The storage_shape() method returns the shape of the data array of a sparse NDArray.

Line 37: Get the number of rows of the output and store it in variable num_rows.

Line 38: Allocate memory for the data array and auxiliary arrays. A CSRNDArray of shape (M, N) storing K elements has a data array of length K, an indices array of length K and an indptr array of length (M + 1). The CheckAndAlloc method takes the shapes of the auxiliary arrays as input, and allocates the memory for the data array and auxiliary arrays. It is not necessary to provide the shape of the data array, as it can be inferred from the shapes of the auxiliary arrays.

Lines 39-54: This is the place where the values of the output data array and auxiliary arrays are computed. The macros MSHADOW_TYPE_SWITCH and MXNET_ASSIGN_REQ_SWITCH enable the code block to work for all the supported data types and req types in MXNet. For this operator, since the transformation only happens on the data array, we simply invoke the quadratic operator kernel quadratic_forward via Kernel::Launch. For the indices and indptr arrays, we just copy the values from the inputs. This way, a complete output CSRNDArray is computed.
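The overall flow of the "csr" forward pass can be summarised with a pure-Python model. It is a sketch under simplifying assumptions, not the backend implementation: the CSR input is a (data, indices, indptr) tuple of lists whose indptr is always present, the req values are plain strings standing in for kNullOp and kWriteTo, and the function name quadratic_forward_csr is hypothetical.

```python
def quadratic_forward_csr(a, b, csr_in, req="write"):
    """Model of the csr forward pass for c == 0; returns (data, indices, indptr)."""
    if req == "null":
        return None  # nothing to compute
    if req != "write":
        raise ValueError("only 'write' is supported when the output is csr")
    data, indices, indptr = csr_in
    num_rows = len(indptr) - 1
    if not data:
        # no stored elements: the output is all zeros
        return [], [], [0] * (num_rows + 1)
    # Only the stored values are transformed; f(0) = 0 covers all other entries.
    out_data = [a * v * v + b * v for v in data]
    # The sparsity pattern is unchanged, so the auxiliary arrays are copied.
    return out_data, list(indices), list(indptr)

# CSR form of [[0, 1], [2, 0]]
x = ([1.0, 2.0], [1, 0], [0, 1, 2])
print(quadratic_forward_csr(1.0, 2.0, x))  # ([3.0, 8.0], [1, 0], [0, 1, 2])
```

Because the work is proportional to the number of stored elements rather than to the full shape, the sparse path avoids touching the zero entries at all.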

Operator Registration

Finally, let's extend the operator registration logic to expose sparse.quadratic to the frontend. Below is the extended registration code in quadratic_op.cc:

Unit Test

The unit test checks the result of the sparse.quadratic operator in two cases:

CSRNDArray input with c = 0.0, which outputs a CSRNDArray

CSRNDArray input with c = 1.0, which outputs a dense NDArray

Backward Function

So far, only the forward operator supports sparse inputs. To add sparse support to the backward operator, you also need to register these two attributes to _backward_quadratic:

FComputeEx for sparse operator implementation

FInferStorageType for storage type inference in the backward computation.

Due to length constraints, this is left as an exercise for readers.

Summary

In this tutorial, we practiced adding sparse support to the operator quadratic in the MXNet backend and unit testing the implementation in the frontend. More specifically, we went through a few important interfaces, added the storage type inference function, implemented the forward function, and registered the sparse operator using nnvm. Congratulations! You now know how to add sparse operators. We welcome your contributions to MXNet.