Pair where Gradient is gradient for this layer, INDArray is epsilon (activation gradient)
needed by next layer, but before element-wise multiply by sigmaPrime(z). So for standard feed-forward layer, if this layer is
L, then return.getSecond() == dL/dIn = (w^(L)*(delta^(L))^T)^T. Note that the returned array should be placed in the
ArrayType.ACTIVATION_GRAD workspace via the workspace manager