The gradF' function calculates both the result and the Jacobian of a nonscalar-to-nonscalar function, using m invocations of reverse AD, where m is the output dimensionality. Applying fmap snd to the result will recover the result of gradF.
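The shape of the result can be illustrated with hand-written analytic derivatives standing in for what reverse AD computes. This is a sketch only: f, jacobianPairs, and the sample inputs below are hypothetical, not part of the library.

```haskell
-- A sketch of the (answer, gradient-row) pairs gradF' produces, using
-- hand-written analytic partials in place of actual reverse AD.
f :: Num a => [a] -> [a]
f [x, y] = [x * y, x + y]
f _      = error "f expects exactly two inputs"

-- What gradF' f would return: each output paired with its gradient row.
jacobianPairs :: Num a => [a] -> [(a, [a])]
jacobianPairs [x, y] =
  [ (x * y, [y, x])    -- row for x*y: [d/dx, d/dy] = [y, x]
  , (x + y, [1, 1]) ]  -- row for x+y: [1, 1]
jacobianPairs _ = error "jacobianPairs expects exactly two inputs"
```

Here fmap fst recovers the value of f itself and fmap snd recovers the Jacobian rows, mirroring the relationship between gradF' and gradF described above.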

gradWithF' g f calculates both the result and the Jacobian of a nonscalar-to-nonscalar function f, using m invocations of reverse AD, where m is the output dimensionality. Applying fmap snd to the result will recover the result of gradWithF.

Instead of returning the Jacobian matrix, each element of the matrix is combined with the corresponding input using the function g.
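To make the role of g concrete, here is a hand-written sketch for f [x, y] = [x*y, x+y]. The name jacobianWithDemo is hypothetical, and analytic partials again stand in for reverse AD.

```haskell
-- Sketch: each Jacobian entry d f_i / d x_j is combined with the
-- corresponding input x_j via g, as gradWithF would do.
jacobianWithDemo :: Num a => (a -> a -> b) -> [a] -> [[b]]
jacobianWithDemo g [x, y] =
  [ [g x y, g y x]    -- partials of x*y are [y, x], paired with inputs [x, y]
  , [g x 1, g y 1] ]  -- partials of x+y are [1, 1]
jacobianWithDemo _ _ = error "jacobianWithDemo expects exactly two inputs"
```

Passing g = \_ d -> d discards the inputs and recovers the plain Jacobian, while g = (,) tags every partial with the input it was taken with respect to.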

hessianProduct' f wv computes both the gradient of a non-scalar-to-scalar function f at w = fmap fst wv and the product of the Hessian H at w with a vector v = fmap snd wv using "Pearlmutter's method". The outputs are returned wrapped in the same functor.

H v = (d/dr) grad_w f(w + r v) | r = 0

In other words, we take the directional derivative of the gradient in the direction v.
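The identity can be checked by hand on a small example. The following sketch uses f (x, y) = x^2 * y, whose gradient is (2xy, x^2) and whose Hessian is [[2y, 2x], [2x, 0]]; all names and the example function are illustrative, not from the library.

```haskell
-- f (x, y) = x^2 * y, with gradient (2xy, x^2).
f :: Num a => (a, a) -> a
f (x, y) = x * x * y

-- H v computed directly from the explicit Hessian [[2y, 2x], [2x, 0]].
hvExplicit :: Num a => (a, a) -> (a, a) -> (a, a)
hvExplicit (x, y) (v1, v2) = (2 * y * v1 + 2 * x * v2, 2 * x * v1)

-- H v as the directional derivative of the gradient:
-- d/dr grad f (w + r v) at r = 0, expanded by hand.
hvDirectional :: Num a => (a, a) -> (a, a) -> (a, a)
hvDirectional (x, y) (v1, v2) =
  ( 2 * (v1 * y + x * v2)  -- d/dr [2 (x + r v1) (y + r v2)] at r = 0
  , 2 * x * v1 )           -- d/dr [(x + r v1)^2] at r = 0
```

Both definitions agree at every point, which is exactly what Pearlmutter's method exploits: the Hessian-vector product is available without ever forming H.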

AD serves as a common wrapper for different Mode instances, exposing a traditional numerical tower. Universal quantification is used to limit the actions in user code to machinery that returns the same answers under all AD modes, allowing us to use a mode interchangeably as both the type-level "brand" and the dictionary, providing a common API.
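The quantification pattern can be sketched with a toy branded forward-mode dual number. This is a minimal illustration of the technique, not the library's implementation; Dual and diff are hypothetical names.

```haskell
{-# LANGUAGE RankNTypes #-}

-- A branded dual number: value and derivative, tagged by a phantom s.
data Dual s a = Dual a a

-- The numerical tower is the only interface user code sees.
instance Num a => Num (Dual s a) where
  Dual x x' + Dual y y' = Dual (x + y) (x' + y')
  Dual x x' * Dual y y' = Dual (x * y) (x' * y + x * y')
  negate (Dual x x')    = Dual (negate x) (negate x')
  abs    (Dual x x')    = Dual (abs x) (x' * signum x)
  signum (Dual x _)     = Dual (signum x) 0
  fromInteger n         = Dual (fromInteger n) 0

-- The rank-2 type forces the argument to be polymorphic in s, so user
-- code cannot depend on any particular mode's representation.
diff :: Num a => (forall s. Dual s a -> Dual s a) -> a -> a
diff f x = let Dual _ dx = f (Dual x 1) in dx
```

For example, diff (\x -> x * x + 3 * x) 2 evaluates to 7. Because the function argument is quantified over s, the same user code could just as well be run under a reverse-mode brand with a different representation.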