Algorithmic Differentiation (AD) is a novel way to differentiate computer code, i.e. to compute $\frac{\partial F}{\partial x_1}, \frac{\partial F}{\partial x_2},\ldots$ where $F$ is given by a computer program.
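For example, $F$ might be a short routine like the following (an illustrative sketch, not a program from any of the studies cited below):

```python
import math

# Illustrative program: F depends on its inputs x1, x2 through several
# intermediate steps, each built from elementary operations.
def F(x1, x2):
    a = x1 * x2          # intermediate value
    b = math.sin(a)      # special function call
    return b + x1 * x1   # F(x1, x2) = sin(x1*x2) + x1^2
```

AD computes the exact partials of such a program, here $\frac{\partial F}{\partial x_1} = x_2\cos(x_1 x_2) + 2x_1$ and $\frac{\partial F}{\partial x_2} = x_1\cos(x_1 x_2)$.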

The derivatives are mathematically exact and computed to machine precision. Traditionally, derivative estimates would be obtained by finite differences (also called "bump and revalue"), which is slow and can be inaccurate. In contrast, AD (and adjoint AD in particular - AAD) gives precise derivative information at a fraction of the cost.
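To illustrate the inaccuracy of bump and revalue, here is a minimal sketch of a central finite difference on a made-up function (the function and bump sizes are illustrative, not from the original): the bump size $h$ is delicate, since too large a bump gives truncation error and too small a bump gives round-off error.

```python
import math

def f(x):
    # illustrative test function with a known analytic derivative
    return math.exp(x) * math.sin(x)

def central_diff(f, x, h):
    # "bump and revalue": two extra evaluations of f per input
    return (f(x + h) - f(x - h)) / (2.0 * h)

x = 1.0
exact = math.exp(x) * (math.sin(x) + math.cos(x))  # analytic derivative
for h in (1e-1, 1e-5, 1e-13):
    print(h, abs(central_diff(f, x, h) - exact))   # error varies wildly with h
```

Note also the cost: for $n$ inputs, central differences need $2n$ revaluations of the program, whereas adjoint AD delivers the whole gradient for a small constant multiple of one evaluation.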

Examples

Here are some examples of what adjoint AD has enabled in various domains.

Engineering

Figure 1: Sensitivity of drag coefficient to the surface geometry of a car at high speed (left) and low speed (right)

Figure 1 shows sensitivities of the drag coefficient to each point on a car's surface when it moves at high speed (left) and low speed (right). The simulation was performed with AD-enabled OpenFOAM built on top of the AD tool dco (see below). The surface mesh had 5.5 million cells and the gradient vector was 18GB. (1)

The normal simulation on a top-end desktop took 44s, while the AD-enabled simulation took 273s. To obtain the same gradient information, on the same machine, by finite differences would take roughly 5 years.

The gradient information can now be used to optimize the shape of the car so as to reduce the drag.

Figure 2 shows the sensitivity of the amount of water flowing through the Drake passage to changes in the topography of the ocean floor. The simulation was performed with the AD-enabled MIT Global Circulation Model (MITgcm) run on a supercomputer. The ocean was meshed with 64,800 grid points. (2)

Obtaining the gradient through finite differences took a month and a half. The adjoint AD code obtained the gradient in less than 10 minutes.

The gradient information can be used to further refine climate prediction models and our understanding of global weather. For example, the flow shows high sensitivity to topography over the South Pacific Ridge and the Indonesian Throughflow, even though these are far from the Drake Passage.

Finance

AD and AAD are used in finance to get sensitivities of complex instruments quickly, enabling real-time risk management and hedging of quantities like xVA.

Here we show some results from a paper which studied two typical codes arising in finance: Monte Carlo and PDEs. (3)

|             | $n$ | $f$  | ${\tt cfd}$ | ${\tt AD}$     | $\frac{\tt AD}{f}$ | $\frac{\tt cfd}{\tt AD}$ |
|-------------|-----|------|-------------|----------------|--------------------|--------------------------|
| Monte Carlo | 34  | 0.5s | 29.0s       | 3.0s (2.5MB)   | 6.0x               | 9.7x                     |
|             | 62  | 0.7s | 80.9s       | 5.1s (3.2MB)   | 7.3x               | 15.9x                    |
|             | 142 | 1.5s | 423.5s      | 12.4s (5.1MB)  | 8.3x               | 34.2x                    |
|             | 222 | 2.3s | 1010.7s     | 24.4s (7.1MB)  | 10.6x              | 41.4x                    |
| PDE         | 34  | 0.6s | 37.7s       | 11.6s (535MB)  | 19.3x              | 3.3x                     |
|             | 62  | 1.0s | 119.5s      | 18.7s (919MB)  | 18.7x              | 6.4x                     |
|             | 142 | 2.6s | 741.2s      | 39s (2GB)      | 15.0x              | 19x                      |
|             | 222 | 4.1s | 1857.3s     | 60s (3GB)      | 14.6x              | 31x                      |

Table 1: Run times and memory requirements as a function of gradient size $n$ for Monte Carlo and PDE applications.

Table 1 shows the runtimes of a first-order adjoint code using dco vs. central finite differences on a typical finance application (option pricing under a local volatility model, with 10K sample paths/spatial points and 360 time steps). The second column $f$ is the normal runtime of the application, ${\tt cfd}$ is the runtime of central finite differences, and ${\tt AD}$ is the adjoint AD runtime along with the additional memory required (tape size). The calculations were run on a laptop, so only the relative runtimes $\frac{\tt AD}{f}$ and $\frac{\tt cfd}{\tt AD}$ are significant; the latter shows the speedup of AD over finite differences.

In finance such derivative information is often used for hedging and risk calculations, so these gradients must be computed many times per day.

Algorithmic Differentiation vs. Finite Differences

AD computes mathematical derivatives of a computer program. Conceptually the process is straightforward:

Computers can only add, subtract, multiply and divide floating point numbers

Any computer program (operating on floating point numbers) simply consists of many of these fundamental operations strung together, along with calls to special functions (e.g. sin, cos, exp, etc.)

It is simple to compute the derivatives of all these basic operations

We can use the chain rule to combine these basic derivatives to get the derivative of our computer program

This process of differentiating a code can be applied recursively to generate 2nd, 3rd, 4th or higher derivatives of any computer program. Since the code is differentiated symbolically (albeit at a very low level), the derivatives are mathematically exact and are computed to machine accuracy.
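The recipe above can be sketched with dual numbers, the classic way to implement forward-mode AD via operator overloading. This is a minimal illustration of the idea, not dco's API; the names are invented for this sketch.

```python
import math

class Dual:
    """Carries a value and its derivative through every basic operation."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # sum rule: (u + v)' = u' + v'
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (u*v)' = u'*v + u*v'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

def sin(x):
    # chain rule for the special function sin
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

# dF/dx1 of F(x1, x2) = sin(x1*x2) + x1*x1 at (1, 2):
x1, x2 = Dual(1.0, 1.0), Dual(2.0, 0.0)   # seed the derivative w.r.t. x1
F = sin(x1 * x2) + x1 * x1
print(F.dot)   # equals 2*cos(2) + 2, exact to machine precision
```

Each overloaded operation differentiates one elementary step, and the chain rule composes them automatically as the program runs; this is the "symbolic differentiation at a very low level" described above.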

What does NAG Provide?

Although conceptually simple, AD (and adjoint AD, AAD, in particular) is demanding to apply to a non-trivial code. When AD is used in production, high-performance, robust and flexible tools are essential to get the code running quickly and to keep it maintainable. In addition, external library dependencies break AD, since AD operates at the source code level. A production AD tool must have a way to handle these dependencies, ideally by being applied to the underlying (3rd party) source to create a full AD solution.

AD Services

NAG, in collaboration with RWTH Aachen, provides consulting services, training, software and support to develop AD solutions for clients. NAG has delivered AD training to several large financial institutions to give them a solid grounding in the mathematical, computer science and computational aspects of AD, common pitfalls and techniques for working around them, as well as various tricks and optimizations. NAG has also helped to embed AD tools into large Tier 1 quantitative finance libraries and has helped overcome particular memory constraints in large adjoint codes.

AD Tools

NAG provides a best-in-class C++/Fortran operator-overloading AD tool called dco (derivative computation through overloading). This has been used on production codes numbering in the millions of lines and running on platforms ranging from single threaded desktops to large supercomputers.

The dco tool is developed jointly by NAG and RWTH Aachen, integrates easily with your code base and is extremely flexible

The flexibility is crucial since it allows the memory use of adjoint codes to be constrained almost arbitrarily

For simulation programs written in Fortran, a version of the NAG Fortran compiler has been extended to serve as a pre-processor to dco. This facilitates seamless integration of AD into complex build systems, so the modifications required in the original source code can be minimized or even eliminated entirely.
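The memory that adjoint codes must control is the "tape": a record of the computation that is replayed backwards to propagate derivatives. The following is a minimal sketch of that underlying idea (illustrative names, not dco's implementation):

```python
import math

tape = []   # records, per operation: (output, [(input, local partial), ...])

class Var:
    def __init__(self, val):
        self.val, self.adj = val, 0.0   # value and adjoint (accumulated)

    def __add__(self, other):
        out = Var(self.val + other.val)
        tape.append((out, [(self, 1.0), (other, 1.0)]))
        return out

    def __mul__(self, other):
        out = Var(self.val * other.val)
        tape.append((out, [(self, other.val), (other, self.val)]))
        return out

def sin(x):
    out = Var(math.sin(x.val))
    tape.append((out, [(x, math.cos(x.val))]))
    return out

def backward(y):
    # one reverse sweep over the tape yields ALL partials at once
    y.adj = 1.0
    for out, parents in reversed(tape):
        for v, partial in parents:
            v.adj += partial * out.adj

x1, x2 = Var(1.0), Var(2.0)
F = sin(x1 * x2) + x1 * x1
backward(F)
print(x1.adj, x2.adj)   # the full gradient from a single reverse sweep
```

The tape grows with every recorded operation, which is why adjoint memory (the tape sizes in Table 1) matters and why the ability to constrain it, e.g. by checkpointing parts of the computation, is crucial in a production tool.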

AD Versions of Computational Algorithms

NAG provides AD versions of computational algorithms such as those found in the NAG Library. These AD routines are compatible with dco to provide maximal productivity, but versions can be provided that are compatible with any AD tool including hand-written AD code.

How to try AD

The principles of AD in the context of computational finance and techniques for avoiding memory problems in adjoint codes are discussed in NAG Technical Report TR3/14 (3). Readers of the technical report can request the example programs it is based on. The example programs include a copy of dco and a trial licence so that readers can run the examples on their own machines and apply dco to their own codes.


About Us

The Numerical Algorithms Group (NAG) delivers trusted, high quality numerical computing software and high performance computing (HPC) services. For four decades NAG experts have worked closely with world-leading researchers in academia and industry to create powerful, accurate and flexible software which today is relied upon by tens of thousands of users, companies, and learning institutions as well as numerous independent software vendors.