NumbaPro is an enhanced version of Numba which adds premium features and
functionality that allow developers to rapidly create optimized code that
integrates well with NumPy.

With NumbaPro, Python developers can define NumPy ufuncs and generalized ufuncs (gufuncs)
in Python, which are compiled to machine code dynamically and loaded on the fly.
Additionally, NumbaPro offers developers the ability to target multicore and
GPU architectures with Python code for both ufuncs and general-purpose code.

For targeting the GPU, NumbaPro can either do the work automatically, doing
its best to optimize the code for the GPU architecture, or expose a
CUDA-based API for writing CUDA code directly in Python, which gives full
control of the hardware through thread and block identities.

Let’s start with a simple function that adds two NumPy arrays element by element.
Asking NumbaPro to compile this Python function to vectorized machine code for execution
on the CPU is as simple as adding a single line of code (a decorator applied to the
function):
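A minimal sketch of what such a decorated function might look like is shown here. The function name `add` and the sample arrays are illustrative, not taken from the original text. The import falls back to the open-source Numba package, on the assumption that its `vectorize` decorator accepts the same signature list and `target` argument, and then to a plain NumPy stand-in (hypothetical, uncompiled) so the sketch still runs where neither package is installed.

```python
import numpy as np

try:
    from numbapro import vectorize        # the package this text describes
except ImportError:
    try:
        # Assumption: open-source Numba provides an equivalent decorator.
        from numba import vectorize
    except ImportError:
        # Hypothetical stand-in: no compilation, same call shape.
        def vectorize(signatures, target='cpu'):
            return lambda fn: np.vectorize(fn, otypes=[np.float32])


# The single added line: compile the scalar function into a ufunc for the CPU.
@vectorize(['float32(float32, float32)'], target='cpu')
def add(a, b):
    # Written for two scalars; the decorator applies it element by element.
    return a + b


x = np.arange(5, dtype=np.float32)
y = np.full(5, 10, dtype=np.float32)
result = add(x, y)   # element-wise sum of x and y, as float32
```

The function body stays ordinary scalar Python; only the decorator line decides how and where it is compiled.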

Similarly, one can instead target the GPU for execution of the same Python function by
modifying a single line in the above example:

@vectorize(['float32(float32, float32)'], target='gpu')

Targeting the GPU for execution opens the door to numerous GPU-specific
optimizations. As a starting point for more complex scenarios, one can also target
the GPU with NumbaPro via its Just-In-Time (JIT) compiler:
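A hedged sketch of a JIT-compiled GPU kernel for the same element-wise addition follows. It assumes the open-source Numba `cuda` module as a stand-in for NumbaPro's CUDA API. The names `gpu_add` and `add_kernel`, the block size of 128, and the sample arrays are illustrative. The kernel only launches when a CUDA-capable GPU is detected; otherwise `result` is left as `None`.

```python
import numpy as np

try:
    # Assumption: Numba's CUDA module, standing in for NumbaPro's CUDA API.
    from numba import cuda
    gpu_ready = cuda.is_available()
except ImportError:
    cuda, gpu_ready = None, False


def gpu_add(a, b):
    """Add two 1-D float32 arrays with a hand-written CUDA kernel."""

    # Defined inside the function so that no CUDA work happens on
    # machines without a GPU.
    @cuda.jit
    def add_kernel(x, y, out):
        # Global element index built from thread and block identities.
        i = cuda.threadIdx.x + cuda.blockIdx.x * cuda.blockDim.x
        if i < out.size:          # threads past the end of the array do nothing
            out[i] = x[i] + y[i]

    out = np.empty_like(a)
    threads_per_block = 128
    blocks = (a.size + threads_per_block - 1) // threads_per_block
    # Launch configuration: [number of blocks, threads per block].
    # Host arrays passed directly are copied to the device and back.
    add_kernel[blocks, threads_per_block](a, b, out)
    return out


x = np.arange(1000, dtype=np.float32)
y = np.ones(1000, dtype=np.float32)
result = gpu_add(x, y) if gpu_ready else None
```

Unlike the decorator-only version, the kernel here decides which element each thread handles, which is where GPU-specific tuning (block size, memory transfers) would begin.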