Numba automatically applies multiple dispatch to compiled functions: it generates a separate specialized implementation for each combination of argument types the function is called with, and selects the matching one at call time. Suppose we have a function that clamps values to zero if their magnitude does not exceed a given threshold:

In [ ]:

@jit(nopython=True)
def zero_clamp(x, threshold):
    # assume 1D array. See later in this notebook for more general function
    out = np.empty_like(x)
    for i in range(out.shape[0]):
        if np.abs(x[i]) > threshold:
            out[i] = x[i]
        else:
            out[i] = 0
    return out

In [ ]:

a_small = np.linspace(0, 1, 50)
zero_clamp(a_small, 0.3)

Now let's benchmark some different kinds of array inputs. We'll try:

int16

float32

float32 with a stride (elements not contiguous in memory)

In [ ]:

n = 10000
a_int16 = np.arange(n).astype(np.int16)
a_float32 = np.linspace(0, 1, n, dtype=np.float32)
a_float32_strided = np.linspace(0, 1, 2 * n, dtype=np.float32)[::2]  # view of every other element

We see different performance characteristics for each of these cases, even though they have the same number of input elements. Numba generated different machine code for each situation, which we can see if we look at the .signatures attribute of the compiled function:

In [ ]:

zero_clamp.signatures

When printed as strings, Numba array types have the form: array(dtype, dimensions, layout). The first signature therefore corresponds to a 1D array of float64 with C-style layout (row-major order, no gaps between elements). The next two signatures are similar, but for int16 and float32 arrays. The final signature indicates an "any" layout array, which usually results from slicing: the resulting view is no longer contiguous in either C or Fortran order.

We can compare to a pure NumPy implementation and see the speed improvement that Numba has achieved through a combination of specialization and elimination of temporary arrays:

Universal functions, typically called "ufuncs" for short, are functions that broadcast an elementwise operation across input arrays of varying numbers of dimensions. Many of NumPy's elementwise functions are ufuncs, and Numba makes it easy to compile custom ufuncs using the @vectorize decorator.

Note that for this simple ufunc, Numba is not as fast as the function with the manual looping, and in some cases it is the same speed as the example that called NumPy directly. This is not surprising, as this function is very simple and NumPy also uses compiled ufuncs. Numba @vectorize is generally most effective when creating ufuncs that are not a simple combination of existing NumPy operations.

Numba supports many, but not all, NumPy functions. Some functions also have limitations that prevent the use of some of the optional arguments in nopython mode. A full description can be found in the Supported NumPy Features page in the Numba Reference Manual.

Note that when using NumPy functions on arrays, Numba will also compile and optimize array expressions:

In [ ]:

def numpy_mpe(x, true):
    return (((x - true) / true) ** 2).mean()

numba_mpe = jit(nopython=True)(numpy_mpe)  # using jit as a function rather than a decorator

If the scipy package is installed, Numba will also automatically make use of the optimized BLAS/LAPACK implementation that SciPy was compiled with. In the case of Anaconda, this is Intel MKL, but OpenBLAS is also common for builds of SciPy. (Note that Numba is not itself compiled and linked against any BLAS implementation.) Most functions in numpy.linalg will be accelerated this way, as well as numpy.dot.

Numba will not run any faster than NumPy for individual linear algebra routines (since both translate to calls to the same underlying library), but you are able to use linear algebra calls inside your Numba-compiled functions without any loss of performance.
