Executive Summary

In recent years Python’s array computing ecosystem has grown organically to support
GPUs, sparse, and distributed arrays.
This is wonderful and a great example of the growth that can occur in decentralized open source development.

However to solidify this growth and apply it across the ecosystem we now need to do some central planning
to move from a pair-wise model where packages need to know about each other
to an ecosystem model where packages can negotiate by developing and adhering to community-standard protocols.

With moderate effort we can define a subset of the Numpy API that works well across all of them,
allowing the ecosystem to more smoothly transition between hardware.
This post describes the opportunities and challenges to accomplish this.

We start by discussing two kinds of libraries:

Libraries that implement the Numpy API

Libraries that consume the Numpy API and build new functionality on top
of it

Libraries that Implement the Numpy API

The Numpy array is one of the foundations of the numeric Python ecosystem,
and serves as the standard model for similar libraries in other languages.
Today it is used to analyze satellite and biomedical imagery, financial models,
genomes, oceans and the atmosphere, super-computer simulations,
and data from thousands of other domains.

However, Numpy was designed several years ago,
and its implementation is no longer optimal for some modern hardware,
particularly multi-core workstations, many-core GPUs, and distributed clusters.

Fortunately other libraries implement the Numpy array API on these other architectures:

So even when the Numpy implementation is no longer ideal,
the Numpy API lives on in successor projects.

Note: the Numpy implementation remains ideal most of the time.
Dense in-memory arrays are still the common case.
This blogpost is about the minority of cases where Numpy is not ideal

So today we can write code similar code between all of
Numpy, GPU, sparse, and parallel arrays:

importnumpyasnpx=np.random.random(...)# Runs on a single CPUy=x.T.dot(np.log(x)+1)z=y-y.mean(axis=0)print(z[:5])importcupyascpx=cp.random.random(...)# Runs on a GPUy=x.T.dot(cp.log(x)+1)z=y-y.mean(axis=0)print(z[:5].get())importdask.arrayasdax=da.random.random(...)# Runs on many CPUsy=x.T.dot(da.log(x)+1)z=y-y.mean(axis=0)print(z[:5].compute())...

Additionally, each of the deep learning frameworks
(TensorFlow, PyTorch, MXNet)
has a Numpy-like thing that is similar-ish to Numpy’s API,
but definitely not trying to be an exact match.

Libraries that consume and extend the Numpy API

At the same time as the development of Numpy APIs for different hardware,
many libraries today build algorithmic functionality on top of the Numpy API:

These projects and more enhance array computing in Python,
building on new features beyond what Numpy itself provides.

There are also projects like Pandas, Scikit-Learn, and SciPy,
that use Numpy’s in-memory internal representation.
We’re going to ignore these libraries for this blogpost
and focus on those libraries that only use the high-level Numpy API
and not the low-level representation.

Opportunities and Challenges

Given the two groups of projects:

New libraries that implement the Numpy API
(CuPy, Sparse, Dask array)

New libraries that consume and extend the Numpy API
(XArray, Autograd/tangent, TensorLy, Einsum)

We want to use them together, applying Autograd to CuPy, TensorLy to Sparse,
and so on, including all future implementations that might follow.
This is challenging.

Unfortunately,
while all of the array implementations APIs are very similar to Numpy’s API,
they use different functions.

>>>numpy.siniscupy.sinFalse

This creates problems for the consumer libraries,
because now they need to switch out which functions they use
depending on which array-like objects they’ve been given.

For example XArray can use either Numpy arrays or Dask arrays.
This has been hugely beneficial to users of that project,
which today seamlessly transition from small in-memory datasets on their laptops
to 100TB datasets on clusters,
all using the same programming model.
However when considering adding sparse or GPU arrays to XArray’s plugin system,
it quickly became clear that this would be expensive today.

Building, maintaining, and extending these plugin mechanisms is costly.
The plugin systems in each project are not alike,
so any new array implementation has to go to each library and build the same code several times.
Similarly, any new algorithmic library must build plugins to every ndarray implementation.
Each library has to explicitly import and understand each other library,
and has to adapt as those libraries change over time.
This coverage is not complete,
and so users lack confidence that their applications are portable between hardware.

Pair-wise plugin mechanisms make sense for a single project,
but are not an efficient choice for the full ecosystem.

Solutions

I see two solutions today:

Build a new library that holds dispatch-able versions of all of the relevant Numpy functions
and convince everyone to use it instead of Numpy internally

Build this dispatch mechanism into Numpy itself

Each has challenges.

Build a new centralized plugin library

We can build a new library,
here called arrayish,
that holds dispatch-able versions of all of the relevant Numpy functions.
We then convince everyone to use it instead of Numpy internally.

So in each array-like library’s codebase we write code like the following:

In all of the algorithm libraries (like XArray, autograd, TensorLy, …)
we use arrayish instead of Numpy

# inside XArray's codebase# import numpyimportarrayishasnumpy

This is the same plugin solution as before,
but now we build a community standard plugin system
that hopefully all of the projects can agree to use.

This reduces the big n by m cost of maintaining several plugin systems,
to a more manageable n plus m cost of using a single plugin system in each library.
This centralized project would also benefit, perhaps,
from being better maintained than any individual project is likely to do on its own.

However this has costs:

Getting many different projects to agree on a new standard is hard

Algorithmic projects will need to start using arrayish internally,
adding new imports like the following:

Dispatch from within Numpy

Alternatively, the central dispatching mechanism could live within Numpy itself.

Numpy functions could learn to hand control over to their arguments,
allowing the array implementations to take over when possible.
This would allow existing Numpy code to work on externally developed array implementations.

There is precedent for this.
The array_ufunc protocol
allows any class that defines the __array_ufunc__ method
to take control of any Numpy ufunc like np.sin or np.exp.
Numpy reductions like np.sum already look for .sum methods on their arguments and defer to them if possible.

Some array projects, like Dask and Sparse, already implement the __array_ufunc__ protocol.
There is also an open PR for CuPy.
Here is an example showing Numpy functions on Dask arrays cleanly.

>>>importnumpyasnp>>>importdask.arrayasda>>>x=da.ones(10,chunks=(5,))# A Dask array>>>np.sum(np.exp(x))# Apply Numpy function to a Dask arraydask.array<sum-aggregate,shape=(),dtype=float64,chunksize=()># get a Dask array

I recommend that all Numpy-API compatible array projects implement the __array_ufunc__ protocol.

This works for many functions, but not all.
Other operations like tensordot, concatenate, and stack
occur frequently in algorithmic code but are not covered here.

This solution avoids the community challenges of the arrayish solution above.
Everyone is accustomed to aligning themselves to Numpy’s decisions,
and relatively little code would need to be rewritten.

The challenge with this approach is that historically
Numpy has moved more slowly than the rest of the ecosystem.
For example the __array_ufunc__ protocol mentioned above
was discussed for several years before it was merged.
Fortunately Numpy has recently
receivedfunding
to help it make changes like this more rapidly.
The full time developers hired under this funding have just started though,
and it’s not clear how much of a priority this work is for them at first.

For what it’s worth I’d prefer to see this Numpy protocol solution take hold.

Final Thoughts

In recent years Python’s array computing ecosystem has grown organically to support
GPUs, sparse, and distributed arrays.
This is wonderful and a great example of the growth that can occur in decentralized open source development.

However to solidify this growth and apply it across the ecosystem we now need to do some central planning
to move from a pair-wise model where packages need to know about each other
to an ecosystem model where packages can negotiate by developing and adhering to community-standard protocols.

The community has done this transition before
(Numeric + Numarray -> Numpy, the Scikit-Learn fit/predict API, etc..)
usually with surprisingly positive results.

The open questions I have today are the following:

How quickly can Numpy adapt to this demand for protocols
while still remaining stable for its existing role as foundation of the ecosystem

What algorithmic domains can be written in a cross-hardware way
that depends only on the high-level Numpy API,
and doesn’t require specialization at the data structure level.
Clearly some domains exist (XArray, automatic differentiation),
but how common are these?

Once a standard protocol is in place,
what other array-like implementations might arise?
In-memory compression? Probabilistic? Symbolic?