Above we are just working with a, which is a usual numpy.ndarray, the same
way we would in a classical NumPy setting. However, since a is a view of the
ZBigArray A, the system keeps track of the data and its changes even though
we modified the data directly on a -- a numpy.ndarray object -- without
explicitly notifying A. Let's save the data to the database:

In [14]: transaction.commit()

and finish the current process:

In [15]: dbclose(root)
In [16]: exit
(test) $

After a new process is started, we can see the data is still there:

$ ipython   # imports In [1-4] are the same as in the previous session
In [1]: from wendelin.bigarray.array_zodb import ZBigArray
In [2]: from wendelin.lib.zodb import dbopen, dbclose
In [3]: import transaction
In [4]: import numpy as np
In [5]: root = dbopen('test.fs')
In [6]: A = root['A']
In [7]: A
Out[7]: <wendelin.bigarray.array_zodb.ZBigArray at 0x7f313416b150>
In [8]: a = A[:]   # notice the data is the same as we set it above
In [9]: a
Out[9]: array([ 0, 1, 2, 3, 4, 5, 6, 22, 8, 9])
In [10]: type(a)
Out[10]: numpy.ndarray

It is a real numpy.ndarray for all the code

As shown above, we can create numpy.ndarray views of ZBigArray objects.
Since such views are real ndarrays, we can invoke any existing code that
accepts an ndarray as input. For example, let's compute the mean:

In [11]: np.mean(a)
Out[11]: 6.0

Notice that above we call the C function mean() implemented in the NumPy
library. It does not know it is working on ZBigArray data -- all it sees is a
regular numpy.ndarray object backed by memory.

Let's also see how regular Cython code, which expects only ndarrays, can work
on ZBigArray data:
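The Cython source for counteven() is not reproduced here. As a hypothetical sketch, based only on how the function is used later in this tutorial, a plain-Python equivalent of what such compiled code computes might look like this:

```python
def counteven(a):
    # Count how many elements of the 1-dimensional array-like `a` are even.
    # A real Cython version would type `a` as an ndarray and run the loop in
    # C for speed, but the result is the same.
    n = 0
    for x in a:
        if x % 2 == 0:
            n += 1
    return n

# Works on any sequence of integers, including a numpy.ndarray view of a
# ZBigArray -- here, the array from the session above.
print(counteven([0, 1, 2, 3, 4, 5, 6, 22, 8, 9]))  # -> 6
```

The key point is the same as with np.mean(): the compiled function only ever sees an ordinary ndarray, so no ZBigArray-specific code path is needed.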

By doing enough such append iterations we can eventually get an array whose
size is bigger than local RAM.

Notice: contrary to NumPy, where numpy.append() works by copying data to a
newly allocated array (thus taking O(len(array) + δ) time), ZBigArray objects
can be appended to in place without copying, so ZBigArray.append() works
in O(δ) time.
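To illustrate the copying behavior on the NumPy side (plain NumPy only; this sketch does not involve ZBigArray):

```python
import numpy as np

a = np.arange(5)          # array([0, 1, 2, 3, 4])
b = np.append(a, [5, 6])  # copies all of `a` plus the new tail into a fresh array

# numpy.append never modifies its input: `a` is unchanged and `b` is a new
# allocation that shares no memory with `a` -- which is why the operation
# costs O(len(array) + δ) rather than O(δ).
print(a)                             # [0 1 2 3 4]
print(b)                             # [0 1 2 3 4 5 6]
print(np.shares_memory(a, b))        # False
```

ZBigArray avoids this cost because it only has to allocate and fill the appended tail, leaving the already-persisted data in place.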

After we have an array bigger than RAM, we can still call existing functions
on it to do computations [1]. In particular numpy.mean() and counteven()
continue to work:

In [20]: a = A[:]
In [21]: np.mean(a)
Out[21]: # works -> some value depending on size of A
In [22]: counteven(a)
Out[22]: # works -> some value depending on size of A

When, for example, np.mean(a) runs, the system loads array parts from the
database and, once the amount of loaded data approaches the amount of local
RAM, reclaims least-recently-accessed memory pages to serve new accesses. All
of this happens transparently to client code, which thinks it is simply
accessing plain memory.
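The paging itself is implemented inside wendelin.core, but its effect is similar to what one would do by hand when streaming over an array in fixed-size chunks. A rough, purely illustrative sketch (chunked_mean is not part of any library):

```python
import numpy as np

def chunked_mean(a, chunksize=4):
    # Compute the mean while only touching `chunksize` elements at a time,
    # similar in spirit to how only the currently accessed pages of a
    # ZBigArray need to be resident in RAM at any given moment.
    total = 0.0
    for start in range(0, len(a), chunksize):
        total += a[start:start + chunksize].sum()
    return total / len(a)

a = np.array([0, 1, 2, 3, 4, 5, 6, 22, 8, 9])
print(chunked_mean(a))  # 6.0, same as np.mean(a) in the session above
```

The difference is that with wendelin.core no such manual chunking is needed: np.mean() iterates over what looks like one contiguous array, and the system pages data in and out underneath it.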

Arrays bigger than disk

In the previous section we saw how to work with arrays bigger than local RAM
by using a ZODB database located on a local disk. Going beyond the local disk
is possible by using a distributed ZODB storage which shards data across a
cluster of nodes.

This can be achieved with the NEO ZODB storage. The only required change on
the client side is to adjust the dbopen() call as follows:

In [5]: root = dbopen('neo://dbname@master')

and to install the NEO client library (via pip install neoppod[client]).

The system will then read and write data from/to the networked distributed
database, which scales in size as more storage nodes are added to its cluster.

Full cluster setup is documented in the NEO readme. One can also set up a
simple NEO cluster to play with via the neosimple command (after
pip install neoppod).

Summary

In this tutorial we briefly overviewed how to use the wendelin.core library
for working with NumPy-compatible arrays, which

- can be transparently persisted,

- can be worked with using all the usual libraries, including C/Fortran/Cython
  code [1], and