I often need to save a series of arrays in which one dimension varies in
length---sometimes called a ragged array [1]. For example, I'm running
particle tracking experiments, and I need to save the 2D coordinates of all
particles in each video frame. The number of particles in each frame will vary
due to movement across the edges of the frame and to velocity components normal
to the focal plane, so I can't simply save a (dense) 3D array. Instead, I
store this data in a Python list of N-by-2 numpy arrays, where N is the
number of particles in a frame and varies from frame to frame.
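As a concrete (made-up) example, such a list might look like the following; the particle counts here are arbitrary:

```python
import numpy as np

# Hypothetical per-frame particle coordinates: each array is N-by-2,
# and N varies from frame to frame.
frames = [
    np.random.rand(5, 2),   # frame 0: 5 particles
    np.random.rand(3, 2),   # frame 1: 3 particles
    np.random.rand(7, 2),   # frame 2: 7 particles
]
```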

The question is: How do you save this list of arrays? In my first attempt,
I saved each array individually (as separate keys in an .npz file); this
approach gave slow save/load times and needlessly large files. A better
approach is to stack all the ragged arrays along the dimension that varies in
length---i.e. the ragged dimension---and then save the stacked result using
numpy's .npz format.

Stacking and splitting arrays

numpy provides a number of functions to stack arrays: concatenate, hstack,
vstack, and dstack. Any of these will do the stacking; the important extra
step is to save the starting indices of the sub-arrays so that we can slice
them back out later:
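A minimal sketch of such a helper (the name stack_ragged and its signature are my own choices, not necessarily the original):

```python
import numpy as np

def stack_ragged(array_list, axis=0):
    """Stack arrays along the ragged axis, returning the stacked array
    plus the start indices needed to split it back apart."""
    lengths = [np.shape(a)[axis] for a in array_list]
    # Start index of each sub-array after the first.
    idx = np.cumsum(lengths[:-1])
    stacked = np.concatenate(array_list, axis=axis)
    return stacked, idx

# Round trip: np.split recovers the original list of arrays.
arrays = [np.full((n, 2), i) for i, n in enumerate((3, 1, 4))]
stacked, idx = stack_ragged(arrays)
for a in np.split(stacked, idx, axis=0):
    print(a)
```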

which returns our original list of arrays. (Note: the loop is just for prettier
printing.)

Saving and loading

So stacking turns our list of arrays into a single array, which we can easily
save using numpy's save (single array) or savez (dict of arrays)
functions. If we want to get back our original arrays, however, we also need to
save the start indices:
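A sketch of such save/load helpers (the names save_stacked_array and load_stacked_arrays, and the key names inside the .npz file, are assumptions of mine):

```python
import numpy as np

def save_stacked_array(fname, array_list, axis=0):
    # Stack along the ragged axis and record the split points.
    lengths = [np.shape(a)[axis] for a in array_list]
    idx = np.cumsum(lengths[:-1])
    stacked = np.concatenate(array_list, axis=axis)
    np.savez(fname, stacked_array=stacked, stacked_index=idx)

def load_stacked_arrays(fname, axis=0):
    # Split the stacked array back into the original list.
    npz = np.load(fname)
    return np.split(npz['stacked_array'], npz['stacked_index'], axis=axis)
```

Note that the caller has to pass the same axis to both functions; the .npz file itself only stores the stacked data and the start indices.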

Alternatively, the save function could store the stacking axis in the .npz file
so that you don't have to specify it when loading. Another improvement would be
to guess the stacking axis in stack_ragged by checking which axis varies in
size (though this would fail for constant N). And finally, you can use
savez_compressed instead of savez to reduce file size.
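Two of those improvements can be sketched together; the names save_ragged and load_ragged and the stored key names are hypothetical:

```python
import numpy as np

def save_ragged(fname, array_list, axis=0):
    lengths = [np.shape(a)[axis] for a in array_list]
    idx = np.cumsum(lengths[:-1])
    stacked = np.concatenate(array_list, axis=axis)
    # savez_compressed trades a little CPU time for smaller files;
    # storing the axis means the loader needn't be told it.
    np.savez_compressed(fname, stacked_array=stacked,
                        stacked_index=idx, axis=axis)

def load_ragged(fname):
    npz = np.load(fname)
    # The axis is read back from the file itself.
    return np.split(npz['stacked_array'], npz['stacked_index'],
                    axis=int(npz['axis']))
```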

P.S. After implementing this approach, I learned that NetCDF files support
ragged arrays out of the box (using VLEN types)---it's not the first time
I've reinvented the wheel; it won't be the last.