The physical segments defined by the VSegd are folded individually,
and these results are replicated according to the virtual segment
id table of the VSegd. The result contains as many elements as there
are virtual segments.
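The fold-then-replicate scheme can be sketched with plain lists. This is a minimal illustration, not the library's implementation. `foldVS` is a hypothetical name, and the sketch assumes each physical segment is given as its own list, with `vsegids` mapping every virtual segment to a physical one.

```haskell
-- Hypothetical list-based model of a VSegd, for illustration only:
-- each physical segment is a plain list, and vsegids maps every
-- virtual segment to the physical segment it uses.

-- | Fold each physical segment once, then replicate the results
--   according to the virtual segment id table (vsegids).
foldVS :: (a -> a -> a) -> a -> [Int] -> [[a]] -> [a]
foldVS f z vsegids psegs = map (results !!) vsegids
  where
    -- One fold per physical segment, however many virtual segments
    -- share it. (List (!!) is O(n); an array-based version would
    -- index in O(1).)
    results = map (foldl f z) psegs
```

For example, `foldVS (+) 0 [1,0,1,1] [[1,2,3],[10,20]]` folds only the two physical segments. It yields `[30,6,30,30]`, which is one result per virtual segment.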

Combining two arrays according to a flags array is difficult to
parallelise. For each element in the result, the index we read it from
in its source array depends on the tag values associated with all
previous elements.
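A sequential sketch shows where the dependency comes from: the position reached in each data array is threaded from one element to the next. This sketch assumes that a True flag selects the first data array and a False flag the second. That reading matches the tagsToIndices2 example, where F reads from the second array. It is a reference version for illustration, not the library's code.

```haskell
-- | Sequential reference version of combine (a sketch): a True flag takes
--   the next unused element of the first array, a False flag takes the
--   next unused element of the second array.
combine :: [Bool] -> [a] -> [a] -> [a]
combine (True  : fs) (x : xs) ys       = x : combine fs xs ys
combine (False : fs) xs       (y : ys) = y : combine fs xs ys
combine []           _        _        = []
combine _            _        _        = error "combine: flags do not match array lengths"
```

For example, `combine [False,False,True,False,True,True] "abc" "xyz"` gives `"xyazbc"`. Element i cannot be produced until we know how many earlier flags chose the same array.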

However, if we are going to apply combine several times with the same flags array,
we can precompute a selector that tells us where to get each element.
The selector contains the original flags, as well as the source index telling
us where to get each element for the result array.

For example:

tagsToIndices2 [F,F,T,F,T,T] -- tags
= [0,1,0,2,1,2] -- indices

This says get the first element from index 0 in the second array,
then from index 1 in the second array,
then index 0 in the first array ...
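The indices are a running count of each flag value. Once they are known, every result element can be fetched independently of the others. The sketch below is list-based and is not the library's implementation. `combineWithSel` is a hypothetical name, and True is assumed to select the first data array.

```haskell
import Data.List (mapAccumL)

-- | A sketch of tagsToIndices2: for each tag, the number of earlier tags
--   with the same value, i.e. the index to read in its source array.
tagsToIndices2 :: [Bool] -> [Int]
tagsToIndices2 = snd . mapAccumL step (0, 0)
  where
    -- (nFalse, nTrue) counts the tags seen so far
    step (nf, nt) False = ((nf + 1, nt), nf)
    step (nf, nt) True  = ((nf, nt + 1), nt)

-- | Combine using a precomputed selector (flags plus indices). Each
--   result element depends only on its own flag and index, so the map
--   has no dependency between elements.
combineWithSel :: [Bool] -> [Int] -> [a] -> [a] -> [a]
combineWithSel flags idxs xs ys = zipWith pick flags idxs
  where
    pick True  i = xs !! i   -- True: read from the first array
    pick False i = ys !! i   -- False: read from the second array
```

`tagsToIndices2 [False,False,True,False,True,True]` gives `[0,1,0,2,1,2]`, which matches the example above.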

The first argument of combine is the flags array, which says which of the
two data arrays to get each successive element from. A selector pairs these
flags with the source index of each element of the result, for example:

flags:   [F,F,T,T,F,T,F,F,T]
indices: [0,1,0,1,2,2,3,4,3]

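Suppose we want to distribute the combine operation across 3 PEs. It's
easy to split the selector into contiguous chunks, like so:

         PE0       PE1       PE2
flags:   [F,F,T]   [T,F,T]   [F,F,T]
indices: [0,1,0]   [1,2,2]   [3,4,3]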
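An even, contiguous split of the selector can be sketched as follows. `chunksFor` and `splitSel` are hypothetical names. Splitting into chunks whose sizes differ by at most one is an assumption for illustration; the library may balance work differently.

```haskell
-- | Split a list into k contiguous chunks whose lengths differ by at
--   most one (assumes k > 0; a hypothetical helper).
chunksFor :: Int -> [a] -> [[a]]
chunksFor k xs = go sizes xs
  where
    (q, r) = length xs `divMod` k
    -- the first r chunks get one extra element
    sizes  = replicate r (q + 1) ++ replicate (k - r) q
    go []       _  = []
    go (n : ns) ys = let (h, t) = splitAt n ys in h : go ns t

-- | Split a selector (flags and indices together) across k PEs.
splitSel :: Int -> [Bool] -> [Int] -> [([Bool], [Int])]
splitSel k flags idxs = zip (chunksFor k flags) (chunksFor k idxs)
```

With the nine-element selector and `k = 3`, each PE receives three flags and the three matching indices.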

A segment descriptor defines an irregular 2D array based on a flat, 1D array
of elements. The defined array is a nested array of segments, where every
segment covers some of the elements from the flat array.

The starting indices must be equal to init (scanl (+) 0 lengths), so the
segments are laid out end to end from index 0.
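A list-based sketch of building a segment descriptor from its lengths follows. `Segd`, its field names, `lengthsToSegd` and `segments` are hypothetical and used only for illustration. The starting indices are the exclusive prefix sums of the lengths.

```haskell
-- | Hypothetical list-based segment descriptor.
data Segd = Segd
  { segLengths  :: [Int]  -- length of each segment
  , segIndices  :: [Int]  -- starting index of each segment in the flat array
  , segElements :: Int    -- total number of elements covered
  } deriving (Eq, Show)

-- | Build a Segd from segment lengths; the starting indices are
--   init (scanl (+) 0 lengths).
lengthsToSegd :: [Int] -> Segd
lengthsToSegd lens = Segd lens (init (scanl (+) 0 lens)) (sum lens)

-- | Recover the nested array that a Segd describes over flat data.
segments :: Segd -> [a] -> [[a]]
segments segd xs =
  [ take n (drop i xs) | (i, n) <- zip (segIndices segd) (segLengths segd) ]
```

For lengths `[2,0,3]` the starting indices are `[0,2,2]`, and over `"abcde"` the segments are `["ab","","cde"]`.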

If you don't want to cover all the elements from the flat array then
use a SSegd instead.

A VSegd is an extension of a SSegd that allows data from the underlying
flat array to be shared between segments. For example, you can define an array
of 10 virtual segments that all have the same length and elements as a
single physical segment.
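The ten-virtual-segments example can be sketched with a simplified VSegd. In this sketch a physical segment is a (start, length) slice of a single flat array. A real SSegd also records which of several flat arrays each segment comes from, and that field is left out here. All names are hypothetical.

```haskell
-- | Hypothetical, simplified VSegd over a single flat array.
data VSegd = VSegd
  { vsegids :: [Int]         -- physical segment used by each virtual segment
  , psegs   :: [(Int, Int)]  -- (start, length) of each physical segment
  } deriving Show

-- | The nested array a VSegd describes over some flat data.
virtualSegments :: VSegd -> [a] -> [[a]]
virtualSegments vsegd flat = map slice (vsegids vsegd)
  where
    slice p = let (start, len) = psegs vsegd !! p
              in take len (drop start flat)

-- | Ten virtual segments that all share one physical segment: the
--   elements are stored once, however many virtual segments use them.
tenCopies :: VSegd
tenCopies = VSegd { vsegids = replicate 10 0, psegs = [(0, 3)] }
```

Over the flat data `"xyz"`, `tenCopies` describes ten copies of `"xyz"` while holding only one physical segment.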

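Internally we maintain the invariant that all physical segments must be
reachable by some virtual segment. This is needed to ensure that operations
such as the segmented fold fold_ss have the right work complexity. Operations
that process the physical segments directly would otherwise spend work on
segments whose results are never used.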
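To see why the invariant matters, compare the work done by a fold that visits every physical segment once with the work it actually needs. When the invariant holds the two are equal. An unreachable physical segment adds work whose result is discarded. This is a hypothetical list-based sketch: `vsegids` maps virtual to physical segments, and `physLens` gives the physical segment lengths.

```haskell
-- | The invariant: every physical segment is referenced by at least one
--   virtual segment.
allReachable :: [Int] -> [Int] -> Bool
allReachable vsegids physLens = all (`elem` vsegids) [0 .. length physLens - 1]

-- | Elements processed by a fold that visits every physical segment once.
physicalWork :: [Int] -> Int
physicalWork = sum

-- | Elements the fold actually needs: those in reachable physical segments.
usefulWork :: [Int] -> [Int] -> Int
usefulWork vsegids physLens =
  sum [ n | (p, n) <- zip [0 ..] physLens, p `elem` vsegids ]
```

With `vsegids = [0,1,0]` and physical lengths `[2,3,1000]`, the third segment is unreachable. The fold touches 1005 elements, but only 5 of them contribute to the result.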

If you don't need the invariant then you can sidestep the code that
maintains it by using the redundant versions of the following operators,
and sometimes get faster code.

O(1). Yield the vsegids of a VSegd, but don't require that every physical
segment is referenced by some virtual segment.

If you're just performing indexing and don't need the invariant that all
physical segments are reachable from some virtual segment, then use this
version as it's faster. This sidesteps the code that maintains the invariant.

The stated O(1) complexity assumes that the array has already been fully
evaluated. If it has not, then using this version avoids demanding the result
of a prior computation on the vsegids, which reduces the cost attributed
to that prior computation.

O(segs).
Yield a SSegd that describes each segment of a VSegd individually.

By doing this we lose information about which virtual segments
correspond to the same physical segments.
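A sketch of this demotion uses a simplified model in which a segment is a (start, length) pair in one flat array. `SSegd`, `VSegd` and `demoteToSSegd` are hypothetical names for illustration. Each virtual segment gets its own copy of its physical segment's descriptor, so the sharing is no longer recorded.

```haskell
-- | Hypothetical, simplified SSegd: (start, length) per segment.
type SSegd = [(Int, Int)]

data VSegd = VSegd
  { vsegids :: [Int]   -- physical segment used by each virtual segment
  , pssegd  :: SSegd   -- the physical segments
  } deriving Show

-- | One SSegd entry per virtual segment. Virtual segments that shared a
--   physical segment become separate, identical entries. (With O(1)
--   array indexing this is O(segs); list (!!) here is slower.)
demoteToSSegd :: VSegd -> SSegd
demoteToSSegd (VSegd ids phys) = map (phys !!) ids
```

For `VSegd [0,1,0] [(0,2),(2,3)]` the result is `[(0,2),(2,3),(0,2)]`. Nothing in it says that the first and third entries were once the same physical segment.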

WARNING: Trying to take the SSegd of a nested array that has been
constructed with replication can cause index space overflow. This is
because the virtual size of the corresponding flat data can be larger
than physical memory. If this happens then the indices field and the
element count of the result will be invalid.
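The arithmetic behind this warning can be checked exactly with `Integer`. The sketch assumes that indices and element counts are machine `Int`s. `replicatedCount` and `overflowsInt` are hypothetical names.

```haskell
-- | Exact number of elements in the flat view of `copies` virtual
--   segments that all share one physical segment of length `len`.
replicatedCount :: Int -> Int -> Integer
replicatedCount copies len = toInteger copies * toInteger len

-- | True if that count cannot be represented as an Int, so Int-valued
--   indices and element counts for the flat view would overflow.
overflowsInt :: Int -> Int -> Bool
overflowsInt copies len =
  replicatedCount copies len > toInteger (maxBound :: Int)

-- Computing the count directly in Int is not safe: under GHC,
-- (maxBound :: Int) * 2 silently wraps around to -2.
```

Replicating a segment of length 2 `maxBound` times costs almost nothing as a VSegd. Its flattened element count, however, cannot be stored in an `Int`.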

Update the vsegids of a VSegd, where the result is guaranteed to
cover all physical segments.

Using this version avoids performing the cull operation which
discards unreachable physical segments.
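A list-based sketch shows what the cull does: it drops physical segments that no virtual segment refers to and renumbers the vsegids to match. The name `cull` and the (start, length) representation are hypothetical, and this is not the library's implementation. Skipping this step is only safe when every physical segment is already referenced.

```haskell
import Data.List (nub, sort)

-- | Drop unreachable physical segments and renumber the vsegids.
--   Physical segments are (start, length) pairs.
cull :: [Int] -> [(Int, Int)] -> ([Int], [(Int, Int)])
cull vsegids psegs = (map renumber vsegids, map (psegs !!) used)
  where
    -- reachable physical segment ids, in increasing order
    used = sort (nub vsegids)
    -- new id = position of the old id within `used`
    renumber p = length (takeWhile (< p) used)
```

For example, `cull [2,0,2] [(0,1),(1,1),(2,4)]` gives `([1,0,1],[(0,1),(2,4)])`: the unreferenced segment 1 is discarded.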

The resulting vsegids must cover all physical segments.
If they do not then there will be physical segments that are not
reachable from any virtual segment, and subsequent operations
like fold_ss will have the wrong work complexity.