The latest commit to https://github.com/burgerdev/gsoc2014 makes the
lazy connected components operator work with input data of arbitrary
dimensionality. Initially I thought that it would be a pain
to change all the stuff I had carefully crafted to work with just 3d
data, but it turned out not to be that bad. Most of the internals handle
a thingy I called ‘ChunkIndex’ to access data, save states and so on.
This index triple is used as a key to arrays and dictionaries, which
made it easy to just switch to a quintuple. The only thing that really
needed changing was the logic behind ‘generateNeighbours’ - time and
channel neighbours are simply ignored.
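
To illustrate the idea, here is a minimal sketch of neighbour generation
on quintuple indices. It is not the code from the repository: the
function name `generate_neighbours`, the `chunk_grid_shape` argument and
the (t, x, y, z, c) ordering of the index are assumptions made for this
example. Only the three spatial positions are varied, so time and
channel neighbours never appear.

```python
# Assumption: a chunk index is a plain tuple ordered (t, x, y, z, c).
SPATIAL_AXES = (1, 2, 3)  # positions of x, y, z inside the index


def generate_neighbours(chunk_index, chunk_grid_shape):
    """Yield indices of chunks adjacent to chunk_index along x, y or z.

    chunk_grid_shape gives the number of chunks along each of the five
    axes. Time and channel are never varied, so chunks from different
    time slices or channels are never reported as neighbours.
    """
    for axis in SPATIAL_AXES:
        for step in (-1, 1):
            pos = chunk_index[axis] + step
            if 0 <= pos < chunk_grid_shape[axis]:
                neighbour = list(chunk_index)
                neighbour[axis] = pos
                yield tuple(neighbour)


# Tuples are hashable, so a quintuple works as a dictionary key
# exactly like a triple did.
states = {(0, 0, 0, 0, 0): "finalized"}
```

Because the index is just a tuple, arrays and dictionaries keyed by it
do not care whether it has three or five entries.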

With this done, there is not much keeping me from integrating the whole
thing into ilastik (lazyflow, in particular). We have not decided yet
when and how lazy connected components should be used in the software. It
could be set as the default, but we would have to accept performance
losses with small datasets and non-sparse objects. Or it could be
optional, depending on what the user wants, which would imply heavy GUI
work, because the labeling operator is used almost everywhere I look,
and it is not certain that ‘the user’ actually knows what they want. The
sanest way would probably be to decide automatically (i.e. hard-coded)
depending on the input data.

Once the operator is in ilastik, the last thing separating us from
having a truly lazy thresholding applet is the applet itself. If I
remember correctly, quite a few internal design decisions rely on
labeling being a global operation, e.g.

The most recent commits in master are finally thread-safe. At least I
hope so. The snake is nicely labeled with a continuous yellow, and
everything else seems to work smoothly. I had to make some sacrifices to
get this to work, though. First of all, I swapped the Vigra UnionFind
for a Python one, because I wanted it to be thread-safe, and writing a
wrapper for the Vigra version seemed like overkill. The other problem I
encountered involved locks and lazyflow: when I tried to use an
OpCompressedCache instead of a ChunkedArray, I kept getting deadlocks,
and no matter how hard I tried I could not find the reason for them.
These deadlocks show up when launching requests from within critical
operations. I assume there must be some special functionality regarding
thread management that undermines my locking policy.
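
For readers wondering what a thread-safe Python union-find might look
like, here is a minimal sketch. It is not the class from the
repository: the name `LockedUnionFind` and its methods are made up for
this example, and it guards everything with one plain `threading.Lock`,
which may not interact well with lazyflow's own request handling.

```python
import threading


class LockedUnionFind:
    """Union-find over integer labels; every public method takes one lock."""

    def __init__(self):
        self._parent = []
        self._lock = threading.Lock()

    def make_set(self):
        """Create a new label that is its own root and return it."""
        with self._lock:
            label = len(self._parent)
            self._parent.append(label)
            return label

    def _find(self, x):
        # Caller must hold the lock.
        root = x
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every node on the path at the root.
        while self._parent[x] != root:
            self._parent[x], x = root, self._parent[x]
        return root

    def find(self, x):
        with self._lock:
            return self._find(x)

    def union(self, a, b):
        """Merge the sets of a and b; the smaller root label wins."""
        with self._lock:
            ra, rb = self._find(a), self._find(b)
            if ra != rb:
                self._parent[max(ra, rb)] = min(ra, rb)
```

A single coarse lock keeps the sketch easy to reason about, at the
price of serializing all label merges.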

But enough of the past: welcome to the future. In the future we will
have more of everything - especially more dimensions. The current
operator only supports 3d spatial data, which is a shame. It should
be able to treat 4d and even 5d data as well!

The problem with 5d support in ilastik, although ‘problem’ might be too
strong a word here, is that in principle every applet and workflow
supports 5d data, but you might run into problems if your datasets are
somewhat ill-formed. And I’m not even speaking of the ambiguity that
some specific axis orders show. We decided a while ago that we want to handle
everything as 5d data internally, which was in principle a brilliant
decision. You could write new operators and would not have to support
anything but 5d txyzc, and ilastik would handle the rest. There is even
a wrapping operator in lazyflow that turns old 3d operators into fully
functional 5d ones.

But there’s a drawback to this. For some datasets, most of them having
many time slices, the loading times went up to hours. And that is graph
construction time, not calculation. The solution to this problem is also
clear: write 5d operators to start with. The last operators I touched
went something like this:
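
A minimal sketch of that kind of pattern, assuming a 5d txyzc NumPy
array and a hypothetical 3d function `func3d` (neither name is taken
from ilastik or lazyflow): loop over time and channel, and hand each
xyz volume to the 3d code.

```python
import numpy as np


def apply_per_volume(data, func3d):
    """Apply func3d to every xyz volume of a 5d txyzc array.

    Assumption: func3d returns an array with the same shape as its
    input and a dtype compatible with data's.
    """
    assert data.ndim == 5, "expected axes in txyzc order"
    out = np.empty_like(data)
    for t in range(data.shape[0]):
        for c in range(data.shape[4]):
            out[t, ..., c] = func3d(data[t, ..., c])
    return out
```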

After a while, you memorize this pattern, and just automatically apply
it everywhere. And at some point you get frustrated, because you don’t
want to write double for loops any more, and procrastinate by writing
blog posts.