The latest commit to https://github.com/burgerdev/gsoc2014 makes the lazy connected components operator work with input data of arbitrary dimensionality. Initially I thought it would be a pain to change all the stuff I had carefully crafted to work with just 3d data, but it turned out not to be that bad. Most of the internals use a thingy I called ‘ChunkIndex’ to access data, save states and so on. This index triple is used as a key to arrays and dictionaries, which made it easy to just switch to a quintuple. The only thing that really needed changing was the logic behind ‘generateNeighbours’: time and channel neighbours are simply ignored.
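To make the switch from triple to quintuple a bit more concrete, here is a minimal sketch of what spatial-only neighbour generation could look like. It is not the code from the repository: the function name `generate_neighbours`, the `chunk_grid_shape` parameter, and the axis order (t, x, y, z, c) are assumptions made for illustration, as is the choice to return neighbours in both directions.

```python
# Hypothetical sketch, not the actual 'generateNeighbours' from the repository.
# Assumed axis order of the 5-tuple chunk index: (t, x, y, z, c).
SPATIAL_AXES = (1, 2, 3)


def generate_neighbours(chunk_index, chunk_grid_shape):
    """Return the chunk indices adjacent to chunk_index along x, y and z.

    Time and channel (positions 0 and 4) are never changed, so chunks from
    different time steps or channels are not treated as neighbours.
    """
    neighbours = []
    for axis in SPATIAL_AXES:
        for step in (-1, 1):
            position = chunk_index[axis] + step
            # skip neighbours that would fall outside the chunk grid
            if 0 <= position < chunk_grid_shape[axis]:
                candidate = list(chunk_index)
                candidate[axis] = position
                neighbours.append(tuple(candidate))
    return neighbours


# The 5-tuple works as a dictionary key just like the old triple did.
chunk_states = {(0, 1, 1, 1, 0): "finalized"}
inner = generate_neighbours((0, 1, 1, 1, 0), (2, 3, 3, 3, 2))
corner = generate_neighbours((0, 0, 0, 0, 0), (2, 2, 2, 2, 2))
```

With a chunk grid of shape (2, 3, 3, 3, 2), the inner chunk gets six neighbours, all of which keep t = 0 and c = 0; a corner chunk only gets three.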

With this done, there is not much keeping me from integrating the whole thing into ilastik (lazyflow, in particular). We have not decided yet when and how lazy connected components should be used in the software. It could be set as the default, but we would have to accept performance losses with small datasets and non-sparse objects. Or it could be optional, depending on what the user wants, which would imply heavy GUI work, because the labeling operator is used almost everywhere I look, and it is not certain that ‘the user’ actually knows what they want. The sanest way would probably be to decide automatically (i.e. hard-coded) depending on the input data.

Once the operator is in ilastik, the last thing separating us from having a truly lazy thresholding applet is the applet itself. If I remember correctly, quite a few internal design decisions rely on labeling being a global operation, e.g.

def execute(self, slot, subindex, roi, result):
    # labeling is global anyways, do the whole input at once
    data = self.Input[...].wait()
    newdata = self._handleData(data)
    result[...] = newdata[roi.toSlice()]

The first thing that comes to mind is the ‘OpFilterLabels’ operator, which will most likely have to be rewritten in a lazy fashion.