More Dimensions

The most recent commits in master are finally thread-safe. At least I hope so. The snake is nicely labeled in a continuous yellow, and everything else seems to work smoothly. I had to make some sacrifices to get this to work, though. First of all, I swapped the Vigra UnionFind for a Python one, because I want it to be thread-safe, and writing a wrapper for the Vigra one seemed like overkill. The other problem I encountered involves locks and lazyflow: when I tried to use an OpCompressedCache instead of a ChunkedArray, I ended up with deadlocks, and no matter how hard I looked I could not find the reason for them. These deadlocks show up when launching requests from within critical sections. I assume there must be some special functionality regarding thread management that undermines my locking policy.
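To make the UnionFind swap a bit more concrete, here is a minimal sketch of what a lock-guarded union-find in pure Python could look like. The class name `LockedUnionFind` and its methods are made up for illustration and are not the code from the repository; the only assumption is that labels are integers and that one coarse lock around every operation is acceptable.

```python
import threading


class LockedUnionFind:
    """Union-find over integer labels, guarded by one lock (illustrative only)."""

    def __init__(self):
        self._parent = {}
        self._lock = threading.Lock()

    def _find(self, x):
        # Caller must hold the lock. Unknown labels become their own root.
        root = self._parent.setdefault(x, x)
        while root != self._parent[root]:
            root = self._parent[root]
        # Path compression: point every node on the path directly at the root.
        while self._parent[x] != root:
            self._parent[x], x = root, self._parent[x]
        return root

    def find(self, x):
        with self._lock:
            return self._find(x)

    def union(self, a, b):
        with self._lock:
            ra, rb = self._find(a), self._find(b)
            if ra != rb:
                # Keep the smaller label as the representative of the merged set.
                self._parent[max(ra, rb)] = min(ra, rb)


uf = LockedUnionFind()
uf.union(3, 5)
uf.union(5, 7)
print(uf.find(7))  # 3
```

The public methods take the lock and delegate to the private `_find`, so `union` never tries to acquire the lock twice. A single lock is the simplest policy that works; it serializes all label merges, which is fine as long as merging is not the bottleneck.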

But enough of the past: welcome to the future. In the future we will have more of everything, especially more dimensions. The current operator only supports 3d spatial data, which is a shame. It should be able to handle 4d and even 5d data as well!

The problem with 5d support in ilastik (although "problem" might be too strong a word here) is that in principle every applet and workflow supports 5d data, but you might run into trouble if your datasets are somewhat ill-formed. And I’m not even speaking of the ambiguity of some specific axis orders. We decided a while ago that we want to handle everything as 5d data internally, which was in principle a brilliant decision. You could write new operators without having to support anything but 5d txyzc, and ilastik would handle the rest. There is even a wrapping operator in lazyflow that turns old 3d operators into fully functional 5d ones.
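The idea behind such a wrapper can be sketched in plain NumPy. This is not the lazyflow operator itself, just an illustration of the principle: the helper `lift_to_5d` and the example function `subtract_min` are hypothetical names, and the sketch assumes txyzc axis order and a 3d function that preserves the shape of its input.

```python
import numpy as np


def lift_to_5d(func3d):
    """Wrap a function on xyz volumes so that it accepts txyzc arrays."""

    def func5d(data):
        if data.ndim != 5:
            raise ValueError("expected 5d txyzc data, got %dd" % data.ndim)
        out = None
        for t in range(data.shape[0]):
            for c in range(data.shape[4]):
                volume = func3d(data[t, ..., c])
                if out is None:
                    # Allocate lazily so the wrapped function decides the dtype.
                    out = np.empty(data.shape, dtype=volume.dtype)
                out[t, ..., c] = volume
        if out is None:
            # No time slices or no channels: nothing to compute.
            out = np.empty(data.shape, dtype=data.dtype)
        return out

    return func5d


def subtract_min(volume):
    """A 3d-only function: shift a volume so that its minimum is zero."""
    return volume - volume.min()


subtract_min_5d = lift_to_5d(subtract_min)
```

Calling `subtract_min_5d` on an array of shape `(t, x, y, z, c)` applies `subtract_min` to each of the `t * c` volumes separately.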

But there’s a drawback to this. For some datasets, mostly those with many time slices, the loading times went up to hours. And that is graph construction time, not calculation. The solution to this problem is also clear: write 5d operators to start with. The last operators I touched went something like this:

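def execute(self, slot, subindex, roi, result):
    # nchannels and ntime stand for the c and t extents of the txyzc input
    for c in range(nchannels):
        for t in range(ntime):
            # request one 3d (xyz) volume, process it, and write it back
            data = self.Input[t, ..., c].wait()
            modified = self.treatData(data)
            result[t, ..., c] = modified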

After a while, you memorize this pattern and automatically apply it everywhere. And at some point you get frustrated, because you don’t want to write double for loops any more, and procrastinate by writing blog posts.
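For some operations the double loop can be avoided altogether, because NumPy can reduce over the spatial axes of a 5d array directly. The sketch below is only an example of that idea and not one of the operators mentioned above: `threshold_at_mean_5d` is a hypothetical stand-in for `treatData`, and it assumes txyzc axis order.

```python
import numpy as np


def threshold_at_mean_5d(data):
    """Mark voxels above the mean of their own (t, c) volume, without loops."""
    # Axes 1, 2, 3 are x, y, z in txyzc order. keepdims=True keeps them as
    # length-1 axes so the means broadcast against the full 5d array.
    volume_means = data.mean(axis=(1, 2, 3), keepdims=True)
    return data > volume_means
```

This only works when the per-volume computation can be expressed with axis-aware array operations. Anything that needs a genuinely 3d algorithm, such as connected component labeling, still has to visit each volume on its own.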