Numpy is pretty impressive

As it turns out, my C++ functions suck. Really bad. Since my python functions turned out to be alright, I started developing their vigra counterparts. After a long night I finally got the python interface to the UnionFind data structure somewhat right, and I turned to the implementation of the mergeLabels function.

This function should simply inspect two adjacent chunks, find labeled regions that cross the boundary and merge the corresponding labels in the global UnionFind structure. A fairly simple task, as you can see from my python implementation:

def mergeLabels(hyperplane_a, hyperplane_b,
                label_hyperplane_a, label_hyperplane_b,
                mapping_a, mapping_b, GUF):
    # the indices where objects are adjacent
    idx = hyperplane_a == hyperplane_b
    # merge each pair of labels
    for label_a, label_b in zip(label_hyperplane_a[idx],
        GUF.makeUnion(mapping_a[label_a], mapping_b[label_b])

But converting this into a vigra function turned out to be not that trivial. Consider this naive port (skip to line 24 if you are not familiar with vigra) :

template void mergeLabels(MultiArrayView const & left,
                          MultiArrayView const & right,
                          MultiArrayView const & leftLabels,
                          MultiArrayView const & rightLabels,
                          MultiArrayView<1, LabelType> const & leftMap,
                          MultiArrayView<1, LabelType> const & rightMap,
                          detail::UnionFindArray & unionFind
                          ) {   
    vigra_precondition(left.shape() == right.shape(), "Shape mismatch");
    vigra_precondition(leftLabels.shape() == rightLabels.shape(), "Shape mismatch");
    vigra_precondition(leftLabels.shape() == left.shape(), "Shape mismatch");

    const MultiArrayIndex LEFT_PIXEL = 1;
    const MultiArrayIndex RIGHT_PIXEL = 2;
    const MultiArrayIndex LEFT_LABEL = 3;
    const MultiArrayIndex RIGHT_LABEL = 4;

    typedef typename CoupledIteratorType::type Iterator;

    Iterator start = createCoupledIterator(left, right, leftLabels, rightLabels);
    Iterator end = start.getEndIterator();

    for (Iterator it = start; it < end; it++)
        if (it.get() == it.get())

This does actually do what we want, and we are using a C-loop over the blocks so we should be fine performance-wise, shouldn’t we? Guess again. A little benchmark showed these results:

pyMergeLabels() for shape (10, 100, 100):    0.014scMergeLabels() for shape (10, 100, 100):    0.293s

This is just sad. Numpy is actually about 20 times faster than my C++ function, even with a favourable axis order. I guess I’m gonna stick to my python function until I figure out why my array traversal is so slow.