Getting there

After I fixed some inlining problems and compared the Python and C++ versions more fairly (read: with equal numbers of makeUnion calls), the C++ implementation turns out to be blazingly fast. It takes just 1.5 times as long as raw pointer arithmetic, and the NumPy version is much slower once the arrays are non-trivial. This means I actually have time to write lazyflow operators now!