Question

I'm currently working with point clouds a lot, and I have implemented a segmentation algorithm that clusters points within a given maximum distance into segments.

To optimize that, I've given each segment an axis-aligned bounding box, so I can check whether a given point could possibly match a segment before looking closer, i.e. iterating over the points and calculating distances. (I actually use an octree for this, to prune the majority of the points away.)

I've run my program through gprof, and this is the result:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 52.42      5.14     5.14 208995661     0.00     0.00  otree_node_out_of_bounds
 19.60      7.06     1.92 189594292     0.00     0.00  otree_has_point_in_range
 11.33      8.17     1.11   405834     0.00     0.00  otree_node_has_point_in_range
  9.29      9.08     0.91   352273     0.00     0.00  find_matching_segments
 [...]

As you can see, the majority of computation time is spent in otree_node_out_of_bounds, which is implemented as follows:

/* Returns nonzero if point p lies outside this node's bounding box,
   expanded by SEGMENTATION_DIST on every side. */
int otree_node_out_of_bounds(struct otree_node *t, void *p)
{
    vec3 *_p = p;
    return (_p->x < t->_llf[0] - SEGMENTATION_DIST
        || _p->x > t->_urb[0] + SEGMENTATION_DIST
        || _p->y < t->_llf[1] - SEGMENTATION_DIST
        || _p->y > t->_urb[1] + SEGMENTATION_DIST
        || _p->z < t->_llf[2] - SEGMENTATION_DIST
        || _p->z > t->_urb[2] + SEGMENTATION_DIST);
}

where SEGMENTATION_DIST is a compile-time constant, to allow gcc to do some constant folding. _llf and _urb are of type float[3] and represent the bounding box of the octree node.
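For context, a definition along these lines is what enables that folding (the value here is a made-up placeholder; the question doesn't give the real one):

/* Maximum clustering distance; placeholder value for illustration. */
#define SEGMENTATION_DIST 0.05f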

So, my question basically is: is it possible to do some sneaky optimization on this function? Or, more generally, is there a more efficient way to do bounds checking on AABBs? Or, to phrase it differently still: can I speed up the comparison somehow with some C/gcc magic?

If you need more information to answer this question, please let me know :)

Thanks, Andy.


Solution

This is a tiny leaf function that is called a huge number of times. Profiling results over-represent the cost of such functions, because the overhead of instrumenting each call is large relative to the cost of the function body itself. Built with normal optimization, the entire operation (at the level of the outer loops that ultimately invoke this test) will account for a lower percentage of the overall runtime. You may be able to observe this by getting the function to inline even with profiling enabled (e.g. with __attribute__((__always_inline__))).
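A sketch of what that could look like, applied to the question's function (GCC syntax; for the attribute to take effect across files, the definition has to be visible to the callers, e.g. in a shared header):

/* Force the leaf test to be inlined into its callers, even when
   profiling instrumentation would otherwise keep it out-of-line. */
__attribute__((__always_inline__)) static inline int
otree_node_out_of_bounds(struct otree_node *t, void *p)
{
    vec3 *_p = p;
    return (_p->x < t->_llf[0] - SEGMENTATION_DIST
         || _p->x > t->_urb[0] + SEGMENTATION_DIST
         || _p->y < t->_llf[1] - SEGMENTATION_DIST
         || _p->y > t->_urb[1] + SEGMENTATION_DIST
         || _p->z < t->_llf[2] - SEGMENTATION_DIST
         || _p->z > t->_urb[2] + SEGMENTATION_DIST);
}

Once inlined, gprof attributes the time to the callers instead, which gives a truer picture of where it actually goes.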

Your function looks fine as written. I doubt you could optimize an individual test much further than you already have (and if you could, the gain would not be dramatic). If you want to optimize the whole operation, you need to do it at a higher level:

  • You could try another structure (e.g. a kd-tree instead of an octree) or an entirely different algorithm that takes advantage of some pattern in your data.
  • You could invert the loop from "for each point, check otrees" to "for each otree, check points", which lets you reuse the bounds data across many points.
  • You can ensure you're accessing the data (the points, most likely) in the most efficient way, i.e. sequentially rather than jumping around randomly.
  • With a restructured loop you could use SSE to execute multiple bounds tests in a single instruction, with no branching (see the sketch after this list).
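As a rough illustration of the last two points, here is a sketch. It assumes the bounds have been repacked into a struct-of-arrays layout, with SEGMENTATION_DIST already folded into the stored values; the aabb4 type and out_of_bounds4 function are invented for this example:

#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical packed layout: the pre-expanded bounds
   (_llf - SEGMENTATION_DIST, _urb + SEGMENTATION_DIST)
   of four boxes, one register per axis and side. */
struct aabb4 {
    __m128 min_x, min_y, min_z;
    __m128 max_x, max_y, max_z;
};

/* Tests one point against four boxes at once, branch-free.
   Bit i of the result is set iff the point is outside box i. */
static int out_of_bounds4(const struct aabb4 *b, float x, float y, float z)
{
    __m128 px = _mm_set1_ps(x);  /* broadcast the point's coordinates */
    __m128 py = _mm_set1_ps(y);
    __m128 pz = _mm_set1_ps(z);

    __m128 out = _mm_or_ps(_mm_cmplt_ps(px, b->min_x),
                           _mm_cmpgt_ps(px, b->max_x));
    out = _mm_or_ps(out, _mm_or_ps(_mm_cmplt_ps(py, b->min_y),
                                   _mm_cmpgt_ps(py, b->max_y)));
    out = _mm_or_ps(out, _mm_or_ps(_mm_cmplt_ps(pz, b->min_z),
                                   _mm_cmpgt_ps(pz, b->max_z)));
    return _mm_movemask_ps(out);  /* 4-bit out-of-bounds mask */
}

A return value of 0xF means the point is outside all four boxes, so a whole group can be rejected with a single comparison, and the sequential layout keeps the bounds data cache-friendly.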

OTHER TIPS

It looks good to me. The only micro-optimisation I can think of is declaring *_p as static.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow