This is a tiny leaf function that is called a huge number of times. Profiling results tend to over-represent the cost of such functions because the overhead of measuring each call is large relative to the cost of the function itself. In a normal optimized build without profiling, the cost of the entire operation (at the level of the outer loops that ultimately invoke this test) will be a smaller percentage of the overall runtime. You may be able to observe this by forcing that function to inline with profiling enabled (e.g. with __attribute__((__always_inline__))).
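As a rough sketch of the inlining suggestion, assuming GCC or Clang: the types Point and Box and the functions point_in_box and count_inside are hypothetical stand-ins for the asker's own code, not taken from the question. Forcing the test inline means an instrumenting profiler (e.g. a -pg build) has no separate call to measure, so the cost shows up in the calling loop instead.

```c
/* Hypothetical types standing in for the asker's point and octree-node bounds. */
typedef struct { float x, y, z; } Point;
typedef struct { float min[3], max[3]; } Box;

/* GCC/Clang extension: force this leaf test to be inlined into its caller,
   even in a profiling build, so there is no per-call instrumentation and
   its cost is attributed to the calling loop. */
static inline __attribute__((__always_inline__))
int point_in_box(const Box *b, const Point *p)
{
    return p->x >= b->min[0] && p->x <= b->max[0]
        && p->y >= b->min[1] && p->y <= b->max[1]
        && p->z >= b->min[2] && p->z <= b->max[2];
}

/* Hypothetical caller: counts how many points fall inside one box. */
static int count_inside(const Box *b, const Point *pts, int n)
{
    int hits = 0;
    for (int i = 0; i < n; ++i)
        hits += point_in_box(b, &pts[i]);
    return hits;
}
```

Comparing the profile of this build with one where the attribute is removed should show how much of the reported cost was measurement overhead.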
Your function looks fine as written. I doubt you could optimize an individual test like that further than you have (or if you could, it would not be dramatic). If you want to optimize the whole operation you need to do it at a higher level:
- You could try another structure (e.g. kd-tree instead of octree) or an entirely new algorithm that takes advantage of some pattern in your data.
- You could invert the loop from "for each point check octrees" to "for each octree check points", which lets you reuse the same bounds data for many points in a row.
- You can ensure you're accessing data (points, probably) in the most efficient way (i.e. sequentially rather than randomly jumping around).
- With a restructured loop you could use SSE to execute multiple bounds tests in a single instruction (with no branching!).
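A rough sketch combining the last three points, under several assumptions not in the question: points are stored as a structure of arrays (the hypothetical PointsSoA type) so that four consecutive coordinates load with one instruction, the bounds are axis-aligned boxes with inclusive limits, the target is x86 with SSE, and all type and function names are illustrative. The outer loop walks boxes, each box's bounds are broadcast into registers once, and the points are read sequentially, four at a time.

```c
#include <stddef.h>
#include <xmmintrin.h> /* SSE intrinsics (x86 only) */

/* Hypothetical structure-of-arrays storage: x, y and z live in separate
   contiguous arrays, so 4 consecutive points load with one instruction each. */
typedef struct {
    const float *x, *y, *z;
    size_t count;
} PointsSoA;

typedef struct { float min[3], max[3]; } Box;

/* Tests every point against one box. Bounds are broadcast once and reused
   for all points. Writes 1/0 per point into `inside`, returns the hit count. */
static size_t points_in_box_sse(const Box *b, const PointsSoA *pts,
                                unsigned char *inside)
{
    const __m128 minx = _mm_set1_ps(b->min[0]), maxx = _mm_set1_ps(b->max[0]);
    const __m128 miny = _mm_set1_ps(b->min[1]), maxy = _mm_set1_ps(b->max[1]);
    const __m128 minz = _mm_set1_ps(b->min[2]), maxz = _mm_set1_ps(b->max[2]);
    size_t i = 0, hits = 0;

    /* Main loop: 4 points per iteration, comparisons are branch-free masks. */
    for (; i + 4 <= pts->count; i += 4) {
        __m128 px = _mm_loadu_ps(pts->x + i);
        __m128 py = _mm_loadu_ps(pts->y + i);
        __m128 pz = _mm_loadu_ps(pts->z + i);
        __m128 m = _mm_and_ps(_mm_cmpge_ps(px, minx), _mm_cmple_ps(px, maxx));
        m = _mm_and_ps(m, _mm_and_ps(_mm_cmpge_ps(py, miny), _mm_cmple_ps(py, maxy)));
        m = _mm_and_ps(m, _mm_and_ps(_mm_cmpge_ps(pz, minz), _mm_cmple_ps(pz, maxz)));
        int bits = _mm_movemask_ps(m); /* bit k set if lane k is inside */
        for (int k = 0; k < 4; ++k) {
            inside[i + k] = (unsigned char)((bits >> k) & 1);
            hits += inside[i + k];
        }
    }

    /* Scalar tail for the last 0-3 points. */
    for (; i < pts->count; ++i) {
        unsigned char hit = pts->x[i] >= b->min[0] && pts->x[i] <= b->max[0]
                         && pts->y[i] >= b->min[1] && pts->y[i] <= b->max[1]
                         && pts->z[i] >= b->min[2] && pts->z[i] <= b->max[2];
        inside[i] = hit;
        hits += hit;
    }
    return hits;
}

/* Inverted outer loop: for each box, sweep all points sequentially.
   `inside` is scratch space of at least pts->count bytes. */
static void count_per_box(const Box *boxes, size_t nboxes, const PointsSoA *pts,
                          unsigned char *inside, size_t *counts)
{
    for (size_t b = 0; b < nboxes; ++b)
        counts[b] = points_in_box_sse(&boxes[b], pts, inside);
}
```

Whether this pays off depends on how the octree traversal is organized; in a real traversal you would likely test only the points that survived the parent node rather than all of them.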