Question

I'm compiling an iOS project that uses the OpenCV framework, and I'd like to know which compiler flags are best for my project.

The project processes a lot of pixel matrices, so I need the compiler to generate SIMD instructions in order to process these matrices as efficiently as possible.

I am using these flags: -mfpu=neon, -mfloat-abi=softfp and -O3.

I also found these other flags: -mno-thumb -mfpu=maverick -ftree-vectorize -DNS_BLOCK_ASSERTIONS=1

I don't really know whether these will save me much CPU time. I searched Google but didn't find anything that gives good reasons for choosing one set of compiler flags over another.

Thanks


Solution

I use the same NEON flags that you do. In my experience, the optimization level (-O3 or any other) does little for code written with NEON intrinsics; it mainly optimizes the ordinary ARM code.

As Vasile said, the best performance comes from writing the NEON code in assembly. The easiest way to get there is to first write the routine with NEON intrinsics and compile it with the flags you mentioned, then take the assembly the compiler generates as the starting point for further hand optimization.
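
As a rough illustration of that workflow, here is a minimal sketch of an intrinsics routine. The function name `brighten_u8` and the operation (a saturating brightness add on 8-bit pixels) are assumptions made up for this example, not something taken from OpenCV. The scalar loop handles the tail and also lets the file build where NEON is not available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper: add `delta` to every 8-bit pixel, clamping at 255. */
void brighten_u8(const uint8_t *src, uint8_t *dst, size_t n, uint8_t delta)
{
    assert(src != NULL && dst != NULL);
    size_t i = 0;
#if HAVE_NEON
    /* Broadcast delta into all 16 lanes of a 128-bit register. */
    const uint8x16_t vdelta = vdupq_n_u8(delta);
    for (; i + 16 <= n; i += 16) {
        uint8x16_t px = vld1q_u8(src + i);  /* load 16 pixels */
        px = vqaddq_u8(px, vdelta);         /* saturating add, 16 lanes at once */
        vst1q_u8(dst + i, px);              /* store 16 pixels */
    }
#endif
    /* Scalar tail; this is the whole loop when NEON is unavailable. */
    for (; i < n; ++i) {
        unsigned sum = (unsigned)src[i] + delta;
        dst[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}
```

Adding -S to the compile command makes GCC and Clang emit assembly instead of an object file, which gives you the listing to start hand-tuning from.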

A lot of further optimization is possible by parallelizing the work and by making use of NEON's dual-issue capability.

OTHER TIPS

The problem is that compilers are not very good at generating vectorized code. So just enabling NEON won't get you much improvement (maybe 10%?).
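
If you do want to give the auto-vectorizer its best chance, it helps to keep the hot loop simple. The following is only a sketch under that assumption: `absdiff_u8` is a hypothetical function, and whether a given compiler actually vectorizes it has to be checked in the generated assembly. The `restrict` qualifiers promise that the buffers do not overlap, and the loop has a plain counted form with no calls in its body.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: per-pixel absolute difference of two 8-bit images.
 * `restrict` tells the compiler the three buffers never alias. */
void absdiff_u8(uint8_t *restrict dst, const uint8_t *restrict a,
                const uint8_t *restrict b, size_t n)
{
    /* Cheap sanity check of the no-alias promise; it does not catch
     * partially overlapping buffers. */
    assert(dst != a && dst != b);

    /* Simple counted loop over contiguous data with no function calls. */
    for (size_t i = 0; i < n; ++i) {
        dst[i] = (uint8_t)(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
    }
}
```

With GCC, -ftree-vectorize is already turned on by -O3, so passing both should not change the result.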

What you can do is profile your app and hand-write the parts that eat most of your time using NEON. And if you do, why not contribute them back to the public OpenCV source?

As of now, OpenCV has little to no code optimized for NEON (it is much better optimized for x86 SSE2).

Licensed under: CC-BY-SA with attribution