Start by checking your compiler settings: are optimizations enabled for maximum performance?
Then, depending on the architecture, calculating an integral image has several possible bottlenecks:
The computations themselves. Some low-cost CPUs cannot perform integer math with good performance. There is no solution for this.
The data flow is not optimal. The solution is to organize the data flow optimally, which mainly means choosing the number of sequential read and write streams. For example, you can process 2 rows simultaneously.
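A minimal C sketch of processing two rows per pass, assuming an 8-bit grayscale image stored row-major and a 32-bit output (enough for images of up to 2^24 pixels). The function name integral_two_rows and the memory layout are illustrative assumptions, not something from the original. Each pass reads the previous integral row once and produces two output rows, so that row is read half as often as in a loop that handles one row at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: out[y*w + x] = sum of src over rows 0..y, columns 0..x.
   Rows are processed in pairs: two source read streams, one read stream of
   the previous integral row, and two output write streams per pass. */
void integral_two_rows(const uint8_t *src, uint32_t *out, size_t w, size_t h)
{
    assert(src != NULL && out != NULL);
    size_t y = 0;
    for (; y + 1 < h; y += 2) {
        const uint8_t *s0 = src + y * w;
        const uint8_t *s1 = s0 + w;
        uint32_t *o0 = out + y * w;
        uint32_t *o1 = o0 + w;
        const uint32_t *prev = (y > 0) ? o0 - w : NULL; /* integral row y-1 */
        uint32_t r0 = 0, r1 = 0;                        /* running row sums */
        for (size_t x = 0; x < w; ++x) {
            r0 += s0[x];
            r1 += s1[x];
            uint32_t v0 = (prev ? prev[x] : 0u) + r0;
            o0[x] = v0;
            o1[x] = v0 + r1; /* reuse v0 instead of re-reading o0 from memory */
        }
    }
    if (y < h) { /* odd height: the last row is processed on its own */
        const uint8_t *s = src + y * w;
        uint32_t *o = out + y * w;
        const uint32_t *prev = (y > 0) ? o - w : NULL;
        uint32_t r = 0;
        for (size_t x = 0; x < w; ++x) {
            r += s[x];
            o[x] = (prev ? prev[x] : 0u) + r;
        }
    }
}
```

The same pairing idea can be extended to more rows per pass, as long as the number of simultaneous streams stays within what the CPU's prefetchers and cache handle well.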
The data dependencies of the algorithm. On modern CPUs this can be the biggest problem. The solution is to change the processing algorithm, for example by calculating odd and even pixels without a dependency between them (more calculations, less dependency).
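One way to read the odd/even suggestion, sketched in C under the same assumptions (8-bit row-major input, 32-bit output; the function names are hypothetical): keep one running sum over even columns and one over odd columns. The two accumulators never depend on each other, so the serial chain of additions within a row is split into two independent chains of half the length, at the cost of one extra addition per pixel. The vertical pass then adds the previous integral row element by element, which has no loop-carried dependency at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: inclusive prefix sum of one row using two independent
   accumulators, one over even columns and one over odd columns. */
static void row_prefix_even_odd(const uint8_t *s, uint32_t *r, size_t w)
{
    uint32_t even = 0; /* s[0] + s[2] + ... */
    uint32_t odd = 0;  /* s[1] + s[3] + ... */
    size_t x = 0;
    for (; x + 1 < w; x += 2) {
        even += s[x];        /* depends only on the previous value of even */
        r[x] = even + odd;   /* odd still holds s[1] + ... + s[x-1] */
        odd += s[x + 1];     /* depends only on the previous value of odd */
        r[x + 1] = even + odd;
    }
    if (x < w) { /* odd width: last column */
        even += s[x];
        r[x] = even + odd;
    }
}

/* Hypothetical helper: integral image built from the even/odd row prefix
   plus an element-wise vertical pass. */
void integral_even_odd(const uint8_t *src, uint32_t *out, size_t w, size_t h)
{
    assert(src != NULL && out != NULL);
    for (size_t y = 0; y < h; ++y) {
        uint32_t *o = out + y * w;
        row_prefix_even_odd(src + y * w, o, w);
        if (y > 0) {
            const uint32_t *prev = o - w;
            for (size_t x = 0; x < w; ++x)
                o[x] += prev[x]; /* independent per column, easy to vectorize */
        }
    }
}
```

Whether this beats the straightforward loop depends on the CPU and on what the compiler already does with it, so it is worth benchmarking both versions.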
Finally, the processing can be done on a GPU.