As I mentioned in my question, these ARM instructions (http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.kui0100a/armasm_cihjgdid.htm) can do the trick:
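int32 *image;  /* image data, accessed as 32-bit words */
for (i = 0; i < size / 4; i++) {
    /* ARM syntax is "op Rd, Rm": destination (%0) first, then source (%1) */
    asm("rbit %0, %1" : "=r" (image[i]) : "r" (image[i]));
    asm("rev %0, %1" : "=r" (image[i]) : "r" (image[i]));
}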
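rbit reverses the bit order of the whole 32-bit word. rev reverses the byte order of the word. Applied one after the other, they put the bytes back in their original positions, so the net effect is that the bits of each byte are reversed independently. I'm still wondering whether there is a better syntax or a better way to do this.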
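For checking the result off-target, here is a minimal portable C sketch of the same idea. It is not the ARM code above: it replaces the two instructions with the classic mask-and-shift technique, and the function names (reverse_bits32, reverse_bytes32, reverse_bits_per_byte, reverse_image_bits) are my own, purely for illustration. I'm assuming size is a byte count that is a multiple of 4, as in the loop above.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the bits inside each byte; the bytes keep their positions. */
static uint32_t reverse_bits_per_byte(uint32_t x)
{
    x = ((x & 0x55555555u) << 1) | ((x >> 1) & 0x55555555u); /* swap adjacent bits   */
    x = ((x & 0x33333333u) << 2) | ((x >> 2) & 0x33333333u); /* swap bit pairs       */
    x = ((x & 0x0F0F0F0Fu) << 4) | ((x >> 4) & 0x0F0F0F0Fu); /* swap nibbles         */
    return x;
}

/* Reverse the byte order of a word (software equivalent of rev). */
static uint32_t reverse_bytes32(uint32_t x)
{
    return (x << 24) | ((x & 0x0000FF00u) << 8) |
           ((x >> 8) & 0x0000FF00u) | (x >> 24);
}

/* Reverse all 32 bits of a word (software equivalent of rbit). */
static uint32_t reverse_bits32(uint32_t x)
{
    /* Reversing bits per byte, then reversing the bytes, reverses the word. */
    return reverse_bytes32(reverse_bits_per_byte(x));
}

/* Hypothetical helper: process a buffer of `size` bytes, one word at a time. */
static void reverse_image_bits(uint32_t *image, size_t size)
{
    for (size_t i = 0; i < size / 4; i++) {
        image[i] = reverse_bits_per_byte(image[i]);
    }
}
```

The point of the sketch is that rbit followed by rev amounts to the three mask-and-shift steps in reverse_bits_per_byte, so that function can serve as a reference when testing the assembly version, or as a fallback on targets without those instructions.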