Not sure about your whole algorithm, and I can't test it at the moment, but for IplImages the memory is laid out like this:
1st row:
baseaddress + 0 = b of pixel [0]
baseaddress + 1 = g of pixel [0]
baseaddress + 2 = r of pixel [0]
baseaddress + 3 = b of pixel [1]
etc.
2nd row:
baseaddress + widthStep + 0 = b of pixel [0]
baseaddress + widthStep + 1 = g of pixel [0]
baseaddress + widthStep + 2 = r of pixel [0]
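
As a rough sketch of the layout above (not tested against OpenCV itself), a small helper can compute where pixel (x, y) starts. The function name `bgr_pixel` is made up for illustration; `base` and `width_step` are assumed to be what you would get from `(unsigned char *)img->imageData` and `img->widthStep` for an 8-bit, 3-channel IplImage.

```c
#include <assert.h>
#include <stddef.h>

/* Pointer to the first byte (the B channel) of pixel (x, y) in an
   interleaved 8-bit BGR buffer. `base` and `width_step` stand in for
   IplImage's imageData and widthStep; the helper name is hypothetical. */
static unsigned char *bgr_pixel(unsigned char *base, int width_step, int x, int y)
{
    assert(base != NULL && x >= 0 && y >= 0);
    /* Row y starts y * width_step bytes in; each pixel takes 3 bytes. */
    return base + (size_t)y * (size_t)width_step + (size_t)x * 3;
}
```

With that, `bgr_pixel(base, width_step, x, y)[0]`, `[1]` and `[2]` are b, g and r. Note that the row stride is widthStep, not width*3, because rows may be padded.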
So if you have n*m blocks of size 8x8 of unsigned char BGR data and you want to loop over the pixels [x,y] in block [bx,by], you can do it like this:
baseaddress + (by*8+y)*widthStep + (bx*8+x)*3 + 0 = b
baseaddress + (by*8+y)*widthStep + (bx*8+x)*3 + 1 = g
baseaddress + (by*8+y)*widthStep + (bx*8+x)*3 + 2 = r
where x and y run from 0 to 7 inside the block, since row by*8+y starts at address
baseaddress + (by*8+y)*widthStep
and column bx*8+x is at byte offset (bx*8+x)*3 within that row.
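
Put together, looping over one block could look roughly like the sketch below. It is only an illustration under a few assumptions: 8-bit data, 3 interleaved channels in B, G, R order, and image dimensions that are multiples of 8. The function name `block_channel_sum` is hypothetical; `base` and `width_step` are meant to correspond to `(unsigned char *)img->imageData` and `img->widthStep`.

```c
#include <assert.h>
#include <stddef.h>

enum { BLOCK = 8, CHANNELS = 3 };  /* 8x8 blocks, interleaved B, G, R */

/* Sum one channel (0 = b, 1 = g, 2 = r) over block [bx, by].
   Hypothetical helper; replace the summing with whatever you need
   to do per pixel. */
static unsigned long block_channel_sum(const unsigned char *base, int width_step,
                                       int bx, int by, int channel)
{
    unsigned long sum = 0;
    assert(base != NULL && channel >= 0 && channel < CHANNELS);
    for (int y = 0; y < BLOCK; ++y) {
        /* Start of image row by*8 + y. */
        const unsigned char *row =
            base + (size_t)(by * BLOCK + y) * (size_t)width_step;
        for (int x = 0; x < BLOCK; ++x) {
            /* Column bx*8 + x, 3 bytes per pixel, plus the channel index. */
            sum += row[(size_t)(bx * BLOCK + x) * CHANNELS + channel];
        }
    }
    return sum;
}
```

Computing the row pointer once per y keeps the inner loop down to a single multiply and add, but the address is exactly the same expression as in the formulas above.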