That would deserve a test but from what I've read (what is the same in various cores reference manuals and PowerISA 2.05), there is no need to align data address. The action points on the block that contains the address.
If the block containing the byte addressed by EA is in storage that is Memory Coherence Required and a block containing the byte addressed by EA is in the instruction cache of any processors, the block is invalidated in those instruction caches, so that subsequent references cause the block to be fetched from main storage.
I don't know your code but is the alignment is done at the beginning and then a loop adds the cache block size to the EA?