Is there any fast way to exploit the fact that data is shifting through A rather than being completely new?
Even though all you're doing is the shifting and adding new elements to A, the products in C will, in general, all be different since one of the operands will generally change after each iteration. If you have additional information about the way the elements of A or B are structured, you could potentially use that structure to reduce the number of multiplications. Barring any such structural considerations, you will have to compute all 25 products each loop.
Ideally I want to minimize the number of multiplication operations (at a cost of perhaps more additions/subtractions/accumulations).
In theory, you can reduce the number of multiplications to 0 by shifting and adding the array elements to simulate multiplication. In practice, this will be slower than a hardware multiplication so you're better off just using any available hardware-based multiplication unless there's some additional, relevant constraint you haven't mentioned.