The way you'd like to do this is very slow! Imagine you want to multiply two 32-bit numbers (you can do this on the 8080; the 4004 does not have enough memory): when both numbers are larger than 1000000, the multiplication would take a lot of time.
A better algorithm would be like this:
set result = 0
set A = first number
set B = second number
loop:
if the lowest bit of A is 0 then jump to "no_add"
add B to result
no_add:
shift A right by one bit (logical, not arithmetic!)
shift B left by one bit
if A is not zero then jump to "loop"
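
For illustration only, the steps above could be sketched in C roughly like this. The function name mul_shift_add is made up for this example, and widening B and the result to 64 bits is my assumption so that no high bits are lost; the steps themselves do not say how wide the result should be.

```c
#include <stdint.h>

/* Hypothetical helper (not from the original answer): shift-and-add multiply.
   Assumes a 64-bit result so that the product of two 32-bit values fits. */
uint64_t mul_shift_add(uint32_t first, uint32_t second)
{
    uint64_t result = 0;
    uint32_t a = first;
    uint64_t b = second;      /* widened so that left shifts keep the high bits */

    do {
        if (a & 1u)           /* lowest bit of A is 1: add B to the result */
            result += b;
        a >>= 1;              /* a is unsigned, so this is a logical shift */
        b <<= 1;
    } while (a != 0);         /* at most 32 passes for a 32-bit A */

    return result;
}
```

The loop runs once per bit of A instead of once per unit of A, so even for operands above 1000000 it needs at most 32 passes.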
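Using a "rotate through carry" operation you can do the "shift A right" and the "check the (previous) value of the lowest bit of A" steps with one instruction: the bit that is shifted out ends up in the carry flag, where you can test it after the shift. Make sure the carry is clear before rotating the highest byte of A, so that a 0 (and not an old carry) is shifted in at the top.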
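
To show what the rotate does on an 8-bit CPU, here is a small C model of it. This is only a sketch: both function names are invented for this example, the carry flag is modelled as a plain variable, and A is assumed to be stored as four bytes with a[0] being the lowest one.

```c
#include <stdint.h>

/* Model of "rotate right through carry" on a single byte (hypothetical helper).
   The old carry enters at bit 7, the old bit 0 becomes the new carry. */
static void rotate_right_through_carry(uint8_t *byte, unsigned *carry)
{
    unsigned out = *byte & 1u;
    *byte = (uint8_t)((*byte >> 1) | (*carry << 7));
    *carry = out;
}

/* Logical right shift of a 32-bit value held in four bytes (a[0] = lowest byte).
   Returns the bit that was shifted out, i.e. the previous lowest bit of A. */
unsigned shift_right_32(uint8_t a[4])
{
    unsigned carry = 0;                   /* clear carry: a 0 enters the top bit */
    for (int i = 3; i >= 0; i--)          /* highest byte first */
        rotate_right_through_carry(&a[i], &carry);
    return carry;
}
```

The value returned by shift_right_32 plays the role of the carry flag after the last rotate: if it is 1, B has to be added to the result for this round.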