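In this case, the push/pop pair lets the outer loop run while the whole delay uses only one register, even though there are two counters (inner loop with 03FFH and outer loop with 0FFH). The outer count is saved on the stack while the register is reloaded with the inner count, and it is restored once the inner loop has finished.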
Maybe the author didn't want to tie up another register, or he wanted to use the loop instruction for both loops, which always uses the cx register as its counter.
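
To make the counter juggling concrete, here is a small C model of what such a routine presumably looks like. This is only a sketch: the original listing is not reproduced here, so the instruction sequence in the comments, the label names (`outer`, `inner`) and the function name `delay_inner_iterations` are assumptions for illustration. Each C statement is annotated with the 8086 instruction it stands for, and `loop` is modelled as "decrement cx, jump if cx is not zero".

```c
#include <assert.h>
#include <stdint.h>

/* Model of an assumed 8086 delay routine that shares cx between two loops.
   The assembly in the comments is a reconstruction, not the original code. */
uint32_t delay_inner_iterations(void)
{
    uint16_t stack[1];              /* one stack slot is all the routine needs */
    int sp = 0;                     /* number of words currently pushed        */
    uint32_t inner_iterations = 0;  /* how often the inner loop body ran       */

    uint16_t cx = 0x00FF;           /*        mov  cx, 0FFH   ; outer count    */
    do {                            /* outer:                                  */
        assert(sp < 1);
        stack[sp++] = cx;           /*        push cx         ; save outer     */
        cx = 0x03FF;                /*        mov  cx, 03FFH  ; inner count    */
        do {                        /* inner:                                  */
            inner_iterations++;     /*        (time is burned here)            */
        } while (--cx != 0);        /*        loop inner      ; dec cx, jnz    */
        assert(sp > 0);
        cx = stack[--sp];           /*        pop  cx         ; restore outer  */
    } while (--cx != 0);            /*        loop outer      ; dec cx, jnz    */

    return inner_iterations;
}
```

Calling `delay_inner_iterations()` returns 260865, that is 255 outer passes times 1023 inner passes. Without the push/pop, the inner loop would leave cx at zero, so the outer count would be lost; the alternative would be a second register with an explicit dec/jnz pair instead of loop.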