Your code works exactly as expected.
Trying your function out in the simulator from GHCi, the result is:
*Main> sipo 3 (high :: Signal CLK Bool)
[high,? | high .,? | ? | high .]
The way to read it is:
sipo 3 high !! 0 = high
sipo 3 high !! 1 = ? | high
sipo 3 high !! 2 = ? | ? | high
This output from the Lava simulator means that the first output is high
in the first cycle, and the simulator has no input from which to determine further values. Similarly, the second output is undefined in the first cycle and high
in the second, and the third output is undefined for two cycles and high
in the third.
This makes perfect sense, since the second output is not set to anything in the first cycle: the delayed input signal hasn't had time yet to get there.
The reason the result is different from York Lava is that York Lava's delay
primitive seems to take an extra value to be used before the first clock cycle. I'm not sure that is synthesizable, though.
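To make the difference concrete, here is a toy model in plain Haskell. It is only a sketch and does not use either Lava library's API: a signal is modelled as a list of per-cycle values, with Nothing standing in for the simulator's "?". The names Sig, delayNoInit, delayInit, sipoModel, highSig and render are all made up for this illustration, and the input is assumed to be high in every cycle, so the first output shows more cycles than the simulator printed above.

```haskell
import Data.List (intercalate)

-- Toy model, NOT a Lava API: one list element per clock cycle,
-- Nothing plays the role of the simulator's "?".
type Sig = [Maybe Bool]

-- Register without an initial value: the first cycle is undefined.
delayNoInit :: Sig -> Sig
delayNoInit s = Nothing : s

-- Register with an explicit initial value (assumption: this is roughly
-- what a delay primitive that takes an extra value would do).
delayInit :: Bool -> Sig -> Sig
delayInit x s = Just x : s

-- Serial-in parallel-out: output i is the input delayed by i cycles.
sipoModel :: (Sig -> Sig) -> Int -> Sig -> [Sig]
sipoModel d n s = take n (iterate d s)

-- Hypothetical input that is high in every cycle.
highSig :: Sig
highSig = repeat (Just True)

-- Show the first k cycles of a signal, separated by " | ".
render :: Int -> Sig -> String
render k = intercalate " | " . map showCycle . take k
  where
    showCycle Nothing      = "?"
    showCycle (Just True)  = "high"
    showCycle (Just False) = "low"
```

In GHCi, map (render 3) (sipoModel delayNoInit 3 highSig) gives ["high | high | high","? | high | high","? | ? | high"], which has the same staircase of undefined cycles as the simulator output. Swapping in delayInit False replaces each "?" with "low", which is the kind of result I would expect from a delay that is given an initial value.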