I don't know why it's taking that long (I don't know Windows well), but I can say you are going about this the wrong way. Specifically, you shouldn't rely on opening a new stream taking any particular amount of time. For example, I would expect similar issues (albeit to a much lesser degree) on OS X.
The correct implementation would be to always have a stream open, playing silence. Then, when you need to play a sound, you can play it right away. For best latency, you should pre-load the first few buffers from the file so you don't need to access the disk when playback starts. I don't know what the exact overhead is on Windows for opening a stream (I'm sure it depends on the API), but on some versions of OS X it's huge (the entire kernel switches into preemptive mode if no audio was running before).
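To make the idea concrete, here is a minimal sketch of the callback logic in plain Python. It doesn't touch any real audio library: `AlwaysOnPlayer`, `preload`, `trigger`, and the block size are all names and values I made up for illustration. In a real program, `callback` would be whatever function your audio API calls to fill each output buffer, and it would be registered once, when the stream is opened at startup.

```python
from collections import deque


def preload(samples, frames_per_buffer, max_blocks):
    """Split the start of a sound into ready-to-play blocks (hypothetical helper).

    In a real program `samples` would be read from disk ahead of time,
    so nothing here runs at the moment playback is triggered.
    """
    blocks = []
    for start in range(0, len(samples), frames_per_buffer):
        if len(blocks) == max_blocks:
            break
        blocks.append(list(samples[start:start + frames_per_buffer]))
    return blocks


class AlwaysOnPlayer:
    """Stand-in for a stream that stays open and plays silence when idle."""

    def __init__(self, frames_per_buffer=256):  # 256 is an assumed block size
        self.frames_per_buffer = frames_per_buffer
        # deque.append and deque.popleft are thread-safe in CPython, so the
        # trigger thread and the audio thread can share this queue.
        self._pending = deque()

    def trigger(self, preloaded_blocks):
        """Queue preloaded blocks; they start on the very next callback."""
        for block in preloaded_blocks:
            self._pending.append(block)

    def callback(self):
        """Return one output buffer: queued audio if any, otherwise silence."""
        try:
            block = self._pending.popleft()
        except IndexError:
            block = []
        # Pad with zeros so every buffer has the full length; an empty
        # block therefore becomes a buffer of pure silence.
        return block + [0.0] * (self.frames_per_buffer - len(block))
```

With this shape, starting a sound costs one queue append, and the worst-case delay is one buffer period rather than the time it takes to open a stream. A real implementation would also keep feeding later blocks from a separate disk-reading thread, and would probably use a lock-free ring buffer instead of a Python deque.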
That said, 1.3 seconds is insane. I recommend asking on the mailing list. Be sure to say which host API you are using, because you didn't mention it here and, on Windows, it matters. Also mention which version of Windows you are running.