I'm an amateur at this, but wouldn't you get a better result if you started with a much bigger overlap, e.g. a "hop size" of N/10 or something like that? Then you'd have more freedom to adjust the hop size on output while still keeping a substantial overlap.
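To put rough numbers on that, here is a small sketch of the arithmetic. It assumes N is the window length, and that the output hop is simply the input hop multiplied by the stretch factor. The function name `output_overlap` and the value N = 1000 are made up for illustration.

```python
def output_overlap(window_len, analysis_hop, stretch):
    """Fraction of each window that overlaps the next one after time-scaling.

    Assumption: the output (synthesis) hop is the input (analysis) hop
    multiplied by the stretch factor. A result <= 0 means the windows no
    longer overlap at all.
    """
    synthesis_hop = round(analysis_hop * stretch)
    return 1.0 - synthesis_hop / window_len


N = 1000  # hypothetical window length in samples

for stretch in (0.5, 1.0, 2.0, 4.0):
    small_hop = output_overlap(N, N // 10, stretch)
    half_hop = output_overlap(N, N // 2, stretch)
    print(f"stretch {stretch}: hop N/10 -> {small_hop:.2f}, hop N/2 -> {half_hop:.2f}")
```

Under these assumptions, a hop of N/10 still leaves about 60% overlap when stretching by 4x, whereas a hop of N/2 already has no overlap left at 2x.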
Also, it might pay to adjust the steepness of the window depending on how much you're expanding/compressing time.
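One possible reading of "adjusting the steepness", sketched below: use a flat-topped window whose cosine fade-in and fade-out are exactly as long as the output overlap, so a bigger output hop gives a shorter, steeper taper. This is only an illustration and not taken from the original code. The name `crossfade_window` is hypothetical, and the sketch only covers the case where each window overlaps its immediate neighbours (output hop between N/2 and N).

```python
import math


def crossfade_window(window_len, output_hop):
    """Flat-topped window whose cosine tapers span exactly the output overlap.

    Only handles window_len / 2 <= output_hop <= window_len, i.e. each
    window overlaps its immediate neighbours only.
    """
    taper = window_len - output_hop  # number of overlapping samples
    if not 0 <= taper <= window_len // 2:
        raise ValueError("output_hop must be between window_len/2 and window_len")
    window = [1.0] * window_len
    for i in range(taper):
        rise = 0.5 * (1.0 - math.cos(math.pi * i / taper))
        window[i] = rise                     # fade-in
        window[output_hop + i] = 1.0 - rise  # fade-out, complements the next fade-in
    return window


# Larger output hop -> shorter taper -> steeper edges.
gentle = crossfade_window(1000, 500)  # 500-sample tapers
steep = crossfade_window(1000, 900)   # 100-sample tapers
```

Because each fade-out is the complement of the next window's fade-in, the overlapped windows add up to 1 everywhere, whatever hop in that range is chosen.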