The format argument tells you how the audio is laid out. It is an sp_audioformat.
I think the only sample type currently defined is 16-bit signed PCM. There can be one or two channels. With two channels, each frame consists of two consecutive 16-bit signed integers, one per channel. The frames pointer points to the start of a contiguous array of such frames, num_frames elements long. I don't know how you're getting NaN: only floating-point types can be NaN, and no floating-point types should be involved here at all.
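To make the layout concrete, here is a rough sketch of reading such a buffer. It assumes the samples really are interleaved 16-bit signed integers in native byte order, as described above. The audio_format struct and both helper functions are made-up stand-ins for illustration, not part of libspotify; real code would use the sp_audioformat passed to the callback. Note that the buffer is only ever read through an int16_t pointer. Reading the same bytes through a float pointer would reinterpret the bit patterns, which is one way NaN values could show up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the sp_audioformat fields used here. */
typedef struct {
    int sample_rate;  /* samples per second, e.g. 44100 */
    int channels;     /* 1 = mono, 2 = stereo */
} audio_format;

/* Return the sample for one channel of one frame.
   Frames are interleaved, so the index is frame * channels + channel. */
static int16_t sample_at(const audio_format *format, const void *frames,
                         int frame, int channel)
{
    const int16_t *samples = (const int16_t *)frames;

    assert(format->channels > 0);
    assert(frame >= 0);
    assert(channel >= 0 && channel < format->channels);

    return samples[(size_t)frame * (size_t)format->channels + (size_t)channel];
}

/* Convert every sample to a float in [-1.0, 1.0).
   out must have room for num_frames * channels floats. */
static void pcm16_to_float(const audio_format *format, const void *frames,
                           int num_frames, float *out)
{
    const int16_t *samples = (const int16_t *)frames;
    size_t total, i;

    assert(format->channels > 0);
    if (num_frames <= 0)
        return;  /* nothing to convert */

    total = (size_t)num_frames * (size_t)format->channels;
    for (i = 0; i < total; i++)
        out[i] = (float)samples[i] / 32768.0f;  /* numeric conversion, not a pointer cast */
}
```

If you need floats for later processing, convert sample by sample like this rather than casting the frames pointer.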
You can see an example of implementing the music_delivery callback in jukebox.c.