This is not a libsndfile issue. This is a general audio synthesis question.
Whenever you truncate a sample at an arbitrary point (e.g., at 1 second), you can expect to hear an artifact (or see one, if you load the resulting file into, say, Audacity and check out the spectrogram and waveform at the transition boundaries). This is caused by the abrupt change in the sample waveform. I will skip an attempt at discussing the issues surrounding bandlimiting, and simply urge you to do a rapid fade-out of the samples rather than just truncating them. This forces your audio waveform to approach zero smoothly, just before the transition.
You may find that you also need to fade in (or cross-fade, if you overlap the smooth transitions) the next sample, by weighting its first few samples by a value close to zero and ramping that up [rapidly, or you'll miss the attack] to full scale. Start with just a rapid fade-out before each transition, and only worry about the fade-in if you need to. The implementation is the same (a per-sample scaling value that ramps up or down), but it's the arbitrary truncation of the samples at 1 second that's likely causing the most trouble.
You'll need to play with a few different parameters to see what works. For example, you might want to start with a linear ramp-down rather than an exponential or parabolic decay function, for simplicity. In any case, you will have to decide how many samples (or how many milliseconds) before the transition point to start scaling down the sample values.
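As a starting point, a linear fade-out is just a per-sample gain that ramps from 1.0 down to 0.0 over the tail of the buffer. The sketch below assumes float samples in [-1.0, +1.0]; the function and parameter names are illustrative, not from your code:

```c
#include <stddef.h>

/* Apply a linear fade-out to the last `fade_len` samples of `buf`.
   `total_len` is the buffer length in samples. A fade of a few
   hundred samples (a few ms at 44.1 kHz) is a reasonable first try. */
static void fade_out_linear(float *buf, size_t total_len, size_t fade_len)
{
    if (fade_len > total_len)
        fade_len = total_len;
    if (fade_len < 2) {
        if (fade_len == 1)
            buf[total_len - 1] = 0.0f;
        return;
    }

    size_t start = total_len - fade_len;
    for (size_t i = 0; i < fade_len; i++) {
        /* gain ramps linearly from 1.0 (first faded sample)
           down to 0.0 (final sample) */
        float gain = (float)(fade_len - 1 - i) / (float)(fade_len - 1);
        buf[start + i] *= gain;
    }
}
```

A fade-in is the mirror image: scale the first `fade_len` samples by a gain ramping from 0.0 up to 1.0. Swapping the linear `gain` expression for an exponential or parabolic curve changes only that one line.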
EDIT:
I initially assumed that your mixing was fine, because you only asked about the transition artifacts, and my answer addresses that. It is worth noting, however, that I have no idea why you're mixing into buff3 exactly as you are, given your stated aim. If I understand correctly that you simply want to combine the two sounds into buff3, just add the corresponding samples together and ensure they don't clip (i.e., exceed the range [-1.0, +1.0]). libsndfile automatically "guards" against clipping, but all it can do is clamp a sample value (e.g., set a value above 1.0 to 1.0); it cannot ensure that both audio waveforms are mixed at equal levels.
If the level of either of your two input sounds were high enough, the simple additive mix would clip, which would be another type of "artifact" (except one likely to ruin the whole sound, so in this particular case you probably would have noticed). For general mixing, however, your loop would be:
for (int i = 0; i < 88200; i++)
{
    /* Multiply the sum of the signals by a factor slightly
       less than the reciprocal of their count, to also guard
       against floating-point error. */
    buff3[i] = (buff1[i] + buff2[i]) * 0.499;
}