Question

I am trying to implement a piece of software that mixes piano samples. I want to create a WAV file containing one sound, then the other, and then a mix of both.

I truncate the samples at one second, so the file is laid out as follows: [one second of sound 1][one second of sound 2][one second of sound 1 + sound 2]

The problem is that there is a weird sound artifact at each transition. Does anybody know where it comes from?

Thanks in advance.

Here is the code I am using:

#include "stdafx.h"
#include <cstdlib>
#include <sndfile.h>


int _tmain(int argc, _TCHAR* argv[])
{
    SF_INFO sInfo1;
    SF_INFO sInfo2;
    SF_INFO sInfo3;

    SNDFILE *sFile1 = NULL;
    SNDFILE *sFile2 = NULL;
    SNDFILE *sFile3 = NULL;

    double *buff1;
    double *buff2;
    double *buff3;

    sf_count_t count1 = 0;
    sf_count_t count2 = 0;
    sf_count_t count3 = 0;

    buff1 = (double*)malloc(88200*sizeof(double));
    buff2 = (double*)malloc(88200*sizeof(double));
    buff3 = (double*)malloc(88200*sizeof(double));

    sInfo1.format = 0;
    sInfo2.format = 0;
    sFile1 = sf_open("C:/samples/mezzo forte/mcg_mf_022.wav", SFM_READ, &sInfo1);
    sFile2 = sf_open("C:/samples/mezzo forte/mcg_mf_046.wav", SFM_READ, &sInfo2);

    sInfo3 = sInfo2;
    sFile3 = sf_open("C:/samples/test1.wav", SFM_WRITE, &sInfo3);

    count1 = sf_read_double(sFile1, buff1, 88200);
    count2 = sf_read_double(sFile2, buff2, 88200);

    for(int i=0; i<88200; i++)
    {
        buff3[i] = buff1[i] + buff2[i] - ( buff1[i] * buff2[i] );
    }

    count1 = sf_write_double(sFile3, buff1, 88200);
    count2 = sf_write_double(sFile3, buff2, 88200);
    count3 = sf_write_double(sFile3, buff3, 88200);

    sf_close(sFile1);
    sf_close(sFile2);
    sf_close(sFile3);

    free(buff1);
    free(buff2);
    free(buff3);

    //getchar();
    return 0;
}

Solution

This is not a libsndfile issue. This is a general audio synthesis question.

Whenever you truncate a sample at an arbitrary point (e.g., at 1 second), you can expect to hear an artifact. You can also see it if you load the resulting file in, say, Audacity and check the waveform and spectrogram at the transition boundaries. The cause is the abrupt change in the waveform: the signal jumps from whatever value it had at the cut to the first value of the next sound, and that discontinuity is heard as a click. I will skip the issues surrounding band-limiting and simply urge you to do a rapid fade-out of each sample rather than just truncating it. This forces the waveform to approach zero rapidly, but smoothly, just before the transition.

You may find that you also need to fade in the next sample (or cross-fade, if you overlap the two transitions) by weighting its first few samples by a value close to zero and ramping that up to full scale. The ramp has to be rapid, or you'll lose the attack. Start with just a rapid fade-out before each transition, and worry about the fade-in only if you need to. The implementation is the same either way (a scaling value that ramps up or down), but it's the arbitrary truncation of the samples at 1 second that's likely causing the most trouble.
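
As a rough illustration of that ramping scale value, here is a minimal sketch of a linear fade-out and fade-in. The function names and the `fade_len` parameter are made up for this example; they are not part of libsndfile. It treats the buffer as a flat array of samples, so for interleaved stereo you would probably want to ramp per frame rather than per sample.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a libsndfile function): scale the last
// `fade_len` samples of `buf` by a gain that ramps linearly from
// 1.0 down to 0.0, so the buffer ends at exactly zero.
void fade_out_linear(double *buf, std::size_t n, std::size_t fade_len)
{
    assert(fade_len <= n);
    if (fade_len < 2) return;  // too short to form a ramp
    for (std::size_t i = 0; i < fade_len; ++i) {
        double gain = 1.0 - static_cast<double>(i) / (fade_len - 1);
        buf[n - fade_len + i] *= gain;
    }
}

// Mirror image of the above: ramp the first `fade_len` samples
// from 0.0 up to 1.0.
void fade_in_linear(double *buf, std::size_t n, std::size_t fade_len)
{
    assert(fade_len <= n);
    if (fade_len < 2) return;
    for (std::size_t i = 0; i < fade_len; ++i) {
        double gain = static_cast<double>(i) / (fade_len - 1);
        buf[i] *= gain;
    }
}
```

In the question's code you might call something like `fade_out_linear(buff1, 88200, 441)` on each buffer before the `sf_write_double` calls; the 441 is only a starting guess to tune by ear.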

You'll need to play with a few different parameters to see what works. E.g., you might want to start with a linear ramp-down rather than an exponential or parabolic decay function for simplicity. In any case, you will have to decide how many samples (or how many milliseconds) from the transition point to start scaling down the sample values.
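
To make the "how many samples or milliseconds" decision concrete, here is a sketch that converts a fade time in milliseconds into a frame count and applies a linear fade-out per frame, so that all channels of an interleaved buffer get the same gain. The helper names are hypothetical. The sample rate and channel count would come from the `samplerate` and `channels` fields of `SF_INFO`; note that `sf_read_double` counts items (samples across all channels), so the question's 88200 is one second only if the files are 44.1 kHz stereo, which is an assumption here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: convert a fade duration in milliseconds
// into a whole number of frames at the given sample rate.
std::size_t fade_frames_from_ms(double fade_ms, int samplerate)
{
    assert(fade_ms >= 0.0 && samplerate > 0);
    return static_cast<std::size_t>(fade_ms * samplerate / 1000.0);
}

// Hypothetical helper: linear fade-out over the last `fade_frames`
// frames of an interleaved buffer holding `frames` frames of
// `channels` samples each. Every channel in a frame gets the same gain.
void fade_out_frames(double *buf, std::size_t frames, int channels,
                     std::size_t fade_frames)
{
    assert(channels > 0 && fade_frames <= frames);
    if (fade_frames == 0) return;
    const std::size_t ch = static_cast<std::size_t>(channels);
    const std::size_t start = frames - fade_frames;
    for (std::size_t f = 0; f < fade_frames; ++f) {
        // Gain falls from just under 1.0 to exactly 0.0 on the last frame.
        double gain = static_cast<double>(fade_frames - 1 - f) / fade_frames;
        for (std::size_t c = 0; c < ch; ++c)
            buf[(start + f) * ch + c] *= gain;
    }
}
```

A 10 ms fade at 44.1 kHz works out to 441 frames, which is a reasonable place to start experimenting before trying other lengths or curve shapes.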

EDIT:

I initially assumed that your mixing was fine, because you only asked about the transition artifacts, and my answer above addresses those. It is worth noting, however, that I have no clue why you're mixing into buff3 the way you are, given your stated aim. If I understand correctly that you simply want to combine the two sounds into buff3, just add the corresponding samples together and ensure the sum doesn't clip (i.e., exceed the range [-1.0, +1.0]). libsndfile can clamp out-of-range values when it converts floating-point data to an integer format (see the SFC_SET_CLIPPING command), but clamping only limits a value above 1.0 to 1.0; it does nothing to ensure both waveforms are mixed equally.

If the level on either of your two input sounds was high enough, the simple additive mix would clip, which would be another type of "artifact" (except likely to ruin the whole sound, so in this particular case, you probably would have noted as much). For general mixing, however, your loop would be:

for(int i=0; i<88200; i++)
{
    /* multiply sum of signals by factor slightly
       less than reciprocal of their count to guard
       also against floating-point error. */

    buff3[i] = (buff1[i] + buff2[i]) * 0.499;
}
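
If you would rather not attenuate quiet material by a fixed factor, one alternative is to sum first and scale down only when the sum actually exceeds full scale. This is a sketch of that different approach, not the loop above; `mix_without_clipping` is a made-up name. Be aware that it scales each buffer by its own peak, so a mixed segment may end up at a different relative level than its neighbours.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper: write a[i] + b[i] into out[i], then, only if the
// largest absolute value exceeds 1.0, divide the whole buffer by that
// peak so it fits in [-1.0, +1.0]. Returns the gain that was applied.
double mix_without_clipping(const double *a, const double *b,
                            double *out, std::size_t n)
{
    double peak = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
        double mag = std::fabs(out[i]);
        if (mag > peak) peak = mag;
    }
    if (peak <= 1.0) return 1.0;  // already in range, leave untouched
    for (std::size_t i = 0; i < n; ++i)
        out[i] /= peak;
    return 1.0 / peak;
}
```

With the question's buffers this would be called as `mix_without_clipping(buff1, buff2, buff3, 88200)` in place of the mixing loop.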

OTHER TIPS

It's not at all clear to me what you are trying to do. Your description says: "I want to create a wav file containing one sound, the other and a mix of both." If you want to combine one sound and another, why also mix in a combination of the two? That's like taking a mixture of milk and cream and adding in some half and half.

Maybe you were trying to make the output be three seconds long, with the first second containing the first sound, the second second containing the second sound and third second containing a mix of the two, but your code isn't anything like that, so I'm going to ignore that possibility.

Let's go over some basics.

To copy the first second of the first file, the body of your loop would look like this:

buff3[i] = buff1[i] ;

To copy the first second of the second file, the body of your loop would look like this:

buff3[i] = buff2[i] ;

To mix the two, you simply add them. Mixing is the same as combining. Sometimes we say we are taking a "superposition" of the two signals:

buff3[i] = buff1[i] + buff2[i] ;

You usually want to divide by two to prevent the signal from going "out of bounds":

buff3[i] = ( buff1[i] + buff2[i] ) / 2  ;

Note that nowhere have we multiplied the sample values of the signals by each other. Sample-by-sample multiplication like you were doing is reserved for very unusual circumstances, like AM synthesis.

Licensed under: CC-BY-SA with attribution