How to record from the microphone until there is no sound?

Question 1

Because sound is a wave, it oscillates between high and low pressures. This waveform is usually recorded as positive and negative numbers, with zero being the neutral pressure. If you take the absolute value of the signal and keep a running average, you get a simple loudness estimate that you can compare against a threshold. That should be sufficient for detecting silence.

The average should be taken over a long enough period that you account for the appropriate amount of silence. A very cheap way to keep an estimate of the running average is like this:

#include <cmath>                // std::abs

const double threshold = 50;    // Whatever threshold you need
const int max_samples = 10000;  // The representative running average size

double average = 0;             // The running average of |sample|
int sample_count = 0;           // Samples seen so far, capped at max_samples

// Keep going until the window is full AND the average has dropped to the threshold
while( sample_count < max_samples || average > threshold ) {
    // New sample arrives, stored in 'sample'

    // Adjust the running absolute average.
    // Convert to double first so an integer sample is not truncated by integer division.
    if( sample_count < max_samples ) sample_count++;
    average *= double(sample_count-1) / sample_count;
    average += std::abs(double(sample)) / sample_count;
}

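The larger max_samples is, the more slowly average responds to a change in the signal. Once sample_count reaches max_samples, each new sample contributes only 1/max_samples of its magnitude, so the estimate behaves like an exponentially weighted average. After the sound stops, it will slowly trail off. However, it will be slow to rise again too. This would be fine for reasonably continuous sound.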
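To make the update rule above easier to experiment with, here is a minimal sketch that wraps it in a small class. The name RunningAbsAverage and its member functions are made up for illustration and are not part of any audio API. It uses the algebraically equivalent form average += (|sample| - average) / count, and it assumes samples are passed in as double.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper (illustration only): running average of |sample|
// over roughly `window` samples, using the same update rule as above.
class RunningAbsAverage {
public:
    // A window of 0 would divide by zero, so it is treated as 1.
    explicit RunningAbsAverage(std::size_t window)
        : window_(window > 0 ? window : 1) {}

    // Feed one sample and return the updated average.
    double update(double sample) {
        if (count_ < window_) ++count_;
        // Same as: average = average * (count - 1) / count + |sample| / count
        average_ += (std::abs(sample) - average_) / static_cast<double>(count_);
        return average_;
    }

    double value() const { return average_; }

    // True once a full window of samples has been seen.
    bool warmed_up() const { return count_ >= window_; }

private:
    std::size_t window_;
    std::size_t count_ = 0;
    double average_ = 0.0;
};
```

With this, the loop condition from the snippet above becomes something like while( !avg.warmed_up() || avg.value() > threshold ).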

With something like speech, which can have short or long pauses, you may want to use an impulse-based approach. You can just define the number of samples of 'silence' that you expect, and reset it whenever you receive an impulse that exceeds the threshold. Using the running average above with a much shorter window size will give you a simple way of detecting an impulse. Then you just need to count...

const int max_samples = 100;             // Smaller window size for impulse
const int max_silence_samples = 10000;   // Maximum samples below threshold
int silence = 0;                         // Number of samples below threshold

while( silence < max_silence_samples ) {
    // Compute running average as before

    //...

    // Check for silence.  If there's a signal, reset the counter.
    if( average > threshold ) silence = 0;
    else ++silence;
}

Adjusting threshold and max_samples will control the sensitivity to pops and clicks, while max_silence_samples gives you control over how much silence is allowed before you stop recording.
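As a rough, self-contained sketch of the counting approach, the function below runs both steps over samples that are already in memory and reports where recording would stop. The name find_stop_point and its parameters are hypothetical, and it assumes 16-bit signed samples. A real recorder would process samples as they arrive instead of from a vector.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (illustration only): returns the number of samples
// to keep before stopping, or samples.size() if the silence limit is
// never reached.
std::size_t find_stop_point(const std::vector<std::int16_t>& samples,
                            double threshold,
                            std::size_t window,
                            std::size_t max_silence_samples) {
    if (window == 0) window = 1;   // avoid dividing by zero

    double average = 0.0;          // short-window running average of |sample|
    std::size_t count = 0;         // samples seen, capped at `window`
    std::size_t silence = 0;       // consecutive samples below threshold

    for (std::size_t i = 0; i < samples.size(); ++i) {
        if (count < window) ++count;
        const double magnitude = std::abs(static_cast<double>(samples[i]));
        average += (magnitude - average) / static_cast<double>(count);

        // Any impulse above the threshold resets the silence counter.
        if (average > threshold) silence = 0;
        else ++silence;

        if (silence >= max_silence_samples) return i + 1;
    }
    return samples.size();
}
```

Note that, like the loop above, this also stops if the input starts with a long enough stretch of silence, so you may want to wait for the first impulse before you start counting.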

There are undoubtedly more technical ways to achieve your goals, but it's always good to try the simple one first. See how you go with this.

Question 2

I suggest doing it with DirectShow. Create instances of a microphone capture filter, a SampleGrabber, an audio encoder, and a file writer. Your graph should look like this:

Microphone -> SampleGrabber -> Audio Encoder -> File Writer

Every sample passes through the SampleGrabber, so you can read the raw samples and check whether you should continue recording or not. This is the best way to both record the audio and check its contents.
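The check itself does not depend on DirectShow. As a hedged sketch, the class below looks at one raw buffer at a time, the way a grabber callback (for example ISampleGrabberCB::BufferCB, which receives a byte pointer and a length) would hand it to you. The name SilenceGate is hypothetical, and the code assumes 16-bit little-endian mono PCM. The real format depends on the media type your graph negotiates, so verify it before reusing this.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (illustration only): keeps silence-detection state
// between buffers. Assumes 16-bit little-endian mono PCM.
class SilenceGate {
public:
    SilenceGate(double threshold, std::size_t window, std::size_t max_silence_samples)
        : threshold_(threshold),
          window_(window > 0 ? window : 1),
          max_silence_samples_(max_silence_samples) {}

    // Process one raw buffer. Returns true while recording should continue.
    // A trailing odd byte, if any, is ignored.
    bool feed(const unsigned char* buffer, std::size_t length) {
        for (std::size_t i = 0; i + 1 < length; i += 2) {
            // Reassemble one little-endian 16-bit sample.
            const std::uint16_t raw =
                static_cast<std::uint16_t>(buffer[i] | (buffer[i + 1] << 8));
            const std::int16_t sample = static_cast<std::int16_t>(raw);

            if (count_ < window_) ++count_;
            const double magnitude = std::abs(static_cast<double>(sample));
            average_ += (magnitude - average_) / static_cast<double>(count_);

            // A signal above the threshold resets the silence counter.
            if (average_ > threshold_) silence_ = 0;
            else ++silence_;
        }
        return silence_ < max_silence_samples_;
    }

private:
    double threshold_;
    std::size_t window_;
    std::size_t max_silence_samples_;
    double average_ = 0.0;
    std::size_t count_ = 0;
    std::size_t silence_ = 0;
};
```

When feed returns false, the callback would signal the application to stop the graph. How you do that is up to your application and is not shown here.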