Log4cxx is slow when used without critical section

Question

EDIT:

OK, I had mistaken the timings by a factor 1000 so I was thinking the logger was not outputting anything.

What you are witnessing is the cost of the condition variable hidden beneath the log dispatcher thread.

You can read this excellent analysis if you want to go into more details.

Basically, when you synchronize the printer task before calling the logger, all the printers will queue on your own lock and the logger will be called more efficiently.

If your printers are accessing the logger asynchronously, there will be more occurences of the logger finding there is no work to be done after finishing the last output, so the logger thread will go to sleep on the condition variable, which will consume more CPU when the next printer has to wake the logger.

See this sample program that emulates a barebone waiting queue implemented with condition variables.
Here the logger thread queue has a length of 1, but when you stuff thousands of messages into the system, the queue of the real logger will be filled to the brim in no time, so most of your test program will run with a saturated queue, thus reproducing the conditions in this small example.

#include <mutex>
#include <condition_variable>
#include <thread>
#include <ctime>
#include <vector>
#include <iostream>

#define thread_count 100
#define log_count 600000/thread_count

const char * pending;
bool kill = false;
std::mutex              lock;
std::condition_variable work;
int logger_waits;
void logger (void)
{ 
    // process requests
    for (;;)
    {
        std::unique_lock<std::mutex> waiter(lock);
        while (pending == nullptr && !kill)
        {
            logger_waits++;
            work.wait(waiter);
        }
        if (kill) break;
        std::cout << pending << std::endl;
        pending = nullptr;       
    }
}

void console_log (const char * format)
{
    std::unique_lock<std::mutex> waiter(lock);
    pending = format;
    work.notify_one();
}

void printer1(void)
{
    for (int i = 0; i < log_count; i++)
    {
        console_log("Logging check");
    }
}

std::mutex m_guard;
void printer2(void)
{
    for (int i = 0; i < log_count; i++)
    {
        std::unique_lock<std::mutex> waiter(m_guard);
        console_log("Logging check");
    }
}

int bench (void (*printer)(void))
{
    std::vector<std::thread> printers;
    printers.resize(thread_count);
    logger_waits = 0;

    clock_t start =clock();

    for (int i = 0; i != thread_count; ++i)
        printers[i] = std::thread(printer);

    for (int i = 0; i != thread_count; ++i)
        printers[i].join();

    return (clock()-start)/(CLOCKS_PER_SEC/1000);

}

int main(int argc, char **argv)
{  
    std::thread logthread (logger);

    int time2 = bench (printer2);
    int lw2 = logger_waits;
    int time1 = bench (printer1);
    int lw1 = logger_waits;
    fprintf (stderr, "\n   lock : %d ms %d waits\n", time1, lw1);
    fprintf (stderr,   "no lock : %d ms %d waits\n", time2, lw2);

    kill = true;
    work.notify_one();
    logthread.join();

    return 0;
}

typical output:

   lock : 3940 ms 183 waits
no lock : 4820 ms 16093 waits

All this being said, this test case does not reflect the proper use of a log system. It is meant to output a few messages at a time, so that the main application is never slowed down by log outputs.