Question

I am doing performance testing on log4cxx. When I call LOG4CXX_XXXXX directly, the following code takes around 10 seconds to finish logging everything. But when I wrap the call in a boost::lock_guard (which I believe acts as a critical section), it takes around 5 seconds. Can somebody explain why this is the case, and not the other way around?

#include "log4cxx/logger.h"
#include "log4cxx/xml/domconfigurator.h"
#include <boost/thread/thread.hpp>
#include <boost/timer/timer.hpp>
#include <boost/thread/recursive_mutex.hpp>

//#define ENHANCEDLOGGING

using namespace log4cxx;
using namespace log4cxx::helpers;

#define LOG_TMP_BUF_SIZE        3000
LoggerPtr logger(Logger::getRootLogger());

#ifdef ENHANCEDLOGGING
void LOG_DEBUG(const char *format)
{
    LOG4CXX_DEBUG(logger, format);
}
#else
boost::recursive_mutex m_guard;
#define LOG_COMM 0

void LOG_DEBUG(const char *format)
{
    boost::lock_guard<boost::recursive_mutex> lock(m_guard);
    LOG4CXX_DEBUG(logger, format);
}
#endif

const int thread_count = 100;

void printer(void)
{
    for (int i = 0; i < 4000; i++)
        LOG_DEBUG("Logging performance check");
}

int main(int argc, char **argv)
{
    boost::timer::auto_cpu_timer t;
    xml::DOMConfigurator::configure("config.xml");

    boost::thread_group threads;
    for (int i = 0; i != thread_count; ++i)
        threads.create_thread(printer);

    threads.join_all();

    std::cout << t.elapsed().user << std::endl;
    std::cout << t.elapsed().wall << std::endl;
    std::cout << t.elapsed().system << std::endl;
    return 0;
}

I am using the following configuration. Running with or without org.apache.log4j.AsyncAppender doesn't make a difference:

<appender name="appxNormalAppender" class="org.apache.log4j.FileAppender">
    <param name="file" value="appxLogFile.log" />
    <param name="append" value="true" />
    <layout class="org.apache.log4j.PatternLayout">
      <param name="ConversionPattern" value="%d %-5p %C{2} (%F:%L) %t - %m%n" />
    </layout>
</appender>
<appender name="async" class="org.apache.log4j.AsyncAppender">
    <param name="BufferSize" value="1000"/>
    <appender-ref ref="appxNormalAppender"/>
</appender>
<root>
    <priority value="all" />
        <appender-ref ref="appxNormalAppender"/>
</root>
</log4j:configuration>

Solution

EDIT:

OK, I had misread the timings by a factor of 1000, which is why I thought the logger was not outputting anything.

What you are witnessing is the cost of the condition variable hidden beneath the log dispatcher thread.

You can read this excellent analysis if you want to go into more detail.

Basically, when you synchronize the printer tasks before calling the logger, all the printers queue on your own lock, and the logger is called more efficiently.

If your printers access the logger asynchronously, there are more occurrences of the logger finding no work left after finishing the last output, so the logger thread goes to sleep on the condition variable, and more CPU is consumed when the next printer has to wake it up again.

See this sample program, which emulates a bare-bones waiting queue implemented with condition variables.
Here the logger thread's queue has a length of 1, but when you stuff thousands of messages into the system the queue of the real logger will be filled to the brim in no time, so most of your test program runs with a saturated queue, reproducing the conditions of this small example.

#include <mutex>
#include <condition_variable>
#include <thread>
#include <ctime>
#include <vector>
#include <iostream>

#define thread_count 100
#define log_count 600000/thread_count

const char * pending;            // single-slot "queue": the message waiting to be printed
bool kill = false;               // tells the logger thread to shut down
std::mutex              lock;    // protects pending and kill
std::condition_variable work;    // signals the logger that a message was posted
int logger_waits;                // counts how often the logger went to sleep waiting for work
void logger (void)
{ 
    // process requests
    for (;;)
    {
        std::unique_lock<std::mutex> waiter(lock);
        while (pending == nullptr && !kill)
        {
            logger_waits++;
            work.wait(waiter);
        }
        if (kill) break;
        std::cout << pending << std::endl;
        pending = nullptr;       
    }
}

void console_log (const char * format)
{
    // post the message and wake the logger; if the previous message has not been
    // consumed yet it is simply overwritten, which is fine for this benchmark
    std::unique_lock<std::mutex> waiter(lock);
    pending = format;
    work.notify_one();
}

// printer1: calls the logger with no external synchronization
void printer1(void)
{
    for (int i = 0; i < log_count; i++)
    {
        console_log("Logging check");
    }
}

// printer2: serializes the calls through an extra mutex, like the LOG_DEBUG wrapper in the question
std::mutex m_guard;
void printer2(void)
{
    for (int i = 0; i < log_count; i++)
    {
        std::unique_lock<std::mutex> waiter(m_guard);
        console_log("Logging check");
    }
}

int bench (void (*printer)(void))
{
    std::vector<std::thread> printers;
    printers.resize(thread_count);
    logger_waits = 0;

    clock_t start = clock();

    for (int i = 0; i != thread_count; ++i)
        printers[i] = std::thread(printer);

    for (int i = 0; i != thread_count; ++i)
        printers[i].join();

    return (clock()-start)/(CLOCKS_PER_SEC/1000);

}

int main(int argc, char **argv)
{  
    std::thread logthread (logger);

    int time2 = bench (printer2);
    int lw2 = logger_waits;
    int time1 = bench (printer1);
    int lw1 = logger_waits;
    fprintf (stderr, "\n   lock : %d ms %d waits\n", time1, lw1);
    fprintf (stderr,   "no lock : %d ms %d waits\n", time2, lw2);

    // shut the logger down: set the flag under the lock so the wakeup cannot be missed
    {
        std::unique_lock<std::mutex> waiter(lock);
        kill = true;
    }
    work.notify_one();
    logthread.join();

    return 0;
}

typical output:

   lock : 3940 ms 183 waits
no lock : 4820 ms 16093 waits

Note how the unsynchronized version ends up with roughly ninety times as many condition-variable waits; that is where the extra time goes. All this being said, this test case does not reflect the proper use of a logging system: it is meant to output a few messages at a time, so that the main application is never slowed down by log output.
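To illustrate that last point, here is a minimal sketch (not part of the original question; the worker and do_some_work names are made up for the example) of a more typical pattern: each thread spends most of its time doing real work and only emits an occasional message, so the logger's queue never comes close to saturating.

#include "log4cxx/logger.h"
#include <cmath>

// Hypothetical worker: log sparingly from a busy loop instead of on every iteration.
log4cxx::LoggerPtr workerLogger(log4cxx::Logger::getRootLogger());

double do_some_work(int i)          // stand-in for the application's real workload
{
    return std::sqrt(static_cast<double>(i));
}

void worker(int id)
{
    double acc = 0.0;
    for (int i = 0; i < 1000000; ++i)
    {
        acc += do_some_work(i);
        if (i % 100000 == 0)        // progress/milestone messages only
            LOG4CXX_DEBUG(workerLogger, "worker " << id << " at iteration " << i << ", acc=" << acc);
    }
}

With this kind of usage the dispatcher thread mostly sleeps, and the occasional condition-variable wakeup is negligible compared to the work being done.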

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow