Maneira rápida e suja de criar o perfil do seu código

https://stackoverflow.com/questions/61278

09-06-2019
|

Pergunta

Qual método você usa quando deseja obter dados de desempenho sobre caminhos de código específicos?

Solução

Este método tem várias limitações, mas ainda o considero muito útil.Vou listar as limitações (que conheço) antecipadamente e deixar que quem quiser usá-lo o faça por sua própria conta e risco.

A versão original que publiquei relatou excessivamente o tempo gasto em chamadas recursivas (conforme apontado nos comentários da resposta).
Não é thread-safe, não era thread-safe antes de adicionar o código para ignorar a recursão e é ainda menos thread-safe agora.
Embora seja muito eficiente se for chamado muitas vezes (milhões), terá um efeito mensurável no resultado, de modo que os escopos que você mede levarão mais tempo do que aqueles que você não mede.

Eu uso essa classe quando o problema em questão não justifica a criação de perfil de todo o meu código ou quando obtenho alguns dados de um criador de perfil que desejo verificar.Basicamente, ele resume o tempo gasto em um bloco específico e, no final do programa, envia-o para o fluxo de depuração (visível com DbgView), incluindo quantas vezes o código foi executado (e o tempo médio gasto, é claro)).

#pragma once
#include <tchar.h>
#include <windows.h>
#include <sstream>
#include <boost/noncopyable.hpp>

namespace scope_timer {
    class time_collector : boost::noncopyable {
        __int64 total;
        LARGE_INTEGER start;
        size_t times;
        const TCHAR* name;

        double cpu_frequency()
        { // cache the CPU frequency, which doesn't change.
            static double ret = 0; // store as double so devision later on is floating point and not truncating
            if (ret == 0) {
                LARGE_INTEGER freq;
                QueryPerformanceFrequency(&freq);
                ret = static_cast<double>(freq.QuadPart);
            }
            return ret;
        }
        bool in_use;

    public:
        time_collector(const TCHAR* n)
            : times(0)
            , name(n)
            , total(0)
            , start(LARGE_INTEGER())
            , in_use(false)
        {
        }

        ~time_collector()
        {
            std::basic_ostringstream<TCHAR> msg;
            msg << _T("scope_timer> ") <<  name << _T(" called: ");

            double seconds = total / cpu_frequency();
            double average = seconds / times;

            msg << times << _T(" times total time: ") << seconds << _T(" seconds  ")
                << _T(" (avg ") << average <<_T(")\n");
            OutputDebugString(msg.str().c_str());
        }

        void add_time(__int64 ticks)
        {
            total += ticks;
            ++times;
            in_use = false;
        }

        bool aquire()
        {
            if (in_use)
                return false;
            in_use = true;
            return true;
        }
    };

    class one_time : boost::noncopyable {
        LARGE_INTEGER start;
        time_collector* collector;
    public:
        one_time(time_collector& tc)
        {
            if (tc.aquire()) {
                collector = &tc;
                QueryPerformanceCounter(&start);
            }
            else
                collector = 0;
        }

        ~one_time()
        {
            if (collector) {
                LARGE_INTEGER end;
                QueryPerformanceCounter(&end);
                collector->add_time(end.QuadPart - start.QuadPart);
            }
        }
    };
}

// Usage TIME_THIS_SCOPE(XX); where XX is a C variable name (can begin with a number)
#define TIME_THIS_SCOPE(name) \
    static scope_timer::time_collector st_time_collector_##name(_T(#name)); \
    scope_timer::one_time st_one_time_##name(st_time_collector_##name)

Outras dicas

Eu faço meus perfis criando duas classes: cProfile e cProfileManager.

cProfileManager irá armazenar todos os dados que resultaram de cProfile.

cProfile com os seguintes requisitos:

cProfile tem um construtor que inicializa a hora atual.
cProfile tem um desconstrutor que envia o tempo total que a classe esteve viva para cProfileManager

Para usar essas classes de perfil, primeiro faço uma instância de cProfileManager.Em seguida, coloco o bloco de código que desejo criar o perfil entre chaves.Dentro das chaves, eu crio um cProfile instância.Quando o bloco de código terminar, cProfile enviará o tempo que o bloco de código levou para terminar para cProfileManager.

Código de exemploAqui está um exemplo do código (simplificado):

class cProfile
{
    cProfile()
    {
        TimeStart = GetTime();
    };

    ~cProfile()
    {
        ProfileManager->AddProfile (GetTime() - TimeStart);
    }

    float TimeStart;
}

Usar cProfile, eu faria algo assim:

int main()
{
    printf("Start test");
    {
        cProfile Profile;
        Calculate();
    }
    ProfileManager->OutputData();
}

ou isto:

void foobar()
{
    cProfile ProfileFoobar;

    foo();
    {
        cProfile ProfileBarCheck;
        while (bar())
        {
            cProfile ProfileSpam;
            spam();
        }
    }
}

Nota técnica

Este código é na verdade um abuso da maneira como o escopo, os construtores e os desconstrutores funcionam. C++. cProfile existe apenas dentro do escopo do bloco (o bloco de código que queremos testar).Assim que o programa sair do escopo do bloco, cProfile registra o resultado.

Melhorias Adicionais

Você pode adicionar um parâmetro de string ao construtor para fazer algo assim:cProfile Profile("Perfil para cálculos complicados");
Você pode usar uma macro para deixar o código mais limpo (tenha cuidado para não abusar disso.Ao contrário de nossos outros abusos na linguagem, as macros podem ser perigosas quando usadas).

Exemplo:

#define START_PROFILE cProfile Profile();{#Define end_profile}
cProfileManager pode verificar quantas vezes um bloco de código é chamado.Mas você precisaria de um identificador para o bloco de código.A primeira melhoria pode ajudar a identificar o bloco.Isso pode ser útil nos casos em que o código que você deseja criar o perfil está dentro de um loop (como o segundo exemplo acima).Você também pode adicionar o tempo de execução médio, mais rápido e mais longo que o bloco de código levou.
Não se esqueça de adicionar uma marca para pular a criação de perfil se você estiver no modo de depuração.

Observe que o texto a seguir foi escrito especificamente para Windows.

Também tenho uma classe de timer que escrevi para criar perfis rápidos e sujos que usa QueryPerformanceCounter() para obter tempos de alta precisão, mas com uma pequena diferença.Minha classe de timer não descarta o tempo decorrido quando o objeto Timer fica fora do escopo.Em vez disso, acumula os tempos decorridos em uma coleção.Adicionei uma função de membro estático, Dump(), que cria uma tabela de tempos decorridos, classificados por categoria de tempo (especificada no construtor do Timer como uma string) junto com algumas análises estatísticas, como tempo médio decorrido, desvio padrão, máximo e mínimo.Também adicionei uma função de membro estático Clear() que limpa a coleção e permite começar tudo de novo.

Como usar a classe Timer (psudocódigo):

int CInsertBuffer::Read(char* pBuf)
{
       // TIMER NOTES: Avg Execution Time = ~1 ms
       Timer timer("BufferRead");
       :      :
       return -1;
}

Exemplo de saída:

Timer Precision = 418.0095 ps

=== Item               Trials    Ttl Time  Avg Time  Mean Time StdDev    ===
    AddTrade           500       7 ms      14 us     12 us     24 us
    BufferRead         511       1:19.25   0.16 s    621 ns    2.48 s
    BufferWrite        516       511 us    991 ns    482 ns    11 us
    ImportPos Loop     1002      18.62 s   19 ms     77 us     0.51 s
    ImportPosition     2         18.75 s   9.38 s    16.17 s   13.59 s
    Insert             515       4.26 s    8 ms      5 ms      27 ms
    recv               101       18.54 s   0.18 s    2603 ns   1.63 s

arquivo Temporizador.inl:

#include <map>
#include "x:\utils\stlext\stringext.h"
#include <iterator>
#include <set>
#include <vector>
#include <numeric>
#include "x:\utils\stlext\algorithmext.h"
#include <math.h>

    class Timer
    {
    public:
        Timer(const char* name)
        {
            label = std::safe_string(name);
            QueryPerformanceCounter(&startTime);
        }

        virtual ~Timer()
        {
            QueryPerformanceCounter(&stopTime);
            __int64 clocks = stopTime.QuadPart-startTime.QuadPart;
            double elapsed = (double)clocks/(double)TimerFreq();
            TimeMap().insert(std::make_pair(label,elapsed));
        };

        static std::string Dump(bool ClipboardAlso=true)
        {
            static const std::string loc = "Timer::Dump";

            if( TimeMap().empty() )
            {
                return "No trials\r\n";
            }

            std::string ret = std::formatstr("\r\n\r\nTimer Precision = %s\r\n\r\n", format_elapsed(1.0/(double)TimerFreq()).c_str());

            // get a list of keys
            typedef std::set<std::string> keyset;
            keyset keys;
            std::transform(TimeMap().begin(), TimeMap().end(), std::inserter(keys, keys.begin()), extract_key());

            size_t maxrows = 0;

            typedef std::vector<std::string> strings;
            strings lines;

            static const size_t tabWidth = 9;

            std::string head = std::formatstr("=== %-*.*s %-*.*s %-*.*s %-*.*s %-*.*s %-*.*s ===", tabWidth*2, tabWidth*2, "Item", tabWidth, tabWidth, "Trials", tabWidth, tabWidth, "Ttl Time", tabWidth, tabWidth, "Avg Time", tabWidth, tabWidth, "Mean Time", tabWidth, tabWidth, "StdDev");
            ret += std::formatstr("\r\n%s\r\n", head.c_str());
            if( ClipboardAlso ) 
                lines.push_back("Item\tTrials\tTtl Time\tAvg Time\tMean Time\tStdDev\r\n");
            // dump the values for each key
            {for( keyset::iterator key = keys.begin(); keys.end() != key; ++key )
            {
                time_type ttl = 0;
                ttl = std::accumulate(TimeMap().begin(), TimeMap().end(), ttl, accum_key(*key));
                size_t num = std::count_if( TimeMap().begin(), TimeMap().end(), match_key(*key));
                if( num > maxrows ) 
                    maxrows = num;
                time_type avg = ttl / num;

                // compute mean
                std::vector<time_type> sortedTimes;
                std::transform_if(TimeMap().begin(), TimeMap().end(), std::inserter(sortedTimes, sortedTimes.begin()), extract_val(), match_key(*key));
                std::sort(sortedTimes.begin(), sortedTimes.end());
                size_t mid = (size_t)floor((double)num/2.0);
                double mean = ( num > 1 && (num % 2) != 0 ) ? (sortedTimes[mid]+sortedTimes[mid+1])/2.0 : sortedTimes[mid];
                // compute variance
                double sum = 0.0;
                if( num > 1 )
                {
                    for( std::vector<time_type>::iterator timeIt = sortedTimes.begin(); sortedTimes.end() != timeIt; ++timeIt )
                        sum += pow(*timeIt-mean,2.0);
                }
                // compute std dev
                double stddev = num > 1 ? sqrt(sum/((double)num-1.0)) : 0.0;

                ret += std::formatstr("    %-*.*s %-*.*s %-*.*s %-*.*s %-*.*s %-*.*s\r\n", tabWidth*2, tabWidth*2, key->c_str(), tabWidth, tabWidth, std::formatstr("%d",num).c_str(), tabWidth, tabWidth, format_elapsed(ttl).c_str(), tabWidth, tabWidth, format_elapsed(avg).c_str(), tabWidth, tabWidth, format_elapsed(mean).c_str(), tabWidth, tabWidth, format_elapsed(stddev).c_str()); 
                if( ClipboardAlso )
                    lines.push_back(std::formatstr("%s\t%s\t%s\t%s\t%s\t%s\r\n", key->c_str(), std::formatstr("%d",num).c_str(), format_elapsed(ttl).c_str(), format_elapsed(avg).c_str(), format_elapsed(mean).c_str(), format_elapsed(stddev).c_str())); 

            }
            }
            ret += std::formatstr("%s\r\n", std::string(head.length(),'=').c_str());

            if( ClipboardAlso )
            {
                // dump header row of data block
                lines.push_back("");
                {
                    std::string s;
                    for( keyset::iterator key = keys.begin(); key != keys.end(); ++key )
                    {
                        if( key != keys.begin() )
                            s.append("\t");
                        s.append(*key);
                    }
                    s.append("\r\n");
                    lines.push_back(s);
                }

                // blow out the flat map of time values to a seperate vector of times for each key
                typedef std::map<std::string, std::vector<time_type> > nodematrix;
                nodematrix nodes;
                for( Times::iterator time = TimeMap().begin(); time != TimeMap().end(); ++time )
                    nodes[time->first].push_back(time->second);

                // dump each data point
                for( size_t row = 0; row < maxrows; ++row )
                {
                    std::string rowDump;
                    for( keyset::iterator key = keys.begin(); key != keys.end(); ++key )
                    {
                        if( key != keys.begin() )
                            rowDump.append("\t");
                        if( nodes[*key].size() > row )
                            rowDump.append(std::formatstr("%f", nodes[*key][row]));
                    }
                    rowDump.append("\r\n");
                    lines.push_back(rowDump);
                }

                // dump to the clipboard
                std::string dump;
                for( strings::iterator s = lines.begin(); s != lines.end(); ++s )
                {
                    dump.append(*s);
                }

                OpenClipboard(0);
                EmptyClipboard();
                HGLOBAL hg = GlobalAlloc(GMEM_MOVEABLE, dump.length()+1);
                if( hg != 0 )
                {
                    char* buf = (char*)GlobalLock(hg);
                    if( buf != 0 )
                    {
                        std::copy(dump.begin(), dump.end(), buf);
                        buf[dump.length()] = 0;
                        GlobalUnlock(hg);
                        SetClipboardData(CF_TEXT, hg);
                    }
                }
                CloseClipboard();
            }

            return ret;
        }

        static void Reset()
        {
            TimeMap().clear();
        }

        static std::string format_elapsed(double d) 
        {
            if( d < 0.00000001 )
            {
                // show in ps with 4 digits
                return std::formatstr("%0.4f ps", d * 1000000000000.0);
            }
            if( d < 0.00001 )
            {
                // show in ns
                return std::formatstr("%0.0f ns", d * 1000000000.0);
            }
            if( d < 0.001 )
            {
                // show in us
                return std::formatstr("%0.0f us", d * 1000000.0);
            }
            if( d < 0.1 )
            {
                // show in ms
                return std::formatstr("%0.0f ms", d * 1000.0);
            }
            if( d <= 60.0 )
            {
                // show in seconds
                return std::formatstr("%0.2f s", d);
            }
            if( d < 3600.0 )
            {
                // show in min:sec
                return std::formatstr("%01.0f:%02.2f", floor(d/60.0), fmod(d,60.0));
            }
            // show in h:min:sec
            return std::formatstr("%01.0f:%02.0f:%02.2f", floor(d/3600.0), floor(fmod(d,3600.0)/60.0), fmod(d,60.0));
        }

    private:
        static __int64 TimerFreq()
        {
            static __int64 freq = 0;
            static bool init = false;
            if( !init )
            {
                LARGE_INTEGER li;
                QueryPerformanceFrequency(&li);
                freq = li.QuadPart;
                init = true;
            }
            return freq;
        }
        LARGE_INTEGER startTime, stopTime;
        std::string label;

        typedef std::string key_type;
        typedef double time_type;
        typedef std::multimap<key_type, time_type> Times;
//      static Times times;
        static Times& TimeMap()
        {
            static Times times_;
            return times_;
        }

        struct extract_key : public std::unary_function<Times::value_type, key_type>
        {
            std::string operator()(Times::value_type const & r) const
            {
                return r.first;
            }
        };

        struct extract_val : public std::unary_function<Times::value_type, time_type>
        {
            time_type operator()(Times::value_type const & r) const
            {
                return r.second;
            }
        };
        struct match_key : public std::unary_function<Times::value_type, bool>
        {
            match_key(key_type const & key_) : key(key_) {};
            bool operator()(Times::value_type const & rhs) const
            {
                return key == rhs.first;
            }
        private:
            match_key& operator=(match_key&) { return * this; }
            const key_type key;
        };

        struct accum_key : public std::binary_function<time_type, Times::value_type, time_type>
        {
            accum_key(key_type const & key_) : key(key_), n(0) {};
            time_type operator()(time_type const & v, Times::value_type const & rhs) const
            {
                if( key == rhs.first )
                {
                    ++n;
                    return rhs.second + v;
                }
                return v;
            }
        private:
            accum_key& operator=(accum_key&) { return * this; }
            const Times::key_type key;
            mutable size_t n;
        };
    };

arquivo stringext.h (fornece a função formatstr()):

namespace std
{
    /*  ---

    Formatted Print

        template<class C>
        int strprintf(basic_string<C>* pString, const C* pFmt, ...);

        template<class C>
        int vstrprintf(basic_string<C>* pString, const C* pFmt, va_list args);

    Returns :

        # characters printed to output


    Effects :

        Writes formatted data to a string.  strprintf() works exactly the same as sprintf(); see your
        documentation for sprintf() for details of peration.  vstrprintf() also works the same as sprintf(), 
        but instead of accepting a variable paramater list it accepts a va_list argument.

    Requires :

        pString is a pointer to a basic_string<>

    --- */

    template<class char_type> int vprintf_generic(char_type* buffer, size_t bufferSize, const char_type* format, va_list argptr);

    template<> inline int vprintf_generic<char>(char* buffer, size_t bufferSize, const char* format, va_list argptr)
    {
#       ifdef SECURE_VSPRINTF
        return _vsnprintf_s(buffer, bufferSize-1, _TRUNCATE, format, argptr);
#       else
        return _vsnprintf(buffer, bufferSize-1, format, argptr);
#       endif
    }

    template<> inline int vprintf_generic<wchar_t>(wchar_t* buffer, size_t bufferSize, const wchar_t* format, va_list argptr)
    {
#       ifdef SECURE_VSPRINTF
        return _vsnwprintf_s(buffer, bufferSize-1, _TRUNCATE, format, argptr);
#       else
        return _vsnwprintf(buffer, bufferSize-1, format, argptr);
#       endif
    }

    template<class Type, class Traits>
    inline int vstringprintf(basic_string<Type,Traits> & outStr, const Type* format, va_list args)
    {
        // prologue
        static const size_t ChunkSize = 1024;
        size_t curBufSize = 0;
        outStr.erase(); 

        if( !format )
        {
            return 0;
        }

        // keep trying to write the string to an ever-increasing buffer until
        // either we get the string written or we run out of memory
        while( bool cont = true )
        {
            // allocate a local buffer
            curBufSize += ChunkSize;
            std::ref_ptr<Type> localBuffer = new Type[curBufSize];
            if( localBuffer.get() == 0 )
            {
                // we ran out of memory -- nice goin'!
                return -1;
            }
            // format output to local buffer
            int i = vprintf_generic(localBuffer.get(), curBufSize * sizeof(Type), format, args);
            if( -1 == i )
            {
                // the buffer wasn't big enough -- try again
                continue;
            }
            else if( i < 0 )
            {
                // something wierd happened -- bail
                return i;
            }
            // if we get to this point the string was written completely -- stop looping
            outStr.assign(localBuffer.get(),i);
            return i;
        }
        // unreachable code
        return -1;
    };

    // provided for backward-compatibility
    template<class Type, class Traits>
    inline int vstrprintf(basic_string<Type,Traits> * outStr, const Type* format, va_list args)
    {
        return vstringprintf(*outStr, format, args);
    }

    template<class Char, class Traits>
    inline int stringprintf(std::basic_string<Char, Traits> & outString, const Char* format, ...)
    {
        va_list args;
        va_start(args, format);
        int retval = vstringprintf(outString, format, args);
        va_end(args);
        return retval;
    }

    // old function provided for backward-compatibility
    template<class Char, class Traits>
    inline int strprintf(std::basic_string<Char, Traits> * outString, const Char* format, ...)
    {
        va_list args;
        va_start(args, format);
        int retval = vstringprintf(*outString, format, args);
        va_end(args);
        return retval;
    }

    /*  ---

    Inline Formatted Print

        string strprintf(const char* Format, ...);

    Returns :

        Formatted string


    Effects :

        Writes formatted data to a string.  formatstr() works the same as sprintf(); see your
        documentation for sprintf() for details of operation.  

    --- */

    template<class Char>
    inline std::basic_string<Char> formatstr(const Char * format, ...)
    {
        std::string outString;

        va_list args;
        va_start(args, format);
        vstringprintf(outString, format, args);
        va_end(args);
        return outString;
    }
};

Arquivo algoritmoext.h (fornece a função transform_if()):

/*  ---

Transform
25.2.3

    template<class InputIterator, class OutputIterator, class UnaryOperation, class Predicate>
        OutputIterator transform_if(InputIterator first, InputIterator last, OutputIterator result, UnaryOperation op, Predicate pred)

    template<class InputIterator1, class InputIterator2, class OutputIterator, class BinaryOperation, class Predicate>
        OutputIterator transform_if(InputIterator first, InputIterator last, OutputIterator result, BinaryOperation binary_op, Predicate pred)

Requires:   

    T is of type EqualityComparable (20.1.1) 
    op and binary_op have no side effects

Effects :

    Assigns through every iterator i in the range [result, result + (last1-first1)) a new corresponding value equal to one of:
        1:  op( *(first1 + (i - result)) 
        2:  binary_op( *(first1 + (i - result), *(first2 + (i - result))

Returns :

    result + (last1 - first1)

Complexity :

    At most last1 - first1 applications of op or binary_op

--- */

template<class InputIterator, class OutputIterator, class UnaryFunction, class Predicate>
OutputIterator transform_if(InputIterator first, 
                            InputIterator last, 
                            OutputIterator result, 
                            UnaryFunction f, 
                            Predicate pred)
{
    for (; first != last; ++first)
    {
        if( pred(*first) )
            *result++ = f(*first);
    }
    return result; 
}

template<class InputIterator1, class InputIterator2, class OutputIterator, class BinaryOperation, class Predicate>
OutputIterator transform_if(InputIterator1 first1, 
                            InputIterator1 last1, 
                            InputIterator2 first2, 
                            OutputIterator result, 
                            BinaryOperation binary_op, 
                            Predicate pred)
{
    for (; first1 != last1 ; ++first1, ++first2)
    {
        if( pred(*first1) )
            *result++ = binary_op(*first1,*first2);
    }
    return result;
}

Bem, eu tenho dois trechos de código.Em pseudo-código eles estão parecendo (é uma versão simplificada, estou usando ConsultaPerformanceFrequência na verdade):

Primeiro trecho:

Timer timer = new Timer
timer.Start

Segundo trecho:

timer.Stop
show elapsed time

Um pouco de kung fu com teclas de atalho e posso dizer quanto tempo esse trecho de código roubou da minha CPU.

O artigo Perfilador de código e otimizações tem muitas informações sobre criação de perfil de código C++ e também possui um link de download gratuito para um programa/classe que mostrará uma apresentação gráfica para diferentes caminhos/métodos de código.

Eu tenho uma classe de criação de perfil rápida e suja que pode ser usada na criação de perfil até mesmo nos loops internos mais estreitos.A ênfase está no peso extremamente leve e no código simples.A classe aloca um array bidimensional de tamanho fixo.Em seguida, adiciono chamadas de "ponto de verificação" em todos os lugares.Quando o ponto de verificação N é alcançado imediatamente após o ponto de verificação M, adiciono o tempo decorrido (em microssegundos) ao item da matriz [M,N].Como isso foi projetado para criar perfis de loops estreitos, também tenho uma chamada de "início da iteração" que redefine a variável "último ponto de verificação".Ao final do teste, o dumpResults() call produz a lista de todos os pares de pontos de verificação que se seguiram, juntamente com o tempo total contabilizado e não contabilizado.

Eu escrevi uma classe simples de plataforma cruzada chamada nanotímero por esta razão.O objetivo era ser o mais leve possível para não interferir no desempenho real do código, adicionando muitas instruções e, assim, influenciando o cache de instruções.É capaz de obter precisão de microssegundos em Windows, Mac e Linux (e provavelmente em algumas variantes Unix).

Uso básico:

plf::timer t;
timer.start();

// stuff

double elapsed = t.get_elapsed_ns(); // Get nanoseconds

start() também reinicia o cronômetro quando necessário.A "pausa" do cronômetro pode ser obtida armazenando o tempo decorrido e, em seguida, reiniciando o cronômetro ao "retomar" e adicionando ao resultado armazenado na próxima vez que você verificar o tempo decorrido.

Licenciado em: CC-BY-SA com atribuição

Não afiliado a StackOverflow