Boost::multi_array performance question

https://stackoverflow.com/questions/446866

22-07-2019

Problem

I am trying to compare the performance of boost::multi_array with natively dynamically allocated arrays, using the following test program:

#include <windows.h>
#define _SCL_SECURE_NO_WARNINGS
#define BOOST_DISABLE_ASSERTS 
#include <boost/multi_array.hpp>
#include <cstdio>

int main(int argc, char* argv[])
{
    const int X_SIZE = 200;
    const int Y_SIZE = 200;
    const int ITERATIONS = 500;
    unsigned int startTime = 0;
    unsigned int endTime = 0;

    // Create the boost array
    typedef boost::multi_array<double, 2> ImageArrayType;
    ImageArrayType boostMatrix(boost::extents[X_SIZE][Y_SIZE]);

    // Create the native array
    double *nativeMatrix = new double [X_SIZE * Y_SIZE];

    //------------------Measure boost----------------------------------------------
    startTime = ::GetTickCount();
    for (int i = 0; i < ITERATIONS; ++i)
    {
        for (int y = 0; y < Y_SIZE; ++y)
        {
            for (int x = 0; x < X_SIZE; ++x)
            {
                boostMatrix[x][y] = 2.345;
            }
        }
    }
    endTime = ::GetTickCount();
    printf("[Boost] Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);

    //------------------Measure native-----------------------------------------------
    startTime = ::GetTickCount();
    for (int i = 0; i < ITERATIONS; ++i)
    {
        for (int y = 0; y < Y_SIZE; ++y)
        {
            for (int x = 0; x < X_SIZE; ++x)
            {
                nativeMatrix[x + (y * X_SIZE)] = 2.345;
            }
        }
    }
    endTime = ::GetTickCount();
    printf("[Native]Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);

    return 0;
}

I get the following results:

[Boost] Elapsed time: 12.500 seconds
[Native]Elapsed time:  0.062 seconds

I can't believe multi_arrays are that much slower. Can anyone spot what I am doing wrong?

I assume caching is not an issue, since I am only writing to memory.

EDIT: This was a debug build. Following Laserallan's suggestion, I did a release build:

[Boost] Elapsed time:  0.266 seconds
[Native]Elapsed time:  0.016 seconds

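Much closer. But 16 to 1 still seems high to me.

Well, there is no definitive answer, but I'm going to move on and leave my real code with native arrays for now.

I'm accepting Laserallan's answer because it pointed out the biggest flaw in my test.

Thanks to all.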

Solution

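Are you building release or debug?

If you are running in debug mode, the boost array might be really slow because its template magic isn't inlined properly, which adds a lot of function-call overhead. I'm not sure how multi_array is implemented, though, so this might be totally off. :)

Perhaps there is also some difference in storage order, so you might have your image stored column by column while writing it row by row. That would give poor cache behavior and could slow things down.

Try switching the order of the X and Y loops and see if you gain anything. There is some information on storage ordering here: http://www.boost.org/doc/libs/1_37_0/libs/multi_array/doc/user.html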

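EDIT: Since you seem to be using the two-dimensional array for image processing, you might be interested in checking out Boost's image processing library, GIL.

It might have arrays with less overhead that work perfectly for your situation.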

Other answers

On my machine, using

g++ -O3 -march=native -mtune=native --fast-math -DNDEBUG test.cpp -o test && ./test

I get

[Boost] Elapsed time:  0.020 seconds
[Native]Elapsed time:  0.020 seconds

But after changing const int ITERATIONS to 5000, I get

[Boost] Elapsed time:  0.240 seconds
[Native]Elapsed time:  0.180 seconds

Then, with ITERATIONS back at 500 but X_SIZE and Y_SIZE set to 400, I get a much more significant difference:

[Boost] Elapsed time:  0.460 seconds
[Native]Elapsed time:  0.070 seconds

Finally, inverting the inner loop for the [Boost] case so that it looks like

    for (int x = 0; x < X_SIZE; ++x)
    {
        for (int y = 0; y < Y_SIZE; ++y)
        {

and keeping ITERATIONS, X_SIZE and Y_SIZE at 500, 400 and 400, I get

[Boost] Elapsed time:  0.060 seconds
[Native]Elapsed time:  0.080 seconds

If I also invert the inner loop for the [Native] case (so that it is in the wrong order for that case), I get, unsurprisingly,

[Boost] Elapsed time:  0.070 seconds
[Native]Elapsed time:  0.450 seconds

I am using gcc (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5 on Ubuntu 10.10.

So, in conclusion:

With proper optimization, boost::multi_array does its job as expected
The order in which you access your data does matter

Your test is flawed.

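In a DEBUG build, boost::MultiArray lacks the optimization pass it sorely needs (much more than a native array does).
In a RELEASE build, your compiler looks for code that can be removed outright, and most of your code falls into that category.

What you are likely seeing is the result of your optimizing compiler noticing that most or all of your "native array" loop can be removed. The same is theoretically true of your boost::MultiArray loops, but MultiArray is probably complex enough to defeat your optimizer.

Make this small change to your testbed and you'll see more true-to-life results: change both occurrences of "= 2.345" to "*= 2.345" and recompile with optimizations. This prevents your compiler from discovering that the outer loop of each test is redundant.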

I did that and got a speed comparison closer to 2:1.

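I'm wondering about two things:

1) Bounds checking: define the BOOST_DISABLE_ASSERTS preprocessor macro before including multi_array.hpp in your application. This turns off bounds checking. I'm not sure whether it is also disabled when NDEBUG is defined.

2) Base indices: MultiArray can index arrays from bases other than 0. That means multi_array stores a base number (for each dimension) and uses a more complicated formula to find the exact location in memory. I wonder whether that's all there is to it.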

Otherwise, I don't understand why multi_array should be slower than C arrays.

Try using Blitz++ instead. I tried Blitz, and its performance is on par with a C-style array!

Check out the code below, with Blitz added:

#include <windows.h>
#define _SCL_SECURE_NO_WARNINGS
#define BOOST_DISABLE_ASSERTS 
#include <boost/multi_array.hpp>
#include <blitz/array.h>
#include <cstdio>

int main(int argc, char* argv[])
{
    const int X_SIZE = 200;
    const int Y_SIZE = 200;
    const int ITERATIONS = 500;
    unsigned int startTime = 0;
    unsigned int endTime = 0;

    // Create the boost array
    typedef boost::multi_array<double, 2> ImageArrayType;
    ImageArrayType boostMatrix(boost::extents[X_SIZE][Y_SIZE]);


    //------------------Measure boost----------------------------------------------
    startTime = ::GetTickCount();
    for (int i = 0; i < ITERATIONS; ++i)
    {
        for (int y = 0; y < Y_SIZE; ++y)
        {
            for (int x = 0; x < X_SIZE; ++x)
            {
                boostMatrix[x][y] = 2.345;
            }
        }
    }
    endTime = ::GetTickCount();
    printf("[Boost] Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);

    //------------------Measure blitz-----------------------------------------------
    blitz::Array<double, 2> blitzArray( X_SIZE, Y_SIZE );
    startTime = ::GetTickCount();
    for (int i = 0; i < ITERATIONS; ++i)
    {
        for (int y = 0; y < Y_SIZE; ++y)
        {
            for (int x = 0; x < X_SIZE; ++x)
            {
                blitzArray(x,y) = 2.345;
            }
        }
    }
    endTime = ::GetTickCount();
    printf("[Blitz] Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);


    //------------------Measure native-----------------------------------------------
    // Create the native array
    double *nativeMatrix = new double [X_SIZE * Y_SIZE];

    startTime = ::GetTickCount();
    for (int i = 0; i < ITERATIONS; ++i)
    {
        for (int y = 0; y < Y_SIZE; ++y)
        {
            for (int x = 0; x < X_SIZE; ++x)
            {
                nativeMatrix[x + (y * X_SIZE)] = 2.345;
            }
        }
    }
    endTime = ::GetTickCount();
    printf("[Native]Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);



    return 0;
}

Here are the debug and release results.

DEBUG:

Boost  2.093 secs 
Blitz  0.375 secs 
Native 0.078 secs

RELEASE:

Boost  0.266 secs
Blitz  0.016 secs
Native 0.015 secs

I used the MSVC 2008 SP1 compiler for this.

Can we now say goodbye to C-style arrays? =p

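I was looking at this question because I had the same question, and I had some ideas for a more rigorous test.

As rodrigob pointed out, the loop order is flawed, so the results from the originally posted code are misleading.
Also, the arrays are rather small and sized with constants, so the compiler may be able to optimize the loops away, whereas in real code the compiler will not know the array sizes. To be safe, the array sizes and the number of iterations should be runtime inputs.

On a Mac, the following code is set up to give more meaningful answers. There are 4 tests here.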

#define BOOST_DISABLE_ASSERTS
#include "boost/multi_array.hpp"
#include <sys/time.h>
#include <stdint.h>
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <string>

uint64_t GetTimeMs64()
{
  struct timeval tv;

  gettimeofday( &tv, NULL );

  uint64_t ret = tv.tv_usec;
  /* Convert from micro seconds (10^-6) to milliseconds (10^-3) */
  ret /= 1000;

  /* Adds the seconds (10^0) after converting them to milliseconds (10^-3) */
  ret += ( tv.tv_sec * 1000 );

  return ret;

}


void function1( const int X_SIZE, const int Y_SIZE, const int ITERATIONS )
{

  double nativeMatrix1add[X_SIZE*Y_SIZE];

  for( int x = 0 ; x < X_SIZE ; ++x )
  {
    for( int y = 0 ; y < Y_SIZE ; ++y )
    {
      nativeMatrix1add[y + ( x * Y_SIZE )] = rand();
    }
  }

  // Create the native array; the trailing () zero-initializes it, since the loop below reads it with +=
  double* __restrict const nativeMatrix1p = new double[X_SIZE * Y_SIZE]();
  uint64_t startTime = GetTimeMs64();
  for( int i = 0 ; i < ITERATIONS ; ++i )
  {
    for( int xy = 0 ; xy < X_SIZE*Y_SIZE ; ++xy )
    {
      nativeMatrix1p[xy] += nativeMatrix1add[xy];
    }
  }
  uint64_t endTime = GetTimeMs64();
  printf( "[Native Pointer]    Elapsed time: %6.3f seconds\n", ( endTime - startTime ) / 1000.0 );

}

void function2( const int X_SIZE, const int Y_SIZE, const int ITERATIONS )
{

  double nativeMatrix1add[X_SIZE*Y_SIZE];

  for( int x = 0 ; x < X_SIZE ; ++x )
  {
    for( int y = 0 ; y < Y_SIZE ; ++y )
    {
      nativeMatrix1add[y + ( x * Y_SIZE )] = rand();
    }
  }

  // Create the native array; the trailing () zero-initializes it, since the loop below reads it with +=
  double* __restrict const nativeMatrix1 = new double[X_SIZE * Y_SIZE]();
  uint64_t startTime = GetTimeMs64();
  for( int i = 0 ; i < ITERATIONS ; ++i )
  {
    for( int x = 0 ; x < X_SIZE ; ++x )
    {
      for( int y = 0 ; y < Y_SIZE ; ++y )
      {
        nativeMatrix1[y + ( x * Y_SIZE )] += nativeMatrix1add[y + ( x * Y_SIZE )];
      }
    }
  }
  uint64_t endTime = GetTimeMs64();
  printf( "[Native 1D Array]   Elapsed time: %6.3f seconds\n", ( endTime - startTime ) / 1000.0 );

}


void function3( const int X_SIZE, const int Y_SIZE, const int ITERATIONS )
{

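  // Variable-length arrays (a GCC/Clang extension), allocated on the stack
  double nativeMatrix2add[X_SIZE][Y_SIZE];
  double nativeMatrix2[X_SIZE][Y_SIZE];

  for( int x = 0 ; x < X_SIZE ; ++x )
  {
    for( int y = 0 ; y < Y_SIZE ; ++y )
    {
      nativeMatrix2add[x][y] = rand();
      nativeMatrix2[x][y] = 0.0; // the timed loop reads this with +=
    }
  }
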
  uint64_t startTime = GetTimeMs64();
  for( int i = 0 ; i < ITERATIONS ; ++i )
  {
    for( int x = 0 ; x < X_SIZE ; ++x )
    {
      for( int y = 0 ; y < Y_SIZE ; ++y )
      {
        nativeMatrix2[x][y] += nativeMatrix2add[x][y];
      }
    }
  }
  uint64_t endTime = GetTimeMs64();
  printf( "[Native 2D Array]   Elapsed time: %6.3f seconds\n", ( endTime - startTime ) / 1000.0 );

}



void function4( const int X_SIZE, const int Y_SIZE, const int ITERATIONS )
{

  boost::multi_array<double, 2> boostMatrix2add( boost::extents[X_SIZE][Y_SIZE] );

  for( int x = 0 ; x < X_SIZE ; ++x )
  {
    for( int y = 0 ; y < Y_SIZE ; ++y )
    {
      boostMatrix2add[x][y] = rand();
    }
  }

  // Create the boost array
  boost::multi_array<double, 2> boostMatrix( boost::extents[X_SIZE][Y_SIZE] );
  uint64_t startTime = GetTimeMs64();
  for( int i = 0 ; i < ITERATIONS ; ++i )
  {
    for( int x = 0 ; x < X_SIZE ; ++x )
    {
      for( int y = 0 ; y < Y_SIZE ; ++y )
      {
        boostMatrix[x][y] += boostMatrix2add[x][y];
      }
    }
  }
  uint64_t endTime = GetTimeMs64();
  printf( "[Boost Array]       Elapsed time: %6.3f seconds\n", ( endTime - startTime ) / 1000.0 );

}

int main( int argc, char* argv[] )
{

  if( argc < 4 )
  {
    printf( "usage: %s xsize ysize iterations\n", argv[0] );
    return 1;
  }

  srand( time( NULL ) );

  const int X_SIZE = std::stoi( argv[1] );
  const int Y_SIZE = std::stoi( argv[2] );
  const int ITERATIONS = std::stoi( argv[3] );

  function1( X_SIZE, Y_SIZE, ITERATIONS );
  function2( X_SIZE, Y_SIZE, ITERATIONS );
  function3( X_SIZE, Y_SIZE, ITERATIONS );
  function4( X_SIZE, Y_SIZE, ITERATIONS );

  return 0;
}

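One with a single-dimensional array, using [] with integer math and a double loop
One with the same single-dimensional array, using pointer incrementing
A multidimensional C array
A boost multi_array

So, running from the command line with

./test_array xsize ysize iterations

you can get a good idea of how these approaches will perform. Here is what I got with the following compiler flags: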

g++4.9.2 -O3 -march=native -funroll-loops -mno-avx --fast-math -DNDEBUG  -c -std=c++11


./test_array 51200 1 20000
[Native 1-Loop ]    Elapsed time:  0.537 seconds
[Native 1D Array]   Elapsed time:  2.045 seconds
[Native 2D Array]   Elapsed time:  2.749 seconds
[Boost Array]       Elapsed time:  1.167 seconds

./test_array 25600 2 20000
[Native 1-Loop ]    Elapsed time:  0.531 seconds
[Native 1D Array]   Elapsed time:  1.241 seconds
[Native 2D Array]   Elapsed time:  1.631 seconds
[Boost Array]       Elapsed time:  0.954 seconds

./test_array 12800 4 20000
[Native 1-Loop ]    Elapsed time:  0.536 seconds
[Native 1D Array]   Elapsed time:  1.214 seconds
[Native 2D Array]   Elapsed time:  1.223 seconds
[Boost Array]       Elapsed time:  0.798 seconds

./test_array 6400 8 20000
[Native 1-Loop ]    Elapsed time:  0.540 seconds
[Native 1D Array]   Elapsed time:  0.845 seconds
[Native 2D Array]   Elapsed time:  0.878 seconds
[Boost Array]       Elapsed time:  0.803 seconds

./test_array 3200 16 20000
[Native 1-Loop ]    Elapsed time:  0.537 seconds
[Native 1D Array]   Elapsed time:  0.661 seconds
[Native 2D Array]   Elapsed time:  0.673 seconds
[Boost Array]       Elapsed time:  0.708 seconds

./test_array 1600 32 20000
[Native 1-Loop ]    Elapsed time:  0.532 seconds
[Native 1D Array]   Elapsed time:  0.592 seconds
[Native 2D Array]   Elapsed time:  0.596 seconds
[Boost Array]       Elapsed time:  0.764 seconds

./test_array 800 64 20000
[Native 1-Loop ]    Elapsed time:  0.546 seconds
[Native 1D Array]   Elapsed time:  0.594 seconds
[Native 2D Array]   Elapsed time:  0.606 seconds
[Boost Array]       Elapsed time:  0.764 seconds

./test_array 400 128 20000
[Native 1-Loop ]    Elapsed time:  0.536 seconds
[Native 1D Array]   Elapsed time:  0.560 seconds
[Native 2D Array]   Elapsed time:  0.564 seconds
[Boost Array]       Elapsed time:  0.746 seconds

So I think it's safe to say that boost multi_array performs pretty well. Nothing beats a single-loop evaluation, but depending on the dimensions of the array, boost::multi_array may beat a standard C array with a double loop.

Another thing to try is using iterators instead of straight indices for the boost array.

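I would have expected multi_array to be just as efficient, but I'm getting similar results on a PPC Mac using gcc. I also tried multi_array_ref, so that both versions used the same storage, and it made no difference. This is good to know, since I use multi_array in some of my code and had just assumed it was comparable to hand-coding.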

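I think I know what the problem is... maybe.

For the boost implementation to support a syntax like matrix[x][y], matrix[x] has to return a reference to an object that acts like a 1D array (a column), at which point reference[y] gives you your element.

The problem here is that you are iterating in row-major order (which is typical in C/C++, since native arrays are row-major IIRC). In this case the compiler has to re-execute matrix[x] for each y. If you iterated in column-major order when using the boost matrix, you might see better performance.

Just a theory.

EDIT: I tested my theory on my Linux system (with some minor changes). Switching x and y did improve performance somewhat, but it was still slower than a native array. This might simply be a case of the compiler being unable to optimize away the temporary reference type.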

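Build in release mode, use objdump, and look at the assembly. The two versions may be doing completely different things, and you'll be able to see which optimizations the compiler is using.

A similar question was asked and answered here:

http://www.codeguru.com/forum/archive/index.php/t-300014.html

The short answer is that simple arrays are the easiest thing for the compiler to optimize, while the Boost version is not so easy. Hence, a particular compiler may not give the Boost version all the same optimization benefits.

Compilers also vary in how well they optimize and how conservative they are (e.g. with templated code or other complications).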

I tested on Mac OS X Snow Leopard using gcc 4.2.1:

Debug:
[Boost] Elapsed time:  2.268 seconds
[Native]Elapsed time:  0.076 seconds

Release:
[Boost] Elapsed time:  0.065 seconds
[Native]Elapsed time:  0.020 seconds

Here's the code (modified so that it compiles on Unix):

#define BOOST_DISABLE_ASSERTS
#include <boost/multi_array.hpp>
#include <ctime>
#include <cstdio>

int main(int argc, char* argv[])
{
    const int X_SIZE = 200;
    const int Y_SIZE = 200;
    const int ITERATIONS = 500;
    clock_t startTime = 0;
    clock_t endTime = 0;

    // Create the boost array
    typedef boost::multi_array<double, 2> ImageArrayType;
    ImageArrayType boostMatrix(boost::extents[X_SIZE][Y_SIZE]);

    // Create the native array
    double *nativeMatrix = new double [X_SIZE * Y_SIZE];

    //------------------Measure boost----------------------------------------------
    startTime = clock();
    for (int i = 0; i < ITERATIONS; ++i)
    {
        for (int y = 0; y < Y_SIZE; ++y)
        {
            for (int x = 0; x < X_SIZE; ++x)
            {
                boostMatrix[x][y] = 2.345;
            }
        }
    }
    endTime = clock();
    printf("[Boost] Elapsed time: %6.3f seconds\n", (endTime - startTime) / (double)CLOCKS_PER_SEC);

    //------------------Measure native-----------------------------------------------
    startTime = clock();
    for (int i = 0; i < ITERATIONS; ++i)
    {
        for (int y = 0; y < Y_SIZE; ++y)
        {
            for (int x = 0; x < X_SIZE; ++x)
            {
                nativeMatrix[x + (y * X_SIZE)] = 2.345;
            }
        }
    }
    endTime = clock();
    printf("[Native]Elapsed time: %6.3f seconds\n", (endTime - startTime) / (double)CLOCKS_PER_SEC);

    return 0;
}

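Looking at the assembly generated by g++ 4.8.2 with -O3 -DBOOST_DISABLE_ASSERTS, using both operator() and [][] to access elements, it's evident that the only extra operation compared with native arrays and manual index calculation is the addition of the base. I didn't measure the cost of this, though.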

In Visual Studio 2008 v9.0.21022, I modified the code above to use the container routines from Numerical Recipes in C and C++ (http://www.nrbook.com/nr3/), namely the licensed routines dmatrix and MatDoub, respectively.

dmatrix uses old-style malloc and is not recommended; MatDoub uses new.

Times in seconds, release build:

Boost: 0.437

Native: 0.032

Numerical Recipes C: 0.031

Numerical Recipes C++: 0.031

So, from the above, Blitz looks like the best free alternative.

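I compiled the code (with slight modifications) under VC++ 2010 with optimization turned on ("Maximize Speed", with inline function expansion set to "Any Suitable" and "Favor Fast Code"), and got times of 0.015/0.391. I generated an assembly listing and, although I'm a terrible assembly noob, there is one line inside the boost-measuring loop that doesn't look good to me: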

call    ??A?$multi_array_ref@N$01@boost@@QAE?AV?$sub_array@N$00@multi_array@detail@1@H@Z ; boost::multi_array_ref<double,2>::operator[]

One of the [] operators wasn't inlined! The called procedure makes another call, this time to multi_array::value_accessor_n<...>::access<...>():

call    ??$access@V?$sub_array@N$00@multi_array@detail@boost@@PAN@?$value_accessor_n@N$01@multi_array@detail@boost@@IBE?AV?$sub_array@N$00@123@U?$type@V?$sub_array@N$00@multi_array@detail@boost@@@3@HPANPBIPBH3@Z ; boost::detail::multi_array::value_accessor_n<double,2>::access<boost::detail::multi_array::sub_array<double,1>,double *>

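All in all, the two procedures are quite a lot of code just to access a single element of the array. My general impression is that the library is so complex and high-level that Visual Studio cannot optimize it as much as we would like (posters using gcc have apparently gotten better results).

IMHO, a good compiler really should have inlined and optimized both procedures: they are short and straightforward and contain no loops or the like. A lot of time may be wasted simply passing their arguments and results.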

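As rodrigob's answer shows, enabling proper optimization (GCC's default is -O0) is the key to getting good performance. In addition, I also tested Blaze DynamicMatrix, which gave an additional factor-of-2 performance improvement with exactly the same optimization flags. https://bitbucket.org/account/user/blaze-lib/projects/BLAZE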

License: CC BY-SA, with attribution

Not affiliated with StackOverflow