Question

I tried to cause a performance degradation by misaligning an array in C. My machine has a 64-byte cache line, so I used a step size of 64 bytes in the program, starting from the misaligned address. The results, however, stayed the same as with correctly aligned accesses. Using multiple arrays didn't change anything either.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 10000000
#define DATATYPE long
#define ALIGNMENT __alignof__(DATATYPE)
#define CACHE_SIZE 64
#define STEP_SIZE (CACHE_SIZE / sizeof(DATATYPE))
#define NR_ARRAYS 20
#define ALIGNMENT_OFFSET 1

DATATYPE arr[N];

DATATYPE sum(DATATYPE **ptr, int size) {
    DATATYPE sum = 0;
    int i, j;
    for (i = 0; i < size; i += STEP_SIZE) {
        for (j = 0; j < NR_ARRAYS; j++) {
            sum += ptr[j][i];   
        }
    }   
    return sum;
}

int main() {
    DATATYPE *arrs[20];
    int i;
    for (i = 0; i < NR_ARRAYS; i++) {
        arrs[i] = (DATATYPE*)((long) malloc(N * sizeof(DATATYPE)) + ALIGNMENT_OFFSET);
    }

    long result = 0;
    clock_t tic = clock();
    for (i = 0; i < 100; i++) {
        result += sum(arrs, N-1);
    }
    clock_t toc = clock();
    printf("result: %ld ", result);
    printf("elapsed: %f seconds\n", (double)(toc - tic) / CLOCKS_PER_SEC);
}

I have the following questions:

  1. Are the array accesses in the program really misaligned?
  2. Did I consider the main things that likely influence the performance of this program?
  3. Is it possible that the performance is the same for an ALIGNMENT_OFFSET of 0 and 1 because of some "magic" that my CPU performs?

Solution

According to perf (see https://perf.wiki.kernel.org/index.php/Main_Page), most of the time in your code is attributed to the loop instructions (comparison + jump). In the annotated output below, the hot cmp/jne pair jumps back to label 60, so it is the back edge of the inner loop:

for (j = 0; j < NR_ARRAYS; j++)


       │    DATATYPE sum(DATATYPE **ptr, int size) {
       │        DATATYPE sum = 0;
       │        int i, j;
       │        for (i = 0; i < size; i += STEP_SIZE) {
       │            for (j = 0; j < NR_ARRAYS; j++) {
       │                sum += ptr[j][i];
  2.83 │60:   mov    (%rdx),%rdi
  4.37 │      add    $0x8,%rdx
  5.50 │      add    (%rdi,%r8,1),%rcx
       │
       │    DATATYPE sum(DATATYPE **ptr, int size) {
       │        DATATYPE sum = 0;
       │        int i, j;
       │        for (i = 0; i < size; i += STEP_SIZE) {
       │            for (j = 0; j < NR_ARRAYS; j++) {
 86.29 │      cmp    %r12,%rdx
       │    ↑ jne    60
  0.10 │      add    $0x40,%r8

As a consequence, you don't see the influence of the bad alignment.

Other tips

Are the array accesses in the program really misaligned?

Yes, they are, if sizeof(DATATYPE) is greater than 1.
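
Misaligned does not have to mean that a load straddles two cache lines, and this may matter for the third question. The sketch below is only address arithmetic. The 64-byte line size comes from the question; the helper names and the example base addresses are hypothetical. It is also an assumption, not something measured here, that on many recent x86-64 CPUs a misaligned load that stays inside one cache line costs about the same as an aligned one.

```c
#include <stddef.h>
#include <stdint.h>

#define LINE_SIZE 64u /* cache-line size in bytes, as assumed in the question */

/* Nonzero if addr is not a multiple of the required alignment. */
static int is_misaligned(uintptr_t addr, size_t alignment) {
    return addr % alignment != 0;
}

/* Nonzero if the byte range [addr, addr + size) touches two cache lines. */
static int crosses_cache_line(uintptr_t addr, size_t size) {
    return addr / LINE_SIZE != (addr + size - 1) / LINE_SIZE;
}

/* Count how many of n accesses of `size` bytes, starting at base + offset
 * and advancing by `stride` bytes each time, touch two cache lines. */
static size_t count_line_crossings(uintptr_t base, size_t offset,
                                   size_t stride, size_t size, size_t n) {
    size_t crossings = 0;
    for (size_t k = 0; k < n; k++) {
        if (crosses_cache_line(base + offset + k * stride, size)) {
            crossings++;
        }
    }
    return crossings;
}
```

With a base address that is a multiple of 64, an offset of 1, a stride of 64 and 8-byte accesses, count_line_crossings reports no crossings at all: every load covers bytes 1 to 8 of its line. An offset of 60 would make every load cross a line boundary.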

Did I consider the main things that likely influence the performance of this program?

No, you didn't. For example, 100 iterations is nothing. Write a loop of 100 million iterations and you will get a more realistic result. Never mind, I misread the code. The benchmark so far looks "fine" (apart from the UB); however, there may still be other factors to take into consideration.

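Is it possible that the performance is the same for an ALIGNMENT_OFFSET of 0 and 1?

Anything is possible, since your program invokes undefined behavior: converting the malloc result plus one byte to a DATATYPE* produces a pointer that is not correctly aligned for DATATYPE, which the C standard leaves undefined.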
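
One way to keep the misaligned accesses while staying within defined behavior is to treat the buffer as bytes and copy each value out with memcpy. This is a minimal sketch, not the original poster's code: the helper names are hypothetical, and it is an assumption that an optimizing compiler on x86-64 turns the fixed-size memcpy into a single load.

```c
#include <stddef.h>
#include <string.h>

/* Read a long from an arbitrarily aligned address without undefined behavior. */
static long load_long_unaligned(const unsigned char *p) {
    long value;
    memcpy(&value, p, sizeof value);
    return value;
}

/* Write a long to an arbitrarily aligned address. */
static void store_long_unaligned(unsigned char *p, long value) {
    memcpy(p, &value, sizeof value);
}

/* Sum one long every stride_bytes bytes, starting at byte `offset` and
 * stopping before any read would run past the end of the buffer. */
static long sum_unaligned(const unsigned char *buf, size_t len,
                          size_t offset, size_t stride_bytes) {
    long total = 0;
    for (size_t pos = offset; pos + sizeof(long) <= len; pos += stride_bytes) {
        total += load_long_unaligned(buf + pos);
    }
    return total;
}
```

To mirror the benchmark in the question, buf would be the malloc result cast to unsigned char *, offset would be ALIGNMENT_OFFSET, and stride_bytes would be 64. Filling the buffer before timing it also avoids reading uninitialized memory.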

License: CC-BY-SA with attribution
Not affiliated with StackOverflow