AMD64 misalignment of array in C: why no performance degradation?

Question 1

According to perf (see https://perf.wiki.kernel.org/index.php/Main_Page), most of the time in your code is attributed to the loop-control instructions (comparison + jump) associated with:

for (j = 0; j < NR_ARRAYS; j++)


       │    DATATYPE sum(DATATYPE **ptr, int size) {
       │        DATATYPE sum = 0;
       │        int i, j;
       │        for (i = 0; i < size; i += STEP_SIZE) {
       │            for (j = 0; j < NR_ARRAYS; j++) {
       │                sum += ptr[j][i];
  2.83 │60:   mov    (%rdx),%rdi
  4.37 │      add    $0x8,%rdx
  5.50 │      add    (%rdi,%r8,1),%rcx
       │
       │    DATATYPE sum(DATATYPE **ptr, int size) {
       │        DATATYPE sum = 0;
       │        int i, j;
       │        for (i = 0; i < size; i += STEP_SIZE) {
       │            for (j = 0; j < NR_ARRAYS; j++) {
 86.29 │      cmp    %r12,%rdx
       │    ↑ jne    60
  0.10 │      add    $0x40,%r8

As a consequence, you don't see the effect of the misalignment.

Question 2

Are the array accesses in the program really misaligned?

Yes, they are, if sizeof(DATATYPE) is greater than 1.
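
One way to confirm this is to test an address against the alignment requirement of the type. The following is only a sketch: it assumes C11 (`_Alignof`), `is_misaligned` is a hypothetical helper name, and `DATATYPE` is defined as `int64_t` purely for illustration, since the question's definition is not repeated here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stand-in for the DATATYPE used in the question. */
#ifndef DATATYPE
#define DATATYPE int64_t
#endif

/* Hypothetical helper: returns 1 if p does not satisfy the alignment
   requirement of DATATYPE, 0 if it does. The pointer-to-integer
   conversion is implementation-defined, but on AMD64 it yields the
   plain address. */
static int is_misaligned(const void *p)
{
    assert(p != NULL);
    return ((uintptr_t)p % _Alignof(DATATYPE)) != 0;
}
```

Calling it on the buffer address after adding ALIGNMENT_OFFSET bytes (as a `char *`, before any cast to `DATATYPE *`) shows whether that offset actually breaks the alignment.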

Did I consider the main things that likely influence the performance in this program?

~~No, you didn't. For example, 100 iterations is nothing. Write a loop of 100 million iterations and you will get a more realistic result.~~ Never mind, I misread the code. The benchmark so far looks "fine" (apart from the UB); however, there may still be other factors to take into consideration.

Is it possible that the performance is the same for an ALIGNMENT_OFFSET of 0 and 1?

Anything is possible, since your program invokes undefined behavior.
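
Assuming the undefined behavior meant here is the access through misaligned `DATATYPE` pointers (in C, converting a pointer to a type for which it is not correctly aligned is already undefined), one well-defined way to express an unaligned load is to copy the bytes with `memcpy`. Compilers targeting AMD64 typically turn such a fixed-size copy into a single ordinary load, so this should not distort the measurement, though that is worth checking in the generated assembly. A minimal sketch, where `load_unaligned` and `sum_unaligned` are hypothetical names and `DATATYPE` is again assumed to be `int64_t`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: stand-in for the DATATYPE used in the question. */
#ifndef DATATYPE
#define DATATYPE int64_t
#endif

/* Hypothetical helper: reads one DATATYPE from an address that has
   no alignment requirement, without dereferencing a misaligned
   DATATYPE pointer. */
static DATATYPE load_unaligned(const unsigned char *p)
{
    DATATYPE v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Sums `count` DATATYPE values stored back to back starting at
   `base`, which may sit at any byte offset. */
static DATATYPE sum_unaligned(const unsigned char *base, size_t count)
{
    assert(base != NULL || count == 0);
    DATATYPE sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += load_unaligned(base + i * sizeof(DATATYPE));
    return sum;
}
```

With this form the byte offset can be varied freely, and any timing difference between ALIGNMENT_OFFSET 0 and 1 reflects the hardware rather than what the compiler happened to do with undefined code.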