numa, mbind, segfault

Question

For this to work, you need to deal with chunks of memory that are at least one page in size and page-aligned, which means 4 KB on most systems. In your case, I suspect the same page gets moved twice (possibly three times), because you call mbind() three times on ranges that share it.

The way NUMA memory is laid out is that CPU socket 0 has the physical range 0..X-1 MB, socket 1 has X..2X-1, socket 2 has 2X..3X-1, etc. Of course, if you put a 4GB stick of RAM next to socket 0 and a 16GB stick next to socket 1, then the distribution isn't even. But the principle still stands that a large chunk of memory is allocated to each socket, according to where the memory is actually located.
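As a rough illustration of that layout, here is a minimal sketch that looks up which node owns a physical address, assuming one contiguous range per node. The struct node_range type and node_of_phys() helper are hypothetical names made up for this example; real machines report their layout through firmware tables and the ranges may have holes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified layout: one contiguous physical range per node. */
struct node_range {
    uint64_t start; /* first byte owned by the node */
    uint64_t end;   /* one past the last byte owned by the node */
};

/* Return the index of the node whose range contains phys, or -1 if none. */
static int node_of_phys(const struct node_range *nodes, size_t n, uint64_t phys)
{
    for (size_t i = 0; i < n; i++) {
        if (phys >= nodes[i].start && phys < nodes[i].end)
            return (int)i;
    }
    return -1;
}
```

With the uneven 4GB + 16GB example above, node 0 would own [0, 4GB) and node 1 would own [4GB, 20GB), so an address at the 4GB mark already belongs to node 1.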

As a consequence of how the memory is laid out, the physical memory you are using has to be placed in your linear (virtual) address space by page-mapping, so a page is the smallest unit that can be tied to one socket.

So, for large "chunks" of memory, it is fine to move it around, but for small chunks, it won't work quite right - you certainly can't "split" a page into something that is affine to two different CPU sockets.
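To make the page-granularity point concrete, here is a minimal sketch that works out how many whole pages fit inside an arbitrary byte range; only those pages can be bound to a node without also affecting neighbouring data. The helper names (page_floor, page_ceil, whole_pages_in_range) are hypothetical, and the code assumes the page size is a power of two, which is what sysconf(_SC_PAGESIZE) returns on the systems I know of.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round addr down to the start of the page that contains it.
   Assumes page_size is a non-zero power of two. */
static uintptr_t page_floor(uintptr_t addr, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return addr & ~((uintptr_t)page_size - 1);
}

/* Round addr up to the next page boundary (unchanged if already aligned). */
static uintptr_t page_ceil(uintptr_t addr, size_t page_size)
{
    return page_floor(addr + page_size - 1, page_size);
}

/* Number of pages that lie entirely inside [start, start + len). */
static size_t whole_pages_in_range(uintptr_t start, size_t len, size_t page_size)
{
    uintptr_t first = page_ceil(start, page_size);
    uintptr_t last = page_floor(start + len, page_size);
    return last > first ? (size_t)(last - first) / page_size : 0;
}
```

A range that starts at byte 100 and is 4096 bytes long touches two pages but fully contains neither, so there is nothing in it that can be moved on its own.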

Edit:

To split an array, you first need to find the page-aligned size.

page_size = sysconf(_SC_PAGESIZE);

objs_per_page = page_size / sizeof(A[0]); 
// There should be a whole number of "objects" per page. This checks that
// no object straddles a page boundary.
ASSERT(page_size % sizeof(A[0]) == 0);

split_three = SIZE / 3; 

aligned_size = (split_three / objs_per_page) * objs_per_page;

remnant = SIZE - (aligned_size * 3);

piece = aligned_size;

mbind(&A[0],piece*sizeof(double),MPOL_BIND,&nodemask,64,MPOL_MF_MOVE);

mbind(&A[aligned_size],piece*sizeof(double),MPOL_BIND,&nodemask,64,MPOL_MF_MOVE);

// The third piece must also start on a page boundary; it takes the remnant at its end.
mbind(&A[aligned_size*2],(piece+remnant)*sizeof(double),MPOL_BIND,&nodemask,64,MPOL_MF_MOVE);

Obviously, you will now need to split the work across the three threads similarly, using the aligned size and remnant as needed.
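One way to keep the threads and the mbind() calls in agreement is to compute the three element ranges once and use them for both. The following is only a sketch under two assumptions: the array base is itself page-aligned, and the element size divides the page size. The struct piece_range type and split_three_page_aligned() helper are hypothetical names, and giving the remnant to the last piece is a design choice, not the only option.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: element range [begin, end) for one piece. */
struct piece_range {
    size_t begin;
    size_t end;
};

/* Split n_elems elements into three ranges that each start on a page
   boundary, assuming a page-aligned base and elem_size dividing page_size.
   The last range absorbs the remnant. */
static void split_three_page_aligned(size_t n_elems, size_t elem_size,
                                     size_t page_size,
                                     struct piece_range out[3])
{
    assert(elem_size != 0 && page_size % elem_size == 0);

    size_t objs_per_page = page_size / elem_size;
    /* Largest multiple of objs_per_page that fits in one third. */
    size_t aligned = (n_elems / 3 / objs_per_page) * objs_per_page;

    out[0].begin = 0;
    out[0].end = aligned;
    out[1].begin = aligned;
    out[1].end = 2 * aligned;
    out[2].begin = 2 * aligned;
    out[2].end = n_elems; /* includes the remnant */
}
```

Thread i would then loop over A[out[i].begin] up to A[out[i].end - 1], and the matching mbind() call would start at &A[out[i].begin] with a length of (out[i].end - out[i].begin) * sizeof(A[0]). A page-aligned base can be obtained from posix_memalign() or mmap(); plain malloc() does not guarantee it.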