understanding results of nvprof events "l2_subp0_write_sector_misses" and "l2_subp1_write_sector_misses"

https://stackoverflow.com/questions/21192342

29-09-2022

Question

I was able to understand the "l2_subp0_read_sector_misses" and "l2_subp1_read_sector_misses" events by going through this post. Now I have a similar question about the events "l2_subp0_write_sector_misses" and "l2_subp1_write_sector_misses".

Let's first take the same example as in the given link (vector add):

Kernel code:

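__global__ void AddVectors(const float* A, const float* B, float* C, int N)
{
    // Each block covers a contiguous chunk of blockDim.x * N elements.
    int blockStartIndex  = blockIdx.x * blockDim.x * N;
    int threadStartIndex = blockStartIndex + threadIdx.x;
    int threadEndIndex   = threadStartIndex + ( N * blockDim.x );
    int i;

    // Each thread handles N elements, striding by blockDim.x so that
    // neighboring threads touch neighboring addresses (coalesced access).
    for( i=threadStartIndex; i<threadEndIndex; i+=blockDim.x ){
        C[i] = A[i] + B[i];
    }
}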

Here, I also copy array C from the host to the device, so C should already be resident in the L2 cache (the total size of the three arrays is less than the size of the L2 cache). Still, the nvprof results report every write access to C as an L2 cache miss.

Is this the expected behavior? Are there situations in which L2 write hits can be expected, or are L2 write accesses always counted as misses?
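The host side is not shown above. A minimal sketch of the setup described here, under assumed launch parameters (the grid size, block size, N, and the CHECK macro are illustrative and not taken from the original code), could look like this. All three arrays, including C, are copied to the device before the launch, and the sizes are kept small so that the combined footprint is likely to fit in L2 on most GPUs.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Kernel from the question, unchanged.
__global__ void AddVectors(const float* A, const float* B, float* C, int N)
{
    int blockStartIndex  = blockIdx.x * blockDim.x * N;
    int threadStartIndex = blockStartIndex + threadIdx.x;
    int threadEndIndex   = threadStartIndex + ( N * blockDim.x );
    int i;

    for( i=threadStartIndex; i<threadEndIndex; i+=blockDim.x ){
        C[i] = A[i] + B[i];
    }
}

// Hypothetical helper (not in the original): abort on any CUDA runtime error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

int main()
{
    // Assumed sizes: 16 blocks * 128 threads * 8 elements per thread
    // = 16384 floats = 64 KiB per array, 192 KiB for all three arrays.
    const int blocks = 16, threads = 128, N = 8;
    const size_t count = (size_t)blocks * threads * N;
    const size_t bytes = count * sizeof(float);

    std::vector<float> hA(count, 1.0f), hB(count, 2.0f), hC(count, 0.0f);

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    CHECK(cudaMalloc(&dA, bytes));
    CHECK(cudaMalloc(&dB, bytes));
    CHECK(cudaMalloc(&dC, bytes));

    // Copy all three arrays to the device, including the output array C,
    // as described in the question.
    CHECK(cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(dC, hC.data(), bytes, cudaMemcpyHostToDevice));

    AddVectors<<<blocks, threads>>>(dA, dB, dC, N);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    CHECK(cudaMemcpy(hC.data(), dC, bytes, cudaMemcpyDeviceToHost));

    // Verify the result: every element should be 1.0f + 2.0f.
    for (size_t i = 0; i < count; ++i) {
        if (hC[i] != 3.0f) {
            fprintf(stderr, "Mismatch at index %zu\n", i);
            return EXIT_FAILURE;
        }
    }
    printf("All %zu elements verified\n", count);

    CHECK(cudaFree(dA));
    CHECK(cudaFree(dB));
    CHECK(cudaFree(dC));
    return EXIT_SUCCESS;
}
```

Assuming the file is saved as vecadd.cu and built with `nvcc -o vecadd vecadd.cu`, the two counters can be collected with `nvprof --events l2_subp0_write_sector_misses,l2_subp1_write_sector_misses ./vecadd`. These event names are only available on GPUs that expose them, and some devices have more than two L2 subpartitions.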

Thanks.

Solution

I found out that L2 is a write-through cache; therefore, all write accesses to L2 are reported as L2 misses.

Licensed under: CC-BY-SA with attribution
