Two almost identical kernels in OpenCL (some tabs instead of spaces) yet one will build and one won't

https://stackoverflow.com/questions/23232747

07-07-2023

Question

Ignore what the code does; that's a whole other issue. I've looked at the spec and I don't see anything indicating that whitespace can break a build. So what is going on? Is this an issue with AMD's drivers? Their CodeXL certainly is buggy, as the debugger never seems to work.

Anyway, the only difference is that in the first kernel I've used all tabs for indenting, whereas the second one, which does build, uses a mixture of tabs and three spaces for indenting.

Here's the build log:

line 44: warning: variable "idx" was declared but never referenced
int idx = row * rows + column ;
    ^

line 48: warning: this declaration has no storage class or type specifier
umn ;
^

line 48: error: global variable must be declared in addrSpace constant
umn ;
^

line 49: error: identifier "column" is undefined
outputImage[column] = sum.x ;
            ^

line 49: error: identifier "sum" is undefined
outputImage[column] = sum.x ;
                      ^

line 50: error: expected a declaration
}
^

4 errors detected in the compilation of OCL2836T5.cl.
Frontend phase failed compilation.

This one won't build:

__kernel
void convolution(
    __read_only  image2d_t  sourceImage,
    __global float *outputImage,
    int rows,
    int cols,
    sampler_t sampler)
{
    // Store each work-item’s unique row and column
    int column = get_global_id(0);
    int row    = get_global_id(1);
    int2 coords;  // Coordinates for accessing the image
    coords.x = column ;
    coords.y = row ;

    // All accesses to images return data as four-element vector 
    // (i.e., float4), although only the 'x' component will contain 
    // meaningful data in this code
    float4 sum = {0.0f, 0.0f, 0.0f, 0.0f};

    uint4 pixel;
    // Read a pixel from the image.  A single channel image 
    // stores the pixel in the 'x' coordinate of the returned
    // vector.
    pixel = read_imageui(sourceImage, sampler, coords);
    read_mem_fence(CLK_GLOBAL_MEM_FENCE) ;
    sum.x += pixel.x;
    write_mem_fence(CLK_GLOBAL_MEM_FENCE) ;
    write_mem_fence(CLK_LOCAL_MEM_FENCE) ;
    sum.x += pixel.y;
    write_mem_fence(CLK_GLOBAL_MEM_FENCE) ;
    write_mem_fence(CLK_LOCAL_MEM_FENCE) ;
    sum.x += pixel.z;
    write_mem_fence(CLK_GLOBAL_MEM_FENCE) ;
    write_mem_fence(CLK_LOCAL_MEM_FENCE) ;


    // Copy the data to the output image if the
    // work-item is in bounds
    if(row < rows && column < cols) {
        //coords.x = row * rows + column ;
        //coords.y = 0;
        //write_imagef(outputImage, coords, sum);
        int idx = row * rows + column ;
        outputImage[column] = sum.x ;
    }
}

This one will:

__kernel
void convolution(
   __read_only  image2d_t  sourceImage,
   __global float *outputImage, 
   int rows,
   int cols,
   sampler_t sampler) 
{
   // Store each work-item’s unique row and column
   int column = get_global_id(0);
   int row    = get_global_id(1);
   int2 coords;  // Coordinates for accessing the image
   coords.x = column ;
   coords.y = row ;

   // All accesses to images return data as four-element vector 
   // (i.e., float4), although only the 'x' component will contain 
   // meaningful data in this code
   float4 sum = {0.0f, 0.0f, 0.0f, 0.0f};


    uint4 pixel;
    // Read a pixel from the image.  A single channel image 
    // stores the pixel in the 'x' coordinate of the returned
    // vector.
    pixel = read_imageui(sourceImage, sampler, coords);
    read_mem_fence(CLK_GLOBAL_MEM_FENCE) ;
    sum.x += pixel.x;
    write_mem_fence(CLK_GLOBAL_MEM_FENCE) ;
    write_mem_fence(CLK_LOCAL_MEM_FENCE) ;
    sum.x += pixel.y;
    write_mem_fence(CLK_GLOBAL_MEM_FENCE) ;
    write_mem_fence(CLK_LOCAL_MEM_FENCE) ;
    sum.x += pixel.z;
    write_mem_fence(CLK_GLOBAL_MEM_FENCE) ;
    write_mem_fence(CLK_LOCAL_MEM_FENCE) ;


   // Copy the data to the output image if the
   // work-item is in bounds
   if(row < rows && column < cols) {
      //coords.x = row * rows + column ;
      //coords.y = 0;
      //write_imagef(outputImage, coords, sum);
      int idx = row * rows + column ;
      outputImage[column] = sum.x ;
   } 
}

Solution

On closer inspection, it appears that the problem was with the Windows function call used to open the .cl file:

_sopen_s(&fd, filename, _O_RDONLY, _SH_DENYRW, _S_IREAD) ;

Upon changing it to:

_sopen_s(&fd, filename, _O_RDONLY | O_BINARY, _SH_DENYRW, _S_IREAD) ;

Tada! Problem solved.

Here's how the end of the loaded source came out garbled without the O_BINARY flag:

    outputImage[idx] = sum.x ;
    } 
}
lumn ;

    outputImage[idx] = sum.x ;

   } 

}

And once fixed, it looked as it should:

    outputImage[idx] = sum.x ;

   } 

}

Why was this happening? No idea.

EDIT: After reading the comments by sharpneli and jprice I decided to do some more digging to figure out what the exact issue was.

The O_BINARY flag isn't actually needed, as I traced the root cause to a discrepancy between:

*filesize = (int64_t *)_filelengthi64(fd)

and

bytesRead = _read( fileDescriptor, *fileContents, (UINT)filesize ) ;

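filesize was 765 whereas bytesRead was 733. This matches how text mode behaves in the Microsoft C runtime: _read hands back each CR-LF pair as a single LF, so it can return fewer bytes than the on-disk length reported by _filelengthi64. That also explains why O_BINARY hid the problem, since in binary mode the two counts agree.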
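The arithmetic behind such a gap can be sketched in a few lines. The helper text_mode_length below is hypothetical and not part of the original code; it assumes the only translation applied is CR-LF to LF (it ignores other text-mode details such as Ctrl-Z handling) and reports how many bytes would survive.

```c
#include <stddef.h>

/* Hypothetical helper: given the raw bytes of a file, return how many
 * bytes remain if every CR-LF pair is collapsed into a single LF.
 * Assumption: this is the only translation being modelled. */
size_t text_mode_length(const char *raw, size_t raw_len)
{
    size_t crlf_pairs = 0;

    for (size_t i = 0; i + 1 < raw_len; ++i) {
        if (raw[i] == '\r' && raw[i + 1] == '\n') {
            ++crlf_pairs;
            ++i; /* skip the LF that belongs to this pair */
        }
    }
    return raw_len - crlf_pairs;
}
```

Under that assumption, a difference of 32 between the two counts would correspond to 32 CR-LF line endings in the file.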

A bit further down, the line where the problem ultimately lay was:

//make sure string is null terminated
(*fileContents)[(int)filesize -1] = '\0' ;

and upon changing to:

(*fileContents)[(int)bytesRead-1] = '\0' ;

Voila.
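For anyone hitting the same thing, here is a minimal portable sketch of the idea, not the code from my project. It uses standard fopen/fread rather than _sopen_s/_read, and the function name read_source is made up for illustration. It differs from the fix above in two ways that are my own assumptions about what is safest: the buffer gets one extra byte so the terminator does not overwrite the last character, and the terminator is placed at the count fread actually returned rather than at the file size.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read a whole file into a NUL-terminated heap
 * buffer. Returns NULL on failure; the caller frees the result.
 * If out_len is non-NULL it receives the number of bytes read. */
char *read_source(const char *path, size_t *out_len)
{
    /* "rb" avoids line-ending translation, so size and bytes read agree */
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    if (fseek(fp, 0, SEEK_END) != 0) { fclose(fp); return NULL; }
    long size = ftell(fp);
    if (size < 0) { fclose(fp); return NULL; }
    rewind(fp);

    /* one extra byte for the terminator */
    char *buf = malloc((size_t)size + 1);
    if (buf == NULL) { fclose(fp); return NULL; }

    size_t bytes_read = fread(buf, 1, (size_t)size, fp);
    fclose(fp);

    /* terminate at what was actually read, never at the reported size */
    buf[bytes_read] = '\0';
    if (out_len != NULL)
        *out_len = bytes_read;
    return buf;
}
```

The returned length can also be passed explicitly to clCreateProgramWithSource through its lengths argument, so the build does not depend on where the terminator ended up.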

Licensed under: CC-BY-SA with attribution
