Clang Analyzer false positive or overflow?

https://stackoverflow.com/questions/16311092

13-04-2022

Question

Below is a simplified version of some of our code that appears to demonstrate a bug in the Clang analyzer, though it is possible that there is a real bug in our code.

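#include <stdbool.h>
#include <stdint.h>

typedef uint16_t UINT16; /* assumed definition; the original typedef is not shown */

typedef enum {
    value1 =  0x8000, /* If value1 is initialized to a value < 0x8000,
                         the bug doesn't occur */
    value2,
    value3,
    value4,
    value5,
    value6
} myEnum;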

static bool test_UTIL(bool aBool, UINT16 iCaseValue)
{
    bool canMatch = true;
    int myValue; /*not initialized*/

    if (aBool)
        myValue = 1;  /*initialized */
    else
        canMatch = ((value1 == iCaseValue)
             || (value2 == iCaseValue)
             || (value3 == iCaseValue)
             || (value4 == iCaseValue)
             || (value5 == iCaseValue)
             || (value6 == iCaseValue));

    if (canMatch)
    {
        switch (iCaseValue) 
        {
            case value1:
            case value2:
            case value3:
            case value4:
            case value5:
            case value6:
                break;

            default:
                /* This triggers a clang warning, claiming myValue is undefined */
                canMatch = (iCaseValue == myValue);
                break;
        }
    }

    return canMatch;
}

As noted in the comment, the warning only appears when the enumeration starts at 0x8000 or above. 0x8000 is bit 15, which would be the sign bit if iCaseValue were a signed 16-bit integer rather than an unsigned one. Is it possible that we're somehow getting an overflow from an implicit conversion to a signed 16-bit integer in the switch statement? Or is Clang confused?

Of course, this example could likely be refactored to achieve equivalent behavior, but the original it is based on is 20+-year-old code that is not worth rewriting just to satisfy a faulty analyzer warning.

Edit: I've added the assembly generated for the test_UTIL() function below. I can't read assembly well enough to spot a problem here, though others may be interested in it:

_test_UTIL:                             ## @test_UTIL
Ltmp15:
    .cfi_startproc
Lfunc_begin1:
    .loc    1 24 0                  ## /Users/jbrooks/Desktop/test/test/main.c:24:0
## BB#0:
    pushq   %rbp
Ltmp16:
    .cfi_def_cfa_offset 16
Ltmp17:
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
Ltmp18:
    .cfi_def_cfa_register %rbp
    movw    %si, %ax
    movl    %edi, -4(%rbp)
    movw    %ax, -6(%rbp)
    .loc    1 25 22 prologue_end    ## /Users/jbrooks/Desktop/test/test/main.c:25:22
Ltmp19:
    movl    $1, -12(%rbp)
    .loc    1 28 2                  ## /Users/jbrooks/Desktop/test/test/main.c:28:2
    cmpl    $0, -4(%rbp)
    je  LBB1_2
## BB#1:
    .loc    1 29 3                  ## /Users/jbrooks/Desktop/test/test/main.c:29:3
    movl    $1, -16(%rbp)
    jmp LBB1_9
LBB1_2:
    movb    $1, %al
    movl    $32768, %ecx            ## imm = 0x8000
    .loc    1 31 3                  ## /Users/jbrooks/Desktop/test/test/main.c:31:3
    movzwl  -6(%rbp), %edx
    cmpl    %edx, %ecx
    movb    %al, -17(%rbp)          ## 1-byte Spill
    je  LBB1_8
## BB#3:
    movb    $1, %al
    movl    $32769, %ecx            ## imm = 0x8001
    movzwl  -6(%rbp), %edx
    cmpl    %edx, %ecx
    movb    %al, -17(%rbp)          ## 1-byte Spill
    je  LBB1_8
## BB#4:
    movb    $1, %al
    movl    $32770, %ecx            ## imm = 0x8002
    movzwl  -6(%rbp), %edx
    cmpl    %edx, %ecx
    movb    %al, -17(%rbp)          ## 1-byte Spill
    je  LBB1_8
## BB#5:
    movb    $1, %al
    movl    $32771, %ecx            ## imm = 0x8003
    movzwl  -6(%rbp), %edx
    cmpl    %edx, %ecx
    movb    %al, -17(%rbp)          ## 1-byte Spill
    je  LBB1_8
## BB#6:
    movb    $1, %al
    movl    $32772, %ecx            ## imm = 0x8004
    movzwl  -6(%rbp), %edx
    cmpl    %edx, %ecx
    movb    %al, -17(%rbp)          ## 1-byte Spill
    je  LBB1_8
## BB#7:
    movl    $32773, %eax            ## imm = 0x8005
    movzwl  -6(%rbp), %ecx
    cmpl    %ecx, %eax
    sete    %dl
    movb    %dl, -17(%rbp)          ## 1-byte Spill
LBB1_8:
    movb    -17(%rbp), %al          ## 1-byte Reload
    andb    $1, %al
    movzbl  %al, %ecx
    movl    %ecx, -12(%rbp)
LBB1_9:
    .loc    1 38 2                  ## /Users/jbrooks/Desktop/test/test/main.c:38:2
    cmpl    $0, -12(%rbp)
    je  LBB1_14
## BB#10:
    .loc    1 40 3                  ## /Users/jbrooks/Desktop/test/test/main.c:40:3
Ltmp20:
    movzwl  -6(%rbp), %eax
    leal    -32768(%rax), %eax
    cmpl    $5, %eax
    ja  LBB1_12
    jmp LBB1_11
LBB1_11:
    .loc    1 48 5                  ## /Users/jbrooks/Desktop/test/test/main.c:48:5
Ltmp21:
    jmp LBB1_13
LBB1_12:
    .loc    1 52 5                  ## /Users/jbrooks/Desktop/test/test/main.c:52:5
    movzwl  -6(%rbp), %eax
    cmpl    -16(%rbp), %eax
    sete    %cl
    andb    $1, %cl
    movzbl  %cl, %eax
    movl    %eax, -12(%rbp)
Ltmp22:
LBB1_13:
LBB1_14:
    .loc    1 57 2                  ## /Users/jbrooks/Desktop/test/test/main.c:57:2
    movl    -12(%rbp), %eax
    popq    %rbp
    ret
Ltmp23:
Lfunc_end1:

Solution

One unknown is the underlying integer type that the compiler chooses to represent myEnum. This choice is “implementation-defined” in the sense that it must be deterministic, so that separately compiled files can be linked together, but not in the sense that the compiler's documentation states how the type is chosen. The choice depends on the enum's definition, so any description of it could only be an algorithm.

Regardless of this uncertainty, I think that the function's behavior is defined: it does not read from an uninitialized myValue for any arguments. In other words, the warning is a false positive. I have “verified” this with another static analyzer that detects uses of uninitialized memory.
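Because there are only 2 × 65536 possible argument combinations, the claim can also be checked by brute force. The following is a minimal sketch of such a check, not the analyzer mentioned above. It mirrors the control flow of test_UTIL but replaces myValue with a flag that records whether it has been assigned. The names reads_uninitialized, count_uninitialized_reads, check_test_UTIL and myValueSet are made up for this illustration, and UINT16 is assumed to be a 16-bit unsigned type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t UINT16; /* assumption: UINT16 is a 16-bit unsigned type */

typedef enum {
    value1 = 0x8000,
    value2, value3, value4, value5, value6
} myEnum;

/* Same branches as test_UTIL. Returns true if the read of myValue in the
   default case would happen before myValue was ever assigned. */
bool reads_uninitialized(bool aBool, UINT16 iCaseValue)
{
    bool canMatch = true;
    bool myValueSet = false; /* hypothetical flag standing in for myValue */

    if (aBool)
        myValueSet = true;   /* corresponds to myValue = 1 */
    else
        canMatch = ((value1 == iCaseValue)
                 || (value2 == iCaseValue)
                 || (value3 == iCaseValue)
                 || (value4 == iCaseValue)
                 || (value5 == iCaseValue)
                 || (value6 == iCaseValue));

    if (canMatch)
    {
        switch (iCaseValue)
        {
            case value1:
            case value2:
            case value3:
            case value4:
            case value5:
            case value6:
                break;

            default:
                /* The original reads myValue here. */
                return !myValueSet;
        }
    }

    return false;
}

/* Tries both values of aBool with every 16-bit iCaseValue and counts the
   inputs for which myValue would be read uninitialized. */
unsigned long count_uninitialized_reads(void)
{
    unsigned long bad = 0;

    for (int b = 0; b <= 1; ++b)
        for (uint32_t v = 0; v <= UINT16_MAX; ++v)
            if (reads_uninitialized(b != 0, (UINT16)v))
                ++bad;

    return bad;
}

/* Aborts if any input reaches the uninitialized read. */
void check_test_UTIL(void)
{
    assert(count_uninitialized_reads() == 0);
}
```

Calling check_test_UTIL() from main exercises every input. The reasoning it encodes is that when aBool is false, canMatch is only true for the six enumerated values, and those never reach the default case.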

To remove the uncertainty about the integer type chosen for myEnum, you could post the assembly code that clang-the-compiler generates. If there is an uninitialized access in the assembly code, it will be easier to understand why.

A full-featured static analyzer such as Clang's is a complex beast, and an explanation coming from someone who is not familiar with its internals should be taken with a grain of salt. With that caveat, what might be happening here is that the underlying integer type chosen for myEnum is different when 0x8000 is picked for value1 as opposed to smaller values. For smaller values, the underlying type for myEnum could be a signed 16-bit short int, whereas 0x8000 forces the compiler to use an unsigned short int. This different type for myEnum would introduce more implicit conversions in the Abstract Syntax Tree representing the function, making it harder to predict and causing the false positive. I do not work on Clang, but I can assure you that these implicit conversions are always a pain to handle in a static analyzer for C.
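The type questions raised here can be inspected directly on a given compiler. The sketch below is only an illustration under stated assumptions: UINT16 is assumed to be a 16-bit unsigned type, and IS_INT, enum_size, enum_is_signed and matches_value1 are hypothetical helper names. It relies on C11 features (_Generic and static_assert). It does not explain the analyzer's behavior; it only shows which types take part in the comparisons.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t UINT16; /* assumption: UINT16 is a 16-bit unsigned type */

typedef enum {
    value1 = 0x8000,
    value2, value3, value4, value5, value6
} myEnum;

/* Evaluates to 1 if expr has type int, 0 otherwise (C11 _Generic). */
#define IS_INT(expr) _Generic((expr), int: 1, default: 0)

/* An enumeration constant whose value fits in int has type int, whatever
   integer type the compiler picks to represent myEnum itself. */
static_assert(IS_INT(value1), "value1 should have type int");

/* Size in bytes of the type chosen for myEnum (implementation-defined). */
size_t enum_size(void)
{
    return sizeof(myEnum);
}

/* True if the type chosen for myEnum is signed. Converting -1 to an
   unsigned type wraps to a large positive value. The cast to long long
   assumes the chosen type is narrower than long long. */
bool enum_is_signed(void)
{
    myEnum m = (myEnum)-1;
    return (long long)m < 0;
}

/* The first comparison from test_UTIL. When int is wider than 16 bits,
   iCaseValue is promoted to int, so two ints are compared. */
bool matches_value1(UINT16 iCaseValue)
{
    return value1 == iCaseValue;
}
```

Printing enum_size() and enum_is_signed() for this definition, and again with value1 changed to a smaller value, shows whether the compiler at hand actually switches types at 0x8000.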

Clang developers consider false positives bugs and they certainly would like to hear about this one. The homepage says:

Please help us in this endeavor by reporting false positives

and this sentence links directly to the explanation on how to file bugs.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow