Integer overflow not consistent

https://stackoverflow.com/questions/12032122

27-06-2021
|

Question

Pardon me if this question has been posed before. I looked for answers to similar questions, but I'm still puzzled with my problem. So I will shoot the question anyway. I'm using a C library called libexif for image data. I run my application (which uses this library) both on my Linux desktop and my MIPS board. For a particular image file when I try to fetch the created time, I was getting an error/invalid value. On debugging further I saw that for this particular image file, I was not getting the tag (EXIF_TAG_DATE_TIME) as expected.

This library has several utility functions. Most functions are structured like below

int16_t 
exif_get_sshort (const unsigned char *buf, ExifByteOrder order)
{
    if (!buf) return 0;
        switch (order) {
        case EXIF_BYTE_ORDER_MOTOROLA:
                return ((buf[0] << 8) | buf[1]);
        case EXIF_BYTE_ORDER_INTEL:
                return ((buf[1] << 8) | buf[0]);
        }

    /* Won't be reached */
    return (0);
}

uint16_t
exif_get_short (const unsigned char *buf, ExifByteOrder order)
{
    return (exif_get_sshort (buf, order) & 0xffff);
}

When the library tries to investigate the presence of tags in raw data, it calls exif_get_short() and assigns the value returned to a variable which is of type enum (int).

In the error case, exif_get_short() which is supposed to return unsigned value (34687) returns a negative number (-30871) which messes up the whole tag extraction from the image data.

34687 is outside the range of maximum representable int16_t value. And therefore leads to an overflow. When I make this slight modification in code, everything seems to work fine

uint16_t
exif_get_short (const unsigned char *buf, ExifByteOrder order)
{
    int temp = (exif_get_sshort (buf, order) & 0xffff);
        return temp;
}

But since this is a pretty stable library and in use for quite some time, it led me to believe that I may be missing something here. Moreover this is the general way the code is structured for other utility functions as well. Ex: exif_get_long() calls exif_get_slong(). I would then have to change all utility functions.

What is confusing me is that when I run this piece of code on my linux desktop for the error file, I see no problems and things work fine with the original library code. Which led to me believe that perhaps UINT16_MAX and INT16_MAX macros have different values on my desktop and MIPS board. But unfortunately, thats not the case. Both print identical values on the board and desktop. If this piece of code fails, it should fail also on my desktop.

What am I missing here? Any hints would be much appreciated.

EDIT: The code which calls exif_get_short() goes something like this:

ExifTag tag;
...
tag = exif_get_short (d + offset + 12 * i, data->priv->order);
switch (tag) {
...
...

The type ExifTag is as follows:

typedef enum {
    EXIF_TAG_GPS_VERSION_ID             = 0x0000,
EXIF_TAG_INTEROPERABILITY_INDEX     = 0x0001,
    ...
    ...
    }ExifTag ;

The cross compiler being used is mipsisa32r2el-timesys-linux-gnu-gcc

CFLAGS        = -pipe -mips32r2 -mtune=74kc -mdspr2 -Werror -O3 -Wall -W -D_REENTRANT -fPIC $(DEFINES)

I'm using libexif within Qt - Qt Media hub (actually libexif comes along with Qt Media hub)

EDIT2: Some additional observations: I'm observing something bizarre. I have put print statements in exif_get_short(). Just before return

printf("return_value %d\n %u\n",exif_get_sshort (buf, order) & 0xffff, exif_get_sshort (buf, order) & 0xffff);
return (exif_get_sshort (buf, order) & 0xffff);

I see the following o/p: return_value 34665 34665

I then also inserted print statements in the code which calls exif_get_short()

....
tag = exif_get_short (d + offset + 12 * i, data->priv->order);
printf("TAG %d %u\n",tag,tag);

I see the following o/p: TAG -30871 4294936425

EDIT3 : Posting assembly code for exif_get_short() and exif_get_sshort() taken on MIPS board

        .file   1 "exif-utils.c"
    .section .mdebug.abi32
    .previous
    .gnu_attribute 4, 1
    .abicalls
    .text
    .align  2
    .globl  exif_get_sshort
    .ent    exif_get_sshort
    .type   exif_get_sshort, @function
exif_get_sshort:
    .set    nomips16
    .frame  $sp,0,$31       # vars= 0, regs= 0/0, args= 0, gp= 0
    .mask   0x00000000,0
    .fmask  0x00000000,0
    .set    noreorder
    .set    nomacro

    beq $4,$0,$L2
    nop

    beq $5,$0,$L3
    nop

    li  $2,1            # 0x1
    beq $5,$2,$L8
    nop

$L2:

    j   $31
    move    $2,$0

$L3:

    lbu $2,0($4)
    lbu $3,1($4)
    sll $2,$2,8
    or  $2,$2,$3
    j   $31
    seh $2,$2

$L8:

    lbu $2,1($4)
    lbu $3,0($4)
    sll $2,$2,8
    or  $2,$2,$3
    j   $31
    seh $2,$2

    .set    macro
    .set    reorder
    .end    exif_get_sshort
    .align  2
    .globl  exif_get_short
    .ent    exif_get_short
    .type   exif_get_short, @function

exif_get_short:

    .set    nomips16
    .frame  $sp,0,$31       # vars= 0, regs= 0/0, args= 0, gp= 0
    .mask   0x00000000,0
    .fmask  0x00000000,0
    .set    noreorder
    .cpload $25
    .set    nomacro

    lw  $25,%call16(exif_get_sshort)($28)
    jr  $25
    nop

    .set    macro
    .set    reorder
    .end    exif_get_short

Just for completeness, the ASM code taken from my linux machine

    .file   "exif-utils.c"
    .text
    .p2align 4,,15
    .globl  exif_get_sshort
    .type   exif_get_sshort, @function

exif_get_sshort:

.LFB1:

        .cfi_startproc
    xorl    %eax, %eax
    testq   %rdi, %rdi
    je  .L2
    testl   %esi, %esi
    jne .L8
    movzbl  (%rdi), %edx
    movzbl  1(%rdi), %eax
    sall    $8, %edx
    orl %edx, %eax
    ret
    .p2align 4,,10
    .p2align 3

.L8:
    cmpl    $1, %esi
    jne .L2
    movzbl  1(%rdi), %edx
    movzbl  (%rdi), %eax
    sall    $8, %edx
    orl %edx, %eax

.L2:
    rep
    ret
    .cfi_endproc

.LFE1:
    .size   exif_get_sshort, .-exif_get_sshort
    .p2align 4,,15
    .globl  exif_get_short
    .type   exif_get_short, @function

exif_get_short:

.LFB2:
    .cfi_startproc
    jmp exif_get_sshort@PLT
    .cfi_endproc
.LFE2:
    .size   exif_get_short, .-exif_get_short

EDIT4: Hopefully my last update :-) ASM code with compiler option set to -O1

exif_get_short:

.set    nomips16
.frame  $sp,32,$31      # vars= 0, regs= 1/0, args= 16, gp= 8
.mask   0x80000000,-4
.fmask  0x00000000,0
.set    noreorder
.cpload $25
.set    nomacro

addiu   $sp,$sp,-32
sw  $31,28($sp)
.cprestore  16
lw  $25,%call16(exif_get_sshort)($28)
jalr    $25
nop

lw  $28,16($sp)
andi    $2,$2,0xffff
lw  $31,28($sp)
j   $31
addiu   $sp,$sp,32

.set    macro
.set    reorder
.end    exif_get_short

Solution

One thing the MIPS assembly shows (though I'm not an expert in MIPS assembly, so there's a decent chance I'm missing something or otherwise wrong) is that the exif_get_short() function is just an alias for the exif_get_sshort() function. All that exif_get_short() does is jump to the address of the exif_get_sshort() function.

The exif_get_sshort() function sign extends the 16 bit value it's returning to the full 32-bit register used for the return. There's nothing wrong with that - it's actually probably what the MIPS ABI specifies (I'm not sure).

However, since the exif_get_short() function just jumps to the exif_get_sshort() function, it has no opportunity to clear the upper 16 bits of the register.

So when the 16 bit value 0x8769 is being returned from the buffer (whether from exif_get_sshort() or exif_get_short()), the $2 register used to return the function result contains 0xffff8769, which can have the following interpretations:

as a 32-bit signed int: -30871
as a 32-bit `unsigned int: 4294936425
as a 16-bit signed int16_t: -30871
as a 16-bit unsigned uint16_t: 34665

If the compiler is supposed to ensure that the $2 return register has a top 16-bit set to zero for a uint16_t return type, then it has a bug in the code it's emitting for exif_get_short() - instead of jumping to exif_get_sshort(), it should call exif_get_sshort() and clear the upper half of $2 before returning.

From the description of the behavior you're seeing, it looks like the code calling exif_get_short() expects that the $2 resister used for the return value will have the upper 16 bits cleared so that the entire 32-bit register can be used as-is for the 16-bit uint16_t value.

I'm not sure what the MIPS ABI specifies (but I'd guess that it specifies that the upper 16 bits of the $2 register should eb cleared by exif_get_short()), but there seems to be either a code generation bug that exif_get_short() doesn't ensure $2 is entirely correct before it returns or a bug where the caller of exif_get_short() assumes that the full 32-bits of $2 are valid when only 16 bits are.

OTHER TIPS

This is broken on so many levels, I don't know where to begin. Just look at what is done here:

The unsigned chars are read from the buffer.
They are assigned to a signed int16_t in exif_get_sshort.
This is assigned to an unsigned uint16_t in exif_get_short.
This is finally assigned to an enum which is of type signed int.

I'd say it's a miracle it works at all.

First, the assignment from the chars to int16_t is done with the values, not with the representation:

return ((buf[0] << 8) | buf[1]);

Which already throws you into the pit of undefined behaviour when the result actually is negative. Plus, it only works when the signed int representation of the implementation is the same as the one used in the file format (two's complement, I guess). It will fail for one's complement and sign-magnitude. So check what's the case for the MIPS implementation.

The clean way would be the other way around: Assign two chars from the buffer to an uint16_t, which will be well defined operations, and use this to return an int16_t. You could then care for further corrections of the value for different representations, if necessary.

Additionally, here:

if (!buf) return 0;

0 is a very bad choice of a return value, because it is a valid enum constant:

EXIF_TAG_GPS_VERSION_ID             = 0x0000,

If this is expected to be the default for invalidity, then the constant should be returned, not the magic number. Though this seems to be a generic function to return an int16_t, thus some other error mechanism should be used here.

For your specific question, well, follow the flow of conversions between signed and unsigned on your MIPS implementation, including the default promotions done, and examine all the intermediate values to find the point where it breaks. Your MIPS uses 32 bit ints, not 16 bit, right? Check INT_MAX and UINT_MAX.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow