Unexpected behavior when printing 4-byte integer byte by byte

https://stackoverflow.com/questions/2032744

19-09-2019

Question

I have this sample code for converting 32-bit integers to IP addresses.


#include <stdio.h>
int main()
{
 unsigned int c ;
 unsigned char* cptr  = (unsigned char*)&c ;
 while(1)
 {
  scanf("%d",&c) ;
  printf("Integer value: %u\n",c);
  printf("%u.%u.%u.%u \n",*cptr, *(cptr+1), *(cptr+2), *(cptr+3) );
 }
}

This code gives incorrect output for the input 2249459722. But when I replace

scanf("%d",&c) ;

with

scanf("%u",&c) ;

the output comes out correct.

P.S : I know about inet_ntop and inet_pton.
I expect answers other than suggesting those.

Solution

You are coding 'sinfully' (making a number of mistakes which will hurt you sooner or later - mostly sooner). First off, you are assuming that the integer is of the correct endian-ness. On some machines, you will be wrong - either on Intel machines or on PowerPC or SPARC machines.

In general, you should show the actual results you get rather than just saying that you get the wrong result; you should also show the expected result. That helps people debug your expectations.

Here's my modified version of your code - instead of requesting input, it simply assumes the value you specified.

#include <stdio.h>
int main(void)
{
    unsigned int c = 2249459722;
    unsigned char* cptr  = (unsigned char*)&c;
    printf("Integer value:  %10u\n", c);
    printf("Integer value:  0x%08X\n", c);
    printf("Dotted decimal: %u.%u.%u.%u \n", *cptr, *(cptr+1), *(cptr+2), *(cptr+3));
    return(0);
}

When compiled on my Mac (Intel, little-endian), the output is:

Integer value:  2249459722
Integer value:  0x8614080A
Dotted decimal: 10.8.20.134

When compiled on my Sun (SPARC, big-endian), the output is:

Integer value:  2249459722
Integer value:  0x8614080A
Dotted decimal: 134.20.8.10

(Using GCC 4.4.2 on the SPARC, I get a warning:

xx.c:4: warning: this decimal constant is unsigned only in ISO C90

Using GCC 4.2.1 on Mac - with lots of warnings enabled (gcc -std=c99 -pedantic -Wall -Wshadow -Wpointer-arith -Wstrict-prototypes -Wmissing-prototypes -Werror) - I don't get that warning, which is interesting.) I can remove that by adding a U suffix to the integer constant.

Another way of looking at the problems is illustrated with the following code and the extremely fussy compiler settings shown above:

#include <stdio.h>

static void print_value(unsigned int c)
{
    unsigned char* cptr  = (unsigned char*)&c;
    printf("Integer value:  %10u\n", c);
    printf("Integer value:  0x%08X\n", c);
    printf("Dotted decimal: %u.%u.%u.%u \n", *cptr, *(cptr+1), *(cptr+2), *(cptr+3));
}

int main(void)
{
    const char str[] = "2249459722";
    unsigned int c = 2249459722;

    printf("Direct operations:\n");
    print_value(c);

    printf("Indirect operations:\n");
    if (sscanf("2249559722", "%d", &c) != 0)
        printf("Conversion failed for %s\n", str);
    else
        print_value(c);
    return(0);
}

This fails to compile (because of the -Werror setting) with the message:

cc1: warnings being treated as errors
xx.c: In function ‘main’:
xx.c:20: warning: format ‘%d’ expects type ‘int *’, but argument 3 has type ‘unsigned int *’

Remove the -Werror setting and it compiles, but then shows the next problem that you have - the one of not checking for error indications from functions that can fail:

Direct operations:
Integer value:  2249459722
Integer value:  0x8614080A
Dotted decimal: 10.8.20.134 
Indirect operations:
Conversion failed for 2249459722

Basically, the sscanf() function reports that it failed to convert the string to a signed integer (because the value is too large to fit - see the warning from GCC 4.4.2), but your code was not checking for the error return from sscanf(), so you were using whatever value happened to be left in c at the time.

So, there are multiple problems with your code:

It assumes a particular architecture (little-endian rather than recognizing that big-endian also exists).
It doesn't compile cleanly when using a compiler with lots of warnings enabled - for good reason.
It doesn't check that functions that can fail actually succeeded.

Alok's Comment

Yes, the test on sscanf() is wrong. That's why you have code reviews, and also why it helps to post the code you are testing.

I'm now a bit puzzled - getting consistent behaviour that I can't immediately explain. With the obvious revision (testing on MacOS X 10.6.2, GCC 4.2.1, 32-bit and 64-bit compilations), I get one not very sane answer. When I rewrite more modularly, I get a sane answer.

+ cat yy.c
#include <stdio.h>

static void print_value(unsigned int c)
{
    unsigned char* cptr  = (unsigned char*)&c;
    printf("Integer value:  %10u\n", c);
    printf("Integer value:  0x%08X\n", c);
    printf("Dotted decimal: %u.%u.%u.%u \n", *cptr, *(cptr+1), *(cptr+2), *(cptr+3));
}

int main(void)
{
    const char str[] = "2249459722";
    unsigned int c = 2249459722;

    printf("Direct operations:\n");
    print_value(c);

    printf("Indirect operations:\n");
    if (sscanf("2249559722", "%d", &c) != 1)
        printf("Conversion failed for %s\n", str);
    else
        print_value(c);
    return(0);
}


+ gcc -o yy.32 -m32 -std=c99 -pedantic -Wall -Wshadow -Wpointer-arith -Wstrict-prototypes -Wmissing-prototypes yy.c
yy.c: In function ‘main’:
yy.c:20: warning: format ‘%d’ expects type ‘int *’, but argument 3 has type ‘unsigned int *’


+ ./yy.32
Direct operations:
Integer value:  2249459722
Integer value:  0x8614080A
Dotted decimal: 10.8.20.134 
Indirect operations:
Integer value:  2249559722
Integer value:  0x86158EAA
Dotted decimal: 170.142.21.134

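I did not initially have a good explanation for the value 170.142.21.134, but it is consistent on my machine. On closer inspection, the string literal passed to sscanf() is "2249559722", not the "2249459722" stored in str (a 5 where there should be a 4). 2249559722 is 0x86158EAA, so the bytes printed are simply those of a different number.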

+ gcc -o yy.64 -m64 -std=c99 -pedantic -Wall -Wshadow -Wpointer-arith -Wstrict-prototypes -Wmissing-prototypes yy.c
yy.c: In function ‘main’:
yy.c:20: warning: format ‘%d’ expects type ‘int *’, but argument 3 has type ‘unsigned int *’


+ ./yy.64
Direct operations:
Integer value:  2249459722
Integer value:  0x8614080A
Dotted decimal: 10.8.20.134 
Indirect operations:
Integer value:  2249559722
Integer value:  0x86158EAA
Dotted decimal: 170.142.21.134

Same value - even in 64-bit instead of 32-bit. Maybe the problem is that I'm trying to explain undefined behaviour, which is more or less by definition unexplainable (inexplicable).

+ cat xx.c
#include <stdio.h>

static void print_value(unsigned int c)
{
    unsigned char* cptr  = (unsigned char*)&c;
    printf("Integer value:  %10u\n", c);
    printf("Integer value:  0x%08X\n", c);
    printf("Dotted decimal: %u.%u.%u.%u \n", *cptr, *(cptr+1), *(cptr+2), *(cptr+3));
}

static void scan_value(const char *str, const char *fmt, const char *tag)
{
    unsigned int c;
    printf("Indirect operations (%s):\n", tag);
    fmt = "%d";
    if (sscanf(str, fmt, &c) != 1)
        printf("Conversion failed for %s (format %s \"%s\")\n", str, tag, fmt);
    else
        print_value(c);
}

int main(void)
{
    const char str[] = "2249459722";
    unsigned int c = 2249459722U;

    printf("Direct operations:\n");
    print_value(c);
    scan_value(str, "%d", "signed");
    scan_value(str, "%u", "unsigned");

    return(0);
}

Using the function argument like this means GCC cannot spot the bogus format any more.

+ gcc -o xx.32 -m32 -std=c99 -pedantic -Wall -Wshadow -Wpointer-arith -Wstrict-prototypes -Wmissing-prototypes xx.c


+ ./xx.32
Direct operations:
Integer value:  2249459722
Integer value:  0x8614080A
Dotted decimal: 10.8.20.134 
Indirect operations (signed):
Integer value:  2249459722
Integer value:  0x8614080A
Dotted decimal: 10.8.20.134 
Indirect operations (unsigned):
Integer value:  2249459722
Integer value:  0x8614080A
Dotted decimal: 10.8.20.134

The results are consistent here.

+ gcc -o xx.64 -m64 -std=c99 -pedantic -Wall -Wshadow -Wpointer-arith -Wstrict-prototypes -Wmissing-prototypes xx.c


+ ./xx.64
Direct operations:
Integer value:  2249459722
Integer value:  0x8614080A
Dotted decimal: 10.8.20.134 
Indirect operations (signed):
Integer value:  2249459722
Integer value:  0x8614080A
Dotted decimal: 10.8.20.134 
Indirect operations (unsigned):
Integer value:  2249459722
Integer value:  0x8614080A
Dotted decimal: 10.8.20.134

And these are the same as the 32-bit case. I'm officially bemused. The main observations remain accurate - be careful, heed compiler warnings (and elicit compiler warnings), and don't assume that "all the world runs on Intel chips" (it used to be "don't assume that all the world is a VAX", once upon a long time ago!).

OTHER TIPS

%d is for signed integers

%u is for unsigned integers

Edit:

Please modify your program as follows to see how your input is really interpreted:

#include <stdio.h>
int main()
{
 unsigned int c ; 
 unsigned char* cptr  = (unsigned char*)&c ;
 while(1)
 {
  scanf("%d",&c) ;
  printf("Signed value: %d\n",c);
  printf("Unsigned value: %u\n",c);
  printf("%u.%u.%u.%u \n",*cptr, *(cptr+1), *(cptr+2), *(cptr+3) );
 }
}

When you supply a number greater than INT_MAX, the left-most bit of the stored value is 1. For a signed integer this indicates a negative value, so the number is then interpreted according to its two's complement representation.

To answer your main question:

scanf("%d", &c);

scanf()'s behavior is undefined when the converted input can't be represented in the target data type. 2249459722 doesn't fit in an int on your machine, so scanf() can do anything, including storing garbage in c.

In C, an int is only guaranteed to be able to store values in the range -32767 to +32767, and an unsigned int is only guaranteed to hold values between 0 and 65535. So, as such, 2249459722 need not fit even in an unsigned int. An unsigned long, however, is guaranteed to store values up to at least 4294967295 (2³²−1), so you should use unsigned long:

#include <stdio.h>
int main()
{
    unsigned long c ;
    unsigned char *cptr  = (unsigned char*)&c ;
    while(1)
    {
        if (scanf("%lu", &c) != 1) {
            fprintf(stderr, "error in scanf\n");
            return 0;
        }
        printf("Input value: %lu\n", c);
        printf("%u.%u.%u.%u\n", cptr[0], cptr[1], cptr[2], cptr[3]);
    }
    return 0;
}

If you have a C99 compiler, you can #include <inttypes.h> and then use uint32_t instead of unsigned long. The scanf() call becomes scanf("%" SCNu32, &c);

The correct endianness-safe way to write this is

printf("Dotted decimal: %u.%u.%u.%u \n", (c >> 24) & 0xff, (c >> 16) & 0xff, (c >> 8) & 0xff, (c >> 0) & 0xff);

Licensed under: CC-BY-SA with attribution
