Method to convert from Hex to Integer in C, can't get lowercase!

https://stackoverflow.com/questions/1416571

06-07-2019
|

Question

Hey everyone, just a quick thing, I have the hex to integer working, but I need to get the numbers lowercase. Here's what I have, any ideas to get to get the A thru F case insensitive?

int htoi(char f[]) {
    int  z, n;
    n = 0;

    for (z = 0; f[z] >= '0' && f[z] <= 'F'; ++z) 
        if (f[z] >= 'A' && f[z] <= 'F')
            n = 10 + 16 * n + (f[z] - 'A');
        else
            n = 16 * n + (f[z] - '0');
}

Probably just a small thing, but I'd like to include a-f and A-F. Thanks for your help!

Solution

If you're doing this to learn how to do it, ignore this post. If you're using this function because you need to convert a string of hex numbers to an int, you should take a walk in your standard library. The standard function strtol() converts a string to a long, which can be cast down to an int (or an unsigned int while were at it). The third argument is the base to convert to - in this case, you would want base 16 for hexadecimal. Also, if given base 0, it will assume hex if the string begins with 0x, octal if it begins with 0, and decimal otherwise. It's a very useful function.

EDIT: Just noticed this, but while we're here, it's worth mentioning that you should generally not use an int to index arrays. The C standard defines a type, called size_t, which is designed to store array indices. It is generally an unsigned int or unsigned long or something, but is guaranteed to be big enough to store any array or pointer offset you can use.

The problem with using just an int is that, theoretically, maybe, someday, someone could pass a string longer than INT_MAX, and then your int will overflow, probably wrap around, and start reading memory it probably shouldn't because it's using a negative index. This is highly unlikely, especially for a function like this, because the int value you return will overflow long before your int counter overflows, but it is an important thing to keep in mind.

To be technically correct, you should only use size_t type variables to index arrays, or at least only use unsigned types, unless you really want to try to access negative elements (which is usually a bad idea unless you know what you're doing). However, it's not a huge issue here.

OTHER TIPS

Create another function which turns a hex digit to an integer:

int hex_digit_to_integer(char digit) {
    if (digit >= 'A' && digit <= 'F') {
        return digit - 'A' + 10;
    } else if (digit >= 'a' && digit <= 'f') {
        return digit - 'a' + 10;
    } else if (digit >= '0' && digit <= '9') {
        return digit - '0';
    }

    return -1; // Bad input.
}

Notice how it handles four cases: * digit is an upper-case letter A..F, * digit is a lower-case letter a..f, * digit is a decimal digit 0..9, and * digit is none of the above.

Now use the new function in your original function:

int htoi(char f[]) {
    int z, n;
    n = 0;

    /* Loop until we get something which isn't a digit (hex_digit_to_integer returns something < 0). */
    for (z=0; hex_digit_to_integer(f[z]) >= 0; ++z) {
        n = 16 * n + hex_digit_to_integer(f[z]);
    }
}

Notice how much cleaner the new function looks?

If you're adventurous, you can use this magic function (which doesn't handle bad input, so you need to check that beforehand):

int hex_digit_to_integer(char digit) {
    return digit - (digit & 64 ? 55 : 48) & 15;
}

Replace all f[z] with a dedicated variable. Assign that variable with toupper(f[z])

Here is some code from the NPS NSRL Bloom package:

static int *hexcharvals = 0;

/** Initialization function is used solely for hex output
 */
static void nsrl_bloom_init()
{
    if(hexcharvals==0){
        /* Need to initialize this */
        int i;
        hexcharvals = calloc(sizeof(int),256);
        for(i=0;i<10;i++){
            hexcharvals['0'+i] = i;
        }
        for(i=10;i<16;i++){
            hexcharvals['A'+i-10] = i;
            hexcharvals['a'+i-10] = i;
        }
    }
}

/**
 * Convert a hex representation to binary, and return
 * the number of bits converted.
 * @param binbuf output buffer
 * @param binbuf_size size of output buffer in bytes.
 * @param hex    input buffer (in hex)
 */
int nsrl_hex2bin(unsigned char *binbuf,size_t binbuf_size,const char *hex)
{
    int bits = 0;
    if(hexcharvals==0) nsrl_bloom_init();
    while(hex[0] && hex[1] && binbuf_size>0){
        *binbuf++ = ((hexcharvals[(unsigned char)hex[0]]<<4) |
                     hexcharvals[(unsigned char)hex[1]]);
        hex  += 2;
        bits += 8;
        binbuf_size -= 1;
    }
    return bits;
}

This code is designed to be super-fast, handle both upper-case and lower-case hex, and handle hex strings of any length. The function nsrl_hex2bin() takes a binary buffer, the size of that buffer, and the hex string you want to convert. It returns the number of bits that actually got converted.

Oh, if you want just an integer, then you can multiply out the bytes (for endian-independent code), or just do a cast (for endian-dependent code).

You could try sscanf instead:

#include <stdio.h>

...

//NOTE: buffer overflow if f is not terminated with \0 !!
int htoi(char f[]){
  int intval = -1;
  if (EOF == sscanf(f, "%x", &intval))
    return -1; //error
  return intval;
}

Two options:

Convert to upper case before you do your scan.

Add a second if in the four loop that handles lower case.

Try this instead:

int htoi (char f[]) {
    int  z, n;
    n = 0;
    for (z = 0; f[z] != '\0'; ++z) { 
        if (f[z] >= '0' && f[z] <= '9') {
            n = n * 16 + f[z] - '0';
        } else {
            if (f[z] >= 'A' && f[z] <= 'F') {
                n = n * 16 + f[z] - 'A' + 10;
            } else {
                if (f[z] >= 'a' && f[z] <= 'f') {
                    n = n * 16 + f[z] - 'a' + 10;
                } else {
                    break;
                }
            }
        }
    }
    return n;
}

It still treats the input the same way as yours (I'd tend to use pointers but they are sometimes hard to understand by a beginner) but introduces three separate cases, 0-9, A-F and a-f, treating each appropriately.

Your original code would actually allow erroneous characters (the six between '9' and 'A') and produce incorrect results based on them.

Note that this new code only normally terminates the loop at end of string. Finding an invalid hex character will break out of the loop, functionally identical to your terminating condition.

Use strtol() please. This is the standard C90 function and much more powerful than most of naive implementations. It also supports seamless conversion from dec (no prefix), hex (0x) and oct (starting with 0).

An inplementation using shift rotation instead of multiplication.

int HexToDec(char *Number)
{

    unsigned int val = 0;

    int i , nibble;
    for(i = strlen( Number ) - 1; i >= 0; --i, nibble += 4)
    {
        const char hex = Number[i];
        if (hex >= '0' && hex <= '9')
            val += (hex - '0')<<nibble;
        else if (hex >= 'A' && hex <= 'F')
            val += (hex - 'A' + 10)<<nibble;
        else if (hex >= 'a' && hex <= 'f')
            val += (hex - 'a' + 10)<<nibble;
        else
            return -1;
    }
    return val;
}

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow