Question

I'm looking for a quick way to parse human-readable byte sizes (examples: 100, 1k, 2M, 4G) into byte values. The input is a char * and the output must be a size_t (i.e. an unsigned integer, likely 64-bit or 32-bit depending on the architecture). The code should detect invalid input and return a value indicating that the input was invalid.

Examples:

Input  => size_t result
-----------------------
"100"  => 100
"10k"  => 10240
"2M"   => 2097152
"4G"   => 4294967296 on 64-bit machine, error (overflow) on 32-bit machine
"ten"  => error

Here is an example fragment of code to be expanded to handle the unit prefixes:

int parse_human_readable_byte_size(char *input, size_t *result) {
    /* TODO: needs to support k, M, G, etc... */
    return sscanf(input, "%zu", result) == 1;
}

Here are some additional requirements:

  • must be done in C (no C++)
  • use only standard (or at least commonly available) libraries (e.g. sscanf, atoi)

The code is expected to run only a few times per program execution, so smaller readable code is favored over longer higher-performance code.


Solution

Here is a potential implementation. Code to detect all errors is included; fill in your own handling in place of the gotos if you like.

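/* Needs <errno.h>, <inttypes.h> (strtoumax) and <stdint.h> (SIZE_MAX); s is the input string. */
char *endp = s;
int sh;
errno = 0;
uintmax_t x = strtoumax(s, &endp, 10);
if (errno || endp == s) goto error;   /* out of range, or no digits at all */
switch(*endp) {
case 'k': sh=10; break;
case 'M': sh=20; break;
case 'G': sh=30; break;
case 0: sh=0; break;
default: goto error;                  /* unknown suffix */
}
if (x > SIZE_MAX>>sh) goto error;     /* shifting would overflow size_t */
x <<= sh;                             /* x now holds the size in bytes */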
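
As a rough sketch, the fragment above could be wrapped into a complete function like this. The name parse_size and the 1/0 return convention are assumptions made for illustration, not part of the original answer. The sketch also adds two checks that the fragment does not perform: it rejects input that does not start with a digit (strtoumax would otherwise accept leading whitespace and a sign, and would negate "-1" into a huge unsigned value), and it rejects characters that follow the suffix.

```c
#include <errno.h>
#include <inttypes.h>  /* strtoumax */
#include <stddef.h>    /* size_t */
#include <stdint.h>    /* SIZE_MAX, uintmax_t */

/* Hypothetical wrapper: returns 1 on success, 0 on invalid input. */
int parse_size(const char *s, size_t *result)
{
    char *endp;
    int sh;

    /* Assumption: only plain digits may start the input (no whitespace, no sign). */
    if (s == NULL || *s < '0' || *s > '9')
        return 0;

    errno = 0;
    uintmax_t x = strtoumax(s, &endp, 10);
    if (errno || endp == s)
        return 0;                       /* out of range, or nothing parsed */

    switch (*endp) {
    case 'k':  sh = 10; break;
    case 'M':  sh = 20; break;
    case 'G':  sh = 30; break;
    case '\0': sh = 0;  break;
    default:   return 0;                /* unknown suffix */
    }
    if (*endp != '\0' && endp[1] != '\0')
        return 0;                       /* trailing characters after the suffix */

    if (x > (SIZE_MAX >> sh))
        return 0;                       /* result would not fit in size_t */

    *result = (size_t)(x << sh);
    return 1;
}
```

A caller would write something like `size_t n; if (!parse_size(arg, &n)) { /* report error */ }`. On a platform with a 32-bit size_t, "4G" fails the overflow check, which matches the behavior requested in the question.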

Other tips

I would try a helper function that analyzes the input character by character.

Besides the obvious error checks, I would have it translate each unit symbol into a numeric constant and multiply the parsed number by that constant.
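
The idea described above might look roughly like the following sketch. The names unit_multiplier and parse_size_by_char are hypothetical, as is the 1/0 return convention; the sketch assumes decimal digits only, a single optional suffix from k, M, G, and a size_t of at least 32 bits.

```c
#include <stddef.h>   /* size_t */
#include <stdint.h>   /* SIZE_MAX */

/* Hypothetical helper: maps a unit symbol to its multiplier, 0 if unknown.
   Assumes size_t is at least 32 bits wide. */
static size_t unit_multiplier(char symbol)
{
    switch (symbol) {
    case 'k': return (size_t)1 << 10;
    case 'M': return (size_t)1 << 20;
    case 'G': return (size_t)1 << 30;
    default:  return 0;
    }
}

/* Returns 1 and stores the byte count on success, 0 on invalid input. */
int parse_size_by_char(const char *s, size_t *result)
{
    size_t value = 0;
    size_t mult = 1;

    if (s == NULL || *s < '0' || *s > '9')
        return 0;                        /* need at least one digit */

    for (; *s >= '0' && *s <= '9'; s++) {
        size_t digit = (size_t)(*s - '0');
        if (value > (SIZE_MAX - digit) / 10)
            return 0;                    /* value * 10 + digit would overflow */
        value = value * 10 + digit;
    }

    if (*s != '\0') {
        mult = unit_multiplier(*s);
        if (mult == 0 || s[1] != '\0')
            return 0;                    /* unknown unit or trailing characters */
    }

    if (value > SIZE_MAX / mult)
        return 0;                        /* scaled value would overflow */

    *result = value * mult;
    return 1;
}
```

Keeping the symbol-to-constant mapping in its own function means a new unit only needs one extra case, and both overflow checks are done with division so that no multiplication can wrap around.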

Based on the accepted answer, I updated the snippet. It supports floating-point input (like 1.5k) and hexadecimal input (like 0x55k), drops the gotos, and uses a string as the list of units, which avoids the switch and makes it easy to add new units.

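#include <errno.h>
#include <stdint.h>   /* SIZE_MAX */
#include <stdlib.h>   /* strtold */
#include <string.h>   /* strchr */

static const char *human_readable_suffix = "kMGT";

/* Returns target on success, NULL on invalid input. */
size_t *parse_human_readable(char *input, size_t *target) {
    char *endp = input;
    const char *match = NULL;
    unsigned shift = 0;
    errno = 0;

    /* strtold also accepts fractions (1.5) and hexadecimal (0x55) */
    long double value = strtold(input, &endp);
    if(errno || endp == input || value < 0)
        return NULL;

    /* strchr also matches the terminating '\0', which means "no suffix" */
    if(!(match = strchr(human_readable_suffix, *endp)))
        return NULL;

    if(*match) {
        if(endp[1] != '\0')   /* reject trailing characters after the suffix */
            return NULL;
        shift = (unsigned)(match - human_readable_suffix + 1) * 10;
    }

    /* 1ULL is at least 64 bits wide, so a shift of up to 40 is well defined */
    value *= (long double)(1ULL << shift);
    if(!(value < (long double)SIZE_MAX))   /* overflow, infinity or NaN */
        return NULL;

    *target = (size_t)value;
    return target;
}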

Here are the test results:

1337   =>           1337 [ok, expected: 1337]
857.54 =>            857 [ok, expected: 857]
128k   =>         131072 [ok, expected: 131072]
1.5k   =>           1536 [ok, expected: 1536]
8M     =>        8388608 [ok, expected: 8388608]
0x55   =>             85 [ok, expected: 85]
0x55k  =>          87040 [ok, expected: 87040]
1T     =>  1099511627776 [ok, expected: 1099511627776]
32.    =>             32 [ok, expected: 32]
-87    => error (expected)
abcd   => error (expected)
32x    => error (expected)

Full code can be found at: https://gist.github.com/maxux/786a9b8bf55fb0696f7e31b8fa3f6b9d

License: CC-BY-SA with attribution
Not affiliated with StackOverflow