crc16 algorithm from C++ to bash

https://stackoverflow.com/questions/8564267

21-03-2021
|

Question

I am trying to implement a CRC16 checksum in bash. I'm porting from an existing piece of C++ code. I'm almost there, but I am getting different answers.

I don't quite see why the checksums between the C++ code and the bash script are different.

Another set of eyes would be a big help.

Here is the C++ code:

uint16_t Encoder::checksum(std::string thestring)
{
    uint8_t d, e, f;
    uint16_t c, r, crccalc;
    c = 0xffff;

    for (unsigned int i = 0; i < thestring.length(); i++)
    {
        d = thestring[i];
        e = c ^ d;
        f = e ^ (e << 4);
        r = (c >> 8) ^ (f << 8) ^ (f << 3) ^ (f >> 4);
        c = r;
    }
    c ^= 0xffff;
    crccalc = c;
    return crccalc;
}

And here is my bash code:

function calc_crc16()
{
    string=$1
    while read -d "" -n 1 ; do astring+=( "$reply" ) ; done <<< "$string"

    cnt=${#astring[@]}
    c=0xffff

    for ((x=0;x<$cnt;x++)); do
        char=${astring[$x]}
        e=$(($c ^ $char))
        s=$(($e << 4))
        f=$(($e ^ $s))
        t1=$(($c >> 8))
        t2=$(($f << 8))
        t3=$(($f << 3))
        t4=$(($f >> 4))
        r1=$(($t1 ^ $t2 ^ $t3 ^ $t4))
        c=$r1
    done
    c=$c ^ 0xffff
    echo "checksum = $c"
}

Is it going to have something to do with the size of the ints? I'm guessing there's not much I can do about that in bash.

I'm getting an actual number, but it doesn't match the C++, which I know works correctly. does anyone see anything where I may be screwing things up?

Solution 3

ok. with Sorpigal's help, i've got a working version.

i suspect this could all be done within an awk script, which may run a lot faster. i may try that next.

thank you all for the help. i don't mean to steal the solution here, but i worked on it, and figure it is worth putting up.

anyway, here is a working version:

function calc_crc16()
{
    while read -r -d "" -n 1 ; do astring+=( "$REPLY" ) ; done <<< "$1"

    cnt=${#1}
    c=65535

    for ((x=0;x<$cnt;x++)); do
        char=$(printf '%d' \'"${1:$x:1}")
        e=$(((c ^ char) & 0x00FF))
        s=$(((e << 4) & 0x00FF))
        f=$(((e ^ s) & 0x00FF))
        r1=$(((c >> 8) ^ (f << 8) ^ (f << 3) ^ (f >> 4)))
        c=$r1
    done
    c=$((c ^ 0xffff))
    echo "checksum = $c"
}

OTHER TIPS

First problem is near the top

while read -d "" -n 1 ; do astring+=( "$reply" ) ; done <<< "$string"

$reply is wrong, since you didn't specify a variable name for read the name is $REPLY.

Next error is at the end

c=$c ^ 0xffff

This should be

c=$(($c ^ 0xffff))

At least this way it will run without errors, correctness and appropriateness is something else.

Correctness issues: What if the input string has a space? This will break horribly. Always quote variable exapnsions

Change

char=${astring[$x]}

char="${astring[$x]}"

Strangely this rule is different inside $(()) constructs. Your bit operations should reference variables without any $ in these cases

e=$(( c ^ char ))
s=$(( e << 4 ))
f=$(( e ^ s ))
t1=$(( c >> 8 ))
t2=$(( f << 8 ))
t3=$(( f << 3 ))
t4=$(( f >> 4 ))
r1=$(( t1 ^ t2 ^ t3 ^ t4))

And later

c=$(( c ^ 0xffff ))

This will cause variables to be expanded and whitespace not to blow things up.

In general you should also pass -r to read, see help read for what it does.

Why make an extra copy of $1 before processing it into an array? Using

while read -d "" -n 1 ; do astring+=( "$REPLY" ) ; done <<< "$1"

is sufficient.

It's probably not necessary to turn your input into an array before processing. Instead you can slice characters out of the string in your loop, which is closer to what the C++ version is doing. Replace

char="${astring[$x]}"

with

char="${1:$x:1}"

This is operating directly on the function paramter; since we're no longer making a copy of that we also need to get $cnt a different way

cnt=${#1}

But you really have even bigger problems than this, like the fact that a character isn't an integer in bash. To convert you must use the following syntax:

printf '%d' \'a

where a is the character to convert. Inserting this into the context of the script it would be

char=$(printf '%d' \'"${1:$x:1}")

Now we're getting somewhere, but I really must ask you to consider whether all of this is really worth it. Even if you can make it work, what do you gain?

just for future reference, here is the awk script that i came up with.

this works just as fast as the C++ code i have, which is basically instantaneous. the bash takes about 10 seconds to run for the same string. the awk is much faster.

function awk_calc_crc16()
{
    output=$(echo $1 | awk 'function ord(c){return chmap[c];}
    BEGIN{c=65535; for (i=0; i < 256; i++){ chmap[sprintf("%c", i)] = i;}}
    {
        split($0, chars, "");
        for(i = 1; i <= length(chars); i++)
        {
            cval=ord(chars[i])
            e=and(xor(c, ord(chars[i])), 0x00FF);
            s=and(lshift(e, 4), 0x00FF);
            f=and(xor(e, s), 0x00FF);
            r=xor(xor(xor(rshift(c, 8), lshift(f, 8)), lshift(f, 3)), rshift(f, 4));
            c=r;
        }
    }
    END{c=xor(c, 0xFFFF); printf("%hu", c);}')
    echo $output;
}

shorter (and faster) version
changes:
- the while loop at the beginning is not needed
- r1 is not needed
- cnt is not needed
- use capital letters for bash variables
- one less "\" (backslash) in char=$(printf …
- removed leading 0s for 0xFF

function calc_crc16() {
  CRC=0xFFFF

  for ((X=0; X<${#1}; X++)); do
    CHAR=$(printf '%d' "'${1:$X:1}")
    E=$(((CRC ^ CHAR) & 0xFF))
    S=$(((E << 4)     & 0xFF))
    F=$(((E ^ S)      & 0xFF))
    CRC=$(((CRC >> 8) ^ (F << 8) ^ (F << 3) ^ (F >> 4)))
  done

  let CRC^=0xFFFF
  printf "0x%X\n" $CRC
}

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow