Question

I am trying to translate the following C++ code into PHP. It simply converts an arbitrary integer value into a character from a pool of characters:

#include <cstdint>
#include <cstring>
#include <iostream>

uint8_t GetCharacter(uint32_t value) {
    static const char* valid_characters = "0123456789ABCDEFGHIJKLMOPQRSTUVWabcdefghijklmnopqrstuvw";
    static const size_t valid_characters_l = strlen(valid_characters);
    uint8_t c = valid_characters[value % valid_characters_l];
    return valid_characters[(value << c) % valid_characters_l];
}

int main() {
    uint32_t array[] = {176, 52, 608, 855};
    for (size_t i=0; i < 4; i++) {
        uint8_t c = GetCharacter(array[i]);
        std::cout << array[i] << ": " << (uint32_t) c << "\n";
    }
    return 0;
}

Which yields

176: 109
52: 114
608: 85
855: 65

The PHP code I've been able to come up with however yields the following:

176: 109
52: 114
608: 85
855: 104   // << Here's the problem

I am very sure I translated it exactly and I am unable to find the issue.

<?php

function getCharacter($index) {
    $chars = "0123456789ABCDEFGHIJKLMOPQRSTUVWabcdefghijklmnopqrstuvw";
    $c = ord(substr($chars, $index % strlen($chars)));
    return ord(substr($chars, ($index << $c) % strlen($chars)));
}

function main() {
    $array = array(176, 52, 608, 855);
    foreach ($array as $value) {
        echo "$value: " . getCharacter($value) . "\n";
    }
}

main();

Could someone point me in the right direction to solve this problem?

Solution

I believe the problem is that ($index << $c) evaluates to 3,586,129,920 for the input 855. That is greater than 2,147,483,647, the largest value a signed 32-bit integer can hold. PHP integers are always signed and their width depends on the build, so on a 32-bit build the result wraps around to a negative number and the arithmetic ends up platform dependent.

Actually, it is surprising that things work at all: you are shifting a 32-bit number by 32 or more bits (the shift counts here are character codes such as 66 or 117), which is undefined behavior in C++. You might want to rethink the underlying math, and in particular consider the overflow behavior of your code.
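To make the undefined shift concrete, here is a minimal C++ sketch that reduces the shift count to its low five bits before shifting. The name GetCharacterMasked is made up for this illustration. The masking is an assumption about what the original build effectively did (x86 shift instructions ignore the higher bits of the count); the C++ standard does not guarantee it.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Sketch only: GetCharacterMasked is a hypothetical name, not from the question.
// Shifting a uint32_t by 32 or more is undefined in C++, so the count is
// reduced to its low five bits first. This mirrors what x86 hardware does
// and is an assumption about the original build, not a language guarantee.
std::uint8_t GetCharacterMasked(std::uint32_t value) {
    static const char* valid_characters =
        "0123456789ABCDEFGHIJKLMOPQRSTUVWabcdefghijklmnopqrstuvw";
    static const std::size_t len = std::strlen(valid_characters);
    const std::uint8_t c = valid_characters[value % len];
    // Unsigned shift: bits pushed above bit 31 are simply discarded.
    const std::uint32_t shifted = value << (c & 31u);
    return static_cast<std::uint8_t>(valid_characters[shifted % len]);
}
```

Under that assumption, GetCharacterMasked returns 109, 114, 85 and 65 for 176, 52, 608 and 855, the same values the question reports for the C++ program.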

As a potential solution, you might notice that you have a finite number of possible inputs and corresponding outputs - you could actually create a direct lookup table. I believe I did this correctly (using the C++ version of your code with some modifications) - it surprised me a little bit that it didn't result in a 1:1 mapping. The lookup string becomes:

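$lookupString = "6RQtrpp07TU4AP1IDKmjl8QD7WjitmwUAcjT3AT9MuAu3PUKJtIb5vS";

And your PHP code can be reduced to

$value = ord(substr($lookupString, ($input - 48) % 55));

where 55 is the length of $lookupString. The 48 is needed because the table was generated starting at the value '0' (ASCII 48), so entry 0 belongs to the input 48. With that offset the table gives 109, 114, 85 and 65 for your four inputs. Note that it only matches the C++ code as long as the shifted value still fits in 32 bits.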

Interesting observation: a number of characters appear more than once; other characters are never used. This means that this is not a very "good" encoding scheme (if that is what it is trying to be).
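As a rough check of the table and of the repetition noted above, the following C++ sketch looks up the four inputs from the question and counts the distinct characters in the table. LookupCharacter and DistinctOutputs are hypothetical helper names. The offset of 48 is an assumption based on the generator code, which fills entry i with the result for the value i + '0'.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>

// The 55-character table quoted in this answer.
static const std::string kLookup =
    "6RQtrpp07TU4AP1IDKmjl8QD7WjitmwUAcjT3AT9MuAu3PUKJtIb5vS";

// Hypothetical helper: entry i of the table holds the result for the value
// i + '0', so 48 is subtracted before reducing modulo the table length.
// Assumes value >= 48 (the unsigned subtraction would wrap otherwise).
int LookupCharacter(std::uint32_t value) {
    return static_cast<unsigned char>(kLookup[(value - 48u) % kLookup.size()]);
}

// Number of different characters the table can ever produce.
std::size_t DistinctOutputs() {
    return std::set<char>(kLookup.begin(), kLookup.end()).size();
}
```

For 176, 52, 608 and 855, LookupCharacter returns 109, 114, 85 and 65, matching the C++ output in the question. DistinctOutputs() returns 34, so 21 of the 55 table entries repeat a character that already appears elsewhere.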

For reference, this is the code I used to determine the lookup string:

#include <cstdint>
#include <cstring>
#include <iostream>

static const char* valid_characters = "0123456789ABCDEFGHIJKLMOPQRSTUVWabcdefghijklmnopqrstuvw";

uint8_t GetCharacter(uint32_t value) {
    static const size_t valid_characters_l = strlen(valid_characters);
    uint8_t c = valid_characters[value % valid_characters_l];
    // Same expression as in the question: a shift count of 32 or more is
    // undefined behavior in C++, so the output depends on the compiler and CPU.
    return valid_characters[(value << c) % valid_characters_l];
}

int main() {
    // Entry i of the lookup string is the result for the value i + '0' (i + 48).
    for (size_t i = 0; i < 55; i++) {
        uint8_t c = GetCharacter(i + '0');
        std::cout << char(c);
    }
    std::cout << "\n";
    return 0;
}

OTHER TIPS

You're almost certainly encountering the "problem" because you're running on 32-bit PHP, or PHP on Windows (which doesn't support 64-bit integers regardless of the OS bitness). The issue is that you're overflowing the integer on the shift operation:

64-bit PHP:

PHP_INT_MAX: 9223372036854775807
C: 66, index: 176, strlen: 55, shift: 704, substr: mnopqrstuvw :: 176: 109
C: 117, index: 52, strlen: 55, shift: 468374361246531584, substr: 9ABCDEFGHIJKLMOPQRSTUVWabcdefghijklmnopqrstuvw :: 52: 57
C: 51, index: 608, strlen: 55, shift: 1369094286720630784, substr: hijklmnopqrstuvw :: 608: 104
C: 86, index: 855, strlen: 55, shift: 3586129920, substr: ABCDEFGHIJKLMOPQRSTUVWabcdefghijklmnopqrstuvw :: 855: 65

32-bit PHP:

PHP_INT_MAX: 2147483647
C: 66, index: 176, strlen: 55, shift: 704, substr: mnopqrstuvw :: 176: 109
C: 117, index: 52, strlen: 55, shift: 109051904, substr: rstuvw :: 52: 114
C: 51, index: 608, strlen: 55, shift: 318767104, substr: UVWabcdefghijklmnopqrstuvw :: 608: 85
C: 86, index: 855, strlen: 55, shift: -708837376, substr: hijklmnopqrstuvw :: 855: 104

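Unfortunately, PHP has no native 64-bit integers on 32-bit builds. One way to work around this is an extension such as GMP or BCMath. PHP 7.0 adds 64-bit integers on 64-bit Windows builds, but 32-bit builds still use 32-bit integers. PHP 7 also defines a left shift by more bits than the integer width as always returning 0, so the shift count in this code would have to be reduced first (for example with $c & 31) to keep matching the C++ output.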
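The difference between the last rows of the two logs comes down to the sign of the shifted value. The C++ sketch below emulates both cases for the input 855. CharAtPhpOffset and WrapToInt32 are hypothetical helpers written for this illustration: the first imitates how PHP's substr() treats a negative offset, and the second reinterprets the low 32 bits of a value as a signed integer.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: index the pool the way PHP's substr() does, where a
// negative offset counts back from the end of the string.
int CharAtPhpOffset(std::int64_t offset) {
    static const std::string pool =
        "0123456789ABCDEFGHIJKLMOPQRSTUVWabcdefghijklmnopqrstuvw";
    const std::int64_t len = static_cast<std::int64_t>(pool.size());
    if (offset < 0) offset += len;  // e.g. -16 selects pool[39]
    return static_cast<unsigned char>(pool[static_cast<std::size_t>(offset)]);
}

// Hypothetical helper: reinterpret the low 32 bits of v as a signed 32-bit
// integer, which is what a 32-bit PHP build is left with after the shift.
std::int64_t WrapToInt32(std::uint64_t v) {
    const std::int64_t low = static_cast<std::int64_t>(v & 0xFFFFFFFFull);
    return low >= 0x80000000ll ? low - 0x100000000ll : low;
}
```

With shifted = 855 << 22 = 3586129920, the unsigned remainder shifted % 55 is 10 and CharAtPhpOffset(10) gives 65. WrapToInt32(shifted) is -708837376, whose remainder modulo 55 is -16, and CharAtPhpOffset(-16) gives 104, the value the 32-bit log shows.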

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow