Question

It took me a long time to figure out what was causing one of my websites to malfunction after I migrated it to a better hosting subscription.

I use a 'self-made' uniqueId generator for everything that must be unique, but this uniqueness is not random: the same input always gives the same id. I use it to communicate between several services and to generate reproducible unique 'numbers' for files, articles and so on.

This is the function I wrote to generate a unique id. I never had problems with it (I think it has never run on a 64-bit system before). I know this uniqueness is limited (64.000), but that never led to a problem until now.

function suGetHashCode($s)
{
 $hash=0;
 $c=(is_string($s))?strlen($s):0;
 $i=0;
 while($i<$c) 
 {
   $hash = (($hash << 5)-$hash)+ord($s{$i++});
   //hash = hash & hash; // Convert to 32bit integer
 }
 return ( $hash < 0 )?(($hash*-1)+0xFFFFFFFF):$hash; // convert to unsigned int
} 

function suUniqueId( $s, $bAddLen = false )
{ 
  $i = base_convert( suGetHashCode( $s ), 10, 32 );
  if( $bAddLen && is_string($s) )
   { $i.=('-'.suGetLz( dechex( strlen($s)*4 ), 3 )); } 

  return $i; 
}

function suGetLz( $i, $iMaxLen ) // Leading zero
{
  if( !is_numeric( $i ) || $i < 0 || $iMaxLen <= 0 )
   { return $i; }
  $c = strlen( $i );
  while( $c < $iMaxLen )
   { $c++; $i='0'.$i; } 
  return $i;
}   

On the new system, the maximum integer value is:

PHP_INT_MAX = 9223372036854775807

On other system(s) it is:

PHP_INT_MAX = 2147483647

I am not a math person, but I think the problem is caused by the 0xFFFFFFFF increment that is applied when the hash is negative (I think it will never be negative on this new system).

How can I change the function so that it produces the same unique ids as on the other systems?

For example: It produces the same id for different strings on the new hosting server:

 $sThisUrl = '<censored>';
 var_dump( suUniqueId($sThisUrl) ); // Produce: 1l5kc37uicb  
 $sThisUrl = '<censored>';
 var_dump( suUniqueId($sThisUrl) ); // Produce the same id as above: 1l5kc37uicb

But it should behave as it does on the older systems:

 $sThisUrl = '<censored>';
 var_dump( suUniqueId($sThisUrl) ); // Produce: a46q6nd  
 $sThisUrl = '<censored>';
 var_dump( suUniqueId($sThisUrl) ); // Produce: 2mirj1h

Note: The string is separated into parts to avoid stackoverflow (see this link).

EDIT: Removed filenames

Does anyone know how to deal with this problem?


Solution

I suggest you truncate after every character is processed:

$hash = (($hash << 5)-$hash)+ord($s{$i++});
$hash = $hash & 0xFFFFFFFF; // Convert to 32bit integer

At least on my 64bit system this leads to the desired 2mirj1h in your second example, although without this modification I got 1c6ta2qjga7 and not 1l5kc37uicb as you did.
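Put together, the truncation might look like the sketch below. The name suGetHashCode32 is made up for illustration, and the sketch assumes a 64-bit PHP build, where 0xFFFFFFFF is an integer. It uses $s[$i] because the curly-brace offset syntax was removed in PHP 8. Only the 2mirj1h case above is confirmed; whether the result matches the old server for every input is an open assumption.

```php
<?php
// Sketch only: suGetHashCode32 is a hypothetical name, not part of the original code.
// Assumes a 64-bit PHP build, where 0xFFFFFFFF fits in an integer.
function suGetHashCode32($s)
{
    $hash = 0;
    $c = is_string($s) ? strlen($s) : 0;
    for ($i = 0; $i < $c; $i++) {
        // Same step as the original: hash * 31 + character code.
        $hash = (($hash << 5) - $hash) + ord($s[$i]);
        // Keep only the low 32 bits after every character.
        $hash = $hash & 0xFFFFFFFF;
    }
    return $hash; // always in the range 0 .. 0xFFFFFFFF
}
```

Because the value is masked after every character, the intermediate result never grows beyond 37 bits, so it cannot overflow a 64-bit integer.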

I'd also change the return statement to simply return $hash. Either your system can represent unsigned 32-bit numbers correctly, in which case the preceding mask already forces that interpretation, or it can't, in which case the added computation won't get you there either and you'd have to split the number into bit groups and stringify them individually.
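Splitting into bit groups could look roughly like this. The helper name suHashToHex is hypothetical, and the sketch assumes $hash is still an integer (not a float produced by an overflow). Each 16-bit half is masked separately, so the result does not depend on whether the platform sees the value as signed or unsigned.

```php
<?php
// Sketch only: suHashToHex is a hypothetical helper name.
// Formats the low 32 bits of an integer as 8 hex digits.
function suHashToHex($hash)
{
    $high = ($hash >> 16) & 0xFFFF; // upper 16 bits of the 32-bit value
    $low  = $hash & 0xFFFF;         // lower 16 bits
    return sprintf('%04x%04x', $high, $low);
}
```

The output is hexadecimal rather than the base 32 used by suUniqueId, so ids produced this way would not match the existing ones.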

Of course, the easiest solution would be to use a well-established hashing algorithm, e.g. via PHP's hash function. Add a secret salt if you fear this might open you to attacks. If the output of such a hash is too long, you can simply take part of it. You can convert it to any base you like, so you don't have to use the hexadecimal notation common for hashes. Using a cryptographic hash would also reduce the chance of a collision; for example, in your case the document generbM.js in the same path would yield the same hash.
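A minimal sketch of that idea follows. The function name suHashedId, its parameters and the choice of SHA-256 are assumptions for illustration. It keeps 12 hex digits (48 bits) so the value stays within the float precision that base_convert relies on. Ids generated this way will not match the ones your old function produced.

```php
<?php
// Sketch only: suHashedId and its parameters are hypothetical.
function suHashedId($s, $salt = '', $hexDigits = 12)
{
    // SHA-256 gives the same result on 32-bit and 64-bit builds.
    $hex = hash('sha256', $salt . $s);
    // Take only the leading part of the digest to shorten the id.
    $short = substr($hex, 0, $hexDigits);
    // Same base 32 notation that suUniqueId uses.
    return base_convert($short, 16, 32);
}
```

Keeping fewer digits makes the id shorter but raises the chance of two strings sharing an id.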

Other tips

If I were you I would write a unit test to make sure that you get the same results on a 32 bit and a 64 bit machine for that one function.

The loop should be changed to something like this:

while($i<$c) 
{
  $hash = (($hash << 5)-$hash)+ord($s{$i++});
  $hash = $hash & 0xFFFFFFFF; // Convert to 32bit integer
}
$hash = ( $hash < 0 )?(($hash*-1)+0xFFFFFFFF):$hash; // convert to unsigned int
return $hash & 0xFFFFFFFF; // Convert to 32bit integer

Your unit test can run against the original function on the 32-bit machine and save the output. Then run it on the 64-bit machine and compare against those 32-bit results. If any result differs, you know that you still don't have a 1-to-1 equivalent.
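The record-and-compare workflow could be sketched without a test framework as follows. The helper names and the JSON fixture format are assumptions, not an existing tool.

```php
<?php
// Sketch only: helper names and the JSON fixture format are hypothetical.

// Run on the reference (32-bit) machine: store input/output pairs.
function suRecordFixtures(callable $fn, array $inputs, $file)
{
    $pairs = array();
    foreach ($inputs as $input) {
        $pairs[] = array('input' => $input, 'output' => (string) $fn($input));
    }
    file_put_contents($file, json_encode($pairs));
}

// Run on the new (64-bit) machine: return every pair whose output changed.
function suCompareFixtures(callable $fn, $file)
{
    $pairs = json_decode(file_get_contents($file), true);
    $mismatches = array();
    foreach ($pairs as $pair) {
        $got = (string) $fn($pair['input']);
        if ($got !== $pair['output']) {
            $mismatches[] = array(
                'input'    => $pair['input'],
                'expected' => $pair['output'],
                'got'      => $got,
            );
        }
    }
    return $mismatches;
}

// Usage idea (file name is an example):
//   old server: suRecordFixtures('suUniqueId', $urls, 'ids-32bit.json');
//   new server: $diff = suCompareFixtures('suUniqueId', 'ids-32bit.json');
//   an empty $diff means every recorded id was reproduced.
```

Use real URLs and file names from production as inputs, since short test strings may never trigger the overflow.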

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow