Question

Does anybody know an algorithm that generates a unique 8- or 9-digit number for a given string? A PHP example would be best; if not, then at least the algorithm.


Solution

You could use crc32() and return only the first $len digits of its unsigned decimal value:

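<?php 
// Unsigned decimal CRC32 (up to 10 digits), truncated to its first $len digits.
// Note: if the checksum itself has fewer than $len digits, the result is shorter too.
function crc_string($str, $len){
    return substr(sprintf("%u", crc32($str)),0,$len);
}
echo crc_string('some_string', 8);//65585849
?>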
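A variant of the same idea, as a sketch only: instead of keeping the leading digits, reduce the checksum modulo 10^$len and left-pad with zeros. That always yields exactly $len digits and spreads values over the whole 0 to 10^$len - 1 range, whereas the leading 8 digits of a 10-digit CRC32 all fall between 10000000 and 42949672. The name crc_digits is hypothetical, and the code assumes 64-bit PHP, where crc32() already returns a non-negative integer. It is still a 32-bit checksum, so collisions remain possible.

```php
<?php
// Hypothetical helper: CRC32 reduced modulo 10^$len, zero-padded to $len digits.
// Assumes 64-bit PHP, where crc32() returns 0..4294967295 (never negative).
function crc_digits(string $str, int $len): string {
    $crc = crc32($str);          // unsigned 32-bit checksum
    $mod = 10 ** $len;           // e.g. 100000000 for $len = 8
    return sprintf('%0' . $len . 'd', $crc % $mod);
}

echo crc_digits('some_string', 8), "\n";
echo crc_digits('some_string', 9), "\n";
```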

Edit

After running a collision/reliability test against my answer: you are likely to get collisions at length 8, probably fewer at length 9 and fewer still at length 10. Even the full 10-digit value is only a 32-bit checksum, so collisions can never be ruled out. In my test I hashed the incrementing values 0 to 99,999 (truncated to 8 digits) and got 26 collisions, the first at iteration 36,084.

<?php 
set_time_limit(0);
header('Content-type: text/html; charset=utf-8');
$time_start = microtime(true);

function crc_string($str, $len){
    return substr(sprintf("%u", crc32($str)),0,$len);
}

echo 'Started, please wait...<br />';
$record = array();
$collisions = 0;
for($i=0; $i<100000;$i++){

    $new = crc_string($i, 8);
    // in_array() is a linear scan, so the loop is O(n^2) overall;
    // this likely accounts for most of the long run time below
    if(in_array($new,$record)){
        // array_search() returns the index in $record; colliding values are never
        // appended, so after the first collision this index lags the iteration number
        $match = array_search($new,$record);
        $took_time = microtime(true) - $time_start;
        echo $new.' has collided for iteration '.$i.' matching against a previous iteration ('.$match.') '.$record[$match].' (Process time: '.round($took_time,2).'seconds)<br />';
        $collisions++;
    }else{
        $record[]=$new;
    }

    // flush progress to the browser; ob_flush() needs an active output buffer
    if(ob_get_level() > 0){
        ob_flush();
    }
    flush();
}
echo 'Successfully iterated 100k incrementing values and '.$collisions.' collisions occurred; total processing time: '.round((microtime(true) - $time_start),2).'seconds.';
?>

Test result:

Started, please wait...
38862356 has collided for iteration 36084 matching against a previous iteration (8961) 38862356 (Process time: 165.47seconds)
18911887 has collided for iteration 36887 matching against a previous iteration (8162) 18911887 (Process time: 172.79seconds)
37462269 has collided for iteration 38245 matching against a previous iteration (33214) 37462269 (Process time: 185.81seconds)
20153794 has collided for iteration 38966 matching against a previous iteration (6083) 20153794 (Process time: 192.87seconds)
41429622 has collided for iteration 40329 matching against a previous iteration (24999) 41429622 (Process time: 206.41seconds)
20784356 has collided for iteration 48908 matching against a previous iteration (27095) 20784356 (Process time: 302.75seconds)
39932561 has collided for iteration 51926 matching against a previous iteration (12367) 39932561 (Process time: 340.88seconds)
14372225 has collided for iteration 53032 matching against a previous iteration (13211) 14372225 (Process time: 355.46seconds)
16636457 has collided for iteration 55490 matching against a previous iteration (39250) 16636457 (Process time: 389.44seconds)
23059743 has collided for iteration 63126 matching against a previous iteration (39808) 23059743 (Process time: 504.1seconds)
13627299 has collided for iteration 63877 matching against a previous iteration (21973) 13627299 (Process time: 516.08seconds)
24647738 has collided for iteration 63973 matching against a previous iteration (47328) 24647738 (Process time: 517.62seconds)
14471815 has collided for iteration 71118 matching against a previous iteration (37805) 14471815 (Process time: 641.93seconds)
13253269 has collided for iteration 73602 matching against a previous iteration (33064) 13253269 (Process time: 687.53seconds)
10732050 has collided for iteration 73706 matching against a previous iteration (9197) 10732050 (Process time: 689.44seconds)
18919349 has collided for iteration 80358 matching against a previous iteration (73190) 18919349 (Process time: 819.89seconds)
40795042 has collided for iteration 81875 matching against a previous iteration (31127) 40795042 (Process time: 851.3seconds)
14609922 has collided for iteration 82498 matching against a previous iteration (17366) 14609922 (Process time: 864.29seconds)
20425272 has collided for iteration 83914 matching against a previous iteration (9858) 20425272 (Process time: 894.32seconds)
24790147 has collided for iteration 84519 matching against a previous iteration (9754) 24790147 (Process time: 907.34seconds)
35605337 has collided for iteration 91434 matching against a previous iteration (36127) 35605337 (Process time: 1060.5seconds)
30935494 has collided for iteration 91857 matching against a previous iteration (91704) 30935494 (Process time: 1070.17seconds)
28520037 has collided for iteration 92929 matching against a previous iteration (28847) 28520037 (Process time: 1095.53seconds)
31109474 has collided for iteration 95584 matching against a previous iteration (30349) 31109474 (Process time: 1159.36seconds)
40842617 has collided for iteration 97330 matching against a previous iteration (13609) 40842617 (Process time: 1203.19seconds)
20309913 has collided for iteration 99224 matching against a previous iteration (94210) 20309913 (Process time: 1250.54seconds)
Successfully iterated 100k incrementing values and 26 collisions occurred; total processing time: 1269.98seconds.
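The test above checks for duplicates with in_array(), which rescans every stored value on each iteration. A sketch of the same count that remembers each value as an array key, so the duplicate check is a constant-time isset(), should report the same collision count because it applies the same test, only much faster. The function name count_collisions is hypothetical; crc_string is copied from the answer.

```php
<?php
// crc_string() as in the answer: first $len digits of the unsigned CRC32.
function crc_string($str, $len) {
    return substr(sprintf("%u", crc32($str)), 0, $len);
}

// Hypothetical helper: hash the strings "0" .. ($n - 1) and count how many
// produce a truncated checksum that was already produced by an earlier one.
function count_collisions(int $n, int $len): int {
    $seen = [];          // truncated checksum => true, used as a hash set
    $collisions = 0;
    for ($i = 0; $i < $n; $i++) {
        $value = crc_string((string) $i, $len);
        if (isset($seen[$value])) {
            $collisions++;
        } else {
            $seen[$value] = true;
        }
    }
    return $collisions;
}

echo count_collisions(100000, 8), " collisions\n";
```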

The conclusion is that unless you map an auto-incrementing value one-to-one onto the number, you will always get collisions at a fixed digit length, and more of them as your users table fills up:

echo sprintf("%08d", 1);        // 00000001
echo sprintf("%08d", 2);        // 00000002
// ...
echo sprintf("%08d", 99999999); // 99999999
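A minimal sketch of that one-to-one mapping, assuming the number is derived from an auto-increment id rather than from the string itself: every id from 0 to 10^$len - 1 gets its own zero-padded code, and an id that no longer fits is rejected instead of silently colliding. The name id_to_code is hypothetical.

```php
<?php
// Hypothetical helper: zero-padded fixed-width code for an auto-increment id.
// Distinct ids in range always give distinct codes, so there are no collisions.
function id_to_code(int $id, int $len = 8): string {
    if ($id < 0 || $id >= 10 ** $len) {
        throw new RangeException("id $id does not fit in $len digits");
    }
    return sprintf('%0' . $len . 'd', $id);
}

echo id_to_code(1), "\n";        // 00000001
echo id_to_code(99999999), "\n"; // 99999999
```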

You can work around this by appending another digit to a value that collides, or by widening the alphabet beyond 0-9, as the hexadecimal output of md5()/sha1() does (0-9 and a-f), though that defeats the purpose ;p

Good luck

OTHER TIPS

Collisions will occur, yes, but since you haven't stated why you need this, I'll assume that collisions don't matter.

You could take the md5 hash of the string (a hexadecimal string), convert it to decimal, and truncate it to the required number of digits.
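A sketch of that approach, under stated assumptions: converting the full 128-bit digest with hexdec() or base_convert() would go through floating point and lose precision, so this version keeps only the first 15 hex digits (60 bits), which hexdec() returns as an exact integer on 64-bit PHP, and then reduces modulo 10^$len with zero padding. The name md5_digits is hypothetical; distinct strings can still map to the same code.

```php
<?php
// Hypothetical helper: fixed-width decimal code derived from an md5 digest.
// Assumes 64-bit PHP: 15 hex digits = 60 bits < PHP_INT_MAX, so hexdec() stays an int.
function md5_digits(string $str, int $len = 9): string {
    $prefix = substr(md5($str), 0, 15);   // first 60 bits of the digest
    $n = hexdec($prefix);                  // exact integer, not a float
    return sprintf('%0' . $len . 'd', $n % (10 ** $len));
}

echo md5_digits('some_string'), "\n";    // 9 digits
echo md5_digits('some_string', 8), "\n"; // 8 digits
```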

This may be helpful to you: "php: number only hash?"

There are 10^9 distinct 9-digit numbers (counting leading zeros, 000000000 to 999999999), while there are 256^n byte strings of each length n (only 128^n if you restrict yourself to 7-bit ASCII).

Thus, by the pigeonhole principle, byte strings of length 4 or more cannot each get a unique 9-digit number: collisions must occur.
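Putting numbers on that pigeonhole argument, for byte strings (256 possible values per byte) and 9-digit codes counting leading zeros:

```latex
256^3 = 16\,777\,216 \;<\; 10^9 \;<\; 256^4 = 4\,294\,967\,296
```

So every length-3 byte string could in principle receive its own 9-digit number, but the roughly 4.29 billion length-4 strings cannot all fit into one billion codes.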

As alternatives, you can use a conventional hash function (accepting that it will collide) or use unbounded numbers.
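A sketch of the unbounded-number option, assuming a decimal string is an acceptable result: read the string's bytes as one big base-256 integer, with an extra leading 0x01 byte so strings that start with NUL bytes stay distinct. The mapping is collision-free, but the number grows by about 2.4 digits per input byte. Arithmetic is done on base-10^6 "limbs" so no GMP or BCMath extension is needed; the name string_to_number is hypothetical.

```php
<?php
// Hypothetical helper: injective string -> unbounded decimal number (as a string).
// Interprets "\x01" . $str as a big-endian base-256 integer.
function string_to_number(string $str): string {
    $bytes = "\x01" . $str;   // leading marker byte keeps leading NULs significant
    $limbs = [0];             // little-endian limbs, each 0..999999
    $n = strlen($bytes);
    for ($i = 0; $i < $n; $i++) {
        $carry = ord($bytes[$i]);
        // value = value * 256 + current byte
        foreach ($limbs as $k => $limb) {
            $v = $limb * 256 + $carry;
            $limbs[$k] = $v % 1000000;
            $carry = intdiv($v, 1000000);
        }
        while ($carry > 0) {
            $limbs[] = $carry % 1000000;
            $carry = intdiv($carry, 1000000);
        }
    }
    // most significant limb unpadded, the rest padded to 6 digits
    $out = (string) $limbs[count($limbs) - 1];
    for ($k = count($limbs) - 2; $k >= 0; $k--) {
        $out .= sprintf('%06d', $limbs[$k]);
    }
    return $out;
}

echo string_to_number('A'), "\n";   // 321 (0x0141)
echo string_to_number('ABC'), "\n"; // 21054019 (0x01414243)
```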

As already pointed out, uniqueness is impossible if the number has fewer bits than the strings you want to associate with it.

What you are looking for is a good hash function.

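Check out the MD6 algorithm. It has a configurable digest length of up to 512 bits, so you can request a digest small enough for 8 or 9 decimal digits: any value of at most 26 bits fits in 8 digits (2^26 - 1 = 67,108,863) and any value of at most 29 bits fits in 9 (2^29 - 1 = 536,870,911). I'm not aware of any PHP implementation; the reference implementation is written in C.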

Licensed under: CC-BY-SA with attribution