Question

I am writing a very small URL shortener with Dancer. It uses the REST plugin to store a posted URL in a database with a six character string which is used by the user to access the shorted URL.

Now I am a bit unsure about my random string generation method.

sub generate_random_string{
    my $length_of_randomstring = shift; # the length of 
                                        # the random string to generate

    my @chars=('a'..'z','A'..'Z','0'..'9','_');
    my $random_string;
    for(1..$length_of_randomstring){
        # rand @chars will generate a random 
        # number between 0 and scalar @chars
        $random_string.=$chars[rand @chars];
    }

    # Start over if the string is already in the Database
    generate_random_string(6) if database->quick_select('urls', { shortcut => $random_string });

    return $random_string;
}

This generates a six char string and calls the function recursively if the generated string is already in the DB. I know there are 63^6 possible strings but this will take some time if the database gathers more entries. And maybe it will become a nearly infinite recursion, which I want to prevent.

Are there ways to generate unique random strings, which prevent recursion?

Thanks in advance

Was it helpful?

Solution

We don't really need to be hand-wavy about how many iterations (or recursions) of your function there will be. I believe at every invocation, the expected number of iterations is geomtrically distributed (i.e. number of trials before first success is governed by the geomtric distribution), which has mean 1/p, where p is the probability of successfully finding an unused string. I believe that p is just 1 - n/63^6, where n is the number of currently stored strings. Therefore, I think that you will need to have stored 30 billion strings (~63^6/2) in your database before your function recurses on average more than 2 times per call (p = .5).

Furthermore, the variance of the geomtric distribution is 1-p/p^2, so even at 30 billion entries, one standard deviation is just sqrt(2). Therefore I expect ~99% of the time that the loop will take fewerer than 2 + 2*sqrt(2) interations or ~ 5 iterations. In other words, I would just not worry too much about it.

OTHER TIPS

From an academic stance this seems like an interesting program to work on. But if you're on the clock and just need random and distinct strings I'd go with the Data::GUID module.

use strict;
use warnings;
use Data::GUID qw( guid_string );

my $guid = guid_string();

Getting rid of recursion is easy; turn your recursive call into a do-while loop. For instance, split your function into two; the "main" one and a helper. The "main" one simply calls the helper and queries the database to ensure it's unique. Assuming generate_random_string2 is the helper, here's a skeleton:

do {
   $string = generate_random_string2(6);
} while (database->quick_select(...));

As for limiting the number of iterations before getting a valid string, what about just saving the last generated string and always building your new string as a function of that?

For example, when you start off, you have no strings, so let's just say your string is 'a'. Then the next time you build a string, you get the last built string ('a') and apply a transformation on it, for instance incrementing the last character. This gives you 'b'. and so on. Eventually you get to the highest character you care for (say 'z') at which point you append an 'a' to get 'za', and repeat.

Now there is no database, just one persistent value that you use to generate the next value. Of course if you want truly random strings, you will have to make the algorithm more sophisticated, but the basic principle is the same:

  1. Your current value is a function of the last stored value.
  2. When you generate a new value, you store it.
  3. Ensure your generation will produce a unique value (one that did not occur before).

I've got one more idea based on using MySQL.

create table string (
    string_id int(10) not null auto_increment,
    string varchar(6) not null default '',
    primary key(string_id)
);

insert into string set string='';

update string
    set string = lpad( hex( last_insert_id() ), 6, uuid() )
    where string_id = last_insert_id();

select string from string
    where string_id = last_insert_id();

This gives you an incremental hex value which is left padded with non-zero junk.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top