Question

I get confused between two algorithms for generating random characters. I have four characters types, lowercase, uppercase, numbers and special characters. According to the type of random characters requested, it should produce characters. For example:

  • If the requested is pure lowercase, uppercase, numbers or special characters, the value should be chosen randomly from each type alone.
  • If the requested is lowercase and numbers it should be chosen from a mixed string of that two types.
  • etc

The algorithm I adopt for mixed characters type is as the following:

  1. Choose random string from each type with the characters length requested.
  2. Concat those strings and then choose again random characters from the new string.

I think that this algorithm assure equal probabilities for each type to be available in the sample space. However, in some cases the resultant string of mixed types may has no one of the type presented in the string and In some many cases we may get some repeated characters.

There is another algorithm depends on pre-mixing the types before choose the random string, However, the absence of one type is more likely occurred but the repetition is reduced.

I need to know if there is another algorithm that offers balance between presentation of each type and reduce repetition?

The following is the code that implement the algorithm that I mention, firstly, above. It is CakePHP 1.2.x component and a working example could be found here:

<?php
class RandomStringComponent extends Object{
  var $name = 'RandomString';
  var $lowerCase = 'qwertyuiopasdfghjklzxcvbnm';
  var $upperCase = 'ASDFGHJKLQWERTYUIOPZXCVBNM';
  var $numbers = '9874563210';
  var $specialChars = '!@#$%)-+/}[<>~=,;:|`{.?]^&*(';
  var $outputForm = 'LcUcNuSp';
  var $settings = array();
  var $controller = null;      

  var $_defaults = array(
      'outputForm' => 'LcUcNuSp',
      'length' => 6
  );

  function initialize(&$controller, $settings){
    $this->settings = array_merge($this->_defaults, $settings);

  }

  function getRand($l = null){
    if (is_null($l)) $l = $this->settings['length'];
    switch ($this->settings['outputForm']){
      case 'Lc':
        return $this->_Lc($l);
        break;
      case 'Uc':
        return $this->_Uc($l);
        break;
      case 'Sp':
        return $this->_Sp($l);
        break;
      case 'Nu':
        return $this->_Nu($l);
        break;
      case 'LcUc':
        return $this->_setRand($this->_Lc($l).$this->_Uc($l),$l);
        break;
      case 'LcNu':
        return $this->_setRand($this->_Lc($l).$this->_Nu($l),$l);
        break;
      case 'UcNu':
        return $this->_setRand($this->_Uc($l).$this->_Nu($l),$l);
        break;
      case 'LcUcNu':
        return $this->_setRand($this->_Lc($l).$this->_Uc($l).$this->_Nu($l),$l);
        break;
      default:
        return $this->_setRand($this->_Lc($l).$this->_Uc($l).$this->_Nu($l).$this->_Sp($l),$l);
        break;
    }
  }


  function _Lc($l){
    $output = '';
    for ($i = 0; $i < $l; $i++){      
      $output .= $this->lowerCase[mt_rand(0, strlen($this->lowerCase)-1)];    
    }
    return $output;
  }
  function _Uc($l){
    $output = '';
    for ($i = 0; $i < $l; $i++){
      $output .= $this->upperCase[mt_rand(0, strlen($this->upperCase)-1)];    
    }
    return $output;
  }
  function _Sp($l){
    $output = '';
    for ($i = 0; $i < $l; $i++){
      $output .= $this->specialChars[mt_rand(0, strlen($this->specialChars)-1)];      
    }
    return $output;
  }
  function _Nu($l){
    $output = '';
    for ($i = 0; $i < $l; $i++){
      $output .= $this->numbers[mt_rand(0, strlen($this->numbers)-1)];    
    }
    return $output;
  }

  function _setRand($str, $l){

    $output = '';
    for ($i = 0; $i < $l; $i++){
      $output .= $str[mt_rand(0, strlen($str)-1)];    
    }
    return $output; 
  }  
}
?>
Was it helpful?

Solution

You can eliminate repetition by sampling without replacement. If you want "some" repetition, you sample with repetition for some k values, append it to the original list, and sample without replacement.

Eg: lowerCase = 'qwertyuiopasdfghjklzxcvbnm'

sampledLowercase = Sample 5 letters from lowercase = 'abbcd'

Them sample from lowercase+sampledLowercase without replacement.

Here the probability of finding a repeated character is about 5/(26+5) = 5/31. The longer the length of sampledLowercase, higher the probability that there will be duplicate letters.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top