Question

I was toying around with arrays, populating with pseudo random numbers, finding minimum and maximum values and their indices and number of occurrences and I noticed something strange - when using srand seeded with time the number of minimum and maximum value occurrences is ALWAYS equal. This doesn't seem very random to me.

Is there an alternative way of getting a DIFFERENT number of minimum and maximum value occurrences, as one would expect with random numbers?

Here is my code (I am learning so it may be messy and inefficient, recommendations are welcome)

#include <cstdlib>
#include <iostream>
#include <time.h>

using namespace std;

void findExtremes( const int[], int); 

int main()
{
    const int lenght = 2000; //define lenght

    int array1[lenght];

    srand(time(0)); 

    for ( int i = 0; i < lenght; i++) //populate array with random numbers and print them out
    {
        array1[i] = rand() % 3000;
        cout << "Index " << i << " = " << array1[i] << endl;
    }

    findExtremes(array1, lenght);   // call fn

    return 0;
}

void findExtremes( const int array[], int size)
{
     int maxV, minV, maxI, minI;
     maxV = array[0];
     minV = array[0];
     minI = 0;
     maxI = 0;

     for ( int i = 1; i < size; i++)
     {
         if ( array[i] > maxV)
            {
                maxV = array[i];
                maxI = i;
            }
         if ( array[i] < minV)
            {
                minV = array[i];
                minI = i;
            }
     }

     //find the number of occurances for min and max values

     int minOcc = 0;
     int maxOcc = 0;

     for ( int i = 1; i < size; i++)
     {
             if (array[i] == minV)
                 minOcc++;
             if (array[i] == minV)
                 maxOcc++;
     }

     //output

     cout << "\nMinmim value is index " << minI << " with value " << minV << " and " << minOcc << " occurances" << endl;
     cout << "\nMaxium value is index " << maxI << " with value " << maxV << " and " << maxOcc << " occurances" << endl << "\n";  
}
Was it helpful?

Solution

For a start, they're actually pseudo random numbers, not random numbers. In any case, it may be that a a truly random sequence has that exact property that you're seeing :-) The sequence 1,1,1,1,1 is as equally likely to occur in a truly random set as much as 5,2,4,2,99.

If you want a "more random" random sequence, I wouldn't be using the normal ones shipped with C libraries (unless those libraries were written by people who understand randomness) - you should look into things like the Mersenne Twister, using /dev/random (if under Linux) and so on.

You may also want to look at this snippet of your code.

if (array[i] == minV)
    minOcc++;
if (array[i] == minV)
    maxOcc++;

I believe that last if should be comparing with maxV rather than minV. Otherwise there is zero chance that your minimum and maximum counts will be different.

When I make that change (and change % 3000 to % 30, to get a range of duplicates), I see:

Minmim value is index 112 with value 0 and 65 occurances
Maxium value is index 24 with value 29 and 58 occurances

And, not that it really matters in terms of this question, you may want to clean up your spelling somewhat:

  • lenght -> length.
  • minmum -> minimum
  • maxium -> maximum
  • occurances -> occurrences

OTHER TIPS

I perform numerical simulations on Physics and my group uses the GSL library for that:

#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>

class Random
{
private:
    gsl_rng* r; //!< Pointer to the gsl rng
public:
    //! Constructor: uses argument as the seed
    Random(long unsigned int seed);

    long int R(int N);
    long double R();
    long double gaussianR(long double sigma);
};

inline Random::Random(long unsigned int s)
{
    r = gsl_rng_alloc( gsl_rng_taus );
    gsl_rng_set(r, s); //seed to use to the pseudo-aleatory number generator.
}

// a uniform number between 0 and N-1
inline long int Random::R(int N)
{
    return gsl_rng_uniform_int (r, N);
}

// a uniform number between 0 and 1
inline long double Random::R()
{
    return gsl_rng_uniform_pos( r );
}

// a gaussian distribution with sigma
inline long double Random::gaussianR(long double sigma)
{
    return gsl_ran_gaussian(r, sigma);
}

you have to compile it with flags: OTHER_LDFLAGS = -lgsl -lm -lgslcblas

and add includes and libs (this is for fink installation case):

HEADER_SEARCH_PATHS = /sw/include LIBRARY_SEARCH_PATHS = /sw/lib

Hope this helps.

You can use the new random library included in C++11, or you can use the Boost::Random library that it was based off.

The behaviour of your Pseudo-Random Number Generator (PRNG) is perfectly normal.

In fact, if your draw enough numbers from rand(), you will always get the same extrema, since it is uniformly distributed.

In your case, the question is: do you need another behaviour? You should not pounce on True Random Numbers as @sehe suggests. This might be useless, and even problematic when dealing with stochastic simulations, which Monte Carlo algorithms are. Imagine that you want to debug a code snippet based upon random numbers, or that of colleague of yours intends to check your results: how would you do if you were not able to reproduce the same random sequence?

That is one of the reason why PRNGs are sufficient and often preferred when you do not need crypto-secure random numbers.

I think the problem is that your initial statement is wrong. The code is provides different numbers each time. I tried you unmodified code and there are the results:

Minmim value is index 1194 with value 0 and 1 occurances
Maxium value is index 1264 with value 2995 and 1 occurances

Minmim value is index 1958 with value 1 and 1 occurances
Maxium value is index 1510 with value 2991 and 1 occurances

...

However, there are two bugs in the code:

  • In the second for loop, you should start with i = 0.
  • You should compare with maxV instead of minV in the same loop.

With regards to the random number generation:

  • When seeded with the same number, a series of rand() calles should return the same numbers. rand() is not for random numbers, but for pseudo-random numbers. rand() should be have this way because than e.g. a simulation will output the same results when started with the same seed. It is a very nice property.
  • You seed it with the current time, which is ok and therefore rand() should return a different series of numbers each time (at least when not called multiple times a second). The seeding looks good to me. It is in fact very similar to the example provided here.
  • The sample size is 2000 and the range of generated numbers is 3000. This means that it is not probable that the minimal size and the maximal size are always the same. If the sample size would be a million, with a high probability 2999 should be the largest number in the very most runs.

Gentlepeople: NOTE

Yes! This answer is "old". And in the era of c++11, by all means use c++11 <random>. But please don't downvote this question, years after the fact because you think "Ew Everybody knows rand() is evil!". In fact, it is not. It's just limited, and very easy to use inappropriately. But - as an historic fact, it exists as an API and it is still useful to document how you can use it better. I'm not deleting this answer for very reason.

Original answer:


Please read

http://eternallyconfuzzled.com/arts/jsw_art_rand.aspx

Notably, don't write rand() % 3000. Write

 int r = rand() / ( RAND_MAX / 3000 + 1 );

In fact, random should be uniformly distributed, meaning that indeed the lower and upper bound would have near 100% chance of occurring when the number of samples is large enough (larger than the size of the domain, for starters).

That's what true random is about (try to do a Monte-Carlo algorithm without it - you'd be very unhappy)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top