Question

My teacher told me that the only way to get random, non-correlated numbers in awk across consecutive runs of the same script is to save the last seed used to a file and read it back at the start of the next execution.

So far I'm trying with this:

BEGIN {
    getline seed < "myseed.txt";
    srand(seed);
    print rand();
    print rand();
    print srand() > "myseed.txt";
}

but myseed.txt only ever contains 0, and I get the same random numbers on every execution.

Any ideas on how to save the state of the internal random number generator and then resume generating random numbers exactly where it stopped, across different executions of the same script?


Solution

There seems to be some confusion about what the seed actually is. It's a value used as a parameter to initialize the random number generator, and that's it; it does not contain the current state of the random number generator. If you call srand(N) with any N, and then generate some random numbers, and then call srand() again (with any seed), the return value will be that original N, no matter how many random numbers you have generated in the meantime.
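
This is easy to check from the command line. The following is a minimal sketch; the function name prev_seed, the seed 42 and the loop count are arbitrary choices made for illustration. It should print 42, the seed that was passed in, however many numbers are drawn in between:

```shell
# prev_seed: seed with 42, draw 1000 numbers, then ask srand() for the previous seed.
prev_seed() {
    awk 'BEGIN {
        srand(42)                           # explicit seed (arbitrary value)
        for (i = 0; i < 1000; i++) rand()   # advance the generator
        print srand()                       # re-seeding returns the previous seed
    }'
}

prev_seed
```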

If the goal is to have a reproducible sequence of numbers that survives interruptions, then you need to save not only the seed but also the count of random numbers generated since seeding, and generate and discard that many at the start of the next run. This is a terrible hack, and you would probably be better off implementing your own RNG or, more likely, using something other than awk.
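
For what it's worth, the save-and-discard approach could look roughly like the sketch below. It is only an illustration under stated assumptions: the function name next_rand, the "seed count" state file layout and the fallback seed 12345 are all made up for this example, and the cost of each run grows with the count of numbers already consumed.

```shell
# next_rand STATEFILE [N]: print the next N numbers of a sequence that
# continues across runs. STATEFILE holds one line: "seed count".
next_rand() {
    awk -v state="$1" -v n="${2:-1}" 'BEGIN {
        # Load the saved seed and position, or start fresh on the first run.
        if ((getline line < state) > 0) {
            split(line, f, " ")
            seed = f[1] + 0
            count = f[2] + 0
        } else {
            seed = 12345      # hypothetical fixed starting seed
            count = 0
        }
        close(state)

        srand(seed)
        for (i = 0; i < count; i++) rand()     # discard numbers already used
        for (i = 0; i < n; i++) print rand()   # emit the next n numbers

        print seed, count + n > state          # record the new position
        close(state)
    }'
}
```

Calling next_rand state.txt 2 twice in a row should then print the same four numbers as a single run that seeds with 12345 and calls rand() four times.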

If the goal is to make sure not to generate the same sequence even if run twice within a single second, then you should read the discussion at the link Ed Morton posted in his comment.

OTHER TIPS

Either I misread your question or you are doing the complete opposite of what you are trying to achieve. If you read the seed from a file, use it, and write the same seed back to the file for re-use the next time, you will always have the same seed and thus always produce the same run of numbers.

$ awk 'BEGIN{srand(1234);print rand()}' && sleep 1 && awk 'BEGIN{srand(1234);print rand()}'
0.240849
0.240849

If you don't provide a seed for srand() it will use seconds since epoch:

$ awk 'BEGIN{srand();print rand()}' && sleep 1 && awk 'BEGIN{srand();print rand()}'
0.2776
0.668099

... thus the output only changes once per second. You can work around this by seeding with $RANDOM (if your shell supports it; its value changes on each use) or with the nanoseconds field of the current time (if your date supports %N), e.g.:

$ awk -v seed="$RANDOM" 'BEGIN{srand(seed);print rand()}' && awk -v seed="$RANDOM" 'BEGIN{srand(seed);print rand()}'
0.661197
0.325718

$ awk -v seed="$(date +%N)" 'BEGIN{srand(seed);print rand()}' && awk -v seed="$(date +%N)" 'BEGIN{srand(seed);print rand()}'
0.588395
0.911353
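
If neither $RANDOM nor date +%N is available, another possible seed source is the kernel's random device. The sketch below assumes a system that provides /dev/urandom and an od that accepts the -An, -N and -t options; the function name urandom_seed is made up for this example:

```shell
# urandom_seed: print a 32-bit unsigned integer read from /dev/urandom.
urandom_seed() {
    od -An -N4 -tu4 /dev/urandom | tr -d ' \n'
}

# Seed awk with it; the output should differ from run to run.
awk -v seed="$(urandom_seed)" 'BEGIN { srand(seed); print rand() }'
```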

srand(seed) expects an integer value for "seed"; in the awk used here, a fractional seed is truncated to an integer.

rand() returns a floating-point number >= 0 and < 1.

So, saving the output of rand() to use directly as the next seed will result in srand() always using a seed of zero. Look (remember a call to srand() returns the PREVIOUS seed used):

$ awk -v s=0.1 'BEGIN{srand(s); print rand(); print srand()}'
0.566305
0
$ awk -v s=0.9 'BEGIN{srand(s); print rand(); print srand()}'
0.566305
0

So, if you want to use the output of rand() as the next seed you need to multiply it by whatever number you want your seed to be less than, e.g. 1,000,000:

$ awk 'BEGIN{
    if ( (getline s < "myseed.txt") > 0 ) {
        print "seed read from file =", s
        srand(1000000*s)
    }
    else {
        print "using default seed of seconds since epoch"
        srand()
    }
    r = rand()
    print r > "myseed.txt"

    print "seed actually used by srand() was:", srand()
    print "rand() =", r
}'
using default seed of seconds since epoch
seed actually used by srand() was: 1368282525
rand() = 0.331514

$ awk 'BEGIN{
    if ( (getline s < "myseed.txt") > 0 ) {
        print "seed read from file =", s
        srand(1000000*s)
    }
    else {
        print "using default seed of seconds since epoch"
        srand()
    }
    r = rand()
    print r > "myseed.txt"

    print "seed actually used by srand() was:", srand()
    print "rand() =", r
}'
seed read from file = 0.331514
seed actually used by srand() was: 331514
rand() = 0.677688

$ awk 'BEGIN{
    if ( (getline s < "myseed.txt") > 0 ) {
        print "seed read from file =", s
        srand(1000000*s)
    }
    else {
        print "using default seed of seconds since epoch"
        srand()
    }
    r = rand()
    print r > "myseed.txt"

    print "seed actually used by srand() was:", srand()
    print "rand() =", r
}'
seed read from file = 0.677688
seed actually used by srand() was: 677688
rand() = 0.363388

On the first execution the file "myseed.txt" didn't exist, which is why the default seed was used.

Licensed under: CC-BY-SA with attribution