Question

I heard that random number generation in computers isn't really random, but there is no efficient algorithm to detect it. How can it be detected at all?

Solution

Computers Being Really Random:

True randomness is impossible for Turing Machines in a theoretical sense, and most computers can't generate truly random output. Therefore, some modern computers include hardware that allows the computer to access an outside source which will hopefully include some randomness. One example of how this can be accomplished is to track small fluctuations in temperature inside the computer. Randomness can be obtained from an outside source as well. But from the tone of your post I don't think that outside sources of randomness are what you're interested in.

Seeds:

Without an outside addition, everything a computer does is deterministic. This leads to a big issue: if you call a random number generation program, it will give you the same result every time if you give it the same input. Clearly, we need a program that outputs a random number to change its behavior each time it's run (otherwise we'll keep getting the same "random" number, which is not particularly helpful). One idea is to give the program some input, which changes each time the program is run, so that a different number will be output. We call this input a "seed." The random number generator needs to take in a seed, perform some operations, and give us a random number.

The current system time is a classic example of a seed. This gives a long string with high entropy, and if the time is tracked in a sufficiently granular fashion (e.g. if your system clock only ticks in hours, then "time" is a pretty poor seed), you're unlikely to feed the pseudorandom number generator the same number twice.
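As a concrete illustration (my own sketch in Python, not part of the original answer), here is how seeding with the current time versus a fixed value plays out; `time.time_ns` and `random.seed` are real standard-library calls:

    import random
    import time

    # Seed with the current time: nanosecond granularity makes it
    # very unlikely that two runs ever see the same seed. A clock
    # that only ticked once per hour would make a much poorer seed.
    random.seed(time.time_ns())
    print(random.randint(1, 100))   # (almost certainly) differs per run

    # The flip side of determinism: a fixed seed reproduces the
    # exact same "random" sequence every time.
    random.seed(42)
    print(random.random())          # always 0.6394267984578837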

Algorithms that are Random Enough:

Now we have an algorithm that at least has some way to be different each time it's run. We give it a seed, and while the algorithm gives the same number when prompted with the same seed, we want the numbers it generates to otherwise look random. It acts like the above: it takes in some input and produces some output (hopefully different enough from the input to be "random").

Now let's say you came up with your own algorithm to do this, and you claim that the numbers it comes up with are pretty close to random when you give it a bunch of different seeds. How would we test how good it is?

Now we want some algorithm that will take in a seed, do some operations, and produce a random number. At the simplest, the algorithm could just output the seed--it's not giving us the same number each time, and random seeds give us random outputs. But clearly that's not what we want. On the other hand, an algorithm can be fairly complicated, like many actual pseudorandom generators. How can we tell which algorithms give us "random" numbers from our not-necessarily-random seeds? If we can't get it exactly, how can we tell which are best?

It's hard to say which tests are ideal, but it's easy to come up with some minimum requirements these algorithms should meet before we say they give us "random" numbers. Maybe we want to make sure that your algorithm gives even numbers half the time. Maybe we want to make sure that if I ask for a random number between $1$ and $n$, every number in that range can be output for some input to your function. Clearly there are a lot of tests we can run; if your algorithm passes some suite of tests, it's a pseudorandom generator. Which tests to use is a very interesting and well-studied area of computer science.
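To make this concrete, here is a toy sketch (my own, in Python; not a serious test suite) of the two minimum-requirement checks just mentioned:

    import random

    def parity_test(gen, trials=100_000, tolerance=0.01):
        # A generator of integers should produce even numbers
        # about half the time.
        evens = sum(1 for _ in range(trials) if gen() % 2 == 0)
        return abs(evens / trials - 0.5) < tolerance

    def range_test(gen, n, trials=100_000):
        # Every value in 1..n should show up at least once.
        seen = {gen() for _ in range(trials)}
        return seen == set(range(1, n + 1))

    print(parity_test(lambda: random.getrandbits(32)))  # hopefully True
    print(range_test(lambda: random.randint(1, 6), 6))  # hopefully True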

Random Enough to Fool an Attacker:

Now what you MAY be referring to is Cryptographically Secure Pseudorandom Generators. I think the best way to explain this is in the context of the above: here, we're using our randomness for cryptography, so when we're designing tests what we really care about is that someone won't be able to break our security by predicting what random number we picked. I don't know your level of familiarity with cryptography, but imagine we're doing a simple replacement cipher: each letter is replaced with some other letter. We want to pick these replacements randomly, so they're hard for an attacker to guess. But if the attacker can figure out how my random number generator works, they'll be able to solve the whole cipher! Therefore, cryptographic algorithms require random number generators that are specifically hard to guess. Specific cryptographic algorithms may require additional tests (like for some sort of nice-enough distribution as mentioned above).

For this reason, CSPRGs are defined in terms of how well other algorithms can predict them (which is where we finally come to your question). Specifically, let's say I have a generator which I'll call R. R is a CSPRG if and only if there is NO feasible algorithm that can guess which bit it will output next with probability significantly better than one half. This is true even if you know all the previous bits it has output!

So let's say that the first five bits my CSPRG has output are 10100. You don't know the input I used, but you have access to the code of my CSPRG. Then the claim is that it's infeasible for you to write a program that decides whether the output will continue as 101000 or 101001 noticeably better than guessing.
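Here is a toy sketch (my own; it reuses the linear congruential constants quoted later on this page) of a generator that fails this next-bit test spectacularly: with a power-of-two modulus and odd multiplier and increment, the lowest output bit simply alternates, so a one-line predictor gets it right every time.

    # The low bit of x_{i+1} = (1664525*x_i + 1013904223) mod 2^32
    # is just the previous low bit flipped, since a and c are odd.
    def weak_bits(seed, count):
        x = seed
        for _ in range(count):
            x = (1664525 * x + 1013904223) % 2**32
            yield x & 1   # lowest bit: badly non-random

    bits = list(weak_bits(seed=123456, count=20))
    hits = sum(bits[i + 1] == (bits[i] ^ 1) for i in range(19))
    print(bits)                      # strictly alternating 0s and 1s
    print(f"predictor 'next = previous flipped' scores {hits}/19")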

So for reasons of cryptography, sometimes how well a pseudorandom number generator does is defined in terms of how predictable it is to other programs. Note that this still gives much of the intuition of "randomness," as (say) if you know all of the random outputs will be odd it is neither cryptographically secure nor does it pass a common-sense randomness test.

Other tips

Recently I found a nice post about randomness in computation on the MIT CSAIL Theory of Computation Group blog: Can you tell if a bit is random?

The post starts with some ideas extracted from Avi Wigderson's wonderful talk about the power and limitations of randomness in computation, surveying the beautiful area of randomized algorithms and the surprising connection between pseudorandomness and computational intractability.

Then it summarizes some recent results on quantum cryptography; in particular the way to efficiently test if the output of a certain kind of device is truly random (randomness expansion protocols).

For example, see the recent work by Umesh Vazirani and Thomas Vidick, Certifiable Quantum Dice (or, testable exponential randomness expansion):

Abstract: We introduce a protocol through which a pair of quantum mechanical devices may be used to generate n bits of true randomness from a seed of O(log n) uniform bits. The bits generated are certifiably random based only on a simple statistical test that can be performed by the user, and on the assumption that the devices obey the no-signaling principle. No other assumptions are placed on the devices' inner workings....

Assuming you are talking about statistical randomness -- cryptography has other needs! -- there is a whole slew of goodness-of-fit tests that can detect whether a sequence of numbers fits a given distribution. You can use these to test whether a (pseudo) random number generator is sound (up to the quality of your test and the chosen significance).
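For example, a chi-squared goodness-of-fit check of uniformity could look like this sketch (mine, using SciPy's `scipy.stats.chisquare`, which by default tests against equal expected frequencies):

    import random
    from scipy.stats import chisquare

    # 60,000 simulated die rolls: do the six faces look uniform?
    rolls = [random.randint(1, 6) for _ in range(60_000)]
    observed = [rolls.count(face) for face in range(1, 7)]
    stat, p_value = chisquare(observed)

    # A p-value below the chosen significance level (say 0.05) is
    # evidence the generator does NOT fit the uniform distribution.
    print(stat, p_value)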

Diehard test suites combine different methods.

This is a broad/complex topic in computer science which the other answer by SamM addresses in part. Your specific question seems to be: if computers use what are called PRNGs, i.e. pseudorandom number generators, how can one detect that?

The short answer is that nontrivial PRNGs are built so that their algorithms cannot be detected (derived). In general, if the PRNG is what is called "secure", even if an attacker knows the algorithm used to generate the pseudorandom sequence, they cannot guess the particular parameters used to generate the sequence. In this way pseudorandomness has many deep ties to cryptography, and one can talk about "breaking" a PRNG in much the same way that a cryptographic algorithm can be "broken". There are many research papers in this area; it's an active area at the forefront of cryptography.

For "trivial" PRNGs, eg say a linear congruential generator, if the attacker knows the algorithm used to generate it and it is not generated with "bignums", the search space is "relatively small" and the attacker could theoretically also find the parameters used by the particular PRNG basically by brute force and trying all combinations.

PRNGs can be broken in practice (again depending on their "security") in some cases by running a large suite of statistical randomness tests against them; e.g., this is the rationale of the program "Dieharder" (by Brown). There is also an NIST suite.

The intrinsic difficulty/hardness of breaking PRNGs is not yet strictly proven theoretically, but is basically associated with what are called "trapdoor" or "one-way" functions, which can be computed efficiently in one direction but are hard to invert (reverse). There are some open problems in cryptography about randomness hardness. These questions relate closely to complexity class separations, e.g. the famous P =? NP question.

Questions about breaking PRNGs also relate to Kolmogorov complexity, a field which studies the smallest Turing machines that can generate sequences. Breaking a PRNG closely relates to finding the shortest program that computes a pseudorandom sequence, and Kolmogorov complexity is undecidable to compute in general.

As Gilles points out in a comment, there do exist hardware-based RNGs built out of physical electronic processes, such as those related to quantum noise. These, if engineered correctly, are unbreakable.

In fact, everything a classical computer does is deterministic, in the sense that when you give it a task it follows it in a deterministic way. Therefore, if you want one random number you can compute it according to the time (based on the user's input time), but if you want a set of random numbers you cannot use the time for the next numbers, because the numbers would no longer be independent.

What people do is use pseudo-random generators, which have a seed, i.e. a number that is used to compute all the numbers of the pseudo-random sequence (in more sophisticated cases of simulation or other tasks, more seeds may be needed, if more than one set of independent random numbers is required). The seed is usually 0 or another specific number if you want reproducible results, or the current time if you want different, unreproducible results.
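A minimal sketch of that convention in Python (`random.Random` is a real standard-library class; the seeds 0 and 1 here are arbitrary):

    import random

    # Separate generator objects, each with its own seed, give
    # reproducible and independent streams (e.g., one per task).
    stream_a = random.Random(0)   # fixed seed: reproducible results
    stream_b = random.Random(1)   # a second, independently seeded stream

    print([stream_a.randint(1, 6) for _ in range(5)])  # same every run
    print([stream_b.randint(1, 6) for _ in range(5)])  # also reproducible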

The fact that pseudo-random number generators are good enough lies in the fact that they satisfy the basic characteristics of pseudo-random number generation, so that they can be computed efficiently and behave like real random numbers:

  • the produced numbers must follow a uniform distribution (from this distribution you can derive any other distribution);
  • the produced numbers must be statistically independent;
  • the sequence must be reproducible (this point is imposed by the deterministic nature of classical computer hardware, and it's why they are called "pseudo-random" numbers);
  • the period of the sequence must be large enough;
  • the generation of numbers must be fast.

Each new number of the sequence of pseudo-random numbers is computed from the previous one (usually we work with integers). However, since the generator works in a specific base with a finite number of bits available to express the numbers (e.g. binary), the sequence has a period $n$. If this $n$ were not big enough there would be serious problems, but don't worry: computer scientists choose the seeds and other parameters of pseudo-random generators well, in order to have a good $n$.

For instance, a possible pseudo-random number generator using the linear congruential method, which is one of the oldest and best-known pseudo-random number generation algorithms, can be defined as follows.

It has four values:

  • the initial value (seed) $x_0 \ge 0$;
  • the multiplier $a \ge 0$;
  • the increment $c \ge 0$;
  • the modulus $m$, with $m > x_0$, $m > a$, and $m > c$;

and it produces the sequence with the formula

$x_{i+1} = (a \, x_i + c) \bmod m$

The values for these constants must be carefully chosen. One possibility is:

$x_{i+1} = (1664525 \, x_i + 1013904223) \bmod 2^{32}$ (refs. [1], [2])
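A direct transcription of this generator in Python (my sketch; the constants are the Numerical Recipes values above):

    def lcg_nr(seed):
        # x_{i+1} = (1664525*x_i + 1013904223) mod 2^32
        x = seed
        while True:
            x = (1664525 * x + 1013904223) % 2**32
            yield x

    gen = lcg_nr(seed=0)
    print([next(gen) for _ in range(3)])
    # [1013904223, 1196435762, 3519870697]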

There are other, more sophisticated algorithms for generating random numbers, which avoid some of the problems of earlier algorithms, including: [3] (a sketch after this list demonstrates the third point)

  • shorter than expected periods for some seed states (such seed states may be called 'weak' in this context);
  • lack of uniformity of distribution for large amounts of generated numbers;
  • correlation of successive values;
  • poor dimensional distribution of the output sequence;
  • the distances between where certain values occur are distributed differently from those in a random sequence distribution.
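To see the third point (correlation of successive values) in action, here is a sketch (mine) using the historically infamous RANDU generator, $x_{i+1} = 65539 \, x_i \bmod 2^{31}$. Because $65539 = 2^{16} + 3$, every triple of successive outputs satisfies a fixed linear relation, which is why RANDU's points fall on just a few planes in 3D space:

    # RANDU: every successive triple obeys
    #   x_{i+2} = 6*x_{i+1} - 9*x_i  (mod 2^31),
    # a disastrous correlation for a supposedly random sequence.
    def randu(seed):
        x = seed
        while True:
            x = (65539 * x) % 2**31
            yield x

    gen = randu(seed=1)
    xs = [next(gen) for _ in range(1000)]
    assert all((xs[i + 2] - 6 * xs[i + 1] + 9 * xs[i]) % 2**31 == 0
               for i in range(998))
    print("every successive triple satisfies the fixed linear relation")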

In the future, classical computers may be coupled with quantum systems, which can provide truly random numbers and deliver them. [4]

References:
[1] http://en.wikipedia.org/wiki/linear_congruential_generator
[2] Press, William H., et al. (1992). Numerical Recipes in Fortran 77: The Art of Scientific Computing (2nd ed.). ISBN 0-521-43064-X.
[3] http://en.wikipedia.org/wiki/pseudorandom_number_generator
[4] http://www.technologyreview.com/view/418445/first-evidence-that-quantum-processes-generate-truly-random-numbers/

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange