سؤال

I need to generate around 9-100 million non-repeating random numbers, ranging from zero to the amount of numbers generated, and I need them to be generated very quickly. Several answers to similar questions proposed simply shuffling an array in order to get the random numbers, and others proposed using a bloom filter. The question is, which one is more efficient, and in case of it being the bloom filter, how do I use it?

هل كانت مفيدة؟

المحلول

You don't want random numbers at all. You want exactly the numbers 0 to N-1, in random order.

Simply filling the array and shuffling should be very quick. A proper Fisher-Yates shuffle is O(n), so an array of 100 million should take well under a second in C or even Java, slightly slower in a higher-level language like Python.

You only have to generate N-1 random numbers to do the shuffle (maybe up to 1.3N if you use rejection sampling to get perfect uniformity), so the speed will depend largely on how fast your RNG is.

You'll never need to look up whether a number has already be generated; that will deadly be slow no matter which algorithm you use, especially toward the end of the run.

If you need slightly fewer than N total numbers, fill the array from 0 to N-1, then just abort the shuffle early and take the partial result. Only if the amount of numbers you need is very small compared to their range should you consider the generate-and-check-for-dups approach. In that case Bob Floyd's algorithm might be good.

نصائح أخرى

As an alternative you could use an appropriately sized block cypher. Use the block cypher to encrypt the numbers 0, 1, 2, ... and you will get a series of non-repeating random numbers out. Exactly what series will depend on the key you use. They are guaranteed not to repeat, because a block cypher is a reversible permutation.

For 64 bit numbers use DES, for 32 bit use Hasty Pudding (which allows a large range of block sizes) or write your own simple Feistel cypher. Assuming that security is not a big issue for this, then writing your own is possible.

For sure its better create an algorithm to shuffle the numbers, if you use a seed, as for example, the server microtime or timestamp, you can have one different random string for each milisecond .

Start creating an array using range function, set number of numbers as you like . Than, you need to use a seed to make the pseudo-randomness better . So, instead of rand, you gotta use SHUFFLE, so you set array on range as 1 to 90, set the seed, than use shuffle to shuffle the array.. than you got all numbers in a random order (corresponding to the seed) . You gotta change the seed to have another result . The order of the numbers is the result . as .. ball 1 : 42 ... ball 2: 10.... ball 3: 50.... ball 1 is 0 in the array. ;) You can also use slice function and create a for / each loop, incrementing the slice factor, so you loop

  1. slice array 0,1 the the result .. ball 1...
  2. slice array 0.2 ball 2...
  3. slice array 0.3

Thats the logic, i hope you understand, if so .. it ill help you a lot .

مرخصة بموجب: CC-BY-SA مع الإسناد
لا تنتمي إلى StackOverflow
scroll top