質問

My problem is the following: I need to shuffle an array and then get just the first N elements.

I am currently shuffling the whole array, that has 50+ elements, but this gives me performance problems, since the shuffling routine is called 1e+9 times.

I am currently implementing the Fisher-Yates algorithm to shuffle:

public static void shuffle(int[] array) {
    Random gen = new Random();
    for (int i = array.length - 1; i > 0; i--) {
        int index = gen.nextInt(i + 1);
        int a = array[index];
        array[index] = array[i];
        array[i] = a;
    }
}

Then I select just the first N elements.

I have also tried using Reservoir sampling, but it just saved me 1 second. That's not enough, since my program runs for 30 secs. Also, I might have implemented it incorrectly, because I am not getting the same results when compared to the Fisher-Yates algorithm. This is my implementation:

public static int[] shuffle(int[] array, int N) {
    int[] ret = new int[N];
    for (int i = 0; i < N; i++) {
        ret[i] = array[i];
    }
    Random gen = new Random();
    int j;
    for (int i = N; i < array.length; i++) {
        j = gen.nextInt(i+1);
        if (j <= N - 1)
            ret[j] = array[i];
    }
    return ret;
}

To conclude, what I need is a a shuffling algorithm that would pick N random elements using a search of length N, instead 50+. If not possible, something better then Fisher-Yates and Reservoir sampling.

Note-1: Altering the original "int[] array" is not a problem.

Note-2: N is usually around 10.

役に立ちましたか?

解決

A simple way to get N shuffled elements from the array is as follows:

  1. Pick a random element r.
  2. Add element r to the output.
  3. Move the last element of the array to the position of r and shrink the array size by 1.
  4. Repeat N times.

In code:

public static int[] shuffle(int[] array, int N) {
    int[] result = new int[N];
    int length = array.length;

    Random gen = new Random();

    for (int i = 0; i < N; i++) {
        int r = gen.nextInt(length);

        result[i] = array[r];

        array[r] = array[length-1];
        length--;
    }

    return result;
}

This algorithm has the advantage over FY that it only computes the first N elements of the shuffled array, rather than shuffling the whole array.

Your optimized algorithm is not optimal for two reasons:

  1. The first N elements are never shuffled. For instance, element 0 can never appear at position 1 in the shuffled array.
  2. You're still doing a lot of work. If N=10 and the total array length is 1000000, you're still computing about 1000000 random values, while you only need 10.

他のヒント

You can modify FY by only doing N iterations, and then taking the last N elements of the array. Alternatively, build the shuffled area at the start of the array, rather than the end, and take the first N elements. However, that will only save iterations of the actual shuffle, and the gain there will be a factor of N/50. That may not be enough.

Depending on the situation, you may be able to generate e.g. a million shuffled decks, and then pick one of them at random for each of the 1e9 outer iterations. That would do one random number generation per outer iteration, plus one thousandth of the current shuffle calls.

Do try not to reinvent the wheel, when Collection.shuffle exists:

public static int[] shuffle(int[] array, int N) { 
    List<Integer> list = new ArrayList<Integer>();
    for (int r : array) {
       list.add(r);
    }
    Collections.shuffle(list);
    Integer[] ret = list.toArray(); // Integer is int, except autoboxed 
    return ret;
}
ライセンス: CC-BY-SA帰属
所属していません StackOverflow
scroll top