Understanding why/how Java's native sorting of an int array is optimized on successive sorts...so I can stop it

https://stackoverflow.com/questions/17774073

03-06-2022
|

سؤال

My program in a nutshell:

I have a program that successively runs several sort algorithms against an array of ints, timing each. The GUI allows the user to select array size and a variety of random number ranges with which to fill the array to be sorted. Each click of the "Sort" button grabs the user's array values, constructs a new array, then creates for each sort algorithm a clone of the array, using .clone().

The problem:

when "sort" button is clicked a second time the sorts improve themselves.

Somewhere there's optimization happening that I don't understand.

The reason this is an issue: if the user doesn't change their array settings and runs the sort methods again, it is true that a new array is constructed with new random numbers but the random number ranges remain the same so the run time, over a list of 125k, should remain about the same....not improve 300%.

So here is a demo program of what I am facing. It uses only one sort, Java's native sort to demonstrate the issue. It also uses hard coded values for constructing the random int array to be sorted - but does so with each "enter" press. I think this simulation accurately reflects my program because the same "error" is happening here, too.

Why is the sort faster the second time?

...the array is rebuilt with new values for each run, so how can it get faster?

package sortTooFast;

import java.util.Arrays;
import java.util.Scanner;

public class SortTooFast {
    public static final int ARRAY_SIZE = 500000;
    public static final int MIN_RANGE = 0;
    public static final int MAX_RANGE = 100;
    public static final int INCLUSIVE = 1;
    
    int[] sortingArray;

    public static void main(String[] args) {
        SortTooFast test = new SortTooFast();
        test.run();
    }
    
    // Run program.
    public void run(){
        while(true){    
            
            // Assign int[] filled with random numbers.
            sortingArray = getArray();                  

            // Inform user.
            System.out.println("\nPress return key to run sort!");
            
            // Wait for user.
            new Scanner(System.in).nextLine();
            
            System.out.println("First 15 elements to be sorted:");
            
            // Print a small section of the array; prove not sorted
            for (int i = 0; i < 15; i++){
                System.out.printf("%4d", sortingArray[i]);
            }
            
            // Perform sort.
            runNativeSort(sortingArray);
        }
    }

    // Run native java sort.
    private void runNativeSort(int[] array) {
        // Start timer
        long startTime = System.currentTimeMillis();
        
        // Perform sort.
        Arrays.sort(array);
        
        // End timer
        long finishTime = System.currentTimeMillis();
        
        // Running time.
        long runTime = finishTime - startTime;
        
        // Report run time.
        System.out.println("\nRun time: " +runTime);        
    }

    // Obtain an array filled with random int values.
    private int[] getArray() {
        // Make int array.
        int[] mArray = new int[ARRAY_SIZE];
        
        // Length of array.
        int length = mArray.length;
        
        // Fill array with random numbers.
        for(int counter = 0; counter < length; counter++){
            int random = MIN_RANGE + (int)(Math.random() * ((MAX_RANGE - MIN_RANGE) + INCLUSIVE));
            mArray[counter] = random;
        }
        return mArray;
    }
}

المحلول

Why is the sort faster the second time?

Because by that time, the JIT has optimized the bytecode into faster native code.

There are two effects you need to counter when benchmarking this sort of thing:

Time taken to JIT the code in the first place
The way that the native code improves over time as it is optimized harder and harder by the JIT.

Typically you can reduce the effect of this to achieve a steady state by running the code for long enough to get it fully optimized before you start timing.

Additionally, you should use System.nanoTime instead of System.currentTimeMillis when benchmarking: System.currentTimeMillis is meant to give you a reasonably accurate "wall clock" time, which may be adjusted by the operating system if it notices that the clock is out of sync, whereas nanoTime is specifically designed for measuring elapsed time since a particular instant, regardless of changes to the system clock.

مرخصة بموجب: CC-BY-SA مع الإسناد

لا تنتمي إلى StackOverflow