Question

See the following snippet:

    Long first_begin = System.currentTimeMillis();

    // first nested loops
    for (int i = 0; i < 10; i++) {
        for (int j = 0; j < 1000000; j++) {
            // do some stuff
        }
    }
    System.out.println(System.currentTimeMillis() - first_begin);
    // second nested loops
    Long seconde_begin = System.currentTimeMillis();
    for (int i = 0; i < 1000000; i++) {
        for (int j = 0; j < 10; j++) {
            // do some stuff
        }
    }
    System.out.println(System.currentTimeMillis() - seconde_begin);

I am wondering why the first nested loops is running slower than the second one?

Regards!

Important Note!: I am sorry that I made the variable j beginning with 1 accidentally when this question is first asked, I have made the correction.

Update:there is not any specific logic within the loops, I am just doing some test, actually this is a question asked during an interview and the interviewer hint me to change the order of loops to achieve better performance. BTW, I am using JDK1.5. after some test I am more confused now, because the result of program is not consistent---sometime the first loop running faster than the second one, but most of the time it's running slower than second one.

Was it helpful?

Solution

This answer is for the updated question:

  • If you're accessing two dimensional array such as int[][], the one with the larger value in the inner loop should be slower. Not by much but still. To somewhat understand the problem, read about Shlemiel the street painter in one of Joel's blog posts.
  • The reason you're getting inconsistent results is that you're not performing any JVM warmup. JVM constantly analyzes the bytecode that is run and optimizes it, usually only after 30 to 50 iterations it runs at optimal speed. Yes, this means you need to run the code first a couple of dozen times and then benchmark it from an average of another couple dozen runs because of Garbage Collector which will slow couple of runs.
  • General note, using Long object instead of long primitive is just dumb, JVM most likely optimizes it by replacing it with the primitive one if it can and if it can't, there's bound to be some (albeit extremely minor) constant slowdown from using it.

OTHER TIPS

EDIT: Original answer is below. Now that you've fixed the example so that all loop variables start at 0, we're back to simply not having enough information. It seems likely that it's a cache coherency / locality of reference issue - but we're just guessing. If you could provide a short but complete program which demonstrates the problem, that would help... as would telling us which language/platform we're talking about to start with!


The first loop has 10 * 999999 = 9999990 iterations. The second loop has 1000000 * 9 = 9000000 iterations. I would therefore expect (all other things being equal) the first loop to take longer.

However, you haven't indicated what work you're doing or what platform this is on. There are many things which could affect things:

  • The second loop may hit a cache better
  • If you're using a JIT-compiled platform, the JIT may have chosen to optimise the second loop more heavily.
  • The operations you're performing may themselves have caching or something like that
  • If you're performing a small amount of work but it first needs to load and initialize a bunch of types, that could cause the first loop to be slower

If you look at the generated byte code, the two loops are almost identical. EXCEPT that when it does the while-condition for the 10 loop, Java gets the 10 as an immediate value from within the instruction, but when it does the while-condition for the 1000000 loop, Java loads the 1000000 from a variable. I don't have any info on how long it takes to execute each instruction, but it seems likely that an immediate load will be faster than a load from a variable.

Note, then, that in the first loop, the compare against 1000000 must be done 10 million times while in the second loop it is only done 1 million times. Of course the compare against 10 is done much more often in the second loop, but if the variable load is much slower than the immediate load, that would explain the results you are seeing.

The question shifted. These are not the droids you seek...

Because you are doing ~1000000 times more work in the first example. ;-)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top