سؤال

Is it better to pass the size of Collection to Collection constructor if I know the size at that point? Is the saving effect in regards to expanding Collection and allocating/re-allocating noticable?

What if I know minimal size of the Collection but not the upper bound. It's still worth creating it at least with minimal size?

هل كانت مفيدة؟

المحلول

Different collections have different performance consequences for this, for ArrayList the saving can be very noticeable.

import java.util.*;
public class Main{
public static void main(String[] args){
  List<Integer> numbers = new ArrayList<Integer>(5);
  int max = 1000000;
  // Warmup
  for (int i=0;i<max;i++) {
    numbers.add(i);
  }

  long start = System.currentTimeMillis();
  numbers = new ArrayList<Integer>(max);
  for (int i=0;i<max;i++) {
    numbers.add(i);
  }
  System.out.println("Preall: "+(System.currentTimeMillis()-start));

  start = System.currentTimeMillis();
  numbers = new ArrayList<Integer>(5);
  for (int i=0;i<max;i++) {
    numbers.add(i);
  }
  System.out.println("Resizing: "+(System.currentTimeMillis()-start));

}
}

Result:

Preall: 26
Resizing: 58

Running with max set to 10 times the value at 10000000 gives:

Preall: 510
Resizing: 935

So you can see even at different sizes the ratio stays around the same.

This is pretty much a worst-case test but filling an array one element at a time is very common and you can see that there was a roughly 2*speed difference.

نصائح أخرى

OK, here's my jmh code:

@OutputTimeUnit(TimeUnit.MICROSECONDS)
@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 3, time = 1)
@Measurement(iterations = 3, time = 1)
@Fork(3)
public class Comparison
{
  static final int size = 1_000;
  @GenerateMicroBenchmark
  public List<?> testSpecifiedSize() {
    final ArrayList<Integer> l = new ArrayList(size);
    for (int i = 0; i < size; i++) l.add(1);
    return l;
  }

  @GenerateMicroBenchmark
  public List<?> testDefaultSize() {
    final ArrayList<Integer> l = new ArrayList();
    for (int i = 0; i < size; i++) l.add(1);
    return l;
  }
}

My results for size = 10_000:

Benchmark             Mode Thr    Cnt  Sec         Mean   Mean error    Units
testDefaultSize       avgt   1      9    1       80.770        2.095  usec/op
testSpecifiedSize     avgt   1      9    1       50.060        1.078  usec/op

Results for size = 1_000:

Benchmark             Mode Thr    Cnt  Sec         Mean   Mean error    Units
testDefaultSize       avgt   1      9    1        6.208        0.131  usec/op
testSpecifiedSize     avgt   1      9    1        4.900        0.078  usec/op

My interpretation:

  • presizing has some edge on the default size;
  • the edge isn't that spectacular;
  • the absolute time spent on the task of adding to the list is quite insignificant.

My conclusion:

Add the initial size if that makes you feel warmer around the heart, but objectively speaking, your customer is highly unlikely to notice the difference.

All collections are auto-expanding. Not knowing the bounds will not affect their functionality (until you run into other issues such as using all available memory etc), it may however affect their performance.

With some collections. Most notably the ArrayList, auto expanding is expensive as the whole underlying array is copied; array lists are default sized at 10 and then double in size each time they get to their maximum. So, say you know your arraylist will contain 110 objects but do not give it a size, the following copies will happen

Copy 10 --> 20
Copy 20 --> 40
Copy 40 --> 80
Copy 80 --> 160

By telling the arraylist up front that it contains 110 items you skip these copies.

An educated guess is better than nothing

Even if you're wrong it doesn't matter. The collection will still autoexpand and you will still avoid some copies. The only way you can decrease performance is if your guess is far far too large: which will lead to too much memory being allocated to the collection

In the rare cases when the size is well known (for example when filling a know number of elements into a new collection), it may be set for performance reasons.

Most often it's better to ommit it and use the default constructor instead, leading to simpler and better understandable code.

For array-based collections re-sizing is a quite expensive operation. That's why pass exact size for ArrayList is a good idea.

If you set up size to a minimal size(MIN) and then add to the collection MIN+1 elements, then you got re-sizing. ArrayList() invokes ArrayList(10) so if MIN is big enough then you get some advantage. But the best way is to create ArrayList with expecting collection size.

But possibly you prefer LinkedList because it has no any costs for adding elements (although list.get(i) have O(i) cost)

مرخصة بموجب: CC-BY-SA مع الإسناد
لا تنتمي إلى StackOverflow
scroll top