Pergunta

I've being studying the different possibilities to add values in a heap in Java. I'm using the PriorityHeap class. When I noticed slow running times in my application, I decided to give this a look. I'm adding several thousand and, sometimes, millions of custom entries (I have a custom class which has 3 fields: a int, a LongWritable and Text, both from hadoop.io; this instrumentation agent says my records have 200 bytes in average).

Is it obvious that using addAll() instead of add() method to put entries in the heap will improve performance, simply because this would avoid several heapify operations?

I tried with different strategies using the following new example:

package Sorting;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class Main {

private static final int HEAP_SIZE = 1000000;
private static final int BULK_LIST_SIZE = HEAP_SIZE / 10;

private static String normal;
private static String bulk;
private static String fullBulk;

public static void main(String[] args) throws IOException {
normal = "";
bulk = "";
fullBulk = "";
long time = 0;

warmup();

normal = "";
bulk = "";
fullBulk = "";

for (int i = 0; i < 100; i++) {

    // Normal add time
    System.out.println("Normal starts...");
    time = normalExecution();
    System.out.println("Normal add time " + time);

    // Bulk add time
    System.out.println("Bulk starts...");
    time = bulk();
    System.out.println("Bulk add time " + time);

    // Bulk add time with list and heap with same size
    System.out.println("Full Bulk starts...");
    time = fullBulk();
    System.out.println("Full Bulk add time " + time);
}
System.out.println(normal);
System.out.println(bulk);
System.out.println(fullBulk);

}

private static long fullBulk() {
long time;
long start;
List<Double> fullBulkList = new ArrayList<Double>(HEAP_SIZE);
PriorityQueue<Double> fullBulkHeap = new PriorityQueue<Double>(HEAP_SIZE);

start = System.nanoTime();
for (int j = 0; j < HEAP_SIZE; j++) {
    if (fullBulkList.size() == HEAP_SIZE) {
    fullBulkHeap.addAll(fullBulkList);
    fullBulkList.clear();
    }
}
fullBulkHeap.addAll(fullBulkList);
time = System.nanoTime() - start;

fullBulk = fullBulk + "\t" + time;
fullBulkList = null;
fullBulkHeap = null;
return time;
}

private static long bulk() {
long time;
long start;
List<Double> bulkList = new ArrayList<Double>(BULK_LIST_SIZE);
PriorityQueue<Double> bulkHeap = new PriorityQueue<Double>(HEAP_SIZE);
start = System.nanoTime();
for (int j = 0; j < HEAP_SIZE; j++) {
    if (bulkList.size() == BULK_LIST_SIZE) {
    bulkHeap.addAll(bulkList);
    bulkList.clear();
    }
}
bulkHeap.addAll(bulkList);
time = System.nanoTime() - start;
bulk = bulk + "\t" + time;
bulkList = null;
bulkHeap = null;
return time;
}

private static long normalExecution() {

long time;
long start;
PriorityQueue<Double> normalHeap = new PriorityQueue<Double>(HEAP_SIZE);
start = System.nanoTime();
for (int j = 0; j < HEAP_SIZE; j++) {
    normalHeap.add(Double.MAX_VALUE);
}
time = System.nanoTime() - start;
normal = normal + "\t" + time;
normalHeap = null;
return time;
}

private static void warmup() {
System.out.println("Starting warmup");
for (int i = 0; i < 1000; i++) {
    normalExecution();
    bulk();
    fullBulk();
}
for (int i = 0; i < 1000; i++) {

    bulk();
    fullBulk();
    normalExecution();
}
for (int i = 0; i < 1000; i++) {

    fullBulk();
    normalExecution();
    bulk();
}
System.out.println("Warmup finished");
}

}

Which resulted in the following results:

enter image description here

The huge peak in the 11th iteration of the normal add method is explained by a GC call: [GC 1347684K->31354K(1446400K), 0.0331610 secs].

Mediam values are 16049669, 783724 and 800276 respectively. ST dev are 3512492.89, 244374.17 and 33344.17.

Foi útil?

Solução

PriorityQueue does not override the method addAll inherited from AbstractQueue.

In AbstractQueue this method looks like this.

public boolean addAll(Collection<? extends E> c) {
    if (c == null)
        throw new NullPointerException();
    if (c == this)
        throw new IllegalArgumentException();
    boolean modified = false;
    for (E e : c)
        if (add(e))
            modified = true;
    return modified;
}

As you can see, it just loops and calls add.

So I don't think addAll will improve anything compared to add.

Licenciado em: CC-BY-SA com atribuição
Não afiliado a StackOverflow
scroll top