Domanda

I'm working through Programming Peals, and the first essay deals with sorting numbers within a known range. As a smart solution, they offer implementing a bitmap, setting all numbers in the input file to one in the bitmap, and then simply iterating over it to print the result. The assumption is that this should be a lot faster than a more traditional sorting algorithm like quicksort or mergesort.

To test this out, I wrote the bitmap sort myself, in Java. I was not too surprised when I found out that the Unix sort command, which uses merge sort, is still a lot faster. I attributed this to the fact that it is written in C, and probably highly optimized by some very smart folks.

So, then I wrote my own merge sort, also in Java. To my surprise, my BitmapSort was faster, but only marginally. With a very large input file (+-800000 integers), bitmapsort is only about 30% faster.

Here is my bitmap sort and bitmap implementation:

import java.util.Scanner;
import java.io.FileReader;
import java.io.File;

class BitmapSort {
    Scanner sc;

    BitmapSort() throws Exception {
        sc = new Scanner(new File("numbers.txt"));
    }

    void start() {
        BitMap map = new BitMap(3000000);
        while (sc.hasNextInt()) {
            map.set(sc.nextInt());
        }
        for (int i = 0; i < 3000000; i++) {
            if (map.isSet(i)) {
                System.out.println(i);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        new BitmapSort().start();
    }
}


class BitMap {

    byte[] bits;
    int size;


    BitMap(int n) {
        size = n;
        bits = new byte[(int) Math.ceil((double) n / (double) Byte.SIZE)];
        for (Byte b : bits) {
            b = 0;
        }
    }

    private String toBinary(byte b) {
        return String.format(Integer.toBinaryString(b & 0xFF)).replace(' ', '0');
    }

    void set(int i) {
        int index = i / Byte.SIZE;
        bits[index] = (byte) ((bits[index] | (byte) (1 << (Byte.SIZE - 1 - (i % Byte.SIZE)))));
    }

    void unset(int i) {
        int index = i / Byte.SIZE;
        bits[index] = (byte) ((bits[index] ^ (byte) (1 << (Byte.SIZE - 1 - (i % Byte.SIZE)))));
    }

    boolean isSet(int i) {
        int index = i / Byte.SIZE;
        byte mask = (byte) ((bits[index] & (byte) (1 << (Byte.SIZE - 1 - (i % Byte.SIZE)))));
        return (bits[index] & mask) != 0;
    }

}

and here is my mergesort:

import java.util.Scanner;
import java.io.FileReader;
import java.io.File;

class MergeSort {
    Scanner sc;
    static int times;

    MergeSort() throws Exception {
        sc = new Scanner(new File("numbers.txt"));
        times = 0;
    }


    int[] mergeSort(int[] input) {
        if (input.length <= 1) {
            return input;
        }

        int middle = input.length / 2;

        int[] left = new int[middle];
        int[] right;
        if (input.length % 2 == 0) {
            right = new int[middle];
        } else {
            right = new int[middle + 1];
        }

        for (int i = 0; i < middle; i++) {
            left[i] = input[i];
        }
        for (int i = middle; i < input.length; i++) {
            right[i - middle] = input[i];
        }
        left = mergeSort(left);
        right = mergeSort(right);
        return merge(left, right);
    }

    int[] merge(int[] left, int[] right) {
        times++;
        int[] result = new int[left.length + right.length];
        int left_size = 0;
        int right_size = 0;
        int result_size = 0;
        while (left_size < left.length || right_size < right.length) {
            if (left_size < left.length && right_size < right.length) {
                if (left[left_size] <= right[right_size]) {
                    result[result_size] = left[left_size];
                    left_size++;
                    result_size++;
                } else {
                    result[result_size] = right[right_size];
                    right_size++;
                    result_size++;
                }
            } else if (left_size < left.length) {
                result[result_size] = left[left_size];
                left_size++;
                result_size++;
            } else if (right_size < right.length) {
                result[result_size] = right[right_size];
                right_size++;
                result_size++;
            }
        }
        return result;
    }

    void start() {
        int[] input = new int[838662];
        int i = 0;
        while (sc.hasNextInt()) {
            input[i] = sc.nextInt();
            i++;
        }


        int[] result = mergeSort(input);
        for (int j : result) {
            System.out.printf("%d\n", j);
        }
    }


    public static void main(String[] args) throws Exception {
        new MergeSort().start();
    }
}

The input file contains integers between 0 and 3000000, and contains 838661 numbers. Please forgive the ugly coding style, this was just meant as a quick comparison.

Thanks in advance! Regards, Linus

È stato utile?

Soluzione

For one thing, the Programming Pearls articles were written before the impact of memory hierarchy became as severe as it is today. A map of 800K bytes adds lots of random access memory traffic very likely to cause cache misses. Mergesorts tend to have good local memory performance.

Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a StackOverflow
scroll top