Question

I'm running a Java application on an ARMv6 processor. The nature of this program requires me to convert numbers (int or float) to String. The processor runs at 850 MHz. The Java runtime is OpenJDK Zero VM 1.7.0_21-b02.

I'm not expecting rock-solid performance here, but I would expect something much more efficient than what I am seeing with the code snippet below.

    long time1, time2;

    float[] src = new float[2000000];
    for (int i = 0; i < src.length; i++) {
        src[i] = (float)Math.random()* 2.56454512f * (float) Math.random();
    }
    time1 = System.nanoTime();
    for (int j = 0; j < src.length; j++) {
        String test = String.valueOf(src[j]);
    }
    time2 = System.nanoTime();
    logTimeDelay("String.valueOf", time1, time2);

    time1 = System.nanoTime();
    for (int j = 0; j < src.length; j++) {
        String test = Float.toString(src[j]);
    }
    time2 = System.nanoTime();
    logTimeDelay("Float.toString", time1, time2);


    StringBuilder sb = new StringBuilder(50);
    time1 = System.nanoTime();
    for (int j = 0; j < src.length; j++) {
        sb.setLength(0);
        sb.append(src[j]);
    }
    time2 = System.nanoTime();
    logTimeDelay("StringBuilder.append, setLength", time1, time2);

    time1 = System.nanoTime();
    for (int j = 0; j < src.length; j++) {
        String test = "" + src[j];
    }
    time2 = System.nanoTime();
    logTimeDelay("\"\" + ", time1, time2);

    private static void logTimeDelay(String message, long time1, long time2) {
        // Pass the label through "%s" so a '%' in the message cannot break the format,
        // and divide the long delta by a double to get sub-second precision.
        System.out.println(String.format("%s: %.5f s", message, (time2 - time1) / 1.0e9));
    }

Running this code snippet on my i7 computer returns the following results:

String.valueOf: 0.39714 s
Float.toString: 0.33295 s
StringBuilder.append, setLength: 0.33277 s
"" + : 0.37581 s

Running the exact same code snippet on the ARMv6 processor returns the following values:

String.valueOf: 204.78758 s
Float.toString: 200.79659 s
StringBuilder.append, setLength: 180.81551 s
"" + : 267.63036 s

Any clues on how I could optimize my number-to-String conversion on this device?

Thanks in advance.


Solution

An "out of thin air" hypothesis, but the performance difference you observe here seems to be related to CPU caching: your ARM CPU has far less cache than your desktop's i7.

Your float array has two million elements in it; at 4 bytes per float, that makes for a minimum of 8 MB of storage. Those 8 MB need to reach the CPU.

I also have an i7 here, and its cache sizes are 32 KB (L1 data, per core), 256 KB (L2), and 6 MB (L3); three quarters of the float array can fit into L3! In your case it seems there can only be 32 KB at a time... Therefore there is a lot of cache thrashing, and the memory bus traffic is very high.

I suspect that if you reduce your array size to something which fits in 32 KB (for instance, try with only 1000 floats, i.e. 4 KB) the performance figures will be far closer.
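One way to test that hypothesis without changing the total amount of work: keep the same 2,000,000 conversions, but loop many times over a small array instead of once over a huge one. A minimal sketch (the class name and `run()` helper are mine, not from the original snippet):

```java
// Sketch: the same Float.toString benchmark, but over 1000 floats (4 KB,
// which fits comfortably in a 32 KB L1 cache) repeated 2000 times, so the
// total conversion count is still 2,000,000.
public class SmallArrayBench {
    static double run() {
        float[] src = new float[1000]; // 4 KB: stays resident in L1
        for (int i = 0; i < src.length; i++) {
            src[i] = (float) Math.random() * 2.56454512f;
        }
        long t1 = System.nanoTime();
        for (int pass = 0; pass < 2000; pass++) {     // 2000 * 1000
            for (int j = 0; j < src.length; j++) {    // = 2,000,000
                String test = Float.toString(src[j]); // conversions
            }
        }
        long t2 = System.nanoTime();
        return (t2 - t1) / 1.0e9; // elapsed seconds
    }

    public static void main(String[] args) {
        System.out.printf("Float.toString, cache-friendly: %.5f s%n", run());
    }
}
```

If the cache explanation holds, the gap between this figure and your 200 s result should narrow noticeably on the ARM board; on the i7 it should barely move.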

EDIT: it also happens that your CPU does not have an FPU; that accounts for the majority of the performance loss, as @Voo mentioned.

So:

  • lack of an FPU,
  • small cache,
  • lots of data.

For a more "realistic" comparison, you should test over a smaller subset of data; this will at least alleviate (but not completely eliminate) the cache problem.
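The answer stops short of code, but if the application only needs a fixed number of decimal places, one hedged workaround for the missing FPU is to do most of the work in integer arithmetic: scale the float to an int once, then format it with plain integer operations instead of the general-purpose conversion in Float.toString. A minimal sketch (the class and method names are mine, and it assumes non-negative values and a fixed three-decimal precision):

```java
// Sketch only: a hypothetical fixed-point formatter. One float multiply
// still goes through software floating point on an FPU-less CPU, but all
// the remaining work is cheap integer arithmetic.
public class FixedPointFormat {
    private static final int SCALE = 1000; // 10^3 -> three decimal places

    static String format(float value) {
        int scaled = (int) (value * SCALE + 0.5f); // round to nearest
        int whole = scaled / SCALE;
        int frac = scaled % SCALE;
        StringBuilder sb = new StringBuilder(16);
        sb.append(whole).append('.');
        // Pad the fractional part to exactly three digits.
        if (frac < 100) sb.append('0');
        if (frac < 10) sb.append('0');
        sb.append(frac);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(format(2.5f));   // prints 2.500
        System.out.println(format(0.125f)); // prints 0.125
    }
}
```

This trades Float.toString's "shortest round-trippable representation" guarantee for speed, so it only applies when a truncated fixed precision is acceptable for your output.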

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow