Question

I don't Java much.

I am writing some optimized math code and I was shocked by my profiler results. My code collects values, interleaves the data and then chooses the values based on that. Java runs slower than my C++ and MATLAB implementations.

I am using javac 1.7.0_05 I am using the Sun/Oracle JDK 1.7.05

There exists a floor function that performs a relevant task in the code. java math.floor profile results

  1. Does anybody know of the paradigmatic way to fix this?
  2. I noticed that my floor() function is defined with something called StrictMath. Is there something like -ffast-math for Java? I am expecting there must be a way to change the floor function to something more computationally reasonable without writing my own.

    public static double floor(double a) {
        return StrictMath.floor(a); // default impl. delegates to StrictMath
    }
    

Edit

So a few people suggested I try to do a cast. I tried this and there was absolutely no change in walltime.

private static int flur(float dF)
{
    return (int) dF;
}

413742 cast floor function

394675 Math.floor

These test were ran without the profiler. An effort was made to use a profiler but the runtime was drastically altered (15+ minutes so I quit).

Was it helpful?

Solution

Here's a sanity check for your hypothesis that the code is really spending 99% of its time in floor. Let's assume that you have Java and C++ versions of the algorithm that are both correct in terms of the outputs they produce. For the sake of the argument, let us assume that the two versions call the equivalent floor functions the same number of times. So a time function is

t(input) = nosFloorCalls(input) * floorTime + otherTime(input)

where floorTime is the time taken for a call to floor on the platform.

Now if your hypothesis is correct, and floorTime is vastly more expensive on Java (to the extent that it takes roughly 99% of the execution time) then you would expect the Java version of the application to run a large factor (50 times or more) slower than the C++ version. If you don't see this, then your hypothesis most likely is false.


If the hypothesis is false, here are two alternative explanations for the profiling results.

  1. This is a measurement anomaly; i.e. the profiler has somehow got it wrong. Try using a different profiler.

  2. There is a bug in the Java version of your code that is causing it to call floor many, many more times than in the C++ version of the code.

OTHER TIPS

You might want to give a try to FastMath.

Here is a post about the performance of Math in Java vs. Javascript. There are a few good hints about why the default math lib is slow. They are discussing other operations than floor, but I guess their findings can be generalized. I found it interesting.

EDIT

According to this bug entry, floor has been implemented a pure java code in 7(b79), 6u21(b01) resulting in better performance. The code of floor in the JDK 6 is still a bit longer than the one in FastMath, but might not be responsible for such a perf. degradation. What JDK are you using? Could you try with a more recent version?

Math.floor() is insanely fast on my machine at around 7 nanoseconds per call in a tight loop. (Windows 7, Eclipse, Oracle JDK 7). I'd expect it to be very fast in pretty much all circumstances and would be extremely surprised if it turned out to be the bottleneck.

Some ideas:

  • I'd suggest re-running some benchmarks without a profiler running. It sometimes happens that profilers create spurious overhead when they instrument the binary - particularly for small functions like Math.floor() that are likely to be inlined.
  • Try a couple of different JVMs, you might have hit an obscure bug
  • Try the FastMath class in the excellent Apache Commons Math library, which includes a new implementation of floor. I'd be really surprised if it is faster, but you never know.
  • Check you are not running any virtualisation technonolgy or similar that might be interfering with Java's ability to call native code (which is used in a few of the java.lang.Math functions including Math.floor())

It is worth noting that monitoring a method takes some overhead and in the case of VisualVM, this is fairly high. If you have a method which is called often but does very little it can appear to use lots of CPU. e.g. I have seen Integer.hashCode() as a big hitter once. ;)

On my machine a floor takes less 5.6 ns, but a cast takes 2.3 ns. You might like to try this on your machine.


Unless you need to handle corner cases, a plain cast is faster.

// Rounds to zero, instead of Negative infinity.
public static double floor(double a) {
    return (long) a;
}

public static void main(String... args) {
    int size = 100000;
    double[] a = new double[size];
    double[] b = new double[size];
    double[] c = new double[size];
    for (int i = 0; i < a.length; i++) a[i] = Math.random()  * 1e6;

    for (int i = 0; i < 5; i++) {
        timeCast(a, b);
        timeFloor(a, c);
        for (int j = 0; j < size; j++)
            if (b[i] != c[i])
                System.err.println(a[i] + ": " + b[i] + " " + c[i]);
    }
}

public static double floor(double a) {
    return a < 0 ? -(long) -a : (long) a;
}

private static void timeCast(double[] from, double[] to) {
    long start = System.nanoTime();
    for (int i = 0; i < from.length; i++)
        to[i] = floor(from[i]);
    long time = System.nanoTime() - start;
    System.out.printf("Cast took an average of %.1f ns%n", (double) time / from.length);
}

private static void timeFloor(double[] from, double[] to) {
    long start = System.nanoTime();
    for (int i = 0; i < from.length; i++)
        to[i] = Math.floor(from[i]);
    long time = System.nanoTime() - start;
    System.out.printf("Math.floor took an average of %.1f ns%n", (double) time / from.length);
}

prints

Cast took an average of 62.1 ns
Math.floor took an average of 123.6 ns
Cast took an average of 61.9 ns
Math.floor took an average of 6.3 ns
Cast took an average of 47.2 ns
Math.floor took an average of 6.5 ns
Cast took an average of 2.3 ns
Math.floor took an average of 5.6 ns
Cast took an average of 2.3 ns
Math.floor took an average of 5.6 ns

First of all: Your profiler shows that your spending 99% of the cpu time in the floor function. This does not indicate floor is slow. If you do nothing but floor() thats totally sane. Since other languages seem to implement floor more efficient your assumption may be correct, however.

I know from school that a naive implementation of floor (which works only for positive numbers and is one off for negative ones) can be done by casting to an integer/long. That is language agnostic and some sort of general knowledge from CS courses.

Here are some micro benches. Works on my machine and backs what I learned in school ;)

rataman@RWW009 ~/Desktop
$ javac Cast.java && java Cast
10000000 Rounds of Casts took 16 ms

rataman@RWW009 ~/Desktop
$ javac Floor.java && java Floor
10000000 Rounds of Floor took 140 ms
#
public class Cast/Floor {

    private static final int ROUNDS = 10000000;

    public static void main(String[] args)
    {
        double[] vals = new double[ROUNDS];
        double[] res = new double[ROUNDS];

        // awesome testdata
        for(int i = 0; i < ROUNDS; i++)
        {
            vals[i] = Math.random() * 10.0;
        }

        // warmup
        for(int i = 0; i < ROUNDS; i++)
        {
            res[i] = floor(vals[i]);
        }

        long start = System.currentTimeMillis();
        for(int i = 0; i < ROUNDS; i++)
        {
            res[i] = floor(vals[i]);
        }
        System.out.println(ROUNDS + " Rounds of Casts took " + (System.currentTimeMillis() - start) +" ms");
    }

    private static double floor(double arg)
    {
        // Floor.java
        return Math.floor(arg);
        // or Cast.java
        return (int)arg;
    }

}

Math.floor (and Math.ceil) can be a surprising bottleneck if your algorithm depends on it a lot. This is because these functions handle edge cases that you might not care about (such as minus-zero and positive-zero etc). Just look at the implementation of these functions to see what they're actually doing; there's a surprising amount of branching in there.

Also consider that Math.floor/ceil take only a double as an argument and return a double, which you might not want. If you just want an int or long, some of the checks in Math.floor are simply unnecessary.

Some have suggested to simply cast to an int, which will work as long as your values are positive (and your algorithm doesn't depend on the edge cases that Math.floor checks for). If that's the case, a simple cast is the fastest solution by quite a margin (in my experience).

If for example your values can be negative and you want an int from a float, you can do something like this:

public static final int floor(final float value) {
    return ((int) value) - (Float.floatToRawIntBits(value) >>> 31);
}

(It just subtracts the float's sign bit from the cast to make it correct for negative numbers, while preventing an "if")

In my experience, this is a lot faster than Math.floor. If it isn't, I suggest to check your algorithm, or perhaps you've ran into JVM performance bug (which is much less likely).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top