How can I further optimize this color difference function?

https://stackoverflow.com/questions/3049213

27-09-2019
|

Question

I have made this function to calculate color differences in the CIE Lab colorspace, but it lacks speed. Since I'm not a Java expert, I wonder if any Java guru around has some tips that can improve the speed here.

The code is based on the matlab function mentioned in the comment block.

/**
 * Compute the CIEDE2000 color-difference between the sample color with
 * CIELab coordinates 'sample' and a standard color with CIELab coordinates
 * 'std'
 *
 * Based on the article:
 * "The CIEDE2000 Color-Difference Formula: Implementation Notes,
 * Supplementary Test Data, and Mathematical Observations,", G. Sharma,
 * W. Wu, E. N. Dalal, submitted to Color Research and Application,
 * January 2004.
 * available at http://www.ece.rochester.edu/~gsharma/ciede2000/
 */
public static double deltaE2000(double[] lab1, double[] lab2)
{
    double L1 = lab1[0];
    double a1 = lab1[1];
    double b1 = lab1[2];

    double L2 = lab2[0];
    double a2 = lab2[1];
    double b2 = lab2[2];

    // Cab = sqrt(a^2 + b^2)
    double Cab1 = Math.sqrt(a1 * a1 + b1 * b1);
    double Cab2 = Math.sqrt(a2 * a2 + b2 * b2);

    // CabAvg = (Cab1 + Cab2) / 2
    double CabAvg = (Cab1 + Cab2) / 2;

    // G = 1 + (1 - sqrt((CabAvg^7) / (CabAvg^7 + 25^7))) / 2
    double CabAvg7 = Math.pow(CabAvg, 7);
    double G = 1 + (1 - Math.sqrt(CabAvg7 / (CabAvg7 + 6103515625.0))) / 2;

    // ap = G * a
    double ap1 = G * a1;
    double ap2 = G * a2;

    // Cp = sqrt(ap^2 + b^2)
    double Cp1 = Math.sqrt(ap1 * ap1 + b1 * b1);
    double Cp2 = Math.sqrt(ap2 * ap2 + b2 * b2);

    // CpProd = (Cp1 * Cp2)
    double CpProd = Cp1 * Cp2;

    // hp1 = atan2(b1, ap1)
    double hp1 = Math.atan2(b1, ap1);
    // ensure hue is between 0 and 2pi
    if (hp1 < 0) {
        // hp1 = hp1 + 2pi
        hp1 += 6.283185307179586476925286766559;
    }

    // hp2 = atan2(b2, ap2)
    double hp2 = Math.atan2(b2, ap2);
    // ensure hue is between 0 and 2pi
    if (hp2 < 0) {
        // hp2 = hp2 + 2pi
        hp2 += 6.283185307179586476925286766559;
    }

    // dL = L2 - L1
    double dL = L2 - L1;

    // dC = Cp2 - Cp1
    double dC = Cp2 - Cp1;

    // computation of hue difference
    double dhp = 0.0;
    // set hue difference to zero if the product of chromas is zero
    if (CpProd != 0) {
        // dhp = hp2 - hp1
        dhp = hp2 - hp1;
        if (dhp > Math.PI) {
            // dhp = dhp - 2pi
            dhp -= 6.283185307179586476925286766559;
        } else if (dhp < -Math.PI) {
            // dhp = dhp + 2pi
            dhp += 6.283185307179586476925286766559;
        }
    }

    // dH = 2 * sqrt(CpProd) * sin(dhp / 2)
    double dH = 2 * Math.sqrt(CpProd) * Math.sin(dhp / 2);

    // weighting functions
    // Lp = (L1 + L2) / 2 - 50
    double Lp = (L1 + L2) / 2 - 50;

    // Cp = (Cp1 + Cp2) / 2
    double Cp = (Cp1 + Cp2) / 2;

    // average hue computation
    // hp = (hp1 + hp2) / 2
    double hp = (hp1 + hp2) / 2;

    // identify positions for which abs hue diff exceeds 180 degrees
    if (Math.abs(hp1 - hp2) > Math.PI) {
        // hp = hp - pi
        hp -= Math.PI;
    }
    // ensure hue is between 0 and 2pi
    if (hp < 0) {
        // hp = hp + 2pi
        hp += 6.283185307179586476925286766559;
    }

    // LpSqr = Lp^2
    double LpSqr = Lp * Lp;

    // Sl = 1 + 0.015 * LpSqr / sqrt(20 + LpSqr)
    double Sl = 1 + 0.015 * LpSqr / Math.sqrt(20 + LpSqr);

    // Sc = 1 + 0.045 * Cp
    double Sc = 1 + 0.045 * Cp;

    // T = 1 - 0.17 * cos(hp - pi / 6) +
    //       + 0.24 * cos(2 * hp) +
    //       + 0.32 * cos(3 * hp + pi / 30) -
    //       - 0.20 * cos(4 * hp - 63 * pi / 180)
    double hphp = hp + hp;
    double T = 1 - 0.17 * Math.cos(hp - 0.52359877559829887307710723054658)
            + 0.24 * Math.cos(hphp)
            + 0.32 * Math.cos(hphp + hp + 0.10471975511965977461542144610932)
            - 0.20 * Math.cos(hphp + hphp - 1.0995574287564276334619251841478);

    // Sh = 1 + 0.015 * Cp * T
    double Sh = 1 + 0.015 * Cp * T;

    // deltaThetaRad = (pi / 3) * e^-(36 / (5 * pi) * hp - 11)^2
    double powerBase = hp - 4.799655442984406;
    double deltaThetaRad = 1.0471975511965977461542144610932 * Math.exp(-5.25249016001879 * powerBase * powerBase);

    // Rc = 2 * sqrt((Cp^7) / (Cp^7 + 25^7))
    double Cp7 = Math.pow(Cp, 7);
    double Rc = 2 * Math.sqrt(Cp7 / (Cp7 + 6103515625.0));

    // RT = -sin(delthetarad) * Rc
    double RT = -Math.sin(deltaThetaRad) * Rc;

    // de00 = sqrt((dL / Sl)^2 + (dC / Sc)^2 + (dH / Sh)^2 + RT * (dC / Sc) * (dH / Sh))
    double dLSl = dL / Sl;
    double dCSc = dC / Sc;
    double dHSh = dH / Sh;
    return Math.sqrt(dLSl * dLSl + dCSc * dCSc + dHSh * dHSh + RT * dCSc * dHSh);
}

Solution

cos is expensive, especially 4 in a row. You appear to be computing cos(na+b) where b is a constant and n is a small integer. This means you can precompute cos(b) and sin(b) and at runtime compute just cos(hp) and sin(hp). You can get cos(na+b) by making repeated use of

cos(a+b) = cos(a)*cos(b)-sin(a)*sin(b)

You'll be trading a couple of sins and coss for some multiplications and additions, almost certainly worthwhile.

You can do better if you're feeling ambitious. You're getting hp indirectly from an atan2. The pattern trig-function(rational-function(inverse-trig-function(x))) can frequently be replaced by some combination of polynomials and roots which are faster to evaluate than trig functions.

I don't know how pow is implemented in Java, but if it uses logs, you may be better off getting Cp7 using Cp2=Cp*Cp;Cp4=Cp2*Cp2;Cp7=Cp4*Cp2*Cp;

Update: Getting a bit more speculative right now as I don't have time to actually rewrite the code. The power optimization and the trig optimization are actually the same thing in disguise! The trig optimization is a version of the power optimization applied to complex numbers. What's more, the line

double dH = 2 * Math.sqrt(CpProd) * Math.sin(dhp / 2);

is part of a complex number square root operation. This makes me think that a large chunk of this code could actually be written to use complex numbers eliminating almost all of the trig functions. I don't know how your complex number arithmetic is though...

OTHER TIPS

Generally any system which implements this and has serious speed issues isn't going to be doing random colors. It's going to be doing several distinct colors. Even a giant image full of different colors is typically going to only have a few thousand colors. I very highly recommend a caching algorithm. Though if speed is a concern you should roll up your own (you want primitives only, speed).

There's not much optimizing to be done with the actual color distance routine itself, but I wrote a caching system for this thing and it went on the order of a 100 times faster. The distance routine went from the overwhelming dominant factor to a blip. You shouldn't seek to reduce the speed of that. You might eek out something. But, reduce the number of times you invoke the thing properly.

You have two set inputs and it produces a single set output, and does so after a very very long time. 7 doubles per caching index. That's 14 bytes. For a 14 meg memory footprint (or so, ignoring hashes or what not, likely we're talking double). You can store a million entries and that's enough that if you have like 1k typical different colors you'll get high 90%s cache hits. You can even hugely reduce this if you're converting your initial colors from RGB to Lab (those conversions should be cached too). You'll see a speed up if you hit like 5% of the time. And you'll get hits likely 99% of the time (unless you're doing something odd like random color comparisons). From my observations it makes CIEDE2000 take pretty much the same about of time as Euclidean RGB.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow