Efficient Bicubic filtering code in GLSL?

Question 1

I decided to take a minute to dig my old Perforce activities and found the missing cubic() function; enjoy! :)

vec4 cubic(float v)
{
    vec4 n = vec4(1.0, 2.0, 3.0, 4.0) - v;
    vec4 s = n * n * n;
    float x = s.x;
    float y = s.y - 4.0 * s.x;
    float z = s.z - 4.0 * s.y + 6.0 * s.x;
    float w = 6.0 - x - y - z;
    return vec4(x, y, z, w);
}

Question 2

I found this implementation which can be used as a drop-in replacement for texture() (from http://www.java-gaming.org/index.php?topic=35123.0 (one typo fixed)):

// from http://www.java-gaming.org/index.php?topic=35123.0
vec4 cubic(float v){
    vec4 n = vec4(1.0, 2.0, 3.0, 4.0) - v;
    vec4 s = n * n * n;
    float x = s.x;
    float y = s.y - 4.0 * s.x;
    float z = s.z - 4.0 * s.y + 6.0 * s.x;
    float w = 6.0 - x - y - z;
    return vec4(x, y, z, w) * (1.0/6.0);
}

vec4 textureBicubic(sampler2D sampler, vec2 texCoords){

   vec2 texSize = textureSize(sampler, 0);
   vec2 invTexSize = 1.0 / texSize;

   texCoords = texCoords * texSize - 0.5;


    vec2 fxy = fract(texCoords);
    texCoords -= fxy;

    vec4 xcubic = cubic(fxy.x);
    vec4 ycubic = cubic(fxy.y);

    vec4 c = texCoords.xxyy + vec2 (-0.5, +1.5).xyxy;

    vec4 s = vec4(xcubic.xz + xcubic.yw, ycubic.xz + ycubic.yw);
    vec4 offset = c + vec4 (xcubic.yw, ycubic.yw) / s;

    offset *= invTexSize.xxyy;

    vec4 sample0 = texture(sampler, offset.xz);
    vec4 sample1 = texture(sampler, offset.yz);
    vec4 sample2 = texture(sampler, offset.xw);
    vec4 sample3 = texture(sampler, offset.yw);

    float sx = s.x / (s.x + s.y);
    float sy = s.z / (s.z + s.w);

    return mix(
       mix(sample3, sample2, sx), mix(sample1, sample0, sx)
    , sy);
}

Example: Nearest, bilinear, bicubic:

The ImageData of this image is

{{{0.698039, 0.996078, 0.262745}, {0., 0.266667, 1.}, {0.00392157, 
   0.25098, 0.996078}, {1., 0.65098, 0.}}, {{0.996078, 0.823529, 
   0.}, {0.498039, 0., 0.00392157}, {0.831373, 0.00392157, 
   0.00392157}, {0.956863, 0.972549, 0.00784314}}, {{0.909804, 
   0.00784314, 0.}, {0.87451, 0.996078, 0.0862745}, {0.196078, 
   0.992157, 0.760784}, {0.00392157, 0.00392157, 0.498039}}, {{1., 
   0.878431, 0.}, {0.588235, 0.00392157, 0.00392157}, {0.00392157, 
   0.0666667, 0.996078}, {0.996078, 0.517647, 0.}}}

I tried to reproduce this (many other interpolation techniques)

but they have clamped padding, while I have repeating (wrapping) boundaries. Therefore it is not exactly the same.

It seems this bicubic business is not a proper interpolation, i.e. it does not take on the original values at the points where the data is defined.

Question 3

The missing function cubic() in JAre's answer could look like this:

vec4 cubic(float x)
{
    float x2 = x * x;
    float x3 = x2 * x;
    vec4 w;
    w.x =   -x3 + 3*x2 - 3*x + 1;
    w.y =  3*x3 - 6*x2       + 4;
    w.z = -3*x3 + 3*x2 + 3*x + 1;
    w.w =  x3;
    return w / 6.f;
}

It returns the four weights for cubic B-Spline.

It is all explained in NVidia Gems.

Question 4

Wow. I recognize the code above (I can not comment w/ reputation < 50) as I came up with it in early 2011. The problem I was trying to solve was related to old IBM T42 (sorry the exact model number escapes me) laptop and it's ATI graphics stack. I developed the code on NV card and originally I used 16 texture fetches. That was kinda of slow but fast enough for my purposes. When someone reported it did not work on his laptop it became apparent that they did not support enough texture fetches per fragment. I had to engineer a work-around and the best I could come up with was to do it with number of texture fetches that would work.

I thought about it like this: okay, so if I handle each quad (2x2) with linear filter the remaining problem is can the rows and columns share the weights? That was the only problem on my mind when I set out to craft the code. Of course they could be shared; the weights are same for each column and row; perfect!

Now I had four samples. The remaining problem was how to correctly combine the samples. That was the biggest obstacle to overcome. It took about 10 minutes with pencil and paper. With trembling hands I typed the code in and it worked, nice. Then I uploaded the binaries to the guy who promised to check it out on his T42 (?) and he reported it worked. The end. :)

I can assure that the equations check out and give mathematically identical results to computing the samples individually. FYI: with CPU it's faster to do horizontal and vertical scan separately. With GPU multiple passes is not that great idea, especially when it's probably not feasible anyway in typical use case.

Food for thought: it is possible to use a texture lookup for the cubic() function. Which is faster depends on the GPU but generally speaking, the sampler is light on the ALU side just doing the arithmetic would balance things out. YMMV.

Question 5

(EDIT)

Cubic() is a cubic spline function

Example:

cubic spline

Texscale is sampling window size coefficient. You can start with 1.0 value.

vec4 filter(sampler2D texture, vec2 texcoord, vec2 texscale)
{
    float fx = fract(texcoord.x);
    float fy = fract(texcoord.y);
    texcoord.x -= fx;
    texcoord.y -= fy;

    vec4 xcubic = cubic(fx);
    vec4 ycubic = cubic(fy);

    vec4 c = vec4(texcoord.x - 0.5, texcoord.x + 1.5, texcoord.y -
0.5, texcoord.y + 1.5);
    vec4 s = vec4(xcubic.x + xcubic.y, xcubic.z + xcubic.w, ycubic.x +
ycubic.y, ycubic.z + ycubic.w);
    vec4 offset = c + vec4(xcubic.y, xcubic.w, ycubic.y, ycubic.w) /
s;

    vec4 sample0 = texture2D(texture, vec2(offset.x, offset.z) *
texscale);
    vec4 sample1 = texture2D(texture, vec2(offset.y, offset.z) *
texscale);
    vec4 sample2 = texture2D(texture, vec2(offset.x, offset.w) *
texscale);
    vec4 sample3 = texture2D(texture, vec2(offset.y, offset.w) *
texscale);

    float sx = s.x / (s.x + s.y);
    float sy = s.z / (s.z + s.w);

    return mix(
        mix(sample3, sample2, sx),
        mix(sample1, sample0, sx), sy);
}

Source

Question 6

For anybody interested in GLSL code to do tri-cubic interpolation, ray-casting code using cubic interpolation can be found in the examples/glCubicRayCast folder in: http://www.dannyruijters.nl/cubicinterpolation/CI.zip

edit: The cubic interpolation code is now available on github: CUDA version and WebGL version, and GLSL sample.

Question 7

I've been using @Maf 's cubic spline recipe for over a year, and I recommend it, if a cubic B-spline meets your needs.

But I recently realized that, for my particular application, it is important for the intensities to match exactly at the sample points. So I switched to using a Catmull-Rom spline, which uses a slightly different recipe like so:

// Catmull-Rom spline actually passes through control points
vec4 cubic(float x) // cubic_catmullrom(float x)
{
    const float s = 0.5; // potentially adjustable parameter
    float x2 = x * x;
    float x3 = x2 * x;
    vec4 w;
    w.x =    -s*x3 +     2*s*x2 - s*x + 0;
    w.y = (2-s)*x3 +   (s-3)*x2       + 1;
    w.z = (s-2)*x3 + (3-2*s)*x2 + s*x + 0;
    w.w =     s*x3 -       s*x2       + 0;
    return w;
}

I found these coefficients, plus those for a number of other flavors of cubic splines, in the lecture notes at: http://www.cs.cmu.edu/afs/cs/academic/class/15462-s10/www/lec-slides/lec06.pdf

Question 8

I think it is possible that the Catmull version could be done with 4 texture lookups by (a) arranging the input texture like a chessboard with alternate slots saved as positives and as negatives, and (b) an associated modification of textureBicubic. That would rely on the contributions/weights w.x/w.w always being negative, and the contributions w.y/w.z always being positive. I haven't double-checked if this is true, or exactly how the modified textureBicubic would look.

... I have verified that w contributions do satisfy the +ve -ve rules.