Need help implementing Karatsuba algorithm in C++

https://stackoverflow.com/questions/7985883

20-02-2021
|

Question

First a little background:
- I'm a first time poster, a student in university (not in programming).
- This is not a homework question, I'm only doing this for fun.
- My programming experience consists of one semester (3 months) of C++, and some QBasic in high school.
- Yes I have looked at the GMP and Bignum libraries; it's very difficult to learn stuff from raw codes, especially without understanding of the programmers' intent. Besides I want to learn how to do it for myself.

I'm coding a multiplication function for arbitrarily large integers. I'm using character arrays to represent these numbers, with a + or - at the end to serve as sentinel (eg. "12345+", "31415-").

I'm currently implementing the Karatsuba algorithm. The problem is that with the recursion and dynamic memory assignments, the function is 5 times slower than the naive method.
I could use some hints on how to reduce the running time.

char* dam(char* one, char* two){            // Karatsuba method

    char* zero = intochar(0, 0);
    int size_a = char_size(one) - 1;
    int size_b = char_size(two) - 1;

    if(compare(one, zero) == 0 || compare(two, zero) == 0)
        return zero;                        // if either array is zero, product is zero
    delete[] zero;

    if(size_a < 4 && size_b < 4)            // if both numbers are 3 digits or less, just return their product
        return multiplication(one, two);
                                            // is the product negative?
    bool negative = one[size_a] == two[size_b]? false : true;
    int digits = size_a > size_b ? size_a : size_b;
    digits += digits & 1;                   // add one if digits is odd
    int size = digits / 2 + 1;              // half the digits plus sentinel

    char* a, *b;                            // a and b represent one and two but with even digits
    if(size_a != digits)
        a = pad_char(one, digits - size_a); // pad the numbers with leading zeros so they have even digits
    else
        a = copy_char(one);
    if(size_b != digits)
        b = pad_char(two, digits - size_b);
    else
        b = copy_char(two);

    char* a_left = new char[size];          // left half of number a
    char* a_rite = new char[size];          // right half of number a
    char* b_left = new char[size];
    char* b_rite = new char[size];
    memcpy(a_left, a, size - 1);
    a_left[size - 1] = a[digits];
    memcpy(a_rite, a + size - 1, size);
    memcpy(b_left, b, size - 1);
    b_left[size - 1] = b[digits];
    memcpy(b_rite, b + size - 1, size);
    delete[] a;
    delete[] b;

    char* p0 = dam(a_left, b_left);         // Karatsuba product = p1*10^n + (p0+p2-p1)*10^(n/2) + p2
    char* p2 = dam(a_rite, b_rite);
    deduct(a_left, a_rite);
    deduct(b_left, b_rite);
    char* p1 = dam(a_left, b_left);
    char* p3 = intochar(0, digits - 1);     // p3 = p0 + p2 - p1
    append(p3, p0);                         // append does naive addition
    append(p3, p2);
    deduct(p3, p1);
    delete[] a_left;
    delete[] a_rite;
    delete[] b_left;
    delete[] b_rite;

    int sum_size = 2 * digits;              // product of two numbers can have a maximum of n1 + n2 digits
    char* sum = new char[sum_size + 1];
    memset(sum, 0, sum_size);
    if(negative)
        sum[sum_size] = '-';
    else
        sum[sum_size] = '+';

    char* left = extend_char(p0, digits, false);        // extend returns a new array with trailing zeros
    char* mid = extend_char(p3, size - 1, false);
    append(sum, left);
    append(sum, mid);
    append(sum, p2);
    delete[] p0;
    delete[] p1;
    delete[] p2;
    delete[] p3;
    delete[] left;
    delete[] mid;

    return sum;}

Solution

Karatsuba is a nice algorithm, and not too hard to program. If you're only doing it for fun, it's even not a bad idea to work in base 10 -- it slows you down terribly, but it slows down the naive implementation too, so you still have a basis for comparing the two methods.

However, you absolutely must abandon the idea of allocating and freeing your workspace dynamically at every node of the recursion tree. You just can't afford it. You must allocate the required workspace at the start of the computation, and handle your pointers intelligently so that every level of the tree gets the workspace it needs without having to allocate it.

Also, it makes no sense to test for negative products at every level. Just do that at the top level, and work exclusively with positive numbers during the calculation.

Not that it's relevant to your question, but whenever I see something like

bool negative = one[size_a] == two[size_b]? false : true;

my heart shrinks a little. Think of all those wasted pixels! I respectfully suggest:

bool negative = one[size_a] != two[size_b] ;

OTHER TIPS

Your use of spelled-out decimal values is imposing a great deal of overhead. Karatsuba multiplication is only going to beat long multiplication for numbers that are huge relative to the machine register size, and you really want each primitive multiply to be doing as much work as possible.

I recommend you redesign your data structure such that this:

if(size_a < 4 && size_b < 4)
    return multiplication(one, two);

can become something like this:

if(size_a == 1 && size_b == 1)
    return box(int64_t(one[0]) * two[0]);

where the type of one[0] is int32_t, perhaps. This is what GMP is doing with its mp_limb_t arrays.

I would imagine the slowdown might actually be the allocations. Replace these with local fixed size buffers and I'd imagine you'll see a decent speed increase. Or using a custom pool allocator. But I think that if you're crafty, much of this can be done in place.

Also, if you pass the length of the strings to the functions, you save yourself an iteration to find the length each time.

(also it's spelled "right").

What do you mean writing "the function is 5 times slower"? Karatsuba is asymptotically faster, not just faster. This means that even a toy implementation of Karatsuba will eventually get faster than the naive multiplication. Did you test speed with 10000-digits numbers?

I know GMP code is not easy to read... but look at this table extracted from the code. It gives (for different CPUs) the threshold value for Toom-2,2 (Karatsuba). In short terms, the implementation of Karatsuba in GMP is NOT faster than naive implementation for operands smaller than 320 bits (10 x 32-bit registers).

Some questions about your code:

do you really need char *a, *b ? You delete them before starting computation! ;-)
are you sure you need to copy the sign to {a,b}_{left,rite}? Did you check if the result is correct with negative operands?
consider very unbalanced operands... what should you do if size_a * 2 < size_b (or viceversa)?

Next step will be Toom, right?

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow