質問

Source my answer in:

Is this expression correct in C preprocessor

I'm a little bit out of my forte here, and I'm trying to understand how this particular optimization works.

As mentioned in the answer, gcc will optimize integer division by 7 to:

mov edx, -1840700269
mov eax, edi
imul    edx
lea eax, [rdx+rdi]
sar eax, 2
sar edi, 31
sub eax, edi

Which translates back into C as:

int32_t divideBySeven(int32_t num) {
    int32_t temp = ((int64_t)num * -015555555555) >> 32;
    temp = (temp + num) >> 2;
    return (temp - (num >> 31));
}

Let's take a look at the first part:

int32_t temp = ((int64_t)num * -015555555555) >> 32;

Why this number?

Well, let's take 2^64 and divide it by 7 and see what pops out.

2^64 / 7 = 2635249153387078802.28571428571428571429

That looks like a mess, what if we convert it into octal?

0222222222222222222222.22222222222222222222222

That's a very pretty repeating pattern, surely that can't be a coincidence. I mean we remember that 7 is 0b111 and we know that when we divide by 99 we tend to get repeating patterns in base 10. So it makes sense that we'd get a repeating pattern in base 8 when we divide by 7.

So where does our number come in?

(int32_t)-1840700269 is the same as (uint_32t)2454267027

* 7 = 17179869189

And finally 17179869184 is 2^34

Which means that 17179869189 is the closest multiple of 7 2^34. Or to put it another way 2454267027 is the largest number that will fit in a uint32_t which when multiplied by 7 is very close to a power of 2

What's this number in octal?

0222222222223

Why is this important? Well, we want to divide by 7. This number is 2^34/7... approximately. So if we multiply by it, and then leftshift 34 times, we should get a number very close to the exact number.

The last two lines look like they were designed to patch up approximation errors.

Perhaps someone with a little more knowledge and/or expertise in this field can chime in on this.

>>> magic = 2454267027
>>> def div7(a):
...   if (int(magic * a >> 34) != a // 7):
...     return 0
...   return 1
... 
>>> for a in xrange(2**31, 2**32):
...   if (not div7(a)):
...     print "%s fails" % a
... 

Failures begin at 3435973841 which is, funnily enough 0b11001100110011001100110011010001

Classifying why the approximation fails is a bit beyond me, and why the patches fix it up is as well. Does anyone know how the magic works beyond what I've put down here?

役に立ちましたか?

解決

The first part of the algorithm is multiplying by an approximation to the reciprocal of 7. In this case, we're approximating computing the reciprocal with an integer multiplication and a right bit-shift.

First, we see the value -1840700269 (octal -015555555555) as a 32-bit integer. If you read this as an unsigned 32 bit integer, it has value 2454267027 (octal 22222222223). It turns out that 2454267027 / 2^34 is a very close integer approximation to 1/7.

Why do we pick this number and this particular power of 2? The larger the integers we use, the closer the approximation is. In this case, 2454267027 seems to be the largest integer (satisfying the above property) with which you can multiply a signed 32-bit int without overflowing a 64-bit int.

Next, if we immediately right shift with >> 34 and store the result in a 32-bit int, we're going to lose the accuracy in the two lowest-order bits. Those bits are necessary to determining the right floor for integer division.

I'm not sure the second line was translated correctly from the x86 code. At that point, temp is approximately num * 4/7, so num * 4/7 + num to that and bit-shifting is going to give you approximately num * 1/7 + num * 1/4, a quite large error.

For example, take as input 57, where 57 // 7 = 8. I verified the below in code as well:

  • 57 * 2454267027 = 139893220539
  • 139893220539 >> 32 = 32 (approx 57 * 4/7 = 32.5714... at this point)
  • 32 + 57 = 89
  • 89 >> 2 = 22 (huh?? Nowhere close to 8 at this point.)

Anyway, for the last line, it is an adjustment we make after computing signed integer division this way. I quote from the section from Hacker's delight on signed division:

The code most naturally computes the floor division result, so we need a correction to make it compute the conventional truncated toward 0 result. This can be done with three computational instructions by adding to the dividend if the dividend is negative.

In this case (referring to your other post) it seems you are doing a signed shift, so it will subtract -1 in the case of a negative number; giving the result of +1.

This isn't even all you can do; here's an even crazier blog post about how to divide by 7 with just a single multiplication.

ライセンス: CC-BY-SA帰属
所属していません StackOverflow
scroll top