Picking good first estimates for Goldschmidt division

https://stackoverflow.com/questions/2661541

27-09-2019

Question

I'm calculating fixed-point reciprocals in Q22.10 with Goldschmidt division, for use in my software rasterizer on ARM.

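This is done by setting the numerator to 1, i.e. the numerator becomes the scalar on the first iteration. To be honest, I'm following the Wikipedia algorithm somewhat blindly here. The article says that if the denominator is scaled into the half-open range (0.5, 1.0], a good first estimate can be based on the denominator alone: let F be the estimated scalar and D the denominator, then F = 2 - D.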

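But when doing this, I lose a lot of precision. Say I want to find the reciprocal of 512.00002f. To scale the number down, I lose 10 bits of precision in the fraction part, because they are shifted out. So, my questions are: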

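Is there a way of picking a better estimate which does not require normalization? Why, or why not? A mathematical proof of why this is or is not possible would be great.
Also, is it possible to pre-calculate the first estimates so that the series converges faster? Right now it converges after the 4th iteration on average. On ARM this is about 50 cycles in the worst case, and that is without counting the emulation of clz/bsr or any memory lookups. If it is possible, I'd like to know whether doing so increases the error, and by how much.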
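To make the precision loss concrete, here is a minimal sketch. It assumes 22 fractional bits (as in the test case) and the 10-bit right shift described above; `normalize_right` is a hypothetical helper introduced only for this illustration, and the constant is the truncated fixed-point encoding of roughly 512.00002.

```c
#include <stdint.h>

/* Hypothetical helper (not part of the original test case):
   shift `val` right by `shift` bits and report the bits that fall off. */
static uint32_t normalize_right(uint32_t val, int shift, uint32_t *lost)
{
    *lost = val & ((1u << shift) - 1u); /* low bits that are discarded */
    return val >> shift;                /* normalized value */
}

/* 512.00002 * 2^22 is about 2147483731.9, which truncates to 0x80000053.
   Shifting right by 10 bits leaves 0x00200000, i.e. exactly 0.5 with
   22 fractional bits: the 0x53 that encoded the ".00002" part is gone. */
```

Calling `normalize_right(0x80000053u, 10, &lost)` returns `0x00200000` and stores `0x53` in `lost`, so the reciprocal that comes out is that of 512 rather than of 512.00002.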

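Here is my test case. Note: the software implementation of clz that the code relies on comes from another post of mine (the listing here calls clz without reproducing it). You can replace it with an intrinsic if you want. clz should return the number of leading zeros, and 32 for the value 0.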

#include <stdio.h>
#include <stdint.h>

const unsigned int BASE = 22ULL;

static unsigned int divfp(unsigned int val, int* iter)
{
  /* Numerator, denominator, estimate scalar and previous denominator */
  unsigned long long N,D,F, DPREV;
  int bitpos;

  *iter = 1;
  D = val;
  /* Get the shift amount + is right-shift, - is left-shift. */
  bitpos = 31 - clz(val) - BASE;
  /* Normalize into the half-range (0.5, 1.0] */
  if(0 < bitpos)
    D >>= bitpos;
  else
    D <<= (-bitpos);

  /* (FNi / FDi) == (FN(i+1) / FD(i+1)) */
  /* F = 2 - D */
  F = (2ULL<<BASE) - D;
  /* N = F for the first iteration, because the numerator is simply 1.
     So don't waste a 64-bit UMULL on a multiply with 1 */
  N = F;
  D = ((unsigned long long)D*F)>>BASE;

  while(1){
    DPREV = D;
    F = (2<<(BASE)) - D;
    D = ((unsigned long long)D*F)>>BASE;
    /* Bail when we get the same value for two denominators in a row.
      This means that the error is too small to make any further progress. */
    if(D == DPREV)
      break;
    N = ((unsigned long long)N*F)>>BASE;
    *iter = *iter + 1;
  }
  if(0 < bitpos)
    N >>= bitpos;
  else
    N <<= (-bitpos);
  return N;
}


int main(int argc, char* argv[])
{
  double fv, fa;
  int iter;
  unsigned int D, result;

  sscanf(argv[1], "%lf", &fv);

  D = fv*(double)(1<<BASE);
  result = divfp(D, &iter); 

  fa = (double)result / (double)(1UL << BASE);
  printf("Value: %8.8lf 1/value: %8.8lf FP value: 0x%.8X\n", fv, fa, result);
  printf("iteration: %d\n",iter);

  return 0;
}
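The test case calls clz but the listing does not include it. A portable stand-in that meets the stated contract (number of leading zeros, 32 for the value 0) might look like the sketch below. This is an assumption for the sake of compiling the test case, not the implementation from the post mentioned above.

```c
#include <stdint.h>

/* Stand-in for the missing clz: counts leading zero bits of a 32-bit
   value and returns 32 for an input of 0. */
static unsigned int clz(uint32_t x)
{
    unsigned int n = 0;

    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) { /* shift left until the top bit is set */
        x <<= 1;
        n++;
    }
    return n;
}
```

If you swap in a compiler intrinsic such as GCC/Clang `__builtin_clz`, keep the explicit zero check, since that builtin's result is undefined for an input of 0.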

Solution

I could not resist spending an hour on your problem...

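This algorithm is described in section 5.5.2 of "Arithmetique des ordinateurs" by Jean-Michel Muller (in French). It is actually a special case of Newton iterations with 1 as the starting point. The book gives a simple formulation of the algorithm to compute N/D, with D normalized in the range [1/2, 1[: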

e = 1 - D
Q = N
repeat K times:
  Q = Q * (1+e)
  e = e*e

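The number of correct bits doubles at each iteration. In the case of 32 bits, 4 iterations will be enough. You can also iterate until e becomes too small to modify Q.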
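The pseudocode above can be tried out directly in floating point. The sketch below is only an illustration (the function name `goldschmidt_div` and the use of `double` are assumptions, not taken from the book). Because the product (1+e)(1+e^2)(1+e^4)... telescopes to (1 - e^(2^K)) / (1 - e), the relative error after K passes is e^(2^K). At the worst case D = 1/2 that guarantees 2^K correct bits, so the 4-pass figure is probably best read as typical rather than worst-case.

```c
/* Floating-point sketch of the iteration: approximates n / d for a
   denominator d already normalized into [0.5, 1), using k passes. */
static double goldschmidt_div(double n, double d, int k)
{
    double e = 1.0 - d; /* error term, 0 < e <= 0.5 */
    double q = n;       /* running quotient */
    int i;

    for (i = 0; i < k; i++) {
        q *= 1.0 + e; /* Q = Q * (1 + e) */
        e *= e;       /* e = e * e, so the error squares each pass */
    }
    return q;
}
```

For example, `goldschmidt_div(1.0, 0.5, 4)` gives 1.999969482421875, which is 2 * (1 - 2^-16), while inputs closer to 1 converge in fewer passes.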

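Normalization is used because it provides the maximum number of significant bits in the result. It is also easier to compute the error and the number of iterations needed when the input is in a known range.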

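Once your input value is normalized, you don't need to bother with the value of BASE until you have the inverse. You simply have a 32-bit number X normalized in the range 0x80000000 to 0xFFFFFFFF, and you compute an approximation of Y = 2^64/X (Y is at most 2^33).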

This simplified algorithm may be implemented for your Q22.10 representation as follows:

// Fixed point inversion
// EB Apr 2010

#include <math.h>
#include <stdio.h>

// Number X is represented by integer I: X = I/2^BASE.
// We have (32-BASE) bits in integral part, and BASE bits in fractional part
#define BASE 22
typedef unsigned int uint32;
typedef unsigned long long int uint64;

// Convert FP to/from double (debug)
double toDouble(uint32 fp) { return fp/(double)(1<<BASE); }
uint32 toFP(double x) { return (int)floor(0.5+x*(1<<BASE)); }

// Return inverse of FP
uint32 inverse(uint32 fp)
{
  if (fp == 0) return (uint32)-1; // invalid

  // Shift FP to have the most significant bit set
  int shl = 0; // normalization shift
  uint32 nfp = fp; // normalized FP
  while ( (nfp & 0x80000000) == 0 ) { nfp <<= 1; shl++; } // use "clz" instead

  uint64 q = 0x100000000ULL; // 2^32
  uint64 e = 0x100000000ULL - (uint64)nfp; // 2^32-NFP
  int i;
  for (i=0;i<4;i++) // iterate
    {
      // Both multiplications are actually
      // 32x32 bits truncated to the 32 high bits
      q += (q*e)>>(uint64)32;
      e = (e*e)>>(uint64)32;
      printf("Q=0x%llx E=0x%llx\n",q,e);
    }
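  // Here, (Q/2^32) is the inverse of (NFP/2^32).
  // We have 2^31<=NFP<2^32 and 2^32<Q<=2^33
  int shr = 64-2*BASE-shl; // final normalization shift
  if (shr <= 0) return (uint32)-1; // result not representable (input too small)
  return (uint32)(q>>shr);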
}

int main()
{
  double x = 1.234567;
  uint32 xx = toFP(x);
  uint32 yy = inverse(xx);
  double y = toDouble(yy);

  printf("X=%f Y=%f X*Y=%f\n",x,y,x*y);
  printf("XX=0x%08x YY=0x%08x XX*YY=0x%016llx\n",xx,yy,(uint64)xx*(uint64)yy);
}

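As noted in the code, the multiplications are not full 32x32->64 bits. E becomes smaller and smaller and fits on 32 bits from the start. Q always fits on 34 bits. We take only the high 32 bits of the products.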
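The "high 32 bits of a 32x32 product" operation can be isolated in a small helper. This is a sketch with a hypothetical name (`mulhi_u32`). It applies as written to the e*e product, whereas q in the listing above can exceed 32 bits and would need its extra top bits handled separately.

```c
#include <stdint.h>

/* Returns the upper 32 bits of the full 64-bit product a * b.
   On ARM, a compiler can typically map this onto a single UMULL. */
static uint32_t mulhi_u32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * (uint64_t)b) >> 32);
}
```

With operands read as fractions scaled by 2^32, `mulhi_u32(0x80000000u, 0x80000000u)` is `0x40000000`, i.e. 0.5 * 0.5 = 0.25.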

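The derivation of 64-2*BASE-shl is left as an exercise for the reader :-). If it becomes 0 or negative, the result is not representable (the input value is too small).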
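For readers who want to check the shift, one possible derivation is sketched here. The symbols are introduced only for this illustration: I is the input integer fp, s is shl, B is BASE, n is nfp, q is the final value of Q, and R is the integer result we want to return.

```latex
\begin{align*}
X &= \frac{I}{2^{B}}, \qquad n = I \cdot 2^{s}, \qquad
q \approx \frac{2^{64}}{n} = \frac{2^{64-s}}{I} \\
R &= \frac{1}{X} \cdot 2^{B} = \frac{2^{2B}}{I}
   \approx q \cdot 2^{\,2B + s - 64} = \frac{q}{2^{\,64 - 2B - s}}
\end{align*}
```

With BASE = 22 the shift is 20 - shl, so it stays positive only while shl <= 19, that is for fp >= 2^12, or X >= 2^-10. Smaller inputs have an inverse above 2^10, which no longer fits in the 10 integer bits.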

EDIT. As a follow-up to my comment, here is a second version with an implicit 32nd bit on Q. Both E and Q are now stored on 32 bits:

uint32 inverse2(uint32 fp)
{
  if (fp == 0) return (uint32)-1; // invalid

  // Shift FP to have the most significant bit set
  int shl = 0; // normalization shift for FP
  uint32 nfp = fp; // normalized FP
  while ( (nfp & 0x80000000) == 0 ) { nfp <<= 1; shl++; } // use "clz" instead
  int shr = 64-2*BASE-shl; // normalization shift for Q
  if (shr <= 0) return (uint32)-1; // overflow

  uint64 e = 1 + (0xFFFFFFFF ^ nfp); // 2^32-NFP, max value is 2^31
  uint64 q = e; // 2^32 implicit bit, and implicit first iteration
  int i;
  for (i=0;i<3;i++) // iterate
    {
      e = (e*e)>>(uint64)32;
      q += e + ((q*e)>>(uint64)32);
    }
  return (uint32)(q>>shr) + ((uint32)1<<(32-shr)); // insert implicit bit (unsigned 1 avoids signed overflow when shr == 1)
}

Other tips

A couple of ideas for you, though none that solve your problem directly as stated.

Why this algorithm for division? Most divides I've seen on ARM use some variant of
```
      adcs hi, den, hi, lsl #1
      subcc hi, hi, den
      adcs lo, lo, lo
```

repeated n-bits times, with a binary search off of the clz to determine where to start. That's pretty dang fast.
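For readers less familiar with that idiom, the same family of technique can be written in C as a plain shift-and-subtract (restoring) division. This is a generic sketch, not a line-by-line translation of the three instructions above. The name `udiv_shift_sub` is made up for the example, and it always runs 32 steps rather than using clz to skip leading zero bits.

```c
#include <stdint.h>
#include <stddef.h>

/* Bit-serial unsigned division: one quotient bit per step.
   `den` must be nonzero. The remainder is stored in *rem_out if non-NULL. */
static uint32_t udiv_shift_sub(uint32_t num, uint32_t den, uint32_t *rem_out)
{
    uint64_t rem = 0; /* 64-bit so the shifted remainder cannot overflow */
    uint32_t quo = 0;
    int i;

    for (i = 31; i >= 0; i--) {
        rem = (rem << 1) | ((num >> i) & 1u); /* bring down the next bit */
        quo <<= 1;
        if (rem >= den) { /* subtract when it fits, and set the quotient bit */
            rem -= den;
            quo |= 1u;
        }
    }
    if (rem_out != NULL)
        *rem_out = (uint32_t)rem;
    return quo;
}
```

For instance, `udiv_shift_sub(100, 7, &r)` returns 14 and leaves 2 in `r`.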

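If precision is a big problem, you are not limited to 32/64 bits for your fixed-point representation. It will be a bit slower, but you can use add/adc or sub/sbc to carry values across registers. mul/mla are also designed for this kind of work.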
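As a rough picture of what the add/adc pairing does, here is a C sketch of a two-word addition with an explicit carry. The `wide64` type and `wide_add` name are assumptions made for this example, and the same pattern extends to more words.

```c
#include <stdint.h>

/* A 64-bit value held as two 32-bit words, as it would sit in two registers. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} wide64;

/* Adds two wide values: the low words are added first (like ADDS), then the
   high words are added together with the carry out of the low add (like ADC). */
static wide64 wide_add(wide64 a, wide64 b)
{
    wide64 r;
    uint32_t carry;

    r.lo = a.lo + b.lo;
    carry = (r.lo < a.lo) ? 1u : 0u; /* unsigned wrap-around means a carry */
    r.hi = a.hi + b.hi + carry;
    return r;
}
```

Adding `{0xFFFFFFFF, 0}` and `{1, 0}` yields `{0, 1}`, with the carry propagated into the high word.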

Again, these are not direct answers, but possibly a few ideas for going forward. Seeing the actual ARM code would probably help me a bit as well.

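Mads, you are not losing any precision at all. When you divide 512.00002f by 2^10, you merely decrease the exponent of your floating-point number by 10. The mantissa remains the same. That is, unless the exponent hits its minimum value, but that shouldn't happen since you're scaling to (0.5, 1].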

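EDIT: OK, so you're using a fixed decimal point. In that case you should allow a different representation of the denominator in your algorithm. The value of D stays in (0.5, 1] not only at the beginning but throughout the whole calculation (it's easy to prove that x*(2-x) < 1 for x < 1, since x*(2-x) = 1 - (1-x)^2). So you should represent the denominator with the decimal point at base = 32. This way you will have 32 bits of precision all the time.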

EDIT: To implement this you'd have to change the following lines of your code:

  //bitpos = 31 - clz(val) - BASE;
  bitpos = 31 - clz(val) - 31;
...
  //F = (2ULL<<BASE) - D;
  //N = F;
  //D = ((unsigned long long)D*F)>>BASE;
  F = -D;
  N = F >> (31 - BASE);
  D = ((unsigned long long)D*F)>>31;
...
    //F = (2<<(BASE)) - D;
    //D = ((unsigned long long)D*F)>>BASE;
    F = -D;
    D = ((unsigned long long)D*F)>>31;
...
    //N = ((unsigned long long)N*F)>>BASE;
    N = ((unsigned long long)N*F)>>31;

Also, at the end you'll have to shift N not by bitpos but by some different value, which I'm too lazy to figure out right now :).
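The idea of keeping the denominator in a purely fractional format can be illustrated with a single update step. The sketch below is an assumption-laden variant, not the patch above: it uses 32 fractional bits (so 1.0 would be 2^32 and 2.0 is 2^33), and `q32_step` is a hypothetical name. Because D < 1 implies D*(2-D) < 1, the 64-bit product never overflows and the result always fits back into 32 bits.

```c
#include <stdint.h>

/* One denominator update D <- D * (2 - D), with D stored as a pure
   fraction scaled by 2^32 (so d = 0x80000000 means 0.5). */
static uint32_t q32_step(uint32_t d)
{
    uint64_t f = (1ULL << 33) - d;         /* 2 - D, scaled by 2^32 */
    return (uint32_t)(((uint64_t)d * f) >> 32); /* D * F, rescaled to 2^32 */
}
```

Starting from 0.5 (`0x80000000`), successive calls give `0xC0000000`, `0xF0000000`, `0xFF000000`, `0xFFFF0000` and `0xFFFFFFFF`, so the denominator uses all 32 bits as it approaches 1.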

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow