Negiert die Truthiness einer boolean Auswertung verursacht 5x Verlangsamung?

https://stackoverflow.com/questions/3731481

03-10-2019
|

Frage

Ich versuche, eine Octree zu implementieren, und für das, ich brauche einen schnellen AABB-Strahlschnitt Algorithmus. Nach einiger Suche war ich auf dieses Papier, das zu bieten schien Das. Aus dem Quellcode, rel="nofollow verfügbar hier ich die pluecker_cls_cff Funktion in C # wie dies übersetzt:

public bool Intersect_2(ref RayPluecker r)
{
  switch (r.Classification)
  {

    // 7 same-ish cases snipped

    case Classification.PPP:

      return !((r.Position.X > this.Max.X) || (r.Position.Y > this.Max.Y) || (r.Position.Z > this.Max.Z) ||
        (r.PlueckerCoefficient.X + r.Direction.X * this.Max.Y - r.Direction.Y * this.Min.X < 0) ||
        (r.PlueckerCoefficient.X + r.Direction.X * this.Min.Y - r.Direction.Y * this.Max.X > 0) ||
        (r.PlueckerCoefficient.Y + r.Direction.X * this.Min.Z - r.Direction.Z * this.Max.X > 0) ||
        (r.PlueckerCoefficient.Y + r.Direction.X * this.Max.Z - r.Direction.Z * this.Min.X < 0) ||
        (r.PlueckerCoefficient.Z - r.Direction.Z * this.Min.Y + r.Direction.Y * this.Max.Z < 0) ||
        (r.PlueckerCoefficient.Z - r.Direction.Z * this.Max.Y + r.Direction.Y * this.Min.Z > 0));
  }

  return false;
}

Das scheint zu funktionieren gut, aber es schien mir ziemlich langsam (250ms 10 Millionen intersects zu tun), damit ich einige Mikro-Benchmarking mit verschiedenen Sorten ausprobiert. In einem, entfernte ich die Negation, die nach der return Aussage richtig ist, und umgekehrt alle Vergleiche (> zu < und umgekehrt).

Es ist nun:

case Classification.PPP:

      return ((r.Position.X < this.Max.X) || (r.Position.Y < this.Max.Y) || (r.Position.Z < this.Max.Z) ||
        (r.PlueckerCoefficient.X + r.Direction.X * this.Max.Y - r.Direction.Y * this.Min.X > 0) ||
        (r.PlueckerCoefficient.X + r.Direction.X * this.Min.Y - r.Direction.Y * this.Max.X < 0) ||
        (r.PlueckerCoefficient.Y + r.Direction.X * this.Min.Z - r.Direction.Z * this.Max.X < 0) ||
        (r.PlueckerCoefficient.Y + r.Direction.X * this.Max.Z - r.Direction.Z * this.Min.X > 0) ||
        (r.PlueckerCoefficient.Z - r.Direction.Z * this.Min.Y + r.Direction.Y * this.Max.Z > 0) ||
        (r.PlueckerCoefficient.Z - r.Direction.Z * this.Max.Y + r.Direction.Y * this.Min.Z < 0));

Dies sollte das gleiche Ergebnis, Recht geben? Es schien so, wie es die gleichen Ergebnisse wie die negierte Version mit ein paar Testfälle zurückgibt. Doch in der Benchmark war es 5x schneller (50ms 10 Millionen intersects zu tun)! Ich bin sicher, es war nicht optimiert geführt wird, meine Benchmark sieht wie folgt aus:

for (int i = 0; i < 10000000; i++)
{
  if (!box.Intersect_3(ref ray))
  {
    throw new Exception();
  }
}

Was kann diesen großen Unterschied erklären? Ich bin .NET 4.0 auf x86 läuft.

Lösung

Your second code doesn't do the same thing as your first.

In addition to the changes you already made, you need to turn all your ORs into ANDs. (See De Morgan's Laws.)

I'll bet that after you make the fix, your two versions will run at the same speed.

Andere Tipps

Specifically related to performance, I bet the return statement is short-circuiting sooner in the second case than the first. It may be worth trying to change the order of the comparisons if some are more likely than others to be true. If you change the calculations to be equivalent using && instead of || in the second case, then you would want the ones that are most likely to be false to come first.

Lizenziert unter: CC-BY-SA mit Zuschreibung

Nicht verbunden mit StackOverflow