Question

For my graphics course we were taught the Naive line rasterizing algorithm then Bresenham's line drawing algorithm. We were told computers are integer machines which is why we should use the latter.

  1. If we assume no optimization on software level is this true for modern cpus with mmx and other instruction sets ? As I have looked at Intel’s 64-ia-32-architectures-optimization-manual.pdf and the latency for addition subtraction multiplication is the same or better for float compared to int for mmx.

  2. If the algorithm is executed in the gpu should this matter ? As having checked NVIDIA CUDA Programming Guide 1.0 (pdf), page 41, the clock cycles for int and float are the same.

  3. What is the de-efficiency of casting float to int? is load-hit-store stall a real issue for us?

  4. How efficient are functions which round up/down numbers? (we can think of the implementation in c++ stl)

  5. Is the efficiency gained from Bresenham algorithm due to addition instead of the multiplication used in the inner loop?

Was it helpful?

Solution

It's a little misleading to call computers integer machines, but the sentiment is mostly true. Since as far as I know, CPUs use integer registers to generate memory addresses to read from and write to. Keeping line drawing in integer registers means you avoid the overhead of copying from other registers into integer registers to generate the memory address to write pixels to during line drawing.

As for your specific questions:

  1. Since you need to use the general purpose registers to access memory, using SSE or the FPU to calculate memory offsets (pointers) will still have the overhead of transferring data from those registers to the general purpose ones. So it depends on whether the overhead of transferring from one register set to another is greater than the performance of using a particular instruction set.
  2. GPUs tend to have a unified register set, so it shouldn't matter nearly as much.
  3. Casting a float to an int in and of itself is not expensive. The overhead comes from transferring data from one register set to another. Usually that has to be done through memory, and if your CPU has load-hit-store penalties, this transfer is a big source of them.
  4. The performance of rounding up or down depends on the CPU and the compiler. On the slow end, MSVC used to use a function to round to zero which mucked with the FPU control word. On the fast end you have special CPU instructions that handle rounding directly.
  5. Bresenham's line drawing algorithm is fast because it reduces determining where to draw points on a line from the naive y= m*x + b formula to an addition plus a branch (and the branch can be eliminated through well know branchless integer techniques). The run-slice version of Brensenham's line drawing algorithm can be even faster as it determines "runs" of pixels with the same component directly rather than iterating.
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top