Question

I've been looking at shadow mapping and I see that some implementations write the squared distance to the light into the depth texture during the shadow pass, while others use actual depth values. Is there any reason to prefer one over the other? It seems faster to just use actual depth values.


Solution

In the comments to one of the answers I noticed you made a reference to another answer I wrote, and I wanted to make sure you understood the original context of that answer.

While the answers thus far have all focused on the complexity of the comparison itself, they do not consider what impact changing the value stored in the hardware depth buffer has on the performance of shadow map construction. You are concerned about the implications for early Z rejection, and that is a valid concern, but it only affects the performance of shadow map construction, as I will explain below.

Also, keep in mind that the answer you are referring to pertains to cubemap based shadow maps. They have their own set of challenges to deal with because of the unique way that they are constructed and sampled, and this is why the comparison is slightly different than you will see in other contexts.


Modern GPUs compress the color and depth buffers using a hierarchy of tiles to increase memory throughput.

This compression does not save storage space (in fact, it adds a small amount of extra overhead), but it does allow much quicker buffer clears and fetches. Instead of writing the same color or depth to every single pixel in the buffer, each tile can be flagged as "clear" and given a clear color/depth. When it comes time to fetch the color/depth for a pixel, the first thing that happens is a look at the tile that pixel belongs to: if the entire tile is clear, the tile's clear value is returned instead of fetching the actual pixel from memory.
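To make the tile idea concrete, here is a minimal CPU-side sketch under my own naming (Tile, TiledDepthBuffer and the 8x8 tile size are illustrative assumptions, not how any particular GPU lays out memory): a per-tile "cleared" flag lets a clear touch one flag per tile, and a fetch from an untouched tile can skip the per-pixel read entirely.

```cpp
#include <array>
#include <vector>

// Hypothetical illustration of a tiled depth buffer with fast clears.
// Real hardware formats are proprietary; this only mirrors the concept.
struct Tile {
    bool cleared = true;               // whole tile still holds the clear value
    float clearDepth = 1.0f;           // value returned while 'cleared' is set
    std::array<float, 8 * 8> depth{};  // per-pixel storage for an 8x8 tile
};

struct TiledDepthBuffer {
    int tilesX, tilesY;
    std::vector<Tile> tiles;

    TiledDepthBuffer(int w, int h)
        : tilesX((w + 7) / 8), tilesY((h + 7) / 8), tiles(tilesX * tilesY) {}

    // Fast clear: touch one flag per tile instead of every pixel.
    void clear(float depth) {
        for (Tile& t : tiles) { t.cleared = true; t.clearDepth = depth; }
    }

    float fetch(int x, int y) const {
        const Tile& t = tiles[(y / 8) * tilesX + (x / 8)];
        if (t.cleared) return t.clearDepth;   // skip the per-pixel read
        return t.depth[(y % 8) * 8 + (x % 8)];
    }

    void store(int x, int y, float d) {
        Tile& t = tiles[(y / 8) * tilesX + (x / 8)];
        if (t.cleared) {                      // first write "decompresses" the tile
            t.depth.fill(t.clearDepth);
            t.cleared = false;
        }
        t.depth[(y % 8) * 8 + (x % 8)] = d;
    }
};
```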

Great... but what does compression have to do with early depth testing?

A lot, actually. This hierarchical memory structure lends itself nicely to rejecting large groups of fragments at a time, because the min/max depth for an entire tile worth of pixels can be determined in a single specialized memory operation. This does mean that writing to the color/depth buffer is a lot more complicated (having to update flags and such per-tile), but the hardware is specifically designed to work this way and a lot of the time you do not have to do anything special to benefit from it.

Now, even though the rasterizer has a simple fixed-function job to do, it does some pretty clever things whenever hierarchical Z-buffering (Hi-Z) is applicable. Because all primitives are planar, if the rasterizer can be guaranteed that the fragment shader is not going to alter the depth, it can perform a coarse-grained depth test (e.g. one test per tile in the compressed depth buffer) using a min/max depth value and kill multiple fragments before shading/blending. If this coarse-grained test passes, or if the fragment shader writes its own depth, then each fragment must be shaded and then tested against the depth buffer individually afterwards. Now, in your case the fragment shader is pretty simple, so the expense of unnecessarily shading fragments will not be as high as it typically would be, and blending is not a factor in a depth-only pass.
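To illustrate just the coarse test itself, here is a tiny sketch (rejectTiles, tileMaxDepth and primMinDepth are made-up names; real hardware does this in fixed function, not in code you write): one comparison per tile decides whether all of a primitive's fragments in that tile can be discarded before shading.

```cpp
#include <cstdio>
#include <vector>

// Coarse Hi-Z sketch: reject an entire tile's worth of fragments with a
// single comparison against the tile's max depth, instead of shading and
// late-testing each fragment. primMinDepth must be a conservative nearest
// depth of the incoming primitive over the tile, which is only available
// when the fragment shader does not modify depth.
int rejectTiles(const std::vector<float>& tileMaxDepth, float primMinDepth) {
    int rejected = 0;
    for (float maxDepth : tileMaxDepth) {
        if (primMinDepth >= maxDepth)  // primitive is behind everything in the tile
            ++rejected;                // skip shading and late test for all its pixels
    }
    return rejected;
}

int main() {
    std::vector<float> tileMax = {0.30f, 0.55f, 0.90f, 1.00f};
    std::printf("tiles rejected: %d\n", rejectTiles(tileMax, 0.95f));
}
```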

However, having to do a late depth test per fragment on primitives that are completely occluded is a waste of time that Hi-Z could have avoided. A lot of the measurable expense of depth-only rendering is actually front-end CPU overhead caused by the draw call(s) themselves (state validation, command serialization, etc.). Assuming your depth-only pass is batched efficiently, you can squeeze out a little more performance by making the depth test more efficient. Just do not expect a huge improvement; as described above, there are multiple reasons why Hi-Z pays off more for traditional rendering.

By the way, if you want a visual summary of most of what I just explained, have a look here.


Back to your original question...

In the end, properly exploiting hierarchical Z-buffering during the construction of your shadow map is not going to produce huge performance gains, but it can outweigh the gains to be had from reducing the number of arithmetic instructions needed to compare depth. It mostly depends on how frequently you update your shadow map. On one hand, it really does not matter how efficiently the hardware fills your shadow map if you only fill it once (static). On the other hand, if you have to draw six independent shadow maps per light per frame, there will be a real, measurable improvement if you can decrease the amount of time it takes to draw each of them.

The elephant in the room here that is not being considered is the amount of time it takes to fetch the depth from a shadow map in the first place (much more time than your comparison). You can speed up shadow map construction and comparison all you want, but some of the biggest benefits come from improving the re-construction (sampling) performance.

Anti-aliasing VSM shadows, for example, can be done using traditional texture filtering instead of the multiple convoluted samples and comparisons you have to perform for other techniques, which makes anti-aliased re-construction from a VSM more efficient. Because VSM is based on variance, it does not require storing perspective depth... you can use linear distance if you want; it makes no difference to the algorithm. Even though construction (storing d and d²) is more complicated, if you need anti-aliasing it can be more efficient overall.
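For reference, the VSM reconstruction step boils down to filtering the two moments and applying Chebyshev's inequality. A scalar sketch under my own naming (vsmShadow; the minVariance floor is a tunable assumption, not a value from the original answer):

```cpp
#include <algorithm>
#include <cstdio>

// Variance shadow map reconstruction: given the filtered moments E[d] and
// E[d^2] sampled from the shadow map, Chebyshev's inequality gives an upper
// bound on the fraction of light reaching a receiver at 'receiverDepth'.
// Works with either perspective depth or linear distance, as long as
// construction and lookup agree.
float vsmShadow(float m1, float m2, float receiverDepth, float minVariance = 1e-4f) {
    if (receiverDepth <= m1)
        return 1.0f;                           // in front of the mean occluder: fully lit
    float variance = std::max(m2 - m1 * m1, minVariance);
    float d = receiverDepth - m1;
    return variance / (variance + d * d);      // Chebyshev upper bound on P(occluder >= receiver)
}

int main() {
    // Moments as they might come out of a filtered texture fetch.
    float m1 = 0.40f, m2 = 0.165f;             // E[d], E[d^2]
    std::printf("light factor: %.3f\n", vsmShadow(m1, m2, 0.55f));
}
```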

Clearly there is no one-size-fits-all, what you store in your shadow map largely depends on your algorithm.

OTHER TIPS

Actually, using the squared values takes less computational effort. To determine the length of a vector you compute sqrt(r·r). To compare the lengths of two vectors you would evaluate sqrt(r_0·r_0) > sqrt(r_1·r_1), but sqrt is a strictly monotonic function, so r_0·r_0 > r_1·r_1 holds under exactly the same conditions; i.e. comparing the squared values is equally valid and saves the computation of the square roots.
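A trivial sketch of that point in C++ (Vec3, dot and firstIsLonger are just illustrative names):

```cpp
#include <cstdio>

struct Vec3 { float x, y, z; };

float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Comparing squared lengths gives the same ordering as comparing lengths,
// because sqrt is strictly monotonic, but skips the square roots entirely.
bool firstIsLonger(const Vec3& r0, const Vec3& r1) {
    return dot(r0, r0) > dot(r1, r1);  // equivalent to sqrt(r0.r0) > sqrt(r1.r1)
}

int main() {
    Vec3 a{1, 2, 2}, b{2, 1, 1};
    std::printf("%s\n", firstIsLonger(a, b) ? "a is longer" : "b is at least as long");
}
```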

Regarding distance squared vs. actual depth, in my experience it just depends on your requirements: if the maths involves square roots, I prefer working with squared values, since square roots are comparatively expensive while squares are cheap. If, on the other hand, there are no square roots involved, I use straight depth values. I will try to expound a bit:

Before rendering the scene, render it in a pre-processing step with the camera placed at the position of the light, so you capture the scene from the light's POV. All pixels visible from this position cannot be in shadow, so by elimination all other pixels must be in shadow.

It would be easy if we could somehow mark all pixels that are reached by the light, however that can be cumbersome and memory intensive.

Method 1: (depth check)

So we store the distance of each illuminated pixel to the light in a depth buffer. When rendering the scene, we calculate the distance of any rendered pixel to the light position. If this distance is equal to the stored depth at that pixel position, we know that the same pixel was visible from the light and thus is illuminated. If the distance is higher, we know this pixel was not visible from the POV of the light, and is in shadow.
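A small sketch of that comparison (the bias term is my own addition to absorb floating-point/quantization error; an exact-equality test rarely survives storage in a real depth texture):

```cpp
#include <cstdio>

// Method 1 in CPU form: compare the receiver's distance to the light against
// the distance stored in the shadow map for that direction/texel.
// 'storedDistance' is what the light-POV pass wrote; 'bias' is an assumed
// tolerance you would tune per scene.
bool isLit(float receiverDistance, float storedDistance, float bias = 0.005f) {
    // If the receiver is no farther than the first occluder seen by the
    // light (within the bias), the light reaches it.
    return receiverDistance <= storedDistance + bias;
}

int main() {
    std::printf("%d\n", isLit(2.43f, 2.431f));  // 1: effectively the same surface, lit
    std::printf("%d\n", isLit(3.10f, 2.431f));  // 0: something nearer blocks the light
}
```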

Method 2: (shadow map)

Now, depending on implementation details, you can store the shadow-map data as floating-point values, or pack it into the integer channels of a colour map (reference 1) and use an algorithm like the one in the linked GPU Gems book. However, for normal depth-test shadow mapping, you can get away with simple distances (reference 2).
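If you do go the packed-colour-map route, the usual idea is to spread the fractional depth across the 8-bit channels. Here is one sketch of a pack/unpack pair (the base-256 digit scheme is a common variant, not necessarily the exact one in the referenced chapter):

```cpp
#include <array>
#include <cmath>
#include <cstdint>
#include <cstdio>

// Pack a depth in [0,1) into four 8-bit channels by storing successive
// base-256 "digits" of its fraction; unpack with the matching weighted sum.
std::array<uint8_t, 4> packDepth(float d) {
    std::array<uint8_t, 4> rgba{};
    float value = d;
    for (int i = 0; i < 4; ++i) {
        value *= 256.0f;
        float digit = std::floor(value);
        rgba[i] = static_cast<uint8_t>(digit);
        value -= digit;                // keep the remaining fraction
    }
    return rgba;
}

float unpackDepth(const std::array<uint8_t, 4>& rgba) {
    float d = 0.0f, scale = 1.0f / 256.0f;
    for (int i = 0; i < 4; ++i) {
        d += rgba[i] * scale;
        scale /= 256.0f;
    }
    return d;
}

int main() {
    float d = 0.724318f;
    std::printf("round trip: %.6f -> %.6f\n", d, unpackDepth(packDepth(d)));
}
```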

What it all boils down to is how detailed you want the shadow mapping to be and how much computation you are willing to spend: for real-time renders (games, simple CAD renders, etc.) the less time spent on niceties and extras the better, hence we use tricks like comparing squared values rather than square roots (there is a decent introductory text on computational complexity on Wikipedia here; I know, I would not normally use that as a source).

I have tried to keep this as brief as possible, so I may have missed something; let me know if you want more info, or if my post does not make sense, and I will update with more detail/clarification. :)
