Question

Just a question about DirectX11 Tessellation.

In the Hull Shader, the maximum tessellation factor that can be set is 64 (not sure why). Whilst this is enough for small planes, it simply is not enough when it comes to large planes, so I was wondering how I could render large planes?

The following code generates my quad where R is the radius:

vertices[0] = D3DXVECTOR3(-R, 0.0f,  R); //Top left
vertices[1] = D3DXVECTOR3( R, 0.0f,  R); //Top right
vertices[2] = D3DXVECTOR3( R, 0.0f, -R); //Bottom right
vertices[3] = D3DXVECTOR3(-R, 0.0f, -R); //Bottom left
vertices[4] = D3DXVECTOR3(-R, 0.0f,  R); //Top left
vertices[5] = D3DXVECTOR3( R, 0.0f, -R); //Bottom right

indices[0] = 0;
indices[1] = 1;
indices[2] = 2;
indices[3] = 3;
indices[4] = 4;
indices[5] = 5;

I did modify this by putting it in a loop to generate multiple quads in a grid (4x4 in the code below). However, when I pass it to the tessellation shader, it works okay for the first quad but the others screw up.

vector<D3DXVECTOR3> verts;
vector<D3DXVECTOR2> tex;

float R = 1000;

for(int z = 0; z < 4; z++)
{
    for(int x = 0; x < 4; x++)
    {
        float xOffset = x * (2*R);
        float zOffset = z * (2*R);

        // Load the vertex array with data.
        verts.push_back( D3DXVECTOR3(-R+ xOffset, 0.0f,  R+ zOffset) ); //Top left
        verts.push_back( D3DXVECTOR3( R+ xOffset, 0.0f,  R+ zOffset) ); //Top right
        verts.push_back( D3DXVECTOR3( R+ xOffset, 0.0f, -R+ zOffset) ); //Bottom right
        verts.push_back( D3DXVECTOR3(-R+ xOffset, 0.0f, -R+ zOffset) ); //Bottom left
        verts.push_back( D3DXVECTOR3(-R+ xOffset, 0.0f,  R+ zOffset) ); //Top left
        verts.push_back( D3DXVECTOR3( R+ xOffset, 0.0f, -R+ zOffset) ); //Bottom right

        tex.push_back( D3DXVECTOR2(0.0f, 0.0f) ); //Top left
        tex.push_back( D3DXVECTOR2(1.0f, 0.0f) ); //Top right
        tex.push_back( D3DXVECTOR2(1.0f, 1.0f) ); //Bottom right
        tex.push_back( D3DXVECTOR2(0.0f, 1.0f) ); //Bottom left
        tex.push_back( D3DXVECTOR2(0.0f, 0.0f) ); //Top left
        tex.push_back( D3DXVECTOR2(1.0f, 1.0f) ); //Bottom right
    }
}

// Set the number of vertices in the vertex array.
m_vertexCount = verts.size();

// Set the number of indices in the index array.
m_indexCount = verts.size();

// Create the vertex array.
vertices = new VertexType[m_vertexCount];
if(!vertices)
{
    return false;
}

// Create the index array.
indices = new unsigned long[m_indexCount];
if(!indices)
{
    return false;
}

for(int i = 0; i < m_vertexCount; i++)
{
    vertices[i].position = verts[i];
    vertices[i].texture =  tex[i];
    indices[i] = i;
}

This is how it looks when I render the looped plane:

How come it doesn't tessellate properly when using multiple quads? Should I solve this by creating a loop in my graphics class where I then translate and render the same quad over and over to create a grid?

Hope this makes sense.


Solution

It's very doubtful that you actually want to tessellate the entire plane uniformly - you're just wasting time tessellating the whole thing when the viewer can't even tell the difference for far-away points. Consider implementing some sort of real-time level-of-detail (LOD) algorithm. The concept of LOD is to provide more detail for closer objects while leaving distant ones simpler to save render time. Here's an image to help you out if you don't "get" LOD:

Image illustrating LOD

High tessellation <-------------------------------> Low tessellation

While these specific models were manually created in modeling software such as Maya, as opposed to being tessellated on the GPU in real time, you hopefully still get the idea. When the camera is far away from the model, render the bunny on the right, and as it gets closer, render more detailed bunnies until the camera is right next to the object, when you'd render it in its full beauty.
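To tie this back to your hull shader: a common way to apply LOD with hardware tessellation is to derive each patch's tessellation factor from the camera-to-patch distance, clamped to the 64 limit, so distant patches cost almost nothing. The snippet below is only a minimal CPU-side sketch of that mapping; the function name and distance thresholds are made up for illustration, and in practice the same math would live in your hull shader's patch-constant function and be written to SV_TessFactor / SV_InsideTessFactor.

#include <algorithm>

// Hypothetical helper: map camera-to-patch distance to a tessellation factor.
// minDist/maxDist are illustrative tuning values, not taken from the question.
float ComputeTessFactor(float distanceToPatch,
                        float minDist = 50.0f,    // full detail at or below this distance
                        float maxDist = 2000.0f)  // minimum detail at or beyond this distance
{
    const float maxFactor = 64.0f; // D3D11 hardware limit
    const float minFactor = 1.0f;

    // Normalize the distance to 0..1 (0 = near, 1 = far) and clamp it.
    float t = (distanceToPatch - minDist) / (maxDist - minDist);
    t = std::max(0.0f, std::min(1.0f, t));

    // Blend from the maximum factor down to the minimum as the patch gets farther away.
    return maxFactor + t * (minFactor - maxFactor);
}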

In my time, I have discovered two methods of real-time LOD that I consider superior to all others. The first one was developed for water rendering, and the link to it is here - I really can't explain it much better than they did. Watch in amazement (or point and laugh if I fail) as I now attempt to explain the other one!


GPU Terrain Subdivision and Tessellation

  • Source: first entry in the book GPU Pro 4, by Wolfgang Engel
  • Your posted image could resemble badly skewed terrain if you squint a bit. Since this was developed for terrain, this would probably be easier to implement than the other method, which was designed for water rendering.

As any well-informed GPU programmer knows, the CPU is fast and the GPU is faster. The main bottleneck of complex graphics is the transfer of data between the CPU and the GPU. It takes forever.

This method, developed specifically for terrain rendering, is an almost-fully-GPU-based algorithm designed to create a subdivided mesh with distance-based LOD. While performance is largely dependent on the installed hardware's capability to support it, this method is highly optimized for speed.

Pretty much, the method repeatedly subdivides a region until it achieves a desired LOD - represented by some parameters passed by the programmer. In addition, culling is also performed at each iteration of the algorithm, avoiding a lot of unnecessary processing or subdivision of areas outside of the viewing frustum.

Because all data used by this method is retained and refined on the GPU, the CPU is mostly left available to perform other tasks. In addition, we also utilize a smooth LOD transitioning scheme to show a practical application of the subdivision algorithm. The final output (this image with partially superimposed wireframe) is really cool:

Final output

Now, I'm not going to lie. The algorithm is difficult to swallow. So, I'm going to do my best to strip it down for you. Here's the lingo:

Viewable region

A viewable region, denoted by the variable R, is defined as an axis-aligned (typically upward-facing) quadrilateral that represents a region to be subdivided and/or rendered. This region is defined by a center position and a width, extended in both the positive and negative directions along each axis (see image below). The center position is denoted by RC. The applied offset is denoted by Rλ. The bounding points of the region are denoted by P1, P2, P3, and P4. Picture a viewable region span like this:

Viewable region span
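For reference, here is a small C++ sketch of how such a region might be stored and how its bounding points follow from the center and applied offset. The struct, field names, and corner ordering are my own, not the book's:

// Hypothetical representation of a viewable region R; names are mine, not the book's.
struct Float2 { float x, y; Float2(float px = 0.0f, float py = 0.0f) : x(px), y(py) {} };

struct ViewableRegion
{
    Float2   center;       // RC: center position on the ground plane
    float    offset;       // Rλ: half-width, applied along both axes
    unsigned quadrantCode; // 2-bit relative quadrant code, set when the region is split (see below)
};

// The bounding points P1..P4 follow directly from the center and applied offset
// (ordering assumed here for illustration).
inline void GetBoundingPoints(const ViewableRegion& r, Float2 out[4])
{
    out[0] = Float2(r.center.x - r.offset, r.center.y + r.offset); // P1: top left
    out[1] = Float2(r.center.x + r.offset, r.center.y + r.offset); // P2: top right
    out[2] = Float2(r.center.x + r.offset, r.center.y - r.offset); // P3: bottom right
    out[3] = Float2(r.center.x - r.offset, r.center.y - r.offset); // P4: bottom left
}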

Viewable region span

A viewable region span, denoted by θ, can be quantified by the following function, given a point P and an applied offset λ:

θ(P, λ) = |PScreenR − PScreenL|

PScreenL = PProjL.xy / PProjL.w

PScreenR = PProjR.xy / PProjR.w

PProjL = PWL × matProjection

PProjR = PWR × matProjection

PWL = (PW.x − λ, PW.y)

PWR = (PW.x + λ, PW.y)

PW = P × matWorldView

In other words, a viewable region span is how wide its viewable region is on the screen.
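Here is a rough C++ translation of those formulas, assuming the D3DX math helpers used in the question are available; the function name is mine:

#include <d3dx10math.h> // D3DXVECTOR3/4, D3DXMATRIX, D3DXVec3Transform, D3DXVec4Transform (as used in the question)
#include <cmath>

// Hypothetical implementation of theta(P, lambda): how wide a region of half-width
// lambda, centered at point P, appears on screen after projection.
float ViewableRegionSpan(const D3DXVECTOR3& P, float lambda,
                         const D3DXMATRIX& matWorldView,
                         const D3DXMATRIX& matProjection)
{
    // PW = P × matWorldView
    D3DXVECTOR4 pw;
    D3DXVec3Transform(&pw, &P, &matWorldView);

    // PWL / PWR: offset the transformed point by -lambda / +lambda along x.
    D3DXVECTOR4 pwl(pw.x - lambda, pw.y, pw.z, 1.0f);
    D3DXVECTOR4 pwr(pw.x + lambda, pw.y, pw.z, 1.0f);

    // PProjL / PProjR: project both points.
    D3DXVECTOR4 projL, projR;
    D3DXVec4Transform(&projL, &pwl, &matProjection);
    D3DXVec4Transform(&projR, &pwr, &matProjection);

    // PScreenL / PScreenR: perspective divide, then take the distance between them.
    float dx = projR.x / projR.w - projL.x / projL.w;
    float dy = projR.y / projR.w - projL.y / projL.w;
    return std::sqrt(dx * dx + dy * dy);
}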

Maximum viewable region span

The maximum viewable region span, denoted by θmax, is the maximum allowable viewable region span. This value, which is set by the user, is one of the main factors in the LOD Algorithm, and it also plays a key part in the Subdivision Algorithm. If any given viewable region has a viewable region span greater than θmax, it is subdivided into 4 new viewable regions.

Relative quadrant code

This code identifies the relative position of a split viewable region in relation to its parent viewable region. This code is calculated in the Subdivision Algorithm and utilized by the LOD Transition Algorithm. Usually encoded as a 2-bit mask, this code becomes part of the definition of a viewable region.
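Since the code is just two bits, one possible encoding (purely my own choice for illustration) could look like this:

// Hypothetical 2-bit relative quadrant code: which quadrant of its parent a
// viewable region occupies after a split. The numbering is an assumption for
// illustration; the book defines its own convention.
enum RelativeQuadrantCode
{
    QUADRANT_TOP_LEFT     = 0, // binary 00
    QUADRANT_TOP_RIGHT    = 1, // binary 01
    QUADRANT_BOTTOM_RIGHT = 2, // binary 10
    QUADRANT_BOTTOM_LEFT  = 3  // binary 11
};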


The algorithm has three main stages.

Overview

Stage 1 (Preparation): "Initialize" + "Intermediate Region Buffer"

  • Just setup; I won't wear out my hands explaining it.

Stage 2 (The Subdivision Algorithm): The loop, including "Dispose Region" and "Final Region Buffer"

Stage 3 (The LOD Algorithm): "Final Region Buffer" + "LOD" + "Output"


Stage 2 - The Subdivision Algorithm

For each iteration of the loop, feed an input stream of viewable regions into the geometry shader stage of the rendering pipeline. The input stream for the first iteration is the initial input stream that was created in Stage 1. All subsequent iterations use the intermediate output stream from the preceding iteration as the input.
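In D3D11 terms, that per-iteration feedback is usually implemented with geometry-shader stream output, ping-ponging between two buffers and letting DrawAuto reuse however many regions the previous pass emitted. The sketch below is only a rough illustration of that mechanic, assuming all buffers, layouts, and shaders are created and bound elsewhere; every name is a placeholder, and only the intermediate stream is shown (the final stream described next would be an additional stream-output target):

#include <d3d11.h>
#include <utility> // std::swap

// Hypothetical single iteration of the subdivision loop using stream output.
// inputRegions holds the regions to process; intermediateOut receives regions
// that still need further subdivision on the next iteration.
void RunSubdivisionIteration(ID3D11DeviceContext* context,
                             ID3D11Buffer*& inputRegions,
                             ID3D11Buffer*& intermediateOut,
                             bool firstIteration,
                             UINT initialRegionCount,
                             UINT stride)
{
    UINT vbOffset = 0;
    UINT soOffset = 0;

    // Feed the current set of viewable regions in as point primitives.
    context->IASetVertexBuffers(0, 1, &inputRegions, &stride, &vbOffset);
    context->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_POINTLIST);

    // Capture the geometry shader's intermediate output via stream output.
    context->SOSetTargets(1, &intermediateOut, &soOffset);

    if (firstIteration)
        context->Draw(initialRegionCount, 0); // region count is only known up front on the first pass
    else
        context->DrawAuto();                  // reuse the count streamed out by the previous pass

    // Unbind the stream-output target and swap buffers for the next iteration.
    ID3D11Buffer* nullBuffer = NULL;
    context->SOSetTargets(1, &nullBuffer, &soOffset);
    std::swap(inputRegions, intermediateOut);
}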

Two output streams are utilized in the geometry shader. An intermediate output stream is used to hold viewable regions that are intended for further processing. A final output stream is used to hold viewable regions that require no further processing in the next iteration, and are to be rendered as part of Stage 3 of the algorithm. Within the geometry shader, each viewable region is processed and one of the following three actions is performed:

  1. If the viewable region is determined to not intersect with the view frustum, it is disposed of. The culled viewable region is not added to either the intermediate or the final output stream. Make sure to account for displacements in Stage 3 of the algorithm when determining whether or not the viewable region intersects the view frustum.

  2. If the viewable region span θ, written as θ(RC, Rλ), is greater than the maximum viewable region span θmax, then the viewable region R is split into four quadrants (R1, R2, R3, R4). The quadrants, each of which becomes a viewable region itself, are then added to the intermediate output stream to be reprocessed at the next iteration of the loop. Their relative quadrant code (unique to each quadrant) is also added to the output stream to identify the relative location of the split viewable regions within their parent viewable region. This code is later utilized by the LOD Transition Algorithm when rendering each viewable region (a CPU-side sketch of the whole cull/split/emit decision appears after this list). To split the viewable region R, create four new viewable regions (R1, R2, R3, R4) with their respective center positions (R1C, R2C, R3C, R4C) and applied offsets (R1λ, R2λ, R3λ, R4λ) defined as so:

R1C = (RC.x − 0.5 × Rλ, RC.y + 0.5 × Rλ)

R2C = (RC.x + 0.5 × Rλ, RC.y + 0.5 × Rλ)

R3C = (RC.x + 0.5 × Rλ, RC.y − 0.5 × Rλ)

R4C = (RC.x − 0.5 × Rλ, RC.y − 0.5 × Rλ)

R1λ = R2λ = R3λ = R4λ = 0.5 × Rλ

  3. If the viewable region span θ(RC, Rλ) is less than or equal to the maximum viewable region span θmax, then the viewable region R is added to the final output stream of viewable regions.

Also note that on the last iteration through the loop, all remaining viewable regions must be added to the final output stream, so that nothing is left unprocessed when Stage 3 runs.
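To make the control flow concrete, here is a CPU-side emulation of one pass of this loop in plain C++. The real algorithm runs this in a geometry shader with the two stream-output targets described above; the region type is a stripped-down version of the earlier sketch, and the span/frustum helpers are trivial stand-ins so the snippet compiles. None of this is the book's actual code:

#include <vector>

struct Region { float cx, cy; float offset; unsigned quadrantCode; };

// Stand-ins for theta(RC, Rλ) and the frustum test (see earlier sketches);
// real implementations would project the region and test it against the frustum.
inline float ViewableRegionSpan(float /*cx*/, float /*cy*/, float offset) { return offset; }
inline bool  IntersectsViewFrustum(const Region&) { return true; }

void SubdividePass(const std::vector<Region>& input,
                   float thetaMax,
                   bool lastIteration,
                   std::vector<Region>& intermediate, // regions to reprocess next iteration
                   std::vector<Region>& finalOut)     // regions ready for rendering in Stage 3
{
    for (size_t i = 0; i < input.size(); ++i)
    {
        const Region& r = input[i];

        // 1. Cull: regions outside the view frustum are simply discarded.
        if (!IntersectsViewFrustum(r))
            continue;

        float span = ViewableRegionSpan(r.cx, r.cy, r.offset);

        // 2. Split: too wide on screen, so emit four child regions (R1..R4).
        if (span > thetaMax && !lastIteration)
        {
            float h = 0.5f * r.offset;
            Region children[4] = {
                { r.cx - h, r.cy + h, h, 0 }, // R1: top left
                { r.cx + h, r.cy + h, h, 1 }, // R2: top right
                { r.cx + h, r.cy - h, h, 2 }, // R3: bottom right
                { r.cx - h, r.cy - h, h, 3 }  // R4: bottom left
            };
            intermediate.insert(intermediate.end(), children, children + 4);
        }
        // 3. Emit: small enough (or final iteration), ready to be rendered.
        else
        {
            finalOut.push_back(r);
        }
    }
}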


Stage 3 - The LOD Transition Algorithm

In Stage 3 of the algorithm, the stream of viewable regions produced by Stage 2 is rendered. Why Stage 3 is needed: a given viewable region R with an applied offset of Rλ may be adjacent to another viewable region whose applied offset is either half or double the size of Rλ. Without performing a smooth transition between the two differently sized viewable regions, there would be visible discontinuities or other visual anomalies when rendering.

To avoid cracks in our tessellation, each viewable region is rendered by splitting it into four quadrilaterals, denoted by Q1, Q2, Q3, and Q4. The boundary points for each quadrilateral, drawn from the collection of static boundary points (P1, P2, P3, P4) and the morphing boundary points (PL, PT, PR, PB, PC), are defined as follows:

Boundary Points

Collectively, these non-overlapping quadrilaterals will cover the same surface area as the viewable region they are created from. The boundary points of each quadrilateral are calculated to align with the boundary points of the neighboring viewable region quadrilaterals. The morphing quadrilateral boundary points (PL, PT, PR, PB, PC) are calculated as so:

Morphing boundary points
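The exact point and morph formulas are in the book's figures above, so as a rough orientation here is only the un-morphed starting layout in C++: the static corners P1..P4, plus the edge midpoints and center that I assume the morphing points (PL, PT, PR, PB, PC) are blended from using the morphing factors. All names, and that assumption, are mine:

// Hypothetical base layout of the nine boundary points of a viewable region,
// before any morphing is applied. The morphing points are then blended using
// the morphing factors defined in the book (formulas not reproduced here).
struct Float2 { float x, y; Float2(float px = 0.0f, float py = 0.0f) : x(px), y(py) {} };

struct RegionPoints
{
    Float2 P1, P2, P3, P4;     // static corners: top left, top right, bottom right, bottom left (ordering assumed)
    Float2 PL, PT, PR, PB, PC; // morphing points: left, top, right, bottom edges and center
};

RegionPoints BaseRegionPoints(float cx, float cy, float offset)
{
    RegionPoints p;
    p.P1 = Float2(cx - offset, cy + offset); // top left corner
    p.P2 = Float2(cx + offset, cy + offset); // top right corner
    p.P3 = Float2(cx + offset, cy - offset); // bottom right corner
    p.P4 = Float2(cx - offset, cy - offset); // bottom left corner

    p.PL = Float2(cx - offset, cy); // left edge midpoint
    p.PT = Float2(cx, cy + offset); // top edge midpoint
    p.PR = Float2(cx + offset, cy); // right edge midpoint
    p.PB = Float2(cx, cy - offset); // bottom edge midpoint
    p.PC = Float2(cx, cy);          // center
    return p;
}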

Given a viewable region span θ at point P and applied offset λ, written as θ(P, λ), use the following formula to calculate a morphing factor - written as T(P, λ):

Morphing Factor

We calculate each of the general morphing factors (TL, TT, TR, TB, TC) for a viewable region R with a center position RC and applied offset Rλ as so:

General morphing factors


The problem is that these general morphing factors only work when the neighboring viewable regions have the same applied offset as Rλ. Below is a diagram of the various positions used when calculating the general morphing factors.

There are two special boundary cases we need to handle when calculating the morphing factors. These special cases arise when one viewable region is adjacent to another viewable region of a larger or smaller applied offset λ. The cases are defined as so:

Special Boundary Cases

Special Boundary Case 1

The viewable region R with applied offset Rλ is adjacent to a smaller viewable region with applied offset of .5Rλ. We will conditionally set the following morphing factors as follows:

Special Boundary Case 1

The above set of conditionals tests for the cases where an adjacent viewable region has been split into smaller viewable regions. We therefore need to lock the affected morphing factors to 1. This ensures that all of the overlapping quadrilateral vertices exactly match those of the smaller adjacent viewable region. The image below provides a diagram of the various positions used when calculating the morphing factors for Boundary Case 1.

Positions for Boundary Case 1
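The precise conditionals are in the figures above, but the gist, as I read it, can be sketched as: if the span of the same-sized neighbor on a given edge exceeds θmax, that neighbor has been split, so the morphing factor for that edge is locked to 1. Expressed in C++ with made-up names (the span helper is a stand-in so this compiles):

// Illustrative only: lock an edge's morphing factor to 1 when the neighbor on
// that edge has been subdivided into smaller regions.
inline float ViewableRegionSpan(float /*cx*/, float /*cy*/, float offset) { return offset; } // stand-in for theta

void ApplyBoundaryCase1(float cx, float cy, float offset, float thetaMax,
                        float& tL, float& tT, float& tR, float& tB)
{
    float step = 2.0f * offset; // distance to the center of a same-sized neighbor

    if (ViewableRegionSpan(cx - step, cy, offset) > thetaMax) tL = 1.0f; // left neighbor was split
    if (ViewableRegionSpan(cx, cy + step, offset) > thetaMax) tT = 1.0f; // top neighbor was split
    if (ViewableRegionSpan(cx + step, cy, offset) > thetaMax) tR = 1.0f; // right neighbor was split
    if (ViewableRegionSpan(cx, cy - step, offset) > thetaMax) tB = 1.0f; // bottom neighbor was split
}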

Boundary Case 2

The viewable region with applied offset Rλ is adjacent to a larger viewable region with applied offset of 2Rλ. To test for this case, we need some additional information about the current viewable region we are rendering: the relative quadrant code we stored for each viewable region in Stage 2 of the algorithm. This is used now to correctly calculate the center positions of larger neighboring viewable regions, and it also tells us which edges of the viewable region can be adjacent to larger neighbors. We will conditionally set the following morphing factors as so, based on the relative quadrant code for the viewable region:

Morphing factors based on relative quadrant codes

The above set of conditionals tests for the cases where an adjacent viewable region has not been split to the same size as the current viewable region and instead has an applied offset of 2Rλ. We therefore need to lock the affected morphing factors to 0 to ensure that all of the overlapping quadrilateral vertices exactly match those of the larger adjacent viewable region. The image below provides diagrams of the various positions used with each viewable region when calculating the morphing factors for Boundary Case 2, based on the relative quadrant code for the viewable region:

Boundary Case 2 relative quadrant code


Yay! You made it to the end! Just as I was beginning to enjoy slamming my head against a table! Wait, what's that you ask? A code sample? Well, sure! Here you go - VC++2008 and VC++2010 versions included. Enjoy!

Licensed under: CC-BY-SA with attribution