In the comments to one of the answers I noticed you made a reference to another answer I wrote, and I wanted to make sure you understood the original context of that answer.
While the answers thus far have all focused on the complexity of the comparison itself, they do not consider the impact that changing the value stored in the hardware depth buffer has on shadow map construction. You are right to be concerned about the implications for early Z rejection, but as I will explain below, it only affects the performance of constructing the shadow map.
Also, keep in mind that the answer you are referring to pertains to cubemap-based shadow maps. They present their own set of challenges because of the unique way they are constructed and sampled, which is why the comparison is slightly different from what you will see in other contexts.
Modern GPUs compress the color and depth buffers using a hierarchy of tiles to increase memory throughput.
This compression does not save storage space (in fact, it adds a little bit of extra storage overhead), but what it does do is allow much quicker buffer clears and fetches. Instead of writing the same color or depth to every single pixel in the buffer, each tile can be flagged as "clear" and given a clear color/depth. When it comes time to fetch the color/depth for a pixel, the tile that pixel belongs to is checked first; if the entire tile is flagged as clear, then the tile's clear value is returned instead of going through the trouble of fetching the actual pixel from memory.
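To make the idea concrete, here is a minimal software sketch of that scheme. This is purely illustrative (real hardware does this with dedicated metadata caches and its own tile sizes; the 8x8 tile size and class layout here are assumptions), but it shows why a clear touches one flag per tile instead of every pixel, and why a fetch from a cleared tile never has to read pixel memory:

```python
# Illustrative sketch of a tiled depth buffer with per-tile "fast clear"
# metadata. Not actual GPU behavior; tile size and structure are assumed.

TILE = 8  # hypothetical 8x8 pixel tiles

class TiledDepthBuffer:
    def __init__(self, width, height, clear_depth=1.0):
        self.width, self.height = width, height
        self.tiles_x = (width + TILE - 1) // TILE
        self.tiles_y = (height + TILE - 1) // TILE
        # Per-tile metadata: a "cleared" flag plus one shared clear value.
        self.cleared = [[True] * self.tiles_x for _ in range(self.tiles_y)]
        self.clear_depth = clear_depth
        # Backing pixel storage, only consulted for non-cleared tiles.
        self.pixels = [[clear_depth] * width for _ in range(height)]

    def clear(self, depth):
        # O(tiles) instead of O(pixels): just flip the per-tile flags.
        self.clear_depth = depth
        for row in self.cleared:
            for tx in range(self.tiles_x):
                row[tx] = True

    def read(self, x, y):
        if self.cleared[y // TILE][x // TILE]:
            return self.clear_depth      # no pixel-memory fetch needed
        return self.pixels[y][x]

    def write(self, x, y, depth):
        ty, tx = y // TILE, x // TILE
        if self.cleared[ty][tx]:
            # First write into a cleared tile: materialize the clear value
            # for the whole tile, then mark the tile as no longer clear.
            for py in range(ty * TILE, min((ty + 1) * TILE, self.height)):
                for px in range(tx * TILE, min((tx + 1) * TILE, self.width)):
                    self.pixels[py][px] = self.clear_depth
            self.cleared[ty][tx] = False
        self.pixels[y][x] = depth
```

Note how writing is the complicated path here (materializing the tile, updating flags), while clearing and reading from clear tiles are nearly free, which mirrors the trade-off described above.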
Great... but what does compression have to do with early depth testing?
A lot, actually. This hierarchical memory structure lends itself nicely to rejecting large groups of fragments at a time, because the min/max depth for an entire tile's worth of pixels can be determined in a single specialized memory operation. This does mean that writing to the color/depth buffer is a lot more complicated (flags and such have to be updated per-tile), but the hardware is specifically designed to work this way, and a lot of the time you do not have to do anything special to benefit from it.
Now, even though the rasterizer has a simple fixed-function job to do, it does some pretty clever things whenever hierarchical Z-buffering (Hi-Z) is applicable. Because all primitives are planar, if the rasterizer can be guaranteed that the fragment shader is not going to alter the depth, it can perform a coarse-grained depth test (e.g. 1 test per-tile in the compressed depth buffer) using a min/max depth value and kill multiple fragments before shading/blending. If this coarse-grained test does not reject the tile, or if the fragment shader writes its own depth, then each fragment must be shaded and then tested against the depth buffer individually afterwards. Now, in your case, the fragment shader is pretty simple, so the expense of unnecessarily shading fragments is lower than it typically would be, and blending is not a factor in a depth-only pass.
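The coarse-grained rejection above boils down to a single comparison per tile. Here is a minimal sketch, assuming a LESS depth test (the function name and string return values are my own invention for illustration): if the nearest point of the primitive within a tile is still behind the farthest depth already stored in that tile, every fragment in the tile is guaranteed to fail, and none of them need to be shaded.

```python
# A sketch of the coarse Hi-Z rejection test, assuming a LESS depth test.
# Each tile tracks the maximum depth currently stored in it.

def hi_z_tile_test(primitive_min_depth, tile_max_depth):
    """One comparison decides the fate of an entire tile of fragments.

    Returns 'reject' when every fragment of the primitive in this tile is
    guaranteed to fail the depth test; otherwise 'shade-and-test', meaning
    each fragment must be shaded and late-tested individually.
    """
    if primitive_min_depth >= tile_max_depth:
        return "reject"          # occluded everywhere in the tile
    return "shade-and-test"      # fall back to per-fragment late test
```

This also makes it clear why a shader that writes its own depth defeats Hi-Z: the rasterizer can no longer trust the interpolated `primitive_min_depth` it computed from the planar primitive.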
However, having to do a late depth test per-fragment on primitives that are completely occluded is a waste of time that Hi-Z could have avoided. A lot of the measurable expense of depth-only rendering is actually front-end CPU overhead caused by the draw call(s) themselves (state validation, command serialization, etc.). Assuming your depth-only pass is batched efficiently, you can squeeze out a little bit more performance by making the depth test more efficient. Just do not expect to see a huge improvement in performance; as described above, there are multiple reasons why Hi-Z pays off more for traditional rendering than for a depth-only pass.
By the way, if you want a visual summary of most of what I just explained, have a look here.
Back to your original question...
In the end, properly exploiting hierarchical Z-buffering during the construction of your shadow map is not going to produce huge performance gains, but it can outweigh the gains to be had by reducing the number of arithmetic instructions necessary to compare depth. It mostly depends on how frequently you update your shadow map. On the one hand, it really does not matter how efficiently the hardware fills your shadow map if you only do it once (static). On the other hand, if you have to draw 6 independent shadow maps per-light per-frame, there will be a real, measurable improvement in performance if you can decrease the amount of time it takes to draw each of those.
The elephant in the room here that is not being considered is the amount of time it takes to fetch the depth from a shadow map in the first place (far longer than the comparison itself). You can speed up shadow map construction and comparison all you want, but some of the biggest benefits come from improving the re-construction (sampling) performance.
Anti-aliasing VSM shadows, for example, can be done using traditional texture filtering instead of the multiple convoluted samples and comparisons you have to perform for other techniques. This makes anti-aliased re-construction from a VSM more efficient. Because VSM is based on variance, it does not require the storage of perspective depth... you can use linear distance if you want, it makes no difference for this algorithm. Even though the construction (storing d and d²) is more complicated, if you need anti-aliasing, it can be more efficient.
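For reference, the reason filtered samples work at all for VSM is Chebyshev's inequality. The shadow map stores the two moments (d and d²); after filtering, they give you a mean and a variance, and the inequality gives an upper bound on how likely the receiver is to be lit. A hedged sketch (the function name and the `min_variance` clamp, a common light-bleeding/precision safeguard, are my own additions):

```python
# Sketch of the VSM visibility test using Chebyshev's inequality.
# m1 and m2 are the *filtered* moments E[d] and E[d^2] fetched from the
# shadow map; t is the receiver's depth (or linear distance) to the light.

def vsm_visibility(m1, m2, t, min_variance=1e-4):
    if t <= m1:
        return 1.0                        # in front of the occluder: fully lit
    variance = max(m2 - m1 * m1, min_variance)  # sigma^2 = E[d^2] - E[d]^2
    d = t - m1
    return variance / (variance + d * d)  # Chebyshev upper bound (p_max)
```

Because m1 and m2 are filtered linearly before this test runs, ordinary bilinear/trilinear filtering and mipmapping of the shadow map produce a meaningfully anti-aliased result, which is exactly what a raw depth comparison cannot give you.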
Clearly there is no one-size-fits-all solution; what you store in your shadow map largely depends on your algorithm.