Low-resolution-Z on Adreno GPUs

July 13, 2022 10 minute read

What is LRZ?

Citing official Adreno documentation:

[A Low Resolution Z (LRZ)] pass is also referred to as draw order independent depth rejection. During the binning pass, a low resolution Z-buffer is constructed, and can reject LRZ-tile wide contributions to boost binning performance. This LRZ is then used during the rendering pass to reject pixels efficiently before testing against the full resolution Z-buffer.

My colleague Samuel Iglesias did the initial reverse-engineering of this feature; for its in-depth overview you could read his great blog post Low Resolution Z Buffer support on Turnip.

Here are a few excerpts from this post describing what is LRZ?

To understand better how LRZ works, we need to talk a bit about tiled-based rendering. This is a way of rendering based on subdividing the framebuffer in tiles and rendering each tile separately.

The binning pass processes the geometry of the scene and records in a table on which tiles a primitive will be rendered. By doing this, the HW only needs to render the primitives that affect a specific tile when is processed.

The rendering pass gets the rasterized primitives and executes all the fragment related processes of the pipeline. Once it finishes, the resolve pass starts.

Where is LRZ used then? Well, in both binning and rendering passes. In the binning pass, it is possible to store the depth value of each vertex of the geometries of the scene in a buffer as the HW has that data available. That is the depth buffer used internally for LRZ. It has lower resolution as too much detail is not needed, which helps to save bandwidth while transferring its contents to system memory.

Thanks to LRZ, the rendering pass is only executed on the fragments that are going to be visible at the end.

LRZ brings a couple of things on the table that makes it interesting. One is that applications don’t need to reorder their primitives before submission to be more efficient, that is done by the HW with LRZ automatically.

Now, a year later, I returned to this feature to make some important improvements, for nitty-gritty details you could dive into Mesa MR#16251 “tu: Overhaul LRZ, implement on-GPU dir tracking and LRZ fast-clear”. There I implemented on-GPU LRZ direction tracking, LRZ reuse between renderpasses, and fast-clear of LRZ.

In this post I want to give a practical advice, based on things I learnt while reverse-engineering this feature, on how to help driver to enable LRZ. Some of it could be self-evident, some is already written in the official docs, and some cannot be found there. It should be applicable for Vulkan, GLES, and likely for Direct3D.

Do not change the direction of depth comparisons

Or rather, when writing depth - do not change the direction of depth comparisons. If depth comparison direction is changed while writing into depth buffer - LRZ would have to be disabled.

Why? Because if depth comparison direction is GREATER - LRZ stores the lowest depth value of the block of pixels, if direction is LESS - it stores the highest value of the block. So if direction is changed the LRZ value becomes wrong for the new direction.

A few examples:

Going from VK_COMPARE_OP_GREATER -> VK_COMPARE_OP_GREATER_OR_EQUAL is good;
Going from VK_COMPARE_OP_GREATER -> VK_COMPARE_OP_LESS is bad;
From VK_COMPARE_OP_GREATER with depth write -> VK_COMPARE_OP_LESS without depth write is ok;
- LRZ would be just temporally disabled for VK_COMPARE_OP_LESS draw calls.

The rules could be summarized as:

Changing depth write direction disables LRZ;
For calls with different direction but without depth write LRZ is temporally disabled;
VK_COMPARE_OP_GREATER and VK_COMPARE_OP_GREATER_OR_EQUAL have same direction;
VK_COMPARE_OP_LESS and VK_COMPARE_OP_LESS_OR_EQUAL have same direction;
VK_COMPARE_OP_EQUAL and VK_COMPARE_OP_NEVER don’t have a direction, LRZ is temporally disabled;
- Surprise, your VK_COMPARE_OP_EQUAL compares don’t benefit from LRZ;
VK_COMPARE_OP_ALWAYS and VK_COMPARE_OP_NOT_EQUAL either temporally or completely disable LRZ, depending on depth being written.

Simple rules for fragment shader

Do not write depth

This obviously makes resulting depth value unpredictable, so LRZ has to be completely disabled.

Note, the output values of manually written depth could be bound by conservative depth modifier, for GLSL this is achieved by GL_ARB_conservative_depth extension, like this:

layout (depth_greater) out float gl_FragDepth;

However, Turnip at the moment does not consider this hint, and it is unknown if Qualcomm’s proprietary driver does.

Do not use Blending/Logic OPs/colorWriteMask

All of them make a new fragment value depend on the old fragment value. LRZ is temporary disabled in this case.

Do not have side-effects in fragment shaders

Writing to SSBO, images, … from fragment shader forces late Z, thus it is incompatible with LRZ. At the moment Turnip completely disables LRZ when shader has such side-effects.

Do not discard fragments

Discarding fragments moves the decision whether fragment contributes to the depth buffer to the time of fragment shader execution. LRZ is temporary disabled in this case.

LRZ in secondary command buffers and dynamic rendering

TLDR: Since Snapdragon 865 (Adreno 650) LRZ supported in secondary command buffers.

TLDR: LRZ would work with VK_KHR_dynamic_rendering, but you’d like to avoid using this extension because it isn’t nice to tilers.

Official docs state that LRZ is disabled with “Use of secondary command buffers (Vulkan)”, and on another page that “Snapdragon 865 and newer will not disable LRZ based on this criteria”.

Why?

Because up to Snapdragon 865 tracking of the direction is done on the CPU, meaning that LRZ direction is kept in internal renderpass object, updated and checked without any GPU involvement.

But starting from Snapdragon 865 the direction could be tracked on GPU which allows driver not to know previous LRZ direction during a command buffer construction. Therefor secondary command buffers could now use LRZ!

Recently Vulkan 1.3 came out and mandated the support of VK_KHR_dynamic_rendering. It gets rid of complicated VkRenderpass and VkFramebuffer setup, but much more exciting is a simpler way for parallel renderpasses construction (with VK_RENDERING_SUSPENDING_BIT / VK_RENDERING_RESUMING_BIT flags).

VK_KHR_dynamic_rendering poses a similar challenge for LRZ as secondary command buffers and has the same solution.

Reusing LRZ between renderpasses

TLDR: Since Snapdragon 865 (Adreno 650) LRZ would work if you store depth in one renderpass and load it later, giving depth image isn’t changed in-between.

Another major improvement brought by Snapdragon 865 is the possibility to reuse LRZ state between renderpasses.

The on-GPU direction tracking is part of the equation here, another part is the tracking of a depth view being used. Depth image has a single LRZ buffer which corresponds to a single array layer + single mip level of the image. So if view with different array layer or mip layer is used - LRZ state couldn’t be reused and will be invalidated.

With the above knowledge here are the conditions when LRZ state could be reused:

Depth attachment was stored (STORE_OP_STORE) at the end of some past renderpass;
The same depth attachment with the same depth view settings is being loaded (not cleared) in the current renderpass;
There were no changes in the underlying depth image, meaning there was no vkCmdBlitImage*, vkCmdCopyBufferToImage*, or vkCmdCopyImage*. Otherwise LRZ state would be invalidated;

Misc notes:

LRZ state is saved per depth image so you don’t lose the state if you you have several renderpasses with different depth attachments;
vkCmdClearAttachments + LOAD_OP_LOAD is just equal to LOAD_OP_CLEAR.

Conclusion

While there are many rules listed above - it all boils down to keeping things simple in the main renderpass(es) and not being too clever.

Danylo Piliaiev