Turnips in the wild (Part 2)

8 minute read

In Turnips in the wild (Part 1) we walked through two issues, one in TauCeti Benchmark and the other in Genshin Impact. Today, I have an update about the one I didn’t have plan to fix, and a showcase of two remaining issues I met in Genshin Impact.

Genshin Impact

Gameplay – Disco Water

In the previous post I said that I’m not planning to fix the broken water effect since it relied on undefined behavior.

As a refresher, the issue was caused by fragment shader not writing anything into the second attachment, which is UB in Vulkan:

UNASSIGNED-CoreValidation-Shader-InputNotProduced(WARN / SPEC): Validation Warning:
Attachment 1 not written by fragment shader; undefined values will be written to attachment
Screenshot of the gameplay with body of water that has large colorful artifacts

However, I was notified that same issue was fixed in OpenGL driver for Adreno (Freedreno) and the fix is rather easy. Even though for Vulkan it is clearly an undefined behavior, with other APIs it might not be so clear. Thus, given that we want to support translation from other APIs, there are already apps which rely on this behavior, and it would be just a bit more performant - I made a fix for it.

Screenshot of the gameplay with body of water without artifacts

The issue was fixed by “tu: do not corrupt unwritten render targets (!10489)”

Login Screen

The login screen welcomes us with not-so-healthy colors:

Screenshot of a login screen in Genshin Impact which has wrong colors - columns and road are blue and white

And with a few failures to allocate registers in the logs. The failure to allocate registers isn’t good and may cause some important shader not to run, but let’s hope it’s not that. Thus, again, we should take a closer look at the frame.

Once the frame is loaded I’m staring at an empty image at the end of the frame… Not a great start.

Such things mostly happen due to a GPU hang. Since I’m inspecting frames on Linux I took a look at dmesg and confirmed the hang:

 [drm:a6xx_irq [msm]] *ERROR* gpu fault ring 0 fence ...

Fortunately, after walking through draw calls, I found that the mis-rendering happens before the call which hangs. Let’s look at it:

Screenshot of a correct draw call right before the wrong one being inspected in RenderDoc
Draw call right before
Screenshot of a draw call, that draws the wrong colors, being inspected in RenderDoc
Draw call with the issue

It looks like some fullscreen effect. As in the previous case - the inputs are fine, the only image input is a depth buffer. Also, there are always uniforms passed to the shaders, but when there is only a single problematic draw call - they are rarely an issue (also they are easily comparable with the proprietary driver if I spot some nonsensical values among them).

Now it’s time to look at the shader, ~150 assembly instructions, nothing fancy, nothing obvious, and a lonely kill near the top. Before going into the most “fun” part, it’s a good idea to make sure that the issue is 99% in the shader. RenderDoc has a cool feature which allows to debug shader (its SPIRV code) at a certain fragment (or vertex, or CS invocation), it does the evaluation on CPU, so I can use it as some kind of a reference implementation. In our case the output between RenderDoc and actual shader evaluation on GPU is different:

Screenshot of the color value calculated on CPU by RenderDoc
Evaluation on CPU: color = vec4(0.17134, 0.40289, 0.69859, 0.00124)
Screenshot of the color value calculated on GPU
On GPU: color = vec4(3.1875, 4.25, 5.625, 0.00061)

Knowing the above there is only one thing left to do - reduce the shader until we find the problematic instruction(s). Fortunately there is a proprietary driver which renders the scene correctly, therefor instead of relying on intuition, luck, and persistance - we could quickly bisect to the issue by editing and comparing the edited shader with a reference driver. Actually, it’s possible to do this with shader debugging in RenderDoc, but I had problems with it at that moment and it’s not that easy to do.

The process goes like this:

  1. Decompile SPIRV into GLSL and check that it compiles back (sometimes it requires some editing)
  2. Remove half of the code, write the most promising temporary variable as a color, and take a look at results
  3. Copy the edited code to RenderDoc instance which runs on proprietary driver
  4. Compare the results
  5. If there is a difference - return deleted code, now we know that the issue is probably in it. Thus, bisect it by returning to step 2.

This way I bisected to this fragment:

_243 = clamp(_243, 0.0, 1.0);
_279 = clamp(_279, 0.0, 1.0);
float _290;
if (_72.x) {
  _290 = _279;
} else {
  _290 = _243;
}

color0 = vec4(_290);
return;

Writing _279 or _243 to color0 produced reasonable results, but writing _290 produced nonsense. The difference was only the presence of condition. Now, having a minimal change which reproduces the issue, it’s possible to compare native assembly.

Bad:

mad.f32 r0.z, c0.y, r0.x, c6.w
sqrt r0.y, r0.y
mul.f r0.x, r1.y, c1.z
(ss)(nop2) mad.f32 r1.z, c6.x, r0.y, c6.y
(nop3) cmps.f.ge r0.y, r0.x, r1.w
(sat)(nop3) sel.b32 r0.w, r0.z, r0.y, r1.z

Good:

(sat)mad.f32 r0.z, c0.y, r0.x, c6.w
sqrt r0.y, r0.y
(ss)(sat)mad.f32 r1.z, c6.x, r0.y, c6.y
(nop2) mul.f r0.y, r1.y, c1.z
add.f r0.x, r0.z, r1.z
(nop3) cmps.f.ge r0.w, r0.y, r1.w
cov.u r1.w, r0.w
(rpt2)nop
(nop3) add.f r0.w, r0.x, r1.w

By running them in my head I reasoned that they should produce the same results. Something works not as expected. After a bit more changes in GLSL, it became apparent that something wrong with clamp(x, 0, 1) which is translated into (sat) modifier for instructions. A bit more digging and I found out that hardware doesn’t understand saturation modifier being placed on sel. instruction (sel is a selection between two values based on third).

Disallowing compiler to place saturation on sel instruction resolved the bug:

Login screen after the fix

The issue was fixed by “ir3: disallow .sat on SEL instructions (!9666)”

Gameplay – Where did the trees go?

Screenshot of the gameplay with trees and grass being almost black

The trees and grass are seem to be rendered incorrectly. After looking through the trace and not finding where they were actually rendered, I studied the trace on proprietary driver and found them. However, there weren’t any such draw calls on Turnip!

The answer was simple, shaders failed to compile due to the failure in a register allocation I mentioned earlier… The general solution would be an implementation of register spilling. However in this case there is a pending merge request that implements a new register allocator, which later would help us implement register spilling. With it shaders can now be compiled!

Screenshot of the gameplay with trees and grass being rendered correctly

More Turnip adventures to come!

Comments