Command stream editing as an effective method to debug driver issues
In previous posts, “Graphics Flight Recorder - unknown but handy tool to debug GPU hangs” and “Debugging Unrecoverable GPU Hangs”, I demonstrated a few tricks of how to identify the location of GPU fault.
But what’s the next step once you’ve roughly pinpointed the issue? What if the problem is only sporadically reproducible and the only way to ensure consistent results is by replaying a trace of raw GPU commands? How can you precisely determine the cause and find a proper fix?
Sometimes, you may have an inkling of what’s causing the problem, then and you can simply modify the driver’s code to see if it resolves the issue. However, there are instances where the root cause remains elusive or you only want to change a specific value without affecting the same register before and after it.
The optimal approach in these situations is to directly modify the commands sent to the GPU. The ability to arbitrarily edit the command stream was always an obvious idea and has crossed my mind numerous times (and not only mine – proprietary driver developers seem to employ similar techniques). Finally, the stars aligned: my frustration with a recent bug, the kernel’s new support for user-space-defined GPU addresses for buffer objects, the tool I wrote to replay command stream traces not so long ago, and the realization that implementing a command stream editor was not as complicated as initially thought.
The end result is a tool for Adreno GPUs (with msm kernel driver) to decompile, edit, and compile back command streams: “freedreno,turnip: Add tooling to edit command streams and use them in ‘replay’”.
The primary advantage of this command stream editing tool lies the ability to rapidly iterate over hypotheses. Another highly valuable feature (which I have plans for) would be the automatic bisection of the command stream, which would be particularly beneficial in instances where only the bug reporter has the necessary hardware to reproduce the issue at hand.
How the tool is used
# Decompile one command stream from the trace
./rddecompiler -s 0 gpu_trace.rd > generate_rd.c
# Compile the executable which would output the command stream
meson setup . build
ninja -C build
# Override the command stream with the commands from the generator
./replay gpu_trace.rd --override=0 --generator=./build/generate_rd
Reading dEQP-VK.renderpass.suballocation.formats.r5g6b5_unorm_pack16.clear.clear.rd...
gpuid: 660
Uploading iova 0x100000000 size = 0x82000
Uploading iova 0x100089000 size = 0x4000
cmdstream 0: 207 dwords
generating cmdstream './generate_rd --vastart=21441282048 --vasize=33554432 gpu_trace.rd'
Uploading iova 0x4fff00000 size = 0x1d4
override cmdstream: 117 dwords
skipped cmdstream 1: 248 dwords
skipped cmdstream 2: 223 dwords
The decompiled code isn’t pretty:
/* pkt4: GRAS_SC_SCREEN_SCISSOR[0].TL = { X = 0 | Y = 0 } */
pkt4(cs, REG_A6XX_GRAS_SC_SCREEN_SCISSOR_TL(0), (2), 0);
/* pkt4: GRAS_SC_SCREEN_SCISSOR[0].BR = { X = 32767 | Y = 32767 } */
pkt(cs, 2147450879);
/* pkt4: VFD_INDEX_OFFSET = 0 */
pkt4(cs, REG_A6XX_VFD_INDEX_OFFSET, (2), 0);
/* pkt4: VFD_INSTANCE_START_OFFSET = 0 */
pkt(cs, 0);
/* pkt4: SP_FS_OUTPUT[0].REG = { REGID = r0.x } */
pkt4(cs, REG_A6XX_SP_FS_OUTPUT_REG(0), (1), 0);
/* pkt4: SP_TP_RAS_MSAA_CNTL = { SAMPLES = MSAA_FOUR } */
pkt4(cs, REG_A6XX_SP_TP_RAS_MSAA_CNTL, (2), 2);
/* pkt4: SP_TP_DEST_MSAA_CNTL = { SAMPLES = MSAA_FOUR } */
pkt(cs, 2);
/* pkt4: GRAS_RAS_MSAA_CNTL = { SAMPLES = MSAA_FOUR } */
pkt4(cs, REG_A6XX_GRAS_RAS_MSAA_CNTL, (2), 2);
Shader assembly is editable:
const char *source = R"(
shps #l37
getone #l37
cov.u32f32 r1.w, c504.z
cov.u32f32 r2.x, c504.w
cov.u32f32 r1.y, c504.x
....
end
)";
upload_shader(&ctx, 0x100200d80, source);
emit_shader_iova(&ctx, cs, 0x100200d80);
However, not everything is currently editable, such as descriptors. Despite this limitations, the existing functionality is sufficient for the majority of cases.
Comments