Modesetting: A Glamor-less RPi adventure

Posted in Uncategorized on May 30th, 2022 by cmichael – 2 Comments
The goal of this adventure is to have hardware acceleration for applications when we have Glamor disabled in the X server.

What is Glamor ?

Glamor is a GL-based rendering acceleration library for the X server that can use OpenGL, EGL, or GBM. It uses GL functions & shaders to complete 2D graphics operations, and uses normal textures to represent drawable pixmaps where possible. Glamor calls GL functions to render to a texture directly and is somehow hardware independent. If the GL rendering cannot complete due to failure (or not being supported), then Glamor will fallback to software rendering (via llvmpipe) which uses framebuffer functions.

Why disable Glamor ?

On current RPi images like bullseye, Glamor is disabled by default for RPi 1-3 devices. This means that there is no hardware acceleration out of the box. The main reason for not using Glamor on RPi 1-3 hardware is because it uses GPU memory (CMA memory) which is limited to 256Mb. If you run out of CMA memory, then the X server cannot allocate memory for pixmaps and your system will crash. RPi 1-3 devices currently use V3D as the render GPU. V3D can only sample from tiled buffers, but it can render to tiled or linear buffers. If V3D needs to sample from a linear buffer, then we allocate a shadow buffer and transform the linear image to a tiled layout in the shadow buffer and sample from the shadow buffer. Any update of the linear texture implies updating the shadow image… and that is SLOW. With Glamor enabled in this scenario, you will quickly run out of CMA memory and crash. This issue is especially apparent if you try launching Chromium in full screen with many tabs opened.

Where has my hardware acceleration gone ?

On RPi 1-3 devices, we default to the modesetting driver from the X server. For those that are not aware, ‘modesetting’ is an Xorg driver for Kernel Modesetting (KMS) devices. The driver supports TrueColor visuals at various framebuffer depths and also supports RandR 1.2 for multi-head configurations. This driver supports all hardware where a KMS device is available and uses the Linux DRM ioctls or dumb buffer objects to create & map memory for applications to use. This driver can be used with Glamor to provide hardware acceleration, however that can lead to the X server crashing as mentioned above. Without enabling Glamor, then the modesetting driver cannot do hardware acceleration and applications will render using software (dumb buffer objects). So how can we get hardware acceleration without Glamor ? Let’s take an adventure into the land of Direct Rendering…

What is Direct Rendering ?

Direct rendering allows for X client applications to perform 3D rendering using direct access to the graphics hardware. User-space programs can use the DRM API to command the GPU to do hardware-accelerated 3D rendering and video decoding. You may be thinking “Wow, this could solve the problem” and you would be correct. If this could be enabled in the modesetting driver without using Glamor, then we could have hardware acceleration without having to worry about the X server crashing. It cannot be that difficult, right ? Well, as it turns out, things are not so simple. The biggest problem with this approach is that the DRI2 implementation inside the modesetting driver depends on Glamor. DRI2 is a version of the Direct Rendering Infrastructure (DRI). It is a framework comprising the modern Linux graphics stack that allows unprivileged user-space programs to use graphics hardware. The main use of DRI is to provide hardware acceleration for the Mesa implementation of OpenGL. So what approach should be taken ? Do we modify the modesetting driver code to support DRI2 without Glamor ? Is there a better way to get direct rendering without DRI2 ? As it turns out, there is a better way…enter DRI3.

DRI3 to the rescue ?

The main purpose of the DRI3 extension is to implement the mechanism to share direct rendered buffers between DRI clients and the X Server. With DRI3, clients can allocate the render buffers themselves instead of relying on the X server for doing the allocation. DRI3 clients allocate and use GEM buffers objects as rendering targets, while the X Server represents these render buffers using a pixmap. After initialization the client doesn’t make any extra calls to the X server, except perhaps in the case of window resizing. Utilizing this method, we should be able to avoid crashing the X server if we run out of memory, right ? Well once again, things are not as simple as they appear to be…

So using DRI3 & GEM can save the day ?

With GEM, a user-space program can create, handle and destroy memory objects living in the GPU memory. When a user-space program needs video memory (for a framebuffer, texture or any other data), it requests the allocation from the DRM driver using the GEM API. The DRM driver keeps track of the used video memory and is able to comply with the request if there is free memory available. You may recall from earlier that the main reason for not using Glamor on RPi 1-3 hardware is because it uses GPU memory (CMA memory) which is limited to 256Mb, so how can using DRI3 with GEM help us ? The short answer is “it does not”…at least, not if we utilize GEM.

Where do we go next ?

Surely there must be a way to have hardware acceleration without using all of our GPU memory ? I am glad you asked because there is a solution that we will explore in my next blog post.