Example: Run a headless OpenGL (ES) compute shader via DRM render-nodes

It has been a long time indeed since my last entry here. But I have actually been quite busy on a new adventure: graphics driver development.

Two years ago I started contributing to Mesa, mostly to the Intel i965 backend, as a member of the Igalia Graphics Team. During this time I have been building my knowledge around GPU driver development, the different parts of the Linux graphics stack, its different projects, tools, and the developer communities around them. It is still the beginning of the journey, but I’m already very proud of the multiple contributions our team has made.

But today I want to start a series of articles discussing basic examples of how to do cool things with the Khronos APIs and the Linux graphics stack. It will be a collection of short and concise programs that each achieve a concrete goal; pretty much what I wish had existed when I went looking for such examples myself.

If you, like me, are the kind of person who learns by doing and by growing existing examples, then I hope you will find this series interesting, and perhaps encouragement to write your own examples.

Before finishing this introduction and getting into the matter, I want to state my stance on what I consider a good (minimal) example program. I think it will be useful for this and future examples, helping the reader understand what to expect and how these examples differ from similar code found online. That said, not all examples in the series will be minimal.

A minimal example should:

  • provide as little boilerplate as possible. E.g., wrapping the example in a C++ class is unnecessary noise unless you are demoing OOP.
  • be contained in a single source code file if possible. Following the code flow across multiple files adds mental overhead.
  • have no dependencies, or as few as possible. The reader should not need to install anything that is not strictly necessary to try the example.
  • not favor function over readability. It doesn’t matter if it is not a complete or well-behaved program, if completeness would add code unrelated to the one thing being showcased.

Ok, now we are ready to move to the first example in this series: the simplest way to embed and run an OpenGL-ES compute shader in your C program on Linux, without the need for any GUI window or connection to the X (or Wayland) server. Actually, it isn’t even necessary to be running any window system at all. Also, the program doesn’t need any special privilege, other than access to the unprivileged DRI/DRM infrastructure. In modern distros, this access is typically granted by being a member of the video group.

The code is fairly short, so I’m inlining it below. It can also be found in my gpu-playground repository. See below for a quick explanation of its most interesting parts.

#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <GLES3/gl31.h>
#include <assert.h>
#include <fcntl.h>
#include <gbm.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* a dummy compute shader that does nothing */
#define COMPUTE_SHADER_SRC "          \
#version 310 es\n                                                       \
                                                                        \
layout (local_size_x = 1, local_size_y = 1, local_size_z = 1) in;       \
                                                                        \
void main(void) {                                                       \
   /* awesome compute code here */                                      \
}                                                                       \
"

int32_t
main (int32_t argc, char* argv[])
{
   bool res;

   int32_t fd = open ("/dev/dri/renderD128", O_RDWR);
   assert (fd > 0);

   struct gbm_device *gbm = gbm_create_device (fd);
   assert (gbm != NULL);

   /* setup EGL from the GBM device */
   EGLDisplay egl_dpy = eglGetPlatformDisplay (EGL_PLATFORM_GBM_MESA, gbm, NULL);
   assert (egl_dpy != NULL);

   res = eglInitialize (egl_dpy, NULL, NULL);
   assert (res);

   const char *egl_extension_st = eglQueryString (egl_dpy, EGL_EXTENSIONS);
   assert (strstr (egl_extension_st, "EGL_KHR_create_context") != NULL);
   assert (strstr (egl_extension_st, "EGL_KHR_surfaceless_context") != NULL);

   static const EGLint config_attribs[] = {
      EGL_RENDERABLE_TYPE, EGL_OPENGL_ES3_BIT_KHR,
      EGL_NONE
   };
   EGLConfig cfg;
   EGLint count;

   res = eglChooseConfig (egl_dpy, config_attribs, &cfg, 1, &count);
   assert (res);

   res = eglBindAPI (EGL_OPENGL_ES_API);
   assert (res);

   static const EGLint attribs[] = {
      EGL_CONTEXT_CLIENT_VERSION, 3,
      EGL_NONE
   };
   EGLContext core_ctx = eglCreateContext (egl_dpy,
                                           cfg,
                                           EGL_NO_CONTEXT,
                                           attribs);
   assert (core_ctx != EGL_NO_CONTEXT);

   res = eglMakeCurrent (egl_dpy, EGL_NO_SURFACE, EGL_NO_SURFACE, core_ctx);
   assert (res);

   /* setup a compute shader */
   GLuint compute_shader = glCreateShader (GL_COMPUTE_SHADER);
   assert (glGetError () == GL_NO_ERROR);

   const char *shader_source = COMPUTE_SHADER_SRC;
   glShaderSource (compute_shader, 1, &shader_source, NULL);
   assert (glGetError () == GL_NO_ERROR);

   glCompileShader (compute_shader);
   assert (glGetError () == GL_NO_ERROR);

   GLuint shader_program = glCreateProgram ();

   glAttachShader (shader_program, compute_shader);
   assert (glGetError () == GL_NO_ERROR);

   glLinkProgram (shader_program);
   assert (glGetError () == GL_NO_ERROR);

   glDeleteShader (compute_shader);

   glUseProgram (shader_program);
   assert (glGetError () == GL_NO_ERROR);

   /* dispatch computation */
   glDispatchCompute (1, 1, 1);
   assert (glGetError () == GL_NO_ERROR);
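   /* note: glDispatchCompute () only queues the work on the GPU; a real
    * program would issue a glMemoryBarrier () and/or call glFinish ()
    * before reading back any results */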

   printf ("Compute shader dispatched and finished successfully\n");

   /* free stuff */
   glDeleteProgram (shader_program);
   eglDestroyContext (egl_dpy, core_ctx);
   eglTerminate (egl_dpy);
   gbm_device_destroy (gbm);
   close (fd);

   return 0;
}

You have probably noticed that the program can be divided into 4 main parts:

  1. Creating a GBM device from a render-node
  2. Setting up a (surfaceless) EGL/OpenGL-ES context
  3. Creating a compute shader program
  4. Dispatching the compute shader

The first two parts are the most relevant to the purpose of this article, since they allow our program to run without requiring a window system. The rest is standard OpenGL code to set up and execute a compute shader, which is out of the scope of this example.

Creating a GBM device from a render-node

What’s a render-node anyway? Render-nodes are the “new” DRM interface for accessing the unprivileged functions of a DRI-capable GPU. They are actually not that new: this API was previously part of the single (privileged) interface exposed at /dev/dri/cardX. During the Linux kernel 3.x series, DRM drivers started exposing the unprivileged part of their user-space API via the render-node interface, as a separate device file (/dev/dri/renderDXX). If you want to know more about render-nodes, there is a section about them in the Linux kernel documentation and also a brief explanation on Wikipedia.

   int32_t fd = open ("/dev/dri/renderD128", O_RDWR);

The first step is to open the render-node file for reading and writing. On my system, it is exposed as /dev/dri/renderD128. It may be different on other systems, and any serious code would want to detect and select the appropriate render-node file first; a rough sketch of what that could look like follows below. There can be more than one: one for each DRI-capable GPU in your system.
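
Here is a minimal sketch of such detection, assuming the conventional /dev/dri layout (the open_first_render_node() helper name and the fixed-size path buffer are my own, just for illustration; a real program might prefer libdrm’s drmGetDevices2() instead). It simply scans /dev/dri for renderD* entries and opens the first one that works:

   #include <dirent.h>

   /* sketch: scan /dev/dri for render-nodes (renderD*) and return a file
      descriptor for the first one that can be opened, or -1 on failure */
   static int32_t
   open_first_render_node (void)
   {
      DIR *dir = opendir ("/dev/dri");
      if (dir == NULL)
         return -1;

      int32_t fd = -1;
      struct dirent *entry;

      while ((entry = readdir (dir)) != NULL) {
         if (strncmp (entry->d_name, "renderD", 7) != 0)
            continue;

         char path[64];
         snprintf (path, sizeof (path), "/dev/dri/%s", entry->d_name);

         fd = open (path, O_RDWR);
         if (fd >= 0)
            break;
      }

      closedir (dir);
      return fd;
   }

Of course, on a multi-GPU system the first render-node found is not necessarily the GPU you want, but it is enough to get started.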

It is the render-node interface that ultimately allows us to use the GPU for computing only, from an unprivileged program. However, to be able to plug into this interface, we need to wrap it up with a Generic Buffer Manager (GBM) device, since this is the interface that Mesa (the EGL/OpenGL driver we are using) understands.

   struct gbm_device *gbm = gbm_create_device (fd);

That’s it. We now have a GBM device that is able to send commands to a GPU via its render-node interface. Next step is to setup an EGL and OpenGL context.

Setting up a (surfaceless) EGL/OpenGL-ES context

   EGLDisplay egl_dpy = eglGetPlatformDisplay (EGL_PLATFORM_GBM_MESA, gbm, NULL);

This is the most interesting line of code in this section. We get access to an EGL display using the previously created GBM device as the platform-specific handle. Notice the EGL_PLATFORM_GBM_MESA identifier, which is the Mesa-specific enum for interfacing with a GBM device.
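
Note that eglGetPlatformDisplay() is EGL 1.5 core; on older EGL stacks the extension-based entry point can be used instead. A minimal sketch of that more defensive path (the dpy variable name is just for the sketch), relying on EGL_EXT_platform_base and the standardized EGL_PLATFORM_GBM_KHR enum from eglext.h:

   const char *client_exts = eglQueryString (EGL_NO_DISPLAY, EGL_EXTENSIONS);
   EGLDisplay dpy = EGL_NO_DISPLAY;

   if (client_exts != NULL &&
       strstr (client_exts, "EGL_EXT_platform_base") != NULL) {
      PFNEGLGETPLATFORMDISPLAYEXTPROC get_platform_display =
         (PFNEGLGETPLATFORMDISPLAYEXTPROC)
         eglGetProcAddress ("eglGetPlatformDisplayEXT");

      if (get_platform_display != NULL)
         dpy = get_platform_display (EGL_PLATFORM_GBM_KHR, gbm, NULL);
   }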

   const char *egl_extension_st = eglQueryString (egl_dpy, EGL_EXTENSIONS);
   assert (strstr (egl_extension_st, "EGL_KHR_surfaceless_context") != NULL);

Here we check that we got a valid EGL display that is able to create a surfaceless context.

   res = eglMakeCurrent (egl_dpy, EGL_NO_SURFACE, EGL_NO_SURFACE, core_ctx);

After setting up the EGL context and binding the OpenGL API we want (ES 3.1 in this case), we reach the last line of code that requires attention. When making the EGL/OpenGL-ES context current, we pass EGL_NO_SURFACE for both the draw and read surfaces: we don’t need any, since we are not going to render anything. This is precisely what the EGL_KHR_surfaceless_context extension we checked for earlier allows.

And that’s it. We now have an OpenGL context and we can start making OpenGL calls normally. Setting up a compute shader and dispatching it should be uncontroversial, so we leave its explanation out of this article for simplicity.
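
That said, if the shader ever refuses to compile or link on a given driver, the glGetError()-based asserts in the listing won’t tell you why. A minimal sketch of querying the compile and link logs, using the same compute_shader and shader_program names as in the listing:

   GLint ok = 0;
   char log[4096];

   /* query the compile status and log of the shader object */
   glGetShaderiv (compute_shader, GL_COMPILE_STATUS, &ok);
   if (!ok) {
      glGetShaderInfoLog (compute_shader, sizeof (log), NULL, log);
      fprintf (stderr, "shader compilation failed:\n%s\n", log);
   }

   /* query the link status and log of the program object */
   glGetProgramiv (shader_program, GL_LINK_STATUS, &ok);
   if (!ok) {
      glGetProgramInfoLog (shader_program, sizeof (log), NULL, log);
      fprintf (stderr, "program link failed:\n%s\n", log);
   }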

What next?

Ok, with around 100 lines of code we were able to dispatch a (useless) compute shader program. Here are some basic ideas on how to grow this example to do something useful:

  • Dynamically detect and select among the different render-node interfaces available.
  • Add some input and output buffers to get data in and out of the compute shader execution (see the sketch after this list).
  • Do some cool processing of the input data in the shader, then use the result back in the C program.
  • Add proper error checking, obviously.
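
To illustrate the second item, here is a minimal sketch of feeding data in and reading results back through a shader storage buffer object (SSBO). It assumes the compute shader declares a matching block, e.g. layout (std430, binding = 0) buffer Data { float values[]; };, and the buffer size of 64 floats is an arbitrary choice for the sketch:

   /* a hypothetical input/output buffer of 64 floats, shared with the
      shader through SSBO binding point 0 */
   GLfloat data[64] = { 0.0f };
   GLuint ssbo;

   glGenBuffers (1, &ssbo);
   glBindBuffer (GL_SHADER_STORAGE_BUFFER, ssbo);
   glBufferData (GL_SHADER_STORAGE_BUFFER, sizeof (data), data, GL_DYNAMIC_COPY);
   glBindBufferBase (GL_SHADER_STORAGE_BUFFER, 0, ssbo);

   glDispatchCompute (1, 1, 1);

   /* make the shader writes visible to the CPU mapping below */
   glMemoryBarrier (GL_BUFFER_UPDATE_BARRIER_BIT);

   GLfloat *result = (GLfloat *)
      glMapBufferRange (GL_SHADER_STORAGE_BUFFER, 0, sizeof (data),
                        GL_MAP_READ_BIT);
   assert (result != NULL);
   printf ("values[0] = %f\n", result[0]);
   glUnmapBuffer (GL_SHADER_STORAGE_BUFFER);

   glDeleteBuffers (1, &ssbo);

On the GLSL side, the shader would then simply read and write values[...] indexed by gl_GlobalInvocationID.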

In future entries, I would also like to explore minimal examples of how to do this using the Vulkan and OpenCL APIs, and to implement some cool routine in the shader that does something interesting.

Conclusion

With this example I tried to demonstrate how easy it is to exploit the capabilities of modern GPUs for general-purpose computing. Whether for off-screen image processing, computer vision, crypto, physics simulation, machine learning, etc., there are potentially many cases where off-loading work from the CPU to the GPU results in greater performance and/or energy efficiency.

Using a fairly modern Linux graphics stack, Mesa, and a DRI-capable GPU such as an Intel integrated graphics card, any C program or library can embed a compute shader today and use the GPU as just another computing unit.

Does your application have routines that could potentially be moved to a GLSL compute program? I would love to hear about it.