To stream or not to stream

Multimedia blog and other fancy stuff

Use EGLStreams in a WPE WebKit backend

Build a WPE WebKit backend based on the EGLStream extension.

In the previous post we saw how to build a basic WPE Backend from scratch. Now we are going to transfer the frames produced by the WPEWebProcess to the application process without copying the graphical data out of the GPU memory.

There are different mechanisms to transfer hardware buffers from one process to another. Here we are going to use the EGLStream extension from the EGL API, which is available on compatible hardware.

The reference project for this WPE Backend is WPEBackend-offscreen-nvidia. It uses the EGLStream extension to produce the hardware buffers on the WPEWebProcess side, and the EGL_NV_stream_consumer_eglimage extension (specific to NVidia hardware) to transform those buffers into EGLImages on the application process side.

N.B. the WPEBackend-offscreen-nvidia project and the content of this post are compatible with WPE WebKit 2.38.x and 2.40.x, using version 1.14.x of libwpe.

With the growing adoption of DMA Buffers on all modern Linux platforms, the WPE WebKit architecture is evolving and, in the future, the need for a WPE Backend should disappear.

Future designs of the libwpe API should allow the application to receive the video buffers directly on the application process side, without needing to implement a different WPE Backend for each hardware platform. From the application developer's point of view, it will simplify the usage of WPE WebKit by hiding all multiprocess considerations.

The EGLStream extension #

EGLStream is an extension of the EGL API. It defines mechanisms to transfer hardware video buffers from one process to another without leaving GPU memory. In this respect, it fulfils the same purpose as DMA Buffers, but without the complexity of negotiating the different parameters and of transparently handling multi-plane images. On the other hand, the EGLStream extension is less flexible than DMA Buffers and is unlikely to be available on non-NVidia hardware.

In other words: the EGLStream extension is quite easy to use but you will need an NVidia GPU.

An EGLStream goes in one exclusive direction: from a producer in charge of creating the video buffers, to a consumer that receives and presents them. So you need two endpoints, one in the consumer process and one in the producer process.

Apart from the EGLStream extension itself, you will need other extensions to create the content of the video buffers on the producer side, and to consume this content on the consumer side.

The producer extension used in the WPEBackend-offscreen-nvidia project is EGL_KHR_stream_producer_eglsurface. It is straightforward to use: you create a specific surface the same way you would create a PBuffer surface, and then, when calling eglSwapBuffers, the EGL implementation takes care of finishing the drawing and sending the content to the consumer.

On the consumer side, you can use different extensions. The most interesting ones in our case are:

- EGL_KHR_stream_consumer_gltexture, which binds each new frame to a GL_TEXTURE_EXTERNAL_OES texture;
- EGL_NV_stream_consumer_eglimage (specific to NVidia hardware), which exposes each new frame as an EGLImage.

In WPEBackend-offscreen-nvidia we are using the latter approach, so we can directly transfer the obtained EGLImage to the user application for presentation.

How to initialize the EGLStream between the producer and the consumer #

The order in which the EGLStream endpoints are created and connected, and in which the producer and consumer extensions are configured, is quite strict:

1. You start by creating the consumer endpoint of the EGLStream: #

EGLStreamKHR createConsumerStream(EGLDisplay eglDisplay) noexcept
{
    static constexpr const EGLint s_streamAttribs[] = {EGL_STREAM_FIFO_LENGTH_KHR, 1,
                                                       EGL_CONSUMER_ACQUIRE_TIMEOUT_USEC_KHR, 1000 * 1000,
                                                       EGL_NONE};
    return eglCreateStreamKHR(eglDisplay, s_streamAttribs);
}

The EGL_STREAM_FIFO_LENGTH_KHR parameter defines the length of the EGLStream queue. If set to 0, the EGLStream works in mailbox mode, which means that each time the producer has a new frame it empties the stream content and replaces the current frame with the new one. If greater than 0, the EGLStream works in fifo mode, which means that the stream queue can contain up to EGL_STREAM_FIFO_LENGTH_KHR frames.

Here we are configuring a queue of 1 frame because the EGL_KHR_stream_producer_eglsurface specification guarantees that, in this case, calling eglSwapBuffers(...) on the producer side will block until the consumer has read the previous frame in the EGLStream queue. This ensures proper rendering synchronization between the application process side and the WPEWebProcess side without relying on IPC, which would add an extra delay between frames.

The EGL_CONSUMER_ACQUIRE_TIMEOUT_USEC_KHR parameter defines the maximum time, in microseconds, to wait on the consumer side to acquire a frame when calling eglStreamConsumerAcquireKHR(...). It is only used with the EGL_KHR_stream_consumer_gltexture extension, as EGL_NV_stream_consumer_eglimage allows setting a different timeout on each call of its eglQueryStreamConsumerEventNV(...) acquire function.

2. Still on the consumer side, you get the EGLStream file descriptor by calling eglGetStreamFileDescriptorKHR(...). #
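For reference, getting the file descriptor is a single call. Here is a minimal sketch, assuming the eglStream endpoint created above and that the extension entry point has been loaded with eglGetProcAddress:

EGLNativeFileDescriptorKHR streamFD = eglGetStreamFileDescriptorKHR(eglDisplay, eglStream);
if (streamFD == EGL_NO_FILE_DESCRIPTOR_KHR)
{
    // Error: the EGLStream cannot be shared with another process
}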

Then you need to initialize the consumer mechanism.

If you are using the EGL_KHR_stream_consumer_gltexture extension #

You initialize your OpenGL or GLES context and create the external texture associated with the EGLStream:

GLuint texture = 0;
glGenTextures(1, &texture);
glBindTexture(GL_TEXTURE_EXTERNAL_OES, texture);
glTexParameteri(GL_TEXTURE_EXTERNAL_OES, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_EXTERNAL_OES, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_EXTERNAL_OES, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_EXTERNAL_OES, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);

eglStreamConsumerGLTextureExternalKHR(eglDisplay, eglStream);

By calling eglStreamConsumerGLTextureExternalKHR(...), the currently bound GL_TEXTURE_EXTERNAL_OES texture is associated with the provided EGLStream.

If you are using the EGL_NV_stream_consumer_eglimage extension #

In this case the initial configuration is easier as you only need to call the eglStreamImageConsumerConnectNV(...) function to initialize the consumer.
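A minimal sketch of this connection, assuming the entry point has been loaded with eglGetProcAddress (no DRM format modifiers and no extra attributes are passed here):

if (!eglStreamImageConsumerConnectNV(eglDisplay, eglStream, 0, nullptr, nullptr))
{
    // Error: the consumer could not be connected to the EGLStream
}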

Sending the consumer EGLStream file descriptor to the producer #

Once the consumer is initialized, you need to send the EGLStream file descriptor to the producer process through IPC. If the producer process is the parent process of the consumer, you can just send the integer value of the file descriptor and then, on the producer process side, get an open handle to the same resource by calling:

// SYS_pidfd_open requires Linux >= 5.3 and SYS_pidfd_getfd requires Linux >= 5.6
// (called through syscall() from <unistd.h> with the constants from <sys/syscall.h>)
int procFD = syscall(SYS_pidfd_open, cpid, 0);
streamFD = syscall(SYS_pidfd_getfd, procFD, streamFD, 0);
close(procFD);

with cpid the PID of the consumer child process and streamFD the integer value of the EGLStream file descriptor on the consumer process side.

If the producer and consumer processes are not related, you cannot use this technique because of process access rights. In this case, the only way of transferring the file descriptor resource from the consumer to the producer process is to use a Unix socket, with the sendmsg(...) and recvmsg(...) functions and a SCM_RIGHTS message type (see the IPC communication integration).

3. During all this time, on the producer side, you are waiting for the EGLStream file descriptor. #

When you finally receive the file descriptor, you can create the producer endpoint of the EGLStream and initialize the producer mechanism with the EGL_KHR_stream_producer_eglsurface extension.

EGLStreamKHR eglStream = eglCreateStreamFromFileDescriptorKHR(eglDisplay, consumerFD);

const EGLint surfaceAttribs[] = {EGL_WIDTH, width, EGL_HEIGHT, height, EGL_NONE};
EGLSurface eglSurface = eglCreateStreamProducerSurfaceKHR(eglDisplay, config, eglStream, surfaceAttribs);

Just like with a classical PBuffer surface, you specify the dimensions of the EGLStream producer surface in the surface attributes. The provided EGL config is the same configuration as the one used to create the EGLContext. When you choose the EGLConfig with eglChooseConfig(...), you must specify the value EGL_STREAM_BIT_KHR for the EGL_SURFACE_TYPE parameter.
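As an illustration, the EGLConfig selection could look like the following sketch (the color and renderable type attributes are only an example, not necessarily the exact configuration used by WPEBackend-offscreen-nvidia):

static constexpr const EGLint s_configAttribs[] = {EGL_SURFACE_TYPE, EGL_STREAM_BIT_KHR,
                                                   EGL_RENDERABLE_TYPE, EGL_OPENGL_ES2_BIT,
                                                   EGL_RED_SIZE, 8,
                                                   EGL_GREEN_SIZE, 8,
                                                   EGL_BLUE_SIZE, 8,
                                                   EGL_ALPHA_SIZE, 8,
                                                   EGL_NONE};
EGLConfig config = nullptr;
EGLint numConfigs = 0;
eglChooseConfig(eglDisplay, s_configAttribs, &config, 1, &numConfigs);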

When selecting the EGLContext before drawing, you bind this newly created eglSurface as the target surface. Then eglSwapBuffers(...) automatically takes care of finishing the rendering and sending the video buffer content to the consumer endpoint of the EGLStream.
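Putting it together, the producer-side rendering step boils down to the following sketch (assuming eglContext was created with the same EGLConfig as the producer surface):

eglMakeCurrent(eglDisplay, eglSurface, eglSurface, eglContext);

// ...draw the frame with OpenGL/GLES...

// With a FIFO length of 1, this call blocks until the consumer has read the
// previous frame, then it sends the new frame into the EGLStream
eglSwapBuffers(eglDisplay, eglSurface);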

sequenceDiagram
    participant A as Producer
    participant B as Consumer
    B ->> B: Create the EGLStream
    B ->> B: Get the EGLStream file descriptor
    B ->> B: Initialize the consumer
    B ->> A: Send the file descriptor
    A ->> A: Create the EGLStream
    A ->> A: Initialize the producer
    A --> B: Connected
    activate A
    activate B
    loop Rendering
        A ->> A: Draw frame
        A ->> B: Send frame
        B ->> B: Consume frame
        B -->> A: EGLStream ready for next frame
    end
    deactivate A
    deactivate B

How to consume the EGLStream frames #

As explained above, on the producer side there is nothing special to do. You can do the rendering just like with a conventional surface as the whole EGLStream mechanism is transparently handled during the eglSwapBuffers(...) call.

On the consumer side, you need to manually acquire and release the frames. The principle is the same with any of the consumer extensions: wait for a new frame, acquire it, use its content and then release it. But the actual function calls are slightly different.

With the EGL_KHR_stream_consumer_gltexture extension #

In this case, the rendering loop will look like this:

while (drawing)
{
    EGLint status = 0;
    if (!eglQueryStreamKHR(eglDisplay, eglStream, EGL_STREAM_STATE_KHR, &status))
        break;

    switch (status)
    {
    case EGL_STREAM_STATE_CREATED_KHR:
    case EGL_STREAM_STATE_CONNECTING_KHR:
        continue;

    case EGL_STREAM_STATE_DISCONNECTED_KHR:
        drawing = false;
        break;

    case EGL_STREAM_STATE_EMPTY_KHR:
    case EGL_STREAM_STATE_OLD_FRAME_AVAILABLE_KHR:
    case EGL_STREAM_STATE_NEW_FRAME_AVAILABLE_KHR:
        break;
    }
    if (!drawing)
        break;

    if (!eglStreamConsumerAcquireKHR(eglDisplay, eglStream))
        continue;

    // Drawing code...
    // The currently bound GL_TEXTURE_EXTERNAL_OES holds the content
    // of the new EGLStream frame

    eglSwapBuffers(eglDisplay, eglSurface);
    eglStreamConsumerReleaseKHR(eglDisplay, eglStream);
}

The eglStreamConsumerAcquireKHR(...) call will block for at most EGL_CONSUMER_ACQUIRE_TIMEOUT_USEC_KHR microseconds, or until a frame is received (or until the EGLStream is disconnected). The value of EGL_CONSUMER_ACQUIRE_TIMEOUT_USEC_KHR is set once and for all when creating the consumer EGLStream endpoint.

With the EGL_NV_stream_consumer_eglimage extension #

In this case, the rendering loop will look like this:

static constexpr EGLTime ACQUIRE_MAX_TIMEOUT_USEC = 1000 * 1000;
EGLImage eglImage = EGL_NO_IMAGE;

while (drawing)
{
    EGLenum event = 0;
    EGLAttrib data = 0;
    // WARNING: specifications state that the timeout is in nanoseconds
    // (see: https://registry.khronos.org/EGL/extensions/NV/EGL_NV_stream_consumer_eglimage.txt)
    // but in reality it is in microseconds (at least with the version 535.113.01 of the NVidia drivers)
    if (!eglQueryStreamConsumerEventNV(eglDisplay, eglStream, ACQUIRE_MAX_TIMEOUT_USEC, &event, &data))
    {
        drawing = false;
        continue;
    }

    switch (event)
    {
    case EGL_STREAM_IMAGE_ADD_NV:
        if (eglImage)
            eglDestroyImage(eglDisplay, eglImage);

        eglImage = eglCreateImage(eglDisplay, EGL_NO_CONTEXT, EGL_STREAM_CONSUMER_IMAGE_NV,
                                  static_cast<EGLClientBuffer>(eglStream), nullptr);
        continue;

    case EGL_STREAM_IMAGE_REMOVE_NV:
        if (data)
        {
            EGLImage image = reinterpret_cast<EGLImage>(data);
            eglDestroyImage(eglDisplay, image);
            if (image == eglImage)
                eglImage = EGL_NO_IMAGE;
        }
        continue;

    case EGL_STREAM_IMAGE_AVAILABLE_NV:
        if (eglStreamAcquireImageNV(eglDisplay, eglStream, &eglImage, EGL_NO_SYNC))
            break;
        else
            continue;

    default:
        continue;
    }

    // Present the EGLImage...

    eglStreamReleaseImageNV(eglDisplay, eglStream, eglImage, EGL_NO_SYNC);
}

if (eglImage)
    eglDestroyImage(eglDisplay, eglImage);

Integration of EGLStreams in WPEBackend-offscreen-nvidia #

The WPEBackend-offscreen-nvidia project is based on the WPEBackend-direct architecture presented in the previous post, with the following differences:

On the application process side #

The ViewBackend initializes an EGL surfaceless display using the EGL_MESA_platform_surfaceless extension in order to be able to create a consumer EGLStream.
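The display creation itself is a short sketch like this (assuming the eglGetPlatformDisplayEXT entry point from EGL_EXT_platform_base has been loaded with eglGetProcAddress):

// For the surfaceless platform, the native display parameter must be EGL_DEFAULT_DISPLAY
EGLDisplay eglDisplay = eglGetPlatformDisplayEXT(EGL_PLATFORM_SURFACELESS_MESA, EGL_DEFAULT_DISPLAY, nullptr);
eglInitialize(eglDisplay, nullptr, nullptr);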

Then the WPE Backend consumes the EGLStream frames from a separate thread, while the end-user application callback is called from the application main thread. This separation is needed to avoid interference between the EGL display used for the application rendering and the consumer surfaceless display.

After acquiring a new frame, the consumer thread waits until the end-user application main thread calls the wpe_offscreen_nvidia_view_backend_dispatch_frame_complete(...) function. This notifies the WPE Backend that the frame can be released. Then the consumer thread releases the frame and tries to acquire the next one. Unlike in WPEBackend-direct, there is no need for further IPC synchronization, as the EGLStream producer (on the WPEWebProcess side) is blocked until the frame is released on the consumer side. The rendering synchronization is implicit through the EGLStream mechanism.

sequenceDiagram
    participant A as WPE Backend - Consumer thread
    participant B as WPE Backend - Application main thread
    participant C as End-user application
    A ->> A: Acquire frame from producer
    activate A
    A ->> A: wait
    B ->> C: Call presentation callback with actual EGLImage
    C ->> C: Use the EGLImage
    C ->> B: wpe_offscreen_nvidia_view_backend_dispatch_frame_complete(...)
    B ->> A: Unlock consumer thread
    deactivate A
    A ->> A: Release frame
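A minimal sketch of this synchronization, using a simple condition variable (the class and member names here are hypothetical, not the actual WPEBackend-offscreen-nvidia identifiers):

#include <condition_variable>
#include <mutex>

class FrameGate
{
public:
    // Called from the consumer thread after acquiring a frame: blocks until the
    // end-user application has called
    // wpe_offscreen_nvidia_view_backend_dispatch_frame_complete(...)
    void waitFrameComplete()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cond.wait(lock, [this] { return m_completed; });
        m_completed = false;
    }

    // Called from the application main thread when the frame complete function
    // is invoked: unlocks the consumer thread so that it can release the frame
    void dispatchFrameComplete()
    {
        const std::lock_guard<std::mutex> lock(m_mutex);
        m_completed = true;
        m_cond.notify_one();
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cond;
    bool m_completed = false;
};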

The IPC communication #

As far as IPC communication is concerned, WPEBackend-offscreen-nvidia adds the possibility to transfer file descriptors between processes using a Unix socket channel and the SCM_RIGHTS message type.

bool Channel::writeFileDescriptor(int fd) noexcept
{
    assert(m_localFd != -1);
    if (fd == -1)
        return false;

    union {
        cmsghdr header;
        char buffer[CMSG_SPACE(sizeof(int))];
    } control = {};

    msghdr msg = {};
    msg.msg_control = control.buffer;
    msg.msg_controllen = sizeof(control.buffer);

    cmsghdr* header = CMSG_FIRSTHDR(&msg);
    header->cmsg_len = CMSG_LEN(sizeof(int));
    header->cmsg_level = SOL_SOCKET;
    header->cmsg_type = SCM_RIGHTS;
    *reinterpret_cast<int*>(CMSG_DATA(header)) = fd;

    if (sendmsg(m_localFd, &msg, MSG_EOR | MSG_NOSIGNAL) == -1)
    {
        closeChannel();
        m_handler.handleError(*this, errno);
        return false;
    }

    return true;
}

int Channel::readFileDescriptor() noexcept
{
    assert(m_localFd != -1);

    union {
        cmsghdr header;
        char buffer[CMSG_SPACE(sizeof(int))];
    } control = {};

    msghdr msg = {};
    msg.msg_control = control.buffer;
    msg.msg_controllen = sizeof(control.buffer);

    if (recvmsg(m_localFd, &msg, MSG_WAITALL) == -1)
    {
        closeChannel();
        m_handler.handleError(*this, errno);
        return -1;
    }

    cmsghdr* header = CMSG_FIRSTHDR(&msg);
    if (header && (header->cmsg_len == CMSG_LEN(sizeof(int))) && (header->cmsg_level == SOL_SOCKET) &&
        (header->cmsg_type == SCM_RIGHTS))
    {
        int fd = *reinterpret_cast<int*>(CMSG_DATA(header));
        return fd;
    }

    return -1;
}

On the WPEWebProcess side #

The RendererBackendEGL provides a surfaceless platform with the EGL_PLATFORM_SURFACELESS_MESA value. In order to work correctly with WPE WebKit it requires a small patch in the WebKit source code (see here).

By default, the WPE WebKit code sets the EGL_WINDOW_BIT value for the EGL_SURFACE_TYPE parameter when using surfaceless EGL contexts in the GLContextEGL.cpp file. We need to set the EGL_STREAM_BIT_KHR value to select EGLConfigs compatible with an EGLStream producer surface.

Then in RendererBackendEGLTarget, the only differences are:

- the target creates the producer endpoint of the EGLStream from the file descriptor received through IPC, and renders into an EGLStream producer surface instead of a native window surface;
- there is no explicit frame-complete IPC synchronization, as the producer eglSwapBuffers(...) call already blocks until the consumer has released the previous frame.

The rest of the code is the same as in WPEBackend-direct.

Using WPEBackend-offscreen-nvidia in a docker container without a windowing system #

As the producer (WPEWebProcess side) and the consumer (application process side) are both using EGL surfaceless contexts, WPE WebKit running with the WPEBackend-offscreen-nvidia backend can work on a Linux system with no windowing system at all.

This is very useful when doing offscreen rendering on the server side using docker containers, for example. As soon as you have an NVidia GPU correctly configured on the host, you can create lightweight containers with a minimal set of packages and do web rendering into EGLImages with full hardware acceleration and zero copy of the video buffers from the GPU to the CPU (you may still have copies - or not - inside the GPU memory, depending on the driver implementation and the usage of your final EGLImages).

To use the host NVidia GPU from within a docker container, you first need to install the NVIDIA Container Toolkit.

Then the WPEBackend-offscreen-nvidia project provides a Dockerfile example to build the whole backend with the WPE WebKit patched library, and to run it with the provided webview-sample. It also provides a docker-compose.yml example to run the previous webview-sample with the NVidia GPU.

These examples still install some X11 dependencies as the webview-sample creates an X11 window to present the content of the produced EGLImages. If you are using WPE WebKit and the WPEBackend-offscreen-nvidia backend in an environment with no windowing system, you will only need to install the following dependencies:

ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=graphics

RUN apt-get update -qq && \
    apt-get upgrade -qq && \
    apt-get install -qq --no-install-recommends \
        gstreamer1.0-plugins-good gstreamer1.0-plugins-bad \
        libgstreamer-gl1.0-0 libepoxy0 libharfbuzz-icu0 libjpeg8 \
        libxslt1.1 liblcms2-2 libopenjp2-7 libwebpdemux2 libwoff1 libcairo2 \
        libglib2.0-0 libsoup2.4-1 libegl1-mesa

WORKDIR /root
RUN mkdir -p /usr/share/glvnd/egl_vendor.d && \
    cat <<EOF > /usr/share/glvnd/egl_vendor.d/10_nvidia.json
{
    "file_format_version" : "1.0.0",
    "ICD" : {
        "library_path" : "libEGL_nvidia.so.0"
    }
}
EOF

The GStreamer plugins are only needed if you want to use multimedia features; otherwise you can remove them as well.