The process of creating a new WPE backend

27 October 2023
wpe
graphics

The basics to understand and build a WPE WebKit backend from scratch.

What is a WPE Backend?

WPE is the official port of WebKit for embedded platforms. Depending on the platform hardware, it may need to use different techniques and technologies to ensure correct graphical rendering.

In order to be independent of any user-interface toolkit and/or windowing system, WPE WebKit delegates the rendering to a third-party API defined in libwpe. The implementation of this API is called a WPE Backend. You can find more explanations on the WPE Architecture page.

WPE WebKit is a multiprocess application. The end user starts and controls the web widgets in the application process, while the web engine itself runs in separate subprocesses such as the WPENetworkProcess, in charge of managing the underlying network connections, and the WPEWebProcess, in charge of HTML and JavaScript parsing, execution and rendering.

The WPEWebProcess is, in particular, in charge of drawing the final composition of the web page through the ThreadedCompositor component.

The WPE Backend is at a crossroads between the WPEWebProcess and the user application process.

graph LR;
    subgraph WPEWebProcess
        A(WPE ThreadedCompositor)
    end
    subgraph Application Process
        C(User Application)
    end
    A -->|draw| B([WPE Backend]) --> C

The WPE Backend is a shared library that is loaded at runtime by the WPEWebProcess and by the user application process. It is used to render the visual aspect of a web page and transfer the resulting video buffer from the WPE WebKit engine process to the application process.

N.B. with the growing integration of DMA Buffers on all modern Linux platforms, the WPE WebKit architecture is evolving and, in the future, the need for a WPE Backend should disappear.

Future designs of the libwpe API should allow the application process to receive the video buffers directly, without needing a different WPE Backend for each hardware platform. From the application developer's point of view, this will simplify the usage of WPE WebKit by hiding all multiprocess considerations.

The WPE Backend interfaces

The WPE Backend shared library must export at least one symbol called _wpe_loader_interface of type wpe_loader_interface as defined here in the libwpe API.

This instance holds a simple loader function that must return concrete implementations of the libwpe interfaces listed below.

When the WPE Backend is loaded by the WPEWebProcess (or the application process), the process will look for the _wpe_loader_interface symbol and then call _wpe_loader_interface.load_object("...") with predefined names whenever it needs access to a specific interface:

“_wpe_renderer_host_interface” for the wpe_renderer_host_interface
“_wpe_renderer_backend_egl_interface” for the wpe_renderer_backend_egl_interface
“_wpe_renderer_backend_egl_target_interface” for the wpe_renderer_backend_egl_target_interface
“_wpe_renderer_backend_egl_offscreen_target_interface” for the wpe_renderer_backend_egl_offscreen_target_interface

The WPE Backend also needs to implement a fifth interface of type wpe_view_backend_interface that is used on the application process side when creating the specific backend instance.

All those interfaces follow the same structure:

a create(...) function that acts like an object constructor
a destroy(...) function that acts like an object destructor
some functions that act like the methods of an object (each function receives the previously created object instance as first parameter)
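
The create/destroy/methods layout described above can be sketched with a plain struct of function pointers. The `counter_interface` type and the `Counter` class below are hypothetical stand-ins invented for illustration; the real libwpe interfaces have different members and signatures. The `getInterface()` helper only mirrors the spirit of the `getWPEInterface()` helpers used in WPEBackend-direct.

```cpp
// Hypothetical interface, shaped like the libwpe ones: a table of C function
// pointers whose "methods" receive the opaque instance as first parameter.
struct counter_interface {
    void* (*create)(int initialValue);
    void (*destroy)(void* object);
    int (*increment)(void* object);
};

// Concrete C++ class hidden behind the C-style interface.
class Counter {
public:
    explicit Counter(int value) : m_value(value) {}
    int increment() { return ++m_value; }

    // Returns the function table that forwards each call to a Counter object.
    static const counter_interface* getInterface()
    {
        static const counter_interface s_interface = {
            +[](int initialValue) -> void* { return new Counter(initialValue); },
            +[](void* object) { delete static_cast<Counter*>(object); },
            +[](void* object) -> int { return static_cast<Counter*>(object)->increment(); }};
        return &s_interface;
    }

private:
    int m_value;
};

// Caller side: only the function table and an opaque pointer are visible.
int useCounterTwice(const counter_interface* iface, int start)
{
    void* object = iface->create(start);
    iface->increment(object);
    int result = iface->increment(object);
    iface->destroy(object);
    return result;
}
```

The caller never sees the `Counter` type, which is what allows a backend to be swapped at runtime without recompiling the code that uses it.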

Taking the names used in the WPEBackend-direct project, on the application process side WPE will create:

a rendererHost instance using the wpe_renderer_host_interface::create(...) function
multiple rendererHostClient instances using the wpe_renderer_host_interface::create_client(...) function (those instances are mainly used for IPC communication, one instance is created for each WPEWebProcess launched by WPE WebKit)
multiple viewBackend instances created by the wpe_view_backend_interface::create(...) function (one instance is created for each rendering target on the WPEWebProcess side)

On the WPEWebProcess side (there can be more than one WPEWebProcess depending on the URIs loaded by the WebKitWebView instances on the application process side), the web process will create:

a rendererBackendEGL instance (one per WPEWebProcess) using the wpe_renderer_backend_egl_interface::create(...) function
multiple rendererBackendEGLTarget instances using the wpe_renderer_backend_egl_target_interface::create(...) function (one instance is created for each new rendering target requested by the application)

N.B. the rendererBackendEGLTarget instances may be created through the wpe_renderer_backend_egl_target_interface or the wpe_renderer_backend_egl_offscreen_target_interface, depending on the interfaces implemented in the WPE Backend.

Here we are only focussing on the wpe_renderer_backend_egl_target_interface, which relies on a classical EGL display (defined in the rendererBackendEGL instance). The wpe_renderer_backend_egl_offscreen_target_interface may be used in very specific use-cases that are out of the scope of this post. You can check its usage in the WPE WebKit source code for more information.

These instances communicate with each other using classical IPC techniques (Unix sockets). The IPC layer must be implemented in the WPE Backend itself: the libwpe interfaces only share the endpoint file descriptors between the different processes when creating the different instances.
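
As a rough illustration of what such an IPC layer involves, here is a minimal sketch that exchanges a fixed-size message over a connected pair of Unix sockets. The `Message` layout, the `kFrameRendered` constant and the helper names are assumptions made up for this example. A real backend defines its own protocol, and its two endpoints live in different processes rather than in the same function.

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <cstdint>

// Hypothetical message format; a real backend defines its own protocol.
struct Message {
    std::uint32_t type;
    std::uint32_t payload;
};

constexpr std::uint32_t kFrameRendered = 1; // hypothetical message type

// Sends one fixed-size message; returns true when it was fully written.
bool sendMessage(int fd, const Message& message)
{
    return send(fd, &message, sizeof(message), MSG_NOSIGNAL)
        == static_cast<ssize_t>(sizeof(message));
}

// Blocks until one fixed-size message has been read.
bool receiveMessage(int fd, Message& message)
{
    return recv(fd, &message, sizeof(message), MSG_WAITALL)
        == static_cast<ssize_t>(sizeof(message));
}

// Creates a connected pair of Unix sockets and sends one message through it.
// Both endpoints stay in this process only to keep the sketch self-contained.
bool roundTrip(std::uint32_t payload, Message& received)
{
    int endpoints[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, endpoints) != 0)
        return false;

    bool ok = sendMessage(endpoints[0], {kFrameRendered, payload})
        && receiveMessage(endpoints[1], received);

    close(endpoints[0]);
    close(endpoints[1]);
    return ok;
}
```

Fixed-size messages keep the framing trivial, which is usually enough for the small control messages a backend needs for synchronization.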

From a topological point of view, all those instances are organized as follows:

graph LR;
    subgraph S1 [WPEWebProcess]
        A1(rendererBackendEGL)
        subgraph S1TARGETS [ ]
            B1(rendererBackendEGLTarget)
            C1(rendererBackendEGLTarget)
            D1(rendererBackendEGLTarget)
        end
    end
    style S1 fill:#fb3,stroke:#333
    style S1TARGETS fill:#fb3,stroke:#333
    style A1 fill:#ff5
    style B1 fill:#9f5
    style C1 fill:#9f5
    style D1 fill:#9f5

    subgraph S2 [WPEWebProcess]
        A2(rendererBackendEGL)
        subgraph S2TARGETS [ ]
            B2(rendererBackendEGLTarget)
            C2(rendererBackendEGLTarget)
        end
    end
    style S2 fill:#fb3,stroke:#333
    style S2TARGETS fill:#fb3,stroke:#333
    style A2 fill:#ff5
    style B2 fill:#9f5
    style C2 fill:#9f5

    subgraph S3 [Application Process]
        E(rendererHost)
        subgraph S4 [Clients]
            F1(rendererHostClient)
            F2(rendererHostClient)
        end
        F1 ~~~ F2
        E -.- S4

        subgraph S5 [ ]
            G1(viewBackend)
            G2(viewBackend)
            G3(viewBackend)
            G4(viewBackend)
            G5(viewBackend)
        end
        S4 ~~~ S5
        G1 ~~~ G2
        G2 ~~~ G3
        G3 ~~~ G4
        G4 ~~~ G5
    end
    style S3 fill:#3cf,stroke:#333
    style S4 fill:#9ef,stroke:#333
    style S5 fill:#3cf,stroke:#333
    style E fill:#ff5
    style F1 fill:#ff5
    style F2 fill:#ff5
    style G1 fill:#9f5
    style G2 fill:#9f5
    style G3 fill:#9f5
    style G4 fill:#9f5
    style G5 fill:#9f5

    A1 <--IPC--> F1
    A2 <--IPC--> F2

    B1 <--IPC--> G1
    C1 <--IPC--> G2
    D1 <--IPC--> G3
    B2 <--IPC--> G4
    C2 <--IPC--> G5

From a usage point of view:

the rendererHost and rendererHostClient instances are only used to manage IPC endpoints on the application process side that are connected to each running WPEWebProcess. They are not used by the graphical rendering system.
the rendererBackendEGL instance (one per WPEWebProcess) is only used to connect to the native display for a specific platform. For example, on desktop Linux, the platform may be X11 and the native display can be obtained by calling XOpenDisplay(...); or the platform may be Wayland and the native display can be obtained by calling wl_display_connect(...); etc…
the rendererBackendEGLTarget (on the WPEWebProcess side) and viewBackend (on the application process side) instances are the ones truly managing the web page graphical rendering.

The graphical rendering mechanism

As seen above, the interfaces in charge of the rendering are wpe_renderer_backend_egl_target_interface and wpe_view_backend_interface. When these instances are created, WPE WebKit exchanges the file descriptors used to establish a direct IPC connection between a rendererBackendEGLTarget on the WPEWebProcess side and a viewBackend on the application process side.

During the EGL initialization phase, which runs when a new WPEWebProcess is launched, WPE WebKit uses the native display and platform provided by the wpe_renderer_backend_egl_interface::get_native_display(...) and get_platform(...) functions to create a suitable GLES context.

When the WPE WebKit ThreadedCompositor is ready to render a new frame (on the WPEWebProcess side), it calls the wpe_renderer_backend_egl_target_interface::frame_will_render(...) function to notify the WPE Backend that rendering is about to start. At this moment, the previously created GLES context is current on the calling thread, and all GLES drawing commands are issued right after this call.

Once the ThreadedCompositor has finished drawing, it swaps the EGL buffers and calls the wpe_renderer_backend_egl_target_interface::frame_rendered(...) function to notify the WPE Backend that the frame is ready. The ThreadedCompositor then waits until the wpe_renderer_backend_egl_target_dispatch_frame_complete(...) function is called by the WPE Backend.

What happens during the frame_will_render(...) and frame_rendered(...) calls is up to the WPE Backend. It can, for example, select a Framebuffer Object to draw into a texture and then pass the texture content to the application process, or use EGL extensions like EGLStream or DMA Buffers to transfer the frame to the application process without copying it through main system RAM.

In all cases, the WPE Backend generally sends the new frame to the corresponding viewBackend instance on the application side from the frame_rendered(...) function. The application process then uses or presents this frame and sends an IPC message back to the rendererBackendEGLTarget instance to notify the WPEWebProcess side that the frame is no longer in use and can be recycled. Receiving this IPC message is generally the moment when the rendererBackendEGLTarget instance calls the wpe_renderer_backend_egl_target_dispatch_frame_complete(...) function to trigger the production of a new frame.

With this mechanism, the application has full control over the synchronization of the rendering on the WPEWebProcess side.

sequenceDiagram
    participant A as ThreadedCompositor
    participant B as WPE Backend
    participant C as Application
    loop Rendering
      A->>B: frame_will_render(...)
      activate A
      A->>A: GLES rendering
      A->>B: frame_rendered(...)
      deactivate A
      B->>C: frame transfer
      activate C
      C->>C: frame presentation
      C->>A: dispatch_frame_complete(...)
      deactivate C
    end
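
The handshake described in this section can be modelled as a small state machine. The sketch below is a single-threaded simulation under simplifying assumptions: the `RenderLoopModel` class and its method names are hypothetical, no real GLES or IPC call is made, and the event strings only echo the function names used in the text.

```cpp
#include <string>
#include <vector>

// Hypothetical model of the render loop synchronization. The compositor
// cannot start a new frame until the application has released the last one.
class RenderLoopModel {
public:
    // Compositor side: produces one frame unless the previous frame is still
    // held by the application. Returns true when a frame was produced.
    bool renderFrame()
    {
        if (m_waitingForFrameComplete)
            return false;
        m_events.push_back("frame_will_render");
        m_events.push_back("GLES rendering");
        m_events.push_back("frame_rendered");
        m_waitingForFrameComplete = true;
        return true;
    }

    // Application side: reports that the frame has been presented and can be
    // recycled, which unblocks the next call to renderFrame().
    void dispatchFrameComplete()
    {
        if (!m_waitingForFrameComplete)
            return;
        m_events.push_back("dispatch_frame_complete");
        m_waitingForFrameComplete = false;
    }

    const std::vector<std::string>& events() const { return m_events; }

private:
    std::vector<std::string> m_events;
    bool m_waitingForFrameComplete = false;
};
```

In this model a second `renderFrame()` call is refused until `dispatchFrameComplete()` has been called, which is how the application can throttle frame production on the WPEWebProcess side.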

A simple example: WPEBackend-direct

The WPEBackend-direct project aims to implement a very simple WPE Backend that does not transfer any frame to the application process.

The frame presentation is done directly by the WPEWebProcess using a native X11 or Wayland window. The objective is to provide an easy means of showing and debugging the output of the ThreadedCompositor without the inherent complexity of transferring frames between different processes.

WPEBackend-direct implements the minimum set of libwpe interfaces needed by a functional backend.

The interfaces loader is implemented in wpebackend-direct.cpp. The wpe_renderer_backend_egl_offscreen_target_interface is disabled as the backend is only providing the default wpe_renderer_backend_egl_target_interface for the target rendererBackendEGLTarget instances.

extern "C"
{
    __attribute__((visibility("default"))) wpe_loader_interface _wpe_loader_interface = {
        +[](const char* name) -> void* {
            if (std::strcmp(name, "_wpe_renderer_host_interface") == 0)
                return RendererHost::getWPEInterface();

            if (std::strcmp(name, "_wpe_renderer_backend_egl_interface") == 0)
                return RendererBackendEGL::getWPEInterface();

            if (std::strcmp(name, "_wpe_renderer_backend_egl_target_interface") == 0)
                return RendererBackendEGLTarget::getWPEInterface();

            if (std::strcmp(name, "_wpe_renderer_backend_egl_offscreen_target_interface") == 0)
            {
                static wpe_renderer_backend_egl_offscreen_target_interface s_interface = {
                    +[]() -> void* { return nullptr; },
                    +[](void*) {},
                    +[](void*, void*) {},
                    +[](void*) -> EGLNativeWindowType { return nullptr; },
                    nullptr,
                    nullptr,
                    nullptr,
                    nullptr};
                return &s_interface;
            }

            return nullptr;
        },
        nullptr, nullptr, nullptr, nullptr};
}

On the application process side, the RendererHost and RendererHostClient classes do nothing special: they are just straightforward implementations of the libwpe interfaces used to maintain the IPC channel connection. Because rendering and frame presentation are both handled on the WPEWebProcess side, the ViewBackend class only manages the render loop synchronization. It receives a message from the IPC channel each time a frame has been rendered and then calls a callback provided by the sample application.

When the sample application is ready, it calls a specific function from the WPEBackend-direct API, which triggers a call to ViewBackend::frameComplete(). That call, in turn, sends an IPC message to the RendererBackendEGLTarget instance on the WPEWebProcess side, which then calls the wpe_renderer_backend_egl_target_dispatch_frame_complete(...) function.

WebKitWebViewBackend* createWebViewBackend()
{
    auto* directBackend = wpe_direct_view_backend_create(
        +[](wpe_direct_view_backend* backend, void* /*userData*/) {

            // This callback function is called by the WPE Backend each time a
            // frame has been rendered and presented on the WPEWebProcess side.

            // Calling the next function will trigger an IPC message sent to the
            // corresponding RendererBackendEGLTarget instance on the WPEWebProcess
            // side that will trigger the rendering of the next frame.
            wpe_direct_view_backend_dispatch_frame_complete(backend);
        },
        nullptr, 800, 600);

    return webkit_web_view_backend_new(wpe_direct_view_backend_get_wpe_backend(directBackend), nullptr, nullptr);
}

On the WPEWebProcess side, the RendererBackendEGL class is a wrapper around an X11 or Wayland native display. It also maintains an IPC connection with a corresponding RendererHostClient instance on the application process side.

The RendererBackendEGLTarget class is the one in charge of the presentation. It maintains an IPC connection with a corresponding ViewBackend instance on the application process side. During its initialization the RendererBackendEGLTarget instance uses the native display held by the RendererBackendEGL instance to create a native X11 or Wayland window. This native window is communicated to the WPE WebKit engine through the wpe_renderer_backend_egl_target_interface implementation.

The WPE WebKit engine will use the provided native display and window to create a compatible EGL display and surface (see the source code here).

When the ThreadedCompositor calls frame_will_render(...), the compatible GLES context is already selected with the EGL surface associated with the native window previously created by the RendererBackendEGLTarget instance, so there is nothing more to do. When the ThreadedCompositor calls m_context->swapBuffers(); the rendered frame is automatically presented in the native window.

Then, when the ThreadedCompositor calls frame_rendered(...), the only thing left for the RendererBackendEGLTarget instance to do is to send an IPC message to the corresponding ViewBackend instance, on the application process side, to notify it that a frame has been rendered and that the ThreadedCompositor is waiting before rendering the next frame.

When dealing with WPE Backends, the WPEBackend-direct example is an interesting starting point as it mainly focusses on the backend implementation. The extra code that is not directly needed by the libwpe interfaces is minimal and clearly separated from the backend implementation code.

The next step would be to transfer the frame content from the WPEWebProcess to the application process and do the presentation on the application process side. One obvious way of doing it would be to render to a texture on the WPEWebProcess side, transfer the texture content through a shared memory map and then create a new texture on the application process side with the shared graphical data. That would work, but it would not be very efficient because, for each frame, data would need to be copied from the GPU to the CPU in the WPEWebProcess and then back from the CPU to the GPU in the application process.

What would be really interesting is to keep the frame data on the GPU side and just transfer a handle to this memory between the WPEWebProcess and the application process. This will be the topic of a future post, so stay tuned!

