Since 2022, my main focus has been working on the Wolvic browser, still the only open source WebXR-capable browser for Android/AOSP devices (Meta, Pico, Huawei, Lenovo, Lynx, HTC…) out there. That’s an effort that continues to this day (although to a much lesser extent nowadays). In early 2025, as a consequence of all that work in XR on the web, an opportunity emerged to implement WebXR support in WebKit for the WPE port, and we decided to take it.

This post explains how Igalia implemented WebXR in WebKit for WPE (and, as a side effect, for the GTK port): the multiprocess architecture for XR we adopted, the challenges we faced, the technical decisions made to solve them, and our future plans.

A Bit of History

WebXR and OpenXR

WebXR is the W3C standard that allows web applications to present immersive XR content using WebGL, WebGL2, or WebGPU (via the WebXR WebGPU bindings). On the backend, we rely on OpenXR, a Khronos standard API that abstracts the underlying XR hardware. Thanks to OpenXR, the browser can handle things like device tracking, input methods, and frame timing regardless of the specific device.

It’s important not to confuse the two: WebXR is the JavaScript API that web authors use to develop their XR applications, while OpenXR is the platform API that the WebKit WPE port uses to talk to the actual hardware. Each port makes its own choice here; for example, the visionOS port uses Apple’s ARKit.

The Initial 2020 Implementation

Igalia initially started implementing the WebXR Core and Test APIs in WebKit around February 2020. Back then, it looked like XR was about to gain relevance, so we decided to invest our own resources and contribute our web engines expertise to the field.

At the time, we ran the OpenXR code directly inside the WebProcess. While this was easier to implement and avoided inter-process communication (IPC) overhead, it broke WebKit’s security sandboxing. Granting the WebProcess direct access to the filesystem and external XR devices is a security risk, and it also introduced the possibility of blocking the browser’s main thread. (Technically we didn’t break the sandboxing, because back then the Linux ports had no WebProcess sandboxing at all, but it was still something to avoid. Sometimes experimentation requires shortcuts at the proof-of-concept stage.)

Today all major vendors include an OpenXR runtime in their devices, but unfortunately OpenXR was not as widely adopted back then, and there was no Android support in WPE. That combined with a higher workload caused by incoming client projects forced us to pause the investment.

Multiprocess Architecture

The architecture changed significantly when Apple took over the WebKit WebXR implementation, redesigning it around WebKit’s multiprocess model as part of their work to ship WebXR on visionOS. The main idea is to do all the communication with the XR hardware in the (privileged) UIProcess. All data coming from the device, mainly poses and textures, is then shared with the WebProcess, which is where JavaScript code is executed and also where WebGL/WebGL2 rendering happens.

Recently, sponsored by a nice client who is developing an XR device with a WPE-powered browser, Igalia resumed the work on WebXR for WPE. We migrated our implementation to Apple’s multiprocess architecture. OpenXR calls now happen inside the UIProcess. Because the OpenXR API is largely thread-safe, these calls run on a dedicated thread to avoid blocking the UI.
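Running all OpenXR calls on a dedicated thread in the UIProcess boils down to serializing them through a worker queue. The following is a minimal sketch of that pattern in plain C++; the class name and shape are illustrative and assume nothing about WebKit’s actual work-queue machinery:

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

// Illustrative sketch (not WebKit's real class): a dedicated worker thread
// that serializes all OpenXR calls, keeping them off the UIProcess main thread.
class XRWorkQueue {
public:
    XRWorkQueue() : m_thread([this] { run(); }) {}
    ~XRWorkQueue() {
        // The shutdown flag is flipped on the worker thread itself, after all
        // previously posted tasks have run.
        post([this] { m_done = true; });
        m_thread.join();
    }
    // Post a task; it will run on the dedicated XR thread, in FIFO order.
    void post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_tasks.push(std::move(task));
        }
        m_condition.notify_one();
    }
private:
    void run() {
        while (!m_done) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_mutex);
                m_condition.wait(lock, [this] { return !m_tasks.empty(); });
                task = std::move(m_tasks.front());
                m_tasks.pop();
            }
            task(); // e.g. the xrWaitFrame / xrBeginFrame / xrEndFrame cycle
        }
    }
    std::mutex m_mutex;
    std::condition_variable m_condition;
    std::queue<std::function<void()>> m_tasks;
    bool m_done { false };
    std::thread m_thread;
};
```

Because every task runs on the same thread in posting order, the OpenXR frame loop never competes with UI work, and the UI never blocks on the XR runtime.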

A major benefit of aligning with that new architecture is code sharing and maintainability. For instance, the WebXROpaqueFramebuffer class, which holds the opaque framebuffer implementation, is largely identical across the Cocoa, WPE, and GTK ports. This shared key piece of the WebXR implementation makes long-term maintenance much more sustainable.

The Challenges

Texture Sharing and Platform Differences

Our initial target platform for this resumed work was Android (or more generally AOSP). Coincidentally it tied into broader WPE Android efforts by Igalia to make WPE work on AOSP devices. We subsequently extended the implementation to Linux around the summer of 2025. Supporting both operating systems dictates how we handle the sharing of graphical resources across process boundaries.

OpenXR expects the final output to reside in textures generated by its own swapchain (or swapchains, if depth textures are used along with color textures). There is no way to provide resources to the OpenXR compositor that were not previously allocated by OpenXR itself. This means that the WebProcess should not render WebGL content into its own resources. Instead we need to take the textures allocated by OpenXR in the UIProcess, and share them with the WebProcess.
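The key constraint above is that the compositor only accepts images it allocated itself: the application acquires an index into the swapchain’s pre-allocated image set, renders into that image, and hands the index back. A toy model (this is not the real OpenXR API, just its ownership shape) of that cycle:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of an OpenXR swapchain: the runtime owns a fixed set of images,
// and the application only ever receives an index into that set -- it can
// never hand the compositor a texture it allocated on its own.
class SwapchainModel {
public:
    explicit SwapchainModel(size_t imageCount) : m_images(imageCount, 0) {}
    // Mirrors xrAcquireSwapchainImage: the runtime, not the app, picks the image.
    uint32_t acquireImage() {
        uint32_t index = m_next;
        m_next = (m_next + 1) % m_images.size();
        return index;
    }
    // The renderer must draw into the image the runtime chose.
    void renderInto(uint32_t index, uint64_t frameContents) { m_images[index] = frameContents; }
    uint64_t imageContents(uint32_t index) const { return m_images[index]; }
    size_t imageCount() const { return m_images.size(); }
private:
    std::vector<uint64_t> m_images;
    uint32_t m_next { 0 };
};
```

With double or triple buffering the acquired index simply cycles through the two or three pre-allocated images, which is why the texture export described next only ever has to happen once per image.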

On Linux, we handle this using DMA-Buf. The UIProcess exports the OpenXR textures (plural indeed, as the swapchain typically generates two or three textures for double/triple buffering) as file descriptors and passes them over IPC to the WebProcess, which imports them as EGL images. We only do this export once, caching the imported images in the WebProcess to avoid per-frame overhead.
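The export-once-and-cache idea can be sketched as a memoized import keyed by the shared buffer. This is a hypothetical illustration: the class name is invented, and the injected `importFn` stands in for the real `eglCreateImage` call with `EGL_LINUX_DMA_BUF_EXT`:

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

using ImageHandle = uint64_t; // stands in for an EGLImage

// Hypothetical sketch of the WebProcess-side cache: each swapchain texture is
// exported as a DMA-Buf file descriptor once, and the resulting EGL image is
// reused on every subsequent frame instead of being re-imported.
class ImportedImageCache {
public:
    // importFn models eglCreateImage(..., EGL_LINUX_DMA_BUF_EXT, ...).
    template<typename ImportFn>
    ImageHandle imageForDmaBuf(int fd, ImportFn&& importFn) {
        auto it = m_cache.find(fd);
        if (it != m_cache.end())
            return it->second; // cached: no per-frame import cost
        ImageHandle image = importFn(fd);
        m_cache.emplace(fd, image);
        return image;
    }
    size_t size() const { return m_cache.size(); }
private:
    std::unordered_map<int, ImageHandle> m_cache;
};
```

Since the swapchain only contains two or three images, the cache stabilizes after the first few frames and the expensive import path never runs again.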

Android requires a different approach, though, since DMA-Buf is not available there. We instead use the native Android solution for texture sharing: the AHardwareBuffer (AHB). AHB allocates its own memory resources, meaning we cannot map them directly to the OpenXR images. This constraint forces us to perform an extra blit (copy) from the AHB to the OpenXR texture before submitting it to the XR compositor.

Not all Linux architectures and drivers support the Mesa DMA-Buf export extensions required for our primary texture sharing path. To ensure broader compatibility, we implemented a fallback using GBM. Much like the AHardwareBuffer approach, GBM allocates its own separate resources. We again cannot map these resources directly to the OpenXR images, so using the GBM fallback forces us to perform an extra blit from the GBM buffer to the OpenXR texture as well. We are looking into using Vulkan as a transport mechanism in the future, which would hopefully bypass these constraints.
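The resulting decision tree is small: direct DMA-Buf sharing where the driver supports the export extensions, otherwise a GBM (Linux) or AHardwareBuffer (Android) allocation plus one extra blit per frame. A hedged sketch of that choice, with invented names for the capability flags:

```cpp
// Illustrative only: the enum and capability struct are not WebKit types.
enum class SharingPath { DmaBufDirect, GbmWithBlit, AhbWithBlit };

struct PlatformCaps {
    bool isAndroid;
    bool hasDmaBufExport; // e.g. whether the Mesa DMA-Buf export extensions exist
};

inline SharingPath chooseSharingPath(const PlatformCaps& caps) {
    if (caps.isAndroid)
        return SharingPath::AhbWithBlit;  // DMA-Buf is not available on Android
    if (caps.hasDmaBufExport)
        return SharingPath::DmaBufDirect; // zero-copy: OpenXR textures shared as-is
    return SharingPath::GbmWithBlit;      // fallback: separate allocation + copy
}

// Only the direct DMA-Buf path avoids the extra per-frame copy.
inline bool requiresExtraBlit(SharingPath path) { return path != SharingPath::DmaBufDirect; }
```

This is why a future Vulkan transport is attractive: one path that works everywhere would collapse the tree to a single branch.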

View Layouts: WebGL vs WebGPU rendering

The way views are submitted to the compositor also differs between operating systems, or rather XR platform libraries, and interestingly has direct performance implications.

visionOS uses one platform texture per eye. WebXR running on WebGL, however, provides a single wide framebuffer for both eyes. Because of this mismatch, Apple’s visionOS implementation is forced to do an extra blit to split the WebGL framebuffer into two separate textures. This overhead is one of the main reasons Apple decided to implement the WebXR WebGPU bindings, which natively support one texture per eye, to achieve good performance on their platform.

For WPE we use a shared layout. A single texture holds the views for both eyes, which maps naturally to WebGL’s behavior and avoids the extra splitting blit entirely. Note that it is entirely possible to have one OpenXR swapchain per eye (the OpenXR API is quite flexible in that regard), and thus one texture per eye. However that would make the implementation more complex, and again, require an extra copy.
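With the shared layout, each eye’s viewport is simply one half of the wide texture. A minimal sketch of that arithmetic (names are illustrative, not WebKit’s):

```cpp
#include <cstdint>

// Shared view layout: both eyes live side by side in a single wide texture,
// so each eye renders into its own half and no splitting blit is needed.
struct Viewport {
    uint32_t x;
    uint32_t y;
    uint32_t width;
    uint32_t height;
};

inline Viewport eyeViewport(uint32_t textureWidth, uint32_t textureHeight, bool rightEye) {
    uint32_t eyeWidth = textureWidth / 2;
    // Left eye starts at x = 0, right eye at x = eyeWidth; both span full height.
    return { rightEye ? eyeWidth : 0u, 0u, eyeWidth, textureHeight };
}
```

For example, a 2000x1000 swapchain texture yields a 1000x1000 viewport per eye, with the right eye offset by 1000 pixels, which is exactly the layout WebGL already produces.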

Synchronization

Because the UIProcess and the WebProcess operate on different OpenGL contexts, we have to synchronize them. Without synchronization, the UIProcess might submit a texture to OpenXR while the WebProcess is still rendering to it. Note the stuttering observed when rendering without any synchronization:

WebXR rendering without synchronization

We solved this using EGL fences:

  1. After the WebProcess finishes issuing its render commands for a frame, we insert a fence into the WebProcess GPU command buffer.
  2. The fence is exported as a file descriptor and passed to the UIProcess.
  3. The UIProcess imports the fence and issues a GPU-level wait (serverWait()). The CPU is not blocked, but if the fence has not yet been signaled, the UIProcess GL context halts until it is.

Once the WebProcess GPU queue reaches the fence and signals it, the UIProcess GL context wakes up (if it was halted), resumes execution, and submits the frame to the OpenXR compositor. The difference is quite noticeable, with the flickering completely gone:

WebXR rendering with synchronization
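The fence handoff can be modeled in plain C++ to show why the ordering is safe. The real code uses EGL native fence file descriptors and a GPU-side wait, not `std::future`; this sketch just captures the happens-before relationship between the two processes:

```cpp
#include <future>
#include <thread>

// Plain-C++ model of the EGL fence handoff. "WebProcess" renders the frame
// and then signals the fence; "UIProcess" waits on the fence before reading
// the texture and submitting it to the compositor.
struct FrameState {
    int renderedFrame { -1 };   // stands in for the shared texture contents
    std::promise<void> fence;   // stands in for the exported fence fd
};

inline int submitWithFence(FrameState& state, int frameToRender) {
    std::future<void> fenceHandle = state.fence.get_future();

    // WebProcess side: issue the rendering commands, then signal the fence.
    std::thread webProcess([&state, frameToRender] {
        state.renderedFrame = frameToRender; // "GL commands" for this frame
        state.fence.set_value();             // fence signaled after rendering
    });

    // UIProcess side (~serverWait): don't touch the texture until signaled.
    fenceHandle.wait();
    int submitted = state.renderedFrame; // safe: rendering is fully done
    webProcess.join();
    return submitted;
}
```

The crucial property, in both the model and the real EGL fence, is that the consumer never observes a half-rendered frame: the wait establishes ordering between the producer’s last write and the consumer’s read.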

Interacting with the Scene: Hand Input

WebXR is not only about rendering a 3D scene (and even then, that’s actually WebGL’s job); it notably provides users with ways to interact with it. Most XR gear, that is, VR headsets, comes with a pair of high-accuracy controllers full of buttons, joysticks, and/or touchpads. Our main target device, however, was closer to a pair of smartglasses than a full-fledged headset. Since there were no controllers, interaction had to rely mainly on hand input.

OpenXR’s interaction system decouples logical application actions from physical hardware elements. Instead of directly querying the state of specific hardware components like an “A button” or a “trigger,” applications define abstract, semantic actions (such as teleport or grab) and provide the runtime with “suggested bindings” that map these actions to specific paths on known device interaction profiles (which in turn map those to actual hardware elements of a specific device). This abstraction allows developers to write their input code once and have it work across a wide variety of devices.

This design paradigm directly benefits the hand interaction extension (XR_EXT_hand_interaction), as hand tracking is seamlessly integrated by exposing it as just another interaction profile (/interaction_profiles/ext/hand_interaction_ext). It uses standard component paths for gestures like pinching (.../input/pinch_ext/pose), poking (.../input/poke_ext/pose), gripping, and aiming. As a result, from the application’s perspective, hand input is no different from physical controller input.
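A suggested binding is essentially a pair of an abstract action and a component path. The sketch below shows, as plain data, a few hand-interaction bindings; the action names are app-defined examples, while the paths follow the XR_EXT_hand_interaction specification:

```cpp
#include <string>
#include <vector>

// Illustrative data: abstract actions paired with component paths from the
// /interaction_profiles/ext/hand_interaction_ext profile. The action names
// ("select", "grab", ...) are examples an application would define itself.
struct SuggestedBinding {
    std::string action;        // abstract, semantic application action
    std::string componentPath; // hardware-agnostic OpenXR input path
};

inline std::vector<SuggestedBinding> handInteractionBindings(const std::string& hand) {
    const std::string base = "/user/hand/" + hand + "/input/";
    return {
        { "select", base + "pinch_ext/pose" }, // pinch gesture as selection
        { "poke",   base + "poke_ext/pose" },  // fingertip poke
        { "grab",   base + "grasp_ext/value" },// whole-hand grasp strength
        { "aim",    base + "aim/pose" },       // standard aim pose
    };
}
```

Because the runtime resolves these paths per profile, the very same action set can also be bound to a physical controller profile, which is what makes hand input indistinguishable from controller input at the application level.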

With this in mind, we implemented controller input support which, while basic, already supports using both physical controllers and users’ hands to point at elements, select and/or grab them, and perform other actions.

WebXR input profile for hands

By the end of August 2025, we also landed support for the WebXR Hand Input Module, which maps XR devices’ underlying hand tracking data (in our case obtained from OpenXR) into the standard WebXR API. This, in turn, exposes the poses of the hand joints (25 of them!) to JavaScript.

WebXR Hand Tracking
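Where do those 25 joints come from? The WebXR Hand Input Module defines the wrist, a thumb chain of four joints, and five joints for each of the four fingers. A small generator makes the count obvious:

```cpp
#include <string>
#include <vector>

// The 25 joint names defined by the WebXR Hand Input Module:
// 1 wrist + 4 thumb joints + 4 fingers x 5 joints = 25.
inline std::vector<std::string> handJointNames() {
    std::vector<std::string> joints {
        "wrist",
        "thumb-metacarpal", "thumb-phalanx-proximal", "thumb-phalanx-distal", "thumb-tip"
    };
    for (const std::string finger : { "index-finger", "middle-finger", "ring-finger", "pinky-finger" }) {
        for (const std::string part : { "metacarpal", "phalanx-proximal", "phalanx-intermediate", "phalanx-distal", "tip" })
            joints.push_back(finger + "-" + part);
    }
    return joints;
}
```

Each of these names keys an `XRJointSpace` on the JavaScript side, and each joint carries a full pose (position plus orientation) per frame.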

WebXR AR (Augmented Reality)

Supporting the WebXR AR module primarily revolved around implementing environment blend modes, which dictate how rendered virtual content is mixed with the real world. For example, we can use an opaque background for fully immersive VR, or we can use additive and alpha blending for see-through AR experiences. Interestingly (and intentionally), the blend modes defined in the WebXR AR module perfectly match the XrEnvironmentBlendMode values already defined by the OpenXR specification. Because of this 1:1 mapping between the two APIs, plumbing the blend mode support through our rendering pipeline was quite straightforward.
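The 1:1 mapping fits in a handful of lines. A sketch, where the numeric values match the `XrEnvironmentBlendMode` constants from the OpenXR specification and the strings are the WebXR AR Module’s blend mode identifiers (the enum is redeclared locally here; the real code uses the OpenXR C header):

```cpp
#include <stdexcept>
#include <string>

// Local redeclaration for illustration; values match the OpenXR spec's
// XR_ENVIRONMENT_BLEND_MODE_* constants.
enum class XrEnvironmentBlendMode {
    Opaque = 1,     // fully immersive VR: virtual content replaces the world
    Additive = 2,   // e.g. waveguide glasses: content added over the world
    AlphaBlend = 3, // passthrough AR: content alpha-blended with camera view
};

inline XrEnvironmentBlendMode toOpenXRBlendMode(const std::string& webxrMode) {
    if (webxrMode == "opaque")
        return XrEnvironmentBlendMode::Opaque;
    if (webxrMode == "additive")
        return XrEnvironmentBlendMode::Additive;
    if (webxrMode == "alpha-blend")
        return XrEnvironmentBlendMode::AlphaBlend;
    throw std::invalid_argument("unknown WebXR environment blend mode");
}
```

Since the semantics also line up, no per-mode special casing is needed beyond forwarding the value to the compositor.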

Notably, WPE is currently the only WebKit port supporting the WebXR AR module. In fact, Chromium is the only other major web engine also supporting AR, since Gecko’s once-leading WebXR implementation is no longer actively maintained.

Next Steps

The current implementation efficiently supports WebXR on both Android and Linux. Looking at the roadmap for 2026, we’re working on WebXR Layers, WebXR Hit Testing, and other super interesting stuff which will be the main topics of follow-up posts. We also plan to investigate using Vulkan as an image sharing and synchronization mechanism, in order to have a widely supported cross-platform solution. Stay tuned!