WebXR landing in WebKit

Since I joined Igalia, I have been working on finishing up the core WebXR implementation in WebKit, focused on the DOM, render loop, graphics and input sources. We are targeting the OpenXR API and we have reached the point where we are able to run some demos, so it is a good time to share a summary of the work that we have done so far.

You can check all the patches and the code in the WebKit WebXR module.

What was done

WebXR render loop and frame data

We initially focused on creating a solid foundation for the WebXR render loop.

The requestFrame(RequestFrameCallback&&) interface offers a flexible API that adapts to a variety of platforms, such as OpenXR and Cocoa, and to different scenarios like the one where the GPU process is present or absent.

On the OpenXR platform, the render loop runs at the native headset frame rate on a dedicated VR compositor queue. On every iteration it syncs frame data with the headset, submits the frame’s layers to the display and waits for VSync. It is optimized to achieve very low latency and a stable frame rate. We plan to dynamically support frame-ahead rendering in situations where it helps to improve the experience.

namespace PlatformXR {
struct FrameData {
    bool isTrackingValid { false };
    bool isPositionValid { false };
    bool isPositionEmulated { false };
    bool shouldRender { false };
    long predictedDisplayTime { 0 };
    Pose origin;
    Optional<Pose> floorTransform;
    StageParameters stageParameters;
    Vector<View> views;
    HashMap<LayerHandle, LayerData> layers;
    Vector<InputSource> inputSources;

    template<class Encoder> void encode(Encoder&) const;
    template<class Decoder> static Optional<FrameData> decode(Decoder&);
};
}

PlatformXR Frame Data
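
To make the steps of one loop iteration more concrete, here is a minimal sketch. It is not the WebKit code: DeviceHooks, runRenderLoop and the trimmed-down FrameData are hypothetical names made up for this illustration. In OpenXR terms the hooks would roughly correspond to xrWaitFrame (pacing and predicted display time), the rendering done after xrBeginFrame, and the layer submission done by xrEndFrame.

```cpp
#include <functional>

// Hypothetical, trimmed-down stand-in for PlatformXR::FrameData.
struct FrameData {
    bool shouldRender { false };
    long predictedDisplayTime { 0 };
};

// Hypothetical hooks standing in for the platform device (assumption, not WebKit API).
struct DeviceHooks {
    // Blocks until the headset is ready for the next frame and returns fresh tracking data.
    std::function<FrameData()> waitAndSyncFrame;
    // Runs the page's frame callback, which draws into the frame's layers.
    std::function<void(const FrameData&)> renderFrame;
    // Hands the frame's layers back to the VR compositor.
    std::function<void(const FrameData&)> submitLayers;
};

// Runs frameCount iterations and returns how many of them were actually rendered.
int runRenderLoop(const DeviceHooks& device, int frameCount)
{
    int renderedFrames = 0;
    for (int i = 0; i < frameCount; ++i) {
        FrameData frame = device.waitAndSyncFrame();
        if (frame.shouldRender) {
            device.renderFrame(frame);
            ++renderedFrames;
        }
        // Submit on every iteration, even when nothing was rendered,
        // so the loop stays in step with the compositor.
        device.submitLayers(frame);
    }
    return renderedFrames;
}
```

In this sketch the wait hook is the only blocking call, so the loop naturally runs at whatever rate the headset dictates.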

WebXR coordinate system: XRRigidTransform, XRReferenceSpace, XRSpace and XRPose

The next step after working on the render loop was the implementation of the WebXR coordinate system.

We started by implementing XRRigidTransform, the base matrix used in WebXR transforms. A rigid transform exclusively represents transforms in terms of position and orientation and by definition it cannot contain scale or skew.
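
To illustrate why scale and skew cannot appear, the sketch below builds the 4x4 matrix of a rigid transform from just a position and an orientation quaternion. Vec3, Quat and rigidTransformMatrix are hypothetical names for this example, not the WebKit classes. The matrix is laid out in column-major order, as in WebGL, and the quaternion is assumed to be non-zero.

```cpp
#include <array>
#include <cmath>

// Hypothetical illustration types (assumption, not WebKit classes).
struct Vec3 { double x, y, z; };
struct Quat { double x, y, z, w; };

// Rescales the quaternion to unit length so it encodes a pure rotation.
// Assumes the quaternion is not all zeros.
Quat normalized(Quat q)
{
    double length = std::sqrt(q.x * q.x + q.y * q.y + q.z * q.z + q.w * q.w);
    return { q.x / length, q.y / length, q.z / length, q.w / length };
}

// Builds a column-major 4x4 matrix: rotation in the upper-left 3x3 block,
// translation in the last column. There is no input for scale or skew.
std::array<double, 16> rigidTransformMatrix(Vec3 p, Quat q)
{
    q = normalized(q);
    double xx = q.x * q.x, yy = q.y * q.y, zz = q.z * q.z;
    double xy = q.x * q.y, xz = q.x * q.z, yz = q.y * q.z;
    double wx = q.w * q.x, wy = q.w * q.y, wz = q.w * q.z;
    return {{
        1.0 - 2.0 * (yy + zz), 2.0 * (xy + wz), 2.0 * (xz - wy), 0.0, // column 0
        2.0 * (xy - wz), 1.0 - 2.0 * (xx + zz), 2.0 * (yz + wx), 0.0, // column 1
        2.0 * (xz + wy), 2.0 * (yz - wx), 1.0 - 2.0 * (xx + yy), 0.0, // column 2
        p.x, p.y, p.z, 1.0                                            // column 3
    }};
}
```

Because the orientation is normalized before use, a quaternion of any non-zero length produces the same rotation, which is what keeps scale out of the result.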

One of the core features of WebXR is its ability to track spatial relationships. The wide range of XR hardware comes with different coordinate systems, which makes it impractical to expect developers to work directly with raw tracking data. The WebXR spec solves this by design, building spatial relationships on top of the XRSpace interface.

Each XRSpace represents something being tracked by the XR system, such as an XRReferenceSpace, and each one of them has a native origin that represents its pose (position and orientation) in the tracking system. The viewer and controller poses are good examples of VR tracking features. The XRSpace concept also works for AR-related tracking features such as anchors and hit testing.
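
The idea of expressing one space relative to another can be sketched as follows. If both native origins are known as poses in the tracking system, the pose of a space in a base space is inverse(baseOrigin) composed with spaceOrigin. This is only an illustration under simplifying assumptions: the names Pose, inverse, compose and poseInBaseSpace are hypothetical, orientations are assumed to be unit quaternions, and details such as reference space origin offsets are left out.

```cpp
#include <cmath>

// Hypothetical illustration types (assumption, not WebKit classes).
struct Vec3 { double x, y, z; };
struct Quat { double x, y, z, w; };
struct Pose { Vec3 position; Quat orientation; };

// Hamilton product of two quaternions.
Quat multiply(Quat a, Quat b)
{
    return {
        a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
        a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
        a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w,
        a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z
    };
}

// For a unit quaternion the conjugate is the inverse rotation.
Quat conjugate(Quat q) { return { -q.x, -q.y, -q.z, q.w }; }

// Rotates v by the unit quaternion q.
Vec3 rotate(Quat q, Vec3 v)
{
    Quat r = multiply(multiply(q, { v.x, v.y, v.z, 0.0 }), conjugate(q));
    return { r.x, r.y, r.z };
}

// Inverse of a rigid pose: undo the rotation, then undo the translation.
Pose inverse(Pose p)
{
    Quat inv = conjugate(p.orientation);
    Vec3 t = rotate(inv, p.position);
    return { { -t.x, -t.y, -t.z }, inv };
}

// Applies b first, then a.
Pose compose(Pose a, Pose b)
{
    Vec3 t = rotate(a.orientation, b.position);
    return { { a.position.x + t.x, a.position.y + t.y, a.position.z + t.z },
             multiply(a.orientation, b.orientation) };
}

// Pose of a space expressed in the coordinates of a base space, given both
// native origins in tracking coordinates.
Pose poseInBaseSpace(Pose spaceOrigin, Pose baseOrigin)
{
    return compose(inverse(baseOrigin), spaceOrigin);
}
```

For example, a viewer at height 1.6 in tracking coordinates, queried against a base space whose origin sits at height 1.0, comes out at height 0.6 in that base space.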

The whole coordinate system and the spatial tracking have been implemented in the getViewerPose and getPose patch.

The WebXR WPT tests have been extremely helpful to ensure that all the math is correct. To show some visual progress, we hacked one of the WebXR inline session samples and connected it to a real pose from a device:

WebXR inline session test

WebXR rendering and layers

Once we had a good foundation for the WebXR render loop and the coordinate system, we focused our efforts on getting frames visible on HMD displays.

The implementation adds a base XR Layers API. This is initially used for the opaque framebuffer pattern used in XRWebGLLayer. We have designed the API from the ground up to support the whole WebXR Layers spec in the future.

Layers are connected to WebGL by using shared textures. On the OpenXR platform we built a render path with no extra blits by rendering straight to the swapchains exposed by the VR compositor. The API is flexible enough to support different scenarios, such as sending a different texture for every frame to apply buffering techniques, and it can scale up to support texture arrays and multiview in the future. The texture for the current frame is attached in WebXRLayer::startFrame() and it is prepared to be sent back to the VR compositor in WebXRLayer::endFrame().
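
The start/end pairing can be sketched with a toy swapchain that hands out a different texture on each frame. This is an illustration only and not the WebKit implementation: Swapchain, SketchLayer, TextureId and NoTexture are hypothetical names, and the textures are plain integers. A real OpenXR swapchain also involves waiting on the acquired image before rendering, which is omitted here.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical texture handle; 0 means "no texture", as in OpenGL.
using TextureId = unsigned;
constexpr TextureId NoTexture = 0;

// Toy swapchain (assumption): cycles through a fixed set of images.
class Swapchain {
public:
    explicit Swapchain(std::vector<TextureId> images)
        : m_images(std::move(images)) { }

    // Returns the image to render into for this frame.
    TextureId acquire() const { return m_images.empty() ? NoTexture : m_images[m_current]; }

    // Gives the current image back to the compositor and moves to the next one.
    void release()
    {
        if (!m_images.empty())
            m_current = (m_current + 1) % m_images.size();
    }

private:
    std::vector<TextureId> m_images;
    std::size_t m_current { 0 };
};

// Toy layer (assumption): renders straight into the swapchain image, no extra blit.
class SketchLayer {
public:
    explicit SketchLayer(Swapchain& swapchain)
        : m_swapchain(swapchain) { }

    // Attaches this frame's texture; returns false if none is available.
    bool startFrame()
    {
        if (m_colorAttachment == NoTexture)
            m_colorAttachment = m_swapchain.acquire();
        return m_colorAttachment != NoTexture;
    }

    // Detaches the texture and returns the one handed back to the compositor.
    TextureId endFrame()
    {
        TextureId submitted = m_colorAttachment;
        if (submitted != NoTexture) {
            m_colorAttachment = NoTexture;
            m_swapchain.release();
        }
        return submitted;
    }

    TextureId colorAttachment() const { return m_colorAttachment; }

private:
    Swapchain& m_swapchain;
    TextureId m_colorAttachment { NoTexture };
};
```

With a two-image swapchain, consecutive frames in this sketch alternate between the two textures, which is the kind of per-frame buffering the paragraph above refers to.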

One of the challenges we found is that the default framebuffer logic we wanted to reuse is tightly coupled to the WebKit GLContextOpenGLBase and GLContextOpenGLES classes and it’s not straightforward to abstract it out. We considered creating a concept similar to the Gecko MozFramebuffer or Chromium DrawingBuffer classes, but that is a bigger change that is out of the scope of the WebXR patches for now. Instead, we decided to extract all the required framebuffer logic into an ad-hoc WebXROpaqueFramebuffer class. In any case, the path forward for WebXR rendering is the Layers API, which is based on opaque textures instead of opaque framebuffers, so all the native side framebuffer creation complexity will not be needed in the future.

WebXR rendering sample

WebXR Input Sources

A fulfilling WebXR experience is not just about rendering a virtual world. Picture yourself inside a medieval world unable to use a sword, or in a tennis court without the ability to use a racket. In order to make an engaging WebXR experience, the user needs to be able to interact with it.

The WebXR Input Sources API is a great evolution from WebVR gamepads, where developers relied on some UA sniffing to detect the type of controllers. Thanks to the WebXR Input Profiles, input detection has been standardised and developers can now seamlessly handle and show the right controller models for each platform or fall back to a good enough default.
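
The fallback behaviour can be sketched like this. Each input source reports a list of profile ids ordered from most to least specific, and the application picks the first one it has assets for. The function selectProfile and its parameters are hypothetical names for this illustration, and the logic is written in C++ only to match the rest of the code in this post; a web application would do the same thing in JavaScript.

```cpp
#include <set>
#include <string>
#include <vector>

// Picks the first reported profile the application has a model for.
// Assumes profiles is ordered from most specific to least specific.
// Returns fallbackProfile when nothing matches.
std::string selectProfile(const std::vector<std::string>& profiles,
                          const std::set<std::string>& availableModels,
                          const std::string& fallbackProfile)
{
    for (const std::string& profile : profiles) {
        if (availableModels.count(profile))
            return profile;
    }
    return fallbackProfile;
}
```

For instance, if a controller reports "htc-vive" followed by "generic-trigger-squeeze-touchpad" (example values) and the application only ships the generic model, the generic one is selected.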

We added input source support in two steps, first implementing the required DOM-related code and then the OpenXR input code. Once that work was ready, we were in a position to try everything together and see some ThreeJS and Babylon based experiences working correctly, including Hello WebXR, the WebXR site of the year.

Hello WebXR running in WebKit (using HTC Vive and Monado OpenXR runtime in Linux)

Conclusions

It has been a lot of fun watching the WebKit WebXR implementation take shape, from the early stages of passing a subset of WPT tests to being capable of running complete WebXR experiences. It is also nice to see that all major browser engines have a core WebXR implementation now. This diversity is good for the Open Web.

We have tested the WebKit WebXR implementation using WPE combined with the Monado OpenXR runtime and desktop headsets. For us the next steps will be to focus on deploying all the work into standalone devices while we continue improving the WebXR API performance and stability. Stay tuned!

Some final words of clarification for those of you wondering how this work relates to XR or ARKit support in Safari. Right now OpenXR is not compatible with iOS or macOS. Apple has publicly stated its intent to support WebXR and there is a Cocoa XR port entry point in WebKit, but neither the code nor the plans are public. In any case, a big chunk of our work is multiplatform, so we are happy that our contributions are going to help in that direction.

If you are interested in any type of collaboration, drop us a line.

Acknowledgements

Kudos to Sergio Villar for all his previous work on the WebKit WebXR and WebVR implementations and for helping me land in the project.
Many thanks to the folks at Apple for the insightful and detailed code reviews.
Huge thanks to the Monado and libsurvive teams for their great open source runtimes and for helping OpenXR to be usable on more platforms.
Kudos to the WebXR Working group for continuing to evolve the spec.
