v3dv status update 2020-07-31

Iago talked recently about the work done testing and supporting well known applications, like the Vulkan ports of the Quake1, Quake 2 and Quake3. Let’s go back here to the lastest news on feature and bugfixing work.

Pipeline cache

Pipeline cache objects allow the result of pipeline construction to be reused. Usually (and specifically on our implementation) that means caching compiled shaders. Reuse can be achieved between pipelines creation during the same application run by passing the same pipeline cache object when creating multiple pipelines. Reuse across runs of an application is achieved by retrieving pipeline cache contents in one run of an application, saving the contents, and using them to preinitialize a pipeline cache on a subsequent run.

Note that it may happens that a pipeline cache would not improve the performance of an application once it starts to render. This is because application developers are encouraged to create all the pipelines in advance, to avoid any hiccup during rendering. On that situation pipeline cache would help to reduce load times. In any case, that is not always avoidable. In that case the pipeline cache would allow to reduce the hiccup, as a cache hit is far faster than a shader recompilation.

One specific detail about our implementation is that internally we keep a default pipeline cache, used if the user doesn’t provide a pipeline cache when creating a pipeline, and also to cache the custom shaders we use for internal operations. This allowed to simplify our code, discarding some custom caches that were already implemented.

Uniform/storage texel buffer

Uniform texel buffers define a tightly-packed 1-dimensional linear array of texels, with texels going through format conversion when read in a shader in the same way as they are for an image. They are mostly equivalent to OpenGL buffer texture, so you can see them as textures backed up by a VkBuffer (through a VkBufferView). With uniform texel buffers you can only do a formatted load.

Storage texel buffers are the equivalent concept, but applied to images instead of textures. Unlike uniform texel buffers, they can also be written to in the same way as for storage images.

Multisampling

Multisampling is a technique that allows to reduce aliasing artifacts on images, by by sampling pixel coverage at multiple subpixel locations and then averaging subpixel samples to produce a final color value for each pixel. We have already started working on this feature, and included some patches on the development branch, but it is still a work in progress. Having said so, it is enough to get Sascha Willems’s basic multisampling demo working:

Sascha Willems multisampling demo run on rpi4

Bugfixing

Again, in addition to work on specific features, we also spent some time fixing specific driver bugs, using failing Vulkan CTS tests as reference. So let’s take a look of some screenshots of Sascha Willem’s demos that are now working:

Sascha Willems deferred demo run on rpi4

Sascha Willems texture array demo run on rpi4

Sascha Willems Compute N-Body demo run on rpi4

Next

We plan to work on supporting the following features next:

  • Robust access
  • Multisample (finish it)

Previous updates

Just in case you missed any of the updates of the vulkan driver so far:

Vulkan raspberry pi first triangle
Vulkan update now with added source code
v3dv status update 2020-07-01
V3DV Vulkan driver update: VkQuake1-3 now working

v3dv status update 2020-07-01

About three weeks ago there was a big announcement about the update of the status of the Vulkan effort for the Raspberry Pi 4. Now the source code is public. Taking into account the interest that it got, and that now the driver is more usable, we will try to post status updates more regularly. Let’s talk about what’s happened since then.

Input Attachments

Input attachment is one of the main sub-features for Vulkan multipass, and we’ve gained support since the announcement. On Vulkan the support for multipass is more tightly supported by the API. Renderpasses can have multiple subpasses. These can have dependencies between each other, and each subpass define a subset of “attachments”. One attachment that is easy to understand is the color attachment: This is where a given subpass writes a given color. Another, input attachment, is an attachment that was updated in a previous subpass (for example, it was the color attachment on such previous subpass), and you get as a input on following subpasses. From the shader POV, you interact with it as a texture, with some restrictions. One important restriction is that you can only read the input attachment at the current pixel location. The main reason for this restriction is because on tile-based GPUs (like rpi4) all primitives are batched on tiles and fragment processing is rendered one tile at a time. In general, if you can live with those restrictions, Vulkan multipass and input attachment will provide better performance than traditional multipass solutions.

If you are interested in reading more details on this, you can check out ARM’s very nice presentation “Vulkan Multipass mobile deferred done right”, or Sascha Willems’ post “Vulkan input attachments and sub passes”. The latter also includes information about how to use them and code snippets of one of his demos. For reference, this is how the input attachment demos looks on the rpi4:

Sascha Willems inputattachment demos run on rpi4

Compute Shader

Given that this was one of the most requested features after the last update, we expect that this will be likely be the most popular news from this post: Compute shaders are now supported.

Compute shaders give applications the ability to perform non-graphics related tasks on the GPU, outside the normal rendering pipeline. For example they don’t have vertices as input, or fragments as output. They can still be used for massivelly parallel GPGPU algorithms. For example, this demo from Sascha Willems uses a compute shader to simulate cloth:

Sascha Willems Compute Cloth demos run on rpi4

Storage Image

Storage Image is another recent addition. It is a descriptor type that represents an image view, and supports unfiltered loads, stores, and atomics in a shader. It is really similar in most other ways to the well-known OpenGL concept of texture. They are really common with compute shaders. Compute shaders will not render (they can’t) directly any image, and it is likely that if they need an image, they will update it. In fact the two Sascha Willem demos using storage images also require compute shader support:

Sascha Willems compute shader demos run on rpi4

Sascha Willems compute raytracing demo run on rpi4

Performance

Right now our main focus for the driver is working on features, targetting a compliant Vulkan 1.0 driver. Having said so, now that we both support a good range of features and can run non-basic applications, we have devoted some time to analyze if there were clear points where we could improve the performance. Among these we implemented:
1. A buffer object (BO) cache: internally we are allocating and freeing really often buffer objects for basically the same tasks, so there are a constant need of buffers of the same size. Such allocation/free require a DRM call, so we implemented a BO cache (based on the existing for the OpenGL driver) so freed BOs would be added to a cache, and reused if a new BO is allocated with the same size.
2. New code paths for buffer to image copies.

Bugfixing!!

In addition to work on specific features, we also spent some time fixing specific driver bugs, using failing Vulkan CTS tests as reference. Thanks to that work, the Sascha Willems’ radial blur demo is now properly rendering, even though we didn’t focus specifically on working on that demo:

Sascha Willems radial blur demo run on rpi4

Next?

Now that the driver supports a good range of features and we are able to test more applications and run more Vulkan CTS Tests with all the needed features implemented, we plan to focus some efforts towards bugfixing for a while.

We also plan to start to work on implementing the support for Pipeline Cache, which allows the result of pipeline construction to be reused between pipelines and between runs of an application.

v3dv: quick guide to build and run some demos

Just today it has published a status update of the Vulkan effort for the Raspberry Pi 4, including that we are moving the development of the driver to an open repository. As it is really likely that some people would be interested on testing it, even if it is not complete at all, here you can find a quick guide to compile it, and get some demos running.

Dependencies

So let’s start installing some dependencies. My personal recipe, that I use every time I configure a new machine to work on mesa is the following one (sorry if some extra unneeded dependencies slipped):

sudo apt-get install libxcb-randr0-dev libxrandr-dev \
        libxcb-xinerama0-dev libxinerama-dev libxcursor-dev \
        libxcb-cursor-dev libxkbcommon-dev xutils-dev \
        xutils-dev libpthread-stubs0-dev libpciaccess-dev \
        libffi-dev x11proto-xext-dev libxcb1-dev libxcb-*dev \
        bison flex libssl-dev libgnutls28-dev x11proto-dri2-dev \
        x11proto-dri3-dev libx11-dev libxcb-glx0-dev \
        libx11-xcb-dev libxext-dev libxdamage-dev libxfixes-dev \
        libva-dev x11proto-randr-dev x11proto-present-dev \
        libclc-dev libelf-dev git build-essential mesa-utils \
        libvulkan-dev ninja-build libvulkan1 python-mako \
        libdrm-dev libxshmfence-dev libxxf86vm-dev \
        python3-mako

Most Raspian libraries are recent enough, but they have been updating some of then during the past months, so just in case, don’t forget to update:

$ sudo apt-get update
$ sudo apt-get upgrade

Additionally, you woud need to install meson. Mesa has just recently bumped up the version needed for meson, so Raspbian version is not enough. There is the option to build meson from the tarball (meson-0.52.0 here), but by far, the easier way to get a recent meson version is using pip3:

$ pip3 install meson

2020-07-04 update

It seems that some people had problems if they have installed meson with apt-get on their system, as when building it would try the older meson version first. For those people, they were able to fix that doing this:

$ sudo apt-get remove meson
$ pip3 install --user meson

Download and build v3dv

This is the simpler recipe to build v3dv:

$ git clone https://gitlab.freedesktop.org/apinheiro/mesa.git mesa
$ cd mesa
$ git checkout wip/igalia/v3dv
$ meson --prefix /home/pi/local-install --libdir lib -Dplatforms=x11,drm -Dvulkan-drivers=broadcom -Ddri-drivers= -Dgallium-drivers=v3d,kmsro,vc4 -Dbuildtype=debug _build
$ ninja -C _build
$ ninja -C _build install

This builds and install a debug version of v3dv on a local directory. You could set a release build, or any other directory. The recipe is also building the OpenGL driver, just in case anyone want to compare, but if you are only interested on the vulkan driver, that is not mandatory.

Run some Vulkan demos

Now, the easiest way to ensure that a vulkan program founds the drivers is setting the following envvar:

export VK_ICD_FILENAMES=/home/pi/local-install/share/vulkan/icd.d/broadcom_icd.armv7l.json

That envvar is used by the Vulkan loader (installed as one of the dependencies listed before) to know which library load. This also means that you don’t need to use LD_PRELOAD, LD_LIBRARY_PATH or similar

So what Vulkan programs are working? For example several of the Sascha Willem Vulkan demos. To make things easier to everybody, here another quick recipe of how to get them build:

$ sudo apt-get install libassimp-dev
$ git clone --recursive https://github.com/SaschaWillems/Vulkan.git  sascha-willems
$ cd sascha-willems
$ mkdir build; cd build
$ cmake -DCMAKE_BUILD_TYPE=Debug  ..
$ make

Update 2020-08-03: When the post was originally written, some demos didn’t need to ask for extra assets. Recently the fonts were moved there, so you would need to gather the assests always:

$ cd ..
$ python3 download_assets.py

So in order to see a really familiar demo:

$ cd build/bin
$ ./gears

And one slightly more complex:

$./scenerendering

As mentioned, not all the demos works. But a list of some that we tested and seem to work:
* distancefieldfonts
* descriptorsets
* dynamicuniformbuffer
* gears
* gltfscene
* imgui
* indirectdraw
* occlusionquery
* parallaxmapping
* pbrbasic
* pbribl
* pbrtexture
* pushconstants
* scenerendering
* shadowmapping
* shadowmappingcascade
* specializationconstants
* sphericalenvmapping
* stencilbuffer
* textoverlay
* texture
* texture3d
* texturecubemap
* triangle
* vulkanscene

Update : rpiMike on the comments, and some people privately, have pointed some errors on the post. Thanks! And sorry for the inconvenience.

Update 2 : Mike Hooper pointed more issues on gitlab