So continuing with the news, here is a fairly recent one: as the title states, I am happy to announce that the Raspberry Pi 4 is now an OpenGL ES 3.1 conformant product! This means that the Mesa V3D driver has successfully passed a whole lot of tests designed to validate the OpenGL ES 3.1 feature set, which should be a good sign of driver quality and correctness.
It should be noted that the Raspberry Pi 4 shipped with a V3D driver exposing OpenGL ES 3.0, so on top of all the bugfixes that we implemented for conformance, the driver has also gained new functionality! In particular, we merged Eric’s previous work to enable Compute Shaders.
All this work has been in Mesa master since December (I believe there is only one fix missing waiting for us to address review feedback), and will hopefully make it to Raspberry Pi 4 users soon.
Awesome!
Next step: Vulkan?
Can you quickly specify which Mesa version should contain all the needed changes? 19.2 or 19.3?
I think 19.3 should have all the CTS fixes (except for the one that is pending addressing review feedback). Geometry shaders should come with 20.0.
Thanks for the information. Great work, I was able to get the TensorFlow Lite GPU delegate working (https://www.tensorflow.org/lite/performance/gpu, requires OpenGL ES 3.1 ;-)). At least with Mesa 19.3.1, the TFLite GPU delegate on the RPi4 is about 3-4 times slower than the CPU with 4 threads. With V3D_DEBUG=perf, I see a comment that currently just 1 WG (work group?) per SG (?) is enabled. Can you give some more information on this?
If you want to play around with TF Lite, check https://github.com/jsee23/tensorflow/tree/rpi4. For building, go to tensorflow/lite/tools/make and run ./download_dependencies.sh. Afterwards, switch to tensorflow/lite/tools/cmake and run ./build.sh -DCMAKE_TOOLCHAIN_FILE= -D TFLITE_DELEGATE_GL=1 -DTFLITE_DELEGATE_GL_GBM=1.
This will build a “tflite-benchmark” app. Copy it to the target and run it with a TFLite model, like “./tflite-benchmark --graph=.tflite --use_gpu=true --num_runs=10”.
The driver is hardcoding a setup where it only uses a single workgroup (WG) per supergroup (SG). It is likely that this number can be increased for more parallelism, and therefore better performance, in some cases. It is something we should look into at some point.
Very nice, fantastic work!
Is there a mailing list for development discussion?
Is there a way to create an offscreen context on the Raspberry Pi 4?
The mailing list for development discussion is the one for the Mesa project: mesa-dev at lists.freedesktop.org
Yes, you can create an offscreen context by using a pbuffer surface if that is what you’re asking. You can also render to FBOs of course.
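As a rough illustration of the pbuffer route, here is a minimal sketch that drives EGL from Python through ctypes. The helper names (egl_attribs, pbuffer_attribs, create_pbuffer_context) are made up for this example, and it assumes that libEGL is installed and that eglGetDisplay(EGL_DEFAULT_DISPLAY) returns a display whose configs support pbuffers, which depends on the EGL platform in use. The enum values are the ones defined in the Khronos EGL headers.

```python
import ctypes
import ctypes.util

# Enum values from the Khronos EGL headers.
EGL_NONE = 0x3038
EGL_SURFACE_TYPE = 0x3033
EGL_PBUFFER_BIT = 0x0001
EGL_RENDERABLE_TYPE = 0x3040
EGL_OPENGL_ES3_BIT = 0x0040
EGL_HEIGHT = 0x3056
EGL_WIDTH = 0x3057
EGL_OPENGL_ES_API = 0x30A0
EGL_CONTEXT_MAJOR_VERSION = 0x3098


def egl_attribs(*values):
    """Pack integers into an EGLint array terminated by EGL_NONE."""
    items = list(values) + [EGL_NONE]
    return (ctypes.c_int32 * len(items))(*items)


def pbuffer_attribs(width, height):
    """Attribute list for eglCreatePbufferSurface (hypothetical helper)."""
    if width <= 0 or height <= 0:
        raise ValueError("pbuffer dimensions must be positive")
    return egl_attribs(EGL_WIDTH, width, EGL_HEIGHT, height)


def create_pbuffer_context(width=64, height=64):
    """Make an OpenGL ES 3.x context current on an offscreen pbuffer."""
    name = ctypes.util.find_library("EGL")
    if name is None:
        raise RuntimeError("libEGL not found")
    egl = ctypes.CDLL(name)
    void_p, int_p = ctypes.c_void_p, ctypes.POINTER(ctypes.c_int32)
    # EGL handles are pointers; declare them so ctypes does not truncate them.
    egl.eglGetDisplay.restype = void_p
    egl.eglGetDisplay.argtypes = [void_p]
    egl.eglInitialize.argtypes = [void_p, int_p, int_p]
    egl.eglBindAPI.argtypes = [ctypes.c_uint]
    egl.eglChooseConfig.argtypes = [void_p, int_p, ctypes.POINTER(void_p),
                                    ctypes.c_int32, int_p]
    egl.eglCreatePbufferSurface.restype = void_p
    egl.eglCreatePbufferSurface.argtypes = [void_p, void_p, int_p]
    egl.eglCreateContext.restype = void_p
    egl.eglCreateContext.argtypes = [void_p, void_p, void_p, int_p]
    egl.eglMakeCurrent.argtypes = [void_p, void_p, void_p, void_p]

    display = egl.eglGetDisplay(None)  # None is EGL_DEFAULT_DISPLAY
    if display is None or not egl.eglInitialize(display, None, None):
        raise RuntimeError("could not initialize an EGL display")
    egl.eglBindAPI(EGL_OPENGL_ES_API)

    # Ask for a config that supports pbuffers and OpenGL ES 3.x.
    config, count = void_p(), ctypes.c_int32(0)
    wanted = egl_attribs(EGL_SURFACE_TYPE, EGL_PBUFFER_BIT,
                         EGL_RENDERABLE_TYPE, EGL_OPENGL_ES3_BIT)
    ok = egl.eglChooseConfig(display, wanted, ctypes.byref(config), 1,
                             ctypes.byref(count))
    if not ok or count.value < 1:
        raise RuntimeError("no pbuffer-capable OpenGL ES 3 config")

    surface = egl.eglCreatePbufferSurface(display, config,
                                          pbuffer_attribs(width, height))
    context = egl.eglCreateContext(display, config, None,
                                   egl_attribs(EGL_CONTEXT_MAJOR_VERSION, 3))
    if surface is None or context is None:
        raise RuntimeError("could not create pbuffer surface or context")
    if not egl.eglMakeCurrent(display, surface, surface, context):
        raise RuntimeError("eglMakeCurrent failed")
    return egl, display, surface, context
```

Nothing touches EGL until create_pbuffer_context() is called. Once it returns, OpenGL ES calls made from the same thread render into the pbuffer, or into any FBO you bind afterwards.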
Thank you! Yes, I’m specifically interested in using compute shaders in a headless setup.
Or, to rephrase my question: what would be the best way to create a context for only using compute shaders on the Raspberry Pi 4?
In that case I think you probably want to create a surfaceless context: see https://www.khronos.org/registry/OpenGL/extensions/OES/OES_surfaceless_context.txt
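For a compute-only setup, the surfaceless route could look roughly like the sketch below, again using Python and ctypes. Several things here are assumptions rather than anything stated above: the helper names are invented for the example, the check is for EGL_KHR_surfaceless_context (the EGL-side extension that permits eglMakeCurrent without a surface, and the counterpart to the GLES extension linked above), and eglGetDisplay(EGL_DEFAULT_DISPLAY) is assumed to return a usable display. On a truly headless system you may need to obtain a GBM or surfaceless platform display instead.

```python
import ctypes
import ctypes.util

# Enum values from the Khronos EGL headers.
EGL_NONE = 0x3038
EGL_EXTENSIONS = 0x3055
EGL_SURFACE_TYPE = 0x3033
EGL_RENDERABLE_TYPE = 0x3040
EGL_OPENGL_ES3_BIT = 0x0040
EGL_OPENGL_ES_API = 0x30A0
EGL_CONTEXT_MAJOR_VERSION = 0x3098
EGL_CONTEXT_MINOR_VERSION = 0x30FB  # needs EGL 1.5 or EGL_KHR_create_context


def has_extension(extension_string, name):
    """Exact token match; a plain substring test could match a longer name."""
    return name in extension_string.split()


def egl_attribs(*values):
    """Pack integers into an EGLint array terminated by EGL_NONE."""
    items = list(values) + [EGL_NONE]
    return (ctypes.c_int32 * len(items))(*items)


def create_surfaceless_es31_context():
    """Make an OpenGL ES 3.1 context current with no draw or read surface."""
    name = ctypes.util.find_library("EGL")
    if name is None:
        raise RuntimeError("libEGL not found")
    egl = ctypes.CDLL(name)
    void_p, int_p = ctypes.c_void_p, ctypes.POINTER(ctypes.c_int32)
    # EGL handles are pointers; declare them so ctypes does not truncate them.
    egl.eglGetDisplay.restype = void_p
    egl.eglGetDisplay.argtypes = [void_p]
    egl.eglInitialize.argtypes = [void_p, int_p, int_p]
    egl.eglQueryString.restype = ctypes.c_char_p
    egl.eglQueryString.argtypes = [void_p, ctypes.c_int32]
    egl.eglBindAPI.argtypes = [ctypes.c_uint]
    egl.eglChooseConfig.argtypes = [void_p, int_p, ctypes.POINTER(void_p),
                                    ctypes.c_int32, int_p]
    egl.eglCreateContext.restype = void_p
    egl.eglCreateContext.argtypes = [void_p, void_p, void_p, int_p]
    egl.eglMakeCurrent.argtypes = [void_p, void_p, void_p, void_p]

    display = egl.eglGetDisplay(None)  # None is EGL_DEFAULT_DISPLAY
    if display is None or not egl.eglInitialize(display, None, None):
        raise RuntimeError("could not initialize an EGL display")

    extensions = (egl.eglQueryString(display, EGL_EXTENSIONS) or b"").decode()
    if not has_extension(extensions, "EGL_KHR_surfaceless_context"):
        raise RuntimeError("EGL_KHR_surfaceless_context is not available")
    egl.eglBindAPI(EGL_OPENGL_ES_API)

    # A surface type mask of 0 places no requirement on surface support.
    config, count = void_p(), ctypes.c_int32(0)
    wanted = egl_attribs(EGL_SURFACE_TYPE, 0,
                         EGL_RENDERABLE_TYPE, EGL_OPENGL_ES3_BIT)
    ok = egl.eglChooseConfig(display, wanted, ctypes.byref(config), 1,
                             ctypes.byref(count))
    if not ok or count.value < 1:
        raise RuntimeError("no OpenGL ES 3 capable config")

    context = egl.eglCreateContext(
        display, config, None,
        egl_attribs(EGL_CONTEXT_MAJOR_VERSION, 3, EGL_CONTEXT_MINOR_VERSION, 1))
    if context is None:
        raise RuntimeError("could not create an OpenGL ES 3.1 context")
    # Passing None for both surfaces is EGL_NO_SURFACE: no framebuffer is bound.
    if not egl.eglMakeCurrent(display, None, None, context):
        raise RuntimeError("eglMakeCurrent without a surface failed")
    return egl, display, context
```

With the context current and no default framebuffer, compute dispatches that read and write buffers or images should not need any surface at all.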
Great work on this. I’ve tested the driver on an RPi4 and had issues with it. Check the videos here: https://www.youtube.com/playlist?list=PL1kbWbcg4d6ndxAJbbQ80jse4t_DAjHsQ and their descriptions.