Lately I have been exposing a bit more functionality in V3DV and was wondering how far we are from Vulkan 1.2. It turns out that many of the new Vulkan 1.2 features are optional, and what we have right now (minus a few trivial patches to expose some things) seems sufficient for a minimal implementation.
We did a CTS test run with Vulkan 1.2 enabled to verify this, and it went surprisingly well: there were just a few test failures, which I am currently looking into, so I think we should be able to submit for conformance soon.
For those who may be interested, here is a list of what we are not supporting (all of these are optional features in Vulkan 1.2):
I think we should be able to support this in the future.
In theory we could support this, since the hardware supports half-float. However, the way this is designed in hardware comes with significant caveats that I think would make it really difficult to take advantage of in practice. It would also require significant work, so it is not something we are planning at present.
We can’t implement this without hacks because the Vulkan spec explicitly defines buffer device addresses as 64-bit values, while the V3D GPU only deals with 32-bit addresses and cannot do any kind of native 64-bit operation. At first I thought we could just lower these to 32-bit (since we know they will be 32-bit), but because the spec makes them explicit 64-bit values, shaders are allowed to cast a device address to and from uvec2. This generates 64-bit bitcast instructions, which require both the source and the destination to be 64-bit values.
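To make the cast problem more concrete, here is a small C model of what those casts compute. It is only an illustration under my own assumptions, not driver code: `uvec2` is a stand-in struct and all helper names are hypothetical. The low word goes in `.x` and the high word in `.y`, matching GLSL's `unpackUint2x32()`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a GLSL uvec2 (not a Mesa type). */
typedef struct {
   uint32_t x; /* low 32 bits of the address */
   uint32_t y; /* high 32 bits of the address */
} uvec2;

/* What a 64-bit bitcast from a buffer reference to uvec2 computes:
 * split the 64-bit address into its low and high 32-bit words. */
static uvec2 addr64_to_uvec2(uint64_t addr)
{
   uvec2 v = { (uint32_t)addr, (uint32_t)(addr >> 32) };
   return v;
}

/* The reverse cast, uvec2 -> buffer reference. */
static uint64_t uvec2_to_addr64(uvec2 v)
{
   return ((uint64_t)v.y << 32) | v.x;
}

/* If addresses were lowered to 32 bits, each of those casts would have
 * to be rewritten rather than emitted as a bitcast: the high word is
 * always zero on the way out and must be zero on the way back in. */
static uvec2 addr32_to_uvec2(uint32_t addr)
{
   uvec2 v = { addr, 0u };
   return v;
}

static uint32_t uvec2_to_addr32(uvec2 v)
{
   assert(v.y == 0u); /* a 32-bit GPU cannot address above 4 GiB */
   return v.x;
}
```

The 32-bit variants are no longer plain bitcasts, so a driver that lowers addresses to 32 bits has to find and rewrite every such cast instead of just emitting the instruction the shader asked for.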
These lack required hardware support, so we don’t expect to implement them.
8 thoughts on “Vulkan 1.2 getting closer”
I don’t quite understand the V3D BDA limitation.
ARM Mali doesn’t have native 64-bit integer support either, yet it supports the buffer device address extension.
In that case, address arithmetic has to be done with `uaddCarry()` on the two 32-bit halves. The references remain `uvec2` values in the shader and are never cast to 64-bit values.
An approach like this works on devices without uint64 support.
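A rough C model of that approach, as a sketch for illustration rather than the commenter's own code: the address is kept as two 32-bit words, and an offset is added with a C equivalent of GLSL's `uaddCarry()`. The `uvec2` struct and `addr_add_offset()` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical stand-in for a GLSL uvec2 holding an address. */
typedef struct {
   uint32_t x; /* low word */
   uint32_t y; /* high word */
} uvec2;

/* C equivalent of GLSL uaddCarry(): returns (a + b) mod 2^32 and writes
 * 1 to *carry if the addition overflowed, 0 otherwise. */
static uint32_t uaddCarry(uint32_t a, uint32_t b, uint32_t *carry)
{
   uint32_t sum = a + b;
   *carry = (sum < a) ? 1u : 0u;
   return sum;
}

/* Advance a 2x32-bit address by a 32-bit byte offset without any 64-bit
 * arithmetic: add to the low word, then propagate the carry into the
 * high word. */
static uvec2 addr_add_offset(uvec2 addr, uint32_t offset)
{
   uint32_t carry;
   uvec2 r;
   r.x = uaddCarry(addr.x, offset, &carry);
   r.y = addr.y + carry;
   return r;
}
```

For example, adding `0x20` to the address `{ 0xfffffff0, 0x1 }` overflows the low word, giving `{ 0x10, 0x2 }`.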
Forgive me if I’m missing a deeper issue.
P.S. Buffer device addresses are so very very good for implementing compute libraries. They also make it easier to implement OpenCL on Vulkan.
The issue is that it is legal for a shader to do explicit casts between uvec2 and buffer references, which are turned into SPIR-V 64-bit OpBitcast instructions. So when the driver consumes the shader, it finds 64-bit OpBitcast instructions that it is not expecting and can’t really handle natively. There are CTS tests that exercise this, with shader code like:
accum |= int(T1(uvec2(T1(x.c))).e - 6);
where T1 is a buffer reference.
That said, there are probably ways to work around this issue (which is likely what Mali is doing). That’s why I said we cannot implement it without some hacks, and why we have not implemented it for now, but we might get back to it in the future.
To be more precise: the problem is that the way Mesa handles buffer references at the moment seems to require the address to be a single 32-bit or 64-bit value, and doesn’t allow expressing it as a 2×32-bit vector instead. This is probably something we can fix in the future.
Looking forward to seeing BufRefs in V3DV!
Also, I’m surprised that that CTS test is being run on a Vulkan device that has `.shaderInt64=false`.
Interesting but not too much so 😉
More interesting to users like me is which of the features listed in docs/drivers/zink.rst the Vulkan driver is lacking, and which features missing from that list the driver also lacks (assuming the WIP patches for 64-to-32-bit lowering in Zink have already landed).
For what it’s worth, we actually had Zink working some time ago; unfortunately, Zink has required more features since then :-(. The one feature we can’t support natively is VK_EXT_scalar_block_layout, and from what I heard from other colleagues, it would be difficult to get rid of this feature requirement in modern Zink.
As for why we don’t implement this, I documented it in the driver code:
/* V3D 4.2 wraps TMU vector accesses to 16-byte boundaries, so loads and
* stores of vectors that cross these boundaries would not work correctly
* with scalarBlockLayout and would need to be split into smaller vectors
* (and/or scalars) that don't cross these boundaries. For load/stores
* with dynamic offsets where we can't identify if the offset is
* problematic, we would always have to scalarize. Overall, this would
* not lead to best performance so let's just not support it.
*/
.scalarBlockLayout = false,
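To illustrate what that comment describes, here is a hypothetical C helper (not taken from the V3DV source) that splits one vector access into pieces that each stay within a 16-byte block. It assumes the access is at most 16 bytes and that its offset is known at compile time; with a dynamic offset the split point cannot be computed ahead of time, which is why the driver would have to scalarize instead.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed wrap granularity of V3D 4.2 TMU vector accesses, in bytes. */
#define TMU_WRAP_BYTES 16u

/* Split an access of `size` bytes at byte `offset` into chunks that never
 * cross a 16-byte boundary. Writes the chunks to chunk_offset/chunk_size
 * and returns how many there are (1 or 2 for accesses of <= 16 bytes). */
static unsigned split_tmu_access(uint32_t offset, uint32_t size,
                                 uint32_t chunk_offset[2],
                                 uint32_t chunk_size[2])
{
   assert(size > 0u && size <= TMU_WRAP_BYTES);

   /* Bytes left before the next 16-byte boundary. */
   uint32_t room = TMU_WRAP_BYTES - (offset % TMU_WRAP_BYTES);

   chunk_offset[0] = offset;
   if (size <= room) {
      chunk_size[0] = size; /* fits: no split needed */
      return 1;
   }

   chunk_size[0] = room; /* up to the boundary */
   chunk_offset[1] = offset + room;
   chunk_size[1] = size - room; /* the rest, in the next block */
   return 2;
}
```

For example, a 12-byte vec3 at offset 12, which scalar block layout allows (std430 would align it to 16), becomes a 4-byte access at offset 12 followed by an 8-byte access at offset 16.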
That said, degraded performance is not a blocker if people know what they are getting and still find this useful for their purposes, so we might expose it in the future if there is interest despite the performance caveats.
Many thanks for the in depth answer! We’ll see how far @zmike is willing to go for “world domination” 🙂
I would also love to see VK_EXT_scalar_block_layout support, even if the performance was degraded due to software trickery.
Thanks for your hard work!