{"id":506,"date":"2016-07-13T16:48:20","date_gmt":"2016-07-13T14:48:20","guid":{"rendered":"http:\/\/blogs.igalia.com\/itoral\/?p=506"},"modified":"2016-07-13T16:48:20","modified_gmt":"2016-07-13T14:48:20","slug":"story-and-status-of-arb_gpu_shader_fp64-on-intel-gpus","status":"publish","type":"post","link":"https:\/\/blogs.igalia.com\/itoral\/2016\/07\/13\/story-and-status-of-arb_gpu_shader_fp64-on-intel-gpus\/","title":{"rendered":"Story and status of ARB_gpu_shader_fp64 on Intel GPUs"},"content":{"rendered":"<p>In case you haven&#8217;t heard yet, with the recently announced <a href=\"https:\/\/lists.freedesktop.org\/archives\/mesa-dev\/2016-July\/122638.html\" target=\"_blank\"><strong>Mesa 12.0<\/strong><\/a> release, <em>Intel gen8+ GPUs<\/em> expose <strong>OpenGL 4.3<\/strong>, which is quite a big leap from the previous <em>OpenGL 3.3<\/em>!<\/p>\n<figure id=\"attachment_515\" aria-describedby=\"caption-attachment-515\" style=\"width: 295px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/blogs.igalia.com\/itoral\/files\/2016\/07\/opengl4.3-295x300.jpg\" alt=\"OpenGL 4.3\" width=\"295\" height=\"300\" class=\"size-medium wp-image-515\" srcset=\"https:\/\/blogs.igalia.com\/itoral\/files\/2016\/07\/opengl4.3-295x300.jpg 295w, https:\/\/blogs.igalia.com\/itoral\/files\/2016\/07\/opengl4.3.jpg 484w\" sizes=\"auto, (max-width: 295px) 85vw, 295px\" \/><figcaption id=\"caption-attachment-515\" class=\"wp-caption-text\">The Mesa i965 Intel driver now exposes OpenGL 4.3 on Broadwell and later!<br \/><\/figcaption><\/figure>\n<p>This might surprise some, but even though the <em>i965<\/em> driver only exposed <em>OpenGL 3.3<\/em>, it had been exposing many of the <em>OpenGL 4.x<\/em> extensions for quite some time. However, one <em>OpenGL 4.0<\/em> extension in particular was still missing and kept the driver from exposing a higher version: <strong>ARB_gpu_shader_fp64<\/strong> 
(<em>fp64<\/em> for short). There was a good reason for this: it is a very large feature that has been in the works for quite some time, first by <em>Intel<\/em> and later by <em>Igalia<\/em>. We started working on it as far back as November 2015, and by that time <em>Intel<\/em> had already been working on it for months.<\/p>\n<p>I won&#8217;t cover here what made this such a large effort because there would be a lot of stuff to cover and I don&#8217;t feel like spending weeks writing a series of posts on the subject :). Hopefully I will get a chance to talk about all that at <em>XDC<\/em> in September, so instead I&#8217;ll focus on explaining why we only have this working on <em>gen8+<\/em> at the moment and on the status of <em>gen7<\/em> hardware.<\/p>\n<p>The plan for <em>ARB_gpu_shader_fp64<\/em> was always to focus on <em>gen8+<\/em> hardware (<em>Broadwell<\/em> and later) first, because it has better support for the feature. I should add that it also has fewer hardware bugs, although we only found that out later ;). So the plan was to do <em>gen8+<\/em> and then extend the implementation to cover the quirks required by <em>gen7<\/em> hardware (<em>IvyBridge, Haswell, ValleyView<\/em>).<\/p>\n<p>At this point I should explain that the <em>i965<\/em> compiler has two code generation backends for <em>Intel GPUs<\/em>: scalar and vector. The main difference between the two is that the vector backend (also known as <em>align16<\/em>) operates on vectors (surprise, right?) and has native support for things like swizzles and writemasks, while the scalar backend (known as <em>align1<\/em>) operates on scalars. This means that, for example, a <em>GLSL vec4<\/em> operation is broken up into 4 separate instructions, each operating on a single component. You might think that this makes the scalar backend slower, but that would not be accurate. 
In fact, it is usually faster because it allows the <em>GPU<\/em> to exploit <em>SIMD<\/em> better than the vector backend does.<\/p>\n<p>The thing is that different hardware generations use one backend or the other for different shader stages. For example, <em>gen8+<\/em> used to run <em>Vertex<\/em>, <em>Fragment<\/em> and <em>Compute<\/em> shaders through the scalar backend and <em>Geometry<\/em> and <em>Tessellation<\/em> shaders through the vector backend, whereas <em>Haswell<\/em> and <em>IvyBridge<\/em> also use the vector backend for <em>Vertex<\/em> shaders.<\/p>\n<p>Because you can use 64-bit floating point in any shader stage, the original plan was to implement <em>fp64<\/em> support in both backends. Implementing fp64 requires a lot of changes throughout the driver compiler backends, which makes the task anything but trivial, but the vector backend is particularly difficult because the hardware only supports 32-bit swizzles. Since each 64-bit component occupies two 32-bit channels, a hardware swizzle such as <em>XYZW<\/em> only selects components <em>XY<\/em> of a <em>dvecN<\/em>, and there is no direct mechanism to access components <em>ZW<\/em>. As a consequence, dealing with anything bigger than a <em>dvec2<\/em> requires more creative solutions, which in turn run into other hardware limitations and bugs. All of this makes the vector backend require a significantly larger development effort than the scalar backend.<\/p>\n<p>Thankfully, <em>gen8+<\/em> hardware supports scalar <em>Geometry<\/em> and <em>Tessellation<\/em> shaders, and <em>Intel<\/em>&#8217;s Kenneth Graunke had been working on enabling that for a while. 
When we realized that the vector <em>fp64<\/em> backend was going to require much more effort than we had initially thought, he gave a final push to the full scalar <em>gen8+<\/em> implementation, which in turn allowed us to complete <em>fp64<\/em> support for this hardware and expose <em>OpenGL 4.0<\/em> and, soon after, <em>OpenGL 4.3<\/em>.<\/p>\n<p>That does not mean that we don&#8217;t care about <em>gen7<\/em>, though. As I said above, the plan has always been to bring <em>fp64<\/em> and <em>OpenGL4<\/em> to <em>gen7<\/em> as well. In fact, we have been hard at work on that since before we started sending the <em>gen8+<\/em> implementation for review, and we have made good progress.<\/p>\n<p>Besides addressing the <em>fp64<\/em> quirks of <em>IvyBridge<\/em> and <em>Haswell<\/em> (yes, they have different implementation requirements), we also need to implement full <em>fp64<\/em> support in the vector backend from scratch, which, as I said, is not a trivial undertaking. Because <em>Haswell<\/em> seems to require fewer changes, we started with it, and I am happy to report that we already have a working version. In fact, we have already sent for review a small set of patches that implement <em>Haswell<\/em>&#8217;s requirements for the scalar backend, and as I write this I am cleaning up an initial implementation of the vector backend in preparation for review (currently at about 100 patches, but I hope to trim it down a bit before we start the review process). <em>IvyBridge<\/em> and <em>ValleyView<\/em> will come next.<\/p>\n<p>The initial implementation of the vector backend has room for improvement, since the focus was on getting it working first so that we could expose <em>OpenGL4<\/em> on <em>gen7<\/em> as soon as possible. 
The good news is that it is more or less clear how we can improve the implementation going forward (Curro wrote an excellent post on that topic <a href=\"https:\/\/bugs.freedesktop.org\/show_bug.cgi?id=92760#c82\" target=\"_blank\">here<\/a>).<\/p>\n<p>You might also be wondering about <em>OpenGL 4.1&#8217;s ARB_vertex_attrib_64bit<\/em>. After all, it goes hand in hand with <em>ARB_gpu_shader_fp64<\/em>, and we implemented that extension for <em>gen8+<\/em> too. There is good news here as well: my colleague Juan Su\u00e1rez has already implemented it for <em>Haswell<\/em>, and I would expect it to mostly work on <em>IvyBridge<\/em> as is or with minor tweaks. With that, we should be able to expose at least <em>OpenGL 4.2<\/em> on all <em>gen7<\/em> hardware once we are done.<\/p>\n<p>So far, implementing <em>ARB_gpu_shader_fp64<\/em> has been quite the ride, and in the process I have learned a lot of interesting stuff about how the <em>i965<\/em> driver and <em>Intel GPUs<\/em> operate. Hopefully, I&#8217;ll get to talk about all this in more detail at <em>XDC<\/em> later this year. 
If you are planning to attend and are interested in discussing this or other Mesa stuff with me, please find me there; I&#8217;ll be looking forward to it.<\/p>\n<p>Finally, I&#8217;d like to thank both Intel and Igalia for supporting my work on Mesa and i965 all this time; my Igalian friends Samuel Iglesias, who has been hard at work with me on the <em>fp64<\/em> implementation, and Juan Su\u00e1rez and Andr\u00e9s G\u00f3mez, who have done a lot of work to improve the <em>fp64<\/em> test suite in <em>Piglit<\/em>; and all the friends at <em>Intel<\/em> who have been helping us in the process, very especially Connor Abbott, Francisco Jerez, Jason Ekstrand and Kenneth Graunke.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In case you haven&#8217;t heard yet, with the recently announced Mesa 12.0 release, Intel gen8+ GPUs expose OpenGL 4.3, which is quite a big leap from the previous OpenGL 3.3! Although this might surprise some, the truth is that even if the i965 driver only exposed OpenGL 3.3 it had been exposing many of the &hellip; <a href=\"https:\/\/blogs.igalia.com\/itoral\/2016\/07\/13\/story-and-status-of-arb_gpu_shader_fp64-on-intel-gpus\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Story and status of ARB_gpu_shader_fp64 on Intel 
GPUs&#8221;<\/span><\/a><\/p>\n","protected":false},"author":16,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7],"tags":[],"class_list":["post-506","post","type-post","status-publish","format-standard","hentry","category-graphics"],"_links":{"self":[{"href":"https:\/\/blogs.igalia.com\/itoral\/wp-json\/wp\/v2\/posts\/506","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs.igalia.com\/itoral\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.igalia.com\/itoral\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.igalia.com\/itoral\/wp-json\/wp\/v2\/users\/16"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.igalia.com\/itoral\/wp-json\/wp\/v2\/comments?post=506"}],"version-history":[{"count":41,"href":"https:\/\/blogs.igalia.com\/itoral\/wp-json\/wp\/v2\/posts\/506\/revisions"}],"predecessor-version":[{"id":548,"href":"https:\/\/blogs.igalia.com\/itoral\/wp-json\/wp\/v2\/posts\/506\/revisions\/548"}],"wp:attachment":[{"href":"https:\/\/blogs.igalia.com\/itoral\/wp-json\/wp\/v2\/media?parent=506"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.igalia.com\/itoral\/wp-json\/wp\/v2\/categories?post=506"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.igalia.com\/itoral\/wp-json\/wp\/v2\/tags?post=506"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}