In the previous post I discussed the Mesa development environment and gave a few tips for newcomers, but before we start hacking on the code we should have a look at what modern GPUs look like, since that has a definite impact on the design and implementation of driver code. Let’s get to it.
Fixed Function vs Programmable hardware
Before the advent of shading languages like GLSL we did not have the option to program the 3D hardware at will. Instead, the hardware would have specific units dedicated to implementing certain operations (like vertex transformations) that could only be used through specific APIs, like those exposed by OpenGL. These units are usually labeled as Fixed Function, to differentiate them from the fully programmable units that modern GPUs also expose.
What we have now in modern GPUs is a fully programmable pipeline, where graphics developers can code graphics algorithms of various sorts in high-level programming languages like GLSL. These programs are then compiled and loaded into the GPU to execute specific tasks. This gives graphics developers a huge amount of freedom and power, since they are no longer limited to preset APIs exposing fixed functionality (like the old OpenGL lighting models, for example).
Modern graphics drivers
But of course all this flexibility and power that graphics developers enjoy today come at the expense of significantly more complex hardware and drivers, since the drivers are responsible for exposing all that flexibility to the developers while ensuring that we still obtain the best performance out of the hardware in each scenario.
Rather than merely acting as a bridge between a fixed API like OpenGL and fixed function hardware, drivers now also need to handle general purpose graphics programs written in high-level languages. This is a big change. In the case of OpenGL, this means that the driver needs to provide an implementation of the GLSL language, so suddenly the driver is required to incorporate a full compiler and deal with all sorts of problems that belong to the realm of compilers, like choosing an intermediate representation (IR) for the program code, performing optimization passes and generating native code for the GPU.
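To make the compiler side a little more concrete, here is a toy sketch in Python of one classic optimization pass, constant folding, applied to a tiny expression-tree IR. The tuple-based IR and the `fold_constants` helper are made up for this illustration and bear no relation to the IR that Mesa actually uses.

```python
import operator

# Hypothetical IR: a node is a float constant, a variable name (str),
# or a tuple (op, left, right) with op one of "+", "-", "*".
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_constants(node):
    """Recursively replace operations on two constants with their result."""
    if not isinstance(node, tuple):
        return node  # constants and variables cannot be simplified further
    op, left, right = node
    left, right = fold_constants(left), fold_constants(right)
    if isinstance(left, float) and isinstance(right, float):
        return OPS[op](left, right)
    return (op, left, right)

# x * (2.0 + 3.0) becomes x * 5.0, saving one addition at run time.
folded = fold_constants(("*", "x", ("+", 2.0, 3.0)))
print(folded)  # ('*', 'x', 5.0)
```

A real shader compiler runs many passes of this kind over a much richer IR before it emits native GPU instructions.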
Overview of a modern 3D pipeline
I have mentioned that modern GPUs expose fully programmable hardware units. These are called shading units, and the idea is that these units are connected in a pipeline so that the output of a shading unit becomes the input of the next. In this model, the application developer pushes vertices to one end of the pipeline and usually obtains rendered pixels on the other side. In between these two ends there are a number of units making this transition possible and a number of these will be programmable, which means that the graphics developer can control how these vertices are transformed into pixels at different stages.
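The idea of chaining stages so that each output feeds the next input can be sketched in a few lines of Python. The stage functions below are hypothetical placeholders that work on plain numbers; they only illustrate the data flow, not what real shading units compute.

```python
from functools import reduce

def run_pipeline(stages, data):
    """Feed data through each stage in order; each output is the next input."""
    return reduce(lambda value, stage: stage(value), stages, data)

# Hypothetical stand-ins for real pipeline stages.
def scale_stage(values):
    return [v * 2 for v in values]

def offset_stage(values):
    return [v + 1 for v in values]

result = run_pipeline([scale_stage, offset_stage], [1, 2, 3])
print(result)  # [3, 5, 7]
```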
The image below shows a simplified example of a 3D graphics pipeline, in this case as exposed by the OpenGL 4.3 specification. Let’s have a quick look at some of its main parts:
The OpenGL 4.3 3D pipeline (image via www.brightsideofnews.com)
Vertex Shader (VS)
This programmable shading unit takes vertices as input and produces vertices as output. Its main job is to transform these vertices in any way the graphics developer sees fit. Typically, this is where we would do transforms like vertex projection, rotation and translation and, generally, compute per-vertex attributes that we want to provide to later stages in the pipeline.
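As a rough illustration of this kind of per-vertex work, the Python sketch below rotates a position around the Z axis and then translates it. The `vertex_shader` function and its parameters are assumptions for this example; a real vertex shader would be written in GLSL and would normally use 4x4 matrices, including a projection.

```python
import math

def vertex_shader(position, angle, offset):
    """Rotate a 3D position around the Z axis, then translate it."""
    x, y, z = position
    c, s = math.cos(angle), math.sin(angle)
    rx, ry = c * x - s * y, s * x + c * y
    ox, oy, oz = offset
    return (rx + ox, ry + oy, z + oz)

# Each vertex is processed independently of the others.
triangle = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
shaded = [vertex_shader(v, math.pi / 2, (0.0, 0.0, -5.0)) for v in triangle]
```

Because no vertex depends on any other, this is exactly the kind of work that the hardware can run on many vertices at once.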
The vertex shader processes vertex data as provided by APIs like glDrawArrays or glDrawElements and outputs shaded vertices that will be assembled into primitives as indicated by the OpenGL draw command (GL_TRIANGLES, GL_LINES, etc).
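The grouping of shaded vertices into primitives can be pictured with the following sketch. The `assemble_primitives` helper is hypothetical and the draw modes are passed as plain strings; it covers only the simplest modes and assumes that leftover vertices which do not complete a primitive are ignored.

```python
def assemble_primitives(vertices, mode):
    """Group a flat vertex list into primitives, dropping incomplete ones."""
    size = {"GL_POINTS": 1, "GL_LINES": 2, "GL_TRIANGLES": 3}[mode]
    usable = len(vertices) - len(vertices) % size
    return [tuple(vertices[i:i + size]) for i in range(0, usable, size)]

vertices = ["v0", "v1", "v2", "v3", "v4", "v5"]
triangles = assemble_primitives(vertices, "GL_TRIANGLES")  # two triangles
lines = assemble_primitives(vertices, "GL_LINES")          # three lines
```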
Geometry Shader (GS)
Geometry shaders are similar to vertex shaders, but instead of operating on individual vertices, they operate on a geometry level (that is, a line, a triangle, etc), so they can take the output of the vertex shader as their input.
The geometry shader unit is programmable and can be used to add or remove vertices from a primitive, clip primitives, spawn entirely new primitives or modify the geometry of a primitive (like transforming triangles into quads or points into triangles, etc). Geometry shaders can also be used to implement basic tessellation even if dedicated tessellation units present in modern hardware are a better fit for this job.
In GLSL, some operations like layered rendering (which allows rendering to multiple textures in the same program) are only accessible through geometry shaders, although this is now also possible in vertex shaders via a particular extension.
The output of a geometry shader is also a set of primitives.
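To illustrate how a geometry stage can spawn new geometry, the sketch below expands each input point into two triangles that form a square. The function names and the 2D coordinates are assumptions for this example; a real geometry shader would be written in GLSL and emit vertices one by one.

```python
def point_to_quad(center, half_size):
    """Turn one point primitive into two triangles forming a square."""
    cx, cy = center
    h = half_size
    bl, br = (cx - h, cy - h), (cx + h, cy - h)
    tl, tr = (cx - h, cy + h), (cx + h, cy + h)
    return [(bl, br, tr), (bl, tr, tl)]

def geometry_stage(points, half_size):
    """Run the per-primitive function over every input point."""
    out = []
    for p in points:
        out.extend(point_to_quad(p, half_size))
    return out

triangles = geometry_stage([(0.0, 0.0), (4.0, 4.0)], 1.0)
print(len(triangles))  # 4
```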
Rasterization
So far all the stages we discussed manipulated vertices and geometry. At some point, however, we need to render pixels. For this, primitives need to be rasterized, which is the process by which they are broken into individual fragments that will then be colored by a fragment shader and eventually turn into pixels in a frame buffer. Rasterization is handled by the rasterizer fixed function unit.
The rasterization process also assigns depth information to these fragments. This information is necessary when we have a 3D scene where multiple polygons overlap on the screen and we need to decide which polygon’s fragments should be rendered and which should be discarded because they are hidden by other polygons.
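How depth values settle which fragment wins a pixel can be sketched as follows. This is a simplified model with a hypothetical `resolve_depth` helper; it assumes that a smaller depth value means closer to the viewer and ignores blending and configurable depth functions.

```python
def resolve_depth(fragments):
    """Keep, for every pixel, the color of the fragment with the smallest depth."""
    nearest = {}
    for (x, y), depth, color in fragments:
        if (x, y) not in nearest or depth < nearest[(x, y)][0]:
            nearest[(x, y)] = (depth, color)
    return {pixel: color for pixel, (depth, color) in nearest.items()}

fragments = [
    ((0, 0), 0.8, "blue"),  # fragment from a polygon that is further away
    ((0, 0), 0.3, "red"),   # nearer polygon covering the same pixel
    ((1, 0), 0.5, "blue"),
]
visible = resolve_depth(fragments)
print(visible[(0, 0)])  # red
```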
Finally, the rasterization also interpolates per-vertex attributes in order to compute the corresponding fragment values. For example, let’s say that we have a line primitive where each vertex has a different color attribute, one red and one green. For each fragment in the line the rasterizer will compute interpolated color values by combining red and green depending on how close or far the fragments are to each vertex. With this, we will obtain red fragments on the side of the red vertex that will smoothly transition to green as we move closer to the green vertex.
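The red-to-green line can be reproduced with a simple linear blend. This is only a sketch: `interpolate_color` is a hypothetical helper, and real hardware typically applies perspective correction on top of this basic scheme.

```python
def interpolate_color(color_a, color_b, t):
    """Linearly blend two RGB colors; t=0 gives color_a, t=1 gives color_b."""
    return tuple((1 - t) * a + t * b for a, b in zip(color_a, color_b))

RED, GREEN = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)

# Five fragments evenly spaced along the line from the red to the green vertex.
line = [interpolate_color(RED, GREEN, i / 4) for i in range(5)]
print(line[2])  # (0.5, 0.5, 0.0)
```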
In summary, the inputs of the rasterizer are the primitives coming from a vertex, tessellation or geometry shader, and its outputs are the fragments that build the primitive’s surface as projected on the screen, including color, depth and other interpolated per-vertex attributes.
Fragment Shader (FS)
The programmable fragment shader unit takes the fragments produced by the rasterization process and executes an algorithm provided by a graphics developer to compute the final color, depth and stencil values for each fragment. This unit can be used to achieve numerous visual effects, including all kinds of post-processing filters. It is also usually where we will sample textures to color polygon surfaces.
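As a small illustration of texture sampling in a fragment stage, the sketch below looks up a color using interpolated texture coordinates. The dictionary-based fragment, the `sample_nearest` helper and the string colors are all assumptions for this example; real hardware offers several filtering modes beyond this nearest-neighbor lookup.

```python
def sample_nearest(texture, u, v):
    """Nearest-neighbor lookup with u, v in [0, 1]; texture is a list of rows."""
    height, width = len(texture), len(texture[0])
    x = min(int(u * width), width - 1)
    y = min(int(v * height), height - 1)
    return texture[y][x]

def fragment_shader(fragment, texture):
    """Color a fragment from its interpolated texture coordinates."""
    return sample_nearest(texture, fragment["u"], fragment["v"])

checker = [["white", "black"],
           ["black", "white"]]
color = fragment_shader({"u": 0.75, "v": 0.25}, checker)
print(color)  # black
```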
This covers some of the most important elements in the 3D graphics pipeline and should be sufficient, for now, to understand some of the basics of a driver. Notice, however, that I have not covered things like transform feedback, tessellation or compute shaders. I hope I can get to cover some of these in future posts.
But before we are done with the overview of the 3D pipeline we should cover another topic that is fundamental to how the hardware works: parallelization.
Parallelization
Graphics processing is a very resource-demanding task. We are continuously updating and redrawing our graphics 30 or 60 times per second. For a full HD resolution of 1920×1080 that means that we need to redraw over 2 million pixels in each go (124,416,000 pixels per second if we are doing 60 FPS). That’s a lot.
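The arithmetic behind those figures is easy to check with a few lines of Python:

```python
WIDTH, HEIGHT = 1920, 1080

pixels_per_frame = WIDTH * HEIGHT  # 2,073,600 pixels in one full HD frame
for fps in (30, 60):
    print(f"{fps} FPS: {pixels_per_frame * fps:,} pixels per second")
# 30 FPS: 62,208,000 pixels per second
# 60 FPS: 124,416,000 pixels per second
```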
To cope with this, the architecture of GPUs is massively parallel, which means that the pipeline can process many vertices or pixels simultaneously. For example, in the case of the Intel Haswell GPUs, programmable units like the VS and GS have multiple Execution Units (EUs), each with its own set of ALUs, and can spawn up to 70 threads each, while the fragment shader can spawn up to 102 threads. But that is not the only source of parallelism: each thread may handle multiple objects (vertices or pixels, depending on the case) at the same time. For example, a VS thread in Intel hardware can shade two vertices simultaneously, while an FS thread can shade up to 8 (SIMD8) or 16 (SIMD16) pixels in one go.
Some of these means of parallelism are relatively transparent to the driver developer and some are not. For example, SIMD8 vs SIMD16 or single vertex shading vs double vertex shading require specific configuration and driver code that is aligned with the selected configuration. Threads are more transparent, but in certain situations the driver developer may need to be careful when writing code that requires a sync between all running threads, which would obviously hurt performance, or at least take care to do that kind of thing where it hurts performance the least.
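To get a feeling for what the SIMD width means in practice, the sketch below counts how many fragment shader thread dispatches a full HD frame would need if every thread shaded a full batch of pixels. This is a deliberately naive model with a hypothetical `threads_needed` helper; it assumes each pixel is shaded exactly once and ignores how the hardware really groups pixels.

```python
def threads_needed(pixel_count, simd_width):
    """Thread dispatches required if each one shades simd_width pixels."""
    return (pixel_count + simd_width - 1) // simd_width  # integer ceiling

full_hd = 1920 * 1080
simd8 = threads_needed(full_hd, 8)
simd16 = threads_needed(full_hd, 16)
print(simd8, simd16)  # 259200 129600
```

Under these assumptions SIMD16 halves the number of dispatches, which is one reason why the choice of mode matters to the driver.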
Coming up next
So that was a very brief introduction to what modern 3D pipelines look like. There is still plenty of stuff I have not covered, but I think we can go through a lot of that in later posts as we dig deeper into the driver code. My next post will discuss how Mesa models several of the programmable pipeline stages I have introduced here, so stay tuned!