Free access to Valve-produced games on Steam for Mesa contributors

Just like they did for Debian developers before, this is Valve’s way of saying thanks and giving something back to the community. This is great news for all Mesa contributors: now we can play some great Valve games for free, and we can also have an easier time looking into bug reports for them, which also works great for Valve, closing a perfect circle 🙂

An introduction to Mesa’s GLSL compiler (II)

Recap

My previous post served as an initial look into Mesa’s GLSL compiler, where we discussed the Mesa IR, which is a core aspect of the compiler. In this post I’ll introduce another relevant aspect: IR lowering.

IR lowering

There are multiple lowering passes implemented in Mesa (check src/glsl/lower_*.cpp for a complete list) but they all share a common denominator: their purpose is to re-write certain constructs in the IR so that they better fit the underlying GPU hardware.

In this post we will look into the lower_instructions.cpp lowering pass, which rewrites expression operations that may not be supported directly by the GPU hardware in terms of other operations that are.

The lowering process involves traversing the IR, identifying the instructions we want to lower and modifying the IR accordingly, which fits well into the visitor pattern strategy discussed in my previous post. In this case, expression lowering is handled by the lower_instructions_visitor class, which implements the lowering pass in the visit_leave() method for ir_expression nodes.

The hierarchical visitor class, which serves as the base class for most visitors in Mesa, defines visit() methods for leaf nodes in the IR tree, and visit_leave()/visit_enter() methods for non-leaf nodes. This way, when traversing intermediary nodes in the IR we can decide to take action as soon as we enter them or when we are about to leave them.

In the case of our lower_instructions_visitor class, the visit_leave() method implementation is a large switch() statement with all the operators that it can lower.
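As an illustration of the overall structure, below is a simplified sketch (not the literal Mesa code; the exact cases and helper names vary between Mesa versions) of how that switch dispatches to the individual lowering methods:

ir_visitor_status
lower_instructions_visitor::visit_leave(ir_expression *ir)
{
   switch (ir->operation) {
   case ir_binop_sub:
      if (lowering(SUB_TO_ADD_NEG))
         sub_to_add_neg(ir);
      break;
   case ir_binop_mod:
      if (lowering(MOD_TO_FLOOR))
         mod_to_floor(ir);
      break;
   /* ... one case per operator that can be lowered ... */
   default:
      break;
   }

   return visit_continue;
}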

The code in this file lowers common scenarios that are expected to be useful for most GPU drivers, but individual drivers can still select which of these lowering passes they want to use. For this purpose, hardware drivers call lower_instructions(), passing the IR and the set of lowering passes they want enabled. For example, the Intel i965 driver does:

const int bitfield_insert = brw->gen >= 7
                            ? BITFIELD_INSERT_TO_BFM_BFI
                            : 0;
lower_instructions(shader->base.ir,
                   MOD_TO_FLOOR |
                   DIV_TO_MUL_RCP |
                   SUB_TO_ADD_NEG |
                   EXP_TO_EXP2 |
                   LOG_TO_LOG2 |
                   bitfield_insert |
                   LDEXP_TO_ARITH);

Notice how in the case of Intel GPUs, one of the lowering passes is conditionally selected depending on the hardware involved. In this case, brw->gen >= 7 selects GPU generations from Ivy Bridge onwards.

Let’s have a look at the implementation of some of these lowering passes. For example, SUB_TO_ADD_NEG is a very simple one that transforms subtractions into negative additions:

void
lower_instructions_visitor::sub_to_add_neg(ir_expression *ir)
{
   ir->operation = ir_binop_add;
   ir->operands[1] =
      new(ir) ir_expression(ir_unop_neg, ir->operands[1]->type,
                            ir->operands[1], NULL);
   this->progress = true;
}

As we can see, the lowering pass simply changes the operator used by the ir_expression node, and negates the second operand using the unary negate operator (ir_unop_neg), thus, converting the original a = b - c into a = b + (-c).

Of course, if a driver does not have native support for the subtraction operation, it could still do this when it processes the IR to produce native code, but this way Mesa is saving driver developers that work. Also, some lowering passes may enable optimization passes after the lowering that drivers might miss otherwise.

Let’s see a more complex example: MOD_TO_FLOOR. In this case the lowering pass provides an implementation of ir_binop_mod (modulo) for GPUs that don’t have a native modulo operation.

The modulo operation takes two operands (op0, op1) and implements the C equivalent of ‘op0 % op1‘, that is, it computes the remainder of the division of op0 by op1. To achieve this the lowering pass rewrites the modulo operation as mod(op0, op1) = op0 - op1 * floor(op0 / op1), which requires only division, floor, multiplication and subtraction operations. For example, mod(7.0, 3.0) = 7.0 - 3.0 * floor(7.0 / 3.0) = 7.0 - 3.0 * 2.0 = 1.0. This is the implementation:

ir_variable *x = new(ir) ir_variable(ir->operands[0]->type, "mod_x",
                                     ir_var_temporary);
ir_variable *y = new(ir) ir_variable(ir->operands[1]->type, "mod_y",
                                     ir_var_temporary);
this->base_ir->insert_before(x);
this->base_ir->insert_before(y);

ir_assignment *const assign_x =
   new(ir) ir_assignment(new(ir) ir_dereference_variable(x),
                         ir->operands[0], NULL);
ir_assignment *const assign_y =
   new(ir) ir_assignment(new(ir) ir_dereference_variable(y),
                         ir->operands[1], NULL);

this->base_ir->insert_before(assign_x);
this->base_ir->insert_before(assign_y);

ir_expression *const div_expr =
   new(ir) ir_expression(ir_binop_div, x->type,
                         new(ir) ir_dereference_variable(x),
                         new(ir) ir_dereference_variable(y));

/* Don't generate new IR that would need to be lowered in an additional
 * pass.
 */
if (lowering(DIV_TO_MUL_RCP) && (ir->type->is_float() ||
    ir->type->is_double()))
   div_to_mul_rcp(div_expr);

ir_expression *const floor_expr =
   new(ir) ir_expression(ir_unop_floor, x->type, div_expr);

if (lowering(DOPS_TO_DFRAC) && ir->type->is_double())
   dfloor_to_dfrac(floor_expr);

ir_expression *const mul_expr =
   new(ir) ir_expression(ir_binop_mul,
                         new(ir) ir_dereference_variable(y),
                         floor_expr);

ir->operation = ir_binop_sub;
ir->operands[0] = new(ir) ir_dereference_variable(x);
ir->operands[1] = mul_expr;
this->progress = true;

Notice how the first thing this does is to assign the operands to variables. The reason for this is a bit tricky: since we are going to implement ir_binop_mod as op0 - op1 * floor(op0 / op1), we will need to refer to the IR nodes op0 and op1 twice in the tree. However, we can’t just do that directly, because that would mean that we have the same node (that is, the same pointer) linked from two different places in the IR expression tree. That is, we want to have this tree:

                             sub
                           /     \
                        op0       mult
                                 /    \
                              op1     floor
                                        |
                                       div
                                      /   \
                                   op0     op1

Instead of this other tree:


                             sub
                           /     \
                           |      mult
                           |     /   \
                           |   floor  |
                           |     |    |
                           |    div   |
                           |   /   \  |
                            op0     op1   

This second version of the tree is problematic. For example, let’s say that a hypothetical optimization pass detects that op1 is a constant integer with value 1, and realizes that in this case div(op0, op1) == op0. When doing that optimization, our div subtree is removed, and with that, op1 could be removed too (and possibly freed), leaving the other reference to that operand in the IR pointing to an invalid memory location… we have just corrupted our IR:

                             sub
                           /     \
                           |      mult
                           |     /    \
                           |   floor   op1 [invalid pointer reference]
                           |     |
                           |    /
                           |   /
                            op0 

Instead, what we want to do here is to clone the nodes each time we need a new reference to them in the IR. All IR nodes have a clone() method for this purpose. However, in this particular case, cloning the nodes creates a new problem: op0 and op1 can be arbitrary ir_expression nodes, so, for example, op0 could be the expression a + b * c, and cloning it would produce suboptimal code where the expression gets replicated. At best, this will lead to slower compilation times due to optimization passes needing to detect and fix the duplication; at worst, it would go undetected by the optimizer and lead to worse performance, since we would compute the value of the expression multiple times:

                              sub
                           /        \
                         add         mult
                        /   \       /    \
                      a     mult  op1     floor
                            /   \          |
                           b     c        div
                                         /   \
                                      add     op1
                                     /   \
                                    a    mult
                                        /    \
                                        b     c

The solution to this problem is to assign the expression to a variable, then dereference that variable (i.e., read its value) wherever we need. Thus, the implementation defines two variables (x, y), assigns op0 and op1 to them and creates new dereference nodes wherever we need to access the value of the op0 and op1 expressions:

                       =               =
                     /   \           /   \
                    x     op0       y     op1


                             sub
                           /     \
                         *x       mult
                                 /    \
                               *y     floor
                                        |
                                       div
                                      /   \
                                    *x     *y

In the diagram above, each variable dereference is marked with an ‘*’, and each one is a new IR node (so the two appearances of ‘*x’ are actually different IR nodes, each representing a separate read of the same variable). With this solution we only evaluate the op0 and op1 expressions once (when they get assigned to the corresponding variables) and we never refer to the same IR node twice from different places (since each variable dereference is a new IR node).

Now that we know why we assign these two variables, let’s continue looking at the code of the lowering pass:

In the next step we implement op0 / op1 using an ir_binop_div expression. To speed up compilation, if the driver has the DIV_TO_MUL_RCP lowering pass enabled, which transforms a / b into a * (1 / b) (where the reciprocal 1 / b could be a native instruction), we immediately execute the lowering pass for that expression. If we didn’t do this here, the resulting IR would contain a division operation that might have to be lowered in a later pass, making the compilation process slower.

The next step uses an ir_unop_floor expression to compute floor(op0 / op1), and again, tests if this operation should be lowered too, which might be the case if the type of the operands is a 64-bit double instead of a regular 32-bit float, since GPUs may only have a native floor instruction for 32-bit floats.

Next, we multiply the result by op1 to get op1 * floor(op0 / op1).

Now we only need to subtract this from op0 at the root IR node of the expression. Since we want the new IR subtree spawning from this root node to replace the old implementation, we directly edit the IR node we are lowering: we replace the ir_binop_mod operator with ir_binop_sub, put a dereference to op0 (through the x variable) in the first operand and link the expression holding op1 * floor(op0 / op1) as the second operand, effectively attaching our new implementation in place of the old version. This is how the original and lowered IR look:

Original IR:

[prev inst] -> mod -> [next inst]
              /   \            
           op0     op1         

Lowered IR:

[prev inst] -> var x -> var y ->   =   ->   =   ->   sub   -> [next inst]
                                  / \      / \      /   \
                                 x  op0   y  op1  *x     mult
                                                        /    \
                                                      *y      floor
                                                                |
                                                               div
                                                              /   \
                                                            *x     *y

Finally, we set this->progress to true to let the compiler know that we have modified the IR and that, as a consequence, we may have introduced new nodes that are themselves subject to further lowering passes, so it can run another pass over the result. For example, the subtraction we just added may be lowered again to a negative addition as we have seen before.

Coming up next

Now that we have learned about lowering passes we can also discuss optimization passes, which are very similar: they are also based on Mesa’s visitor infrastructure and they also transform the Mesa IR in a similar way.

An introduction to Mesa’s GLSL compiler (I)

Recap

In my last post I explained that modern 3D pipelines are programmable and how this has impacted graphics drivers. In the following posts we will go deeper into this aspect by looking at different parts of Mesa’s GLSL compiler. Specifically, this post will cover the GLSL parser, the Mesa IR and built-in variables and functions.

The GLSL parser

The job of the parser is to process the shader source code string provided via glShaderSource and transform it into a suitable binary representation that is stored in RAM and can be efficiently processed by other parts of the compiler in later stages.

The parser consists of a set of Lex/Yacc rules to process the incoming shader source. The lexer (glsl_parser.ll) takes care of tokenizing the source code and the parser (glsl_parser.yy) adds meaning to the stream of tokens identified in
the lexer stage.

Just like C and C++, GLSL includes a pre-processor that goes through the shader source code before the main parser kicks in. Mesa’s implementation of the GLSL pre-processor lives in src/glsl/glcpp and is also based on Lex/Yacc rules.

The output of the parser is an Abstract Syntax Tree (AST), an in-memory binary representation of the shader source code. The nodes that make up this tree are defined in src/glsl/ast.h.

For someone familiar with all the Lex/Yacc stuff, the parser implementation in Mesa should feel familiar enough.

The next step takes care of converting from the AST to a different representation that is better suited for the kind of operations that drivers will have to do with it. This new representation, called the IR (Intermediate Representation), is usually referenced in Mesa as Mesa IR, GLSL IR or simply HIR.

The AST to Mesa IR conversion is driven by the code in src/glsl/ast_to_hir.cpp.

Mesa IR

The Mesa IR is the main data structure used in the compiler. Most of the work that the compiler does can be summarized as:

  • Optimizations in the IR
  • Modifications in the IR for better/easier integration with GPU hardware
  • Linking multiple shaders (multiple IR instances) into a single program.
  • Generating native assembly code for the target GPU from the IR

As we can see, the Mesa IR is at the core of all the work that the compiler has to do, so understanding how it is setup is necessary to work in this part of Mesa.

The nodes in the Mesa IR tree are defined in src/glsl/ir.h. Let’s have a look at the most important ones:

At the top of the class hierarchy for the IR nodes we have exec_node, which is Mesa’s way of linking independent instructions together in a list to make a program. This means that each instruction has previous and next pointers to the instructions that come before and after it respectively. So ir_instruction, the base class for all nodes in the tree, inherits from exec_node.
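A rough sketch of that relationship, heavily simplified from the actual definitions in src/glsl/list.h and src/glsl/ir.h, looks like this:

/* Simplified for illustration; the real classes have many more members
 * and methods.
 */
struct exec_node {
   struct exec_node *next;
   struct exec_node *prev;
};

class ir_instruction : public exec_node {
public:
   enum ir_node_type ir_type;
   /* ... */
};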

Another important node is ir_rvalue, which is the base class used to represent expressions. Generally, anything that can go on the right side of an assignment is an ir_rvalue. Subclasses of ir_rvalue include ir_expression, used to represent all kinds of unary, binary or ternary operations (supported operators are defined in the ir_expression_operation enumeration); ir_texture, which is used to represent texture operations like a texture lookup; ir_swizzle, which is used for swizzling values in vectors; the ir_dereference nodes, used to access the values stored in variables, arrays, structs, etc.; and ir_constant, used to represent constants of all basic types (bool, float, integer, etc.).
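To make this more concrete, here is a hypothetical snippet showing how the IR for a simple expression like ‘a + b’ could be built by hand. It assumes that ‘a’ and ‘b’ are existing float ir_variable nodes and that ‘mem_ctx’ is a valid memory context to allocate from:

/* Each read of a variable is a dereference node... */
ir_dereference_variable *deref_a = new(mem_ctx) ir_dereference_variable(a);
ir_dereference_variable *deref_b = new(mem_ctx) ir_dereference_variable(b);

/* ...and the addition is an ir_expression that takes the operator, the
 * type of the result and the two operands.
 */
ir_expression *add =
   new(mem_ctx) ir_expression(ir_binop_add, glsl_type::float_type,
                              deref_a, deref_b);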

We also have ir_variable, which represents variables in the shader code. Notice that the definition of ir_variable is quite large… in fact, this is by far the node with the biggest impact on the memory footprint of the compiler when compiling shaders in large games/applications. Also notice that the IR differentiates between variables and variable dereferences (the act of looking into a variable’s value), which are represented as an ir_rvalue.

Similarly, the IR also defines nodes for other language constructs like ir_loop, ir_if, ir_assignment, etc.

Debugging the IR is not easy, since the representation of a shader program in IR nodes can be quite complex to traverse and inspect with a debugger. To help with this Mesa provides means to print the IR to a human-readable text format. We can enable this by using the environment variable MESA_GLSL=dump. This will instruct Mesa to print both the original shader source code and its IR representation. For example:

$ MESA_GLSL=dump ./test_program

GLSL source for vertex shader 1:
#version 140
#extension GL_ARB_explicit_attrib_location : enable

layout(location = 0) in vec3 inVertexPosition;
layout(location = 1) in vec3 inVertexColor;

uniform mat4 MVP;
smooth out vec3 out0;

void main()
{
  gl_Position = MVP * vec4(inVertexPosition, 1);
  out0 = inVertexColor;
}

GLSL IR for shader 1:
(
(declare (sys ) int gl_InstanceID)
(declare (sys ) int gl_VertexID)
(declare (shader_out ) (array float 0) gl_ClipDistance)
(declare (shader_out ) float gl_PointSize)
(declare (shader_out ) vec4 gl_Position)
(declare (uniform ) (array vec4 56) gl_CurrentAttribFragMESA)
(declare (uniform ) (array vec4 33) gl_CurrentAttribVertMESA)
(declare (uniform ) gl_DepthRangeParameters gl_DepthRange)
(declare (uniform ) int gl_NumSamples)
(declare () int gl_MaxVaryingComponents)
(declare () int gl_MaxClipDistances)
(declare () int gl_MaxFragmentUniformComponents)
(declare () int gl_MaxVaryingFloats)
(declare () int gl_MaxVertexUniformComponents)
(declare () int gl_MaxDrawBuffers)
(declare () int gl_MaxTextureImageUnits)
(declare () int gl_MaxCombinedTextureImageUnits)
(declare () int gl_MaxVertexTextureImageUnits)
(declare () int gl_MaxVertexAttribs)
(declare (shader_in ) vec3 inVertexPosition)
(declare (shader_in ) vec3 inVertexColor)
(declare (uniform ) mat4 MVP)
(declare (shader_out smooth) vec3 out0)
(function main
  (signature void
    (parameters
    )
    (
      (declare (temporary ) vec4 vec_ctor)
      (assign  (w) (var_ref vec_ctor)  (constant float (1.000000)) ) 
      (assign  (xyz) (var_ref vec_ctor)  (var_ref inVertexPosition) ) 
      (assign  (xyzw) (var_ref gl_Position)
            (expression vec4 * (var_ref MVP) (var_ref vec_ctor) ) ) 
      (assign  (xyz) (var_ref out0)  (var_ref inVertexColor) ) 
    ))
)
)

Notice, however, that the IR representation we get is not the one that is produced by the parser. As we will see later, that initial IR will be modified in multiple ways by Mesa, for example by adding different kinds of optimizations, so the IR that we see is the result after all these processing passes over the original IR. Mesa refers to this post-processed version of the IR as LIR (low-level IR) and to the initial version of the IR as produced by the parser as HIR (high-level IR). If we want to print the HIR (or any intermediary version of the IR as it transforms into the final LIR), we can edit the compiler and add calls to _mesa_print_ir as needed.
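For example, a hypothetical debugging call placed somewhere in the compiler could look like the snippet below. Keep in mind that the exact signature of _mesa_print_ir() has changed across Mesa versions (check ir_print_visitor.h in your tree), but the idea is to hand it the exec_list with the IR instructions of the shader we are interested in:

/* Dump the current state of the IR for this shader to stdout. Here
 * 'shader->ir' is the exec_list holding the IR and 'state' is the GLSL
 * parse state; both names are illustrative.
 */
_mesa_print_ir(stdout, shader->ir, state);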

Traversing the Mesa IR

We mentioned before that some of the compiler’s work (a big part, in fact) has to do with optimizations and modifications of the IR. This means that the compiler needs to traverse the IR tree and identify subtrees that are relevant to this kind of operations. To achieve this, Mesa uses the visitor design pattern.

Basically, the idea is that we have a visitor object that can traverse the IR tree and we can define the behavior we want to execute when it finds specific nodes.

For instance, there is a very simple example of this in src/glsl/linker.cpp: find_deref_visitor, which detects if a variable is ever read. This involves traversing the IR, identifying ir_dereference_variable nodes (the ones where a variable’s value is accessed) and checking whether the name of that variable matches the one we are looking for. Here is the visitor class definition:

/**
 * Visitor that determines whether or not a variable is ever read.
 */
class find_deref_visitor : public ir_hierarchical_visitor {
public:
   find_deref_visitor(const char *name)
      : name(name), found(false)
   {
      /* empty */
   }

   virtual ir_visitor_status visit(ir_dereference_variable *ir)
   {
      if (strcmp(this->name, ir->var->name) == 0) {
         this->found = true;
         return visit_stop;
      }

      return visit_continue;
   }

   bool variable_found() const
   {
      return this->found;
   }

private:
   const char *name;       /**< Find reads of a variable with this name. */
   bool found;             /**< Was a read of the variable found? */
};

And this is how we get to use this, for example to check if the shader code ever reads gl_Vertex:

find_deref_visitor find("gl_Vertex");
find.run(sh->ir);
if (find.variable_found()) {
  (...)
}

Most optimization and lowering passes in Mesa are implemented as visitors and follow a similar idea. We will look at examples of these in a later post.

Built-in variables and functions

GLSL defines a set of built-in variables (with ‘gl_’ prefix) for each shader stage which Mesa injects into the shader code automatically. If you look at the example where we used MESA_GLSL=dump to obtain the generated Mesa IR you can see some of these variables.

Mesa implements support for built-in variables in _mesa_glsl_initialize_variables(), defined in src/glsl/builtin_variables.cpp.

Notice that some of these variables are common to all shader stages, while some are specific to particular stages or available only in specific versions of GLSL.

Depending on the type of variable, Mesa or the hardware driver may be able to provide the value immediately (for example for variables holding constant values like gl_MaxVertexAttribs or gl_MaxDrawBuffers). Otherwise, the driver will probably have to fetch (or generate) the value for the variable from the hardware at program run-time by generating native code that is added to the user program. For example, a geometry shader that uses gl_PrimitiveID will need that variable updated for each primitive processed by the Geometry Shader unit in a draw call. To achieve this, a driver might have to generate native code that fetches the current primitive ID value from the hardware and stores it in the register that provides the storage for the gl_PrimitiveID variable before the user code is executed.

The GLSL language also defines a number of built-in functions that must be provided by implementations, like texture(), mix(), or dot(), to name a few examples. The entry point in Mesa’s GLSL compiler for built-in functions is src/glsl/builtin_functions.cpp.

The method builtin_builder::create_builtins() takes care of registering built-in functions, and just like with built-in variables, not all functions are always available: some functions may only be available in certain shading units, others may only be available in certain GLSL versions, etc. For that purpose, each built-in function is registered with a predicate that can be used to test if that function is at all available in a specific scenario.

Built-in functions are registered by calling the add_function() method, which registers all versions of a specific function, for example mix() for float, vec2, vec3, vec4, etc. Each of these versions has its own availability predicate. For instance, mix() is always available for float arguments, but using it with integers requires GLSL 1.30 and the EXT_shader_integer_mix extension.
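To illustrate the idea, an availability predicate is just a function that inspects the parse state and decides whether the built-in can be used. The following is a sketch of what the predicate for integer mix() could look like (this is an approximation for illustration, not the literal code in builtin_functions.cpp):

/* Illustrative check: GLSL 1.30 on desktop (3.00 on ES) plus the
 * EXT_shader_integer_mix extension flag.
 */
static bool
shader_integer_mix(const _mesa_glsl_parse_state *state)
{
   return state->is_version(130, 300) &&
          state->EXT_shader_integer_mix_enable;
}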

Besides the availability predicate, add_function() also takes an ir_function_signature, which tells Mesa about the specific signature of the function being registered. Notice that when Mesa creates signatures for the functions it also defines the function body. For example, the following code snippet defines the signature for modf():

ir_function_signature *
builtin_builder::_modf(builtin_available_predicate avail,
                       const glsl_type *type)
{
   ir_variable *x = in_var(type, "x");
   ir_variable *i = out_var(type, "i");
   MAKE_SIG(type, avail, 2, x, i);

   ir_variable *t = body.make_temp(type, "t");
   body.emit(assign(t, expr(ir_unop_trunc, x)));
   body.emit(assign(i, t));
   body.emit(ret(sub(x, t)));

   return sig;
}

GLSL’s modf() splits a number in its integer and fractional parts. It assigns the integer part to an output parameter and the function return value is the fractional part.

The signature above defines an input parameter ‘x’ of type ‘type’ (the number we want to split), an output parameter ‘i’ of the same type (which will hold the integer part of ‘x’) and a return value of type ‘type’ (the fractional part).

The function implementation is based on the existence of the unary operator ir_unop_trunc, which can take a number and extract its integer part. Then it computes the fractional part by subtracting that from the original number.

When the modf() built-in function is used, the call will be expanded to include this IR code, which will later be transformed into native code for the GPU by the corresponding hardware driver. In this case, it means that the hardware driver is expected to provide an implementation of the ir_unop_trunc operator, for example, which in the case of the Intel i965 driver is implemented as a single hardware instruction (see brw_vec4_visitor.cpp or brw_fs_visitor.cpp
in src/mesa/drivers/dri/i965).

In some cases, the implementation of a built-in function can’t be defined at the IR level. In this case the implementation simply emits an ad-hoc IR node that drivers can identify and expand appropriately. An example of this is EmitVertex() in a geometry shader. This is not really a function call in the traditional sense, but a way to signal the driver that we have defined all the attributes of a vertex and it is time to “push” that vertex into the current primitive. The meaning of “pushing the vertex” is something that can’t be defined at the IR level because it will be different for each driver/hardware. Because of that, the built-in function simply injects an IR node ir_emit_vertex that drivers can identify and implement properly when the time comes. In the case of the Intel code, pushing a vertex involves a number of steps that are very intertwined with the hardware, but it basically amounts to generating native code that implements the behavior that the hardware expects for that to happen. If you are curious, the implementation of this in the i965 driver code can be found in brw_vec4_gs_visitor.cpp, in the visit() method that takes an ir_emit_vertex IR node as parameter.

Coming up next

In this post we discussed the parser, which is the entry point for the compiler, and introduced the Mesa IR, the main data structure. In following posts we will delve deeper into the GLSL compiler implementation. Specifically, we will look into the lowering and optimization passes as well as the linking process and the hooks for hardware drivers that deal with native code generation.

A brief overview of the 3D pipeline

Recap

In the previous post I discussed the Mesa development environment and gave a few tips for newcomers, but before we start hacking on the code we should have a look at what modern GPUs look like, since that has a definite impact on the design and implementation of driver code. Let’s get to it.

Fixed Function vs Programmable hardware

Before the advent of shading languages like GLSL we did not have the option to program the 3D hardware at will. Instead, the hardware would have specific units dedicated to implement certain operations (like vertex transformations) that could only be used through specific APIs, like those exposed by OpenGL. These units are usually labeled as Fixed Function, to differentiate them from modern GPUs that also expose fully programmable units.

What we have now in modern GPUs is a fully programmable pipeline, where graphics developers can code graphics algorithms of various sorts in high level programming languages like GLSL. These programs are then compiled and loaded into the GPU to execute specific tasks. This gives graphics developers a huge amount of freedom and power, since they are no longer limited to preset APIs exposing fixed functionality (like the old OpenGL lighting models for example).

Modern graphics drivers

But of course all this flexibility and power that graphics developers enjoy today come at the expense of significantly more complex hardware and drivers, since the drivers are responsible for exposing all that flexibility to the developers while ensuring that we still obtain the best performance out of the hardware in each scenario.

Rather than just acting as a bridge between a fixed API like OpenGL and fixed function hardware, drivers now also need to handle general purpose graphics programs written in high-level languages. This is a big change. In the case of OpenGL, this means that the driver needs to provide an implementation of the GLSL language, so suddenly, the driver is required to incorporate a full compiler and deal with all sorts of problems that belong to the realm of compilers, like choosing an intermediary representation for the program code (IR), performing optimization passes and generating native code for the GPU.

Overview of a modern 3D pipeline

I have mentioned that modern GPUs expose fully programmable hardware units. These are called shading units, and the idea is that these units are connected in a pipeline so that the output of a shading unit becomes the input of the next. In this model, the application developer pushes vertices to one end of the pipeline and usually obtains rendered pixels on the other side. In between these two ends there are a number of units making this transition possible and a number of these will be programmable, which means that the graphics developer can control how these vertices are transformed into pixels at different stages.

The image below shows a simplified example of a 3D graphics pipeline, in this case as exposed by the OpenGL 4.3 specification. Let’s have a quick look at some of its main parts:


The OpenGL 4.3 3D pipeline (image via www.brightsideofnews.com)

Vertex Shader (VS)

This programmable shading unit takes vertices as input and produces vertices as output. Its main job is to transform these vertices in any way the graphics developer sees fit. Typically, this is where we would do transforms like vertex projection, rotation and translation and, generally, compute per-vertex attributes that we want to provide to later stages in the pipeline.

The vertex shader processes vertex data as provided by APIs like glDrawArrays or glDrawElements and outputs shaded vertices that will be assembled into primitives as indicated by the OpenGL draw command (GL_TRIANGLES, GL_LINES, etc).

Geometry Shader

Geometry shaders are similar to vertex shaders, but instead of operating on individual vertices, they operate on a geometry level (that is, a line, a triangle, etc), so they can take the output of the vertex shader as their input.

The geometry shader unit is programmable and can be used to add or remove vertices from a primitive, clip primitives, spawn entirely new primitives or modify the geometry of a primitive (like transforming triangles into quads or points into triangles, etc). Geometry shaders can also be used to implement basic tessellation even if dedicated tessellation units present in modern hardware are a better fit for this job.

In GLSL, some operations like layered rendering (which allows rendering to multiple textures in the same program) are only accessible through geometry shaders, although this is now also possible in vertex shaders via a particular extension.

The output of a geometry shader is also a set of primitives.

Rasterization

So far all the stages we discussed manipulated vertices and geometry. At some point, however, we need to render pixels. For this, primitives need to be rasterized, which is the process by which they are broken into individual fragments that will then be colored by a fragment shader and eventually turned into pixels in a frame buffer. Rasterization is handled by the rasterizer fixed function unit.

The rasterization process also assigns depth information to these fragments. This information is necessary when we have a 3D scene where multiple polygons overlap on the screen and we need to decide which polygon’s fragments should be rendered and which should be discarded because they are hidden by other polygons.

Finally, the rasterization also interpolates per-vertex attributes in order to compute the corresponding fragment values. For example, let’s say that we have a line primitive where each vertex has a different color attribute, one red and one green. For each fragment in the line the rasterizer will compute interpolated color values by combining red and green depending on how close or far the fragments are to each vertex. With this, we will obtain red fragments on the side of the red vertex that will smoothly transition to green as we move closer to the green vertex.
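Conceptually, this interpolation is just a weighted average of the per-vertex values based on where the fragment falls between the vertices. The toy function below illustrates the idea for the red/green line example (this is only an illustration, not driver code; real hardware also has to deal with details like perspective correction):

struct color {
   float r, g, b;
};

/* 't' is the normalized position of the fragment along the line:
 * 0.0 at the red vertex, 1.0 at the green vertex.
 */
static color
interpolate_color(const color &red_vertex, const color &green_vertex, float t)
{
   color result;
   result.r = (1.0f - t) * red_vertex.r + t * green_vertex.r;
   result.g = (1.0f - t) * red_vertex.g + t * green_vertex.g;
   result.b = (1.0f - t) * red_vertex.b + t * green_vertex.b;
   return result;
}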

In summary, the input of the rasterizer are the primitives coming from a vertex, tessellation or geometry shader and the output are the fragments that build the primitive’s surface as projected on the screen including color, depth and other interpolated per-vertex attributes.

Fragment Shader (FS)

The programmable fragment shader unit takes the fragments produced by the rasterization process and executes an algorithm provided by a graphics developer to compute the final color, depth and stencil values for each fragment. This unit can be used to achieve numerous visual effects, including all kinds of post-processing filters; it is also usually where we sample textures to color polygon surfaces.

This covers some of the most important elements in the 3D graphics pipeline and should be sufficient, for now, to understand some of the basics of a driver. Notice, however, that I have not covered things like transform feedback, tessellation or compute shaders. I hope I can get to cover some of these in future posts.

But before we are done with the overview of the 3D pipeline we should cover another topic that is fundamental to how the hardware works: parallelization.

Parallelization

Graphics processing is a very resource demanding task. We are continuously updating and redrawing our graphics 30/60 times per second. For a full HD resolution of 1920×1080 that means that we need to redraw over 2 million pixels in each go (124,416,000 pixels per second if we are doing 60 FPS). That’s a lot.

To cope with this the architecture of GPUs is massively parallel, which means that the pipeline can process many vertices/pixels simultaneously. For example, in the case of the Intel Haswell GPUs, programmable units like the VS and GS have multiple Execution Units (EUs), each with their own set of ALUs, etc., that can spawn up to 70 threads each (for the GS and VS) while the fragment shader can spawn up to 102 threads. But that is not the only source of parallelism: each thread may handle multiple objects (vertices or pixels depending on the case) at the same time. For example, a VS thread in Intel hardware can shade two vertices simultaneously, while an FS thread can shade up to 8 (SIMD8) or 16 (SIMD16) pixels in one go.

Some of these means of parallelism are relatively transparent to the driver developer and some are not. For example, SIMD8 vs SIMD16 or single vertex shading vs double vertex shading requires specific configuration and writing driver code that is aligned with the selected configuration. Threads are more transparent, but in certain situations the driver developer may need to be careful when writing code that can require a sync between all running threads, which would obviously hurt performance, or at least be careful to do that kind of thing when it would hurt performance the least.

Coming up next

So that was a very brief introduction to what modern 3D pipelines look like. There is still plenty of stuff I have not covered but I think we can go through a lot of that in later posts as we dig deeper into the driver code. My next post will discuss how Mesa models several of the programmable pipeline stages I have introduced here, so stay tuned!

Setting up a development environment for Mesa

Recap

In my previous post I provided an overview of the Mesa source tree and identified some of its main modules.

Since we are on that subject I thought it would make sense to give a few tips on how to setup the development environment for Mesa too, so here I go.

Development environment

Mesa is mostly written in a combination of C and C++, uses autotools for its build system and Git for version control, so it should be a fairly familiar environment for many people. I am not going to explain how to build autotools projects here, there is plenty of documentation available on that subject, so instead I will focus on the specifics of Mesa.

First we need to checkout the source code. If you do not have a developer account then do an anonymous checkout:

# git clone git://anongit.freedesktop.org/git/mesa/mesa

If you do have a developer account do this instead:

# git clone git+ssh://username@git.freedesktop.org/git/mesa/mesa

Next, we will have to deal with dependencies. This should not be too hard though. Mesa is fairly low in the software stack so it does not have many and the ones it has seem to have a fairly stable API and don’t change too often, so typically, you should be able to build Mesa if you have a recent distribution and you keep it up to date. For reference, as of now I can build Mesa on my Ubuntu 14.04 without any problems.

In any case, the actual dependencies you will need to get may vary depending on the drivers you want to build, the target platform and the features you want to enable. For example, the R300 Gallium driver requires LLVM, but the Intel i965 driver doesn’t.

Notice, however, that if you are hacking on features that require specific builds of the XServer, Wayland/Weston or similar stuff the required setup will be more complex, since you would probably need to include these other projects into the mix, together with their respective dependencies.

Configuring the source tree

Here I will mention some of the Mesa specific options that I found to be more useful in my time with Mesa:

--enable-debug: This is necessary, at least, to get assertions to work, and you want this while you are developing. Mesa and the drivers have assertions in many places to make sure that new code does not break certain assumptions or violate hardware constraints, so you really want to make sure that you have these activated when you are developing. It also adds “-g -O0” to the compiler flags to enable debug support.

--with-dri-drivers: This is the list of classic Mesa DRI drivers you want to build. If you know you will only hack on the i965 driver, for example, then building other drivers will only slow down your builds.

--with-gallium-drivers: This is the list of Gallium drivers you want to build. Again, if you are hacking on the classic DRI i965 driver you are probably not interested in building any Gallium drivers.

Notice that if you are working on the Mesa framework layer, that is, the bits shared by all drivers, instead of the internals of a specific driver, you will probably want to include more drivers in the build to make sure that they keep building after your changes.

--with-egl-platforms: This is a list of supported platforms. Same as with the options above, you probably only want to build Mesa for the platform or platforms you are working on.

Besides using a combination of these options, you probably also want to set your CFLAGS and CXXFLAGS (remember that Mesa uses both C and C++). I for one like to pass “-g3”, for example.
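Putting it all together, a hypothetical configure invocation for someone working only on the classic i965 driver on X11/DRM could look like the following (the prefix path and driver lists are just examples, adjust them to your needs):

# export CFLAGS="-g3"
# export CXXFLAGS="-g3"
# ./autogen.sh --prefix=/home/user/mesa-install \
               --enable-debug \
               --with-dri-drivers=i965 \
               --with-gallium-drivers= \
               --with-egl-platforms=x11,drm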

Using your built version of Mesa

Once you have built Mesa you can type ‘make install’ to install the libraries and drivers. Probably, you have configured autotools (via the --prefix option) to do this to a safe location that does not conflict with your distribution’s installation of Mesa, and now your problem is to tell your OpenGL programs that they should use this version of Mesa instead of the one provided by your distro.

You will have to adjust a couple of environment variables for this:

LIBGL_DRIVERS_PATH: Set this to the path where your built drivers have been installed. This will tell Mesa’s loader to look for the drivers here.

LD_LIBRARY_PATH: Set this to the path where your Mesa libraries have been installed. This will make it so that OpenGL programs load your recently built libGL.so rather than your system’s.
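For example, assuming Mesa was installed into the hypothetical /home/user/mesa-install prefix used earlier (the exact subdirectories depend on your configure options), we could run an OpenGL program against our freshly built Mesa like this:

# export LIBGL_DRIVERS_PATH=/home/user/mesa-install/lib/dri
# export LD_LIBRARY_PATH=/home/user/mesa-install/lib
# glxgears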

For more tips I’d suggest reading this short thread on the Mesa mailing list, which has some Mesa developers discussing their development environment setup.

Coming up next

In the next post I will provide an introduction to modern 3D graphics hardware. After all, the job of the graphics driver is all about programming the hardware, so having a basic understanding of how it works is a requirement if we want to do any meaningful driver development.

An eagle eye view into the Mesa source tree

Recap

My last post introduced Mesa’s loader as the module that takes care of auto-selecting the right driver for our hardware. If the loader fails to find a suitable hardware driver it will fall back to a software driver, but we can also force this situation ourselves, which may come in handy in some scenarios. We also took a quick look at the glxinfo tool that we can use to query the capabilities and features exposed by the selected driver.

The topic of today focuses on providing a quick overview of the Mesa source code tree, which will help us identify the parts of the code that are relevant to our interests depending on the driver and/or the feature we intend to work on.

Browsing the source code

First off, there is already some documentation on this topic available on the Mesa 3D website that is a good place to start. Since that already gives some insight into what goes into each part of the repository, I’ll focus on complementing that information with a bit more detail on some of the most important parts I have interacted with so far:

  • In src/egl/ we have the implementation of the EGL standard. If you are working on EGL-specific features, tracking down an EGL-specific problem or you are simply curious about how EGL links into the GL implementation, this is the place you want to visit. This includes the EGL implementations for the X11, DRM and Wayland platforms.
  • In src/glx/ we have the OpenGL bits relating specifically to X11 platforms, known as GLX. So if you are working on the GLX layer, this is the place to go. Here there is all the stuff that takes care of interacting with the XServer, the client-side DRI implementation, etc.
  • src/glsl/ contains a critical aspect of Mesa: the GLSL compiler used by all Mesa drivers. It includes a GLSL parser, the definition of the Mesa IR, also referred to as GLSL IR, used to represent shader programs internally, the shader linker and various optimization passes that operate on the Mesa IR. The resulting Mesa IR produced by the GLSL compiler is then consumed by the various drivers which transform it into native GPU code that can be loaded and run in the hardware.
  • src/mesa/main/ contains the core Mesa elements. This includes hardware-independent views of core objects like textures, buffers, vertex array objects, the OpenGL context, etc as well as basic infrastructure, like linked lists.
  • src/mesa/drivers/ contains the actual classic drivers (not Gallium). DRI drivers in particular go into src/mesa/drivers/dri. For example the Intel i965 driver goes into src/mesa/drivers/dri/i965. The code here is, for the most part, very specific to the underlying hardware platforms.
  • src/mesa/swrast*/ and src/mesa/tnl*/ provide software implementations for things like rasterization or vertex transforms. Used by some software drivers and also by some hardware drivers to implement certain features for which they don’t have hardware support or for which hardware support is not yet available in the driver. For example, the i965 driver implements operations on the accumulation and selection buffers in software via these modules.
  • src/mesa/vbo/ is another important module. Across its various versions, OpenGL has specified many ways in which a program can tell OpenGL about its vertex data, from using functions of the glVertex*() family inside glBegin()/glEnd() blocks, to things like vertex arrays, vertex array objects, display lists, etc… The drivers, however, do not need to deal with all this: Mesa makes it so that they always receive their vertex data as a collection of vertex arrays, significantly reducing complexity on the side of the driver implementor. This is the module that takes care of managing all this, so no matter what type of drawing your GL program is doing or how it specifies its vertex data, it will always go through this module before it reaches the driver.
  • src/loader/, as we have seen in my previous post, contains the Mesa driver loader, which provides the logic necessary to decide which Mesa driver is the right one to use for a specific hardware so that Mesa’s libGL.so can auto-select the right driver when loaded.
  • src/gallium/ contains the Gallium3D framework implementation. If, like me, you only work on a classic driver, you don’t need to care about the contents of this at all. If you are working on Gallium drivers however, this is the place where you will find the various Gallium drivers in development (inside src/gallium/drivers/), like the various Gallium ATI/AMD drivers, Nouveau or the LLVM based software driver (llvmpipe) and the Gallium state trackers.

So with this in mind, one should have enough information to know where to start looking for something specific:

  • If we are interested in how vertex data provided to OpenGL is manipulated and uploaded to the GPU, the vbo module is probably the right place to look.
  • If we are looking to work on a specific aspect of a concrete hardware driver, we should go to the corresponding directory in src/mesa/drivers/ if it is a classic driver, or src/gallium/drivers if it is a Gallium driver.
  • If we want to know about how Mesa, the framework, abstracts various OpenGL concepts like textures, vertex array objects, shader programs, etc. we should look into src/mesa/main/.
  • If we are interested in the platform specific support, be it EGL or GLX, we want to look into src/egl or src/glx.
  • If we are interested in the GLSL implementation, which involves anything from the compiler to the intermediary IR and the various optimization passes, we need to look into src/glsl/.

Coming up next

So now that we have an eagle view of the contents of the Mesa repository let’s see how we can prepare a development environment so we can start hacking on
some stuff. I’ll cover this in my next post.

Driver loading and querying in Mesa

Recap

In my previous post I explained that Mesa is a framework for OpenGL driver development. As such, it provides code that can be reused by multiple driver implementations. This code is, of course, hardware agnostic, but frees driver developers from doing a significant part of the work. The framework also provides hooks for developers to add the bits of code that deal with the actual hardware. This design allows multiple drivers to co-exist and share a significant amount of code.

I also explained that among the various drivers that Mesa provides, we can find both hardware drivers that take advantage of a specific GPU and software drivers, which are implemented entirely in software (so they run on the CPU and do not depend on a specific GPU). The latter are obviously slower, but as I discussed, they may come in handy in some scenarios.

Driver selection

So, Mesa provides multiple drivers, but how does it select the one that fits the requirements of a specific system?

You have probably noticed that Mesa is deployed in multiple packages. In my Ubuntu system, the one that deploys the DRI drivers is libgl1-mesa-dri:amd64. If you check its contents you will see that this package installs OpenGL drivers for various GPUs:

# dpkg -L libgl1-mesa-dri:amd64 
(...)
/usr/lib/x86_64-linux-gnu/gallium-pipe/pipe_radeonsi.so
/usr/lib/x86_64-linux-gnu/gallium-pipe/pipe_r600.so
/usr/lib/x86_64-linux-gnu/gallium-pipe/pipe_nouveau.so
/usr/lib/x86_64-linux-gnu/gallium-pipe/pipe_vmwgfx.so
/usr/lib/x86_64-linux-gnu/gallium-pipe/pipe_r300.so
/usr/lib/x86_64-linux-gnu/gallium-pipe/pipe_swrast.so
/usr/lib/x86_64-linux-gnu/dri/i915_dri.so
/usr/lib/x86_64-linux-gnu/dri/i965_dri.so
/usr/lib/x86_64-linux-gnu/dri/r200_dri.so
/usr/lib/x86_64-linux-gnu/dri/r600_dri.so
/usr/lib/x86_64-linux-gnu/dri/radeon_dri.so
/usr/lib/x86_64-linux-gnu/dri/r300_dri.so
/usr/lib/x86_64-linux-gnu/dri/vmwgfx_dri.so
/usr/lib/x86_64-linux-gnu/dri/swrast_dri.so
/usr/lib/x86_64-linux-gnu/dri/nouveau_vieux_dri.so
/usr/lib/x86_64-linux-gnu/dri/nouveau_dri.so
/usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so
(...)

Since I have a relatively recent Intel GPU, the driver I need is the one provided in i965_dri.so. So how do we tell Mesa that this is the one we need? Well, the answer is that we don’t: Mesa is smart enough to know which driver is the right one for our GPU, and selects it automatically when you load libGL.so. The part of Mesa that takes care of this is called the ‘loader’.

You can, however, point Mesa to look for suitable drivers in a specific directory other than the default, or force it to use a software driver using various environment variables.

What driver is Mesa actually loading?

If you want to know exactly what driver Mesa is loading, you can instruct it to dump this (and other) information to stderr via the LIBGL_DEBUG environment variable:

# LIBGL_DEBUG=verbose glxgears 
libGL: screen 0 does not appear to be DRI3 capable
libGL: pci id for fd 4: 8086:0126, driver i965
libGL: OpenDriver: trying /usr/lib/x86_64-linux-gnu/dri/tls/i965_dri.so
libGL: OpenDriver: trying /usr/lib/x86_64-linux-gnu/dri/i965_dri.so

So we see that Mesa checks the existing hardware and realizes that the i965 driver is the one to use, so it first attempts to load the TLS version of that driver and, since I don’t have the TLS version, falls back to the normal version, which I do have.

The code in src/loader/loader.c (loader_get_driver_for_fd) is the one responsible for detecting the right driver to use (i965 in my case). This receives a device fd as input parameter that is acquired previously by calling DRI2Connect() as part of the DRI bring up process. Then the actual driver file is loaded in glx/dri_common.c (driOpenDriver).

We can also obtain a more descriptive indication of the driver we are loading by using the glxinfo program that comes with the mesa-utils package:

# glxinfo | grep -i "opengl renderer"
OpenGL renderer string: Mesa DRI Intel(R) Sandybridge Mobile 

This tells me that I am using the Intel hardware driver, and it also shares information related with the specific Intel GPU I have (SandyBridge).

Forcing a software driver

I have mentioned that having software drivers available comes in handy at times, but how do we tell the loader to use them? Mesa provides an environment variable that we can set for this purpose, so switching between a hardware driver and a software one is very easy to do:

# LIBGL_DEBUG=verbose LIBGL_ALWAYS_SOFTWARE=1 glxgears 
libGL: OpenDriver: trying /usr/lib/x86_64-linux-gnu/dri/tls/swrast_dri.so
libGL: OpenDriver: trying /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so

As we can see, setting LIBGL_ALWAYS_SOFTWARE will make the loader select a software driver (swrast).

If I force a software driver and call glxinfo like I did before, this is what I get:

# LIBGL_ALWAYS_SOFTWARE=1 glxinfo | grep -i "opengl renderer"
OpenGL renderer string: Software Rasterizer

So it is clear that I am using a software driver in this case.

Querying the driver for OpenGL features

The glxinfo program also comes in handy to obtain information about the specific OpenGL features implemented by the driver. If you want to check if the Mesa driver for your hardware implements a specific OpenGL extension you can inspect the output of glxinfo and look for that extension:

# glxinfo | grep GL_ARB_texture_multisample

You can also ask glxinfo to include hardware limits for certain OpenGL features by including the -l switch. For example:

# glxinfo -l | grep GL_MAX_TEXTURE_SIZE
GL_MAX_TEXTURE_SIZE = 8192

Coming up next

In my next posts I will cover the directory structure of the Mesa repository, identifying its main modules, which should give Mesa newcomers some guidance as to where they should look for when they need to find the code that deals with something specific. We will then discuss how modern 3D hardware has changed the way GPU drivers are developed and explain how a modern 3D graphics pipeline works, which should pave the way to start looking into the real guts of Mesa: the implementation of shaders.

Diving into Mesa

Recap

In my last post I gave a quick introduction to the Linux graphics stack. There I explained how what we call a graphics driver in Linux is actually a combination of three different drivers:

  • the user space X server DDX driver, which handles 2D graphics.
  • the user space 3D OpenGL driver, that can be provided by Mesa.
  • the kernel space DRM driver.

Now that we know where Mesa fits let’s have a more detailed look into it.

DRI drivers and non-DRI drivers

As explained, Mesa handles 3D graphics by providing an implementation of the OpenGL API. Mesa OpenGL drivers are usually called DRI drivers too. Remember that, after all, the DRI architecture was brought to life precisely to enable efficient implementation of OpenGL drivers in Linux and, as I introduced in my previous post, DRI/DRM are the building blocks of the OpenGL drivers in Mesa.

There are other implementations of the OpenGL API available too. Hardware vendors that provide drivers for Linux will provide their own implementation of the OpenGL API, usually in the form of a binary blob. For example, if you have an NVIDIA GPU and install NVIDIA’s proprietary driver this will install its own libGL.so.

Notice that it is possible to create graphics drivers that do not follow the DRI architecture in Linux. For example, the NVIDIA proprietary driver installs a Kernel module that implements similar functionality to DRM but with a different API that has been designed by NVIDIA, and obviously, their corresponding user space drivers (DDX and OpenGL) will use this API instead of DRM to communicate with the NVIDIA kernel space driver.

Mesa, the framework

You have probably noticed that when I talk about Mesa I usually say ‘drivers’, in plural. That is because Mesa itself is not really a driver, but a project that hosts multiple drivers (that is, multiple implementations of the OpenGL API).

Indeed, Mesa is best seen as a framework for OpenGL implementors that provides abstractions and code that can be shared by multiple drivers. Obviously, there are many aspects of an OpenGL implementation that are independent of the underlying hardware, so these can be abstracted and reused.

For example, if you are familiar with OpenGL you know it provides a state based API. This means that many API calls do not have an immediate effect, they only modify the values of certain variables in the driver and do not require pushing these new values to the hardware immediately. Indeed, usually that will happen later, when we actually render something by calling glDrawArrays() or a similar API: it is at that point that the driver will configure the 3D pipeline for rendering according to all the state that has been set by the previous API calls. Since these APIs do not interact with the hardware their implementation can be shared by multiple drivers, and then each driver, in its implementation of glDrawArrays(), can fetch the values stored in this state and translate them into something meaningful for the hardware at hand.
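To illustrate the idea with a purely hypothetical pseudo-driver (this is not how Mesa actually structures this code), state-setting calls could simply record values, and the draw call would be the point where the accumulated state is finally translated into hardware commands:

struct driver_state {
   bool depth_test_enabled;
   /* ... many more state fields ... */
   bool dirty;
};

static void emit_hardware_state(struct driver_state *st) { /* program the GPU */ }
static void emit_draw_command(void) { /* kick off the draw */ }

/* A state-setting call just records the new value. */
static void
enable_depth_test(struct driver_state *st)
{
   st->depth_test_enabled = true;
   st->dirty = true;
}

/* The draw call is where the recorded state is actually flushed to the
 * hardware, right before drawing.
 */
static void
draw_arrays(struct driver_state *st)
{
   if (st->dirty) {
      emit_hardware_state(st);
      st->dirty = false;
   }
   emit_draw_command();
}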

As such, Mesa provides abstractions for many things and even complete implementations of many OpenGL API calls that do not require interaction with the hardware, at least not immediate interaction.

Mesa also defines hooks for the parts where drivers may need to do hardware specific stuff, for example in the implementation of glDrawArrays().

Looking into glDrawArrays()

Let’s see an example of these hooks into a hardware driver by inspecting the stacktrace produced from a call to glDrawArrays() inside Mesa. In this case, I am using the Mesa Intel DRI driver and I am calling glDrawArrays() from a function named render() in my program. This is the relevant part of the stacktrace:

brw_upload_state () at brw_state_upload.c:651
brw_try_draw_prims () at brw_draw.c:483
brw_draw_prims () at brw_draw.c:578
vbo_draw_arrays () at vbo/vbo_exec_array.c:667
vbo_exec_DrawArrays () at vbo/vbo_exec_array.c:819
render () at main.cpp:363

Notice that glDrawArrays() is actually vbo_exec_DrawArrays(). What is interesting about this stack is that vbo_exec_DrawArrays() and vbo_draw_arrays() are hardware independent and reused by many drivers inside Mesa. If you don't have an Intel GPU like me but use another Mesa driver, your backtrace should be similar. These generic functions usually do things like checking for API usage errors, reformatting inputs in a way that is more appropriate for later processing, or fetching additional information from the current state that will be needed to implement the actual operation in the hardware.

At some point, however, we need to do the actual rendering, which involves configuring the hardware pipeline according to the command we are issuing and the relevant state we have set in prior API calls. In the stacktrace above this starts with brw_draw_prims(). This function is part of the Intel DRI driver: it is the hook where the Intel driver does the work required to configure the Intel GPU for drawing and, as you can see, it later calls brw_upload_state(), which uploads a bunch of state to the hardware to do exactly that, like configuring the various shader stages required by the current program, etc.

Registering driver hooks

In future posts we will discuss how the driver configures the pipeline in more detail, but for now let’s just see how the Intel driver registers its hook for the glDrawArrays() call. If we look at the stacktrace, and knowing that brw_draw_prims() is the hook into the Intel driver, we can just inspect how it is called from vbo_draw_arrays():

static void
vbo_draw_arrays(struct gl_context *ctx, GLenum mode, GLint start,
                GLsizei count, GLuint numInstances, GLuint baseInstance)
{
   struct vbo_context *vbo = vbo_context(ctx);
   (...)
   vbo->draw_prims(ctx, prim, 1, NULL, GL_TRUE, start, start + count - 1,
                   NULL, NULL);
   (...)
}

So the hook is draw_prims() inside vbo_context. Doing some trivial searches in the source code we can see that this hook is set up in brw_draw_init() like this:

void brw_draw_init( struct brw_context *brw )
{
   struct gl_context *ctx = &brw->ctx;
   struct vbo_context *vbo = vbo_context(ctx);
   (...)
   /* Register our drawing function:
    */
   vbo->draw_prims = brw_draw_prims;
   (...)
}

Let’s put a breakpoint there and see when Mesa calls into that:

brw_draw_init () at brw_draw.c:583
brwCreateContext () at brw_context.c:767
driCreateContextAttribs () at dri_util.c:435
dri2_create_context_attribs () at dri2_glx.c:318
glXCreateContextAttribsARB () at create_context.c:78
setupOpenGLContext () at main.cpp:411
init () at main.cpp:419
main () at main.cpp:477

So there it is: Mesa (unsurprisingly) calls into the Intel DRI driver when we set up the OpenGL context, and it is at that point that the driver registers its various hooks, including the one for drawing primitives.

We could do a similar thing to see how the driver registers its hook for context creation. We would see that the Intel driver (like other drivers in Mesa) assigns a global variable with the hooks it needs, like this:

static const struct __DriverAPIRec brw_driver_api = {
   .InitScreen           = intelInitScreen2,
   .DestroyScreen        = intelDestroyScreen,
   .CreateContext        = brwCreateContext,
   .DestroyContext       = intelDestroyContext,
   .CreateBuffer         = intelCreateBuffer,
   .DestroyBuffer        = intelDestroyBuffer,
   .MakeCurrent          = intelMakeCurrent,
   .UnbindContext        = intelUnbindContext,
   .AllocateBuffer       = intelAllocateBuffer,
   .ReleaseBuffer        = intelReleaseBuffer
};

PUBLIC const __DRIextension **__driDriverGetExtensions_i965(void)
{
   globalDriverAPI = &brw_driver_api;

   return brw_driver_extensions;
}

This global is then used throughout the DRI implementation in Mesa to call into the hardware driver as needed.

We can see that there are two types of hooks, then: the ones needed to link the driver into the DRI implementation (which are the main entry points of the driver in Mesa), and the hooks the driver adds for tasks related to the hardware implementation of specific OpenGL bits, typically registered at context creation time.

In order to write a new DRI driver one would only have to write implementations for all these hooks; the rest is already implemented in Mesa and reused across multiple drivers.

Gallium3D, a framework inside a framework

Currently, we can split Mesa DRI drivers in two kinds: the classic drivers (not based on the Gallium3D framework) and the new Gallium drivers.

Gallium3D is part of Mesa and attempts to make 3D driver development easier and more practical than it was before. For example, classic Mesa drivers are tightly coupled with OpenGL, which means that implementing support for other APIs (like Direct3D) would pretty much require writing a completely new implementation/driver. The Gallium3D framework addresses this by providing an API that exposes hardware functions as present in modern GPUs rather than focusing on a specific API like OpenGL.

Other benefits of Gallium include, for example, support for various operating systems by isolating the parts of the driver that rely on specific aspects of the underlying OS.

In recent years we have seen a lot of drivers move to the Gallium infrastructure, including nouveau (the open source driver for NVIDIA GPUs), various radeon drivers, some software drivers (swrast, llvmpipe) and more.


Gallium3D driver model (image via wikipedia)

Although there were some efforts to port the Intel driver to Gallium in the past, development of the Intel Gallium drivers (i915g and i965g) is stalled now as far as I know; Intel is focusing on the classic version of the drivers instead. This is probably because it would take a large amount of time and effort to bring the driver to Gallium with the same features and stability it currently has in its classic form across many generations of Intel GPUs. Also, there is a lot of work going on to add support for new OpenGL features to the driver, which seems to be the priority right now.

Gallium and LLVM

As we will see in more detail in future posts, writing a modern GPU driver involves a lot of native code generation and optimization. Also, OpenGL includes the OpenGL Shading Language (GLSL), which directly requires having a GLSL compiler available in the driver too.

It is no wonder then that Mesa developers thought that it would make sense to reuse existing compiler infrastructure rather than building and using their own: enter LLVM.

By introducing LLVM into the mix, Mesa developers expect to bring new and better optimizations to shaders and produce better native code, which is critical to performance.

This would also make it possible to eliminate a lot of code from Mesa and/or the drivers. Indeed, Mesa has its own complete implementation of a GLSL compiler, which includes a GLSL parser, compiler and linker as well as a number of optimizations, both for abstract representations of the code, in Mesa, and for the actual native code for a specific GPU, in the actual hardware driver.

The way Gallium plugs LLVM in is simple: Mesa parses GLSL and produces an LLVM intermediate representation of the shader code that it can then pass to LLVM, which takes care of the optimization. The role of hardware drivers in this scenario is limited to providing LLVM backends that describe their respective GPUs (instruction set, registers, constraints, etc.) so that LLVM knows how to do its work for the target GPU.

Hardware and Software drivers

Even today I see people who believe that Mesa is just a software implementation of OpenGL. If you have read my posts so far it should be clear that this is not true: Mesa provides multiple implementations (drivers) of OpenGL; most of these are hardware accelerated drivers, but Mesa also provides software drivers.

Software drivers are useful for various reasons:

  • For development and testing purposes, when you want to take the hardware out of the equation. From this point of view, a software implementation can provide a reference for expected behavior that is not tied to or constrained by any particular hardware. For example, if an OpenGL program does not work correctly we can run it with the software driver: if it works fine then we know the problem is in the hardware driver, otherwise we can suspect that the problem is in the application itself (see the snippet after this list for a quick way to check which driver is actually in use).
  • To allow execution of OpenGL in systems that lack 3D hardware drivers. It would obviously be slow, but in some scenarios it could be sufficient and it is definitely better than not having any 3D support at all.
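
As a quick, hedged illustration of the first point, the following sketch shows one way an application can confirm which OpenGL implementation it ended up with at run time: it simply queries the vendor, renderer and version strings, which name the driver in use (for example the Intel hardware driver or a software rasterizer). The function name is made up for the example.

#include <stdio.h>
#include <GL/gl.h>

/* Assumes an OpenGL context has already been created and made current.
 * The strings identify the driver actually in use, which is handy when
 * switching between hardware and software drivers for debugging. */
static void print_gl_driver_info(void)
{
   printf("GL_VENDOR:   %s\n", (const char *) glGetString(GL_VENDOR));
   printf("GL_RENDERER: %s\n", (const char *) glGetString(GL_RENDERER));
   printf("GL_VERSION:  %s\n", (const char *) glGetString(GL_VERSION));
}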

I initially intended to cover more stuff in this post, but it is already getting long enough, so let's stop here for now. In the next post we will discuss how we can check and change the driver in use by Mesa, for example to switch between a software and hardware driver, and we will then start looking into Mesa’s source code and introduce its main modules.

A brief introduction to the Linux graphics stack

This post attempts to be a brief and simple introduction to the Linux graphics stack. I will focus on giving enough context to understand the role that Mesa and 3D drivers in general play in the stack, and leave it to follow-up posts to dive deeper into the guts of Mesa in general and the Intel DRI driver in particular.

A bit of history

In order to understand some of the particularities of the current graphics stack it is important to understand how it had to adapt to new challenges throughout the years.

You see, nowadays things are significantly more complex than they used to be, but in the early days there was only a single piece of software that had direct access to the graphics hardware: the X server. This made the graphics stack simpler, because there was no need to synchronize access to the graphics hardware between multiple clients.

In these early days applications would do all their drawing indirectly, through the X server. By using Xlib they would send rendering commands over the X11 protocol that the X server would receive, process and translate to actual hardware commands on the other side of a socket. Notice that this “translation” is the job of a driver: it takes a bunch of hardware agnostic rendering commands as its input and translates them into hardware commands as expected by the targeted GPU.
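
As a minimal sketch of what that indirect model looks like from the application side, here is a tiny (hypothetical) Xlib program; every call below turns into an X11 protocol request that travels over a socket to the X server, where the DDX driver translates it into hardware commands.

#include <X11/Xlib.h>

int main(void)
{
   Display *dpy = XOpenDisplay(NULL);
   Window win = XCreateSimpleWindow(dpy, DefaultRootWindow(dpy),
                                    0, 0, 200, 200, 1,
                                    BlackPixel(dpy, 0), WhitePixel(dpy, 0));
   GC gc = XCreateGC(dpy, win, 0, NULL);

   XMapWindow(dpy, win);

   /* A 2D rendering command: it is the X server, not this program,
    * that ends up talking to the GPU. */
   XDrawLine(dpy, win, gc, 10, 10, 190, 190);
   XFlush(dpy);

   /* (A real program would run an event loop and clean up here.) */
   return 0;
}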

Since the X server was the only piece of software that could talk to the graphics hardware by design, these drivers were written specifically for it, became modules of the X server itself and an integral part of its architecture. These userspace drivers are called DDX drivers in X server argot and their role in the graphics stack is to support 2D operations as exported by Xlib and required by the X server implementation.


DDX drivers in the X server (image via wikipedia)

In my Ubuntu system, for example, the DDX driver for my Intel GPU comes via the xserver-xorg-video-intel package and there are similar packages for other GPU vendors.

3D graphics

The above covers 2D graphics as that is what the X server used to be all about. However, the arrival of 3D graphics hardware changed the scenario significantly, as we will see now.

In Linux, 3D graphics is implemented via OpenGL, so people expected an implementation of this standard that would take advantage of the fancy new 3D hardware, that is, a hardware accelerated libGL.so. However, in a system where only the X server was allowed to access the graphics hardware we could not have a libGL.so that talked directly to the 3D hardware. Instead, the solution was to provide an implementation of OpenGL that would send OpenGL commands to the X server through an extension of the X11 protocol and let the X server translate these into actual hardware commands as it had been doing for 2D commands before.

We call this Indirect Rendering, since applications do not send rendering commands directly to the graphics hardware, and instead, render indirectly through the X server.


OpenGL with Indirect Rendering (image via wikipedia)

Unfortunately, developers would soon realize that this solution was not sufficient for intensive 3D applications, such as games, which need to render large numbers of 3D primitives while maintaining high frame rates. The problem was clear: wrapping every OpenGL call in the X11 protocol and bouncing it through the X server added too much overhead.

In order to achieve good performance in 3D applications we needed them to access the hardware directly, and that would require rethinking a large chunk of the graphics stack.

Enter Direct Rendering Infrastructure (DRI)

The Direct Rendering Infrastructure is the architecture that allows X clients to talk to the graphics hardware directly. Implementing DRI required changes to various parts of the graphics stack, including the X server, the kernel and various client libraries.

Although the term DRI usually refers to the complete architecture, it is often also used to refer only to the specific part of it that involves the interaction of applications with the X server, so be aware of this dual meaning when you read about this stuff on the Internet.

Another important part of DRI is the Direct Rendering Manager (DRM). This is the kernel side of the DRI architecture. Here, the kernel handles sensitive aspects like hardware locking, access synchronization, video memory and more. DRM also provides userspace with an API that it can use to submit commands and data in a format that is adequate for modern GPUs, which effectively allows userspace to communicate with the graphics hardware.

Notice that many of these things have to be done specifically for the target hardware, so there is a different DRM driver for each GPU family. In my Ubuntu system, the kernel DRM driver for my Intel GPU is the i915 module, which ships with the kernel itself, while the userspace library that wraps its API is provided by the libdrm-intel1:amd64 package.


OpenGL with Direct Rendering (image via wikipedia)

DRI/DRM provide the building blocks that enable userspace applications to access the graphics hardware directly in an efficient and safe manner, but in order to use OpenGL we need another piece of software that, using the infrastructure provided by DRI/DRM, implements the OpenGL API while respecting the X server requirements.

Enter Mesa

Mesa is a free software implementation of the OpenGL specification, and as such, it provides a libGL.so, which OpenGL based programs can use to output 3D graphics in Linux. Mesa can provide accelerated 3D graphics by taking advantage of the DRI architecture to gain direct access to the underlying graphics hardware in its implementation of the OpenGL API.

When our 3D application runs in an X11 environment it will output its graphics to a surface (window) allocated by the X server. Notice, however, that with DRI this will happen without intervention of the X server, so naturally there is some synchronization to do between the two, since the X server still owns the window Mesa is rendering to and is the one in charge of displaying its contents on the screen. This synchronization between the OpenGL application and the X server is part of DRI. Mesa’s implementation of GLX (the extension of the OpenGL specification that addresses the X11 platform) uses DRI to talk to the X server and accomplish this.
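
As a hedged sketch of where this shows up in the API, GLX itself lets an application ask for a direct context and then check what it actually got. The function name below is made up, and dpy and vis are assumed to have been obtained earlier with the usual Xlib/GLX calls.

#include <stdio.h>
#include <GL/glx.h>

/* 'dpy' and 'vis' are assumed to come from XOpenDisplay() and
 * glXChooseVisual(), as in any GLX program. */
static void create_and_check_context(Display *dpy, XVisualInfo *vis)
{
   /* The last argument asks for direct rendering; with a DRI-capable
    * stack (e.g. Mesa) the context talks to the hardware directly,
    * while passing False would fall back to indirect rendering
    * through the X server. */
   GLXContext ctx = glXCreateContext(dpy, vis, NULL, True);

   if (glXIsDirect(dpy, ctx))
      printf("Direct rendering (DRI) is in use\n");
   else
      printf("Rendering indirectly through the X server\n");
}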

Mesa also has to use DRM for many things. Communication with the graphics hardware happens by sending commands (for example “draw a triangle”) and data (for example the vertex coordinates of the triangle, their color attributes, normals, etc). This process usually involves allocating a bunch of buffers in the graphics hardware where all these commands and data are copied so that the GPU can access them and do its work. This is enabled by the DRM driver, which is the piece that takes care of managing video memory and which offers APIs to userspace (Mesa in this case) to do this for the specific target hardware. DRM is also required whenever we need to allocate and manage video memory in Mesa, so things like creating textures, uploading data to textures, or allocating color, depth or stencil buffers all require using the DRM APIs for the target hardware.
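
To give a flavor of what that looks like at the API level, here is a rough, hedged sketch using the libdrm_intel userspace library (the wrapper around the Intel DRM kernel interface). This is of course a big simplification of what Mesa actually does, the function name is made up, and error handling is omitted.

#include <stddef.h>
#include <fcntl.h>
#include <intel_bufmgr.h>   /* from libdrm */

/* A rough sketch: allocate a buffer object in video memory through the
 * DRM driver and copy some vertex data into it. */
static drm_intel_bo *upload_vertices(const float *verts, size_t size)
{
   int fd = open("/dev/dri/card0", O_RDWR);            /* DRM device node */
   drm_intel_bufmgr *bufmgr = drm_intel_bufmgr_gem_init(fd, 4096);

   /* Ask the DRM driver for a buffer the GPU can access */
   drm_intel_bo *bo = drm_intel_bo_alloc(bufmgr, "vertices", size, 4096);

   /* Copy the vertex data into the buffer object */
   drm_intel_bo_subdata(bo, 0, size, verts);

   return bo;
}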


OpenGL/Mesa in the context of 3D Linux games (image via wikipedia)

What’s next?

Hopefully I have managed to explain what the role of Mesa is in the Linux graphics stack and how it works together with the Direct Rendering Infrastructure to enable efficient 3D graphics via OpenGL. In the next post we will cover Mesa in more detail: we will see that it is actually a framework where multiple OpenGL drivers live together, including both hardware and software variants; we will also have a look at its directory structure, identify its main modules, introduce the Gallium framework and more.

A tour around the world of Mesa and Linux graphics drivers

Some time ago I decided to focus my work at Igalia on the graphics stack. As a result I have had the chance to participate in a couple of very interesting projects, like implementing Wayland support in WebKitGtk+ (a topic I have visited in this blog a number of times) and, lately, working on graphics drivers for Linux in the Mesa framework.

The graphics stack in Linux is complex and it is not always easy to find information and technical documentation that can aid beginners in their first steps. This is usually a very demanding domain; the brave individuals who decide to put their energy into it usually have their hands full hacking on the code and don't have much room left for documenting what they do in a way that is accessible to newcomers.

As I mentioned above, I have been hacking on Mesa lately (particularly on the Intel i965 driver) and so far it has been a lot of fun, probably the most exciting work I have done at Igalia in all these years, but it is also certainly challenging, requiring me to learn a lot of new, and sometimes fairly complex, things.

Getting involved in this is no easy endeavor; the learning curve is steep because the kind of work you do here is probably unlike anything you have done before. For starters, it requires a decent understanding of OpenGL and the capacity to understand OpenGL specifications and what they mean in the context of the driver. You also need a general understanding of how modern 3D-capable GPUs work. Finally, you have to dig deeper and understand how the specific GPU your driver targets works and what role the driver needs to play to make that hardware work as intended. And that's not all of it: a driver may need to support multiple generations of GPUs, which can sometimes be significantly different from each other, requiring driver developers to write and maintain multiple code paths that handle these differences. You can imagine the maintenance burden and extra complexity that comes from this.

Finally, we should also consider the fact that graphics drivers are among the most critical pieces of code in a system; they need to be performant and stable for all supported hardware generations, which adds to the overall complexity.

All this can be a bit overwhelming at the beginning for those taking their first steps in this world, but I believe the initial steep learning curve can be smoothed out by introducing some of the most important concepts in a way that is oriented specifically to new developers. The rest will still not be easy: it requires hard work, some passion, a willingness to learn and a lot of attention to detail, but I think anyone passionate enough should be able to get into it with enough dedication.

I had to go through this whole process myself recently, so I figured I am in a good position to try to address the problem, which is why I decided to write a series of posts introducing people to the world of Mesa and 3D graphics drivers, with a focus on OpenGL and Intel GPUs, the area where I am currently doing my work. Although I'll focus on Intel hardware, I believe that many of the concepts I will be introducing are general enough to be useful to people interested in other GPUs too. I'll try to be clear about when I am introducing general concepts and when I am discussing Intel-specific stuff.

My next post, which will be the first in this series, will serve as an introduction to the Linux graphics stack and Linux graphics drivers. We will discuss exactly what Mesa brings to the table and what we mean when we talk about graphics drivers in Linux. I think that should put us on the right track to start looking into the internals of Mesa.

So that’s it, if you are interested in learning more about Linux graphics and specifically Mesa and 3D graphics drivers, stay tuned! I’ll try my best to post regularly and often.