This is a follow up article about my ongoing research on Web content rendering using aggressive batching and merging of draw operations, together with OpenGL (ES 3.0) instanced rendering.
In a previous post, I discussed how relying on the Web engine’s layer tree to figure out non-overlapping content (layers) of a Web page, would (theoretically) allow an OpenGL based rasterizer to ignore the order of the drawing operations. This would allow the rasterizer to group together drawing of similar geometry and submit them efficiently to the GPU using instanced rendering.
I also presented some basic examples and comparisons of this technique with Skia, a popular 2D rasterizer, giving some hints on how much we can accelerate rendering if the overhead of the OpenGL API calls is reduced by using the instanced rendering technique.
However, this idea remained to be validated for real cases and in real hardware, specially because of the complexity and pressure imposed on shader programs, which now become responsible for de-referencing the attributes of each batched geometry and render them correctly.
Also, there are potential API changes in the rasterizer that could make this technique impractical to implement in any existing Web engine without significant changes in the rendering process.
To try keep this article short and focused, today I want to talk only about my latest experiments rendering some fairly complex Web elements using this technique; and leave the discussion about performance to future entries.
Everything is a rectangle
As mentioned in my previous article, almost everything in a Web page can be rendered with a rectangle primitive.
Web pages are mostly character glyphs, which today’s rasterizers normally draw by texture mapping a pre-rendered image of the glyph onto a rectangular area. Then you have boxes, images, shadows, lines, etc; which can all be drawn with a rectangle with the correct layout, transformation and/or texturing.
Primitives that are not rectangles are mostly seen in the element’s border specification, where you have borders with radius, and different styles: double, dotted, grooved, etc. There is a rich set of primitives coming from the combination of features in the borders spec alone.
There is also the Canvas 2D and SVG APIs, which are created specifically for arbitrary 2D content. The technique I’m discussing here purposely ignores these APIs and focuses on accelerating the rest.
In practice, however, these non-rectangular geometries account for just a tiny fraction of the typical rendering of a Web page, which allows me to effectively call them “exceptions”.
The approach I’m currently following assumes everything in a Web page is a rectangle, and all non-rectangular geometry is treated as exceptions and handled differently on shader code.
This means I no longer need to ignore the ordering problem since I always batch a rectangle for every single draw operation, and then render all rectangles in order. This introduces a dramatic change compared to the previous approach I discussed. Now I can (partially) implement this technique without changing the API of existing rasterizers. I say “partially” because to take full advantage of the performance gain, some API changes would be desired.
Drawing non-rectangular geometry using rectangles
So, how do we deal with these exceptions? Remember that we want to draw only with rectangles so that no operation could ever break our batch, if we want to take full advantage of the instanced rendering acceleration.
There are 3 ways of rendering non-rectangular geometry using rectangles:
- 1. Using a geometry shader:
This is the most elegant solution, and looks like it was designed for this case. But since it isn’t yet widely deployed, I will not make much emphasis on it here. But we need to follow its evolution closely.
- 2. Degenerating rectangles:
This is basically to turn a rectangle into a triangle by degenerating one of its vertices. Then, with a set of degenerated rectangles one could draw any arbitrary geometry as we do today with triangles.
- 3. Drawing geometry in the fragment shader:
This sounds like a bad idea, and it is definitely a bad idea! However, given the small and limited amount of cases that we need to consider, it can be feasible.
I’m currently experimenting with 3). You might ask why?, it looks like the worse option. The reason is that going for 2), degenerating rectangles, seems overkill at this point, lacking a deeper understanding of exactly what non-rectangle geometry we will ever need. Implementing a generic rectangle degeneration just for a few tiny set of cases would have been initially a bad choice and a waste of time.
So I decided to explore first the option of drawing these exceptions in the fragment shader and see how far I could go in terms of shader code complexity and performance (un)loss.
Next, I will show some examples of simple Web features rendered this way.
While my previous screen-casts were ran in my working laptop with a powerful Haswell GPU, one of my goals then was to focus on mobile devices. Hence, I started developing on an Arndale board I happen to have around. Details of the exact setup is out of the scope now, but I will just mention that the board is running a Linaro distribution with the official Mali T604 drivers by ARM.
Following is a video I ensambled to show the different examples running on the Arndale board (and my laptop at the same time). This time I had to record using an external camera instead of screen-casting to avoid interference with the performance, so please bear with my camera-on-hand video recording skills.
This video file is also available on Vimeo.
I won’t talk about performance now, since I plan to cover that in future deliveries. Enough to be said that the performance is pretty good, comparable to my laptop in most of the examples. Also, there are a lot of simple known optimizations that I have not done because I’m focusing on validating the method first.
One important thing to note is that when drawing is done in a fragment shader, you cannot benefit from multi-sampling anti-aliasing (MSAA), since sampling occurs at an earlier stage. Hence, you have to implement anti-aliasing your self. In this case, I implemented a simple distance-to-edge linear anti-aliasing, and to my surprise, the end result is much better than the MSAA with 8 samples I was trying on my Haswell laptop before, and it is also faster.
On a related note, I have found out that MSAA does not give me much when rendering character glyphs (the majority of content) since they come already anti-aliased by FreeType2. And MSAA will slow down the rendering of the entire scene for every single frame.
I continue to dump the code from this research into a personal repository on GitHub. Go take a look if you are interested in the prototyping of these experiments.
Conclusions and next steps
There is one important conclusion coming out from these experiments: The fact that the rasterizer is stateless makes it very inefficient to modify a single element in a scene.
By stateless I mean they do not keep semantic information about the elements being drawn. For example, lets say I draw a rectangle in one frame, and in the next frame I want to draw the same rectangle somewhere else on the canvas. I already have a batch with all the elements of the scene happily stored in a vertex buffer object on GPU memory, and the rectangle in question is there somewhere. If I could keep the offset where that rectangle is in the batch, I could modify its attributes without having to drop and re-submit the entire buffer.
The solution: Moving to a scene graph. Web engines already implement a scene graph but at a higher level. Here I’m talking about a scene graph in the rasterizer itself, where nodes keep the offset of their attributes in the batch (layout, transformation, color, etc); and when you modify any of these attributes, only the deltas are uploaded to the GPU, rather than the whole batch.
I believe a scene graph approach has the potential to open a whole new set of opportunities for acceleration, specially for transitions and animations, and scrolling.
And that’s exciting!
Apart from this, I also want to:
- Benchmark! set up a platform for reliable benchmarking and perf comparison with Skia/Cairo.
- Take a subset of this technique and test it in Skia, behind current API.
- Validate the case of drawing drop shadows and multi-step gradient backgrounds.
- Test in other different OpenGL ES 3.0 implementations (and more devices!).
Let us not forget the fight we are fighting: Web applications must be as fast as native. I truly think we can do it.