W3C, Encrypted Media Extensions and DRM: A rant

The recent decision of the Director of the World Wide Web Consortium to overrule members’ objections to publishing a DRM standard for the Web, together with a series of rant I have read around about the issue; compelled me to write my own rant. What follows is, as usual, my own personal view on the subject and does not represent that of any of my partners or contractors.

The argumnent that the Web would become irrelevant if it didn’t support reproducing DRM-protected content, therefore some sort of DRM-supporting technology has to be standarized as part of the W3C, is two-fold flawed. Yet this is one of the core arguments used to defend supporting Encrypted Media Extension (EME), a technology that enables DRM, as a W3C standard.

On one hand, this is like saying that the Web is mostly useful for reproducing DRM-video from the big content distributors, downplaying the rest of the features that make the Web so important and empowering. It is actually the case that the usefulness of the open Web is very hard to replicate with (a collection of) non-Web applications; while reproducing DRM-protected video would be a tiny feature of the Web, in comparison.

Notice also, that this is problematic only for DRM-protected content, which accounts for only a fraction of the video content available on the Internet. Otherwise non-DRM video can perfectly be reproduced today with HTML5. While it is true that there is high user demand for content distributed by those same companies that push for DRM on the Web; there isn’t any requirement to use DRM to deliver that content other than supporting a very specific business practice that is only important for one party, the distributors, and ncessarily results in more control and freedom-limiting to users.

Is it the W3C’s mission to protect business models and practices (of dubidous effectiveness and well-known privacy and/or security issues)? No, it isn’t.

On the other hand, it is a fallacy that if a given DRM-enabling technology is not part of a W3C standard, it will necessarily bring fragmentation to the ecosystem because each player will (re)invent their own DRM technology. True, this is what has hapended with Flash and Silverlight, but arguably because there wasn’t a W3C standard. These technologies were invented many years before consuming video on the Web was popular or even possible, at a time when the value of avoiding fragmentation was not enough or not evident to implementors. These technologies existed before the big streaming corps of today existed, and were created for other purposes at a time when browsers were competing in features at the expense of interoperability (remember IE?).

The Industry players (content distributors, web implementors, etc) are free to agree on a common solution outside the umbrella of the W3C, because it is in their best interest to avoid fragmentation. It saves costs and makes for a better user experience. It is what industries do when they see opportunity for cutting down costs. Just don’t utilize the W3C (or better, the W3C should not let itself be utilized) to standarize a techonology that undermines the openness of the Web.

Browsers vendors are and will always be competing with each other. The W3C’s best position might just be to leave the market do what it is (supposedly) good at; and limit itself to advise against DRM, or failing that, at most provide guidelines on how to limit the damage should Web DRM happens in other forum (e.g, implement sandboxing, UX for user consent, make it opt-in, protect researchers, etc).

The reality is that DRM content distributors need the Web more than the Web needs to support DRM-content. It is just much cheaper for them if the W3C, a respected institution, solves the fragmentation for them.

By enshrining EME in its standards, the W3C is unnecessarily supporting a set of (corporate) interests at the expense of its users’ interests, giving undeserved legitimacy to DRM, based on an argument of pragmatism that doesn’t exist.

Going even further, it is arguable that the Web should support every feature that there is out there, if it bears a risk for its users in terms of privacy, security and freedom of research. Today, Web users don’t expect to do absolutely everything in a Web browser, even if that is desirable in the future. Most borwsers out there support configuring 3er party URL schemes that would launch a native application to handle the content pointed to by such URL. This might just be an acceptable trade-off for the Web ecosystem for specific types of content.

Also, the majority of content distributors have their own native players. In most cases, they promote the use of their native apps to users over the Web version of their players, because the experience is usually better and they can exert more control.

It might be that we have all been sold a straw man.

Drawing Web content with OpenGL (ES 3.0) instanced rendering

This is a follow up article about my ongoing research on Web content rendering using aggressive batching and merging of draw operations, together with OpenGL (ES 3.0) instanced rendering.

In a previous post, I discussed how relying on the Web engine’s layer tree to figure out non-overlapping content (layers) of a Web page, would (theoretically) allow an OpenGL based rasterizer to ignore the order of the drawing operations. This would allow the rasterizer to group together drawing of similar geometry and submit them efficiently to the GPU using instanced rendering.

I also presented some basic examples and comparisons of this technique with Skia, a popular 2D rasterizer, giving some hints on how much we can accelerate rendering if the overhead of the OpenGL API calls is reduced by using the instanced rendering technique.

However, this idea remained to be validated for real cases and in real hardware, specially because of the complexity and pressure imposed on shader programs, which now become responsible for de-referencing the attributes of each batched geometry and render them correctly.

Also, there are potential API changes in the rasterizer that could make this technique impractical to implement in any existing Web engine without significant changes in the rendering process.

To try keep this article short and focused, today I want to talk only about my latest experiments rendering some fairly complex Web elements using this technique; and leave the discussion about performance to future entries.

Everything is a rectangle

As mentioned in my previous article, almost everything in a Web page can be rendered with a rectangle primitive.

Web pages are mostly character glyphs, which today’s rasterizers normally draw by texture mapping a pre-rendered image of the glyph onto a rectangular area. Then you have boxes, images, shadows, lines, etc; which can all be drawn with a rectangle with the correct layout, transformation and/or texturing.

Primitives that are not rectangles are mostly seen in the element’s border specification, where you have borders with radius, and different styles: double, dotted, grooved, etc. There is a rich set of primitives coming from the combination of features in the borders spec alone.

There is also the Canvas 2D and SVG APIs, which are created specifically for arbitrary 2D content. The technique I’m discussing here purposely ignores these APIs and focuses on accelerating the rest.

In practice, however, these non-rectangular geometries account for just a tiny fraction of the typical rendering of a Web page, which allows me to effectively call them “exceptions”.

The approach I’m currently following assumes everything in a Web page is a rectangle, and all non-rectangular geometry is treated as exceptions and handled differently on shader code.

This means I no longer need to ignore the ordering problem since I always batch a rectangle for every single draw operation, and then render all rectangles in order. This introduces a dramatic change compared to the previous approach I discussed. Now I can (partially) implement this technique without changing the API of existing rasterizers. I say “partially” because to take full advantage of the performance gain, some API changes would be desired.

Drawing non-rectangular geometry using rectangles

So, how do we deal with these exceptions? Remember that we want to draw only with rectangles so that no operation could ever break our batch, if we want to take full advantage of the instanced rendering acceleration.

There are 3 ways of rendering non-rectangular geometry using rectangles:

  • 1. Using a geometry shader:

    This is the most elegant solution, and looks like it was designed for this case. But since it isn’t yet widely deployed, I will not make much emphasis on it here. But we need to follow its evolution closely.

  • 2. Degenerating rectangles:

    This is basically to turn a rectangle into a triangle by degenerating one of its vertices. Then, with a set of degenerated rectangles one could draw any arbitrary geometry as we do today with triangles.

  • 3. Drawing geometry in the fragment shader:

    This sounds like a bad idea, and it is definitely a bad idea! However, given the small and limited amount of cases that we need to consider, it can be feasible.

I’m currently experimenting with 3). You might ask why?, it looks like the worse option. The reason is that going for 2), degenerating rectangles, seems overkill at this point, lacking a deeper understanding of exactly what non-rectangle geometry we will ever need. Implementing a generic rectangle degeneration just for a few tiny set of cases would have been initially a bad choice and a waste of time.

So I decided to explore first the option of drawing these exceptions in the fragment shader and see how far I could go in terms of shader code complexity and performance (un)loss.

Next, I will show some examples of simple Web features rendered this way.

Experiments

The setup:

While my previous screen-casts were ran in my working laptop with a powerful Haswell GPU, one of my goals then was to focus on mobile devices. Hence, I started developing on an Arndale board I happen to have around. Details of the exact setup is out of the scope now, but I will just mention that the board is running a Linaro distribution with the official Mali T604 drivers by ARM.

My Arndale board

Following is a video I ensambled to show the different examples running on the Arndale board (and my laptop at the same time). This time I had to record using an external camera instead of screen-casting to avoid interference with the performance, so please bear with my camera-on-hand video recording skills.



This video file is also available on Vimeo.

I won’t talk about performance now, since I plan to cover that in future deliveries. Enough to be said that the performance is pretty good, comparable to my laptop in most of the examples. Also, there are a lot of simple known optimizations that I have not done because I’m focusing on validating the method first.

One important thing to note is that when drawing is done in a fragment shader, you cannot benefit from multi-sampling anti-aliasing (MSAA), since sampling occurs at an earlier stage. Hence, you have to implement anti-aliasing your self. In this case, I implemented a simple distance-to-edge linear anti-aliasing, and to my surprise, the end result is much better than the MSAA with 8 samples I was trying on my Haswell laptop before, and it is also faster.

On a related note, I have found out that MSAA does not give me much when rendering character glyphs (the majority of content) since they come already anti-aliased by FreeType2. And MSAA will slow down the rendering of the entire scene for every single frame.

I continue to dump the code from this research into a personal repository on GitHub. Go take a look if you are interested in the prototyping of these experiments.

Conclusions and next steps

There is one important conclusion coming out from these experiments: The fact that the rasterizer is stateless makes it very inefficient to modify a single element in a scene.

By stateless I mean they do not keep semantic information about the elements being drawn. For example, lets say I draw a rectangle in one frame, and in the next frame I want to draw the same rectangle somewhere else on the canvas. I already have a batch with all the elements of the scene happily stored in a vertex buffer object on GPU memory, and the rectangle in question is there somewhere. If I could keep the offset where that rectangle is in the batch, I could modify its attributes without having to drop and re-submit the entire buffer.

The solution: Moving to a scene graph. Web engines already implement a scene graph but at a higher level. Here I’m talking about a scene graph in the rasterizer itself, where nodes keep the offset of their attributes in the batch (layout, transformation, color, etc); and when you modify any of these attributes, only the deltas are uploaded to the GPU, rather than the whole batch.

I believe a scene graph approach has the potential to open a whole new set of opportunities for acceleration, specially for transitions and animations, and scrolling.

And that’s exciting!

Apart from this, I also want to:

  • Benchmark! set up a platform for reliable benchmarking and perf comparison with Skia/Cairo.
  • Take a subset of this technique and test it in Skia, behind current API.
  • Validate the case of drawing drop shadows and multi-step gradient backgrounds.
  • Test in other different OpenGL ES 3.0 implementations (and more devices!).

Let us not forget the fight we are fighting: Web applications must be as fast as native. I truly think we can do it.

A possibly faster approach to OpenGL rasterization of 2D Web content

Even thought it has been a while since my last entry on this blog, I have been quite busy. During most of last year I brought my modest contributions into an awesome startup that you have probably heard of by now, helping them integrate GNOME technologies into their products. I was lucky to join their team at an early stage of the development and participate in key discussions.

But more on that project on future entries.

Today I want to talk about things that keep me busy these days, and are of course related to Web engines. Specifically, I want to talk about 2D rasterization and the process of putting pixels on the screen as fast as possible (aka, the 60 frames-per-second holy grail). I want to discuss an idea that has the potential to significantly increase the performance of page rendering by utilizing modern GPU capabilities, OpenGL, and a bit of help from Web engines’ magic.

This is a technical article, so if you are not very familiar with 2D rasterization, OpenGL or how Web engines draw stuff, I recommended you to take some time off and read about it. It is truly a world of wonders (and sometimes pain).

Instanced rendering

The core of the idea is based on instanced rendering. It is a fairly well known technique introduced by OpenGL 3.1 and OpenGL-ES 3.0 as extension GL_EXT_draw_instanced.

To draw geometry with OpenGL, one normally submits a primitive to the rendering pipeline. The primitive consists of a collection of vertices, and a number of attributes per each vertex. Traditionally, you could only submit one primitive at a time.

With instanced rendering, it is possible to send several “instances” of the same primitive (the same collection of vertices and attributes) on a single call. This dramatically reduces the overhead of pipeline state changes and gives the GPU driver a better chance at optimizing rendering of instances of a particular geometry.

Hence, it is generally a common practice for OpenGL applications to group rendering of similar geometry into batches, and submit them to the pipeline all at once as instances. This technique is commonly known as batching and merging.

Skia, the 2D rasterizer used by the Chromium and Android projects, and Cairo, a popular 2D rasterizer backing many projects such as GNOME and previous versions of Mozilla Firefox; both to some extent have support for some sort of instanced rendering in their respective GL backends.

Telling instances apart

Ok, it is possible to draw a bunch of primitives at once, but how can we make them look different? A way of customizing individual instances is necessary, otherwise they will all render on top of the previous one. Not very useful.

There are two ways of submitting per-instance information: one is by adding a “divisor” to the buffers containing vertex or attribute information (VBOs), which will tell the pipeline to use the divided chunks as per-instance information instead of per-vertex. glVertexAttribDivisor is used in this case.

The other way is to upload the per-instance information to a buffer texture (or any texture for that matter) and fetch the information of the corresponding vertex by sampling, using a new variable gl_InstanceID available in shader code, as the key. This variable will increase for each instance of the geometry being rendered (as oppose to per vertex, for which you have gl_VertexID).

So, a quick recap so far. We are able to draw several instances of the same geometry at once, very efficiently, and are able to upload arbitrary data to customize each of these instances at will.

But wait, there are caveats.

The ordering problem

So, lets say we can now group together all drawing operations that involve the same primitive (rectangle, line, circle, etc). What happens if we draw (say) a filled rectangle, then a circle on top, and then another rectangle on top of the circle?

Following the simple grouping rule, what will happen is that the two rectangles will be grouped together and drawn first in one call, then the circle. This will not render the expected result, since the circle will end up laying on top of the rectangle that was drawn after it.

This problem is commonly known as “ordering”, and it clearly breaks our otherwise super-performing batching and merging case.

So, in scenes that involve lots of geometry overlapping, the grouping is limited to contents that do not overlap, if we wanted to preserve the right order of operations.

In practice, it means that we first need to separate the content in layers, then group the same primitives within a single layer, and finally submit the batches from each layer in the right order.

But guess what? Browser engines already do exactly that. Engines build a layer tree (among several other trees) with the information contained in the HTML and CSS content (layout, styling, transformations, etc), where the content is separated in render nodes whose content do not normally overlap. The actual process is much more complicated than that, but this simplification is enough to illustrate the idea.

Now, what if?

First, for the sake of presenting an idea, lets ignore the 2D context of a canvas element by now. More on that later. Lets focus on most of the web sites out there.

If we look at the number of primitives typically used by the rendering of a page, they boil down to a handful. Essentially, they are:

  • A rectangle: for almost all HTML elements, which are boxes. And character glyphs! which are normally rendered ahead of time and cached in a texture layout, then texture-mapped onto a rectangle. And images!, which are also texture-mapped onto rectangles.
  • A thin line: for thin (<=1 pixel) borders, inset/outset effect, hr, etc. Thicker borders can be drawn as thin rectangles.
  • A round corner: the quarter of a circle, filled or stroke, used to implement rounded rectangles (hello border-radius).
  • A circle: for bulleted listings, radio-buttons, etc. Argueably, these can be rendered using unicode characters, so no need for specific geometry.

Lets stay with these. There are other cases that I will purposely ignore, like one seen in a rounded rectangle with different thickness in two consecutive borders.

Then we have, for each of these primitives, an evolutionary-like variety of background styles (imaged, colored, repeated, gradient, etc); transformations (rotation, translation, scaling, etc); border styles (again imaged, colored, with different thickness, etc), shadow and blurring effects, and so on.

With a working texture cache, we have a potentially good chance at aggressively grouping together drawing of these primitives, like rectangles for example, for all text glyphs, boxes and images.

So, what if we could submit to a smart shader all the information that describes and tells apart these grouped instances? Is it possible to efficiently pack and then re-interpret in a shader all the styling and transformation complexities of today's CSS-styled HTML elements?

A new approach

Existing 2D rasterizers used in Web engines (at least Skia and Cairo, whose source code is available to me) are general purpose drawing libraries. That means they should render deterministically for any kind of application, not only Web engines. Specifically, they need to avoid the ordering problem explained above, where the result of a set of overlapped drawing operations is different if you change their order.

There are several reasons why modern Web engines use general purpose 2D rasterizers (as opposed to rasterizers written specifically for the needs of Web content rendering). One clear reason is that they existed before (in the case of Cairo at least) as a generic 2D graphics library, and was later used for Web rendering. Other reason is that the implementation of the Canvas 2D spec requires a general purpose 2D API, because that's what it is. And there is a clear benefit in reusing your beautifully optimized Canvas 2D implementation to draw the rest of the Web contents. Also, these libraries evolved from a pixmap (image) backed rendering target, into libraries exploiting the hardware-acceleration of GPU cards. Both libraries now feature an OpenGL(ES) backend that is somehow forced to comply with the previously existing API and behaviors.

But that is sub-optimal for Web engines that simply want to draw non-overlapping content into layers, then draw the layers in order. And even though batching and merging do occur in the GL backends today, it is apparently far from optimal as we will see later.


So, if we completely ignore the ordering problem for the case of Web engines drawing already layered nodes onto an OpenGL based render target, we might be able to aggressively group together potentially all the operations that share the same primitive.

This is of course if, as mentioned above, we are able to describe the particularities of each instance of these primitives, hand them down to a smart shader for rendering, then do all that efficiently so that the performance gained in batching is not lost by uploading tons of instance information to the GPU or running heavy shader code.

It is unclear (to me) whether this is at all possible. That's is why this approach is just an idea yet lacking validation for the real world. But it is a research that could potentially boost performance of Web content rendering.

It shares some similarities with (and was partially inspired by) the way Android does font rendering.

A proof of concept

So, I was set up to write a proof of concept trying to validate or discard the idea as quickly as possible. The purpose is to write the minimum code that would allow meaningful comparison between this approach and exiting rasterizers (Skia being my first target for this), for specific use cases that are relevant to generic Web content rendering (not Canvas 2D).

My proof of concept is being developed at: https://github.com/elima/glr

So far, it just provides a few primitives: rectangle and rounded corner, allowing for 3 basic drawing operations: rectangle (filled or stroked), rounded rectangle (only filled) and character glyph (not text, just single characters).

Then each element drawn can be transformed (scaled, rotated and/or translated), laid-out on the canvas (top, left, width and height), and has a color or texture.

Anti-aliasing is achieved by multisampling with 8 samples per pixel. Character glyphs are not anti-aliased, that was too complex to put in a proof of concept and it is a problem already solved by others anyway. I used the simplest possible path to put a pre-cached glyph on the screen, and for that wrote a super naive texture cache, and used FreeType2 for rasterizing the glyphs.
The idea of including chars was to explore if text glyphs, which accounts for most of typical Web page's content, could be batched together with all others drawings that use a rectangle primitive.

Note should be taken that this proof-of-concept is not intended to become a new library. It is just a vehicle to validate an idea by providing the minimum implementation needed to test its limits. Eventually, all this would have to be implemented in existing libraries. I just happen to be very fluent at glib and C :), as to prototype fast.

Comparisons

Before we jump into FPS excitements, lets clarify that any comparison here should be taken with a grain of salt. 2D rasterizers are complex libraries that do a lot of non-trivial things like anti-aliasing, sub-pixel alignment, color space conversion, adaptation to the specifics of the underlying hardware/driver/GL-version combos, to name just a few.

Thus, any comparison should be put in the context of what code paths are being selected, what rendering operations are being grouped, and when and why they aren't; how many GL operations are submitted to the pipeline to render the same scene, etc.

I have included 3 initial examples that try to illustrate how batching and merging of "compatible" draws (sharing the same underlying primitive) improves performance when ordering is ignored, while at the same time each element can have its own color, layout and transformation. For each example, I have written a Skia counterpart that tries to render exactly the same, to the extent possible, for the sake of comparing.

The data below corresponds to runs in my laptop, which is a Thinkpad T440p running Debian GNU/Linux, has an integrated Intel(tm) GPU (4th gen), and the OpenGL driver is provided by Mesa 10.2.1.

I used apitrace to look at what GL commands are actually sent to the driver.

Lets start with the RectsAndText example. It basically draws a lot of alternating filled rectangles and character glyphs, each with its own color, transformation and layout. In the screencast below, both examples (Skia and glr) are running at the same time. This of course does not reflect real performance since both compete for GPU resources, but I decided to record it this way because the improvement is much better noticed. The frames-per-second decrease proportionally for both examples when run at the same time, so it remains relevant for comparison.

The window in the left corresponds to the Skia example, and the right to glr. The same goes for all screencasts below.



This video file is also available for download.

The difference is considerable. Skia performs at an average of 6-7 FPS while the new approach gives around 40 FPS. That’s a 5x-6x improvement, not bad. Also, notice that CPU usage is considerably higher in the case of Skia.

The interesting thing here is that in the case of glr, all elements are batched together (both rectangles and chars), so only one drawing operation is actually submitted to the pipeline, as you can see in the available apitrace dump. A trace for the corresponding Skia example is also available.



apitrace output for RectAndText glr example



apitrace output for RectAndText Skia example

The next example is Rects, which is similar but renders only rectangles, alternating between filled and stroked. The interesting bit is that in the case of glr, each style of rectangles is drawn onto one different layer, each layer operating on its own separate thread; demonstrating that parallelism is now possible during batching.



This video file is available for download.

In this example, the performance difference is even higher. glr is around 8x faster. Again, apitrace traces for the glr example and the Skia version are available. This time glr submits a total of 2 instanced drawing operations, one for filled rects and one for stroked.



apitrace output for Rects glr example



apitrace output for Rects Skia example

The last example draws several layers of non-overlapping rounded rectangles. As with previous examples, every element is given a unique layout, color and transformation. This example tries to illustrate that because batching operates only at layer level, the more layers you have the less you benefit from this technique. In this particular example, the gap is reduced considerably. In fact it looks like Skia is faster by a few FPS, but it is actually not true. When both examples are run together, Skia is faster, but if run separately, glr example is faster (though not much). I’m still figuring this out.



This video file is available for download.

And the traces for the glr example and the Skia example.



apitrace output for Layers glr example



apitrace output for Layers Skia example


If you are curious about the implementation, take a look at where most of the magic happens: GlrContext, GlrCanvas and GlrBatch objects, and the vertex shader. The rest of the code is mostly API and glue to provide a coherent way to use this approach. Specifically, an abstract concept of "layer" is introduced. The workflow goes this way:

  • For initialization, a context, a rendering target and a canvas object are created. This is similar to how other 2D libraries work.
  • In the rendering loop and for each frame, the canvas is first notified that a new frame will be rendered.
  • Then any number of layer objects are created and attached to the canvas. The drawing API works against a layer (instead of a canvas), and will group and batch all the drawing operations in internal commands. When drawing to a layer finishes, the layer is notified that it is ready.
  • Finally, the canvas is requested to finish the frame, right before swapping buffers. This call will wait for all the attached layers to finish (blocking if needed). Once all complete, the canvas will take the batched commands from each layer, in the order they were attached, and submit them to the pipeline for rendering.

One thing to remark is that layers are self-contained stateful objects, and can survive frames without needing to redraw.

Other benefits

One by-product derived from the fact that layers cache drawing operations in internal commands (which in turn use locally allocated buffers), is that layers now become data-parallel. This is a term rarely used in the context of OpenGL because as you probably know, the way its API is designed makes it a giant state machine, making any parallelization unpractical.

With this approach, layers can be drawn in separated threads (or fully moved to OpenCL), which can bring extra performance if there are several complex layers that need drawing at the same time.

Another potential extra benefit comes from the fact that the canvas renders to a target that is actually a framebuffer backed up by a multisample texture. This means we can use any previously rendered frame as a texture, the same way it currently works in both Chromium and Webkit, where layers are texture-mapped then composited into the final scene.

So, we have the flexibility that, if a particular layer is too complex or slow to draw, we can attach it alone to a canvas, render it, and use the texture as with the current model. But, if we are short on texture memory, it is possible to keep commands batched in layers and render them on every frame. This is kind of similar to what Chromium does, recording draw operations into an SkPicture and then re-playing back when needed.

Future work

This is an approach that needs validation for a number of real world use cases before it can be even considered for testing on a Web engine. It is key to explore how complex information (for example, multi-step gradient backgrounds, or complex border styling with rounded rectangles) can be passed to the shaders and rendered correctly and efficiently. Also, there are shadows and blurring effects, all parametrized to cover the most creative use cases, that also need verification against this model.

Basically, we need to understand the limits of the approach by trying to implement modern W3C specs, selecting the most complex features first.

Other important priorities are:

  • Understand how much workload can be imposed on shaders side before the gained performance starts to degrade.
  • Test on OpenGL-ES and constrained GPU on embedded ARM, to detect the minimum requirements.
  • Figure out how to implement a mid-frame flushing mechanism when texture cache exhausts or command buffers get too large. This is not trivial, since to flush a layer (that is possibly running in a separate thread) it has to be blocked, then the canvas has to wait for all layers below it to finish and then execute their commands, then signal the blocked layer to continue.
  • Try how scrolling would behave if previously batched layers are drawn for every frame, instead of using current scrolling techniques that rely on rolling big textures, or moving several tiles up and down. These techniques impose either great pressure on texture memory, or a lot of complexity on tile management (or both), specially in the context of new super-high resolution screens.

Conclusions and final words

I have tried to detail an idea that although not new, I believe has not been explored in full in the context of Web engines. It relies on two essential hypothesis:

  • That it is possible to batch not only geometry, but the complex attributes of arbitrarily styled HTML elements, and render that geometry as instances using shader code.
  • It is safe to ignore ordering of draw operations during rasterization phase, and leverage on Web engine’s layer tree to solve overlapping.

Modern GPUs and OpenGL APIs have great potential for optimizing 2D rasterization, but as it happens most of the times, there is no one solution to fit all. Instead, each particular application and use case requires a different set of strategies and trade-offs for optimum performance.

This approach, even if valid for a sufficient number of use cases is unlikely to go faster than existing approaches for all tests cases. Even less replace these approaches. This is pretty clear in the case of canvas 2D for example, which will continue to require a general purpose rasterizer. But if there is a sufficient number of use cases that would benefit from this approach to some degree, then maintaining one code path that enables it will already be a win.

Finally, I want to thank Samsung SRA for partially sponsoring the time I dedicated to pursue this idea, and also Igalia and igalians which are always there to back me up and help me move forward.

Now, is there anyone interested in helping me explore this idea further?

FileTea, low friction anonymous file sharing


Let me present you FileTea, a project enabling anonymous file sharing for the Web. It is designed to be simple and easy to use, to run in (modern) browsers without additional plugins, and to avoid the hassle of user registration.

Works like this:

  • Mary wants to share a file with Jane. She opens a browser and navigates to a FileTea service.
  • Mary drag-and-drops the file (or files) she wants to send into the page and copies the short url generated for each file.
  • Mary sends the url to Jane by instant messaging, e-mail, SMS or just posts it somewhere.
  • Jane receives the url, opens it in her browser and downloads the file.

A reference deployment is running at http://filetea.me. It is alpha version and is kindly hosted by Igalia (thanks!). Please, feel free to try it (and provide feedback!).

FileTea is free software and released under the terms of the GNU AGPL license. That means that you can install it in your own server and host it yourself for your organization, your business or your friends. The source code is hosted at Gitorious.

We are not alone

There are similar services running out there like http://min.us, http://ge.tt, http://imm.io, and maybe others I’m not aware of. But FileTea is different in three important aspects:

  • FileTea is free software (including server side), meaning development is open to the community, and you can run your own instance and make it better.
  • FileTea does not store any file in the server. It just synchronizes and bridges an upload from the seeder with a download to the leacher.
  • FileTea sets no limit to the size of shared files.

Features, my friend

Currently, FileTea has the bare minimum features to allow file-sharing, but I have a long list of possibly cool stuff to add as I find free time to work on them. Some hot topics are:

  • Global and per-transfer bandwidth limitation (up and down).
  • Proper thumbnails for the shared files.
  • Bulk sharing: the ability to share a group of files under a single url and download a zip or tar ball with all files together.

If you have more ideas, I would be happy to hear them.

So that’s it, happy sharing!

The Web jumps into D-Bus

Lets start with the fun. The following demos show the EventDance’s D-Bus bridge in action:

Proxying org.freedesktop.Notifications

Directly from a Web page, we create a proxy to the standard freedesktop.org object /org/freedesktop/Notifications which is exported to the session bus running in my normal desktop environment. Then, we call the remote method Notify using a simple web form. This method opens a notification dialog showing a quick note to the desktop user. The proxy also watches the remote signal NotificationClosed telling that a notification dialog was closed.
During the demo we open different browsers to show the cross-browser nature of EventDance’s Web transport.

Your browser doesn’t seem to support HTML5 video tag but you can download the video directly.

Screencast 1: D-Bus proxy example
Acquiring a bus name and exporting an object

In the next screencast we acquire/release a name on the session bus, and export a simple object onto it. The object has one method Ask that receives a string as input argument, shows a confirmation dialog on the Web page, and return a boolean value depending on whether the user pressed “Ok” or “Cancel” to close the dialog. We use D-Feet to call the remote method on the browser that owns the name at the moment of the call.

Your browser doesn’t seem to support HTML5 video tag but you can download the video directly.

Screencast 2: D-Bus name owning example
The client-side source code of both demos is available at examples/common/dbus-bridge-proxy.html and examples/common/dbus-bridge-own-name.html respectively, inside EventDance repository. You can check them to realize how simple is the API to connect to and use the D-Bus connection.

The server-side code for both demos is the same, and is public at examples/dbus-bridge.c. There is also a JavaScript version at examples/js/dbusBridge.js and a python version at examples/python/dbus-bridge.py. To run JavaScript and python examples you need EventDance compiled with –enable-introspection configure flag active (the default), and GJS or pygobject respectively available in your system.

The D-Bus bridge

At the highest level, EventDance library provides a set of transports and IPC mechanisms. The D-Bus bridge is just one of these mechanisms, written specifically to work over the Web transport. Its purpose is to connect a Web page with a remote D-Bus daemon running on the Web server, using the EventDance’s Web transport (long-polling or websocket if available).

D-Bus uses a binary protocol, and the browser’s JavaScript context doesn’t (yet) support binary data. Thus, it was sensible to translate D-Bus messages to a browser-friendly protocol, in this case JSON. Using the API recently added into json-glib to integrate GVariant and JSON, the bridge translates messages back and forth from GDBus to JSON and vice-versa. Also, it tries to reduce message payload as much as possible, by caching the GDBus objects server-side, and sending only their references over the wire. This makes the protocol very simple and lightweight.

Cool, now what?

Hopefully you are already wondering about the many purposes this feature can serve.

While many people envision that the next desktop will be the browser, many more do use Web applications almost exclusively, already today. The traditional separation between Web and Desktop app development is blurring. Browsers have become powerful platforms for running complex applications, and this situation is speeding up with the broad and increasing adoption of HTML5 standards by major browsers.

On the other hand, we have D-Bus, a freedesktop.org standard that is at the core of almost every GNU/Linux system out there. It is the de-facto IPC mechanism on which your applications talk and share. D-Bus allows us to write a program in any language, and export its usefulness over a standard channel. Also allows us to write differentiated UIs (e.g, Qt vs. GTK+ vs. NCurses) to interface a common functionality. Yes, one bus to bind them all!

Joining these two pieces together is just the next logical step. A step towards bringing together the best of two contexts: the ubiquity of the Web and the inter-process collaborative nature of the Desktop.

We need to write applications that you can host and use securely and reliably not only in your computer, but anywhere in the Planet where you happen to have a browser plugged into the Net; whether it is your laptop, mobile phone, tablet or your neighbors’ PS3. We also need to encourage application developers to export the logic of their programs over D-Bus, to allow other platforms (like the Web!) to reuse it. Telepathy is a good example of such program.

Almost identical applications in terms of functionality are written for many FOSS environments like GNOME, KDE, MeeGo, Android, etc; yet many times only the user experience and the technologies used to build it are different. Although not always possible, there is room for a wider code-reusing culture if we come back to the original Unix philosophy:


Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.

That universal interface has just been upgraded.