Drawing Web content with OpenGL (ES 3.0) instanced rendering

Posted by elima on September 08, 2014

This is a follow-up article about my ongoing research on Web content rendering using aggressive batching and merging of draw operations, together with OpenGL (ES 3.0) instanced rendering.

In a previous post, I discussed how relying on the Web engine’s layer tree to figure out non-overlapping content (layers) of a Web page would (theoretically) allow an OpenGL-based rasterizer to ignore the order of the drawing operations. This would allow the rasterizer to group together draws of similar geometry and submit them efficiently to the GPU using instanced rendering.

I also presented some basic examples and comparisons of this technique with Skia, a popular 2D rasterizer, giving some hints on how much we can accelerate rendering if the overhead of the OpenGL API calls is reduced by using the instanced rendering technique.

However, this idea remained to be validated for real cases and on real hardware, especially because of the complexity and pressure imposed on shader programs, which now become responsible for de-referencing the attributes of each batched geometry and rendering them correctly.

Also, there are potential API changes in the rasterizer that could make this technique impractical to implement in any existing Web engine without significant changes in the rendering process.

To try to keep this article short and focused, today I want to talk only about my latest experiments rendering some fairly complex Web elements using this technique, and leave the discussion about performance to future entries.

Everything is a rectangle

As mentioned in my previous article, almost everything in a Web page can be rendered with a rectangle primitive.

Web pages are mostly character glyphs, which today’s rasterizers normally draw by texture mapping a pre-rendered image of the glyph onto a rectangular area. Then you have boxes, images, shadows, lines, etc., which can all be drawn with a rectangle with the correct layout, transformation and/or texturing.

Primitives that are not rectangles are mostly seen in the element’s border specification, where you have borders with radius, and different styles: double, dotted, grooved, etc. There is a rich set of primitives coming from the combination of features in the borders spec alone.

There are also the Canvas 2D and SVG APIs, which were created specifically for arbitrary 2D content. The technique I’m discussing here purposely ignores these APIs and focuses on accelerating the rest.

In practice, however, these non-rectangular geometries account for just a tiny fraction of the typical rendering of a Web page, which allows me to effectively call them “exceptions”.

The approach I’m currently following assumes everything in a Web page is a rectangle, and all non-rectangular geometry is treated as an exception and handled differently in shader code.

This means I no longer need to ignore the ordering problem since I always batch a rectangle for every single draw operation, and then render all rectangles in order. This introduces a dramatic change compared to the previous approach I discussed. Now I can (partially) implement this technique without changing the API of existing rasterizers. I say “partially” because to take full advantage of the performance gain, some API changes would be desired.

Drawing non-rectangular geometry using rectangles

So, how do we deal with these exceptions? Remember that we want to draw only with rectangles so that no operation could ever break our batch, if we want to take full advantage of the instanced rendering acceleration.

There are 3 ways of rendering non-rectangular geometry using rectangles:

  • 1. Using a geometry shader:

    This is the most elegant solution, and looks like it was designed for this case. But since it isn’t yet widely deployed, I will not put much emphasis on it here. We do need to follow its evolution closely, though.

  • 2. Degenerating rectangles:

    This basically means turning a rectangle into a triangle by collapsing one of its vertices onto another. Then, with a set of degenerate rectangles, one could draw any arbitrary geometry as we do today with triangles.

  • 3. Drawing geometry in the fragment shader:

    This sounds like a bad idea, and it is definitely a bad idea! However, given the small and limited number of cases that we need to consider, it can be feasible.
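
To make option 2 a bit more concrete, here is a minimal sketch in plain C (no OpenGL calls). It assumes a rectangle is submitted as a 4-vertex triangle strip, which is my assumption and not something stated above, and the names Vec2, visible_triangles and degenerate_vertex are purely illustrative. Collapsing one vertex onto its neighbour leaves one of the two triangles with zero area, so only a single triangle remains visible.

```c
typedef struct { float x, y; } Vec2;

/* Twice the signed area of triangle (a, b, c); zero means degenerate. */
static float tri_area2(Vec2 a, Vec2 b, Vec2 c)
{
    return (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
}

/* A rectangle drawn as a 4-vertex triangle strip yields two triangles:
 * (v0, v1, v2) and (v1, v2, v3). Returns how many have non-zero area. */
static int visible_triangles(const Vec2 quad[4])
{
    int count = 0;
    if (tri_area2(quad[0], quad[1], quad[2]) != 0.0f) count++;
    if (tri_area2(quad[1], quad[2], quad[3]) != 0.0f) count++;
    return count;
}

/* Collapse vertex 'index' onto vertex 'onto', degenerating the rectangle. */
static void degenerate_vertex(Vec2 quad[4], int index, int onto)
{
    quad[index] = quad[onto];
}
```

Starting from the unit square, collapsing vertex 3 onto vertex 2 leaves just the triangle (v0, v1, v2), while the instance still occupies a regular rectangle slot in the batch.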

I’m currently experimenting with 3). You might ask why, since it looks like the worst option. The reason is that going for 2), degenerating rectangles, seems overkill at this point, lacking a deeper understanding of exactly what non-rectangular geometry we will ever need. Implementing generic rectangle degeneration just for a tiny set of cases would have been a bad initial choice and a waste of time.

So I decided to explore first the option of drawing these exceptions in the fragment shader and see how far I could go in terms of shader code complexity and performance (un)loss.

Next, I will show some examples of simple Web features rendered this way.

Experiments

The setup:

While my previous screen-casts were run on my work laptop with a powerful Haswell GPU, one of my goals then was to focus on mobile devices. Hence, I started developing on an Arndale board I happen to have around. Details of the exact setup are out of scope for now, but I will just mention that the board is running a Linaro distribution with the official Mali T604 drivers by ARM.

My Arndale board

Following is a video I assembled to show the different examples running on the Arndale board (and my laptop at the same time). This time I had to record using an external camera instead of screen-casting to avoid interference with the performance, so please bear with my camera-on-hand video recording skills.



This video file is also available on Vimeo.

I won’t talk about performance now, since I plan to cover that in future entries. Suffice it to say that the performance is pretty good, comparable to my laptop in most of the examples. Also, there are a lot of simple known optimizations that I have not done because I’m focusing on validating the method first.

One important thing to note is that when drawing is done in a fragment shader, you cannot benefit from multi-sampling anti-aliasing (MSAA), since sampling occurs at an earlier stage. Hence, you have to implement anti-aliasing yourself. In this case, I implemented a simple distance-to-edge linear anti-aliasing, and to my surprise, the end result is much better than the MSAA with 8 samples I was trying on my Haswell laptop before, and it is also faster.
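
As a rough illustration of what distance-to-edge linear anti-aliasing means, here is a sketch in plain C for a single straight edge. It is not the shader code from my repository: the function names, the unit-normal edge representation and the one-pixel-wide ramp are assumptions made for the example. For a rounded corner the same ramp would be applied to the distance from the circle centre minus the radius.

```c
/* Clamp v to the [lo, hi] range. */
static float clampf(float v, float lo, float hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Signed distance from point (px, py) to an edge that passes through
 * (ex, ey) and has unit-length outward normal (nx, ny).
 * Negative values are inside the shape, positive values outside. */
static float edge_distance(float px, float py,
                           float ex, float ey,
                           float nx, float ny)
{
    return (px - ex) * nx + (py - ey) * ny;
}

/* Linear coverage (used as alpha): 1 when the pixel centre is at least half
 * a pixel inside, 0 when at least half a pixel outside, and a linear ramp
 * across the one-pixel band in between. */
static float edge_coverage(float signed_distance)
{
    return clampf(0.5f - signed_distance, 0.0f, 1.0f);
}
```

With a vertical edge at x = 10 whose normal points to the right, a pixel centred exactly on the edge gets half coverage, and pixels one unit to either side are fully covered or fully transparent.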

On a related note, I have found out that MSAA does not give me much when rendering character glyphs (the majority of content) since they come already anti-aliased by FreeType2. And MSAA will slow down the rendering of the entire scene for every single frame.

I continue to dump the code from this research into a personal repository on GitHub. Go take a look if you are interested in the prototyping of these experiments.

Conclusions and next steps

There is one important conclusion coming out from these experiments: The fact that the rasterizer is stateless makes it very inefficient to modify a single element in a scene.

By stateless I mean that it does not keep semantic information about the elements being drawn. For example, let’s say I draw a rectangle in one frame, and in the next frame I want to draw the same rectangle somewhere else on the canvas. I already have a batch with all the elements of the scene happily stored in a vertex buffer object in GPU memory, and the rectangle in question is there somewhere. If I could keep the offset where that rectangle is in the batch, I could modify its attributes without having to drop and re-submit the entire buffer.

The solution: Moving to a scene graph. Web engines already implement a scene graph but at a higher level. Here I’m talking about a scene graph in the rasterizer itself, where nodes keep the offset of their attributes in the batch (layout, transformation, color, etc); and when you modify any of these attributes, only the deltas are uploaded to the GPU, rather than the whole batch.
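
The following plain C sketch shows the bookkeeping this idea needs. Everything in it is hypothetical (the Batch and RectNode types, the four-float layout per rectangle, the function names); it only tracks which range of a CPU-side copy of the batch has changed. In a real renderer, the range reported by batch_flush would be handed to something like glBufferSubData() instead of re-uploading the whole buffer.

```c
#include <stddef.h>
#include <string.h>

#define FLOATS_PER_RECT 4   /* left, top, width, height (assumed layout) */
#define MAX_RECTS 256

typedef struct {
    float  data[MAX_RECTS * FLOATS_PER_RECT]; /* CPU mirror of the VBO */
    size_t count;                             /* rectangles stored so far */
    size_t dirty_begin, dirty_end;            /* dirty floats, [begin, end) */
} Batch;

typedef struct {
    size_t offset;  /* index (in floats) of this node's attributes */
} RectNode;

static void batch_init(Batch *b)
{
    b->count = 0;
    b->dirty_begin = b->dirty_end = 0;
}

/* Grow the dirty range so that it also covers [begin, end). */
static void mark_dirty(Batch *b, size_t begin, size_t end)
{
    if (b->dirty_begin == b->dirty_end) {     /* nothing dirty yet */
        b->dirty_begin = begin;
        b->dirty_end = end;
    } else {
        if (begin < b->dirty_begin) b->dirty_begin = begin;
        if (end > b->dirty_end) b->dirty_end = end;
    }
}

/* Appends a rectangle and remembers where it lives. Returns 0 if full. */
static int batch_add_rect(Batch *b, const float layout[FLOATS_PER_RECT],
                          RectNode *node)
{
    if (b->count == MAX_RECTS) return 0;
    node->offset = b->count * FLOATS_PER_RECT;
    memcpy(&b->data[node->offset], layout, sizeof(float) * FLOATS_PER_RECT);
    mark_dirty(b, node->offset, node->offset + FLOATS_PER_RECT);
    b->count++;
    return 1;
}

/* Moves one rectangle; only its two position floats become dirty. */
static void rect_node_move(Batch *b, RectNode node, float left, float top)
{
    b->data[node.offset + 0] = left;
    b->data[node.offset + 1] = top;
    mark_dirty(b, node.offset, node.offset + 2);
}

/* Reports the byte range changed since the last flush and clears it.
 * Returns 0 when there is nothing to upload. */
static int batch_flush(Batch *b, size_t *offset_bytes, size_t *size_bytes)
{
    if (b->dirty_begin == b->dirty_end) return 0;
    *offset_bytes = b->dirty_begin * sizeof(float);
    *size_bytes = (b->dirty_end - b->dirty_begin) * sizeof(float);
    b->dirty_begin = b->dirty_end = 0;
    return 1;
}
```

After the initial upload of three rectangles, moving the second one produces a flush of just two floats at that node's offset, rather than the twelve floats of the full batch.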

I believe a scene graph approach has the potential to open a whole new set of opportunities for acceleration, especially for transitions, animations and scrolling.

And that’s exciting!

Apart from this, I also want to:

  • Benchmark! Set up a platform for reliable benchmarking and perf comparison with Skia/Cairo.
  • Take a subset of this technique and test it in Skia, behind the current API.
  • Validate the case of drawing drop shadows and multi-step gradient backgrounds.
  • Test on other OpenGL ES 3.0 implementations (and more devices!).

Let us not forget the fight we are fighting: Web applications must be as fast as native. I truly think we can do it.

A possibly faster approach to OpenGL rasterization of 2D Web content

Posted by elima on July 04, 2014

Even though it has been a while since my last entry on this blog, I have been quite busy. During most of last year I brought my modest contributions into an awesome startup that you have probably heard of by now, helping them integrate GNOME technologies into their products. I was lucky to join their team at an early stage of the development and participate in key discussions.

But more on that project in future entries.

Today I want to talk about things that keep me busy these days, and are of course related to Web engines. Specifically, I want to talk about 2D rasterization and the process of putting pixels on the screen as fast as possible (aka, the 60 frames-per-second holy grail). I want to discuss an idea that has the potential to significantly increase the performance of page rendering by utilizing modern GPU capabilities, OpenGL, and a bit of help from Web engines’ magic.

This is a technical article, so if you are not very familiar with 2D rasterization, OpenGL or how Web engines draw stuff, I recommend you take some time off and read about it. It is truly a world of wonders (and sometimes pain).

Instanced rendering

The core of the idea is based on instanced rendering. It is a fairly well-known technique that became a core feature in OpenGL 3.1 and OpenGL ES 3.0, and was previously available through extensions such as GL_EXT_draw_instanced.

To draw geometry with OpenGL, one normally submits a primitive to the rendering pipeline. The primitive consists of a collection of vertices, and a number of attributes for each vertex. Traditionally, you could only submit one primitive at a time.

With instanced rendering, it is possible to send several “instances” of the same primitive (the same collection of vertices and attributes) in a single call. This dramatically reduces the overhead of pipeline state changes and gives the GPU driver a better chance at optimizing rendering of instances of a particular geometry.

Hence, it is common practice for OpenGL applications to group rendering of similar geometry into batches, and submit them to the pipeline all at once as instances. This technique is commonly known as batching and merging.

Skia, the 2D rasterizer used by the Chromium and Android projects, and Cairo, a popular 2D rasterizer backing many projects such as GNOME and previous versions of Mozilla Firefox, both support some form of instanced rendering, to some extent, in their respective GL backends.

Telling instances apart

Ok, it is possible to draw a bunch of primitives at once, but how can we make them look different? A way of customizing individual instances is necessary, otherwise they will all render on top of the previous one. Not very useful.

There are two ways of submitting per-instance information: one is by adding a “divisor” to the buffers containing vertex or attribute information (VBOs), which will tell the pipeline to use the divided chunks as per-instance information instead of per-vertex. glVertexAttribDivisor is used in this case.
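
To illustrate what the divisor does, here is a small CPU-side sketch in plain C. It makes no OpenGL calls; it only mimics the indexing rule that glVertexAttribDivisor() sets up. The Layout struct, the unit quad and the function names are assumptions made for this example.

```c
typedef struct { float x, y; } Vec2;
typedef struct { float left, top, width, height; } Layout;

/* Index of the element fetched from an attribute array for a given vertex
 * of a given instance:
 *   divisor == 0: the attribute advances once per vertex;
 *   divisor  > 0: the attribute advances once every 'divisor' instances. */
static unsigned attrib_index(unsigned divisor,
                             unsigned vertex_id, unsigned instance_id)
{
    return divisor == 0 ? vertex_id : instance_id / divisor;
}

/* Per-vertex attribute (divisor 0): corners of a unit rectangle. */
static const Vec2 unit_quad[4] = {
    { 0.0f, 0.0f }, { 1.0f, 0.0f }, { 0.0f, 1.0f }, { 1.0f, 1.0f }
};

/* Combines the per-vertex corner with the per-instance layout (divisor 1),
 * roughly what a vertex shader would do for each vertex of each instance. */
static Vec2 vertex_position(const Layout *layouts,
                            unsigned vertex_id, unsigned instance_id)
{
    Vec2 corner = unit_quad[attrib_index(0, vertex_id, instance_id)];
    Layout l = layouts[attrib_index(1, vertex_id, instance_id)];
    Vec2 p = { l.left + corner.x * l.width, l.top + corner.y * l.height };
    return p;
}
```

The same four corner vertices are reused by every instance, while each instance picks its own layout entry, so two rectangles with different layouts end up at different places on the canvas.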

The other way is to upload the per-instance information to a buffer texture (or any texture for that matter) and fetch the information of the corresponding instance by sampling, using a new variable gl_InstanceID available in shader code, as the key. This variable will increase for each instance of the geometry being rendered (as opposed to per vertex, for which you have gl_VertexID).
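
The lookup itself is simple arithmetic. The sketch below, in plain C, packs two RGBA texels per instance (layout and color) into a flat array and fetches them back by instance index, the way a shader would use gl_InstanceID as the key with a texel fetch. The two-texel packing and all names are hypothetical, chosen only for the example.

```c
#define TEXELS_PER_INSTANCE 2  /* texel 0: layout, texel 1: color (assumed) */

typedef struct { float r, g, b, a; } Texel;   /* one RGBA float texel */

/* Writes the attributes of one instance into its slots of the buffer. */
static void pack_instance(Texel *buffer, int instance_id,
                          const float layout[4], const float color[4])
{
    Texel *t = &buffer[instance_id * TEXELS_PER_INSTANCE];
    t[0] = (Texel) { layout[0], layout[1], layout[2], layout[3] };
    t[1] = (Texel) { color[0], color[1], color[2], color[3] };
}

/* Reads back one texel of an instance, using the instance index as key. */
static Texel fetch_instance_texel(const Texel *buffer,
                                  int instance_id, int slot)
{
    return buffer[instance_id * TEXELS_PER_INSTANCE + slot];
}
```

The nice property of this scheme is that the amount of data per instance is not limited by the number of vertex attribute slots; the cost is an extra texture fetch per attribute in the shader.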

So, a quick recap so far. We are able to draw several instances of the same geometry at once, very efficiently, and are able to upload arbitrary data to customize each of these instances at will.

But wait, there are caveats.

The ordering problem

So, let’s say we can now group together all drawing operations that involve the same primitive (rectangle, line, circle, etc). What happens if we draw (say) a filled rectangle, then a circle on top, and then another rectangle on top of the circle?

Following the simple grouping rule, what will happen is that the two rectangles will be grouped together and drawn first in one call, then the circle. This will not render the expected result, since the circle will end up lying on top of the rectangle that was drawn after it.
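
The effect is easy to reproduce on a toy one-dimensional "canvas". The plain C sketch below is only an illustration (the DrawOp type and the character-based canvas are made up for it): it paints the same three operations once in submission order and once grouped by primitive, and the two results differ exactly where the shapes overlap.

```c
#include <string.h>

#define CANVAS_WIDTH 10

typedef enum { PRIM_RECT, PRIM_CIRCLE } Primitive;

typedef struct {
    Primitive prim;
    int begin, end;   /* covered pixel range [begin, end) on a 1D canvas */
    char ink;         /* character painted into the covered pixels */
} DrawOp;

static void clear_canvas(char *canvas)
{
    memset(canvas, '.', CANVAS_WIDTH);
    canvas[CANVAS_WIDTH] = '\0';
}

static void paint(char *canvas, const DrawOp *op)
{
    for (int i = op->begin; i < op->end; i++)
        canvas[i] = op->ink;
}

/* Paints the operations in submission order (the expected result). */
static void render_in_order(char *canvas, const DrawOp *ops, int n)
{
    clear_canvas(canvas);
    for (int i = 0; i < n; i++)
        paint(canvas, &ops[i]);
}

/* Paints all rectangles first (one batch), then all circles (another). */
static void render_grouped(char *canvas, const DrawOp *ops, int n)
{
    clear_canvas(canvas);
    for (int prim = PRIM_RECT; prim <= PRIM_CIRCLE; prim++)
        for (int i = 0; i < n; i++)
            if (ops[i].prim == (Primitive) prim)
                paint(canvas, &ops[i]);
}
```

With a rectangle 'a' over pixels 0 to 5, a circle 'o' over pixels 2 to 7 and a rectangle 'b' over pixels 4 to 9, drawing in order gives "aaoobbbbbb", while the grouped version gives "aaoooooobb": the circle wrongly covers part of the second rectangle.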

This problem is commonly known as “ordering”, and it clearly breaks our otherwise super-performing batching and merging case.

So, in scenes that involve lots of overlapping geometry, the grouping is limited to content that does not overlap, if we want to preserve the right order of operations.

In practice, it means that we first need to separate the content in layers, then group the same primitives within a single layer, and finally submit the batches from each layer in the right order.
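
A minimal sketch of that three-step rule in plain C could look like the following. It assumes each draw operation already carries the index of the layer it belongs to; the DrawOp and BatchCmd types and the build_batches function are hypothetical names for this example, not part of any existing rasterizer.

```c
typedef enum { PRIM_RECT, PRIM_LINE, PRIM_CORNER, PRIM_COUNT } Primitive;

typedef struct { int layer; Primitive prim; } DrawOp;

/* One instanced draw call: 'instances' copies of 'prim' within 'layer'. */
typedef struct { int layer; Primitive prim; int instances; } BatchCmd;

/* Groups operations into one batch per (layer, primitive) pair and emits
 * the batches layer by layer, preserving the order between layers.
 * Returns the number of batches written to 'out' (at most 'max_out'). */
static int build_batches(const DrawOp *ops, int n_ops, int n_layers,
                         BatchCmd *out, int max_out)
{
    int n_out = 0;

    for (int layer = 0; layer < n_layers; layer++) {
        int counts[PRIM_COUNT] = { 0 };

        /* Count how many instances of each primitive this layer uses. */
        for (int i = 0; i < n_ops; i++)
            if (ops[i].layer == layer)
                counts[ops[i].prim]++;

        /* Emit one batch per primitive actually present in the layer. */
        for (int p = 0; p < PRIM_COUNT; p++) {
            if (counts[p] == 0)
                continue;
            if (n_out == max_out)
                return n_out;
            out[n_out].layer = layer;
            out[n_out].prim = (Primitive) p;
            out[n_out].instances = counts[p];
            n_out++;
        }
    }
    return n_out;
}
```

Within a layer the batches can be emitted in any order precisely because the content of a layer is assumed not to overlap; only the order between layers matters. Six operations spread over two layers, for instance, can collapse into four draw calls.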

But guess what? Browser engines already do exactly that. Engines build a layer tree (among several other trees) with the information contained in the HTML and CSS content (layout, styling, transformations, etc), where the content is separated in render nodes whose content does not normally overlap. The actual process is much more complicated than that, but this simplification is enough to illustrate the idea.

Now, what if?

First, for the sake of presenting an idea, let’s ignore the 2D context of a canvas element for now. More on that later. Let’s focus on most of the web sites out there.

If we look at the number of primitives typically used by the rendering of a page, they boil down to a handful. Essentially, they are:

  • A rectangle: for almost all HTML elements, which are boxes. And character glyphs! which are normally rendered ahead of time and cached in a texture layout, then texture-mapped onto a rectangle. And images!, which are also texture-mapped onto rectangles.
  • A thin line: for thin (<=1 pixel) borders, inset/outset effect, hr, etc. Thicker borders can be drawn as thin rectangles.
  • A round corner: the quarter of a circle, filled or stroked, used to implement rounded rectangles (hello border-radius).
  • A circle: for bulleted listings, radio-buttons, etc. Arguably, these can be rendered using unicode characters, so no need for specific geometry.

Let’s stay with these. There are other cases that I will purposely ignore, like the one seen in a rounded rectangle with different thickness in two consecutive borders.

Then we have, for each of these primitives, an evolutionary-like variety of background styles (imaged, colored, repeated, gradient, etc); transformations (rotation, translation, scaling, etc); border styles (again imaged, colored, with different thickness, etc), shadow and blurring effects, and so on.

With a working texture cache, we have a potentially good chance at aggressively grouping together drawing of these primitives, like rectangles for example, for all text glyphs, boxes and images.

So, what if we could submit to a smart shader all the information that describes and tells apart these grouped instances? Is it possible to efficiently pack and then re-interpret in a shader all the styling and transformation complexities of today's CSS-styled HTML elements?

A new approach

Existing 2D rasterizers used in Web engines (at least Skia and Cairo, whose source code is available to me) are general purpose drawing libraries. That means they should render deterministically for any kind of application, not only Web engines. Specifically, they need to avoid the ordering problem explained above, where the result of a set of overlapped drawing operations is different if you change their order.

There are several reasons why modern Web engines use general purpose 2D rasterizers (as opposed to rasterizers written specifically for the needs of Web content rendering). One clear reason is that they existed before (in the case of Cairo at least) as generic 2D graphics libraries, and were later used for Web rendering. Another reason is that the implementation of the Canvas 2D spec requires a general purpose 2D API, because that's what it is. And there is a clear benefit in reusing your beautifully optimized Canvas 2D implementation to draw the rest of the Web contents. Also, these libraries evolved from a pixmap (image) backed rendering target, into libraries exploiting the hardware-acceleration of GPU cards. Both libraries now feature an OpenGL(ES) backend that is somehow forced to comply with the previously existing API and behaviors.

But that is sub-optimal for Web engines that simply want to draw non-overlapping content into layers, then draw the layers in order. And even though batching and merging do occur in the GL backends today, it is apparently far from optimal as we will see later.


So, if we completely ignore the ordering problem for the case of Web engines drawing already layered nodes onto an OpenGL based render target, we might be able to aggressively group together potentially all the operations that share the same primitive.

This is of course if, as mentioned above, we are able to describe the particularities of each instance of these primitives, hand them down to a smart shader for rendering, then do all that efficiently so that the performance gained in batching is not lost by uploading tons of instance information to the GPU or running heavy shader code.

It is unclear (to me) whether this is at all possible. That is why this approach is just an idea, still lacking validation in the real world. But it is research that could potentially boost performance of Web content rendering.

It shares some similarities with (and was partially inspired by) the way Android does font rendering.

A proof of concept

So, I set out to write a proof of concept to validate or discard the idea as quickly as possible. The purpose is to write the minimum code that would allow meaningful comparison between this approach and existing rasterizers (Skia being my first target for this), for specific use cases that are relevant to generic Web content rendering (not Canvas 2D).

My proof of concept is being developed at: https://github.com/elima/glr

So far, it just provides a few primitives: rectangle and rounded corner, allowing for 3 basic drawing operations: rectangle (filled or stroked), rounded rectangle (only filled) and character glyph (not text, just single characters).

Then each element drawn can be transformed (scaled, rotated and/or translated), laid-out on the canvas (top, left, width and height), and has a color or texture.

Anti-aliasing is achieved by multisampling with 8 samples per pixel. Character glyphs are not anti-aliased; that was too complex to put in a proof of concept and it is a problem already solved by others anyway. I used the simplest possible path to put a pre-cached glyph on the screen, and for that wrote a super naive texture cache, and used FreeType2 for rasterizing the glyphs.

The idea of including chars was to explore whether text glyphs, which account for most of a typical Web page's content, could be batched together with all other drawings that use a rectangle primitive.

Note should be taken that this proof-of-concept is not intended to become a new library. It is just a vehicle to validate an idea by providing the minimum implementation needed to test its limits. Eventually, all this would have to be implemented in existing libraries. I just happen to be very fluent in glib and C :), which lets me prototype fast.

Comparisons

Before we jump into FPS excitement, let’s clarify that any comparison here should be taken with a grain of salt. 2D rasterizers are complex libraries that do a lot of non-trivial things like anti-aliasing, sub-pixel alignment, color space conversion, adaptation to the specifics of the underlying hardware/driver/GL-version combos, to name just a few.

Thus, any comparison should be put in the context of what code paths are being selected, what rendering operations are being grouped, and when and why they aren't; how many GL operations are submitted to the pipeline to render the same scene, etc.

I have included 3 initial examples that try to illustrate how batching and merging of "compatible" draws (sharing the same underlying primitive) improves performance when ordering is ignored, while at the same time each element can have its own color, layout and transformation. For each example, I have written a Skia counterpart that tries to render exactly the same, to the extent possible, for the sake of comparing.

The data below corresponds to runs on my laptop, a Thinkpad T440p running Debian GNU/Linux with an integrated Intel(tm) GPU (4th gen) and the OpenGL driver provided by Mesa 10.2.1.

I used apitrace to look at what GL commands are actually sent to the driver.

Let’s start with the RectsAndText example. It basically draws a lot of alternating filled rectangles and character glyphs, each with its own color, transformation and layout. In the screencast below, both examples (Skia and glr) are running at the same time. This of course does not reflect real performance since both compete for GPU resources, but I decided to record it this way because the improvement is much easier to notice. The frames-per-second rate decreases proportionally for both examples when run at the same time, so it remains relevant for comparison.

The window on the left corresponds to the Skia example, and the one on the right to glr. The same goes for all screencasts below.



This video file is also available for download.

The difference is considerable. Skia performs at an average of 6-7 FPS while the new approach gives around 40 FPS. That’s a 5x-6x improvement, not bad. Also, notice that CPU usage is considerably higher in the case of Skia.

The interesting thing here is that in the case of glr, all elements are batched together (both rectangles and chars), so only one drawing operation is actually submitted to the pipeline, as you can see in the available apitrace dump. A trace for the corresponding Skia example is also available.



apitrace output for RectAndText glr example



apitrace output for RectAndText Skia example

The next example is Rects, which is similar but renders only rectangles, alternating between filled and stroked. The interesting bit is that in the case of glr, each style of rectangles is drawn onto one different layer, each layer operating on its own separate thread; demonstrating that parallelism is now possible during batching.



This video file is available for download.

In this example, the performance difference is even higher. glr is around 8x faster. Again, apitrace traces for the glr example and the Skia version are available. This time glr submits a total of 2 instanced drawing operations, one for filled rects and one for stroked.



apitrace output for Rects glr example



apitrace output for Rects Skia example

The last example draws several layers of non-overlapping rounded rectangles. As with previous examples, every element is given a unique layout, color and transformation. This example tries to illustrate that because batching operates only at layer level, the more layers you have the less you benefit from this technique. In this particular example, the gap is reduced considerably. In fact it looks like Skia is faster by a few FPS, but that is actually not true. When both examples are run together, Skia is faster, but if run separately, the glr example is faster (though not by much). I’m still figuring this out.



This video file is available for download.

And the traces for the glr example and the Skia example.



apitrace output for Layers glr example



apitrace output for Layers Skia example


If you are curious about the implementation, take a look at where most of the magic happens: GlrContext, GlrCanvas and GlrBatch objects, and the vertex shader. The rest of the code is mostly API and glue to provide a coherent way to use this approach. Specifically, an abstract concept of "layer" is introduced. The workflow goes this way:

  • For initialization, a context, a rendering target and a canvas object are created. This is similar to how other 2D libraries work.
  • In the rendering loop and for each frame, the canvas is first notified that a new frame will be rendered.
  • Then any number of layer objects are created and attached to the canvas. The drawing API works against a layer (instead of a canvas), and will group and batch all the drawing operations in internal commands. When drawing to a layer finishes, the layer is notified that it is ready.
  • Finally, the canvas is requested to finish the frame, right before swapping buffers. This call will wait for all the attached layers to finish (blocking if needed). Once all complete, the canvas will take the batched commands from each layer, in the order they were attached, and submit them to the pipeline for rendering.

One thing to remark is that layers are self-contained stateful objects, and can survive frames without needing to redraw.

Other benefits

One by-product derived from the fact that layers cache drawing operations in internal commands (which in turn use locally allocated buffers), is that layers now become data-parallel. This is a term rarely used in the context of OpenGL because as you probably know, the way its API is designed makes it a giant state machine, making any parallelization impractical.

With this approach, layers can be drawn in separate threads (or fully moved to OpenCL), which can bring extra performance if there are several complex layers that need drawing at the same time.

Another potential extra benefit comes from the fact that the canvas renders to a target that is actually a framebuffer backed up by a multisample texture. This means we can use any previously rendered frame as a texture, the same way it currently works in both Chromium and WebKit, where layers are texture-mapped then composited into the final scene.

So, we have the flexibility that, if a particular layer is too complex or slow to draw, we can attach it alone to a canvas, render it, and use the texture as with the current model. But, if we are short on texture memory, it is possible to keep commands batched in layers and render them on every frame. This is kind of similar to what Chromium does, recording draw operations into an SkPicture and then replaying them when needed.

Future work

This is an approach that needs validation for a number of real world use cases before it can be even considered for testing on a Web engine. It is key to explore how complex information (for example, multi-step gradient backgrounds, or complex border styling with rounded rectangles) can be passed to the shaders and rendered correctly and efficiently. Also, there are shadows and blurring effects, all parametrized to cover the most creative use cases, that also need verification against this model.

Basically, we need to understand the limits of the approach by trying to implement modern W3C specs, selecting the most complex features first.

Other important priorities are:

  • Understand how much workload can be imposed on shaders side before the gained performance starts to degrade.
  • Test on OpenGL-ES and constrained GPU on embedded ARM, to detect the minimum requirements.
  • Figure out how to implement a mid-frame flushing mechanism when texture cache exhausts or command buffers get too large. This is not trivial, since to flush a layer (that is possibly running in a separate thread) it has to be blocked, then the canvas has to wait for all layers below it to finish and then execute their commands, then signal the blocked layer to continue.
  • Try how scrolling would behave if previously batched layers are drawn for every frame, instead of using current scrolling techniques that rely on rolling big textures, or moving several tiles up and down. These techniques impose either great pressure on texture memory, or a lot of complexity on tile management (or both), especially in the context of new super-high resolution screens.

Conclusions and final words

I have tried to detail an idea that, although not new, I believe has not been explored in full in the context of Web engines. It relies on two essential hypotheses:

  • That it is possible to batch not only geometry, but the complex attributes of arbitrarily styled HTML elements, and render that geometry as instances using shader code.
  • That it is safe to ignore ordering of draw operations during the rasterization phase, and leverage the Web engine’s layer tree to solve overlapping.

Modern GPUs and OpenGL APIs have great potential for optimizing 2D rasterization, but as it happens most of the time, there is no one solution that fits all. Instead, each particular application and use case requires a different set of strategies and trade-offs for optimum performance.

This approach, even if valid for a sufficient number of use cases, is unlikely to go faster than existing approaches for all test cases, let alone replace them. This is pretty clear in the case of canvas 2D for example, which will continue to require a general purpose rasterizer. But if there is a sufficient number of use cases that would benefit from this approach to some degree, then maintaining one code path that enables it will already be a win.

Finally, I want to thank Samsung SRA for partially sponsoring the time I dedicated to pursue this idea, and also Igalia and igalians, who are always there to back me up and help me move forward.

Now, is there anyone interested in helping me explore this idea further?

Introducing gocl, a gobject wrapper to OpenCL

Posted by elima on May 06, 2013

For the past few months I have been working on this project to bring OpenCL closer to GNOME technologies, and today I’m glad to make the first public announcement. For the uninformed reader, OpenCL is a framework and language for writing programs that execute across heterogeneous HW pieces like CPUs, GPUs, DSPs, etc. While not applicable to every piece of software, OpenCL can unleash unparalleled performance and power efficiency on specific heavy algorithms like media decoding, cryptography, computer vision, big data indexing and processing, physics simulation, graphics, image compositing, among others.

Gocl is a GLib/GObject based library that aims at simplifying the use of OpenCL in GNOME software. It is intended to be a lightweight wrapper that adapts OpenCL programming patterns and boilerplate, and exposes a simpler API that is known and comfortable to GNOME developers. Examples of such adaptations are the integration with GLib’s main loop, exposing non-blocking APIs, GError based error reporting and full gobject-introspection support. It will also include convenience API to simplify code for the most common use patterns.

Gocl started as part of the work and research we do at Igalia on HW acceleration. I decided to take a bit of it, clean it up and release it in a way that can be useful to others. OpenCL is gaining relevance and popularity since the number of implementations and supported chips has grown significantly in recent years. Soon we are going to see OpenCL running everywhere, and GNOME technologies should be ready to take advantage of it.

Full gtk-doc documentation is available, and source code is hosted at my GitHub account.

The API is very simple and limited at this stage, and should be considered very unstable. Although I’m not currently working on it full time, I do have kind of a roadmap for the API and features that I will prioritize:

  • Completing the missing asynchronous API
  • Adding API to query available OpenCL extensions
  • Providing API to expose the cl_khr_gl_sharing extension, for object sharing with OpenGL

You are welcome to suggest/request features that you would like to see in Gocl, as well as propose changes to the API. The GitHub issue tracker on the project’s page is available for that, and also for reporting bugs.

So, do you know of a specific piece of software in GNOME that could potentially benefit from OpenCL? I would love to hear about it.

At Igalia, as part of our strong commitment to make the Web better and faster, we are already looking into ways of applying OpenCL to WebKit and its related technologies, and I’m personally interested in that line of work.

SHA-512 hashing support in glib

Posted by elima on November 21, 2012

Always feels good to close old bugs, even if done unintentionally. In one of the projects I’m working on, I ran into the lack of SHA-512 support in glib and decided to step in. It turned out that such support was requested in a bug reported 3 years ago.

Whatever the reasons, the good news is that the next release of glib will ship with support for SHA-512 hashing in GChecksum. The implementation strictly follows the FIPS-180-2 standard.
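As a quick illustration of what a SHA-512 digest looks like, here is a minimal sketch that uses Python’s standard hashlib module as a stand-in; it is not the GLib code (there the checksum type is G_CHECKSUM_SHA512), and the helper name sha512_hex is only illustrative. The input "abc" is one of the sample messages commonly used to check SHA-512 implementations.

```python
import hashlib


def sha512_hex(data: bytes) -> str:
    """Return the SHA-512 digest of data as a lowercase hex string."""
    return hashlib.sha512(data).hexdigest()


digest = sha512_hex(b"abc")
print(len(digest))  # 128 hex characters, i.e. a 512-bit digest
print(digest)
```

Any conforming implementation, GChecksum included, should produce the same hex string for the same input, which makes short known inputs handy for sanity checks.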

Thanks to Emmanuele Bassi for reviewing my patch, Julian Andres Klode for merging it, and Igalia for sponsoring my dedication.

Going to JSConf.eu 2011

Posted by elima on September 27, 2011

Back in 2009 I had the chance to attend the European edition of the JavaScript Conference for the first time. It was a nice and intense learning experience (it runs for only two fully packed days). This event is the counterpart of the US edition; however, it gathers a wide and heterogeneous JavaScript community from around the world and not just Europe.

And guess what, after missing my ticket last year, I’m attending this year’s edition again, sponsored by Igalia. I will be giving a talk titled “Javascript, the GNOME way” in which I will discuss the relationship between GNOME 3 and JavaScript, the technologies behind it and how to get started writing JS in the “GNOME way”. My goals with this presentation are 1) to communicate to the wider JavaScript audience the awesomeness of using the GNOME libraries in JS and 2) to try bridging the two communities on subjects that matter to both, to ultimately foster collaboration and alignment between them.

I’m sure this year’s edition will be as cool as the others, and I look forward to absorbing again all that knowledge, ideas, enthusiasm and yeah, Berlin’s autumn breeze too.

See you there.

FileTea, low friction anonymous file sharing

Posted by elima on September 01, 2011


Let me introduce FileTea, a project enabling anonymous file sharing for the Web. It is designed to be simple and easy to use, to run in (modern) browsers without additional plugins, and to avoid the hassle of user registration.

It works like this:

  • Mary wants to share a file with Jane. She opens a browser and navigates to a FileTea service.
  • Mary drags and drops the file (or files) she wants to send into the page and copies the short url generated for each file.
  • Mary sends the url to Jane by instant messaging, e-mail, SMS or just posts it somewhere.
  • Jane receives the url, opens it in her browser and downloads the file.

A reference deployment is running at http://filetea.me. It is an alpha version and is kindly hosted by Igalia (thanks!). Please, feel free to try it (and provide feedback!).

FileTea is free software, released under the terms of the GNU AGPL license. That means that you can install it on your own server and host it yourself for your organization, your business or your friends. The source code is hosted at Gitorious.

We are not alone

There are similar services running out there like http://min.us, http://ge.tt, http://imm.io, and maybe others I’m not aware of. But FileTea is different in three important aspects:

  • FileTea is free software (including server side), meaning development is open to the community, and you can run your own instance and make it better.
  • FileTea does not store any file on the server. It just synchronizes and bridges an upload from the seeder with a download to the leecher.
  • FileTea sets no limit to the size of shared files.
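The “bridge, don’t store” idea can be sketched in a few lines. This is only an illustration under stated assumptions, not FileTea’s actual code: the function name relay is hypothetical, and in-memory streams stand in for the seeder’s upload connection and the leecher’s download connection.

```python
import io


def relay(upload, download, chunk_size=64 * 1024):
    """Copy bytes from the seeder's stream to the leecher's stream.

    Only one chunk is held in memory at a time and nothing is written
    to disk. Returns the total number of bytes relayed.
    """
    total = 0
    while True:
        chunk = upload.read(chunk_size)
        if not chunk:
            break  # the seeder has finished sending
        download.write(chunk)
        total += len(chunk)
    return total


# In-memory streams stand in for the two HTTP connections.
seeder = io.BytesIO(b"x" * 200_000)
leecher = io.BytesIO()
print(relay(seeder, leecher))  # 200000
```

Because the server only ever holds one chunk, its memory use does not depend on the file size, which is consistent with not imposing a size limit.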

Features, my friend

Currently, FileTea has the bare minimum features to allow file-sharing, but I have a long list of possibly cool stuff to add as I find free time to work on them. Some hot topics are:

  • Global and per-transfer bandwidth limitation (up and down).
  • Proper thumbnails for the shared files.
  • Bulk sharing: the ability to share a group of files under a single url and download a zip or tarball with all files together.

If you have more ideas, I would be happy to hear them.

So that’s it, happy sharing!

See you at Desktop Summit 2011

Posted by elima on August 01, 2011

Although my talk “The Web jumps into D-Bus” was not accepted for this year’s edition of the Desktop Summit, I’m still attending the event thanks to my employer Igalia, which, apart from sponsoring the conference, is sponsoring my attendance together with that of a bunch of other igalians. Like two years ago, this edition is special because we have the GUADEC and Akademy events co-located under the free Desktop Summit umbrella. This is a great opportunity for communication, coordination, sharing and synergy among the different free desktop environments, their projects, hackers and communities.

As usual, I will be hanging around close to where GNOME folks gather, but this year that might not always be true since I have a special interest in Freedesktop.org and Web technologies, and that means I might be seen anywhere around ;) . Apart from that, of course I will be hopping into the security and introspection-related sessions as usual; and I also want to get closer to the GNU community that usually gathers during these days. I hope there will be interesting discussions around the Freedombox project.

So, if you are in Berlin next week and want to chat about EventDance, the free desktop, free software, technology or life in general, find me there and be welcomed.

EventDance 0.1.4 released

Posted by elima on May 02, 2011

During the Easter holidays, I finally managed to find time to close the EventDance 0.1.3 development cycle and release 0.1.4. This milestone took longer than expected for several reasons, mainly due to some last minute API changes I had to introduce and a couple of features I couldn’t resist implementing earlier. The result is a long changelog that I will try to summarize:

  • Basic API for asymmetric (public-key) cryptography

    The EvdPkiPubkey and EvdPkiPrivkey classes provide abstractions for PKI public and private keys, respectively. They are basically asynchronous, GIO-friendly wrappers for the libgcrypt PK functions. There is also API for asynchronous key-pair generation. For now, only encryption/decryption using the RSA algorithm is supported.

  • Basic API for symmetric cryptography

    EvdTlsCipher also provides an asynchronous, GIO-friendly wrapper for the libgcrypt symmetric crypto API, adding some nice features like data auto-padding and key aligning built right in. Not all algorithms supported by libgcrypt are available, only the most popular ones (e.g., AES 128/192/256, ARCFOUR).

  • SNI and lazy certificate selection for TLS credentials

    Server Name Indication is an SSL/TLS extension that permits a client to indicate the domain name it is requesting at the start of the handshake, before the server commits to a certificate. This feature is available in GnuTLS and is now exported to EvdTlsSession. Also, EvdTlsCredentials added a callback to select the certificate to send to the peer during the TLS handshake. The combination of these two features is critical to implement an SSL/TLS capable reverse Web proxy. I’m seriously considering including one such proxy inside EventDance, which would export a D-Bus API over the system bus to allow applications to easily add/remove virtual hosts and server backends on-the-fly.

  • Websockets mechanism into EvdWebTransport

    Now the web transport negotiates the mechanism with the browser during the handshake and uses websockets if supported; otherwise it falls back to long-polling. Only version 76 (hybi-00) of the spec is implemented so far.

  • EvdDBusBridge

    A component to connect a web page to a D-Bus message bus running in the server, allowing client-side Web applications to proxy/export objects and acquire bus names. Check my previous post introducing this feature for details.

  • EvdJsonrpc

    An asynchronous, GIO-friendly implementation of the JSON-RPC protocol version 1.0, specifically designed to work well with EventDance transports.

  • EvdDaemon

    An abstraction for any program that runs as a service daemon. The purpose is that if you are implementing a daemon, you just use an EvdDaemon instance and automagically get an event-loop (GMainLoop), pid-file management, syslog-based logging, daemonizing (console detaching) and clean program termination. The pid-file and syslog functionalities are still on the way though.

  • EvdDBusDaemon

    A component that launches a custom D-Bus message bus and tracks its execution. This is useful when an application needs to use a custom message bus instead of the well-known ones, for security or sandboxing reasons.

Also, as usual, lots of bugfixes and random improvements. A dependency on json-glib was added too.
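For readers unfamiliar with JSON-RPC 1.0, the message shapes are simple enough to sketch. The following is a plain Python illustration of the protocol itself, not of the EvdJsonrpc API; the function names are hypothetical. A request carries "method", "params" and "id"; a response carries "result", "error" and the same "id", with one of "result"/"error" always null; a request whose id is null is a notification and gets no reply.

```python
import json


def make_request(method, params, request_id):
    """Serialize a JSON-RPC 1.0 request: method name, positional params, id."""
    return json.dumps({"method": method, "params": list(params), "id": request_id})


def make_response(request_id, result=None, error=None):
    """Serialize a JSON-RPC 1.0 response; result is null whenever error is set."""
    if error is not None:
        result = None
    return json.dumps({"result": result, "error": error, "id": request_id})


def handle(raw_request, methods):
    """Dispatch one request against a dict of callables and build the reply.

    Returns None for notifications (requests whose id is null).
    """
    request = json.loads(raw_request)
    try:
        result = methods[request["method"]](*request["params"])
        error = None
    except Exception as exc:  # report any failure through the error member
        result, error = None, str(exc)
    if request.get("id") is None:
        return None
    return make_response(request["id"], result=result, error=error)


reply = handle(make_request("add", [2, 3], 1), {"add": lambda a, b: a + b})
print(reply)  # {"result": 5, "error": null, "id": 1}
```

The id is what lets an asynchronous peer match a reply to its pending call, which is why this kind of protocol fits well on top of event-driven transports.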

Now and for the next few weeks, I’m running a documentation and annotations sprint, something I have delayed too much already. I will also write a couple of basic tutorials on how to build and use EventDance. Stay tuned.

The Web jumps into D-Bus

Posted by elima on March 19, 2011

Let’s start with the fun. The following demos show EventDance’s D-Bus bridge in action:

Proxying org.freedesktop.Notifications

Directly from a Web page, we create a proxy to the standard freedesktop.org object /org/freedesktop/Notifications, which is exported to the session bus running in my normal desktop environment. Then, we call the remote method Notify using a simple web form. This method opens a notification dialog showing a quick note to the desktop user. The proxy also watches the remote signal NotificationClosed, which reports that a notification dialog was closed.
During the demo we open different browsers to show the cross-browser nature of EventDance’s Web transport.

Screencast 1: D-Bus proxy example

Acquiring a bus name and exporting an object

In the next screencast we acquire/release a name on the session bus, and export a simple object onto it. The object has one method, Ask, that receives a string as input argument, shows a confirmation dialog on the Web page, and returns a boolean value depending on whether the user pressed “Ok” or “Cancel” to close the dialog. We use D-Feet to call the remote method on the browser that owns the name at the moment of the call.

Screencast 2: D-Bus name owning example

The client-side source code of both demos is available at examples/common/dbus-bridge-proxy.html and examples/common/dbus-bridge-own-name.html respectively, inside the EventDance repository. You can check them to see how simple the API to connect to and use the D-Bus connection is.

The server-side code for both demos is the same, and is public at examples/dbus-bridge.c. There is also a JavaScript version at examples/js/dbusBridge.js and a Python version at examples/python/dbus-bridge.py. To run the JavaScript and Python examples you need EventDance compiled with the --enable-introspection configure flag active (the default), and GJS or pygobject respectively available in your system.

The D-Bus bridge

At the highest level, the EventDance library provides a set of transports and IPC mechanisms. The D-Bus bridge is just one of these mechanisms, written specifically to work over the Web transport. Its purpose is to connect a Web page with a remote D-Bus daemon running on the Web server, using EventDance’s Web transport (long-polling or websocket if available).

D-Bus uses a binary protocol, and the browser’s JavaScript context doesn’t (yet) support binary data. Thus, it was sensible to translate D-Bus messages to a browser-friendly protocol, in this case JSON. Using the API recently added to json-glib to integrate GVariant and JSON, the bridge translates messages back and forth between GDBus and JSON. It also tries to reduce message payload as much as possible, by caching the GDBus objects server-side and sending only their references over the wire. This makes the protocol very simple and lightweight.
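The “cache server-side, send only references” idea can be sketched as follows. This is a simplified illustration, not the bridge’s real wire format: the ObjectRegistry class, the encode_call helper and the JSON member names are all hypothetical, and a plain dict stands in for a server-side GDBus proxy object.

```python
import itertools
import json


class ObjectRegistry:
    """Server-side cache mapping small integer references to heavyweight objects."""

    def __init__(self):
        self._next_ref = itertools.count(1)
        self._objects = {}

    def register(self, obj):
        """Store obj and return the reference that will travel over the wire."""
        ref = next(self._next_ref)
        self._objects[ref] = obj
        return ref

    def lookup(self, ref):
        """Resolve a reference received from the browser back to the object."""
        return self._objects[ref]

    def release(self, ref):
        """Forget an object once the browser no longer needs it."""
        self._objects.pop(ref, None)


def encode_call(proxy_ref, method, args):
    """Build the JSON text for a method call: a reference plus plain arguments."""
    return json.dumps({"proxy": proxy_ref, "method": method, "args": args})


registry = ObjectRegistry()
# A dict stands in for a server-side D-Bus proxy object.
ref = registry.register({"bus_name": "org.freedesktop.Notifications",
                         "path": "/org/freedesktop/Notifications"})
message = encode_call(ref, "GetCapabilities", [])
print(message)  # {"proxy": 1, "method": "GetCapabilities", "args": []}
```

The browser never sees bus names or object paths after the proxy is created; every later message carries just the small integer, which keeps payloads short.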

Cool, now what?

Hopefully you are already wondering about the many purposes this feature can serve.

While many people envision that the next desktop will be the browser, many more do use Web applications almost exclusively, already today. The traditional separation between Web and Desktop app development is blurring. Browsers have become powerful platforms for running complex applications, and this situation is speeding up with the broad and increasing adoption of HTML5 standards by major browsers.

On the other hand, we have D-Bus, a freedesktop.org standard that is at the core of almost every GNU/Linux system out there. It is the de-facto IPC mechanism over which your applications talk and share. D-Bus allows us to write a program in any language, and export its usefulness over a standard channel. It also allows us to write differentiated UIs (e.g., Qt vs. GTK+ vs. NCurses) to interface with common functionality. Yes, one bus to bind them all!

Joining these two pieces together is just the next logical step. A step towards bringing together the best of two contexts: the ubiquity of the Web and the inter-process collaborative nature of the Desktop.

We need to write applications that you can host and use securely and reliably not only on your computer, but anywhere on the planet where you happen to have a browser plugged into the Net; whether it is your laptop, mobile phone, tablet or your neighbor’s PS3. We also need to encourage application developers to export the logic of their programs over D-Bus, to allow other platforms (like the Web!) to reuse it. Telepathy is a good example of such a program.

Almost identical applications in terms of functionality are written for many FOSS environments like GNOME, KDE, MeeGo, Android, etc; yet many times only the user experience and the technologies used to build it are different. Although not always possible, there is room for a wider code-reusing culture if we come back to the original Unix philosophy:


Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.

That universal interface has just been upgraded.

JSON and GVariant

Posted by elima on January 31, 2011

Last week, my patch to provide GVariant integration into json-glib was merged. I filed a bug for it back in October, and after some patching iterations it finally made its way in.

This means that now you can obtain a JSON node tree from a GVariant value and vice versa, with a single API call. I want to thank Emmanuele for reviewing my patches promptly (they were kind of lengthy), and for the positive feedback. The new API will be available in the next release, 0.14.

Why do we need that, anyway?

This integration seems quite natural if you give it a thought:

  • json-glib and GVariant are both general purpose data structure holders,
  • with serializing/deserializing capabilities
  • and both are GLib based.

My motivation to hack on that comes from my work on EventDance. I’m trying to bring D-Bus APIs into a Web page so that it is possible to connect to any server-side message bus from a page’s script and talk to the exported objects, export JavaScript objects to the bus so that they can be consumed by server applications, and own bus names. This feature and the rationale behind its usefulness go beyond the scope of this post. I will devote another entry just to talk about that and show some cool demos.

So let’s focus back on the matter: as I’m using GIO’s GDBus API on the server side, and GDBus uses GVariant for data packing, I needed a way to convert JavaScript data structures to GVariant and vice versa, so that the arguments in the JavaScript context can be constructed naturally using native JS objects and arrays, and passed seamlessly to the remote APIs without the need to worry about GVariant or whatever.

Also, GVariant stores data in a binary format and there are contexts unable to handle that (like JavaScript and other scripting languages). With this API, programmers can convert their GVariants to JSON, have them processed, and then convert them back to GVariant.

How does the conversion work?

Well, as you probably know, GVariant features a set of data types that is considerably richer than JSON’s. Thus, conversion from JSON to GVariant is an ambiguous operation. To solve this, I included an optional type signature as an argument in the API, so that when a JSON tree is converted to GVariant, the signature, if provided, is used to disambiguate the data types. This means that if a signature is provided, the resulting GVariant is guaranteed to comply with it.

GVariant * json_gvariant_deserialize (JsonNode     *json_node,
                                      const gchar  *signature,
                                      GError      **error);

If the signature is not provided, the conversion can still be done, and a fixed mapping is used. For detailed information on how to use the new API, you can check my build of the json-glib documentation.
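To make the ambiguity concrete, here is a small Python sketch of signature-guided conversion. It is not json-glib’s implementation: the function names are hypothetical and only a subset of GVariant type strings is handled (booleans, the integer types, doubles, strings, arrays and tuples). The point is that the same JSON number can legitimately become several different typed values, and the signature picks one.

```python
import json

# Inclusive value ranges for the GVariant integer type codes.
INT_RANGES = {
    "y": (0, 2**8 - 1), "n": (-2**15, 2**15 - 1), "q": (0, 2**16 - 1),
    "i": (-2**31, 2**31 - 1), "u": (0, 2**32 - 1),
    "x": (-2**63, 2**63 - 1), "t": (0, 2**64 - 1),
}


def split_type(sig):
    """Split sig into its first complete type and the remaining characters."""
    if not sig:
        raise ValueError("signature ended early")
    if sig[0] == "a":
        element, rest = split_type(sig[1:])
        return "a" + element, rest
    if sig[0] == "(":
        depth = 0
        for i, ch in enumerate(sig):
            depth += (ch == "(") - (ch == ")")
            if depth == 0:
                return sig[:i + 1], sig[i + 1:]
        raise ValueError("unbalanced tuple in signature")
    return sig[0], sig[1:]


def convert(value, sig):
    """Coerce a decoded JSON value to the single complete type named by sig."""
    first, rest = split_type(sig)
    if rest:
        raise ValueError("signature must contain exactly one complete type")
    code = first[0]
    is_int = isinstance(value, int) and not isinstance(value, bool)
    if code in INT_RANGES:
        low, high = INT_RANGES[code]
        if not is_int or not low <= value <= high:
            raise ValueError(f"{value!r} does not fit type '{code}'")
        return value
    if code == "d" and (is_int or isinstance(value, float)):
        return float(value)
    if code == "s" and isinstance(value, str):
        return value
    if code == "b" and isinstance(value, bool):
        return value
    if code == "a" and isinstance(value, list):
        return [convert(item, first[1:]) for item in value]
    if code == "(" and isinstance(value, list):
        members, inner = [], first[1:-1]
        while inner:
            member, inner = split_type(inner)
            members.append(member)
        if len(members) != len(value):
            raise ValueError("tuple length does not match signature")
        return tuple(convert(v, m) for v, m in zip(value, members))
    raise ValueError(f"{value!r} cannot be converted to '{first}' in this sketch")


print(convert(json.loads("42"), "i"))  # 42
print(convert(json.loads("42"), "d"))  # 42.0
print(convert(json.loads('[1, 2.5, "x"]'), "(ids)"))  # (1, 2.5, 'x')
```

Here the JSON text 42 becomes an integer under "i" and a double under "d", and a JSON array becomes either an array or a tuple depending on the signature; a value that cannot satisfy the signature is rejected instead of being silently coerced.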

I hope other people find this useful.