Über latest Media Source Extensions improvements in WebKit with GStreamer

In this post I am going to talk about the implementation of the Media Source Extensions (known as MSE) in the WebKit ports that use GStreamer. These ports are WebKitGTK+, WebKitEFL and WebKitForWayland, though only the latter has the latest work-in-progress implementation. Of course we hope to upstream WebKitForWayland soon and with it, this backend for MSE and the one for EME.

My colleague Enrique at Igalia wrote a post about this about a week ago. I recommend you read it before continuing with mine to understand the general picture and the some of the issues that I managed to fix on that implementation. Come on, go and read it, I’ll wait.

One of the challenges here is something a bit unnatural in the GStreamer world. We have to process the stream information and then make some metadata available to the JavaScript app before playing instead of just pushing everything to a playing pipeline and being happy. For this we created the AppendPipeline, which processes the data and extracts that information and keeps it under control for the playback later.

The idea of the our AppendPipeline is to put a data stream into it and get it processed at the other side. It has an appsrc, a demuxer (qtdemux currently) and an appsink to pick up the processed data. Something tricky of the spec is that when you append data into the SourceBuffer, that operation has to block it and prevent with errors any other append operation while the current is ongoing, and when it finishes, signal it. Our main issue with this is that the the appends can contain any amount of data from headers and buffers to only headers or just partial headers. Basically, the information can be partial.

First I’ll present again Enrique’s AppendPipeline internal state diagram:

First let me explain the easiest case, which is headers and buffers being appended. As soon as the process is triggered, we move from Not started to Ongoing, then as the headers are processed we get the pads at the demuxer and begin to receive buffers, which makes us move to Sampling. Then we have to detect that the operation has ended and move to Last sample and then again to Not started. If we have received only headers we will not move to Sampling cause we will not receive any buffers but we still have to detect this situation and be able to move to Data starve and then again to Not started.

Our first approach was using two different timeouts, one to detect that we should move from Ongoing to Data starve if we did not receive any buffer and another to move from Sampling to Last sample if we stopped receiving buffers. This solution worked but it was a bit racy and we tried to find a less error prone solution.

We tried then to use custom downstream events injected from the source and at the moment they were received at the sink we could move from Sampling to Last sample or if only headers were injected, the pads were created and we could move from Ongoing to Data starve. It took some time and several iterations to fine tune this but we managed to solve almost all cases but one, which was receiving only partial headers and no buffers.

If the demuxer received partial headers and no buffers it stalled and we were not receiving any pads or any event at the output so we could not tell when the append operation had ended. Tim-Philipp gave me the idea of using the need-data signal on the source that would be fired when the demuxer ran out of useful data. I realized then that the events were not needed anymore and that we could handle all with that signal.

The need-signal is fired sometimes when the pipeline is linked and also when the the demuxer finishes processing data, regardless the stream contains partial headers, complete headers or headers and buffers. It works perfectly once we are able to disregard that first signal we receive sometimes. To solve that we just ensure that at least one buffer left the appsrc with a pad probe so if we receive the signal before any buffer was detected at the probe, it shall be disregarded to consider that the append has finished. Otherwise, if we have seen already a buffer at the probe we can consider already than any need-data signal means that the processing has ended and we can tell the JavaScript app that the append process has ended.

Both need-data signal and probe information come in GStreamer internal threads so we could use mutexes to overcome any race conditions. We thought though that deferring the operations to the main thread through the pipeline bus was a better idea that would create less issues with race conditions or deadlocks.

To finish I prefer to give some good news about performance. We use mainly the YouTube conformance tests to ensure our implementation works and I can proudly say that these changes reduced the time of execution in half!

That’s all folks!

New media controls in WebKitGtk+ (reloaded)

In December we organized in A Coruña the WebKitGTK+ hackfest at the Igalia premises as usual and also as usual it was an awesome oportunity to meet the rest of the team. For more information about the progress done in the hackfest, you can have a look at KaL’s post.

As part of the hackfest I decided to take a task that it would take some time so that I could focus and I decided to go for rewriting once again the WebKitGTK+ multimedia controls. People who just read this post will wonder why I say again and the reason is that last year we completely redesigned the multimedia controls. This time I have not redesigned them (well, a bit) but rewritten them in JavaScript as the Apple guys had done before.

To get the job done, the first step was bundling the JavaScript code and activating the codepath to use those controls. I used the Apple controls as template so you can imagine that the first result was a non-working monster that at some point reminded to Safari multimedia controls. At that point I could do two things, forking or inheriting. I decided to go with inheritance because it keeps the spirit of WebKit (and almost all Free Software projects) of sharing as much code as possible and because forking later is easier than merging. Then step by step I kept redefining JavaScript methods and tweaking some stuff in the C++ and CSS code to create the current user experience that we had so far.

Some of the non-aesthetic changes are the following:

  • Focus rings are now managed from CSS instead of C++.
  • Tests got new fixes, rebaselines and more love.
  • CMake support for the new controls.
  • Load captions icon from theme.
  • Load and hide elements handled now with CSS (and JavaScript).

The captions icon problem was interesting because I found out that the one we were using was “user-invisible-symbolic” and it was hardcoded directly in the CSS code. I changed it to be loaded from the theme but it raised the issue of using the incorrect metaphor though the current icon looks nice for captions. I filed a GNOME bug (and another WebKit bug to follow this up) so that a new icon can be created for captions/subtitles with the correct metaphor.

And which are the controls aesthetic changes?

  • Show a very subtle gradient when the elements are focused or active to improve the accessibility support (which won’t be complete until bug 117857 is fixed).
  • Volume slider rolls up and down with a nice animation.
  • Some other elements are not shown when they are not needed.
  • Captions menu shows up with both click and mouse hover for coherence with the volume slider.
  • Captions menu is also animated the same way as the volume slider.
  • Captions menu was propertly centered.
  • Captions menu style was changed to make it more similar to the rest of the controls (fonts, margings…)
  • Volume slider shows below the media element when it is too close to the page top and it cannot be shown on it. This was a regression that I introduced with the first rewrite, happy to have it fixed now.

As I already said the aesthetic differences with the former C++ are not a big deal unless you compare them with the original controls:

Starting point

To appreciate the new controls I cannot just show a screenshot, because the nicest thing are the animations. Therefore a video is needed (and if you have WebKit compiled you can experience them yourself)):

Of course, I thank our hackfest sponsors as the it was possible because of them:

Igalia GNOME Foundation

New media controls in WebKitGtk+

So it looks like my patch for the rework of the WebKitGtk+ media controls was finally landed.

First I would like to thank Igalia for giving me some time to complete this task, which took some work and began at WebKitGtk+ hackfest some time ago with Žan Doberšek and Jon McCann.

Starting point was:

Starting point

As you can see the controls look like an old Gtk+ application without any theming. Jon suggested that we could began with mimicing Chromium controls as they look closer to any modern themed GNOME application and adapt them to use the GNOME symbolic icons and keep some other stuff like the volume bar, but of course making it look nicer.

What was done:

  • Adding the GNOME symbolic icon theme and a method to replace the normal stock icons, though we keep them as fallback.
  • Deep adaptation of Chromium CSS and C++ code to make it suit the GNOME requirements.
  • Some buttons fell off the design, like seeking backwards and forward.
  • Aligned the elements with the pixel ruler to make them as close to perfect as possible in all conditions (as some buttons are hidden in certain situations, like fullscreen, volume…).
  • Fixed a bug about the buffering ranges that was in trunk at that point, but was independent of the code I was cooking.
  • Removed as much of the C++ code as possible to deviate the drawing to CSS, which is more maintainable for design purposes. The only things that are still painted with C++ code are the slider tracks, which depend on parameters than cannot be specified in CSS, like the buffering ranges and the volume (which was not before, but I introduced for design coherence).
  • Removed the focus ring which was making the controls uglier.
  • Removed the dead code.
  • New baselines for the tests, including the pixel ones. Flagged also some tests that are (and will) not working in Chromium either.

I had a small issue with a Chromium guy landing a patch that forced me to change the display of some components from -webkit-box to -webkit-flex and of course, rebasing all related tests. This created a small delay in landing the patch, but it finally did as 143463.

And the result is the following:
New media controls

I don’t know about you guys, but I like it!

Painting video with GStreamer and Qt/QML or Gtk+ with overlay

As part of my work at Igalia I had to work with video and GStreamer for some years. I always used Gtk+ for that so when I needed to do things with Qt and QML, things were different. In my projects I always used pure GStreamer code instead of the Qt bindings for GStreamer because at the moment those bindings were not ready or reliable.

I know two ways of painting video:

  • Overlay way, with a window id and so on
  • Texture streaming

I might write later about texture streaming, but I will focus now on overlay.

Painting

The first way means that you need from your graphical toolkit a window id. That window id is asked by the video sink element in a very special moment and you need to provide it in that moment if you have not provided it before. For example, if you are using playbin2 and you already know the sink you want to use, just instantiate your sink and set the window id at that moment with gst_x_overlay_set_window_handle and set the sink to the playbin2 element by setting the video-sink property.

If you are not using playbin2 and for example you are using GStreamer Editing Services, you cannot use a property because currently there is no one and need to use a more complicated method. I already reported the bug with its patches and hope that they apply them as soon as possible to improve compatibility with playbin2 because the way it is now is a bit inconsistent with the rest of GStreamer code base.

Both Qt and Gtk have now client side windows, which means that your program window has only one X window and it is the toolkit that decides which widget is receiving the events. The main consequence is that if we just set the window id, GStreamer will use the whole window and will paint the video over the rest of our widgets (it does not matter if QML/Qt or Gtk+) and you’ll get very ugly effects. To solve that, you need to set the render rectangle, which are the coordinates (relative to the X whole X window) where you want to paint your video. You need to do that just after setting the window id with gst_x_overlay_set_render_rectangle.

If you do not set your window handle and your render rectangle before the pipeline begins to move, it will ask you about that with the prepare-xwindow-id GstMessage, but this message can happen inside the GStreamer threads and it cannot wait until the main loop runs, it needs the information at that very moment, so you need to connect to the synchronous bus handle. GStreamer has a good example at the GstXOverlay documentation about how to do that. To use the callback in C++, you need to declare a static method and pass this as user data parameter, then you can behave almost as having a normal object method. This is the most common solution used in the GNOME world and fits perfectly with the Qt framework too.

The code to get the window id and render rectangle in Gtk+ would be something like:

GdkWindow *gdk_window;
gdk_window = gtk_widget_get_window(your_widget);
/* as sink you can use GST_MESSAGE_SRC() if you are waiting
    for the prepare-xwindow-id message */
gst_x_overlay_set_xwindow_id(GST_X_OVERLAY(your_sink),
                             GDK_WINDOW_XID(gdk_window));
/* do your maths about your coordinates */
gst_x_overlay_set_window_handle(GST_X_OVERLAY(sink),
                                x, y, width, height);

In Qt, if you are using common widgets, you could use something like:

WId winId = QApplication::activeWindow()->effectiveWinId();
gst_x_overlay_set_xwindow_id(GST_X_OVERLAY(your_sink),
                             winId);
/* do your maths about your coordinates */
gst_x_overlay_set_window_handle(GST_X_OVERLAY(sink),
                                x, y, width, height);

If you are using a QGraphicsScene you would do something like:

/* to get the view you could do something like this
    (if you have only one or will to mess things up):
QGraphicsView *view = your_scene.views[0];
*/
gst_x_overlay_set_xwindow_id(GST_X_OVERLAY(your_sink),
                             view->viewport()->effectiveWinId());
/* do your maths about your coordinates */
gst_x_overlay_set_window_handle(GST_X_OVERLAY(sink),
                                x, y, width, height);

If you are using QML, you would have a very similar approach to the last snippet, because as you should have a QDeclarativeItem, it has a scene() that you can use, to have something like QGraphicsView *view = scene().views[0]; (of course, assuming that you have only one view, which is the most common case).

Overlaying stuff

Some times it is nice do put your controls on top of the video by covering part of the image. It would be like having the video as the background of a canvas where you draw some other widgets. Some GStreamer elements give you the possibility of doing a trick to do this, which is using a colorkey for your background and painting whatever you want on top of that as long as it does not include that colorkey. Some elements like xvimagesink or omapxvsink (used in the Nokia N9 and N950) have the colorkey property that you can read and set. If you are not planning to overlay anything, you can forget about this, but if you do, you need set a color key to the sink and use that color to paint the background of your widget and a good moment is also when setting the window handle:

g_object_set(sink, "autopaint-colorkey", FALSE,
             "colorkey", 0x080810, NULL);

Why do I unset the colorkey autopainting? Because I do not want GStreamer to mess my widget painting.

And more important: Why did I use 0x080810? Because it is a dark color, close to black, but it is not black. Pure black can be dangerous as it is commonly used in themes when painting widgets so you would be getting ugly artifacts. Some people recommend magenta (0xFF00FF) as it is supposedly a color that does not exist in nature (citation needed). I would not do it for several reasons:

  • You will need to synchronize your painting very well to avoid seeing the colorkey
  • If you respect aspect ratio you will see it for sure, because you (or the sink if it is automatic) paint the backgound and the sink draws the image by leaving some empty space.
  • It does not behave well with blendings, as you blend from your widget color to the background, which is the colorkey

Advice: do not mess with colorkey and omapxvsink. Though it is supposed to be writable, it is not and it always uses 0x080810.

Aspect ratio

There are two kind of people:

  • The ones that want to use all the pixels of their monitor/TVs and like damaging their brain with distorted images.
  • The ones that like to see a correctly dimensioned image with some bars giving you a better impression of what was recorded.

As you can guess I belong to the second group.

There are some sinks that do that automatically for you by setting the force-aspect-ratio property, like ximagesink and xvimagesink but there are other that does not and omapxvsink is an example. It is not a big problem but forces you to work a bit more when you select the render rectangle. For that you need to know the video size, which you cannot know until the pipeline is running, which forces to to hook to the GST_MESSAGE_ASYNC_DONE, or in the case of playbin2, you already have the video size when getting the prepare-xwindow-id message. An example to get the video size would be:

GstPad *pad;
GstCaps *caps;
GstStructure *structure;
int width, height;

pad = GST_BASE_SINK_PAD(sink);
caps = GST_PAD_CAPS(pad);
g_return_if_fail(caps && gst_caps_is_fixed(caps));

structure = gst_caps_get_structure(caps, 0);
gst_structure_get_int(structure, "width", &width);
gst_structure_get_int(structure, "height", &height);

/* some videos define a pixel aspect ratio, meaning that the
   video pixel could be like 2x1 copared to a squared pixed
   and we need to correct this */
if (gst_structure_has_field(structure, "pixel-aspect-ratio")) {
    int par_n, par_d;
    gst_structure_get_fraction(structure, "pixel-aspect-ratio",
                               &par_n, &par_d);
    width = width * par_n / par_d;
}

/* trick: some sinks perform better with multiple of 2 */
width &= ~1;
height &= ~1;