Serious Encrypted Media Extensions on GStreamer based WebKit ports

Encrypted Media Extensions (a.k.a. EME) is the W3C standard for encrypted media on the web. With it, media providers such as Hulu, Netflix, HBO, Disney+, Prime Video, etc. can serve their content with a reasonable amount of confidence that it will be very complicated for people to “save” their assets without permission. Why do I use the word “serious” in the title? WebKit already supports Clear Key, which is the W3C EME reference implementation, but EME supports more encryption systems, even proprietary ones (I have my opinion about this, you can ask me privately). No service provider (that I know of) supports Clear Key; they usually rely on Widevine, PlayReady or some other system.

Three years ago, my colleague Žan Doberšek finished the implementation of what was going to be the shell of WebKit’s modern EME implementation, following the latest W3C proposal. We implemented it downstream (at Web Platform for Embedded) as well using Thunder, which includes as a plugin a fork of what was the Open Content Decryption Module (a.k.a. OpenCDM). The OpenCDM API changed quite a lot during this journey. It works well and there are millions of set-top boxes using it currently.

The delta between downstream and the upstream GStreamer based WebKit ports was quite big, testing was difficult and syncing was not always easy, so we decided to reverse the situation.

Our first step was done by my colleague Charlie Turner, who made Clear Key work upstream again while adapting some changes the Apple folks had made in the meantime. It was amazing to see the Clear Key tests passing again, and his work with the CDMProxy related classes was awesome. After having Clear Key working, I had to adapt them a bit to accommodate Thunder. To explain a bit about the WebKit EME architecture, I must say that there are two layers. The first is the cross-platform one, which implements the W3C API (MediaKeys, MediaKeySession, CDM…). These classes rely on the platform ones (CDMPrivate, CDMInstance, CDMInstanceSession), which form the second layer and handle the platform management, message exchange, etc. Apple’s playback system is fully integrated with their DRM system, so they don’t need anything else. We do, because we need to integrate our own decryptors that defer to Thunder for decryption, so in the GStreamer based ports we also need the CDMProxy related classes: CDMProxy, CDMInstanceProxy, CDMInstanceSessionProxy… The last two extend CDMInstance and CDMInstanceSession respectively to be able to deal with the key management, which is abstracted into KeyHandle and KeyStore.
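
To make the layering a bit more tangible, here is a minimal C++ sketch of how the proxy classes relate to the platform ones. This is a hypothetical simplification for illustration only; the real WebKit classes have different signatures and many more responsibilities.

#include <cstdint>
#include <map>
#include <memory>
#include <vector>

// Hypothetical, heavily simplified classes for illustration only.
using KeyID = std::vector<uint8_t>;
struct KeyHandle { /* key ID, key value, status... */ };

// Key management abstraction shared with the GStreamer decryptors.
class KeyStore {
public:
    void add(KeyID id, std::shared_ptr<KeyHandle> key) { m_keys[std::move(id)] = std::move(key); }
    std::shared_ptr<KeyHandle> find(const KeyID& id) const
    {
        auto it = m_keys.find(id);
        return it != m_keys.end() ? it->second : nullptr;
    }
private:
    std::map<KeyID, std::shared_ptr<KeyHandle>> m_keys;
};

// Second layer: platform interfaces the cross-platform W3C classes rely on.
class CDMInstance { public: virtual ~CDMInstance() = default; };
class CDMInstanceSession { public: virtual ~CDMInstanceSession() = default; };

// The GStreamer ports extend them so the decryptors can reach the key material.
class CDMInstanceProxy : public CDMInstance {
public:
    KeyStore& keyStore() { return m_keyStore; }
private:
    KeyStore m_keyStore;
};
class CDMInstanceSessionProxy : public CDMInstanceSession { /* key updates land here */ };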

Once the abstraction is there (let’s remember that the abstraction works both for Clear Key and Thunder), the Thunder implementation is quite simple: just gluing the CDMProxy, CDMInstanceProxy and CDMInstanceSessionProxy classes to the Thunder system and writing a GStreamer decryptor element for it. I might have made a mistake when selecting the files, but considering the Thunder classes plus the common GStreamer decryptor code, cloc says it is just 1198 lines of platform code. I think that is pretty low for what it does. Apart from that, obviously, there are 5760 lines of cross-platform code.
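
Regarding the decryptor element, GStreamer marks protected streams with application/x-cenc caps carrying the protection system UUID, so a decryptor advertises a sink pad template along these lines. This is an illustrative sketch, not WebKit’s actual template; Widevine’s UUID is used only as an example.

#include <gst/gst.h>

// Illustrative sink pad template for a decryptor element: it accepts
// cenc-protected streams for a given protection system UUID (Widevine here).
#define WIDEVINE_UUID "edef8ba9-79d6-4ace-a3c8-27dcd51d21ed"

static GstStaticPadTemplate sinkTemplate = GST_STATIC_PAD_TEMPLATE("sink",
    GST_PAD_SINK, GST_PAD_ALWAYS,
    GST_STATIC_CAPS("application/x-cenc, protection-system=(string)" WIDEVINE_UUID));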

To build and run all this you need to do several things:

  1. Build the dependencies with WEBKIT_JHBUILD=1 JHBUILD_ENABLE_THUNDER="yes" to enable the old fashioned JHBuild build and force it to build the Thunder dependencies. All dependencies are in JHBuild; even Widevine is referenced, but to download it you need the proper credentials as it is closed source.
  2. Pass --thunder when calling the build-webkit script.
  3. Run MiniBrowser with WEBKIT_GST_EME_RANK_PRIORITY="Thunder" and pass the parameters --enable-mediasource=TRUE --enable-encrypted-media=TRUE --autoplay-policy=allow. The autoplay policy is usually optional, but in this case it is necessary for the YouTube TV tests. We need to give the Thunder decryptor a higher priority because of WebM, which does not specify a key system; without the higher rank the Clear Key decryptor can be selected and fail. MP4 does not create trouble because the protection system is specified and the caps negotiation does its magic.

If you have a closer look at the GStreamer JHBuild moduleset, you’ll see that only Widevine is supported. To support more, you only have to make them build in the Thunder ecosystem and add them to CDMFactoryThunder::supportedKeySystems.
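
Conceptually that hook is just a list of key system strings. A hypothetical sketch of its shape (the actual method in CDMFactoryThunder may look different):

#include <string>
#include <vector>

// Hypothetical sketch: each key system listed here must also be
// buildable as a plugin in the Thunder ecosystem.
std::vector<std::string> supportedKeySystems()
{
    return { "com.widevine.alpha" /*, "com.microsoft.playready", ... */ };
}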

When I coded this, all YouTube TV tests for Widevine were green on the desktop. At the moment of writing this post they aren’t, because of some problem with the Widevine installation that will be sorted out quickly, I hope.

VCR to WebM with GStreamer and hardware encoding

My family bought a Panasonic VHS video camera many years ago and we recorded quite a lot of things: holidays, some local shows, etc. I even got paid 5000 pesetas (30€, more than 20 years ago) a couple of times to record weddings in an amateur way. Since my father passed away less than a year ago, I have wanted to convert those VHS tapes into something that can survive better, technologically speaking.

For the job I bought a USB 2.0 dongle and connected it to a VHS VCR through a SCART to RCA cable.

The dongle creates a V4L2 device for video and is detected by PulseAudio for audio. As I want to see what I am converting live, I need to tee both audio and video to the corresponding sinks, while the other branch goes to the encoders, muxer and filesink. The command line for that would be:

gst-launch-1.0 matroskamux name=mux ! filesink location=/tmp/test.webm \
v4l2src device=/dev/video2 norm=255 io-mode=mmap ! queue ! vaapipostproc ! tee name=video_t ! \
queue ! vaapivp9enc rate-control=4 bitrate=1536 ! mux.video_0 \
video_t. ! queue ! xvimagesink \
pulsesrc device=alsa_input.usb-MACROSIL_AV_TO_USB2.0-02.analog-stereo ! 'audio/x-raw,rate=48000,channels=2' ! tee name=audio_t ! \
queue ! pulsesink \
audio_t. ! queue ! vorbisenc ! mux.audio_0

As you can see, I convert to WebM with VP9 and Vorbis. Two settings are worth noting: passing norm=255 to the v4l2src element so it captures PAL, and rate-control=4 (VBR) on the vaapivp9enc element; otherwise it defaults to CQP and the file size ends up being huge.

You can see the pipeline, which is beautiful, here:

As you can see, we’re using vaapivp9enc here, which is hardware accelerated. Having this pipeline running on my computer consumed more or less 20% of CPU, with the machine absolutely relaxed and leaving me the necessary computing power for my daily work. This would not be possible without GStreamer and the GStreamer VAAPI plugins, unlike other solutions whose instructions you can find online.

If for some reason you can’t find vaapivp9enc in Debian, you should know there are a couple of packages for the Intel drivers and the one you should install is intel-media-va-driver. Thanks go to my colleague at Igalia Víctor Jáquez, who maintains gstreamer-vaapi and helped me solve this problem.

My workflow for this was converting all tapes into WebM and then cutting them into the different relevant pieces with Pitivi, which runs on GStreamer Editing Services, both co-maintained by my colleague at Igalia, Thibault Saunier.

Some rough numbers on WebKit code

My wife asked me for some rough LOC numbers on the WebKit project and I think I could share them with you here as well. They come from r221232. As I’ll take into account some generated code it is relevant to mention that I built WebKitGTK+ with the default CMake options.

The first thing I did was run sloccount Source, which got me the following numbers:

cpp: 2526061 (70.57%)
ansic: 396906 (11.09%)
asm: 207284 (5.79%)
javascript: 175059 (4.89%)
java: 74458 (2.08%)
perl: 73331 (2.05%)
objc: 44422 (1.24%)
python: 38862 (1.09%)
cs: 13011 (0.36%)
ruby: 11605 (0.32%)
xml: 11396 (0.32%)
sh: 3747 (0.10%)
yacc: 2167 (0.06%)
lex: 1007 (0.03%)
lisp: 89 (0.00%)
php: 10 (0.00%)

These numbers do not include IDL code, so I did some grepping to get that number myself, which gave me 19632 IDL lines:

$ find Source/ -name "*.idl" | xargs cat | grep -ve "^[[:space:]]*\/\/" -ve "^[[:space:]]*\*" -ve "^[[:space:]]*$" -ve "^[[:space:]]*\[" -ve "^[[:space:]]*};$" | wc -l
19632

The interesting part of the IDL files is that they are used to generate code so those 19632 IDL lines expand to:

ansic: 699140 (65.25%)
cpp: 368720 (34.41%)
python: 1492 (0.14%)
xml: 1040 (0.10%)
javascript: 883 (0.08%)
asm: 169 (0.02%)
perl: 11 (0.00%)

Let’s have a look now at the LayoutTests (they test the functionality of WebCore + the platform). Tests are composed mainly of HTML files, so if you run sloccount LayoutTests you get:

javascript: 401159 (76.74%)
python: 87231 (16.69%)
xml: 22978 (4.40%)
php: 4784 (0.92%)
ansic: 3661 (0.70%)
perl: 2726 (0.52%)
sh: 199 (0.04%)

It’s quite interesting to see that sloccount does not consider HTML, which is quite relevant when you’re testing a web engine, so again we have to count manually (thanks to Carlos López, who helped me grep properly here, as some binary lines were giving me a headache):

$ find LayoutTests/ -name "*.html" -print0 | xargs -0 cat | strings | grep -Pv "^[[:space:]]*$" | wc -l
2205690

That gives 2205690 “meaningful lines” combining HTML and the other languages you can see above. I can’t just subtract the earlier numbers here to get only the HTML lines, because those came from files with extensions other than HTML, while many of the HTML files embed other languages, especially JavaScript.

But the LayoutTests do not include only pure WebKit tests. There are some imported ones, so it might be interesting to run the same procedure under LayoutTests/imported to see which ones are imported and not written directly in the WebKit project. I emphasize that because they can be written by WebKit developers in other repositories; I can present myself and Youenn Fablet as an example, as we wrote some tests that were eventually moved into the specification and included back later when imported. So again, sloccount LayoutTests/imported:

python: 84803 (59.99%)
javascript: 51794 (36.64%)
ansic: 3661 (2.59%)
php: 575 (0.41%)
xml: 250 (0.18%)
sh: 199 (0.14%)
perl: 86 (0.06%)

The same procedure to count HTML + other stuff lines inside that directory gives a number of 295490:

$ find LayoutTests/imported/ -name "*.html" -print0 | xargs -0 cat | strings | grep -Pv "^[[:space:]]*$" | wc -l
295490

There are also some other tests we can talk about, for example the JSTests. This time I’ll give the numbers already summed up by language, plus the manual HTML count (if you made it here, you know the drill already):

javascript: 1713200 (98.64%)
xml: 20665 (1.19%)
perl: 2449 (0.14%)
python: 421 (0.02%)
ruby: 56 (0.00%)
sh: 38 (0.00%)
HTML+stuff: 997

ManualTests:

javascript: 297 (41.02%)
ansic: 187 (25.83%)
java: 118 (16.30%)
xml: 103 (14.23%)
php: 10 (1.38%)
perl: 9 (1.24%)
HTML+stuff: 16026

PerformanceTests:

javascript: 950916 (83.12%)
cpp: 147194 (12.87%)
ansic: 38540 (3.37%)
asm: 5466 (0.48%)
sh: 872 (0.08%)
ruby: 419 (0.04%)
perl: 348 (0.03%)
python: 325 (0.03%)
xml: 5 (0.00%)
HTML+stuff: 238002

TestsWebKitAPI:

cpp: 44753 (99.45%)
ansic: 163 (0.36%)
objc: 76 (0.17%)
xml: 7 (0.02%)
javascript: 1 (0.00%)
HTML+stuff: 3887

And this is all. Remember that these are just some rough statistics, not a “scientific” paper.

Update:

In her expert opinion, in the WebKit project we are devoting around 50% of the total LOC to testing, which makes it a software engineering “textbook” project regarding testing, and I think we can be proud of it!

Web Engines Hackfest 2016

From September 26th to 28th we celebrated the 2016 edition of the Web Engines Hackfest at the Igalia HQ. This year we broke all records and got participants from the three main companies behind the three biggest open source web engines: Mozilla, Google and Apple. Of course, it was not only them; we had some other companies and ourselves. I was an active part of the organization and I think that not only did we not get any complaints, but people were comfortable and happy all around.

We had several talks (I included the slides and YouTube links):

We had lots and lots of interesting hacking and we also had several breakout sessions:

  • WebKitGTK+ / Epiphany
  • Servo
  • WPE / WebKit for Wayland
  • Layout Models (Grid, Flexbox)
  • WebRTC
  • JavaScript Engines
  • MathML
  • Graphics in WebKit

What I did during the hackfest was work with Enrique and Žan to advance the review of our downstream, GStreamer based implementation of Media Source Extensions (MSE) in order to land it as soon as possible, and I can proudly say that we already did (we didn’t finish at the hackfest but managed to do it afterwards). We broke the bots and pissed off Michael and Carlos, but we managed to deactivate it by default and continue working on it upstream.

So summing up, from my point of view (and not only because I was part of the organization at Igalia, but also based on other people’s opinions), I think the hackfest was a success, and I think we will continue as we were, or maybe grow a bit (no spoilers!).

Finally I would like to thank our gold sponsors Collabora and Igalia and our silver sponsor Mozilla.

WebRTC in WebKit/WPE

For some time I worked at Igalia to enable WebRTC on WebKitForWayland or WPE for the Raspberry Pi 2.

The goal was to have the WebKit WebRTC tests working for a demo. My fellow Igalian Alex was working on the platform itself in WebKit and assisting with some tuning for the Pi, but the main work needed to be done in OpenWebRTC.

My other fellow Igalian Phil had begun a branch to work on this that was halfway there, with some workarounds. My first task was getting into combat/workaround mode and making OpenWebRTC work with compressed streams from gst-rpicamsrc. OpenWebRTC supported only raw video streams and the Raspberry Pi Cam module GStreamer element provides only H264 encoded ones. I moved some encoders and parsers around, made some caps modifications, removed some elements that didn’t work on the Pi and eventually made it work. You can see the result at:

[video: /xrcalvar/files/2016/10/201607-webrtc.mp4]

To reproduce this yourselves you needed a custom branch of Buildroot so you could build with the proper plugins enabled, and you also had to select the appropriate branches of WPE and OpenWebRTC.

Unfortunately the work was far from finished, so I continued the effort to try to bring the architecture changes in OpenWebRTC to production quality, and that meant doing some tasks step by step:

  • Rework the video orientation code: the workaround deactivated it, as so far it was being done in GStreamer. In the case of rpicamsrc it can be done by the hardware itself, so I cooked a GStreamer interface to enable rotation the same way it was done for the [gl]videoflip elements (see the sketch after this list). The idea would be to deprecate the original ones and use the new interface. This landed both in videoflip and glvideoflip. Of course I also implemented it in gst-rpicamsrc here and here and eventually in OpenWebRTC sources.
  • Rework video flip: Once OpenWebRTC sources got orientation support, I could rework the flip both for local and remote feeds.
  • Add gl{down|up}load elements back: there were some issues with the gl elements that upload and download textures, which we had removed. I added them back.
  • Reworked bins linking: in OpenWebRTC there are some bins that are created to perform certain tasks, and depending on the circumstances you add some elements or not. I reworked the way those elements are linked so that we don’t have to take into account all the use cases to link them. Now this is easier, as the elements are linked as they are added to the bin.
  • Reworked the renderer_disabled: as in the case of orientation, some elements such as gst-rpicamsrc are able to change color and balance, so I added support for that to avoid having it done by GStreamer elements when not necessary. In this case the proper interfaces were already there in GStreamer.
  • Moved the decoding/parsing from the source to the renderer: before our changes the source was parsing/decoding the remote feeds; local sources were not decoded, only raw was supported. Our workarounds made the local sources decode too, but this was not working for all cases. So why decode at the sources when GStreamer has caps and you can just chain all that to the renderers? That is what I eventually did: I moved the parsing/decoding to the renderers. This took fixing all the caps negotiation from sources to renderers. Unfortunately I think I broke audio on the way, but surely nothing difficult to fix.
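
Regarding the orientation interface mentioned in the first item above, the net effect is that any element implementing it exposes a common video-direction property, so rotation can be requested the same way whether it is done by hardware or by a filter. A minimal sketch, assuming the property and enum names from the GStreamer video library:

#include <gst/gst.h>
#include <gst/video/video.h>

// Ask an element implementing the video direction interface (videoflip,
// glvideoflip, or a source like rpicamsrc) to rotate 90 degrees clockwise.
static void requestRotation(GstElement* element)
{
    g_object_set(element, "video-direction", GST_VIDEO_ORIENTATION_90R, nullptr);
}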

This is still a work in progress and now I am changing tasks and handing this back over to my fellow Igalian Phil, who I am sure will do an awesome job together with Alex.

And again, thanks to Igalia for letting me work on this and to Metrological that is sponsoring this work.

Über latest Media Source Extensions improvements in WebKit with GStreamer

In this post I am going to talk about the implementation of the Media Source Extensions (known as MSE) in the WebKit ports that use GStreamer. These ports are WebKitGTK+, WebKitEFL and WebKitForWayland, though only the latter has the latest work-in-progress implementation. Of course we hope to upstream WebKitForWayland soon and with it, this backend for MSE and the one for EME.

My colleague Enrique at Igalia wrote a post about this about a week ago. I recommend you read it before continuing with mine to understand the general picture and some of the issues that I managed to fix in that implementation. Come on, go and read it, I’ll wait.

One of the challenges here is something a bit unnatural in the GStreamer world: we have to process the stream information and then make some metadata available to the JavaScript app before playing, instead of just pushing everything into a playing pipeline and being happy. For this we created the AppendPipeline, which processes the data, extracts that information and keeps it under control for later playback.

The idea of our AppendPipeline is to put a data stream into it and get it processed at the other side. It has an appsrc, a demuxer (qtdemux currently) and an appsink to pick up the processed data. Something tricky about the spec is that when you append data to the SourceBuffer, that operation has to block it, making any other append operation fail with an error while the current one is ongoing, and signal when it finishes. Our main issue with this is that the appends can contain any amount of data: headers and buffers, only headers, or just partial headers. Basically, the information can be partial.
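
As a rough sketch of that topology (greatly simplified and with hypothetical helper names; the real AppendPipeline also drives the state machine described below):

#include <gst/app/gstappsrc.h>
#include <gst/gst.h>

// qtdemux adds its source pads only once enough initialization data has
// been appended, so the link to the appsink has to be made dynamically.
static void onDemuxerPadAdded(GstElement*, GstPad* pad, GstElement* appsink)
{
    GstPad* sinkPad = gst_element_get_static_pad(appsink, "sink");
    gst_pad_link(pad, sinkPad);
    gst_object_unref(sinkPad);
}

static GstElement* createAppendPipeline(GstElement** appsrcOut)
{
    GstElement* pipeline = gst_pipeline_new("append-pipeline");
    GstElement* appsrc = gst_element_factory_make("appsrc", nullptr);
    GstElement* demuxer = gst_element_factory_make("qtdemux", nullptr);
    GstElement* appsink = gst_element_factory_make("appsink", nullptr);

    gst_bin_add_many(GST_BIN(pipeline), appsrc, demuxer, appsink, nullptr);
    gst_element_link(appsrc, demuxer);
    g_signal_connect(demuxer, "pad-added", G_CALLBACK(onDemuxerPadAdded), appsink);

    *appsrcOut = appsrc;
    return pipeline;
}

// Each append then boils down to pushing the received bytes:
//     gst_app_src_push_buffer(GST_APP_SRC(appsrc), buffer);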

First I’ll present again Enrique’s AppendPipeline internal state diagram:

Let me explain the easiest case first, which is headers and buffers being appended. As soon as the process is triggered, we move from Not started to Ongoing; then, as the headers are processed, we get the pads at the demuxer and begin to receive buffers, which moves us to Sampling. Then we have to detect that the operation has ended, move to Last sample and then back to Not started. If we have received only headers, we will not move to Sampling because we will not receive any buffers, but we still have to detect this situation and be able to move to Data starve and then back to Not started.
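
In code form, the states of that diagram map to something like this (a sketch mirroring the diagram, not the actual implementation):

// AppendPipeline states as described above.
enum class AppendState {
    NotStarted, // Idle, ready for a new append.
    Ongoing,    // Data pushed to the appsrc, demuxer processing headers.
    Sampling,   // Demuxer pads exist and buffers are reaching the appsink.
    LastSample, // The end of the append was detected while sampling.
    DataStarve  // The append ended without producing buffers (headers only).
};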

Our first approach used two different timeouts: one to detect that we should move from Ongoing to Data starve if we did not receive any buffer, and another to move from Sampling to Last sample if we stopped receiving buffers. This solution worked, but it was a bit racy, so we tried to find a less error prone solution.

We then tried custom downstream events injected from the source: at the moment they were received at the sink we could move from Sampling to Last sample, or, if only headers were injected and the pads were created, we could move from Ongoing to Data starve. It took some time and several iterations to fine tune this, but we managed to solve almost all cases except one: receiving only partial headers and no buffers.

If the demuxer received partial headers and no buffers, it stalled and we were not receiving any pads or events at the output, so we could not tell when the append operation had ended. Tim-Philipp gave me the idea of using the need-data signal on the source, which is fired when the demuxer runs out of useful data. I realized then that the events were not needed anymore and that we could handle everything with that signal.

The need-data signal is fired sometimes when the pipeline is linked, and also when the demuxer finishes processing data, regardless of whether the stream contains partial headers, complete headers, or headers and buffers. It works perfectly once we are able to disregard that first signal we sometimes receive. To solve that, we use a pad probe to ensure that at least one buffer left the appsrc: if we receive the signal before any buffer was detected at the probe, it is disregarded for the purpose of considering the append finished. Otherwise, if we have already seen a buffer at the probe, we can consider that any need-data signal means the processing has ended, and we can tell the JavaScript app that the append process has finished.
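
A sketch of that logic, with hypothetical names (the real code lives inside the AppendPipeline and, among other things, resets the flag between appends):

#include <gst/app/gstappsrc.h>
#include <gst/gst.h>

#include <atomic>

static std::atomic<bool> sawBuffer { false };

// Records that at least one buffer left the appsrc.
static GstPadProbeReturn bufferProbe(GstPad*, GstPadProbeInfo*, gpointer)
{
    sawBuffer = true;
    return GST_PAD_PROBE_OK;
}

// need-data fires spuriously once around linking time and then whenever the
// demuxer runs out of useful data, which is our end-of-append condition.
static void onNeedData(GstAppSrc*, guint, gpointer)
{
    if (!sawBuffer)
        return; // Spurious early emission: disregard it.
    // The demuxer ran out of data: tell the JavaScript app the append ended.
}

static void watchAppsrc(GstElement* appsrc)
{
    GstPad* srcPad = gst_element_get_static_pad(appsrc, "src");
    gst_pad_add_probe(srcPad, GST_PAD_PROBE_TYPE_BUFFER, bufferProbe, nullptr, nullptr);
    gst_object_unref(srcPad);
    g_signal_connect(appsrc, "need-data", G_CALLBACK(onNeedData), nullptr);
}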

Both the need-data signal and the probe information come in GStreamer internal threads, so we could have used mutexes to overcome any race conditions. We thought, though, that deferring the operations to the main thread through the pipeline bus was a better idea that would create fewer issues with race conditions or deadlocks.
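
A sketch of that deferral, assuming a hypothetical message name: a GStreamer streaming thread posts an application message on the pipeline bus, and the bus watch, which runs in the main thread, reacts to it.

#include <gst/gst.h>

// Called from a GStreamer streaming thread.
static void postAppendCompleted(GstElement* pipeline)
{
    gst_element_post_message(pipeline,
        gst_message_new_application(GST_OBJECT(pipeline),
            gst_structure_new_empty("append-completed")));
}

// Attached with gst_bus_add_watch(), so it runs in the main thread.
static gboolean busWatch(GstBus*, GstMessage* message, gpointer)
{
    if (GST_MESSAGE_TYPE(message) == GST_MESSAGE_APPLICATION
        && gst_message_has_name(message, "append-completed")) {
        // Safe to update state and notify the JavaScript app from here.
    }
    return G_SOURCE_CONTINUE;
}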

To finish, I prefer to give some good news about performance. We mainly use the YouTube conformance tests to ensure our implementation works, and I can proudly say that these changes cut the execution time in half!

That’s all folks!

Web Engines Hackfest according to me

And once again, in December we celebrated the hackfest. This year it happened between Dec 7-9 at the Igalia premises and the scope was much broader than WebKitGTK+, which is why it was renamed the Web Engines Hackfest. We wanted to gather people working on all open source web engines and we succeeded, as we had people working on WebKit, Chromium/Blink and Servo.

The edition before this one, I was working with Youenn Fablet (from Canon) on the Streams API implementation in WebKit, and we spent our time on the same thing again. We have to say that things are much more mature now. During the hackfest we spent our time fixing the JavaScriptCore built-ins inside WebCore, and we advanced the automatic importation of the specification web platform tests, which are based on our prior test implementation. Since they are managed there now, it does not make sense to maintain them inside WebKit too; we just import them. I must say that our implementation is fairly complete, since we support the current version of the spec and have almost all tests passing, including ReadableStream, WritableStream and the built-in strategy classes. What is missing now is making Streams work together with other APIs, such as Media Source Extensions, Fetch or XMLHttpRequest.

There were some talks during the hackfest and we did not want to be left out, so we gave our own about Streams. You can enjoy it here:

You can see all hackfest talks in this YouTube playlist. The ones I liked most were the one by Michael Catanzaro about HTTP security, which is always interesting given the current clumsy political movements against cryptography, and the one by Dominik Röttsches about font rendering. It is really amazing what a browser has to do just to get some letters painted on the screen (and look good).

As usual, the environment was amazing and we had a great time, including the traditional Street Fighter match, where Gustavo found a worthy challenger in Changseok 🙂

Of course, I would like to thank Collabora and Igalia for sponsoring the event!

And by the way, quite shortly after that, I became a WebKit reviewer!

WebKit Contributors Meeting 2015 (late, I know)

After writing my last post I realized that I needed to write a bit more about what I had been doing at the WebKit Contributors Meeting.

First thing to say is that it happened in March at the Apple campus in Cupertino and I attended as part of the Igalia gang.

My goal when I went there was to discuss with Youenn Fablet the Streams API we are implementing and see how we could bootstrap the reviews and begin to get the code reviewed and landed efficiently. Youenn and I also gave a presentation (mainly him) about it. At that moment we got some comments and help from Benjamin Poulain, and nowadays we are also working with Darin Adler and Geoffrey Garen, so the work is ongoing.

WebRTC was also a hot topic and we talked a bit about how to deal with promises, as they seem to be involved in the WebRTC standard as well. My Igalian partner Philippe was missed in this regard, as he is involved in the development of WebRTC in WebKit, but he unfortunately couldn’t make it for personal reasons.

I also had an interesting talk with Jer Noble and Eric Carlson about Media Source and Encrypted Media Extensions. I told them about the several downstream implementations that we are or were working on, especially the GStreamer based one for WPE, which we expect to begin to upstream soon (update: this is done, yay!). They commented that they still have doubts about the abstractions they made for them, and of course I promised to get back to them when we begin the job. Actually, I already discussed some issues with Quique, another fellow Igalian.

Among the other interesting discussions, I found the migration of the Mac port to CMake very necessary. Actually, I am now experiencing the pain/benefits of using Xcode to add files, especially generated ones, to the compilation. I hope that Alex succeeds with the task and soon we have a common build system for all the main ports.

ReadableStream almost ready

Hello dear readers! Long time no see! You might think that I have been lazy, and I was, in blog posting, but I was coding like mad.

First remarkable thing is that I attended the WebKit Contributors Meeting that happened in March at the Apple campus in Cupertino as part of the Igalia gang. There we discussed, of course, the Streams API, its state and different implementation possibilities. Another very interesting point, which would make me very happy, would be the movement of the Mac port to CMake.

In a previous post I already introduced the concepts of the Streams API and some of its possible use cases, so I’ll save you that part now. The news is that ReadableStream has its basic functionality complete. And what does that mean? It means that you can create a ReadableStream by providing the constructor with the underlying source and the strategy objects, and read from it with its reader, and all the internal mechanisms of backpressure and so on will work according to the spec. Yay!

Nevertheless, there’s still quite some work to do to complete the implementation of the Streams API, like the implementation of byte streams, writable and transform streams, piping operations and built-in strategies (which is what I am on right now). I don’t know when the Streams API will be activated by default in the next builds of Safari, WebKitGTK+ or WebKit for Wayland either, but we’ll make it at some point!

The code already went through lots of changes because we were still figuring out which architecture was best, and Youenn did an awesome job refactoring some things and providing support for promises in the bindings to make the implementation of ReadableStream more straightforward and less “custom”.

The implementation could still go through quite some important changes because, as part of my work implementing the strategies, some reviewers raised concerns about having the Streams API implemented inside WebCore in terms of IDL interfaces. I already have a proof of concept of CountQueuingStrategy and ByteLengthQueuingStrategy implemented inside JavaScriptCore, even a case where we use built-in JavaScript functions, which might help to keep closer to the spec if we can just include JavaScript code directly. We’ll see how we end up!

Last but not least, I would like to thank Igalia for sponsoring me to attend the WebKit Contributors Meeting in Cupertino, and also Adenilson for being so nice and taking us to very nice places for dinner and drinks that we wouldn’t have been able to find ourselves (I owe you, I promise to return the favor at the Web Engines Hackfest). It was also really nice to have the opportunity to quickly visit New York City for some hours because of the long connection there, which usually would be a PITA but was very enjoyable this time.