VA-API and DRM/KMS in MinnowBoard

In 2012 I started to work on a video renderer for GStreamer which uses directly the DRM/KMS kernel subsystem to display images. I even blogged about it, but I didn’t finished it.

Nonetheless, in December last year, a customer asked us to finish the element, but this time focusing on i.MX6, rather than OMAP4. Consequently, early this year, kmssink got merged into upstream.

The video sink has some nice features, such as DMA-buf importing through the PRIME kernel interface. This feature makes possible a zero-copy path when the video decoder delivers dmabuf based frames. For this particular project, the kernel we used supports the CODA media subsystem, and through the GStreamer element v4l2videodec we could link pipelines negotiating dmabuf sharing, and thus we played videos very efficiently.

During the last GStreamer Hackfest, Nicolas Dufresne got working the MFC media subsystem, for Exynos, with v4l2videodec and kmssink, but I don’t remember if he also got the dmabuf sharing working.

All in all, kmssink had only been tested in a couple of ARM devices, so I wonder, as a gstremer-vaapi co-maintainer, if it also works in x86 devices.

Some colleagues at Igalia, use a Minnowboard to test WebKit4Wayland, so I borrowed it, installed a minimal Fedora 23 in it, and, surprisingly, I found out that it has a nice VA-API support:

   $ vainfo
   error: can't connect to X server!
   libva info: VA-API version 0.38.1
   libva info: va_getDriverName() returns 0
   libva info: Trying to open /usr/lib64/dri/i965_drv_video.so
   libva info: Found init function __vaDriverInit_0_38
   libva info: va_openDriver() returns 0
   vainfo: VA-API version: 0.38 (libva 1.6.2)
   vainfo: Driver version: Intel i965 driver for Intel(R) Bay Trail - 1.6.2
   vainfo: Supported profile and entrypoints
         VAProfileMPEG2Simple            : VAEntrypointVLD
         VAProfileMPEG2Simple            : VAEntrypointEncSlice
         VAProfileMPEG2Main              : VAEntrypointVLD
         VAProfileMPEG2Main              : VAEntrypointEncSlice
         VAProfileH264ConstrainedBaseline: VAEntrypointVLD
         VAProfileH264ConstrainedBaseline: VAEntrypointEncSlice
         VAProfileH264Main               : VAEntrypointVLD
         VAProfileH264Main               : VAEntrypointEncSlice
         VAProfileH264High               : VAEntrypointVLD
         VAProfileH264High               : VAEntrypointEncSlice
         VAProfileH264StereoHigh         : VAEntrypointVLD
         VAProfileVC1Simple              : VAEntrypointVLD
         VAProfileVC1Main                : VAEntrypointVLD
         VAProfileVC1Advanced            : VAEntrypointVLD
         VAProfileNone                   : VAEntrypointVideoProc
         VAProfileJPEGBaseline           : VAEntrypointVLD


Afterwards, I compiled GStreamer using gst-uninstalled in order to have kmssink and the master branch of gstreamer-vaapi. And got it! I had hardware accelerated decoding with VA-API, and video rendering using KMS/DRM. But, sadly, no zero copy, since, in order to have dmabuf sharing, we have to finish and to merge bug 755072 in gstreamer-vaapi.

Running the typical Big Bunny video (1080p, H.264) the playback consumes less than the 50% of the CPU usage, compared with the 100% of CPU usage with software decoders (and a lot of frames dropped).

I recorded a small video with the experiment as a proof.

Update: Javer Martinez told me in twitter, that MFC can do dma-buf export since uses vb2 and Exynos DRM can import, but he couldn’t get zero copy to work due missing format support.

A GStreamer Video Sink using KMS

The purpose of this blog post is to show the concepts related to the GstKMSSink, a new video sink for GStreamer 1.0, co-developed by Alessandro Decina and myself, done during my hack-fest time in the Igalia’s multimedia team.

One interesting thing to notice is that this element shows it is possible to write DRI clients without the burden of X Window.

Brief introduction to graphics in Linux

If you want to dump images onto your screen, you can simply use the frame buffer device. It provides an abstraction for the graphics hardware and represents the frame buffer of the video hardware. This kernel device allows user application to access the graphics hardware without knowing the low-level details [1].

In GStreamer, we have two options for displaying images using the frame buffer device; or three, if we use OMAP3: fbvideosink, fbdevsink and gst-omapfb.

Nevertheless, since the appearance of the GPUs, the frame buffer device interface has not been sufficient to fulfill all their capabilities. A new kernel interface ought to emerge. And that was the Direct Rendering Manager (DRM).

What in the hell is DRM?

The DRM layer is intended to support the needs of complex graphics devices, usually containing programmable pipelines well suited to 3D graphics acceleration [2]. It deals with [3]:

  1. A DMA queue for graphic buffers transfers [4].
  2. It provides locks for graphics hardware, treating it as shared resource for simultaneous 3D applications [5].
  3. And it provides secure hardware access, preventing clients from escalating privileges [6].

The DRM layer consists of two in-kernel drivers: a generic DRM driver, and another which has specific support for the video hardware [7]. This is possible because the DRM engine is extensible, enabling the device-specific driver to hook out those functionalities that are required by the hardware. For example, in the case of the Intel cards, the Linux kernel driver i915 supports this card and couples its capabilities to the DRM driver.

The device-specific driver, in particular, should cover two main kernel interfaces: the Kernel Mode Settings (KMS) and the Graphics Execution Manager (GEM). Both elements are also exposed to the user-space through the DRM.

With KMS, the user can ask the kernel to enable native resolution in the frame buffer, setting certain display resolution and colour depth mode. One of the benefits of doing it in kernel is that, since the kernel is in complete control of the hardware, it can switch back in the case of failure [8].

In order to allocate command buffers, cursor memory, scanout buffers, etc., the device-specific driver should support a memory manager, and GEM is the manager with more acceptance these days, because of its simplicity [9].

Beside to the graphics memory management, GEM ensures conflict-free sharing of data between applications by managing the memory synchronization. This is important because modern graphics hardware are essentially NUMA environments.

The following diagram shows the components view of the DRM layer:

Direct Rendering Infrastructure

What is the deal with KMS?

KMS is important because on it relies GEM and DRM to allocate frame buffers and to configure the display. And it is important to us because almost all of the ioctls called by the GStreamer element are part of the KMS subset.

Even more, there are some voices saying that KMS is the future replacement for the frame buffer device [10].

To carry out its duties, the KMS identifies five main concepts [11,12]:

Frame buffer:
The frame buffer is just a buffer, in the video memory, that has an image encoded in it as an array of pixels. As KMS configures the ring buffer in this video memory, it holds a the information of this configuration, such as width, height, color depth, bits per pixel, pixel format, and so on.
CRTC:
Stands for Cathode Ray Tube Controller. It reads the data out of the frame buffer and generates the video mode timing. The CRTC also determines what part of the frame buffer is read; e.g., when multi-head is enabled, each CRTC scans out of a different part of the video memory; in clone mode, each CRTC scans out of the same part of the memory.Hence, from the KMS perspective, the CRTC’s abstraction contains the display mode information, including, resolution, depth, polarity, porch, refresh rate, etc. Also, it has the information of the buffer region to display and when to change to the next frame buffer.
Overlay planes:
Overlays are treated a little like CRTCs, but without associated modes our encoder trees hanging off of them: they can be enabled with a specific frame buffer attached at a specific location, but they don’t have to worry about mode setting, though they do need to have an associated CRTC to actually pump their pixels out [13].
Encoder:
The encoder takes the digital bitstream from the CRTC and converts it to the appropriate format across the connector to the monitor.
Connector:
The connector provides the appropriate physical plug for the monitor to connect to, such as HDMI, DVI-D, VGA, S-Video, etc..

And what about this KMSSink?

KMSSink is a first approach towards a video sink as a DRI client. For now it only works in the panda-board with a recent kernel (I guess, 3.3 would make it).

For now it only uses the custom non-tiled buffers and use an overlay plane to display them. So, it is in the to-do, add support for more hardware.

 Bibliography

[1] http://free-electrons.com/kerneldoc/latest/fb/framebuffer.txt
[2] http://free-electrons.com/kerneldoc/latest/DocBook/drm/drmIntroduction.html
[3] https://www.kernel.org/doc/readme/drivers-gpu-drm-README.drm
[4] http://dri.sourceforge.net/doc/drm_low_level.html
[5] http://dri.sourceforge.net/doc/hardware_locking_low_level.html
[6] http://dri.sourceforge.net/doc/security_low_level.html
[7] https://en.wikipedia.org/wiki/Direct_Rendering_Manager
[8] http://www.bitwiz.org.uk/s/how-dri-and-drm-work.html
[9] https://lwn.net/Articles/283798/
[10] http://phoronix.com/forums/showthread.php?23756-KMS-as-anext-gen-Framebuffer
[11] http://elinux.org/images/7/71/Elce11_dae.pdf
[12] http://www.botchco.com/agd5f/?p=51
[13] https://lwn.net/Articles/440192/