January, 2012

Jan 12

GStreamer video decoder for SysLink

This blog post is another one of our series about SysLink (1, 2): finally I came with a usable GStreamer element for video decoding, which talks directly with the SysLink framework in the OMAP4′s kernel.

As we stated before, SysLink is a set of kernel modules that enables the initialization of remote processors, in a multi-core system (which might be heterogeneous), that run their own operating systems in their own memory space, and also, SysLink, enables the communication between the host processor with the remotes ones. This software and hardware setup could be viewed as an Asymmetric Multi-Processing model.

TI provides a user-space library to access the SysLink services, but I find its implementation a bit clumsy, so I took the challenge of rewrite a part of it, in a simple and straightforward fashion, as gst-dsp does for DSP/Bridge. The result is the interface syslink.h.

Simultaneously, I wrote the utility to load and monitor the operating system into the Cortex-M3 processors for the PandaBoard. This board, such as all the OMAP4-based SoCs, has two ARM Core-M3 as remote processors. Hence, this so called daemon.c, is in charge of loading the firmware images, setting the processor in its running state, allocating the interchange memory areas, and monitoring for any error message.

In order to load the images files into the processors memory areas, it is required to parse the ELF header of the files, and that is the reason of why I decided to depend on libelf, rather than write another ELF parser. Yes, one sad dependency for the daemon. The use of libelf is isolated in elf.h.

When I was developing the daemon, for debugging purposes, I needed to trace the messages generated by the images in the remote processors. For that reason I wrote tracer.c, whose only responsibility is to read and to parse the ring buffer used by the images, in the remote processors, for logging.

Now, in OMAP4, the subsystem comprised by the two Cortex-M3 processors is called Ducati. The first processor is used only for the exchange of notification messages among the host processor and the second M3 processor, where all the multimedia processing is done.

There are at least two images for the second Cortex-M3 processor: DOMX, which is closed source and focused, as far as I know, on the OMX IL interface; and, in the other hand, DCE, which is open source, it’s developed by Rob Clark, and it provides a simple interface of buffers interchange.

My work use DCE, obviously.

But, please, let me go back one step in this component description: in order to send and receive commands between the host processor and one remote processor, SysLink uses a packet based protocol, called Remote Command Messaging, or just RCM for the friends. There are two types of interfaces of RCM, the client and the server. The client interface is used by the applications running in the host processor, and they request services to the server interface, exposed by the systems running in the remote processors, it is accepting the requests and it returns results.

The RCM client interface is in rcm.h.

Above the RCM client, sits my dce.h interface, which is in charge of control the state of the video decoders and it is also in charge of handling the buffers.

But these buffers are tricky. They are not memory areas allocated by a simple malloc, instead they are buffers allocated by a mechanism in the kernel called tiler. The purpose of this mechanism is to provide buffers with capacity of 2D operations by hardware (in other words, cheap and quick in computations). These buffers are shared along all the processing pipeline, so the copies of memory areas are not needed. Of course, in order to achieve this paradise, the video renderer must handle this type of buffers too.

In my code, the interface to the tiler mechanism is in tiler.h.

And finally, the all mighty GStreamer element for video decoding: gstsyslinkvdec.c! Following the spirit of gst-dsp, this element is intended to deal with all the available video decoders in the DCE image, although for now, the H264 decoding is the only one handled.

For now, I have only tested the decoder with fakesink, because the element pushes tiled buffers onto the source pad, and, in order to have an efficient video player, it is required a video renderer that handles this type of tiled buffers. TI is developing one, pvrvideosink, but it depends on EGL, and I would like to avoid X whenever is possible.

I have not measured either the performance of this work compared with the TI’s combo (syslink user-space library / memmgr / libdce / gst-ducati), but I suspect that my approach would be little more efficient, faster, and, at least, simpler ;)

The sad news, as in every hard paced development, all these kernel mechanisms are already deprecated: SysLink and DMM-Tiler will never be mainlined into the kernel, but their successors, rproc/rpmsg and omapdrm, have a good chance. And both have a very different approach since their predecessors. Nevertheless, SysLink is already here and it is being used widely, so this effort has an opportunity for being worthy.

My current task is to decide if I should drop the 2D buffers in the video decoders or if I should develop a video renderer for them.