FFmpeg 101
A high-level architecture overview to start with FFmpeg.
FFmpeg package content
FFmpeg is composed of a suite of tools and libraries.
FFmpeg tools
The tools can be used to encode/decode/transcode a multitude of different audio and video formats, and to stream the encoded media over networks.
- ffmpeg: a command line tool to convert multimedia files between formats
- ffplay: a simple media player based on SDL and the FFmpeg libraries
- ffprobe: a simple multimedia stream analyzer
FFmpeg libraries
The libraries can be used to integrate those same features into your own product.
- libavformat: I/O and muxing/demuxing
- libavcodec: encoding/decoding
- libavfilter: graph-based filters for raw media
- libavdevice: input/output devices
- libavutil: common multimedia utilities
- libswresample: audio resampling, sample format conversion and audio mixing
- libswscale: color conversion and image scaling
- libpostproc: video post-processing (deblocking/noise filters)
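Each of these libraries ships its own public header, so a program only includes what it needs. For reference, the header paths as installed by FFmpeg are:
#include <libavformat/avformat.h>     // I/O and muxing/demuxing
#include <libavcodec/avcodec.h>       // encoding/decoding
#include <libavfilter/avfilter.h>     // graph-based filters
#include <libavdevice/avdevice.h>     // input/output devices
#include <libavutil/avutil.h>         // common multimedia utilities
#include <libswresample/swresample.h> // audio resampling and mixing
#include <libswscale/swscale.h>       // color conversion and scaling
#include <libpostproc/postprocess.h>  // video post-processing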
FFmpeg simple player
A basic usage of FFmpeg is to demux a multimedia stream (obtained from a file or from the network) into its audio and video streams and then to decode those streams into raw audio and raw video data.
To manage the media streams, FFmpeg uses the following structures:
- AVFormatContext: a high-level structure providing sync, metadata and muxing for the streams
- AVStream: a continuous stream (audio or video)
- AVCodec: defines how data are encoded and decoded
- AVPacket: encoded data in the stream
- AVFrame: decoded data (raw video frame or raw audio samples)
The process used to demux and decode follows the logic described below.
Here is the basic code needed to read an encoded multimedia stream from a file, analyze its content and demux the audio and video streams. Those features are provided by the libavformat library, which uses the AVFormatContext and AVStream structures to store the information.
// Headers needed by this snippet: <inttypes.h> provides the PRId64 macros used below
#include <inttypes.h>
#include <stdio.h>
#include <libavformat/avformat.h>

// Allocate memory for the context structure
AVFormatContext* format_context = avformat_alloc_context();
// Open a multimedia file (like an mp4 file or any format recognized by FFmpeg)
if (avformat_open_input(&format_context, filename, NULL, NULL) < 0)
{
    fprintf(stderr, "Cannot open input file %s\n", filename);
    return -1;
}
printf("File: %s, format: %s\n", filename, format_context->iformat->name);
// Analyze the file content and identify the streams within
avformat_find_stream_info(format_context, NULL);
// List the streams
for (unsigned int i = 0; i < format_context->nb_streams; ++i)
{
    AVStream* stream = format_context->streams[i];
    printf("---- Stream %02u\n", i);
    printf(" Time base: %d/%d\n", stream->time_base.num, stream->time_base.den);
    printf(" Framerate: %d/%d\n", stream->r_frame_rate.num, stream->r_frame_rate.den);
    printf(" Start time: %" PRId64 "\n", stream->start_time);
    printf(" Duration: %" PRId64 "\n", stream->duration);
    printf(" Type: %s\n", av_get_media_type_string(stream->codecpar->codec_type));
    uint32_t fourcc = stream->codecpar->codec_tag;
    printf(" FourCC: %c%c%c%c\n", fourcc & 0xff, (fourcc >> 8) & 0xff, (fourcc >> 16) & 0xff, (fourcc >> 24) & 0xff);
}
// Close the multimedia file and free the context structure
avformat_close_input(&format_context);
Once we’ve got the different streams from inside the multimedia file, we need to find specific codecs to decode the streams to raw audio and raw video data. All codecs are statically included in libavcodec. You can create your own codec by creating an instance of the FFCodec structure and registering it as an extern const FFCodec in libavcodec/allcodecs.c, but that is a topic for another post.
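As a rough illustration only — FFCodec lives in FFmpeg's private codec_internal.h header and its layout changes between releases — a hypothetical decoder registration could look like this (ff_mydec_decoder, MyDecContext and the mydec_* callbacks are made-up names):
// Hypothetical decoder registration using FFmpeg's internal API (subject to change).
// In mydec.c:
const FFCodec ff_mydec_decoder = {
    .p.name         = "mydec",                // .p is the embedded public AVCodec
    .p.type         = AVMEDIA_TYPE_VIDEO,
    .p.id           = AV_CODEC_ID_RAWVIDEO,   // placeholder codec id
    .priv_data_size = sizeof(MyDecContext),   // made-up private context
    .init           = mydec_init,
    .close          = mydec_close,
    FF_CODEC_DECODE_CB(mydec_decode),         // wraps the decode callback
};
// And in libavcodec/allcodecs.c:
extern const FFCodec ff_mydec_decoder;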
To find the codec corresponding to the content of an AVStream, we can use the following code:
// Stream obtained from the AVFormatContext structure in the earlier stream-listing loop
AVStream* stream = format_context->streams[i];
// Search for a compatible codec
const AVCodec* codec = avcodec_find_decoder(stream->codecpar->codec_id);
if (!codec)
{
    fprintf(stderr, "Unsupported codec\n");
    continue;
}
printf(" Codec: %s, bitrate: %" PRId64 "\n", codec->name, stream->codecpar->bit_rate);
if (codec->type == AVMEDIA_TYPE_VIDEO)
{
    printf(" Video resolution: %dx%d\n", stream->codecpar->width, stream->codecpar->height);
}
else if (codec->type == AVMEDIA_TYPE_AUDIO)
{
    printf(" Audio: %d channels, sample rate: %d Hz\n",
           stream->codecpar->ch_layout.nb_channels,
           stream->codecpar->sample_rate);
}
With the right codec and codec parameters extracted from the AVStream information, we can now allocate the AVCodecContext structure that will be used to decode the corresponding stream. It is important to remember the index of the stream we want to decode from the earlier stream listing (format_context->streams), because this index will be used later to identify the demuxed packets extracted by the AVFormatContext.
In the following code we’re going to select the first video stream contained in the multimedia file.
// first_video_stream_index is determined during the stream listing in the earlier loop
int first_video_stream_index = ...;
AVStream* first_video_stream = format_context->streams[first_video_stream_index];
AVCodecParameters* first_video_stream_codec_params = first_video_stream->codecpar;
const AVCodec* first_video_stream_codec = avcodec_find_decoder(first_video_stream_codec_params->codec_id);
// Allocate memory for the decoding context structure
AVCodecContext* codec_context = avcodec_alloc_context3(first_video_stream_codec);
// Configure the decoder with the codec parameters
avcodec_parameters_to_context(codec_context, first_video_stream_codec_params);
// Open the decoder; a negative return value means it could not be initialized
if (avcodec_open2(codec_context, first_video_stream_codec, NULL) < 0)
{
    fprintf(stderr, "Cannot open the decoder\n");
    return -1;
}
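As an aside, libavformat can perform this selection for us: av_find_best_stream() returns the index of the most suitable stream of a given type and can also hand back a matching decoder, replacing the manual loop above. A minimal sketch:
// Ask libavformat for the most suitable video stream and a decoder for it
const AVCodec* video_codec = NULL;
int video_stream_index = av_find_best_stream(format_context, AVMEDIA_TYPE_VIDEO,
                                             -1 /* any stream */, -1 /* no related stream */,
                                             &video_codec, 0);
if (video_stream_index < 0)
    fprintf(stderr, "No suitable video stream found\n");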
Now that we have a running decoder, we can extract the demuxed packets using the AVFormatContext structure and decode them into raw video frames. For that we need two different structures:
- AVPacket: contains the encoded packets extracted from the input multimedia file
- AVFrame: will contain the raw video frame once the AVCodecContext has decoded those packets
// Allocate memory for the encoded packet structure
AVPacket* packet = av_packet_alloc();
// Allocate memory for the decoded frame structure
AVFrame* frame = av_frame_alloc();
// Demux the next packet from the input multimedia file
while (av_read_frame(format_context, packet) >= 0)
{
    // The demuxed packet uses the stream index to identify the AVStream it is coming from
    printf("Packet received for stream %02d, pts: %" PRId64 "\n", packet->stream_index, packet->pts);
    // In our example we only decode the first video stream, identified earlier by first_video_stream_index
    if (packet->stream_index == first_video_stream_index)
    {
        // Send the packet to the previously initialized decoder
        int res = avcodec_send_packet(codec_context, packet);
        if (res < 0)
        {
            fprintf(stderr, "Cannot send packet to the decoder: %s\n", av_err2str(res));
            break;
        }
        // The decoder (AVCodecContext) acts like a FIFO queue: we push encoded packets on one end and we need to
        // poll the other end to fetch the decoded frames. The codec implementation may (or may not) use different
        // threads to perform the actual decoding.
        // Poll the running decoder to fetch all the decoded frames available so far
        while (res >= 0)
        {
            // Fetch the next available decoded frame
            res = avcodec_receive_frame(codec_context, frame);
            if (res == AVERROR(EAGAIN) || res == AVERROR_EOF)
            {
                // No more decoded frames are available in the decoder output queue, go to the next encoded packet
                break;
            }
            else if (res < 0)
            {
                fprintf(stderr, "Error while receiving a frame from the decoder: %s\n", av_err2str(res));
                goto end;
            }
            // Now the AVFrame structure contains a decoded raw video frame, we can process it further...
            printf("Frame %02" PRId64 ", type: %c, format: %d, pts: %03" PRId64 ", keyframe: %s\n",
                   codec_context->frame_num, av_get_picture_type_char(frame->pict_type), frame->format, frame->pts,
                   (frame->flags & AV_FRAME_FLAG_KEY) ? "true" : "false");
            // The AVFrame internal content is automatically unreffed and recycled during the next call to
            // avcodec_receive_frame(codec_context, frame)
        }
    }
    // Unref the packet internal content to recycle it for the next demuxed packet
    av_packet_unref(packet);
}
// Free the previously allocated memory for the different FFmpeg structures
end:
av_packet_free(&packet);
av_frame_free(&frame);
avcodec_free_context(&codec_context);
avformat_close_input(&format_context);
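One detail the loop above leaves out: when av_read_frame() stops returning packets, a few decoded frames may still be buffered inside the decoder. The usual draining step, which would go right before the end: label above, is to send a NULL packet and then receive frames until the decoder runs dry; a minimal sketch:
// Enter draining mode: a NULL packet tells the decoder that no more input is coming
avcodec_send_packet(codec_context, NULL);
// Fetch the frames still buffered in the decoder until it returns AVERROR_EOF
while (avcodec_receive_frame(codec_context, frame) >= 0)
{
    printf("Drained frame, pts: %" PRId64 "\n", frame->pts);
}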
To summarize the code above: packets are demuxed from the container one by one, the video packets are pushed to the decoder, and the decoder output queue is polled for raw frames after each push.
You can find the full code here.
To build the example you will need meson and ninja. If you have python and pip installed, you can install them easily by calling pip3 install meson ninja. Then, once the example archive is extracted to an ffmpeg-101 folder, go to this folder and call meson setup build. It will automatically download the right version of FFmpeg if you don’t have it already installed on your system. Then call ninja -C build to build the code and ./build/ffmpeg-101 sample.mp4 to run it.
You should obtain the following result:
File: sample.mp4, format: mov,mp4,m4a,3gp,3g2,mj2
---- Stream 00
Time base: 1/3000
Framerate: 30/1
Start time: 0
Duration: 30000
Type: video
FourCC: avc1
Codec: h264, bitrate: 47094
Video resolution: 206x80
---- Stream 01
Time base: 1/44100
Framerate: 0/0
Start time: 0
Duration: 440320
Type: audio
FourCC: mp4a
Codec: aac, bitrate: 112000
Audio: 2 channels, sample rate: 44100 Hz
Packet received for stream 00, pts: 0
Send video packet to decoder...
Frame 01, type: I, format: 0, pts: 000, keyframe: true
Packet received for stream 00, pts: 100
Send video packet to decoder...
Frame 02, type: P, format: 0, pts: 100, keyframe: false
Packet received for stream 00, pts: 200
Send video packet to decoder...
Frame 03, type: P, format: 0, pts: 200, keyframe: false
Packet received for stream 00, pts: 300
Send video packet to decoder...
Frame 04, type: P, format: 0, pts: 300, keyframe: false
Packet received for stream 00, pts: 400
Send video packet to decoder...
Frame 05, type: P, format: 0, pts: 400, keyframe: false
Packet received for stream 00, pts: 500
Send video packet to decoder...
Frame 06, type: P, format: 0, pts: 500, keyframe: false
Packet received for stream 00, pts: 600
Send video packet to decoder...
Frame 07, type: P, format: 0, pts: 600, keyframe: false
Packet received for stream 00, pts: 700
Send video packet to decoder...
Frame 08, type: P, format: 0, pts: 700, keyframe: false
Packet received for stream 01, pts: 0
Packet received for stream 01, pts: 1024
Packet received for stream 01, pts: 2048
Packet received for stream 01, pts: 3072
Packet received for stream 01, pts: 4096
Packet received for stream 01, pts: 5120
Packet received for stream 01, pts: 6144
Packet received for stream 01, pts: 7168
Packet received for stream 01, pts: 8192
Packet received for stream 01, pts: 9216
Packet received for stream 01, pts: 10240
Packet received for stream 01, pts: 11264
Packet received for stream 01, pts: 12288
Packet received for stream 01, pts: 13312
Packet received for stream 01, pts: 14336
Packet received for stream 01, pts: 15360
Packet received for stream 01, pts: 16384
Packet received for stream 01, pts: 17408
Packet received for stream 01, pts: 18432
Packet received for stream 01, pts: 19456
Packet received for stream 01, pts: 20480
Packet received for stream 01, pts: 21504
Packet received for stream 00, pts: 800
Send video packet to decoder...
Frame 09, type: P, format: 0, pts: 800, keyframe: false
Packet received for stream 00, pts: 900
Send video packet to decoder...
Frame 10, type: P, format: 0, pts: 900, keyframe: false
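All the timing values in this output are expressed in time_base units. For the video stream, a duration of 30000 in a 1/3000 time base means 30000 × 1/3000 = 10 seconds, and the pts step of 100 between frames corresponds to 100/3000 s per frame, i.e. the reported 30 fps. For the audio stream, 440320 × 1/44100 ≈ 9.98 seconds, with each packet advancing the pts by 1024, the number of samples in an AAC frame. In code, libavutil's av_q2d() performs this conversion:
// Convert a duration expressed in time_base units to seconds
double duration_s = stream->duration * av_q2d(stream->time_base);
// For the video stream above: 30000 * (1/3000) = 10.0 seconds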