Trouble syncing libavformat/ffmpeg with x264 and RTP

https://stackoverflow.com//questions/11691921

12-12-2019
|

Question

I've been working on some streaming software that takes live feeds from various kinds of cameras and streams over the network using H.264. To accomplish this, I'm using the x264 encoder directly (with the "zerolatency" preset) and feeding NALs as they are available to libavformat to pack into RTP (ultimately RTSP). Ideally, this application should be as real-time as possible. For the most part, this has been working well.

Unfortunately, however, there is some sort of synchronization issue: any video playback on clients seems to show a few smooth frames, followed by a short pause, then more frames; repeat. Additionally, there appears to be approximately a 4-second delay. This happens with every video player I've tried: Totem, VLC, and basic gstreamer pipes.

I've boiled it all down to a somewhat small test case:

#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <x264.h>
#include <libavformat/avformat.h>
#include <libswscale/swscale.h>

#define WIDTH       640
#define HEIGHT      480
#define FPS         30
#define BITRATE     400000
#define RTP_ADDRESS "127.0.0.1"
#define RTP_PORT    49990

struct AVFormatContext* avctx;
struct x264_t* encoder;
struct SwsContext* imgctx;

uint8_t test = 0x80;


void create_sample_picture(x264_picture_t* picture)
{
    // create a frame to store in
    x264_picture_alloc(picture, X264_CSP_I420, WIDTH, HEIGHT);

    // fake image generation
    // disregard how wrong this is; just writing a quick test
    int strides = WIDTH / 8;
    uint8_t* data = malloc(WIDTH * HEIGHT * 3);
    memset(data, test, WIDTH * HEIGHT * 3);
    test = (test << 1) | (test >> (8 - 1));

    // scale the image
    sws_scale(imgctx, (const uint8_t* const*) &data, &strides, 0, HEIGHT,
              picture->img.plane, picture->img.i_stride);
}

int encode_frame(x264_picture_t* picture, x264_nal_t** nals)
{
    // encode a frame
    x264_picture_t pic_out;
    int num_nals;
    int frame_size = x264_encoder_encode(encoder, nals, &num_nals, picture, &pic_out);

    // ignore bad frames
    if (frame_size < 0)
    {
        return frame_size;
    }

    return num_nals;
}

void stream_frame(uint8_t* payload, int size)
{
    // initalize a packet
    AVPacket p;
    av_init_packet(&p);
    p.data = payload;
    p.size = size;
    p.stream_index = 0;
    p.flags = AV_PKT_FLAG_KEY;
    p.pts = AV_NOPTS_VALUE;
    p.dts = AV_NOPTS_VALUE;

    // send it out
    av_interleaved_write_frame(avctx, &p);
}

int main(int argc, char* argv[])
{
    // initalize ffmpeg
    av_register_all();

    // set up image scaler
    // (in-width, in-height, in-format, out-width, out-height, out-format, scaling-method, 0, 0, 0)
    imgctx = sws_getContext(WIDTH, HEIGHT, PIX_FMT_MONOWHITE,
                            WIDTH, HEIGHT, PIX_FMT_YUV420P,
                            SWS_FAST_BILINEAR, NULL, NULL, NULL);

    // set up encoder presets
    x264_param_t param;
    x264_param_default_preset(&param, "ultrafast", "zerolatency");

    param.i_threads = 3;
    param.i_width = WIDTH;
    param.i_height = HEIGHT;
    param.i_fps_num = FPS;
    param.i_fps_den = 1;
    param.i_keyint_max = FPS;
    param.b_intra_refresh = 0;
    param.rc.i_bitrate = BITRATE;
    param.b_repeat_headers = 1; // whether to repeat headers or write just once
    param.b_annexb = 1;         // place start codes (1) or sizes (0)

    // initalize
    x264_param_apply_profile(&param, "high");
    encoder = x264_encoder_open(&param);

    // at this point, x264_encoder_headers can be used, but it has had no effect

    // set up streaming context. a lot of error handling has been ommitted
    // for brevity, but this should be pretty standard.
    avctx = avformat_alloc_context();
    struct AVOutputFormat* fmt = av_guess_format("rtp", NULL, NULL);
    avctx->oformat = fmt;

    snprintf(avctx->filename, sizeof(avctx->filename), "rtp://%s:%d", RTP_ADDRESS, RTP_PORT);
    if (url_fopen(&avctx->pb, avctx->filename, URL_WRONLY) < 0)
    {
        perror("url_fopen failed");
        return 1;
    }
    struct AVStream* stream = av_new_stream(avctx, 1);

    // initalize codec
    AVCodecContext* c = stream->codec;
    c->codec_id = CODEC_ID_H264;
    c->codec_type = AVMEDIA_TYPE_VIDEO;
    c->flags = CODEC_FLAG_GLOBAL_HEADER;
    c->width = WIDTH;
    c->height = HEIGHT;
    c->time_base.den = FPS;
    c->time_base.num = 1;
    c->gop_size = FPS;
    c->bit_rate = BITRATE;
    avctx->flags = AVFMT_FLAG_RTP_HINT;

    // write the header
    av_write_header(avctx);

    // make some frames
    for (int frame = 0; frame < 10000; frame++)
    {
        // create a sample moving frame
        x264_picture_t* pic = (x264_picture_t*) malloc(sizeof(x264_picture_t));
        create_sample_picture(pic);

        // encode the frame
        x264_nal_t* nals;
        int num_nals = encode_frame(pic, &nals);

        if (num_nals < 0)
            printf("invalid frame size: %d\n", num_nals);

        // send out NALs
        for (int i = 0; i < num_nals; i++)
        {
            stream_frame(nals[i].p_payload, nals[i].i_payload);
        }

        // free up resources
        x264_picture_clean(pic);
        free(pic);

        // stream at approx 30 fps
        printf("frame %d\n", frame);
        usleep(33333);
    }

    return 0;
}

This test shows black lines on a white background that should move smoothly to the left. It has been written for ffmpeg 0.6.5 but the problem can be reproduced on 0.8 and 0.10 (from what I've tested so far). I've taken some shortcuts in error handling to make this example as short as possible while still showing the problem, so please excuse some of the nasty code. I should also note that while an SDP is not used here, I have tried using that already with similar results. The test can be compiled with:

gcc -g -std=gnu99 streamtest.c -lswscale -lavformat -lx264 -lm -lpthread -o streamtest

It can be played with gtreamer directly:

gst-launch udpsrc port=49990 ! application/x-rtp,payload=96,clock-rate=90000 ! rtph264depay ! decodebin ! xvimagesink

You should immediately notice the stuttering. One common "fix" I've seen all over the Internet is to add sync=false to the pipeline:

gst-launch udpsrc port=49990 ! application/x-rtp,payload=96,clock-rate=90000 ! rtph264depay ! decodebin ! xvimagesink sync=false

This causes playback to be smooth (and near-realtime), but is a non-solution and only works with gstreamer. I'd like to fix the problem at the source. I've been able to stream with near-identical parameters using raw ffmpeg and haven't had any issues:

ffmpeg -re -i sample.mp4 -vcodec libx264 -vpre ultrafast -vpre baseline -b 400000 -an -f rtp rtp://127.0.0.1:49990 -an

So clearly I'm doing something wrong. But what is it?

Solution

1) You didn't set PTS for frames you send to libx264 (you probably should see "non-strictly-monotonic PTS" warnings) 2) You didn't set PTS/DTS for packets you send to libavformat's rtp muxer (I not 100% sure it need to be set but I guess it would be better. From source code it looks like rtp use PTS). 3) IMHO usleep(33333) is bad. It cause encoder to stall this time also (increasing latency) while you could encode next frame during this time even if you still don't need to send it by rtp.

P.S. btw you didn't set param.rc.i_rc_method to X264_RC_ABR so libx264 will use CRF 23 instead and ignore your "param.rc.i_bitrate = BITRATE". Also it can be good idea to use VBV when encoding for network sending.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow