Introduction
In this article we continue with overriding parts of the WebRTC Native Lib, focusing on how to restrict which ports it uses and how to rewrite the encoding process. My other notes on using WebRTC from Java are collected in "Using WebRTC in Java"; readers interested in this topic can browse them there. The source code for this article can be obtained by scanning the official-account QR code at the bottom of the article, or via paid download.
Restricting Connection Ports
Let's first review the overall flow of the port restriction. When creating the PeerConnectionFactory, we instantiated a SocketFactory and a default NetworkManager. Later, when creating the PeerConnection, we used those two instances to build a PortAllocator and injected it into the PeerConnection. Throughout this flow, the code that actually restricts ports lives in the SocketFactory, although the PortAllocator API is used as well. You may wonder: doesn't the PortAllocator already expose an interface for limiting the port range, so why is the SocketFactory still needed?
std::unique_ptr<cricket::PortAllocator> port_allocator(
        new cricket::BasicPortAllocator(network_manager.get(), socket_factory.get()));
port_allocator->SetPortRange(this->min_port, this->max_port); // PortAllocator's port-range API
I initially restricted ports only through this API, but I found that WebRTC would still request ports outside the range for other purposes. So in the end I overrode the SocketFactory directly and rejected every request for an out-of-range port. In addition, since our servers have some subnet IPs that must not be used, I handle those in the SocketFactory as well. My implementation looks like this:
rtc::AsyncPacketSocket *
rtc::SocketFactoryWrapper::CreateUdpSocket(const rtc::SocketAddress &local_address, uint16_t min_port,
                                           uint16_t max_port) {
    // Reject out-of-range ports
    if (min_port < this->min_port || max_port > this->max_port) {
        WEBRTC_LOG("Create udp socket cancelled, port out of range, expected port range is: " +
                   std::to_string(this->min_port) + "->" + std::to_string(this->max_port) +
                   ", parameter port range is: " + std::to_string(min_port) + "->" + std::to_string(max_port),
                   LogLevel::INFO);
        return nullptr;
    }
    // Reject IPs outside the whitelist
    if (!local_address.IsPrivateIP() || local_address.HostAsURIString().find(this->white_private_ip_prefix) == 0) {
        rtc::AsyncPacketSocket *result = BasicPacketSocketFactory::CreateUdpSocket(local_address, min_port, max_port);
        const auto *address = static_cast<const void *>(result);
        std::stringstream ss;
        ss << address;
        WEBRTC_LOG("Create udp socket, min port is: " + std::to_string(min_port) + ", max port is: " +
                   std::to_string(max_port) + ", result is: " + result->GetLocalAddress().ToString() + "->" +
                   result->GetRemoteAddress().ToString() + ", new socket address is: " + ss.str(), LogLevel::INFO);
        return result;
    } else {
        WEBRTC_LOG("Create udp socket cancelled, this ip is not in white list: " + local_address.HostAsURIString(),
                   LogLevel::INFO);
        return nullptr;
    }
}
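To make the rule above concrete, here is a small self-contained sketch of the decision the factory makes. The function name and parameters are hypothetical, just boiling the wrapper's member state down to arguments:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the checks SocketFactoryWrapper performs:
// a socket request is allowed only if the requested port range lies inside
// the configured range, and a private local IP matches the whitelisted
// subnet prefix (public IPs pass through unchanged).
bool IsSocketRequestAllowed(uint16_t min_port, uint16_t max_port,
                            uint16_t allowed_min, uint16_t allowed_max,
                            bool is_private_ip, const std::string &host,
                            const std::string &white_prefix) {
    if (min_port < allowed_min || max_port > allowed_max) {
        return false;  // port out of range: refuse to create the socket
    }
    // Mirrors `!IsPrivateIP() || host.find(prefix) == 0` in the code above.
    return !is_private_ip || host.rfind(white_prefix, 0) == 0;
}
```

With this shape, a request for ports 50000-50100 on a whitelisted 10.0.x.x address passes, while the same request on a non-whitelisted private address is refused.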
Custom Video Encoding
As you may know, WebRTC uses VP8 by default, and VP8 is widely considered inferior to H264. Moreover, Safari does not support VP8, so when talking to Safari, WebRTC falls back to OpenH264 for video encoding, and OpenH264 is less efficient than libx264. My improvements to the encoding path therefore focus on:
- Replacing the default codec with H264
- Encoding video with libx264 via FFmpeg, using GPU acceleration (h264_nvenc) when the host has a capable GPU
- Supporting bitrate changes at runtime
Replacing the Default Codec
Switching the default codec to H264 is straightforward: we only need to override VideoEncoderFactory's GetSupportedFormats:
// Returns a list of supported video formats in order of preference, to use
// for signaling etc.
std::vector<webrtc::SdpVideoFormat> GetSupportedFormats() const override {
    return GetAllSupportedFormats();
}

// Here I only advertise H264, with packetization mode NonInterleaved
std::vector<webrtc::SdpVideoFormat> GetAllSupportedFormats() {
    std::vector<webrtc::SdpVideoFormat> supported_codecs;
    supported_codecs.emplace_back(CreateH264Format(webrtc::H264::kProfileBaseline, webrtc::H264::kLevel3_1, "1"));
    return supported_codecs;
}

webrtc::SdpVideoFormat CreateH264Format(webrtc::H264::Profile profile,
                                        webrtc::H264::Level level,
                                        const std::string &packetization_mode) {
    const absl::optional<std::string> profile_string =
            webrtc::H264::ProfileLevelIdToString(webrtc::H264::ProfileLevelId(profile, level));
    RTC_CHECK(profile_string);
    return webrtc::SdpVideoFormat(cricket::kH264CodecName,
                                  {{cricket::kH264FmtpProfileLevelId, *profile_string},
                                   {cricket::kH264FmtpLevelAsymmetryAllowed, "1"},
                                   {cricket::kH264FmtpPacketizationMode, packetization_mode}});
}
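For intuition, the SdpVideoFormat built above boils down to an fmtp parameter map that later shows up on the SDP's fmtp line. A rough stand-alone sketch using a plain std::map (the profile-level-id value is normally produced by ProfileLevelIdToString, so "42e01f" in the test below is only an illustrative placeholder):

```cpp
#include <map>
#include <string>

// Sketch of the parameter map CreateH264Format assembles; the keys match the
// cricket::kH264Fmtp* constants used above.
std::map<std::string, std::string> MakeH264Fmtp(const std::string &profile_level_id,
                                                const std::string &packetization_mode) {
    return {
        {"profile-level-id", profile_level_id},
        {"level-asymmetry-allowed", "1"},
        {"packetization-mode", packetization_mode},
    };
}
```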
Implementing the Encoder
Next comes the FFmpeg-based implementation of the VideoEncoder interface. For the FFmpeg usage I mainly followed the official examples. Let's first look at which parts of VideoEncoder we need to implement:
FFmpegH264EncoderImpl(const cricket::VideoCodec &codec, bool hardware_accelerate);

~FFmpegH264EncoderImpl() override;

// |max_payload_size| is ignored.
// The following members of |codec_settings| are used. The rest are ignored.
// - codecType (must be kVideoCodecH264)
// - targetBitrate
// - maxFramerate
// - width
// - height
// Initializes the encoder
int32_t InitEncode(const webrtc::VideoCodec *codec_settings,
                   int32_t number_of_cores,
                   size_t max_payload_size) override;

// Releases all resources
int32_t Release() override;

// Registers the callback through which finished frames are handed back
int32_t RegisterEncodeCompleteCallback(
        webrtc::EncodedImageCallback *callback) override;

// WebRTC's own rate controller calls this to adapt the bitrate to network conditions
int32_t SetRateAllocation(const webrtc::VideoBitrateAllocation &bitrate_allocation,
                          uint32_t framerate) override;

// The result of encoding - an EncodedImage and RTPFragmentationHeader - are
// passed to the encode complete callback.
int32_t Encode(const webrtc::VideoFrame &frame,
               const webrtc::CodecSpecificInfo *codec_specific_info,
               const std::vector<webrtc::FrameType> *frame_types) override;
When implementing this interface I referred to WebRTC's official OpenH264 encoder. Note that WebRTC supports Simulcast, so there may be several encoder instances, one per stream. Since this part is fairly involved, I will walk through my implementation step by step.
Let me first introduce the struct and member variables I defined:
// Holds all resources belonging to one encoder instance
typedef struct {
    AVCodec *codec = nullptr;          // the codec implementation
    AVFrame *frame = nullptr;          // raw pixel data before encoding
    AVCodecContext *context = nullptr; // codec context holding the encoder settings
    AVPacket *pkt = nullptr;           // packet holding the encoded bitstream
} CodecCtx;

// Encoder instances
std::vector<CodecCtx *> encoders_;
// Per-encoder parameters
std::vector<LayerConfig> configurations_;
// Encoded images
std::vector<webrtc::EncodedImage> encoded_images_;
// Buffers backing the encoded images
std::vector<std::unique_ptr<uint8_t[]>> encoded_image_buffers_;
// Codec settings
webrtc::VideoCodec codec_;
webrtc::H264PacketizationMode packetization_mode_;
size_t max_payload_size_;
int32_t number_of_cores_;
// Callback invoked when encoding completes
webrtc::EncodedImageCallback *encoded_image_callback_;
The constructor is simple: it records the packetization mode and reserves storage:
FFmpegH264EncoderImpl::FFmpegH264EncoderImpl(const cricket::VideoCodec &codec, bool hardware)
        : packetization_mode_(webrtc::H264PacketizationMode::SingleNalUnit),
          max_payload_size_(0),
          hardware_accelerate(hardware),
          number_of_cores_(0),
          encoded_image_callback_(nullptr),
          has_reported_init_(false),
          has_reported_error_(false) {
    RTC_CHECK(cricket::CodecNamesEq(codec.name, cricket::kH264CodecName));
    std::string packetization_mode_string;
    if (codec.GetParam(cricket::kH264FmtpPacketizationMode,
                       &packetization_mode_string) &&
        packetization_mode_string == "1") {
        packetization_mode_ = webrtc::H264PacketizationMode::NonInterleaved;
    }
    encoded_images_.reserve(webrtc::kMaxSimulcastStreams);
    encoded_image_buffers_.reserve(webrtc::kMaxSimulcastStreams);
    encoders_.reserve(webrtc::kMaxSimulcastStreams);
    configurations_.reserve(webrtc::kMaxSimulcastStreams);
}
Then comes the crucial encoder initialization. Here I first run a few sanity checks and then create one encoder instance per stream:
int32_t FFmpegH264EncoderImpl::InitEncode(const webrtc::VideoCodec *inst,
                                          int32_t number_of_cores,
                                          size_t max_payload_size) {
    ReportInit();
    if (!inst || inst->codecType != webrtc::kVideoCodecH264) {
        ReportError();
        return WEBRTC_VIDEO_CODEC_ERR_PARAMETER;
    }
    if (inst->maxFramerate == 0) {
        ReportError();
        return WEBRTC_VIDEO_CODEC_ERR_PARAMETER;
    }
    if (inst->width < 1 || inst->height < 1) {
        ReportError();
        return WEBRTC_VIDEO_CODEC_ERR_PARAMETER;
    }

    int32_t release_ret = Release();
    if (release_ret != WEBRTC_VIDEO_CODEC_OK) {
        ReportError();
        return release_ret;
    }

    int number_of_streams = webrtc::SimulcastUtility::NumberOfSimulcastStreams(*inst);
    bool doing_simulcast = (number_of_streams > 1);
    if (doing_simulcast && (!webrtc::SimulcastUtility::ValidSimulcastResolutions(
                                    *inst, number_of_streams) ||
                            !webrtc::SimulcastUtility::ValidSimulcastTemporalLayers(
                                    *inst, number_of_streams))) {
        return WEBRTC_VIDEO_CODEC_ERR_SIMULCAST_PARAMETERS_NOT_SUPPORTED;
    }

    encoded_images_.resize(static_cast<unsigned long>(number_of_streams));
    encoded_image_buffers_.resize(static_cast<unsigned long>(number_of_streams));
    encoders_.resize(static_cast<unsigned long>(number_of_streams));
    configurations_.resize(static_cast<unsigned long>(number_of_streams));
    for (int i = 0; i < number_of_streams; i++) {
        encoders_[i] = new CodecCtx();
    }

    number_of_cores_ = number_of_cores;
    max_payload_size_ = max_payload_size;
    codec_ = *inst;

    // Code expects simulcastStream resolutions to be correct, make sure they are
    // filled even when there are no simulcast layers.
    if (codec_.numberOfSimulcastStreams == 0) {
        codec_.simulcastStream[0].width = codec_.width;
        codec_.simulcastStream[0].height = codec_.height;
    }

    for (int i = 0, idx = number_of_streams - 1; i < number_of_streams;
         ++i, --idx) {
        // Temporal layers still not supported.
        if (inst->simulcastStream[i].numberOfTemporalLayers > 1) {
            Release();
            return WEBRTC_VIDEO_CODEC_ERR_SIMULCAST_PARAMETERS_NOT_SUPPORTED;
        }

        // Set internal settings from codec_settings
        configurations_[i].simulcast_idx = idx;
        configurations_[i].sending = false;
        configurations_[i].width = codec_.simulcastStream[idx].width;
        configurations_[i].height = codec_.simulcastStream[idx].height;
        configurations_[i].max_frame_rate = static_cast<float>(codec_.maxFramerate);
        configurations_[i].frame_dropping_on = codec_.H264()->frameDroppingOn;
        configurations_[i].key_frame_interval = codec_.H264()->keyFrameInterval;

        // Codec_settings uses kbits/second; encoder uses bits/second.
        configurations_[i].max_bps = codec_.maxBitrate * 1000;
        configurations_[i].target_bps = codec_.startBitrate * 1000;
        if (!OpenEncoder(encoders_[i], configurations_[i])) {
            Release();
            ReportError();
            return WEBRTC_VIDEO_CODEC_ERROR;
        }

        // Initialize encoded image. Default buffer size: size of unencoded data.
        encoded_images_[i]._size =
                CalcBufferSize(webrtc::VideoType::kI420, codec_.simulcastStream[idx].width,
                               codec_.simulcastStream[idx].height);
        encoded_images_[i]._buffer = new uint8_t[encoded_images_[i]._size];
        encoded_image_buffers_[i].reset(encoded_images_[i]._buffer);
        encoded_images_[i]._completeFrame = true;
        encoded_images_[i]._encodedWidth = codec_.simulcastStream[idx].width;
        encoded_images_[i]._encodedHeight = codec_.simulcastStream[idx].height;
        encoded_images_[i]._length = 0;
    }

    webrtc::SimulcastRateAllocator init_allocator(codec_);
    webrtc::BitrateAllocation allocation = init_allocator.GetAllocation(
            codec_.startBitrate * 1000, codec_.maxFramerate);
    return SetRateAllocation(allocation, codec_.maxFramerate);
}
The OpenEncoder function performs the actual encoder creation. One subtle point: when allocating the AVFrame, remember to request 32-byte memory alignment, something we already touched on when capturing image data.
bool FFmpegH264EncoderImpl::OpenEncoder(FFmpegH264EncoderImpl::CodecCtx *ctx, H264Encoder::LayerConfig &config) {
    int ret;
    /* find the H264 video encoder */
#ifdef WEBRTC_LINUX
    if (hardware_accelerate) {
        ctx->codec = avcodec_find_encoder_by_name("h264_nvenc");
    }
#endif
    if (!ctx->codec) {
        ctx->codec = avcodec_find_encoder_by_name("libx264");
    }
    if (!ctx->codec) {
        WEBRTC_LOG("Codec not found", ERROR);
        return false;
    }
    WEBRTC_LOG("Open encoder: " + std::string(ctx->codec->name) + ", and generate frame, packet", INFO);

    ctx->context = avcodec_alloc_context3(ctx->codec);
    if (!ctx->context) {
        WEBRTC_LOG("Could not allocate video codec context", ERROR);
        return false;
    }
    config.target_bps = config.max_bps;
    SetContext(ctx, config, true);

    /* open it */
    ret = avcodec_open2(ctx->context, ctx->codec, nullptr);
    if (ret < 0) {
        WEBRTC_LOG("Could not open codec, error code:" + std::to_string(ret), ERROR);
        avcodec_free_context(&(ctx->context));
        return false;
    }

    ctx->frame = av_frame_alloc();
    if (!ctx->frame) {
        WEBRTC_LOG("Could not allocate video frame", ERROR);
        return false;
    }
    ctx->frame->format = ctx->context->pix_fmt;
    ctx->frame->width = ctx->context->width;
    ctx->frame->height = ctx->context->height;
    ctx->frame->color_range = ctx->context->color_range;
    /* the image can be allocated by any means and av_image_alloc() is
     * just the most convenient way if av_malloc() is to be used;
     * note the 32-byte alignment */
    ret = av_image_alloc(ctx->frame->data, ctx->frame->linesize, ctx->context->width, ctx->context->height,
                         ctx->context->pix_fmt, 32);
    if (ret < 0) {
        WEBRTC_LOG("Could not allocate raw picture buffer", ERROR);
        return false;
    }
    ctx->frame->pts = 1;
    ctx->pkt = av_packet_alloc();
    return true;
}
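A side note on that 32-byte alignment: passing 32 as the `align` argument of av_image_alloc makes each plane's linesize a multiple of 32 bytes, which means a row can be padded beyond the visible width. The rounding itself is just the usual bit trick, sketched here stand-alone:

```cpp
#include <cstddef>

// Round a row width up to the next multiple of 32, as av_image_alloc(..., 32)
// does for each plane's linesize.
size_t AlignUp32(size_t linesize) {
    return (linesize + 31) & ~static_cast<size_t>(31);
}
```

This is why code that copies raw frames into the AVFrame must honor `frame->linesize` instead of assuming rows are exactly `width` bytes apart.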
SetContext configures the FFmpeg encoder's parameters:
void FFmpegH264EncoderImpl::SetContext(CodecCtx *ctx, H264Encoder::LayerConfig &config, bool init) {
    if (init) {
        AVRational rational = {1, 25};
        ctx->context->time_base = rational;
        ctx->context->max_b_frames = 0;
        ctx->context->pix_fmt = AV_PIX_FMT_YUV420P;
        ctx->context->codec_type = AVMEDIA_TYPE_VIDEO;
        ctx->context->codec_id = AV_CODEC_ID_H264;
        ctx->context->gop_size = config.key_frame_interval;
        ctx->context->color_range = AVCOL_RANGE_JPEG;
        // Two options that make encoding much faster
        if (std::string(ctx->codec->name) == "libx264") {
            av_opt_set(ctx->context->priv_data, "preset", "ultrafast", 0);
            av_opt_set(ctx->context->priv_data, "tune", "zerolatency", 0);
        }
        av_log_set_level(AV_LOG_ERROR);
        WEBRTC_LOG("Init bitrate: " + std::to_string(config.target_bps), INFO);
    } else {
        WEBRTC_LOG("Change bitrate: " + std::to_string(config.target_bps), INFO);
    }
    config.key_frame_request = true;
    ctx->context->width = config.width;
    ctx->context->height = config.height;

    ctx->context->bit_rate = config.target_bps * 0.7;
    ctx->context->rc_max_rate = config.target_bps * 0.85;
    ctx->context->rc_min_rate = config.target_bps * 0.1;
    ctx->context->rc_buffer_size = config.target_bps * 2; // changing buffer_size is what makes libx264
                                                          // actually apply the new rates; without it the
                                                          // settings above take no effect
#ifdef WEBRTC_LINUX
    if (std::string(ctx->codec->name) == "h264_nvenc") { // poke h264_nvenc's private context to set the
                                                         // bitrate, much like reflection in Java
        NvencContext *nvenc_ctx = (NvencContext *) ctx->context->priv_data;
        nvenc_ctx->encode_config.rcParams.averageBitRate = ctx->context->bit_rate;
        nvenc_ctx->encode_config.rcParams.maxBitRate = ctx->context->rc_max_rate;
        return;
    }
#endif
}
The last few lines of SetContext are about changing the encoder bitrate dynamically. They are probably the most hardcore part of the whole encoder setup, and they are exactly how I achieve runtime bitrate control for both libx264 and h264_nvenc.
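The relationship between the target bitrate and the derived rate-control fields can be summarized in a tiny stand-alone sketch. It mirrors the ratios used in SetContext above (integer arithmetic here for determinism, whereas the code above uses floating-point factors; the 0.7/0.85/0.1 split is simply the choice made in this article, not a universal rule):

```cpp
#include <cstdint>

// Rate-control fields derived from one target bitrate, mirroring SetContext.
struct RateControl {
    int64_t bit_rate;        // average bitrate (70% of target)
    int64_t rc_max_rate;     // peak bitrate (85% of target)
    int64_t rc_min_rate;     // floor (10% of target)
    int64_t rc_buffer_size;  // changing this triggers libx264 to re-apply rates
};

RateControl ComputeRateControl(int64_t target_bps) {
    RateControl rc;
    rc.bit_rate = target_bps * 7 / 10;
    rc.rc_max_rate = target_bps * 85 / 100;
    rc.rc_min_rate = target_bps / 10;
    rc.rc_buffer_size = target_bps * 2;
    return rc;
}
```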
Having covered the big chunk that is encoder initialization, let's relax with two simple interfaces: registering the encode-complete callback, and the entry point for WebRTC's rate controller, which, as mentioned earlier, adjusts the encoding bitrate to network conditions.
int32_t FFmpegH264EncoderImpl::RegisterEncodeCompleteCallback(
        webrtc::EncodedImageCallback *callback) {
    encoded_image_callback_ = callback;
    return WEBRTC_VIDEO_CODEC_OK;
}

int32_t FFmpegH264EncoderImpl::SetRateAllocation(
        const webrtc::BitrateAllocation &bitrate,
        uint32_t new_framerate) {
    if (encoders_.empty())
        return WEBRTC_VIDEO_CODEC_UNINITIALIZED;

    if (new_framerate < 1)
        return WEBRTC_VIDEO_CODEC_ERR_PARAMETER;

    if (bitrate.get_sum_bps() == 0) {
        // Encoder paused, turn off all encoding.
        for (auto &configuration : configurations_)
            configuration.SetStreamState(false);
        return WEBRTC_VIDEO_CODEC_OK;
    }

    // At this point, bitrate allocation should already match codec settings.
    if (codec_.maxBitrate > 0)
        RTC_DCHECK_LE(bitrate.get_sum_kbps(), codec_.maxBitrate);
    RTC_DCHECK_GE(bitrate.get_sum_kbps(), codec_.minBitrate);
    if (codec_.numberOfSimulcastStreams > 0)
        RTC_DCHECK_GE(bitrate.get_sum_kbps(), codec_.simulcastStream[0].minBitrate);

    codec_.maxFramerate = new_framerate;

    size_t stream_idx = encoders_.size() - 1;
    for (size_t i = 0; i < encoders_.size(); ++i, --stream_idx) {
        // Update layer config.
        configurations_[i].target_bps = bitrate.GetSpatialLayerSum(stream_idx);
        configurations_[i].max_frame_rate = static_cast<float>(new_framerate);

        if (configurations_[i].target_bps) {
            configurations_[i].SetStreamState(true);
            SetContext(encoders_[i], configurations_[i], false);
        } else {
            configurations_[i].SetStreamState(false);
        }
    }
    return WEBRTC_VIDEO_CODEC_OK;
}
Break over. Now for the last hard nut to crack: the encoding process itself. It looks simple, but a big pitfall hides in it.
int32_t FFmpegH264EncoderImpl::Encode(const webrtc::VideoFrame &input_frame,
                                      const webrtc::CodecSpecificInfo *codec_specific_info,
                                      const std::vector<webrtc::FrameType> *frame_types) {
    // Routine checks first
    if (encoders_.empty()) {
        ReportError();
        return WEBRTC_VIDEO_CODEC_UNINITIALIZED;
    }
    if (!encoded_image_callback_) {
        RTC_LOG(LS_WARNING)
                << "InitEncode() has been called, but a callback function "
                << "has not been set with RegisterEncodeCompleteCallback()";
        ReportError();
        return WEBRTC_VIDEO_CODEC_UNINITIALIZED;
    }

    // Fetch the video frame
    webrtc::I420BufferInterface *frame_buffer = (webrtc::I420BufferInterface *) input_frame.video_frame_buffer().get();

    // Check whether the next frame must be a key frame; a bitrate change usually requests one
    bool send_key_frame = false;
    for (auto &configuration : configurations_) {
        if (configuration.key_frame_request && configuration.sending) {
            send_key_frame = true;
            break;
        }
    }
    if (!send_key_frame && frame_types) {
        for (size_t i = 0; i < frame_types->size() && i < configurations_.size();
             ++i) {
            if ((*frame_types)[i] == webrtc::kVideoFrameKey && configurations_[i].sending) {
                send_key_frame = true;
                break;
            }
        }
    }

    RTC_DCHECK_EQ(configurations_[0].width, frame_buffer->width());
    RTC_DCHECK_EQ(configurations_[0].height, frame_buffer->height());

    // Encode image for each layer.
    for (size_t i = 0; i < encoders_.size(); ++i) {
        // EncodeFrame input.
        copyFrame(encoders_[i]->frame, frame_buffer);
        if (!configurations_[i].sending) {
            continue;
        }
        if (frame_types != nullptr) {
            // Skip frame?
            if ((*frame_types)[i] == webrtc::kEmptyFrame) {
                continue;
            }
        }
        // Tell the encoder to emit a key frame when needed
        if (send_key_frame || encoders_[i]->frame->pts % configurations_[i].key_frame_interval == 0) {
            encoders_[i]->frame->key_frame = 1;
            encoders_[i]->frame->pict_type = AV_PICTURE_TYPE_I;
            configurations_[i].key_frame_request = false;
        } else {
            encoders_[i]->frame->key_frame = 0;
            encoders_[i]->frame->pict_type = AV_PICTURE_TYPE_P;
        }

        // Encode!
        int enc_ret;
        // Feed the picture to the encoder
        enc_ret = avcodec_send_frame(encoders_[i]->context, encoders_[i]->frame);
        if (enc_ret != 0) {
            WEBRTC_LOG("FFMPEG send frame failed, returned " + std::to_string(enc_ret), ERROR);
            ReportError();
            return WEBRTC_VIDEO_CODEC_ERROR;
        }
        encoders_[i]->frame->pts++;
        while (enc_ret >= 0) {
            // Drain encoded packets from the encoder
            enc_ret = avcodec_receive_packet(encoders_[i]->context, encoders_[i]->pkt);
            if (enc_ret == AVERROR(EAGAIN) || enc_ret == AVERROR_EOF) {
                break;
            } else if (enc_ret < 0) {
                WEBRTC_LOG("FFMPEG receive frame failed, returned " + std::to_string(enc_ret), ERROR);
                ReportError();
                return WEBRTC_VIDEO_CODEC_ERROR;
            }

            // Convert the encoder's output into the frame type WebRTC expects
            encoded_images_[i]._encodedWidth = static_cast<uint32_t>(configurations_[i].width);
            encoded_images_[i]._encodedHeight = static_cast<uint32_t>(configurations_[i].height);
            encoded_images_[i].SetTimestamp(input_frame.timestamp());
            encoded_images_[i].ntp_time_ms_ = input_frame.ntp_time_ms();
            encoded_images_[i].capture_time_ms_ = input_frame.render_time_ms();
            encoded_images_[i].rotation_ = input_frame.rotation();
            encoded_images_[i].content_type_ =
                    (codec_.mode == webrtc::VideoCodecMode::kScreensharing)
                    ? webrtc::VideoContentType::SCREENSHARE
                    : webrtc::VideoContentType::UNSPECIFIED;
            encoded_images_[i].timing_.flags = webrtc::VideoSendTiming::kInvalid;
            encoded_images_[i]._frameType = ConvertToVideoFrameType(encoders_[i]->frame);

            // Split encoded image up into fragments. This also updates
            // |encoded_image_|.
            // Here is the pitfall mentioned earlier: FFmpeg may separate NALUs
            // with either a 0001 or a 001 start code, while WebRTC only
            // recognizes NALUs starting with 0001. So we post-process the
            // encoder output and build an RTP fragmentation header describing the frame.
            webrtc::RTPFragmentationHeader frag_header;
            RtpFragmentize(&encoded_images_[i], &encoded_image_buffers_[i], *frame_buffer, encoders_[i]->pkt,
                           &frag_header);
            av_packet_unref(encoders_[i]->pkt);

            // Encoder can skip frames to save bandwidth in which case
            // |encoded_images_[i]._length| == 0.
            if (encoded_images_[i]._length > 0) {
                // Parse QP.
                h264_bitstream_parser_.ParseBitstream(encoded_images_[i]._buffer,
                                                      encoded_images_[i]._length);
                h264_bitstream_parser_.GetLastSliceQp(&encoded_images_[i].qp_);

                // Deliver encoded image.
                webrtc::CodecSpecificInfo codec_specific;
                codec_specific.codecType = webrtc::kVideoCodecH264;
                codec_specific.codecSpecific.H264.packetization_mode =
                        packetization_mode_;
                codec_specific.codecSpecific.H264.simulcast_idx = static_cast<uint8_t>(configurations_[i].simulcast_idx);
                encoded_image_callback_->OnEncodedImage(encoded_images_[i],
                                                        &codec_specific, &frag_header);
            }
        }
    }
    return WEBRTC_VIDEO_CODEC_OK;
}
Next is the NAL conversion and the extraction of the RTP fragmentation info:
// Helper method used by FFmpegH264EncoderImpl::Encode.
// Copies the encoded bytes from |packet| to |encoded_image| and updates the
// fragmentation information of |frag_header|. The |encoded_image->_buffer| may
// be deleted and reallocated if a bigger buffer is required.
// After FFmpeg encoding, the encoded bytes are spread out over a number of
// "NAL units". Each NAL unit is a fragment starting with a three- or four-byte
// start code. All of this data is copied to |encoded_image->_buffer|
// (normalized to four-byte start codes) and the |frag_header| is updated to
// point to each fragment, with offsets and lengths set to exclude the start codes.
void FFmpegH264EncoderImpl::RtpFragmentize(webrtc::EncodedImage *encoded_image,
                                           std::unique_ptr<uint8_t[]> *encoded_image_buffer,
                                           const webrtc::VideoFrameBuffer &frame_buffer, AVPacket *packet,
                                           webrtc::RTPFragmentationHeader *frag_header) {
    std::list<int> data_start_index;
    std::list<int> data_length;
    int payload_length = 0;
    // Walk the packet and record the start index and length of every NALU,
    // handling both 001 and 0001 start codes
    for (int i = 2; i < packet->size; i++) {
        if (i > 2
            && packet->data[i - 3] == start_code[0]
            && packet->data[i - 2] == start_code[1]
            && packet->data[i - 1] == start_code[2]
            && packet->data[i] == start_code[3]) {
            if (!data_start_index.empty()) {
                data_length.push_back((i - 3 - data_start_index.back()));
            }
            data_start_index.push_back(i + 1);
        } else if (packet->data[i - 2] == start_code[1] &&
                   packet->data[i - 1] == start_code[2] &&
                   packet->data[i] == start_code[3]) {
            if (!data_start_index.empty()) {
                data_length.push_back((i - 2 - data_start_index.back()));
            }
            data_start_index.push_back(i + 1);
        }
    }
    if (!data_start_index.empty()) {
        data_length.push_back((packet->size - data_start_index.back()));
    }
    for (auto &it : data_length) {
        payload_length += it;
    }
    // Calculate minimum buffer size required to hold encoded data.
    auto required_size = payload_length + data_start_index.size() * 4;
    if (encoded_image->_size < required_size) {
        // Increase buffer size. Allocate enough to hold an unencoded image, this
        // should be more than enough to hold any encoded data of future frames of
        // the same size (avoiding possible future reallocation due to variations in
        // required size).
        encoded_image->_size = CalcBufferSize(
                webrtc::VideoType::kI420, frame_buffer.width(), frame_buffer.height());
        if (encoded_image->_size < required_size) {
            // Encoded data > unencoded data. Allocate required bytes.
            WEBRTC_LOG("Encoding produced more bytes than the original image data! Original bytes: " +
                       std::to_string(encoded_image->_size) + ", encoded bytes: " + std::to_string(required_size) + ".",
                       WARNING);
            encoded_image->_size = required_size;
        }
        encoded_image->_buffer = new uint8_t[encoded_image->_size];
        encoded_image_buffer->reset(encoded_image->_buffer);
    }
    // Iterate NAL units, note each unit as a fragment and copy
    // the data to |encoded_image->_buffer|.
    int index = 0;
    encoded_image->_length = 0;
    frag_header->VerifyAndAllocateFragmentationHeader(data_start_index.size());
    for (auto it_start = data_start_index.begin(), it_length = data_length.begin();
         it_start != data_start_index.end(); ++it_start, ++it_length, ++index) {
        memcpy(encoded_image->_buffer + encoded_image->_length, start_code, sizeof(start_code));
        encoded_image->_length += sizeof(start_code);
        frag_header->fragmentationOffset[index] = encoded_image->_length;
        memcpy(encoded_image->_buffer + encoded_image->_length, packet->data + *it_start,
               static_cast<size_t>(*it_length));
        encoded_image->_length += *it_length;
        frag_header->fragmentationLength[index] = static_cast<size_t>(*it_length);
    }
}
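The start-code scan above can be hard to follow, so here is a stand-alone sketch of its core idea: locating the payload offset of every NALU in an Annex-B buffer while accepting both 3-byte (00 00 01) and 4-byte (00 00 00 01) start codes. The helper is hypothetical, not part of the encoder:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Find the byte offset at which each NALU's payload begins. A 4-byte start
// code ends in the same 00 00 01 pattern as a 3-byte one, so matching the
// last three bytes covers both forms and yields the same payload offset.
std::vector<size_t> FindNaluOffsets(const uint8_t *data, size_t size) {
    std::vector<size_t> offsets;
    for (size_t i = 2; i < size; i++) {
        if (data[i] == 0x01 && data[i - 1] == 0x00 && data[i - 2] == 0x00) {
            offsets.push_back(i + 1);  // payload starts right after the start code
        }
    }
    return offsets;
}
```

Given a buffer containing an SPS behind a 4-byte code followed by a PPS behind a 3-byte code, this returns the two payload offsets; RtpFragmentize then copies each payload behind a normalized 4-byte start code, which is the form WebRTC's packetizer expects.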
Finally, the very simple encoder teardown:
int32_t FFmpegH264EncoderImpl::Release() {
    while (!encoders_.empty()) {
        CodecCtx *encoder = encoders_.back();
        CloseEncoder(encoder);
        encoders_.pop_back();
    }
    configurations_.clear();
    encoded_images_.clear();
    encoded_image_buffers_.clear();
    return WEBRTC_VIDEO_CODEC_OK;
}

void FFmpegH264EncoderImpl::CloseEncoder(FFmpegH264EncoderImpl::CodecCtx *ctx) {
    if (ctx) {
        if (ctx->context) {
            avcodec_close(ctx->context);
            avcodec_free_context(&(ctx->context));
        }
        if (ctx->frame) {
            av_frame_free(&(ctx->frame));
        }
        if (ctx->pkt) {
            av_packet_free(&(ctx->pkt));
        }
        WEBRTC_LOG("Close encoder context and release context, frame, packet", INFO);
        delete ctx;
    }
}
That concludes my account of working with WebRTC; I hope my experience helps you. If you made it all the way through, I'm genuinely impressed: I myself felt at times that this article had grown too long and covered too much. But the parts are tightly interlinked, and splitting them apart risked breaking the thread, so I followed one ordinary usage flow, introduced my modifications along the way, and finished with a detailed appendix on my changes to the WebRTC Native APIs.
Also, I have only recently started writing articles to share my experience, so some descriptions may fall short; please bear with me. If you spot anything wrong, please leave a comment and I will fix it as soon as I can.
Article Notes
More articles worth reading are collected in Beibei Mao's article index.
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under BY-NC-SA. Please credit the source when reposting!
Creation statement: This article was created based on all of the reference material listed below, which may involve copying, modification, or transformation. All images come from the Internet; if any infringes your rights, please contact me and I will remove it promptly.
References
[1] JNA, the JNI alternative: accessing native interfaces from Java
[2] The fPIC compile flag for Linux shared objects
[3] Notes on Android JNI
[4] The FFmpeg repository