How is A/V sync specified in the IETF RFCs?
RFC 6051 describes how A/V sync is done in general: https://tools.ietf.org/html/rfc6051
RTP flows are synchronised by receivers based on information that is contained in RTCP SR packets generated by senders (specifically, the NTP-format timestamp and the RTP timestamp). Synchronisation requires that a common reference clock MUST be used to generate the NTP-format timestamps in a set of flows that are to be synchronised (i.e., when synchronising several RTP flows, the RTP timestamps for each flow are derived from separate, and media specific, clocks, but the NTP-format timestamps in the RTCP SR packets of all flows to be synchronised MUST be sampled from the same clock). To achieve faster and more accurate synchronisation, it is further RECOMMENDED that senders and receivers use a synchronised common NTP-format reference clock with common properties, especially timebase, where possible (recognising that this is often not possible when RTP is used outside of controlled environments); the means by which that common reference clock and its properties are signalled and distributed is outside the scope of this memo.
A minimum reporting interval of 5 seconds is RECOMMENDED.
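In other words, each SR gives the receiver a fresh (NTP timestamp, RTP timestamp) pair per stream; mapping incoming RTP timestamps through that pair puts audio and video on the same NTP timeline, so their relative delay can be measured. A minimal sketch of that mapping (illustrative names and clock rate, not WebRTC code):

#include <cstdint>

struct SenderReportInfo {
  uint64_t ntp_time_us;    // NTP timestamp from the SR, converted to microseconds.
  uint32_t rtp_timestamp;  // RTP timestamp sampled at the same instant.
};

// Estimate the NTP capture time of a media packet whose RTP timestamp is
// rtp_ts, given the latest SR and the media clock rate (e.g. 90000 for video).
uint64_t RtpToNtpMicros(const SenderReportInfo& sr, uint32_t rtp_ts, int clock_rate_hz) {
  // Unsigned subtraction followed by a signed cast handles RTP timestamp wrap-around.
  int64_t rtp_diff = static_cast<int32_t>(rtp_ts - sr.rtp_timestamp);
  return sr.ntp_time_us + rtp_diff * 1000000 / clock_rate_hz;
}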
RTCP Sender Report
The sender report packet format, from the RTP specification:
6.3.1 SR: Sender report RTCP packet
        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
header |V=2|P|    RC   |   PT=SR=200   |             length            |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                         SSRC of sender                        |
       +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
sender |              NTP timestamp, most significant word             |
info   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |             NTP timestamp, least significant word             |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                         RTP timestamp                         |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                     sender's packet count                     |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                      sender's octet count                     |
       +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
report |                 SSRC_1 (SSRC of first source)                 |
block  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  1    | fraction lost |       cumulative number of packets lost       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |           extended highest sequence number received           |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                      interarrival jitter                      |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                         last SR (LSR)                         |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                   delay since last SR (DLSR)                  |
       +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
report |                 SSRC_2 (SSRC of second source)                |
block  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  2    :                               ...                             :
       +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
       |                  profile-specific extensions                  |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
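For synchronisation, the fields that matter are the NTP timestamp and the RTP timestamp in the sender info. A hedged sketch of reading them from a raw SR packet (assuming the 4-byte RTCP header has already been validated; field offsets follow the figure above, names are illustrative):

#include <cstdint>

// Read a 32-bit big-endian value from a byte buffer.
static uint32_t ReadBE32(const uint8_t* p) {
  return (uint32_t{p[0]} << 24) | (uint32_t{p[1]} << 16) |
         (uint32_t{p[2]} << 8) | uint32_t{p[3]};
}

struct SenderInfo {
  uint32_t sender_ssrc;
  uint32_t ntp_seconds;    // NTP timestamp, most significant word.
  uint32_t ntp_fraction;   // NTP timestamp, least significant word.
  uint32_t rtp_timestamp;
  uint32_t packet_count;
  uint32_t octet_count;
};

// Parse the sender SSRC and sender info of an SR packet. `payload` points just
// past the 4-byte RTCP header (V/P/RC, PT=200, length) and must hold >= 24 bytes.
SenderInfo ParseSenderInfo(const uint8_t* payload) {
  SenderInfo info;
  info.sender_ssrc   = ReadBE32(payload + 0);
  info.ntp_seconds   = ReadBE32(payload + 4);
  info.ntp_fraction  = ReadBE32(payload + 8);
  info.rtp_timestamp = ReadBE32(payload + 12);
  info.packet_count  = ReadBE32(payload + 16);
  info.octet_count   = ReadBE32(payload + 20);
  return info;
}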
How WebRTC sends a sender report
The related code is:
bool RTCPSender::TimeToSendRTCPReport(bool sendKeyframeBeforeRTP) const {
  /*
    For audio we use a configurable interval (default: 5 seconds).

    For video we use a configurable interval (default: 1 second) for a BW
    smaller than 360 kbit/s; technically we break the max 5% RTCP BW for
    video below 10 kbit/s, but that should be extremely rare.
  */
  // ... (rest of the function omitted)
}
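The omitted body essentially checks whether the chosen interval has elapsed since the last report. A rough sketch of the interval selection described by the comment (illustrative constants and names, not the exact WebRTC code):

#include <cstdint>

int64_t RtcpReportIntervalMs(bool audio, uint32_t send_bitrate_kbit) {
  if (audio)
    return 5000;                        // audio: report roughly every 5 s
  if (send_bitrate_kbit <= 360)
    return 1000;                        // low-bitrate video: roughly every 1 s
  // Higher-bitrate video can report more often while staying within the
  // RTCP bandwidth budget; 360000 / bitrate keeps roughly that ratio.
  return 360000 / send_bitrate_kbit;    // e.g. 2000 kbit/s -> 180 ms
}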
How the sender report is built (RTCPSender::BuildSR):
std::unique_ptr<rtcp::RtcpPacket> RTCPSender::BuildSR(const RtcpContext& ctx) {
  // Timestamp shouldn't be estimated before first media frame.
  RTC_DCHECK_GE(last_frame_capture_time_ms_, 0);
  // The timestamp of this RTCP packet should be estimated as the timestamp of
  // the frame being captured at this moment. We are calculating that
  // timestamp as the last frame's timestamp + the time since the last frame
  // was captured.
  int rtp_rate = rtp_clock_rates_khz_[last_payload_type_];
  if (rtp_rate <= 0) {
    rtp_rate =
        (audio_ ? kBogusRtpRateForAudioRtcp : kVideoPayloadTypeFrequency) /
        1000;
  }
  // Round now_us_ to the closest millisecond, because Ntp time is rounded
  // when converted to milliseconds.
  uint32_t rtp_timestamp =
      timestamp_offset_ + last_rtp_timestamp_ +
      ((ctx.now_us_ + 500) / 1000 - last_frame_capture_time_ms_) * rtp_rate;
  rtcp::SenderReport* report = new rtcp::SenderReport();
  report->SetSenderSsrc(ssrc_);
  report->SetNtp(TimeMicrosToNtp(ctx.now_us_));
  report->SetRtpTimestamp(rtp_timestamp);
  report->SetPacketCount(ctx.feedback_state_.packets_sent);
  report->SetOctetCount(ctx.feedback_state_.media_bytes_sent);
  report->SetReportBlocks(CreateReportBlocks(ctx.feedback_state_));
  return std::unique_ptr<rtcp::RtcpPacket>(report);
}
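To make the extrapolation concrete, a worked example with illustrative numbers (a 90 kHz video clock, timestamp_offset_ ignored for simplicity):

#include <cstdint>

constexpr int kRtpRate = 90;                        // 90 kHz clock = 90 ticks per ms
constexpr int64_t kLastFrameCaptureTimeMs = 10000;  // last frame captured at t = 10 s
constexpr uint32_t kLastRtpTimestamp = 450000;      // RTP timestamp of that frame
constexpr int64_t kNowMs = 10020;                   // SR built 20 ms later

// 450000 + 20 ms * 90 ticks/ms = 451800; this value is paired with the SR's
// NTP timestamp, giving receivers a fresh RTP <-> NTP mapping point.
constexpr uint32_t kSrRtpTimestamp =
    kLastRtpTimestamp + (kNowMs - kLastFrameCaptureTimeMs) * kRtpRate;
static_assert(kSrRtpTimestamp == 451800, "see the arithmetic above");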
How WebRTC handles a received sender report
When WebRTC receives a sender report, it tries to calculate the playout (delay) time:
RTP/RTCP delay + decode delay + render delay = playout delay (for both audio and video)
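The RTP/RTCP part of that equation is the relative delay between the two streams: once each stream's newest packet arrival time is paired with its capture time on the sender's NTP timeline (via the SR mapping), the drift can be computed roughly as in this illustrative sketch (names are assumptions, not the WebRTC API):

#include <cstdint>

// One measurement per stream, updated as packets and SRs arrive.
struct StreamMeasurement {
  int64_t latest_receive_time_ms;   // local arrival time of the newest packet
  int64_t latest_capture_time_ms;   // its capture time on the sender's NTP clock
};

// Positive result: video is currently running behind audio by that many ms.
int64_t RelativeDelayMs(const StreamMeasurement& audio,
                        const StreamMeasurement& video) {
  return (video.latest_receive_time_ms - audio.latest_receive_time_ms) -
         (video.latest_capture_time_ms - audio.latest_capture_time_ms);
}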
The main code is at:
https://cs.chromium.org/chromium/src/third_party/webrtc/video/rtp_streams_synchronizer.cc
RtpStreamsSynchronizer::Process() runs periodically; if the audio/video difference is within 30 ms it does nothing, otherwise it adjusts the minimum playout delays:
syncable_audio_->SetMinimumPlayoutDelay(target_audio_delay_ms);
syncable_video_->SetMinimumPlayoutDelay(target_video_delay_ms);
https://cs.chromium.org/chromium/src/third_party/webrtc/video/stream_synchronization.cc
ComputeRelativeDelay (mainly from the RTP/RTCP point of view) and ComputeDelays (mainly for the playout delay).
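Put together, one synchronization step looks conceptually like the sketch below (heavily simplified; the real ComputeDelays also filters the measurement and limits how much the delay may change per iteration):

#include <cstdint>

// Simplified sketch of one synchronization step (not the exact WebRTC code).
// relative_delay_ms > 0 means video currently plays out later than audio.
void ComputeTargetDelays(int64_t relative_delay_ms,
                         int64_t current_audio_delay_ms,
                         int64_t current_video_delay_ms,
                         int64_t* target_audio_delay_ms,
                         int64_t* target_video_delay_ms) {
  // Total mismatch between the two playout paths right now.
  int64_t diff_ms =
      current_video_delay_ms - current_audio_delay_ms + relative_delay_ms;

  *target_audio_delay_ms = current_audio_delay_ms;
  *target_video_delay_ms = current_video_delay_ms;

  // Dead zone: anything within 30 ms is considered in sync, do nothing.
  if (diff_ms > -30 && diff_ms < 30)
    return;

  if (diff_ms > 0) {
    // Video is behind: ask the audio side to wait longer before playout.
    *target_audio_delay_ms = current_audio_delay_ms + diff_ms;
  } else {
    // Audio is behind: ask the video side to wait longer before playout.
    *target_video_delay_ms = current_video_delay_ms - diff_ms;
  }
}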