How is A/V sync specified in the IETF RFCs?
RFC 6051 describes how A/V sync is done in general: https://tools.ietf.org/html/rfc6051
RTP flows are synchronised by receivers based on information that is contained in RTCP SR packets generated by senders (specifically, the NTP-format timestamp and the RTP timestamp). Synchronisation requires that a common reference clock MUST be used to generate the NTP-format timestamps in a set of flows that are to be synchronised (i.e., when synchronising several RTP flows, the RTP timestamps for each flow are derived from separate, and media specific, clocks, but the NTP-format timestamps in the RTCP SR packets of all flows to be synchronised MUST be sampled from the same clock). To achieve faster and more accurate synchronisation, it is further RECOMMENDED that senders and receivers use a synchronised common NTP-format reference clock with common properties, especially timebase, where possible (recognising that this is often not possible when RTP is used outside of controlled environments); the means by which that common reference clock and its properties are signalled and distributed is outside the scope of this memo.
A minimum reporting interval of 5 seconds is RECOMMENDED.
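In other words, each SR gives the receiver a fresh (NTP timestamp, RTP timestamp) pair per stream; mapping incoming RTP timestamps through that pair puts audio and video on the same NTP timeline, so their relative delay can be measured. A minimal sketch of that mapping (illustrative names and clock rate, not WebRTC code):

#include <cstdint>

struct SenderReportInfo {
  uint64_t ntp_time_us;    // NTP timestamp from the SR, converted to microseconds.
  uint32_t rtp_timestamp;  // RTP timestamp sampled at the same instant.
};

// Estimate the NTP capture time of a media packet whose RTP timestamp is
// rtp_ts, given the latest SR and the media clock rate (e.g. 90000 for video).
uint64_t RtpToNtpMicros(const SenderReportInfo& sr, uint32_t rtp_ts, int clock_rate_hz) {
  // Unsigned subtraction followed by a signed cast handles RTP timestamp wrap-around.
  int64_t rtp_diff = static_cast<int32_t>(rtp_ts - sr.rtp_timestamp);
  return sr.ntp_time_us + rtp_diff * 1000000 / clock_rate_hz;
}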
RTCP Sender Report
The sender report packet format, from the RTP specification:
6.3.1 SR: Sender report RTCP packet
        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
header |V=2|P|    RC   |   PT=SR=200   |             length            |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                         SSRC of sender                        |
       +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
sender |              NTP timestamp, most significant word             |
info   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |             NTP timestamp, least significant word             |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                         RTP timestamp                         |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                     sender's packet count                     |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                      sender's octet count                     |
       +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
report |                 SSRC_1 (SSRC of first source)                 |
block  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  1    | fraction lost |       cumulative number of packets lost       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |           extended highest sequence number received           |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                      interarrival jitter                      |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                         last SR (LSR)                         |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                   delay since last SR (DLSR)                  |
       +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
report |                 SSRC_2 (SSRC of second source)                |
block  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  2    :                               ...                             :
       +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
       |                  profile-specific extensions                  |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
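For synchronisation, the fields that matter are the NTP timestamp and the RTP timestamp in the sender info. A hedged sketch of reading them from a raw SR packet (assuming the 4-byte RTCP header has already been validated; field offsets follow the figure above, names are illustrative):

#include <cstdint>

// Read a 32-bit big-endian value from a byte buffer.
static uint32_t ReadBE32(const uint8_t* p) {
  return (uint32_t{p[0]} << 24) | (uint32_t{p[1]} << 16) |
         (uint32_t{p[2]} << 8) | uint32_t{p[3]};
}

struct SenderInfo {
  uint32_t sender_ssrc;
  uint32_t ntp_seconds;    // NTP timestamp, most significant word.
  uint32_t ntp_fraction;   // NTP timestamp, least significant word.
  uint32_t rtp_timestamp;
  uint32_t packet_count;
  uint32_t octet_count;
};

// Parse the sender SSRC and sender info of an SR packet. `payload` points just
// past the 4-byte RTCP header (V/P/RC, PT=200, length) and must hold >= 24 bytes.
SenderInfo ParseSenderInfo(const uint8_t* payload) {
  SenderInfo info;
  info.sender_ssrc   = ReadBE32(payload + 0);
  info.ntp_seconds   = ReadBE32(payload + 4);
  info.ntp_fraction  = ReadBE32(payload + 8);
  info.rtp_timestamp = ReadBE32(payload + 12);
  info.packet_count  = ReadBE32(payload + 16);
  info.octet_count   = ReadBE32(payload + 20);
  return info;
}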
How WebRTC sends a sender report
The related code is:
bool RTCPSender::TimeToSendRTCPReport(bool sendKeyframeBeforeRTP) const {
  /*
    For audio we use a configurable interval (default: 5 seconds).

    For video we use a configurable interval (default: 1 second) for a BW
    smaller than 360 kbit/s; technically we break the max 5% RTCP BW for
    video below 10 kbit/s, but that should be extremely rare.
  */
  // ... (rest of the function omitted)
}
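The omitted body essentially checks whether the chosen interval has elapsed since the last report. A rough sketch of the interval selection described by the comment (illustrative constants and names, not the exact WebRTC code):

#include <cstdint>

int64_t RtcpReportIntervalMs(bool audio, uint32_t send_bitrate_kbit) {
  if (audio)
    return 5000;                        // audio: report roughly every 5 s
  if (send_bitrate_kbit <= 360)
    return 1000;                        // low-bitrate video: roughly every 1 s
  // Higher-bitrate video can report more often while staying within the
  // RTCP bandwidth budget; 360000 / bitrate keeps roughly that ratio.
  return 360000 / send_bitrate_kbit;    // e.g. 2000 kbit/s -> 180 ms
}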
How the sender report is built (RTCPSender::BuildSR):
std::unique_ptr<rtcp::RtcpPacket> RTCPSender::BuildSR(const RtcpContext& ctx) {
  // Timestamp shouldn't be estimated before first media frame.
  RTC_DCHECK_GE(last_frame_capture_time_ms_, 0);
  // The timestamp of this RTCP packet should be estimated as the timestamp of
  // the frame being captured at this moment. We are calculating that
  // timestamp as the last frame's timestamp + the time since the last frame
  // was captured.
  int rtp_rate = rtp_clock_rates_khz_[last_payload_type_];
  if (rtp_rate <= 0) {
    rtp_rate =
        (audio_ ? kBogusRtpRateForAudioRtcp : kVideoPayloadTypeFrequency) /
        1000;
  }
  // Round now_us_ to the closest millisecond, because Ntp time is rounded
  // when converted to milliseconds.
  uint32_t rtp_timestamp =
      timestamp_offset_ + last_rtp_timestamp_ +
      ((ctx.now_us_ + 500) / 1000 - last_frame_capture_time_ms_) * rtp_rate;
  rtcp::SenderReport* report = new rtcp::SenderReport();
  report->SetSenderSsrc(ssrc_);
  report->SetNtp(TimeMicrosToNtp(ctx.now_us_));
  report->SetRtpTimestamp(rtp_timestamp);
  report->SetPacketCount(ctx.feedback_state_.packets_sent);
  report->SetOctetCount(ctx.feedback_state_.media_bytes_sent);
  report->SetReportBlocks(CreateReportBlocks(ctx.feedback_state_));
  return std::unique_ptr<rtcp::RtcpPacket>(report);
}
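To make the extrapolation concrete, a worked example with illustrative numbers (a 90 kHz video clock, timestamp_offset_ ignored for simplicity):

#include <cstdint>

constexpr int kRtpRate = 90;                        // 90 kHz clock = 90 ticks per ms
constexpr int64_t kLastFrameCaptureTimeMs = 10000;  // last frame captured at t = 10 s
constexpr uint32_t kLastRtpTimestamp = 450000;      // RTP timestamp of that frame
constexpr int64_t kNowMs = 10020;                   // SR built 20 ms later

// 450000 + 20 ms * 90 ticks/ms = 451800; this value is paired with the SR's
// NTP timestamp, giving receivers a fresh RTP <-> NTP mapping point.
constexpr uint32_t kSrRtpTimestamp =
    kLastRtpTimestamp + (kNowMs - kLastFrameCaptureTimeMs) * kRtpRate;
static_assert(kSrRtpTimestamp == 451800, "see the arithmetic above");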
How WebRTC handles a received sender report
When WebRTC receives a sender report, it tries to calculate the playout (delay) time:
RTP/RTCP delay + decode delay + render delay = playout delay (for both audio and video)
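The RTP/RTCP part of that equation is the relative delay between the two streams: once each stream's newest packet arrival time is paired with its capture time on the sender's NTP timeline (via the SR mapping), the drift can be computed roughly as in this illustrative sketch (names are assumptions, not the WebRTC API):

#include <cstdint>

// One measurement per stream, updated as packets and SRs arrive.
struct StreamMeasurement {
  int64_t latest_receive_time_ms;   // local arrival time of the newest packet
  int64_t latest_capture_time_ms;   // its capture time on the sender's NTP clock
};

// Positive result: video is currently running behind audio by that many ms.
int64_t RelativeDelayMs(const StreamMeasurement& audio,
                        const StreamMeasurement& video) {
  return (video.latest_receive_time_ms - audio.latest_receive_time_ms) -
         (video.latest_capture_time_ms - audio.latest_capture_time_ms);
}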
The main code is at:
https://cs.chromium.org/chromium/src/third_party/webrtc/video/rtp_streams_synchronizer.cc
RtpStreamsSynchronizer::Process() runs periodically; if the audio/video difference is within 30 ms it does nothing, otherwise it adjusts the minimum playout delays:
syncable_audio_->SetMinimumPlayoutDelay(target_audio_delay_ms);
syncable_video_->SetMinimumPlayoutDelay(target_video_delay_ms);
https://cs.chromium.org/chromium/src/third_party/webrtc/video/stream_synchronization.cc
ComputeRelativeDelay (mainly from the RTP/RTCP point of view) and ComputeDelays (mainly for the playout delay).
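Put together, one synchronization step looks conceptually like the sketch below (heavily simplified; the real ComputeDelays also filters the measurement and limits how much the delay may change per iteration):

#include <cstdint>

// Simplified sketch of one synchronization step (not the exact WebRTC code).
// relative_delay_ms > 0 means video currently plays out later than audio.
void ComputeTargetDelays(int64_t relative_delay_ms,
                         int64_t current_audio_delay_ms,
                         int64_t current_video_delay_ms,
                         int64_t* target_audio_delay_ms,
                         int64_t* target_video_delay_ms) {
  // Total mismatch between the two playout paths right now.
  int64_t diff_ms =
      current_video_delay_ms - current_audio_delay_ms + relative_delay_ms;

  *target_audio_delay_ms = current_audio_delay_ms;
  *target_video_delay_ms = current_video_delay_ms;

  // Dead zone: anything within 30 ms is considered in sync, do nothing.
  if (diff_ms > -30 && diff_ms < 30)
    return;

  if (diff_ms > 0) {
    // Video is behind: ask the audio side to wait longer before playout.
    *target_audio_delay_ms = current_audio_delay_ms + diff_ms;
  } else {
    // Audio is behind: ask the video side to wait longer before playout.
    *target_video_delay_ms = current_video_delay_ms - diff_ms;
  }
}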