Audio and video synchronization method for a network monitoring system
Technical field
The present invention relates to the field of network monitoring, and in particular to an audio and video synchronization method for a network monitoring system.
Background art
With the advance of video surveillance technology and its wide application in many fields, audio surveillance has begun to attract attention in many settings. Public security organs as well as key facilities such as airports, railways, and banks increasingly require clear, faithful audio-visual monitoring in high-grade security engineering, and audio surveillance has become a new focus of the security industry. The addition of audio surveillance ends the "silent movie" era of pure video monitoring and supports comprehensive control over incidents as well as their accurate assessment and handling. Applying audio to monitoring fills a large gap in the security field and has been a major direction of network monitoring in recent years. However, objects such as the audio and video in multimedia data have a strict temporal relationship, and network transmission destroys this original time-domain relationship, so that sound and image in the system cannot be played back synchronously. Research on and realization of audio and video synchronization technology in network monitoring systems is therefore particularly important. Existing synchronization techniques, however, suffer from problems such as complicated implementation and low synchronization accuracy.
Summary of the invention
To overcome the complicated implementation and low synchronization accuracy of audio and video synchronization in existing network monitoring systems, the present invention provides an audio and video synchronization method for a network monitoring system that is simple to implement and has high synchronization accuracy.
The technical solution adopted by the present invention to solve the technical problem is as follows:
An audio and video synchronization method in a network monitoring system, the method comprising the following steps:
(1) The video is compressed by the hardware compression built into the TW2835 chip to generate H.264 data, and the audio is compressed in software into the G.729 data format; both are then passed to the RTP library for encapsulation and transmission.
The RTP header contains a sequence number and a timestamp. During transmission, the sequence number in each RTP packet sent increases by one, and the timestamp identifies the capture instant of the audio or video data.
(2) After the IP header and UDP header are stripped from a packet received from the network, the payload type in the RTP header first determines whether the RTP packet is placed in the audio buffer or the video buffer; the payload data in the RTP packet is then inserted into the correct position in the buffer according to the order of the sequence-number field.
After network reception starts, the audio and video buffers each pre-store an appropriate number of packets; when the data of the audio and video streams have filled the pre-store region, playback of both starts simultaneously. Synchronization takes the audio stream as the time reference, and audio-video synchronization is achieved by adjusting the video.
Further, in step (2), the timestamp of the audio is used as the relative reference time. After playback starts, audio data is taken from the buffer and fed into the decoder at a constant rate, and the timestamp A_t of the first block of data in the buffer is recorded; A_t is then compared with the timestamp V_t of the first block of data in the video buffer, and the difference A_t - V_t determines the push rate of the video data and the playback rate of the video, as follows:
2.1) When -100 ms ≤ A_t - V_t ≤ 100 ms, the audio and video buffers push data at the normal rate, and the playback rate remains unchanged;
2.2) When 100 ms ≤ |A_t - V_t| ≤ 160 ms, a synchronization adjustment is required:
① If 100 ms ≤ A_t - V_t ≤ 160 ms, the audio leads the video: the data in the video buffer is pushed faster and the video playback rate is increased, so that the audio and video timestamps converge;
② If -160 ms ≤ A_t - V_t ≤ -100 ms, the audio lags the video: the push rate of the video buffer is slowed and the video playback rate is reduced, so that the audio and video timestamps converge;
2.3) When |A_t - V_t| ≥ 160 ms, resynchronization is required:
① If A_t - V_t ≥ 160 ms, the audio leads the video severely: the oldest video packets in the video buffer are discarded until A_t = V_t, after which playback restarts at the normal rate;
② If A_t - V_t ≤ -160 ms, the audio lags the video severely: the oldest audio packets in the audio buffer are discarded until A_t = V_t, after which playback restarts at the normal rate.
Still further, the buffer is a linked list of data nodes. Each media stream has two kinds of data nodes: idle data nodes (FreeDatanode) and in-use data nodes (BusyDatanode). When a new RTP packet is received, a FreeDatanode is requested and becomes a BusyDatanode; the media payload and the sequence number of the RTP packet are written into it, and according to this sequence number the BusyDatanode is inserted into the correct position in the buffer, thereby restoring the original time relationship of the media data in the buffer. After the data in a BusyDatanode has been fed into the decoder and played, the BusyDatanode becomes a FreeDatanode again. When all FreeDatanodes are in use, i.e. the buffer is full, the data in the oldest BusyDatanode is deleted and that node automatically reverts to a FreeDatanode.
The technical concept of the present invention is as follows: a complete network monitoring system comprises the capture and compression of audio and video data, packing and sending, network transmission, network reception, and real-time synchronization. According to their locations, the system can be divided into three parts: the device end (capture, compression, packing, and sending), the network server (network transmission), and the receiving end (network reception and synchronization), as shown in Fig. 1. The basic procedure is as follows: the receiving end sends a control command notifying the server to forward the audio and video data of the device end; upon receiving the instruction, the sending end captures audio and video data through the audio-video capture chips, compresses them separately (H.264 for video, G.729 for audio), and packs them into RTP packets according to the RTP protocol standard for transmission to the network server; the network server forwards the data to the receiving end, where dynamic buffering, DirectShow, and related techniques realize real-time synchronized playback of the audio and video data from the monitoring site.
The beneficial effects of the present invention are mainly as follows: 1) video and audio are compressed with H.264 and G.729 respectively, two widely used high-compression-rate algorithms that save bandwidth effectively and improve network transmission efficiency; 2) the RTP transport protocol gives a strong live effect, reproducing the remote scene over the network in real time; 3) the audio-video synchronization adjustment time is short and the synchronization accuracy is high, with a maximum out-of-sync interval of 160 ms; 4) the implementation complexity of the synchronization is low, which greatly simplifies the audio-video synchronization algorithm and improves efficiency.
Brief description of the drawings
Fig. 1 is an architecture diagram of the network monitoring system.
Fig. 2 is a schematic diagram of the audio and video data encapsulation process.
Fig. 3 is a schematic diagram of the dynamic buffer.
Fig. 4 is a schematic diagram of the audio playback thread.
Fig. 5 is a schematic diagram of the video playback thread.
Embodiment
The invention is further described below with reference to the accompanying drawings.
Referring to Figs. 1 to 5, an audio and video synchronization method in a network monitoring system comprises the following steps:
(1) The video is compressed by the hardware compression built into the TW2835 chip to generate H.264 data, and the audio is compressed in software into the G.729 data format; both are then passed to the RTP library for encapsulation and transmission.
The RTP header contains a sequence number and a timestamp. During transmission, the sequence number in each RTP packet sent increases by one, and the timestamp identifies the capture instant of the audio or video data.
(2) After the IP header and UDP header are stripped from a packet received from the network, the payload type in the RTP header first determines whether the RTP packet is placed in the audio buffer or the video buffer; the payload data in the RTP packet is then inserted into the correct position in the buffer according to the order of the sequence-number field.
After network reception starts, the audio and video buffers each pre-store an appropriate number of packets; when the data of the audio and video streams have filled the pre-store region, playback of both starts simultaneously. Synchronization takes the audio stream as the time reference, and audio-video synchronization is achieved by adjusting the video.
At the monitoring site, the audio and video data of the device end are captured by a WM8731 chip and a TW2835 chip respectively. The video is then compressed by the hardware compression built into the TW2835 chip to generate H.264 data, and the audio is compressed in software into the G.729 data format. The data are then passed to the RTP library for encapsulation and transmission; the encapsulation process is shown in Fig. 2.
In the synchronization process, the sequence number and timestamp fields of the RTP header are particularly important. During transmission, the sequence number in each RTP packet sent increases by one, which allows the receiving end to sort the packets, restore their original time relationship, and thus overcome the effects of network congestion, server delay, and similar disturbances. The timestamp is even more important: it identifies the capture instant of the audio or video data and is the most important quantity in synchronization control. All of this information is written while the audio and video data are encapsulated into RTP packets.
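For reference, both fields sit in the fixed RTP header defined in RFC 3550. A minimal C sketch of that layout is given below; note that C bit-field ordering is compiler-dependent, so the struct illustrates the fields rather than serving as a portable wire parser:

```c
#include <stdint.h>

/* Fixed RTP header (RFC 3550). All fields travel in network byte
 * order; the bit-field layout here is illustrative, not portable. */
typedef struct {
    uint8_t  version:2;      /* RTP version, always 2                 */
    uint8_t  padding:1;      /* padding flag                          */
    uint8_t  extension:1;    /* header-extension flag                 */
    uint8_t  csrc_count:4;   /* number of contributing-source IDs     */
    uint8_t  marker:1;       /* marker bit                            */
    uint8_t  payload_type:7; /* PT: routes the packet to the audio or
                              * video buffer at the receiving end     */
    uint16_t seq_num;        /* increases by one per packet sent      */
    uint32_t timestamp;      /* capture instant of the media data     */
    uint32_t ssrc;           /* synchronization source identifier     */
} rtp_header_t;
```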
The main role of the network server in the synchronization process is to relay signaling and data.
Under the influence of network delay, packet loss, and other factors, RTP packets may arrive at the receiving end out of order; a video packet may even arrive while its corresponding audio packet is still in transit. For this reason, a dynamic buffer is set up at the receiving end for audio and for video respectively, as shown in Fig. 3. After the IP header and UDP header are stripped from a packet received from the network, the payload type (PT value) in the RTP header first determines whether the RTP packet is placed in the audio buffer or the video buffer; the payload data in the RTP packet is then inserted into the correct position in the buffer according to the order of the sequence-number field. In the actual design, the buffer is a linked list of nodes, implemented as follows: 1) each media stream has two kinds of data nodes, idle data nodes (FreeDatanode) and in-use data nodes (BusyDatanode); 2) when a new RTP packet is received, a FreeDatanode is requested and becomes a BusyDatanode; the media payload and the sequence number of the RTP packet are written into it, and according to this sequence number the BusyDatanode is inserted into the correct position in the buffer, thereby restoring the original time relationship of the media data; 3) after the data in a BusyDatanode has been fed into the decoder and played, the BusyDatanode becomes a FreeDatanode again; when all FreeDatanodes are in use, i.e. the buffer is full, the data in the oldest BusyDatanode is deleted and that node automatically reverts to a FreeDatanode. Fig. 3 also shows the concrete structure of a Datanode: besides the data field, it has four further fields, Len, Key, SequNum, and Timestamp, which respectively represent the length of the data segment, whether the data belongs to a video key frame, the sequence number of the packet, and the timestamp.
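As an illustration, the buffer node and the sequence-number-ordered insertion might be sketched in C as follows. The field names mirror the Datanode fields described above, while the list handling (MAX_PAYLOAD, the busy-list head passed in) is an assumption of this sketch:

```c
#include <stdint.h>

#define MAX_PAYLOAD 1500          /* assumed maximum RTP payload size */

typedef struct Datanode {
    int      Len;                 /* length of the payload in data[]      */
    int      Key;                 /* nonzero if part of a video key frame */
    uint16_t SequNum;             /* RTP sequence number                  */
    uint32_t Timestamp;           /* RTP timestamp (capture instant)      */
    uint8_t  data[MAX_PAYLOAD];   /* media payload of the RTP packet      */
    struct Datanode *next;
} Datanode;

/* Insert a node taken from the free list into the busy list, keeping
 * the busy list ordered by sequence number so that the original time
 * relationship of the media data is restored. */
static void insert_busy_sorted(Datanode **busy, Datanode *node)
{
    Datanode **p = busy;
    /* the (int16_t) cast keeps the comparison correct across the
     * 16-bit sequence-number wrap-around */
    while (*p && (int16_t)((*p)->SequNum - node->SequNum) < 0)
        p = &(*p)->next;
    node->next = *p;
    *p = node;
}
```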
The design of the dynamic buffer is the key step in achieving synchronization at the receiving end: it not only restores the normal playback order within each media stream, but also controls the synchronization between the media. After network reception starts, the audio and video buffers each pre-store an appropriate number of packets. The length of the pre-store region must be large enough to compensate for jitter in the media streams, yet small enough to meet the real-time requirement without delaying the data too long; it is therefore generally kept within 500 ms. When the data of the audio and video streams have filled the pre-store region, playback of both starts simultaneously. For synchronized playback, a suitable reference stream must be chosen. Human hearing is more sensitive than vision: pauses and speed changes in steadily playing sound are hard for listeners to accept, and the audio stream also occupies far less bandwidth than the video. Synchronization therefore takes the audio stream as the time reference and adjusts the video to achieve audio-video synchronization. Here the audio timestamp serves as the relative reference time. After playback starts, audio data is taken from the buffer and fed into the decoder at a constant rate, and the timestamp A_t of the first block of data in the buffer is recorded. A_t is then compared with the timestamp V_t of the first block of data in the video buffer, and the difference A_t - V_t determines the push rate of the video data and the playback rate of the video, as follows:
1) When -100 ms ≤ A_t - V_t ≤ 100 ms, viewers cannot perceive any audio-video asynchrony; this is the in-sync region. In this case the audio and video buffers push data at the normal rate, and the playback rate remains unchanged.
2) When 100 ms ≤ |A_t - V_t| ≤ 160 ms, this is the critical synchronization region, and a synchronization adjustment is required.
① If 100 ms ≤ A_t - V_t ≤ 160 ms, the audio leads the video: the data in the video buffer is pushed faster and the video playback rate is increased, so that the audio and video timestamps converge.
② If -160 ms ≤ A_t - V_t ≤ -100 ms, the audio lags the video: the push rate of the video buffer is slowed and the video playback rate is reduced, so that the audio and video timestamps converge.
3) When |A_t - V_t| ≥ 160 ms, viewers clearly perceive the audio-video asynchrony; this is the out-of-sync region, and resynchronization is required.
① If A_t - V_t ≥ 160 ms, the audio leads the video severely: the oldest video packets in the video buffer are discarded until A_t = V_t, after which playback restarts at the normal rate.
② If A_t - V_t ≤ -160 ms, the audio lags the video severely: the oldest audio packets in the audio buffer are discarded until A_t = V_t, after which playback restarts at the normal rate.
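The three regions reduce to a small decision function. A minimal C sketch is given below; the enum and function names are ours, while the thresholds in milliseconds are taken directly from the rules above:

```c
typedef enum {
    SYNC_NORMAL,      /* in-sync region: play at the normal rate         */
    SYNC_SPEED_UP,    /* audio leads: push and play video faster         */
    SYNC_SLOW_DOWN,   /* audio lags: push and play video slower          */
    SYNC_DROP_VIDEO,  /* audio leads severely: drop oldest video packets */
    SYNC_DROP_AUDIO   /* audio lags severely: drop oldest audio packets  */
} sync_action_t;

/* Map the difference diff_ms = At - Vt (in milliseconds) to the video
 * adjustment prescribed by the three regions above. */
static sync_action_t sync_decide(int diff_ms)
{
    if (diff_ms >= 160)  return SYNC_DROP_VIDEO;   /* out-of-sync region */
    if (diff_ms <= -160) return SYNC_DROP_AUDIO;
    if (diff_ms >= 100)  return SYNC_SPEED_UP;     /* critical region    */
    if (diff_ms <= -100) return SYNC_SLOW_DOWN;
    return SYNC_NORMAL;                            /* in-sync region     */
}
```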
In operation, the above synchronization control method maintains the time relationship between the audio stream and the video stream well: even during long-running playback, the audio neither leads nor lags, and neither the audio stream nor the video stream plays discontinuously.
The audio playback thread is shown in Fig. 4. When the audio buffer reaches the pre-store length, the audio playback flag is set to true and the program starts reading data from the buffer. Because the audio capture frequency at the sending end of this system is 8 kHz with 16-bit quantization, audio data is read at a rate of 16000 bytes per second; the timestamp of each piece of audio data is assigned to the variable A_t for comparison against the video timestamp, and the audio data itself is fed directly into the decoder for playback.
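A condensed C sketch of this loop is shown below. The 16000 bytes per second follows from the 8 kHz, 16-bit capture; the 20 ms block size and the helper functions (audio_pop, decoder_feed, sleep_ms) are assumptions of the sketch:

```c
#include <stdint.h>

#define AUDIO_BYTES_PER_SEC 16000   /* 8 kHz x 16-bit samples          */
#define AUDIO_BLOCK_MS      20      /* assumed pacing interval         */
#define AUDIO_BLOCK_BYTES   (AUDIO_BYTES_PER_SEC * AUDIO_BLOCK_MS / 1000)

extern volatile int      audio_playing;  /* set once the pre-store fills */
extern volatile uint32_t g_audio_ts;     /* At, read by the video thread */
int  audio_pop(uint8_t *dst, int len, uint32_t *ts); /* buffer read      */
void decoder_feed(const uint8_t *buf, int len);      /* decode and play  */
void sleep_ms(int ms);

/* Audio playback thread: feed the decoder at a constant rate and
 * publish the timestamp At of the block currently being played. */
static void audio_thread(void)
{
    uint8_t  block[AUDIO_BLOCK_BYTES];
    uint32_t At;

    while (audio_playing) {
        if (audio_pop(block, AUDIO_BLOCK_BYTES, &At)) {
            g_audio_ts = At;          /* reference time for video sync */
            decoder_feed(block, AUDIO_BLOCK_BYTES);
        }
        sleep_ms(AUDIO_BLOCK_MS);     /* constant-rate pacing          */
    }
}
```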
The flow of the video playback thread is similar to that of the audio playback thread. The difference is that audio playback only needs adjustment during resynchronization, whereas the adjustment of video playback runs through the entire synchronized playback process: whenever the time difference between the audio stream and the video stream changes, the playback rate of the video stream must be adjusted to achieve audio-video synchronization.
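Correspondingly, a sketch of the video thread could re-evaluate the difference on every frame, reusing sync_decide and sync_action_t from the earlier sketch. The 5 ms rate step and all helper functions are assumptions, and ts_diff_ms stands in for the conversion from RTP timestamp units to milliseconds:

```c
#include <stdint.h>

extern volatile int      video_playing;
extern volatile uint32_t g_audio_ts;       /* At, published by the audio thread */
int  video_peek_ts(uint32_t *ts);          /* timestamp Vt of the next frame    */
void play_next_frame(int interval_ms);     /* decode and display one frame      */
void drop_oldest_video(void);
void drop_oldest_audio(void);
void sleep_ms(int ms);
int  ts_diff_ms(uint32_t at, uint32_t vt); /* timestamp units -> milliseconds   */

/* Video playback thread: stretch or shrink the frame interval on every
 * pass according to the current At - Vt difference. */
static void video_thread(int frame_interval_ms)
{
    while (video_playing) {
        uint32_t Vt;
        if (!video_peek_ts(&Vt)) { sleep_ms(5); continue; }
        switch (sync_decide(ts_diff_ms(g_audio_ts, Vt))) {
        case SYNC_NORMAL:     play_next_frame(frame_interval_ms);     break;
        case SYNC_SPEED_UP:   play_next_frame(frame_interval_ms - 5); break;
        case SYNC_SLOW_DOWN:  play_next_frame(frame_interval_ms + 5); break;
        case SYNC_DROP_VIDEO: drop_oldest_video();                    break;
        case SYNC_DROP_AUDIO: drop_oldest_audio();                    break;
        }
    }
}
```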