WO2017050067A1 - Video communication method, apparatus, and system

Info

Publication number: WO2017050067A1
Application number: PCT/CN2016/095549
Authority: WO
Inventors: 谢峰; 李乃鹏; 陈一帅; 郭宇春
Original assignee: 中兴通讯股份有限公司
Priority date: 2015-09-25
Filing date: 2016-08-16
Publication date: 2017-03-30
Also published as: CN106559636A

Abstract

A video communication method, apparatus, and system. The method comprises: a sending end collects a video image and a voice signal, performs image semantic feature processing on the video image to acquire image semantic feature information, encodes the voice signal, acquires voice encoding information, and sends the image semantic feature information and the voice encoding information; a receiving end receives the image semantic feature information and the voice encoding information, calls an image semantic feature database, and generates a video image according to the image semantic feature information, the image semantic feature database comprising a mapping relationship between image semantic feature information and video image fragments, generates a voice signal according to the voice encoding information, and outputs the video image and the voice signal. According to the technical solution, in a transmission process, only image semantic feature information and voice encoding information are transmitted, and when the quality of a channel is relatively poor, a normal video image can also be continuously provided, thereby resolving the problem that video communication cannot be carried out normally when the quality of a channel is poor in the related art.

Description

Video communication method, device and system

Technical field

This document relates to, but is not limited to, the field of video communication applications, and in particular to a video communication method, apparatus and system.

Background

Wireless video communication is a communication application mode that has arisen with the development of the mobile Internet and intelligent mobile terminal devices. Compared with traditional video communication systems, wireless video communication applications are more scalable and more flexible: at any time and in any place, as long as the mobile device can access the network, users can make video calls, hold video conferences, and so on in real time. However, unlike general video communication, this convenience means that wireless video transmission places higher requirements on the quality of the network. The network must not only provide sufficient bandwidth for video transmission, but also keep delay and bit error rate within limits. Compressed video is very sensitive to transmission errors (such as packet loss) and has very strict delay requirements, while the inherently high bit error rate, severe channel interference, limited transmission bandwidth, and large fluctuations of the wireless channel make it very difficult to provide reliable quality-of-service guarantees for video transmission.

The development of wireless communication technologies and intelligent mobile terminals has led more and more users to use mobile terminals (mobile phones, tablets, notebook computers, special-purpose devices, etc.) for video communication. Current wireless video communication systems can guarantee basic communication quality when the channel quality is good, but they cannot modify the video information captured by the local camera (including image and voice) or the video information transmitted by the other party. When the channel quality is poor, the communication quality drops sharply, and even normal communication cannot be guaranteed.

Summary of the invention

The following is an overview of the topics detailed in this document. This Summary is not intended to limit the scope of the claims.

The embodiments of the present invention provide a video communication method, apparatus and system, which solve the problem in the related art that video communication cannot proceed normally when the channel quality is poor.

An embodiment of the present invention provides a video communication method, including:

Collecting video images and voice signals;

Performing image semantic feature processing on the video image to acquire image semantic feature information; encoding the voice signal to obtain voice coding information;

And transmitting the image semantic feature information and the voice coding information.

Optionally, before performing the image semantic feature processing on the video image, the method further includes: acquiring channel information of the communication channel, and determining, according to the channel information, whether image semantic feature processing is required for the video image;

If image semantic feature processing is not required for the video image, encoding the video image, acquiring image encoding information, and transmitting the image encoding information and the voice encoding information;

If image semantic feature processing is required on the video image, image semantic feature processing is performed on the video image.

Optionally, before the sending the image semantic feature information and the voice encoding information, the method further includes: determining, according to the channel information, whether a condition for transmitting the image semantic feature information is met;

Sending the image semantic feature information and the voice coding information together if the condition for transmitting the image semantic feature information is met;

If the condition for transmitting the image semantic feature information is not satisfied, only the voice coding information is sent;

Before the sending the image encoding information and the voice encoding information, the method further includes:

Determining, according to the channel information, whether a condition for transmitting the image encoding information is satisfied;

And if the condition for transmitting the image coding information is met, transmitting the image coding information and the voice coding information;

If the condition for transmitting the image encoding information is not satisfied, only the speech encoding information is transmitted.

Optionally, before performing image semantic feature processing on the video image, the method further includes:

Receiving a control operation of the user, and determining, according to the control operation, whether the image semantic feature of the user needs to be kept secret;

If the image semantic feature of the user needs to be kept secret, image semantic feature processing is performed on the video image, the image semantic feature of the user is hidden, replaced or blurred, and the image semantic feature information is generated;

If the image semantic feature of the user is not required to be kept secret, the video image is encoded, the image encoding information is acquired, and the image encoding information and the voice encoding information are transmitted.

Optionally, the method further includes: sending an image data processing mode by using control information;

The image data processing mode includes processing based on image semantic features, or based on image encoding processing, or based on speech analysis processing.

The embodiment of the invention provides a video communication method, including:

Receiving image semantic feature information and voice coding information;

Calling an image semantic feature database, and generating a video image according to the image semantic feature information; wherein the image semantic feature database includes a mapping relationship between the image semantic feature information and the video image fragment;

Generating a voice signal according to the voice encoded information;

The video image and the voice signal are output.

Optionally, the method further includes:

Receiving and parsing control information to obtain an image data processing mode;

The received data is processed and output according to the image data processing mode; wherein the image data processing mode includes: based on image semantic feature processing, or based on image encoding processing, or based on speech analysis processing.

Optionally, if the image data processing mode is based on voice analysis processing, the method further includes: performing semantic analysis on the voice coding information, converting it into image semantic feature information, and generating a video image according to the image semantic feature database.

Optionally, the method further includes: receiving normal video data, and establishing the image semantic feature database according to the normal video data.

The embodiment of the invention further provides a video communication method, including:

The transmitting end acquires a video image and a voice signal; performs image semantic feature processing on the video image to obtain image semantic feature information; encodes the voice signal to obtain voice coding information; and sends the image semantic feature information and the voice coding information;

The receiving end receives the image semantic feature information and the voice coding information; calls an image semantic feature database, and generates a video image according to the image semantic feature information, wherein the image semantic feature database includes a mapping relationship between the image semantic feature information and video image fragments; generates a voice signal according to the voice coding information; and outputs the video image and the voice signal.

The embodiment of the invention further provides a video communication device, including:

The acquisition module is configured to collect video images and voice signals;

a processing module, configured to perform image semantic feature processing on the video image, acquire image semantic feature information, and encode the voice signal to obtain voice coding information;

a sending module, configured to send the image semantic feature information and the voice encoding information.

Optionally, the device further includes:

The determining module is configured to obtain channel information of the communication channel, and determine, according to the channel information, whether image semantic feature processing is required for the video image; if image semantic feature processing is not required for the video image, the video image is encoded, image encoding information is acquired, and the image encoding information and the voice encoding information are transmitted; if image semantic feature processing is required for the video image, image semantic feature processing is performed on the video image to obtain the image semantic feature information.

Optionally, the determining module is further configured to:

Determining, according to the channel information, whether the condition for transmitting the image semantic feature information is met; if the condition for transmitting the image semantic feature information is met, sending the image semantic feature information and the voice coding information together; if the condition for transmitting the image semantic feature information is not met, sending only the voice coding information;

If the condition for transmitting the image encoding information is not satisfied, only the voice coding information is transmitted.

Optionally, the device further includes:

The encryption module is configured to receive a control operation of the user, and determine, according to the control operation, whether the image semantic feature of the user needs to be kept secret; if the image semantic feature of the user needs to be kept secret, the processing module is triggered to perform image semantic feature processing on the video image so as to hide, replace or blur the image semantic feature of the user, and to generate the image semantic feature information; if the image semantic feature of the user is not required to be kept secret, the processing module is triggered to encode the video image and acquire image encoding information, and the image encoding information and the voice coding information are transmitted.

Optionally, the sending module is further configured to send the image data processing mode by using control information; wherein the image data processing mode comprises: based on image semantic feature processing, or based on image encoding processing, or based on voice analysis processing.

The embodiment of the invention provides a video communication device, including:

a receiving module, configured to receive image semantic feature information and voice encoding information;

a restoration module, configured to invoke an image semantic feature database and generate a video image according to the image semantic feature information, wherein the image semantic feature database includes a mapping relationship between the image semantic feature information and video image fragments; and to generate a voice signal according to the voice encoding information;

And an output module configured to output the video image and the voice signal.

Optionally, the restoring module is further configured to receive and parse the control information, acquire an image data processing mode, and process and output the received data according to the image data processing mode; wherein the image data processing mode includes: based on image semantic feature processing, or based on image encoding processing, or based on speech analysis processing.

Optionally, the restoring module is further configured to: if the image data processing mode is based on voice analysis processing, perform semantic analysis on the voice coding information, convert it into image semantic feature information, and generate a video image according to the image semantic feature database.

Optionally, the apparatus further includes a training module configured to receive normal video data, and the image semantic feature database is established according to the normal video data.

The embodiment of the invention further provides a video communication system, including: a transmitting end and a receiving end;

The transmitting end includes the video communication device according to any one of the preceding claims, wherein the receiving end comprises the video communication device according to any one of the preceding claims.

The embodiment of the invention further provides a computer readable storage medium, where the computer readable storage medium stores computer executable instructions, and when the computer executable instructions are executed, the video communication method on the transmitting end side is implemented.

The embodiment of the present invention further provides a computer readable storage medium, where the computer readable storage medium stores computer executable instructions, and when the computer executable instructions are executed, the video communication method at the receiving end side is implemented.

Advantageous effects of embodiments of the present invention:

The embodiments of the present invention provide a new video communication method. The transmitting end separates the collected video data to obtain a video image and a voice signal, performs image semantic feature processing on the video image to obtain image semantic feature information, and sends the image semantic feature information and the voice coding information. The receiving end calls the image semantic feature database, restores the video image according to the image semantic feature information, and outputs the video image together with the voice signal. In the above technical solution, since only the image semantic feature information and the voice coding information are transmitted, the demand for communication resources is greatly reduced compared with directly transmitting the video data, and a normal video image can continue to be provided when the channel quality is poor. This solves the problem in the related art that video communication becomes abnormal when the channel quality is poor, and enhances the user experience. Other aspects will be apparent upon reading and understanding the drawings and detailed description.

Brief description of the drawings

FIG. 1 is a schematic structural diagram of a video communication apparatus according to Embodiment 1 of the present invention;

FIG. 2 is another schematic structural diagram of a video communication apparatus according to Embodiment 1 of the present invention;

FIG. 3 is a schematic structural diagram of a video communication system according to Embodiment 1 of the present invention;

FIG. 4 is a flowchart of a video communication method according to Embodiment 2 of the present invention;

FIG. 5 is a flowchart of a sending end of a video communication method according to Embodiment 2 of the present invention;

FIG. 6 is a flowchart of a receiving end of a video communication method according to Embodiment 2 of the present invention;

FIG. 7 is a schematic diagram of communication mode switching according to Embodiment 3 of the present invention;

FIG. 8 is a flowchart of a video communication method according to Embodiment 3 of the present invention.

Detailed description

The present application is further explained in the following with reference to the specific embodiments and the accompanying drawings.

Embodiment 1:

As shown in FIG. 1, an embodiment of the present invention provides a video communication apparatus 11 including:

The acquiring module 111 is configured to collect video images and voice signals;

The processing module 112 is configured to perform image semantic feature processing on the video image to acquire image semantic feature information, and encode the voice signal to obtain voice coding information.

The sending module 113 is configured to send image semantic feature information and voice coded information.

As shown in FIG. 1, in some embodiments, the video communication apparatus 11 in the foregoing embodiment further includes a determining module 114 configured to acquire channel information of a communication channel, and determine, according to the channel information, whether image semantic feature processing is required for the video image; if image semantic feature processing is not required for the video image, the video image is encoded, image encoding information is acquired, and the image encoding information and the voice encoding information are transmitted; if image semantic feature processing is required for the video image, image semantic feature processing is performed on the video image to obtain image semantic feature information.

Optionally, in some embodiments, the determining module 114 is further configured to:

Determine, according to the channel information, whether the condition for transmitting the image semantic feature information is satisfied; if the condition is satisfied, send the image semantic feature information and the voice coding information together; if the condition is not satisfied, send only the voice coding information;

Determine, according to the channel information, whether the condition for transmitting the image coding information is satisfied;

If the condition for transmitting the image coding information is satisfied, send the image coding information and the voice coding information.

Optionally, as shown in FIG. 1, in some embodiments, the video communication apparatus 11 further includes an encryption module 115 configured to receive a control operation of the user, and determine, according to the control operation, whether the image semantic feature of the user needs to be kept secret; if the image semantic feature of the user needs to be kept secret, the processing module is triggered to perform image semantic feature processing on the video image so as to hide, replace or blur the image semantic feature of the user, and to generate image semantic feature information; if the image semantic feature of the user is not required to be kept secret, the processing module is triggered to encode the video image and acquire image encoding information, and the image encoding information and the voice encoding information are transmitted.

Optionally, in some embodiments, the sending module 113 is further configured to send the image data processing mode by using the control information; wherein the image data processing mode comprises: based on image semantic feature processing, or based on image encoding processing, or based on speech analysis processing.

As shown in FIG. 2, an embodiment of the present invention provides a video communication device 12, including:

The receiving module 121 is configured to receive image semantic feature information and voice encoding information;

The restoration module 122 is configured to invoke the image semantic feature database to generate a video image according to the image semantic feature information; wherein the image semantic feature database includes a mapping relationship between the image semantic feature information and the video image fragment; and generating a voice signal according to the voice encoding information;

The output module 123 is configured to output a video image and a voice signal.

Optionally, in some embodiments, the restoration module 122 is further configured to receive and parse the control information, acquire an image data processing mode, and process and output the received data according to the image data processing mode; wherein the image data processing mode includes: based on image semantic feature processing, or based on image encoding processing, or based on speech analysis processing. In this embodiment, the received data differs according to the image data processing mode, and may be one or more of the following: image semantic feature information, image coding information, or voice coding information.

Optionally, in some embodiments, the restoration module 122 is further configured to: when the image data processing mode is based on voice analysis processing, perform semantic analysis on the voice coding information, convert it into image semantic feature information, and generate a video image according to the image semantic feature database.

Optionally, as shown in FIG. 2, in some embodiments, the video communication device 12 further includes a training module 124 configured to receive normal video data and establish an image semantic feature database based on normal video data.

As shown in FIG. 3, the embodiment of the present invention further provides a video communication system, including a transmitting end 1 and a receiving end 2;

The transmitting end 1 comprises any of the aforementioned video communication devices 11, and the receiving end 2 comprises any of the aforementioned video communication devices 12.

Embodiment 2:

FIG. 4 is a flowchart of a video communication method according to Embodiment 2 of the present invention. As shown in FIG. 4, in this embodiment, the video communication method provided by the embodiment of the present invention includes:

Step S201: the transmitting end acquires a video image and a voice signal; performs image semantic feature processing on the video image to acquire image semantic feature information; encodes the voice signal to obtain voice coded information; and sends image semantic feature information and voice coded information;

Step S202: The receiving end receives the image semantic feature information and the voice encoding information; invokes the image semantic feature database, and generates a video image according to the image semantic feature information, wherein the image semantic feature database includes a mapping relationship between the image semantic feature information and video image fragments; generates a voice signal according to the voice encoding information; and outputs the video image and the voice signal.

As shown in FIG. 5, in this embodiment, the video communication method provided by the embodiment of the present invention includes, at the transmitting end:

Step S301: collecting a video image and a voice signal;

Step S302: performing image semantic feature processing on the video image to acquire image semantic feature information; encoding the voice signal to obtain voice coding information;

Step S303: Send image semantic feature information and voice coding information.

Optionally, in some embodiments, before step S302, the method further includes: acquiring channel information of the communication channel, and determining, according to the channel information (e.g., channel quality, information delay, channel loss rate, etc.), whether image semantic feature processing is required for the video image; if image semantic feature processing is not required for the video image, encoding the video image (using a commonly used codec such as H.264 or H.265) to obtain image coding information, and sending the image coding information and the voice coding information; if image semantic feature processing is required for the video image, performing image semantic feature processing on the video image to obtain image semantic feature information.

That is to say, optionally, after the video image and the voice signal are collected, it may first be determined whether image semantic feature processing needs to be performed on the video image. If it is required, step S302 is performed; if it is not required, another execution flow is performed, namely the flow of encoding the video image.
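The branch described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the embodiment: the `ChannelInfo` fields mirror the examples of channel information named in the text (channel quality, delay, loss rate), while the units, the threshold values and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ChannelInfo:
    """Channel measurements; field names and units are illustrative assumptions."""
    quality: float     # normalized channel quality, 0.0 (worst) to 1.0 (best)
    delay_ms: float    # measured transmission delay in milliseconds
    loss_rate: float   # fraction of packets lost, 0.0 to 1.0


def needs_semantic_processing(info: ChannelInfo,
                              min_quality: float = 0.5,
                              max_delay_ms: float = 300.0,
                              max_loss_rate: float = 0.05) -> bool:
    """Return True when the channel looks too poor for ordinary image encoding.

    The thresholds are hypothetical defaults, not values from the source.
    """
    return (info.quality < min_quality
            or info.delay_ms > max_delay_ms
            or info.loss_rate > max_loss_rate)


def choose_branch(info: ChannelInfo) -> str:
    """Pick the sender-side flow: step S302 or the image encoding flow."""
    if needs_semantic_processing(info):
        return "semantic_feature_processing"
    return "image_encoding"
```

With these assumed thresholds, a channel with good quality, low delay and low loss stays on the image encoding flow, and a degradation in any one measurement selects step S302.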

Optionally, in some embodiments, before step S303, the method further includes: determining, according to the channel information, whether a condition for transmitting the image semantic feature information is met;

If the condition for transmitting the image semantic feature information is satisfied, the image semantic feature information and the voice encoding information are sent together; if the condition for transmitting the image semantic feature information is not satisfied, only the voice encoding information is sent;

Determining, according to the channel information, whether the condition for transmitting the image encoding information is satisfied; if the condition for transmitting the image encoding information is satisfied, transmitting the image encoding information and the voice encoding information; if the condition for transmitting the image encoding information is not satisfied, transmitting only the voice encoding information.
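The transmission rule in the two paragraphs above has the same shape for both kinds of image data: the voice encoding information is always sent, and the image part is sent only when its condition is met. A minimal sketch follows; the function name `select_payload` and the byte-string representation are assumptions, and because the source does not specify how the condition is evaluated, it is passed in as a boolean.

```python
from typing import Optional, Tuple


def select_payload(image_info: Optional[bytes],
                   voice_info: bytes,
                   image_condition_met: bool) -> Tuple[Optional[bytes], bytes]:
    """Return the (image part, voice part) to transmit.

    image_info is either image semantic feature information or image
    encoding information, depending on the branch taken earlier.
    The voice part is always sent; the image part is dropped when the
    channel condition for it is not met.
    """
    if image_condition_met and image_info is not None:
        return image_info, voice_info
    return None, voice_info
```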

Optionally, in some embodiments, before image semantic feature processing is performed on the video image in step S302, the method further includes: receiving a control operation of the user, and determining, according to the control operation, whether the image semantic feature of the user needs to be kept secret; if the image semantic feature of the user needs to be kept secret, image semantic feature processing is performed on the video image, the image semantic feature of the user is hidden, replaced or blurred, and the image semantic feature information is generated.

Optionally, in some embodiments, the method further includes: transmitting the image data processing mode by using the control information; wherein the image data processing mode comprises: based on image semantic feature processing, or based on image encoding processing, or based on speech analysis processing.
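One way to picture the hide, replace and blur options is as a filter over the extracted features. The sketch below assumes, purely for illustration, that image semantic feature information is a mapping from feature name to value; the function name `protect_features`, the action strings and the `"blurred"` marker are hypothetical and do not come from the source.

```python
from typing import Dict, Set


def protect_features(features: Dict[str, str],
                     private_names: Set[str],
                     action: str,
                     replacement: str = "default") -> Dict[str, str]:
    """Apply a privacy action to the features the user wants kept secret."""
    result: Dict[str, str] = {}
    for name, value in features.items():
        if name not in private_names:
            result[name] = value          # non-private features pass through unchanged
        elif action == "hide":
            continue                      # drop the feature entirely
        elif action == "replace":
            result[name] = replacement    # substitute a stand-in value
        elif action == "blur":
            result[name] = "blurred"      # mark the feature for blurred rendering
        else:
            raise ValueError(f"unknown action: {action}")
    return result
```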

Correspondingly, as shown in FIG. 6, the embodiment of the video communication method provided by the embodiment of the present invention at the receiving end includes:

Step S401: Receive image semantic feature information and voice coding information;

Step S402: Invoking an image semantic feature database, and generating a video image according to the image semantic feature information; wherein the image semantic feature database includes a mapping relationship between the image semantic feature information and the video image fragment; and generating a voice signal according to the voice encoding information;

Step S403: Output a video image and a voice signal.
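Step S402 relies on the mapping between image semantic feature information and video image fragments. The lookup can be sketched as below, assuming (for illustration only) that each piece of feature information is a string identifier and each fragment is represented by a placeholder string; the function name `generate_video_image` is hypothetical, and how the fragments are composed into a final frame is not covered here.

```python
from typing import Dict, List, Sequence


def generate_video_image(feature_info: Sequence[str],
                         feature_db: Dict[str, str]) -> List[str]:
    """Map each received feature identifier to its stored video image fragment."""
    fragments: List[str] = []
    for feature_id in feature_info:
        fragment = feature_db.get(feature_id)
        if fragment is not None:      # skip features with no known fragment
            fragments.append(fragment)
    return fragments
```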

Optionally, in some embodiments, the method further includes: receiving and parsing the control information, and acquiring an image data processing mode; processing and outputting the received data according to the image data processing mode; the image data processing mode includes: based on image semantic feature processing, or based on image encoding processing, or based on speech analysis processing.

Optionally, in some embodiments, the method further includes: if the image data processing mode is based on voice analysis processing, performing semantic analysis on the voice coding information, converting it into image semantic feature information, and generating a video image according to the image semantic feature database.
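The source does not define the format of the control information. As a sketch under that caveat, the three modes can be carried in a single byte; the numeric codes, the one-byte layout and the function names below are assumptions made for the example.

```python
import struct
from enum import IntEnum


class ImageDataProcessingMode(IntEnum):
    """The three modes named in the text; the numeric codes are assumptions."""
    SEMANTIC_FEATURE = 1
    IMAGE_ENCODING = 2
    VOICE_ANALYSIS = 3


def build_control_info(mode: ImageDataProcessingMode) -> bytes:
    """Pack the mode into a one-byte control message (hypothetical format)."""
    return struct.pack("!B", mode)


def parse_control_info(data: bytes) -> ImageDataProcessingMode:
    """Recover the mode from the first byte of a control message."""
    (code,) = struct.unpack("!B", data[:1])
    return ImageDataProcessingMode(code)
```

The receiver would then dispatch on the parsed mode to decide whether the accompanying data is image semantic feature information, image coding information, or voice coding information only.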

Optionally, in some embodiments, the method further comprises: receiving normal video data, and establishing an image semantic feature database according to the normal video data.

Embodiment 3:

This application is further explained in the context of a specific application scenario.

This embodiment proposes a wireless video communication system based on technology for extracting and reproducing the image semantic features of video content. The system can ensure normal communication even when the local channel quality is poor, and can also give users the opportunity to modify and change the local and counterpart video, so as to achieve a better user experience.

The design idea of the wireless video communication system is to add a video image semantic feature extraction and reproduction module to the current wireless communication system, which uses a copy of the video signal to extract the semantic features of the video image without affecting normal video communication. After the channel quality deteriorates, the mode control module can freely switch to the sub-channel of the video image semantic feature extraction module for video communication. The module can be embedded in the wireless communication system as part of the entire communication system or as a plug-in, which increases flexibility of use and reduces the cost of retrofitting the wireless communication system.

The whole module mainly includes functional modules such as mode control, video image semantic feature extraction, feature database and feature synthesis. The video image semantic feature extraction module of the transmitting end and the receiving end should be a module with the same function, and the image detection, feature extraction and the like follow the same algorithm and standard.

The mode control module controls the complete video image semantic feature extraction and reproduction module. It receives channel quality feedback from the transmitting end and the receiving end (e.g., signal strength information, channel quality information, delay information, buffer status information, mobility status information, etc.) and is responsible for turning on or switching between the various communication modes.

The video image feature information extraction module is configured to parse the video image signal and perform feature detection, feature extraction, image cutting, etc. on the scene, the characters, the expressions, and the like in the video image. It may store the resulting feature prototypes and feature information in the feature database and pass the feature information to the transmitter for sending. One implementation is as follows: the video image feature information extraction module obtains a copy of the transmitted video directly from an upper layer of the transmitting end, then parses the video image signal according to the system configuration and extracts the required feature prototypes and feature information from the video image. This extraction process may be a link in the video transmission process, in which case only the feature information is transmitted, or it may be independent of the video transmission process, in which case it only extracts feature prototypes and does not interfere with the video communication.

The feature database is configured to store the feature prototypes and feature information transmitted by the video semantic feature information extraction module, to classify and store the various feature prototypes and feature information according to the system configuration, and, when needed, to provide a feature prototype to the feature synthesis module according to the control signal (or feature information) transmitted by the video feature synthesis module. The feature prototype can be a mathematical model or a cropped image.
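As a rough illustration of the storage role described above, the following Python sketch models the feature database as a mapping from a category and a feature code to a stored prototype. The class name `FeatureDatabase`, the category strings and the byte-string prototype are hypothetical choices made for this example and are not part of the disclosure.

```python
from collections import defaultdict


class FeatureDatabase:
    """Hypothetical store that classifies feature prototypes by category."""

    def __init__(self):
        # category (e.g. "expression") -> feature code -> prototype
        self._store = defaultdict(dict)

    def put(self, category, feature_code, prototype):
        """Store or refresh a prototype (a cropped image or a model)."""
        self._store[category][feature_code] = prototype

    def get(self, category, feature_code):
        """Return the stored prototype, or None if it was never learned."""
        return self._store.get(category, {}).get(feature_code)


db = FeatureDatabase()
db.put("expression", "smile", b"<cropped smile image>")
print(db.get("expression", "smile"))    # the stored prototype
print(db.get("background", "office"))   # None: nothing stored yet
```

Classifying by category first keeps lookups for scene, character and expression prototypes separate, which is one possible reading of "classify and store according to system configuration".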

The feature synthesis module recombines the feature prototypes passed by the feature database according to the system configuration, combines the resulting complete image with the voice signal, and sends them to the video application to complete the video communication task.

As shown in FIG. 7, the video transmission process for each of the following communication modes is described below:

1. Normal communication:

Transmitting end: the video application performs image encoding and voice encoding directly through the main channel, and the encoded information is delivered to the transmitting end and transmitted to the receiving end through the channel. At this time, the mode control module does not interfere with the video communication and does not transmit a video copy to the video feature extraction module.

Receiving end: the received image encoding information and voice encoding information are handed directly to the video application to complete the video communication. The mode control module also sends a video copy to the video feature extraction module at the receiving end, which parses the video image (or image plus speech) according to the system default configuration and other information, extracts the image prototypes and feature information, and sends them to the feature database. This copy is mainly used to create and maintain the feature database.

2. Analog communication:

The transmitting end or the receiving end constantly monitors the channel quality, and the mode control module obtains the channel feedback of the transmitting end or the receiving end. When the channel quality deteriorates, the system can enter the analog communication mode at any time according to the channel feedback.

Transmitting end: at this time, the video image signal and the voice signal submitted by the upper layer video application are processed differently in the mode control module. The video image (or image plus voice) is passed to the video feature extraction module to extract the feature information, and the voice signal is speech-coded to obtain speech coding information. The image feature information and the speech coding information are then delivered to the transmitting end and transmitted on the channel. At this time, all the video image information sent by the transmitting end comes from the feature extraction module.

Receiver: after the mode control module obtains the video image feature information, it hands it over to the feature synthesis module. The feature synthesis module analyzes the current picture state of the video from the received video image feature information, and then synthesizes the complete video picture from the pre-saved feature prototypes (image templates) obtained from the feature database according to that feature information. The picture is then sent to the upper layer video application together with the decoded speech signal. In addition, the speech signal can also be fed into the feature synthesis module so that analysis of the speech improves the synthesis of the video picture, for example to better match the video picture (e.g., mouth shape) to the speech.
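The receiver-side synthesis in analog communication can be pictured with the following minimal Python sketch. It assumes, purely for illustration, that feature information arrives as a mapping from a category to a feature code and that prototypes are referenced by file name; the function `synthesize_frame`, the `fallback` parameter and all sample values are hypothetical.

```python
def synthesize_frame(feature_info, prototypes, fallback=None):
    """Assemble one frame description from received feature information.

    feature_info: dict mapping a category to a feature code, as received.
    prototypes:   dict mapping (category, code) to a stored image template.
    fallback:     value used when no template was saved for a feature.
    """
    frame = {}
    for category, code in feature_info.items():
        # Look up the pre-saved template; use the fallback if it is missing.
        frame[category] = prototypes.get((category, code), fallback)
    return frame


prototypes = {
    ("background", "office"): "office.png",
    ("expression", "smile"): "smile.png",
}
received = {"background": "office", "expression": "smile", "mouth": "open"}
frame = synthesize_frame(received, prototypes, fallback="neutral.png")
print(frame)
```

A real synthesis module would composite image data rather than file names; the sketch only shows the lookup from feature information to stored templates.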

3. Mixed communication:

When the channel quality is unstable, the channel state at this time is insufficient to support full normal communication, but the channel state is superior to the channel requirement of analog communication, or the channel state is in a fast fluctuation state. At this point, the mode control module will turn on the hybrid communication mode according to a judgment criterion.

Transmitter: the mode control module performs fast switching between normal communication and analog communication according to a time parameter configuration. The time parameter configuration can be determined according to the channel status or set manually.

Receiver: after the receiver receives the video information, two sub-modes are available. The first sub-mode switches the processing of the video image between normal communication and analog communication according to the control information. The other sub-mode does not open the main channel even during normal communication (i.e., when the encoded information of the video image is transmitted on the channel). Instead, the decoded video image is sent to the video image feature extraction module to continuously update the feature database; the feature extraction module also sends the feature information to the feature synthesis module for analog communication video frame synthesis, and the feature synthesis module sends the synthesized video image to the upper layer video application. During analog communication (i.e., when feature information of the video image is transmitted on the channel), the mode control module sends the received image feature information to the feature synthesis module for analog communication video frame synthesis. The purpose of this sub-mode is to provide users with consistent picture quality and to avoid the bad user experience caused by fast switching between normal communication and analog communication.

Feature prototypes and feature information pre-saved in the feature database may be created and maintained during normal communication, or may have been created in advance for different users or proprietary channels, for example in the form of files (packages).
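The transmitter-side fast switching described for mixed communication could be driven by a simple time-slot schedule. The Python sketch below is one possible interpretation of a "time parameter configuration": it alternates fixed-length normal and analog slots. The function name `hybrid_schedule` and the slot lengths are assumptions for this example only.

```python
def hybrid_schedule(total_ms, normal_ms, analog_ms):
    """Return (start_ms, mode) pairs alternating normal and analog slots."""
    if normal_ms <= 0 or analog_ms <= 0:
        raise ValueError("slot lengths must be positive")
    schedule = []
    t = 0
    mode = "normal"
    while t < total_ms:
        schedule.append((t, mode))
        # Advance by the length of the slot that was just scheduled.
        t += normal_ms if mode == "normal" else analog_ms
        mode = "analog" if mode == "normal" else "normal"
    return schedule


print(hybrid_schedule(1000, normal_ms=300, analog_ms=200))
# [(0, 'normal'), (300, 'analog'), (500, 'normal'), (800, 'analog')]
```

In practice the slot lengths would be adjusted from channel feedback rather than fixed, as the text allows the configuration to follow the channel status.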

4. Very poor channel communication:

Transmitter: In this state, the mode control module will completely turn off or ignore the video signal, and only encode the voice signal and send it to the receiving end through the channel.

Receiver: the mode control module passes the received voice signal (obtained by decoding the voice coding information) to the feature synthesis module, which uses semantic analysis to estimate the likely state of the video frame at this time. The video picture is synthesized directly from the feature information and image prototypes in the database and sent to the video application along with the voice signal to maintain minimal video communication. To support very poor channel communication, the receiving end needs to feed the voice signal into the feature extraction module while creating or maintaining the feature database during normal, analog or hybrid communication, so as to establish the correspondence between the feature information based on voice analysis and the image feature prototypes.
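One simple way to picture the correspondence between voice-derived features and image features is a co-occurrence table that is filled while video is still available and queried when only voice arrives. The Python sketch below is a hedged illustration; the class `SpeechToImageMap`, its method names and the label strings are hypothetical, and a real system would use an actual speech analysis front end.

```python
from collections import Counter, defaultdict


class SpeechToImageMap:
    """Hypothetical table linking speech features to image features."""

    def __init__(self):
        # speech feature -> Counter of image features seen at the same time
        self._counts = defaultdict(Counter)

    def observe(self, speech_feature, image_feature):
        """Record one co-occurrence seen while video was available."""
        self._counts[speech_feature][image_feature] += 1

    def predict(self, speech_feature, default="neutral"):
        """Return the image feature most often seen with this speech feature."""
        counts = self._counts.get(speech_feature)
        if not counts:
            return default
        return counts.most_common(1)[0][0]


mapping = SpeechToImageMap()
mapping.observe("laughing", "smile")
mapping.observe("laughing", "smile")
mapping.observe("laughing", "neutral")
print(mapping.predict("laughing"))  # smile
print(mapping.predict("silence"))   # neutral (nothing learned, default used)
```

The predicted image feature can then be used to fetch an image prototype from the feature database, in the same way as received feature information is used in analog communication.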

The above functions are also applicable to switching between wireless networks of different standards, such as GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), 3G, 4G, 5G and WLAN (Wireless Local Area Network). In the video feature extraction module of the receiving end, the video parsing includes detecting character features, character expression features, background features, etc. in the video image, and then extracting the corresponding image feature prototypes and feature information into the feature database. At the same time, the speech content is semantically analyzed, and the semantic features are extracted and stored in one-to-one correspondence with the feature information of the video image. The feature information pre-stored in the feature database may have been created and maintained during earlier normal communication, or may be feature information that has been created or acquired for different users or proprietary channels. Modeling of the feature database is completed, for example, before the user switches from a high throughput network to a low throughput network. The feature database can be matched or merged according to the identity of the caller, the geographical location, the time, or the image recognition result in order to maintain it, and can be used to enter the analog communication or very poor channel communication state directly when the communication is initiated.

As shown in FIG. 7, in the case where the feature database has been established, various communication modes can be flexibly switched.
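The switching among the four modes can be summarized by a small decision function. The Python sketch below assumes a normalized channel quality score between 0 and 1 and three threshold values; the score, the thresholds, the `fluctuating` flag and the names `Mode` and `select_mode` are all illustrative assumptions, since the text leaves the judgment criterion open.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal communication"
    HYBRID = "mixed communication"
    ANALOG = "analog communication"
    VOICE_ONLY = "very poor channel communication"


def select_mode(quality, fluctuating=False,
                normal_min=0.75, hybrid_min=0.50, analog_min=0.20):
    """Map a channel quality score in [0, 1] to a communication mode.

    The thresholds are placeholders, not values from the source.
    """
    if quality < analog_min:
        return Mode.VOICE_ONLY      # only the voice signal is sent
    if fluctuating:
        return Mode.HYBRID          # channel state changes quickly
    if quality >= normal_min:
        return Mode.NORMAL          # full image encoding is affordable
    if quality >= hybrid_min:
        return Mode.HYBRID          # between the analog and normal needs
    return Mode.ANALOG              # feature information only


for q in (0.9, 0.6, 0.3, 0.1):
    print(q, select_mode(q).value)
```

A deployed mode control module would likely combine several feedback items (signal strength, delay, buffer status) into such a score and add hysteresis to avoid oscillation.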

The following describes the specific application scenarios.

Scene 1

Assume that in this scenario, the user determines that the current communication requires information to be kept confidential from other communication parties that do not have a feature database.

As shown in FIG. 8, in the embodiment, the video communication method provided by the embodiment of the present invention includes:

Step S501: The user sets a communication mode.

Before the video communication is formally established, the user determines that the communication requires information to be kept confidential from other communication parties without the feature database (for example, all or specified eyes are kept secret, all or specified faces are kept secret, or all or a specified background is kept secret). The established video communication may be two-person communication or video conferencing, especially multi-person video conferencing. The mode control module passes the user configuration to the video image feature extraction module; the image capture device such as the camera is turned on, and the visual signal is collected. The video first enters the video image feature extraction module and does not enter the transmitting end through the main channel. At the same time, the sender sends a connection request to the receiver to request video communication.

Step S502: The sender transmits the video data in a secure manner;

The transmitting end detects each frame of the picture, finds the features to be encrypted according to the user's requirements, cuts the picture, extracts the transmittable image, and then combines the extracted video images and sends them as the final video information to the transmitting end to enter the channel.

In practical applications, the following two video image processing methods are specifically included:

Method 1: The video image feature extraction module at the transmitting end detects each frame of the image and, after finding the features that need to be kept secret according to the user's needs, cuts the image and extracts the transmittable image by hiding, replacing or blurring those features. The extracted video image is then encoded. At the same time, the image feature extraction module also outputs image feature information, and the image encoding information, the image feature information and the speech encoding information are then sent to the channel.

Method 2: The video image feature extraction module at the transmitting end extracts feature information from the video image, replaces the part of the feature information related to the features that need to be kept secret with feature information that does not need to be kept secret, and then sends the feature information and the speech encoding information to the transmitting end to enter the channel.
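The second processing method, in which confidential feature information is swapped for non-confidential feature information before transmission, can be sketched as follows. The dictionary layout, the function `mask_feature_info` and the substitute values are hypothetical and serve only to illustrate the replacement step.

```python
def mask_feature_info(feature_info, confidential, substitutes):
    """Return a copy of feature_info with confidential entries replaced.

    feature_info: dict mapping a category to a feature code.
    confidential: set of categories the user wants kept secret.
    substitutes:  dict mapping a category to a harmless replacement code.
    """
    masked = dict(feature_info)  # leave the caller's data untouched
    for category in confidential:
        if category in masked:
            masked[category] = substitutes.get(category, "hidden")
    return masked


info = {"face": "user_17", "background": "home_office", "expression": "smile"}
masked = mask_feature_info(
    info,
    confidential={"face", "background"},
    substitutes={"face": "generic_avatar", "background": "plain_wall"},
)
print(masked)
```

Because only feature codes travel over the channel in this method, a receiver without the matching feature database never sees the original face or background.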

Step S503: The receiving end receives the video data.

The sender and the receiver must continuously check the channel quality after powering on and feed back the channel quality in time. The mode control module selects the corresponding communication mode according to the channel quality feedback.

When the receiving end receives the connection request of the encrypted communication and the channel quality is good, the mode control module on the one hand re-modifies the video signal (the modification is performed by the user or, by default, by the system) and then passes it to the upper layer video application through the main channel, and on the other hand sends a copy of the video signal to the video feature extraction module at the receiving end.

Corresponding to the sender, it also includes the following two methods:

Method 1 (corresponding to Method 1 at the transmitting end): the receiving end decodes the received image encoding information to obtain an image signal, synthesizes the image together with the image feature information in the feature synthesis module, decodes the received speech encoding information to obtain a voice signal, and finally outputs the image signal and the voice signal to an upper layer application or an external device.

Method 2 (corresponding to Method 2 at the transmitting end): the receiving end sends the received image feature information to the feature synthesis module, synthesizes the image based on the feature database, decodes the received speech encoding information to obtain a voice signal, and finally outputs the image signal and the voice signal to an upper layer application or an external device.

Step S504: The receiving end establishes a feature database;

After receiving the copy of the video signal, the receiving end determines the current communication mode and the feature extraction mode according to the control information in the signal. After learning that the current communication is encrypted, the module starts the feature extraction operation and cuts the video image. The speech signal from the same moment is semantically analyzed to determine the user's mood at that time. After matching with the image features from the same moment, the image features and semantic features are paired one by one and transmitted to the feature database, which completes the feature database modeling.
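The one-to-one pairing of image features with speech semantic features from the same moment might be done by matching timestamps. The following Python sketch assumes each feature carries a timestamp in milliseconds and pairs features that lie within a tolerance; the function `pair_by_time`, the tolerance value and the labels are assumptions for illustration.

```python
def pair_by_time(image_features, speech_features, tolerance_ms=40):
    """Pair each image feature with the speech feature nearest in time.

    Both inputs are lists of (timestamp_ms, label) tuples. Image features
    with no speech feature within tolerance_ms are left unpaired.
    """
    pairs = []
    for t_img, img in image_features:
        nearest = min(speech_features,
                      key=lambda s: abs(s[0] - t_img),
                      default=None)
        if nearest is not None and abs(nearest[0] - t_img) <= tolerance_ms:
            pairs.append((img, nearest[1]))
    return pairs


image_features = [(0, "neutral"), (40, "smile"), (200, "frown")]
speech_features = [(5, "calm"), (45, "happy")]
print(pair_by_time(image_features, speech_features))
# [('neutral', 'calm'), ('smile', 'happy')]
```

The resulting pairs are the kind of entries that could be stored in the feature database to support later synthesis from voice alone.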

The video application at the receiving end directly receives the video signal and communicates.

Step S505: the communication mode is switched to analog communication, and the video communication is continued;

The channel quality deteriorates and is already below the preset threshold. The mode control module automatically switches the system to analog communication.

After the video feature extraction module of the sender receives the analog communication command, it judges the user's expression state by combining semantic analysis and image analysis, extracts the user's expression feature from the video image, replaces the current expression feature with a pre-agreed feature code or feature representation, matches it with the voice signal, and passes it to the sender.

The transmitting end directly sends the compressed video signal transmitted by the feature extraction module to the channel, and at this time, the main channel of the transmitting end does not transmit any video information.

After obtaining the video signal, the mode control module at the receiving end sends the signal directly to the receiving-end video feature extraction module and simultaneously cuts off the main channel. According to the code name or feature representation in the signal, the receiving-end feature extraction module extracts from the feature database the user expression image template saved in the normal communication state and sends it to the feature synthesis module for image synthesis. After obtaining the image template, the feature synthesis module performs image synthesis according to the feature information, combines the result with the voice, and sends it directly to the video application to complete the communication.
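To give a feel for how small a pre-agreed feature code can be, the Python sketch below packs a timestamp and a one-byte expression code into a five-byte payload and decodes it again. The code table, the payload layout and the function names are hypothetical; the source does not specify any wire format.

```python
import struct

# Hypothetical code table agreed in advance by both ends.
EXPRESSION_CODES = {"neutral": 0, "smile": 1, "frown": 2}
CODE_TO_EXPRESSION = {code: name for name, code in EXPRESSION_CODES.items()}

# Network byte order: 4-byte unsigned timestamp, 1-byte unsigned code.
PAYLOAD_FORMAT = "!IB"


def encode_expression(timestamp_ms, expression):
    """Pack a timestamp and an expression code into a compact payload."""
    return struct.pack(PAYLOAD_FORMAT, timestamp_ms, EXPRESSION_CODES[expression])


def decode_expression(payload):
    """Unpack a payload back into (timestamp_ms, expression name)."""
    timestamp_ms, code = struct.unpack(PAYLOAD_FORMAT, payload)
    return timestamp_ms, CODE_TO_EXPRESSION[code]


payload = encode_expression(1200, "smile")
print(len(payload))                # 5
print(decode_expression(payload))  # (1200, 'smile')
```

The receiver would use the decoded expression name to fetch the matching image template from its feature database.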

Step S506: The communication mode is switched to the poor channel communication, and the video communication is continued;

The channel quality deteriorates and is already below the preset threshold. The mode control module automatically switches the system to the poor channel communication mode.

The feature extraction module at the transmitting end directly strips the picture information in the video signal; the voice signal is greatly compressed and directly sent to the channel through the transmitting end.

After receiving the voice signal, the mode control module of the receiving end sends it directly to the video feature extraction module and simultaneously cuts off the main channel. The receiving-end video feature processing module performs semantic analysis on the received speech signal, derives a feature code or feature representation, extracts from the feature database the user expression image template saved in the normal communication state, and sends it to the feature synthesis module for image synthesis. After the feature synthesis module obtains the image template, it synthesizes the image according to the feature information, combines it with the voice, and sends it directly to the video application of the receiving end to complete the communication.

Scene 2

Assume that in this scenario, the user has established a feature database, and the user determines that the communication requires part of the information to be kept confidential from the recipient.

In this embodiment, the video communication method provided by the embodiment of the present invention includes:

Step S601: The user sets the communication mode.

Before the video communication is formally established, the user determines that the communication requires part of the information to be kept confidential from the recipient (for example, all or specified eyes are kept secret, all or specified faces are kept secret, or all or a specified background is kept secret). The established video communication may be two-person communication or video conferencing, especially multi-person video conferencing. The mode control module passes the user configuration to the video image feature extraction module; the image capture device such as the camera is turned on, and the visual signal is collected. The video first enters the video image feature extraction module and does not enter the sender through the main channel. At the same time, the sender sends a connection request to the receiver to request video communication.

Step S602: The transmitting end encrypts and transmits the video data.

The sender detects each frame of the picture, finds the features to be encrypted according to the user's needs, cuts the picture, and extracts the transmittable image by hiding, replacing or blurring the features that need to be encrypted. The extracted video image is then encoded and sent, along with the speech code, to the sender to enter the channel.

Step S603: The receiving end receives the video data.

The receiving end decodes the received image coding information to obtain an image signal, decodes the received voice coding information to obtain a voice signal, and outputs the image signal and the voice signal to an upper layer application or an external device.

In summary, according to the technical solution of the embodiment of the present invention, at least the following beneficial effects exist:

The embodiment of the invention provides a new video communication method. The transmitting end separates the collected video data to obtain a video image and a voice signal, performs image semantic feature processing on the video image to obtain image semantic feature information, and sends the image semantic feature information and the speech coding information. The receiving end calls the image semantic feature database, restores the video image according to the image semantic feature information, and outputs the video image together with the speech signal. In the above technical solution, since only the image semantic feature information and the voice coding information are transmitted during the transmission process, the requirement for communication resources is greatly reduced compared with directly transmitting the video data, and a normal video image can continue to be provided when the channel quality is poor. This solves the problem in the related art that video communication is abnormal when the channel quality is poor, and enhances the user experience.

The embodiment of the invention further provides a computer readable storage medium, wherein the computer readable storage medium stores computer executable instructions, and when the computer executable instructions are executed, the video communication method on the transmitting end side is implemented.

One of ordinary skill in the art will appreciate that all or part of the steps in the above methods may be completed by a program instructing related hardware (e.g., a processor), and the program may be stored in a computer readable storage medium, such as a read only memory, a magnetic disk, or an optical disk. Alternatively, all or part of the steps of the above embodiments may also be implemented using one or more integrated circuits. Correspondingly, each module/unit in the above embodiment may be implemented in the form of hardware, for example, by an integrated circuit that implements its corresponding function, or may be implemented in the form of a software function module, for example, by a processor executing a program/instruction stored in the memory to achieve its corresponding function. This application is not limited to any specific combination of hardware and software. A person skilled in the art should understand that modifications or equivalent substitutions of the technical solutions of the present application that do not depart from the spirit and scope of those technical solutions should be included in the scope of the claims of the present application.

Industrial applicability

The above technical solution greatly reduces the requirement for communication resources and can continue to provide a normal video image when the channel quality is poor, which solves the problem in the related art that video communication is abnormal when the channel quality is poor, and enhances the user experience.

Claims

A video communication method includes:

Collecting video images and voice signals;

Performing image semantic feature processing on the video image to acquire image semantic feature information; encoding the voice signal to obtain voice coding information;

And transmitting the image semantic feature information and the voice coding information.
The video communication method according to claim 1,

Before performing image semantic feature processing on the video image, the method further includes: acquiring channel information of the communication channel, and determining, according to the channel information, whether image semantic feature processing is required on the video image;

If image semantic feature processing is not required on the video image, encoding the video image, acquiring image encoding information, and transmitting the image encoding information and the voice encoding information;

If image semantic feature processing is required on the video image, image semantic feature processing is performed on the video image.
The video communication method according to claim 2,

Before the sending the image semantic feature information and the voice encoding information, the method further includes: determining, according to the channel information, whether a condition for transmitting the image semantic feature information is met;

Sending the image semantic feature information and the voice coding information together if the condition for transmitting the image semantic feature information is met;

If the condition for transmitting the image semantic feature information is not satisfied, only the voice coding information is sent;

Before the sending the image encoding information and the voice encoding information, the method further includes:

Determining, according to the channel information, whether a condition for transmitting the image encoding information is satisfied;

And if the condition for transmitting the image coding information is met, transmitting the image coding information and the voice coding information;

If the condition for transmitting the image encoding information is not satisfied, only the speech encoding information is transmitted.
The video communication method according to claim 1, wherein before image semantic feature processing is performed on said video image, the method further comprises:

Receiving a control operation of the user, and determining, according to the control operation, whether the image semantic feature of the user needs to be kept secret;

If the image semantic feature of the user needs to be kept secret, image semantic feature processing is performed on the video image, and the image semantic feature of the user is hidden or replaced or blurred, and the image semantic feature information is generated;

If the image semantic feature of the user is not required to be kept secret, the video image is encoded, the image encoding information is acquired, and the image encoding information and the voice encoding information are transmitted.
The video communication method according to any one of claims 1 to 4, further comprising: transmitting the image data processing mode through the control information;

The image data processing mode includes processing based on image semantic features, or based on image encoding processing, or based on speech analysis processing.
A video communication method includes:

Receiving image semantic feature information and voice coding information;

Calling an image semantic feature database, and generating a video image according to the image semantic feature information; wherein the image semantic feature database includes a mapping relationship between the image semantic feature information and the video image fragment;

Generating a voice signal according to the voice encoded information;

The video image and the voice signal are output.
The video communication method of claim 6, the method further comprising:

Receiving and parsing control information to obtain an image data processing mode;

The received data is processed and output according to the image data processing mode; wherein the image data processing mode includes: based on image semantic feature processing, or based on image encoding processing, or based on speech analysis processing.
The video communication method according to claim 7, wherein if the image data processing mode comprises speech-based analysis processing, the method further comprises: performing semantic analysis on the speech encoded information, converting it into image semantic feature information, and generating a video image according to the image semantic feature database.
The video communication method according to any one of claims 6 to 8, the method further comprising: receiving normal video data, and establishing the image semantic feature database based on the normal video data.
A video communication method includes:

The transmitting end acquires a video image and a voice signal; performs image semantic feature processing on the video image to obtain image semantic feature information; encodes the voice signal to obtain voice coding information; and sends the image semantic feature information and the voice coding information;

The receiving end receives the image semantic feature information and the voice coding information; calls an image semantic feature database, and generates a video image according to the image semantic feature information, wherein the image semantic feature database includes a mapping relationship between the image semantic feature information and the video image fragment; generates a speech signal based on the speech encoded information; and outputs the video image and the speech signal.
A video communication device comprising:

The acquisition module is configured to collect video images and voice signals;

a processing module, configured to perform image semantic feature processing on the video image, acquire image semantic feature information, and encode the voice signal to obtain voice coding information;

And a sending module, configured to send the image semantic feature information and the voice encoding information.
The video communication device of claim 11 further comprising:

The determining module is configured to obtain channel information of the communication channel, and determine, according to the channel information, whether image semantic feature processing is required for the video image; if image semantic feature processing is not required for the video image, encode the video image, acquire image encoding information, and transmit the image encoding information and the speech encoding information; if image semantic feature processing is required on the video image, perform image semantic feature processing on the video image to obtain the image semantic feature information.
The video communication device of claim 12, wherein the determining module is further configured to:

Determining, according to the channel information, whether the condition for transmitting the image semantic feature information is met; if the condition for transmitting the image semantic feature information is met, sending the image semantic feature information and the voice coding information together; if the condition for transmitting the image semantic feature information is not met, transmitting only the voice coding information;

Determining, according to the channel information, whether a condition for transmitting the image encoding information is satisfied;

And if the condition for transmitting the image coding information is met, transmitting the image coding information and the voice coding information;

If the condition for transmitting the image encoding information is not satisfied, only the speech encoding information is transmitted.
The video communication device of claim 11 further comprising:

The cryptographic module is configured to receive a control operation of the user, and determine, according to the control operation, whether the image semantic feature of the user needs to be kept secret; if the image semantic feature of the user needs to be kept secret, triggering, by the processing module, the video image Performing image semantic feature processing to hide or replace or blur the image semantic feature of the user to generate the image semantic feature information; if the image semantic feature of the user is not required to be kept secret, triggering the processing module to the video image Performing an encoding process, acquiring image encoding information, and transmitting the image encoding information and the speech encoding information.
The video communication device according to any one of claims 11 to 14, wherein: said transmitting module is further configured to transmit an image data processing mode through control information; wherein said image data processing mode comprises: based on image semantic features Processing, or based on image encoding processing, or based on speech analysis processing.
A video communication device comprising:

a receiving module, configured to receive image semantic feature information and voice encoding information;

a restoration module, configured to invoke an image semantic feature database and generate a video image according to the image semantic feature information, wherein the image semantic feature database includes a mapping relationship between the image semantic feature information and the video image fragment, and to generate a voice signal according to the voice encoding information;

And an output module configured to output the video image and the voice signal.
The video communication device according to claim 16, wherein: said restoration module is further configured to receive and parse control information, acquire an image data processing mode; process the received data according to said image data processing mode and output; The image data processing mode includes: based on image semantic feature processing, or based on image encoding processing, or based on speech analysis processing.
The video communication device according to claim 17, wherein: said restoration module is further configured to, if said image data processing mode is based on speech analysis processing, perform semantic analysis on said speech coding information, convert it into image semantic feature information, and generate a video image according to the image semantic feature database.
The video communication device according to any one of claims 16 to 18, further comprising a training module configured to receive normal video data and to establish the image semantic feature database based on the normal video data.
A video communication system includes: a transmitting end and a receiving end;

The transmitting end comprises the video communication device according to any one of claims 11 to 15, and the receiving end comprises the video communication device according to any one of claims 16 to 19.