WO2017050067A1 - Video communication method, apparatus, and system - Google Patents

Video communication method, apparatus, and system

Info

Publication number
WO2017050067A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
information
semantic feature
video
image semantic
Prior art date
Application number
PCT/CN2016/095549
Other languages
French (fr)
Chinese (zh)
Inventor
谢峰
李乃鹏
陈一帅
郭宇春
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2017050067A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00: Television systems
    • H04N7/14: Systems for two-way working
    • H04N7/15: Conference systems

Definitions

  • This document relates to, but is not limited to, the field of video communication applications, and relates to a video communication method, apparatus and system.
  • Wireless video communication is a communication application mode that has arisen with the development of the mobile Internet and intelligent mobile terminal devices. Compared with traditional video communication systems, wireless video communication applications have strong scalability and greater flexibility: at any time and in any place, as long as the mobile device can access the network, a user can make video calls, hold video conferences, and so on in real time. However, unlike general video communication, this convenience means that wireless video transmission places higher requirements on network quality. The network must not only provide sufficient bandwidth for video transmission but also meet limits on time delay and bit error rate.
  • The development of wireless communication technologies and intelligent mobile terminals has enabled more and more users to use mobile terminals (mobile phones, tablets, notebook computers, special devices, etc.) for video communication.
  • When channel quality is good, current wireless video communication systems can guarantee basic communication quality, but they cannot modify the video information captured by the local camera (including image and voice) or the video information transmitted by the other party. When channel quality is poor, communication quality drops sharply, and even normal communication cannot be guaranteed.
  • The embodiments of the present invention provide a video communication method, apparatus and system that solve the problem in the related art that video communication cannot proceed normally when channel quality is poor.
  • An embodiment of the present invention provides a video communication method, including:
  • Before performing the image semantic feature processing on the video image, the method further includes: acquiring channel information of the communication channel, and determining, according to the channel information, whether image semantic feature processing is required for the video image;
  • if image semantic feature processing is not required for the video image, encoding the video image, acquiring image encoding information, and transmitting the image encoding information and the voice encoding information;
  • if image semantic feature processing is required for the video image, performing image semantic feature processing on the video image.
  • Before sending the image semantic feature information and the voice encoding information, the method further includes: determining, according to the channel information, whether a condition for transmitting the image semantic feature information is met;
  • the method further includes:
  • Before performing image semantic feature processing on the video image, the method further includes:
  • image semantic feature processing is performed on the video image, the image semantic feature of the user is hidden, replaced, or blurred, and the image semantic feature information is generated;
  • the video image is encoded, the image encoding information is acquired, and the image encoding information and the voice encoding information are transmitted.
  • the method further includes: sending an image data processing mode by using control information;
  • the image data processing mode includes processing based on image semantic features, or based on image encoding processing, or based on speech analysis processing.
  • the embodiment of the invention provides a video communication method, including:
  • calling an image semantic feature database, and generating a video image according to the image semantic feature information; wherein the image semantic feature database includes a mapping relationship between the image semantic feature information and video image fragments;
  • the video image and the voice signal are output.
  • the method further includes:
  • the received data is processed and output according to the image data processing mode; wherein the image data processing mode includes: based on image semantic feature processing, or based on image encoding processing, or based on speech analysis processing.
  • If the image data processing mode includes voice analysis processing, the method further includes: performing semantic analysis on the voice coding information, converting it into image semantic feature information, and generating the video image according to the image semantic feature database.
  • the method further includes: receiving normal video data, and establishing the image semantic feature database according to the normal video data.
  • the embodiment of the invention further provides a video communication method, including:
  • the transmitting end acquires a video image and a voice signal; performs image semantic feature processing on the video image to obtain image semantic feature information; encodes the voice signal to obtain voice coding information; and sends the image semantic feature information and the voice coding information;
  • the receiving end receives the image semantic feature information and the voice encoding information; calls an image semantic feature database, and generates a video image according to the image semantic feature information, wherein the image semantic feature database includes a mapping relationship between the image semantic feature information and video image fragments; generates a voice signal based on the voice encoding information; and outputs the video image and the voice signal.
  • the embodiment of the invention further provides a video communication device, including:
  • the acquisition module is configured to collect video images and voice signals
  • a processing module configured to perform image semantic feature processing on the video image, acquire image semantic feature information, and encode the voice signal to obtain voice coding information
  • a sending module configured to send the image semantic feature information and the voice encoding information
  • the device further includes:
  • the determining module is configured to obtain channel information of the communication channel, and determine, according to the channel information, whether image semantic feature processing is required for the video image; if image semantic feature processing is not required for the video image, the video image is encoded, image encoding information is acquired, and the image encoding information and the voice encoding information are transmitted; if image semantic feature processing is required for the video image, image semantic feature processing is performed on the video image to obtain the image semantic feature information.
  • the determining module is further configured to:
  • the device further includes:
  • the encryption module is configured to receive a control operation of the user, and determine, according to the control operation, whether the image semantic feature of the user needs to be kept secret; if the image semantic feature of the user needs to be kept secret, the processing module is triggered to perform image semantic feature processing on the video image to hide, replace, or blur the image semantic feature of the user and to generate the image semantic feature information; if the image semantic feature of the user does not need to be kept secret, the processing module is triggered to encode the video image, acquire image encoding information, and transmit the image encoding information and the voice encoding information.
  • the sending module is further configured to send the image data processing mode by using control information; wherein the image data processing mode comprises: based on image semantic feature processing, or based on image encoding processing, or based on voice analysis processing.
  • the embodiment of the invention provides a video communication device, including:
  • a receiving module configured to receive image semantic feature information and voice encoding information
  • a restoration module configured to invoke an image semantic feature database and generate a video image according to the image semantic feature information, wherein the image semantic feature database includes a mapping relationship between the image semantic feature information and video image fragments, and to generate a voice signal according to the voice encoding information;
  • an output module configured to output the video image and the voice signal.
  • the restoring module is further configured to receive and parse the control information, acquire an image data processing mode, process the received data according to the image data processing mode, and output the result; wherein the image data processing mode includes: based on image semantic feature processing, or based on image encoding processing, or based on speech analysis processing.
  • the restoring module is further configured to: if the image data processing mode is based on a voice analysis process, performing semantic analysis on the voice coded information, converting into image semantic feature information, and according to the image semantic feature database Generate a video image.
  • the apparatus further includes a training module configured to receive normal video data, and the image semantic feature database is established according to the normal video data.
  • the embodiment of the invention further provides a video communication system, including: a transmitting end and a receiving end;
  • the transmitting end includes the video communication device according to any one of the preceding claims, wherein the receiving end comprises the video communication device according to any one of the preceding claims.
  • the embodiment of the invention further provides a computer readable storage medium, where the computer readable storage medium stores computer executable instructions, and when the computer executable instructions are executed, the video communication method on the transmitting end side is implemented.
  • the embodiment of the present invention further provides a computer readable storage medium, where the computer readable storage medium stores computer executable instructions, and when the computer executable instructions are executed, the video communication method at the receiving end side is implemented.
  • the embodiment of the invention provides a new video communication method.
  • The transmitting end separates the collected video data to obtain a video image and a voice signal, performs image semantic feature processing on the video image to obtain image semantic feature information, and sends the image semantic feature information and the speech coding information; the receiving end calls the image semantic feature database, restores the video image according to the image semantic feature information, and outputs the video image together with the speech signal.
  • FIG. 1 is a schematic structural diagram of a video communication apparatus according to Embodiment 1 of the present invention.
  • FIG. 2 is another schematic structural diagram of a video communication apparatus according to Embodiment 1 of the present invention.
  • FIG. 3 is a schematic structural diagram of a video communication system according to Embodiment 1 of the present invention.
  • FIG. 4 is a flowchart of a video communication method according to Embodiment 2 of the present invention.
  • FIG. 5 is a flowchart of a sending end of a video communication method according to Embodiment 2 of the present invention.
  • FIG. 6 is a flowchart of a receiving end of a video communication method according to Embodiment 2 of the present invention.
  • FIG. 7 is a schematic diagram of a communication mode switching according to Embodiment 3 of the present invention.
  • FIG. 8 is a flowchart of a video communication method according to Embodiment 3 of the present invention.
  • Embodiment 1:
  • an embodiment of the present invention provides a video communication apparatus 11 including:
  • the acquiring module 111 is configured to collect video images and voice signals
  • the processing module 112 is configured to perform image semantic feature processing on the video image to acquire image semantic feature information, and encode the voice signal to obtain voice coding information.
  • the sending module 113 is configured to send image semantic feature information and voice coded information.
  • the sending end 11 in the foregoing embodiment further includes a determining module 114 configured to acquire channel information of a communication channel, and determine, according to the channel information, whether image semantic feature processing needs to be performed on the video image; if image semantic feature processing is not required for the video image, the video image is encoded, image encoding information is acquired, and the image encoding information and the speech encoding information are transmitted; if image semantic feature processing is required for the video image, image semantic feature processing is performed on the video image to obtain image semantic feature information.
  • the determining module 114 is further configured to determine, according to the channel information, whether the condition for transmitting the image semantic feature information is satisfied; if the condition is satisfied, the image semantic feature information and the voice coding information are sent together; if the condition is not satisfied, only the voice coding information is sent;
  • the image coding information and the voice coding information are transmitted;
  • the video communication device 11 further includes an encryption module 115 configured to receive a control operation of the user, and determine, according to the control operation, whether the image semantic feature of the user needs to be kept secret; if the image semantic feature of the user needs to be kept secret, the processing module is triggered to perform image semantic feature processing on the video image to hide, replace, or blur the image semantic feature of the user and to generate image semantic feature information; if the image semantic feature of the user does not need to be kept secret, the processing module is triggered to encode the video image, acquire image encoding information, and transmit the image encoding information and the voice encoding information.
  • the sending module 113 is further configured to send the image data processing mode by using the control information; wherein the image data processing mode comprises: based on image semantic feature processing, or based on image encoding processing, or based on speech analysis processing.
  • an embodiment of the present invention provides a video communication device 12, including:
  • the receiving module 121 is configured to receive image semantic feature information and voice encoding information
  • the restoration module 122 is configured to invoke the image semantic feature database to generate a video image according to the image semantic feature information; wherein the image semantic feature database includes a mapping relationship between the image semantic feature information and the video image fragment; and generating a voice signal according to the voice encoding information;
  • the output module 123 is configured to output a video image and a voice signal.
  • the restoration module 122 is further configured to receive and parse the control information, acquire an image data processing mode, process the received data according to the image data processing mode, and output the result; wherein the image data processing mode includes: based on image semantic feature processing, or based on image encoding processing, or based on speech analysis processing.
  • the received data differs according to the image data processing mode, and may be one or more of the following: image semantic feature information, image coding information, or voice coding information.
  • the restoring module 122 is further configured to: when the image data processing mode is based on voice analysis processing, perform semantic analysis on the voice coded information, convert it into image semantic feature information, and generate a video image according to the image semantic feature database.
  • the video communication device 12 further includes a training module 124 configured to receive normal video data and establish an image semantic feature database based on normal video data.
  • the embodiment of the present invention further provides a video communication system, including a transmitting end 1 and a receiving end 2;
  • the transmitting end 1 comprises any of the aforementioned video communication devices 11, and the receiving end 2 comprises any of the aforementioned video communication devices 12.
  • Embodiment 2:
  • the video communication method provided by the embodiment of the present invention includes:
  • Step S201 the transmitting end acquires a video image and a voice signal; performs image semantic feature processing on the video image to acquire image semantic feature information; encodes the voice signal to obtain voice coded information; and sends image semantic feature information and voice coded information;
  • Step S202 The receiving end receives the image semantic feature information and the voice encoding information; invokes the image semantic feature database, and generates a video image according to the image semantic feature information, wherein the image semantic feature database includes a mapping relationship between the image semantic feature information and video image fragments; generates a voice signal according to the voice encoding information; and outputs the video image and the voice signal.
  • the embodiment of the video communication method provided by the embodiment of the present invention includes:
  • Step S301 collecting a video image and a voice signal
  • Step S302 performing image semantic feature processing on the video image to acquire image semantic feature information; encoding the voice signal to obtain voice coding information;
  • Step S303 Send image semantic feature information and voice coding information.
  • Before step S302, the method further includes: acquiring channel information of the communication channel, and determining, according to the channel information (e.g., channel quality, information delay, channel loss rate), whether image semantic feature processing needs to be performed on the video image; if image semantic feature processing is not required for the video image, the video image is encoded (using a commonly used codec such as H.264 or H.265) to obtain image coding information.
  • That is, before step S302 it may first be determined whether image semantic feature processing needs to be performed on the video image: if it is required, step S302 is performed; if it is not required, another flow is executed, namely the flow of encoding the video image.
  • Before step S303, the method further includes: determining, according to the channel information, whether a condition for transmitting the image semantic feature information is met;
  • if the condition for transmitting the image semantic feature information is satisfied, the image semantic feature information and the voice encoding information are sent together; if the condition is not satisfied, only the voice encoding information is sent;
  • the method further includes:
  • Before step S302 performs image semantic feature processing on the video image, the method further includes: receiving a control operation of the user, and determining, according to the control operation, whether the image semantic feature of the user needs to be kept secret; if the image semantic feature of the user needs to be kept secret, image semantic feature processing is performed on the video image, the image semantic feature of the user is hidden, replaced, or blurred, and the image semantic feature information is generated;
  • if the image semantic feature of the user does not need to be kept secret, the video image is encoded, the image encoding information is acquired, and the image encoding information and the voice encoding information are transmitted.
  • the method further includes: transmitting the image data processing mode by using the control information; wherein the image data processing mode comprises: based on image semantic feature processing, or based on image encoding processing, or based on speech analysis processing.
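
As a non-limiting illustration, the transmitting-end decisions described above (channel check, privacy check, and signalling the image data processing mode in control information) can be sketched in Python. The `quality` score, both thresholds, the precedence between the channel check and the privacy check, and the payload field names are assumptions made for this sketch; the description does not specify any of them.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    """The three image data processing modes named in the description."""
    SEMANTIC = "image_semantic_feature"
    ENCODED = "image_encoding"
    VOICE_ANALYSIS = "voice_analysis"


@dataclass
class ChannelInfo:
    # Hypothetical score in [0.0, 1.0]; it could be derived from channel
    # quality, delay, and loss rate, but the source gives no formula.
    quality: float


# Hypothetical thresholds, chosen only for illustration.
GOOD_QUALITY = 0.7
MIN_QUALITY_FOR_FEATURES = 0.3


def choose_mode(channel: ChannelInfo, keep_private: bool) -> Mode:
    """Pick a processing mode from channel information and the user's privacy choice."""
    if channel.quality < MIN_QUALITY_FOR_FEATURES:
        # Condition for sending feature information not met: send voice only.
        return Mode.VOICE_ANALYSIS
    if keep_private or channel.quality < GOOD_QUALITY:
        return Mode.SEMANTIC
    return Mode.ENCODED


def build_payload(mode: Mode, voice_code: bytes,
                  image_features: Optional[dict] = None,
                  image_code: Optional[bytes] = None) -> dict:
    """Assemble what is sent; the 'mode' field plays the role of control information."""
    payload = {"mode": mode.value, "voice": voice_code}
    if mode is Mode.SEMANTIC:
        payload["image_features"] = image_features
    elif mode is Mode.ENCODED:
        payload["image_code"] = image_code
    return payload
```

In this sketch the voice coding information is always sent, and the image part of the payload depends only on the selected mode, which mirrors the three branches described in the text.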
  • the embodiment of the video communication method provided by the embodiment of the present invention at the receiving end includes:
  • Step S401 Receive image semantic feature information and voice coding information
  • Step S402 Invoking an image semantic feature database, and generating a video image according to the image semantic feature information; wherein the image semantic feature database includes a mapping relationship between the image semantic feature information and the video image fragment; and generating a voice signal according to the voice encoding information;
  • Step S403 Output a video image and a voice signal.
  • the method further includes: receiving and parsing the control information, and acquiring an image data processing mode; processing the received data according to the image data processing mode and outputting the result; the image data processing mode includes: based on image semantic feature processing, or based on image encoding processing, or based on speech analysis processing.
  • the method further includes: if the image data processing mode includes voice analysis processing, performing semantic analysis on the voice coding information, converting it into image semantic feature information, and generating the video image according to the image semantic feature database.
  • the method further comprises: receiving normal video data, and establishing an image semantic feature database according to the normal video data.
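
As a non-limiting illustration, the receiving-end dispatch on the image data processing mode can be sketched as follows. The mode strings, the payload layout, and the representation of features and video image fragments as plain strings are assumptions of this sketch; `decode_image` and `analyze_voice` are hypothetical stand-ins for a real image decoder and a speech semantic analyzer.

```python
from typing import Callable, Dict, List

# Hypothetical database: image semantic feature -> video image fragment.
FeatureDatabase = Dict[str, str]


def generate_from_features(features: List[str], db: FeatureDatabase) -> List[str]:
    """Map each known feature to its stored fragment; unknown features are skipped."""
    return [db[f] for f in features if f in db]


def handle_received(payload: dict, db: FeatureDatabase,
                    decode_image: Callable[[bytes], List[str]],
                    analyze_voice: Callable[[bytes], List[str]]) -> List[str]:
    """Produce the video image fragments for one received payload."""
    mode = payload["mode"]  # parsed from the control information
    if mode == "image_semantic_feature":
        return generate_from_features(payload["image_features"], db)
    if mode == "image_encoding":
        return decode_image(payload["image_code"])
    if mode == "voice_analysis":
        # Speech is analyzed into image semantic features, then looked up.
        return generate_from_features(analyze_voice(payload["voice"]), db)
    raise ValueError(f"unknown image data processing mode: {mode}")
```

The voice signal itself would be decoded separately from the voice coding information in every mode; only the image path is shown here.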
  • This embodiment proposes a wireless video communication system based on image semantic feature extraction and reproduction of video content. It can ensure normal communication even when local channel quality is poor, and it also gives users the opportunity to modify and change both the local video and the other party's video, in order to achieve a better user experience.
  • the design idea of the wireless video communication system is to add a set of video image semantic feature extraction and reproduction module on the current wireless communication system, so as to use the copy of the video signal to extract the semantic features of the video image without affecting
  • the mode control module can freely switch to the sub-channel of the video image semantic feature extraction module for video communication. It can be embedded in the wireless communication system as part of the entire communication system or as a plug-in, which increases the flexibility of use and reduces the cost of retrofitting the wireless communication system.
  • the whole module mainly includes functional modules such as mode control, video image semantic feature extraction, feature database and feature synthesis.
  • the video image semantic feature extraction module of the transmitting end and the receiving end should be a module with the same function, and the image detection, feature extraction and the like follow the same algorithm and standard.
  • the mode control module controls the complete video image semantic feature extraction and reproduction module; it receives channel quality feedback from the transmitting end and the receiving end (e.g., signal strength information, channel quality information, delay information, buffer status information, mobile status information) and is responsible for turning on or switching the various communication modes.
  • the video image feature information extraction module is configured to parse the video image signal and perform feature detection, feature extraction, image cutting, etc. on the scene, the characters, the expressions, and the like in the video image; it may store the processed feature prototypes and feature information in the feature database and pass the feature information to the transmitting end.
  • An implementation manner includes: the video image feature information extraction module directly obtains a copy of the transmission video from an upper layer of the transmitting end, and then parses the video image signal according to the system configuration, and extracts a feature prototype and feature information in the required video image.
  • This extraction process may be a link in the video transmission process, that is, only transmitting the feature information, or may be independent of the video transmission process, that is, only to extract the feature prototype and not interfere with the video communication.
  • the feature database is configured to store feature prototypes and feature information transmitted by the video semantic feature information extraction module, and classify and store various feature prototypes and feature information according to system configuration, and transmit control signals according to the video feature synthesis module when needed. (or feature information) provides a feature prototype to the feature synthesis module.
  • the feature prototype can be a mathematical model or a cropped image.
  • the feature synthesis module will recombine the feature prototypes passed by the feature database according to the system configuration, and then combine a complete image with the voice signal and send it to the video application to complete the video communication task.
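
As a non-limiting illustration, the interplay between the feature database and the feature synthesis module can be sketched as follows. The category names, the fixed layering order, and the use of file-name strings as feature prototypes are assumptions of this sketch; the description only states that a prototype can be a mathematical model or a cropped image.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class FeatureStore:
    """Classified storage: category -> feature information -> feature prototype."""

    def __init__(self) -> None:
        self._prototypes: Dict[str, Dict[str, str]] = defaultdict(dict)

    def store(self, category: str, feature_info: str, prototype: str) -> None:
        self._prototypes[category][feature_info] = prototype

    def lookup(self, category: str, feature_info: str) -> Optional[str]:
        # .get() avoids creating empty categories on a miss.
        return self._prototypes.get(category, {}).get(feature_info)


# Hypothetical layering order used when recombining prototypes into a frame.
LAYER_ORDER = ("background", "character", "expression")


def synthesize_frame(store: FeatureStore, feature_info: Dict[str, str]) -> List[str]:
    """Return the prototypes to composite, back to front, for one frame."""
    layers = []
    for category in LAYER_ORDER:
        info = feature_info.get(category)
        prototype = store.lookup(category, info) if info is not None else None
        if prototype is not None:
            layers.append(prototype)
    return layers
```

A real synthesis step would composite the returned prototypes into a picture and pair it with the voice signal; here the ordered list stands in for that picture.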
  • Transmitting end: the video application directly performs image encoding and voice encoding through the main channel and hands the result to the transmitting end for transmission to the receiving end over the channel. The mode control module does not interfere with the video communication and does not pass a video copy to the video feature extraction module.
  • Receiving end: the received image encoding information and voice encoding information are handed directly to the video application to complete the video communication, and the mode control module passes a copy of the video to the video feature extraction module at the receiving end. The video feature extraction module parses the video image (or image plus speech) according to the system's default configuration and other information, extracts image prototypes and feature information, and sends them to the feature database; this is mainly used to create and maintain the feature database.
  • The transmitting end or the receiving end constantly monitors the channel quality; the mode control module obtains the channel feedback of the transmitting end or the receiving end, and the system may enter the analog communication mode at any time according to that feedback.
  • Transmitting end: at this time, the video image signal and the voice signal submitted by the upper-layer video application are processed differently in the mode control module, and the video image (or image plus voice) is passed to the video feature extraction module to extract the feature information.
  • The voice signal is speech-coded to obtain speech coding information, and the image feature information and the speech coding information are then delivered to the transmitting end and transmitted over the channel.
  • the video image information sent by the transmitting end will all come from the feature extraction module.
  • After the mode control module obtains the video image feature information, it hands it over to the feature synthesis module.
  • The feature synthesis module uses the received video image feature information to analyze the current picture state of the video, and then synthesizes a complete video picture from the pre-saved feature prototypes (image templates) obtained from the feature database according to the received image feature information. The picture is then sent to the upper-layer video application along with the decoded speech signal.
  • The speech signal can also be fed into the feature synthesis module as an aid, so that analysis of the speech improves the synthesis of the video picture, for example to better match the video picture (e.g., mouth shape) with the speech.
  • the mode control module will turn on the hybrid communication mode according to a judgment criterion.
  • the mode control module performs fast switching between normal communication and analog communication according to a time parameter configuration.
  • the time parameter configuration can be determined according to channel status or artificial regulations.
  • The first sub-mode switches the processing mode of the video images according to the control information of normal communication and analog communication. The other sub-mode does not open the main channel even in normal communication (i.e., when the encoded information of the video image is transmitted on the channel).
  • the video image after communication decoding is sent to the video image feature extraction module to continuously update the feature database.
  • the feature extraction module sends the feature information to the feature synthesis module for analog communication video frame synthesis, and the feature synthesis module sends the synthesized video image to the upper layer video application.
  • the mode control module sends the received image feature information to the feature synthesis module for analog communication video frame synthesis.
  • the purpose of this sub-mode is to provide users with consistent picture quality, so as to avoid a poor user experience caused by fast switching between normal communication and analog communication.
  • Feature prototypes and feature information pre-saved in the feature database may be created and maintained during normal communication, or may have been created for different users or proprietary channels, for example in the form of files (packages).
  • In this state, the mode control module completely turns off or ignores the video signal, and only encodes the voice signal and sends it to the receiving end through the channel.
  • the mode control module passes the received voice signal (decoded from the voice encoding information) to the feature synthesis module, which uses semantic analysis to infer the likely state of the video frame at this time and, using the feature information and image prototypes in the database, directly synthesizes the video picture and sends it to the video application along with the voice signal to maintain minimal video communication.
  • If poor channel communication needs to be supported, then during normal communication, analog communication, or hybrid communication the receiving end needs to input the voice signal into the feature extraction module when creating or maintaining the feature database, so as to establish the correspondence between feature information based on voice analysis and image feature prototypes.
  • the video parsing includes detecting character features, character expression features, background features, etc. in the video image, and then extracting the corresponding image feature prototypes and feature information into the feature database; at the same time, the speech content is semantically analyzed, and the semantic features are extracted and stored in one-to-one correspondence with the feature information of the video image.
  • the feature information pre-stored in the feature database may be created and maintained during normal communication before, or may be feature information that has been created or acquired for different users or proprietary channels.
  • Modeling of the feature database is done, for example, before the user switches from a high throughput network to a low throughput network.
  • the feature database can be matched or merged according to the identity of the caller, the geographical location, the time, or the image recognition result in order to maintain it, and can be used to enter the analog communication state or the poor channel communication state as soon as the communication is initiated.
  • the user decides that information in the current communication must be kept confidential from other communication parties that do not have a feature database.
  • the video communication method provided by the embodiment of the present invention includes:
  • Step S501 The user sets a communication mode.
  • the established video communication may be two-person communication or video conferencing, especially multi-person video conferencing.
  • the mode control module passes the user configuration to the video image feature extraction module; the image capture device such as the camera is turned on, and the visual signal is collected.
  • the video first enters the video image feature extraction module and does not enter the transmitting end through the main channel.
  • the sender sends a connection request to the receiver to request video communication.
  • Step S502 The sender transmits the video data in a secure manner
  • the transmitting end detects each frame of the picture, finds the feature to be encrypted according to the user's requirement, cuts the picture, extracts the transmittable image, and then combines the extracted video image and sends it, as the final video information, to the transmitting end to enter the channel.
  • Method 1: The video image feature extraction module at the transmitting end detects each frame of the image and, after finding the feature that needs to be kept secret according to the user's needs, cuts the image and extracts the transmittable image by hiding, replacing, or blurring the feature that needs to be kept secret. The extracted video image is then encoded. At the same time, the image feature extraction module also outputs image feature information, and the image encoding information, the image feature information, and the speech encoding are then sent to the channel.
  • Method 2: The video image feature extraction module at the transmitting end extracts feature information from the video image, replaces the part of the feature information related to the feature that needs to be kept secret with feature information that does not need to be kept secret, and then sends the feature information and the voice code to the transmitting end to enter the channel.
  • Step S503 The receiving end receives the video data.
  • the sender and the receiver must always check the channel quality after powering on, and feedback the channel quality in time.
  • the control module selects the corresponding communication mode according to the channel quality feedback.
  • on the one hand, the video signal is re-modified in the mode control module, with the modification performed by the user or by the system by default, and is then passed to the upper-layer video application through the main channel; on the other hand, a copy of the video signal is sent to the video feature extraction module at the receiving end.
  • Method 1 (corresponding to Method 1 of the foregoing transmitting end): the receiving end decodes the received image encoding information to obtain an image signal, and synthesizes the image from it together with the image feature information in the feature synthesis module. It also decodes the received speech encoding information to obtain a voice signal, and finally outputs the image signal and the voice signal to an upper-layer application or an external device.
  • Method 2 (corresponding to Method 2 of the foregoing transmitting end): the receiving end sends the received image feature information to the feature synthesis module, which synthesizes the image based on the feature database. It also decodes the received speech encoding information to obtain a voice signal, and finally outputs the image signal and the voice signal to an upper-layer application or an external device.
  • Step S504 The receiving end establishes a feature database
  • After receiving the copy of the video signal, the receiving end determines the current communication mode and the feature extraction mode according to the control information in the signal. After learning that the communication is currently encrypted, the module starts the feature extraction operation and cuts the video image. At the same time, it performs semantic analysis on the concurrent speech signal and analyzes the user's mood characteristics at that moment. After these are matched with the image features from the same moment, the image features and semantic features are paired one by one and transmitted to the feature database, completing the feature database modeling.
  • the video application at the receiving end directly receives the video signal and communicates.
  • Step S505 the communication mode is switched to analog communication, and the video communication is continued;
  • the mode control module automatically switches the system to analog communication.
  • After the sender's video feature extraction module obtains the analog communication command, it judges the user's expression state by combining semantic analysis and image analysis, extracts the user's expression feature from the video image, replaces the current expression feature with a pre-agreed feature code or feature representation, and then matches it with the voice signal and passes it to the transmitting end.
  • the transmitting end directly sends the compressed video signal transmitted by the feature extraction module to the channel, and at this time, the main channel of the transmitting end does not transmit any video information.
  • the mode selection module at the receiving end sends the signal directly to the receiving end video feature extraction module after obtaining the video signal, and simultaneously cuts off the main channel.
  • the feature extraction module at the receiving end extracts, from the feature database, the user expression image template saved in the normal communication state according to the feature code or feature representation in the signal, and sends it to the feature synthesis module for image synthesis.
  • After obtaining the image template, the feature synthesis module performs image synthesis according to the feature information, and then combines the result with the voice and sends it directly to the video application to complete the communication.
  • Step S506 The communication mode is switched to the poor channel communication, and the video communication is continued;
  • the mode control module automatically switches the system to the poor channel communication mode.
  • the feature extraction module at the transmitting end directly strips the picture information in the video signal; the voice signal is greatly compressed and directly sent to the channel through the transmitting end.
  • After receiving the voice signal, the mode control module of the receiving end sends it directly to the video feature extraction module, and simultaneously cuts off the main channel.
  • the video feature processing module at the receiving end performs semantic analysis on the received speech signal, derives a feature code or feature representation, extracts from the feature database the user expression image template saved in the normal communication state, and sends it to the feature synthesis module for image synthesis. After the feature synthesis module obtains the image template, it synthesizes the image according to the feature information, combines it with the voice, and sends it directly to the video application of the receiving end to complete the communication.
  • the user has established a feature database, and the user decides that part of the information in the communication must be kept confidential from the recipient.
  • the video communication method provided by the embodiment of the present invention includes:
  • Step S601 The user sets the communication mode.
  • the established video communication may involve two-person communication, video conferencing, and especially multi-person video conferencing.
  • the mode control module passes the user configuration to the video image feature extraction module; the image capture device such as the camera is turned on, and the visual signal is collected.
  • the video first enters the video image feature extraction module and does not enter the transmitting end through the main channel.
  • the sender sends a connection request to the receiver to request video communication.
  • Step S602 The transmitting end encrypts and transmits the video data.
  • the sender detects each frame of the picture, finds the feature to be encrypted according to the user's needs, cuts the picture, and extracts the transmittable image by hiding, replacing, or blurring the feature that needs to be encrypted. The extracted video image is then encoded and sent to the transmitting end to enter the channel along with the speech code.
  • Step S603 The receiving end receives the video data.
  • the receiving end decodes the received image encoding information to obtain an image signal, decodes the received voice encoding information to obtain a voice signal, and outputs the image signal and the voice signal to an upper-layer application or an external device.
  • the embodiment of the invention provides a new video communication method.
  • the transmitting end separates the collected video data to obtain a video image and a voice signal, performs image semantic feature processing on the video image to obtain image semantic feature information, and sends the image semantic feature information and the speech encoding information; the receiving end calls the image semantic feature database, restores the video image according to the image semantic feature information, and outputs it together with the voice signal to complete video reception.
  • the embodiment of the invention further provides a computer readable storage medium, wherein the computer readable storage medium stores computer executable instructions, and when the computer executable instructions are executed, the video communication method on the transmitting end side is implemented.
  • the embodiment of the present invention further provides a computer readable storage medium, where the computer readable storage medium stores computer executable instructions, and when the computer executable instructions are executed, the video communication method at the receiving end side is implemented.
  • the steps may be carried out by a program instructing related hardware (e.g., a processor), and the program can be stored in a computer readable storage medium, such as a read only memory, a magnetic disk, or an optical disk.
  • all or part of the steps of the above embodiments may also be implemented using one or more integrated circuits.
  • each module/unit in the above embodiment may be implemented in the form of hardware, for example, by an integrated circuit that implements its corresponding function, or may be implemented in the form of a software function module, for example, by a processor executing programs/instructions stored in a memory to achieve its corresponding function.
  • This application is not limited to any specific combination of hardware and software.
  • a person skilled in the art should understand that the technical solutions of the present application can be modified or equivalently replaced without departing from the spirit and scope of those technical solutions, and that such modifications and replacements should be included in the scope of the claims of the present application.
  • the above technical solution greatly reduces the requirement for communication resources and can continue to provide normal video images when the channel quality is poor, which solves the problem in the related art that video communication cannot proceed normally when the channel quality is poor, and enhances the user experience.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

A video communication method, apparatus, and system. The method comprises: a sending end collects a video image and a voice signal, performs image semantic feature processing on the video image to acquire image semantic feature information, encodes the voice signal, acquires voice encoding information, and sends the image semantic feature information and the voice encoding information; a receiving end receives the image semantic feature information and the voice encoding information, calls an image semantic feature database, and generates a video image according to the image semantic feature information, the image semantic feature database comprising a mapping relationship between image semantic feature information and video image fragments, generates a voice signal according to the voice encoding information, and outputs the video image and the voice signal. According to the technical solution, in a transmission process, only image semantic feature information and voice encoding information are transmitted, and when the quality of a channel is relatively poor, a normal video image can also be continuously provided, thereby resolving the problem that video communication cannot be carried out normally when the quality of a channel is poor in the related art.

Description

Video communication method, apparatus, and system
Technical field
This document relates to, but is not limited to, the field of video communication applications, and in particular to a video communication method, apparatus, and system.
Background
Wireless video communication is a communication application mode that has arisen with the development of the mobile Internet and intelligent mobile terminal devices. Compared with traditional video communication systems, wireless video communication is highly scalable and more flexible: at any time and in any place, as long as the mobile device can access the network, users can make video calls, hold video conferences, and so on in real time. However, unlike general video communication, this convenience places higher demands on network quality. The network must not only provide sufficient bandwidth for video transmission, but also meet requirements on delay and limits on bit error rate. Compressed video is very sensitive to transmission errors (such as packet loss) and has very strict delay requirements, whereas the inherently high bit error rate, severe channel interference, limited transmission bandwidth, and large fluctuations of wireless channels make it difficult to guarantee reliable quality of service for video transmission.
The development of wireless communication technologies and intelligent mobile terminals has led more and more users to use mobile terminals (mobile phones, tablets, notebook computers, dedicated devices, etc.) for video communication. Current wireless video communication systems can guarantee basic communication quality when the channel quality is good, but they cannot yet modify the video information (including image and voice) captured by the local camera or transmitted by the other party; and when the channel quality deteriorates, the communication quality drops sharply, to the point that normal communication cannot be guaranteed.
Summary of the invention
The following is an overview of the subject matter described in detail in this document. This overview is not intended to limit the scope of protection of the claims.
Embodiments of the present invention provide a video communication method, apparatus, and system, which solve the problem in the related art that video communication cannot proceed normally when the channel quality is poor.
An embodiment of the present invention provides a video communication method, including:
collecting a video image and a voice signal;
performing image semantic feature processing on the video image to acquire image semantic feature information; encoding the voice signal to acquire voice encoding information;
sending the image semantic feature information and the voice encoding information.
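The transmitting-side steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `Packet`, `extract_semantic_features`, `encode_voice`, and `build_packet` are hypothetical names, the "frame" is a dict that already carries labels instead of pixels to be analyzed, and `zlib` only stands in for a real speech codec.

```python
import json
import zlib
from dataclasses import dataclass


@dataclass
class Packet:
    semantic_info: bytes  # serialized image semantic feature information
    voice_info: bytes     # voice encoding information


def extract_semantic_features(frame: dict) -> dict:
    """Hypothetical stand-in for image semantic feature processing.

    A real system would analyze pixels (face, expression, background);
    here the frame already carries labels so the sketch stays runnable.
    """
    return {"expression": frame["expression"], "mouth": frame["mouth"]}


def encode_voice(pcm: bytes) -> bytes:
    # Placeholder for a speech codec; zlib is only an assumption for the demo.
    return zlib.compress(pcm)


def build_packet(frame: dict, pcm: bytes) -> Packet:
    """Collect -> semantic feature processing -> voice encoding -> packet."""
    features = extract_semantic_features(frame)
    semantic_info = json.dumps(features, sort_keys=True).encode("utf-8")
    return Packet(semantic_info, encode_voice(pcm))


packet = build_packet(
    {"expression": "smile", "mouth": "open", "pixels": b"\x00" * 10_000},
    pcm=b"\x01\x02" * 800,
)
```

The point of the sketch is that the packet carries only a few bytes of feature labels instead of the pixel data, which is where the claimed saving in communication resources would come from.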
Optionally, before performing image semantic feature processing on the video image, the method further includes: acquiring channel information of the communication channel, and determining, according to the channel information, whether image semantic feature processing needs to be performed on the video image;
if image semantic feature processing does not need to be performed on the video image, encoding the video image to acquire image encoding information, and sending the image encoding information and the voice encoding information;
if image semantic feature processing needs to be performed on the video image, performing image semantic feature processing on the video image.
Optionally, before sending the image semantic feature information and the voice encoding information, the method further includes: determining, according to the channel information, whether a condition for sending the image semantic feature information is met;
if the condition for sending the image semantic feature information is met, sending the image semantic feature information together with the voice encoding information;
if the condition for sending the image semantic feature information is not met, sending only the voice encoding information;
before sending the image encoding information and the voice encoding information, the method further includes:
determining, according to the channel information, whether a condition for sending the image encoding information is met;
if the condition for sending the image encoding information is met, sending the image encoding information and the voice encoding information;
if the condition for sending the image encoding information is not met, sending only the voice encoding information.
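The two channel-based decisions described above (whether to use semantic feature processing, and whether the resulting image data may be sent at all) can be sketched as a small decision function. The source gives no concrete criteria, so the `ChannelInfo` fields and every threshold below are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ChannelInfo:
    bandwidth_kbps: float  # hypothetical measurement
    loss_rate: float       # hypothetical measurement, 0.0 to 1.0


# Hypothetical thresholds; the source does not specify any values.
MAX_LOSS_FOR_IMAGE_ENCODING = 0.05
MIN_KBPS = {"image_encoding": 500.0, "image_semantic_features": 20.0}


def needs_semantic_processing(channel: ChannelInfo) -> bool:
    # Decision 1: is the channel too lossy for ordinary image encoding?
    return channel.loss_rate > MAX_LOSS_FOR_IMAGE_ENCODING


def can_send(kind: str, channel: ChannelInfo) -> bool:
    # Decision 2: does the channel meet the sending condition for this kind?
    return channel.bandwidth_kbps >= MIN_KBPS[kind]


def select_payload(channel: ChannelInfo) -> tuple[str, ...]:
    if needs_semantic_processing(channel):
        kind = "image_semantic_features"
    else:
        kind = "image_encoding"
    if can_send(kind, channel):
        return (kind, "voice_encoding")
    return ("voice_encoding",)  # condition not met: voice only
```

For example, `select_payload(ChannelInfo(5.0, 0.2))` falls through to voice only, which corresponds to the case where the condition for sending the image semantic feature information is not met.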
Optionally, before performing image semantic feature processing on the video image, the method further includes:
receiving a control operation of the user, and determining, according to the control operation, whether the user's image semantic features need to be kept secret;
if the user's image semantic features need to be kept secret, performing image semantic feature processing on the video image, hiding, replacing, or blurring the user's image semantic features, and generating the image semantic feature information;
if the user's image semantic features do not need to be kept secret, encoding the video image to acquire image encoding information, and sending the image encoding information and the voice encoding information.
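One way to picture the hide, replace, and blur options is as a filter over a dictionary of extracted features. The sketch below is an assumption-laden illustration: `protect_features`, the feature names, and the `"blurred"` marker are hypothetical, and the source does not say how features are represented.

```python
def protect_features(features: dict, secret: set, mode: str = "hide",
                     replacement: str = "neutral") -> dict:
    """Return a copy of `features` with the secret ones hidden, replaced, or blurred."""
    if mode not in ("hide", "replace", "blur"):
        raise ValueError(f"unknown mode: {mode}")
    protected = {}
    for name, value in features.items():
        if name not in secret:
            protected[name] = value        # not secret: pass through unchanged
        elif mode == "replace":
            protected[name] = replacement  # substitute a non-secret value
        elif mode == "blur":
            protected[name] = "blurred"    # hypothetical marker for the receiver
        # mode == "hide": the secret feature is simply omitted
    return protected


features = {"expression": "angry", "background": "office"}
hidden = protect_features(features, {"background"}, mode="hide")
replaced = protect_features(features, {"background"}, mode="replace",
                            replacement="plain_wall")
```

Because the filter runs before anything is sent, the receiving end never sees the original value of a secret feature.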
Optionally, the method further includes: sending the image data processing mode by means of control information;
the image data processing mode includes: processing based on image semantic features, or processing based on image encoding, or processing based on speech analysis.
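The three image data processing modes can be signaled compactly. The following sketch assumes a one-byte control field; the field layout, the numeric values, and the names `ImageDataMode`, `pack_control`, and `parse_control` are all hypothetical, since the source does not define the control information format.

```python
import struct
from enum import IntEnum


class ImageDataMode(IntEnum):
    SEMANTIC_FEATURES = 0  # processing based on image semantic features
    IMAGE_ENCODING = 1     # processing based on image encoding
    SPEECH_ANALYSIS = 2    # processing based on speech analysis


def pack_control(mode: ImageDataMode) -> bytes:
    # Assumed layout: a single unsigned byte carrying the mode.
    return struct.pack("!B", mode)


def parse_control(data: bytes) -> ImageDataMode:
    (value,) = struct.unpack("!B", data[:1])
    return ImageDataMode(value)  # raises ValueError for an unknown mode


control = pack_control(ImageDataMode.SPEECH_ANALYSIS)
mode = parse_control(control)
```

The receiving end would call `parse_control` first and then choose how to handle the rest of the received data.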
An embodiment of the present invention provides a video communication method, including:
receiving image semantic feature information and voice encoding information;
calling an image semantic feature database, and generating a video image according to the image semantic feature information, where the image semantic feature database includes a mapping relationship between the image semantic feature information and video image fragments;
generating a voice signal according to the voice encoding information;
outputting the video image and the voice signal.
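On the receiving side, the database lookup described above can be sketched as a mapping from feature information to video image fragments. Everything concrete here is assumed for illustration: the JSON encoding of the feature information, the `(name, value)` keys, and the file names standing in for image fragments.

```python
import json

# Hypothetical image semantic feature database:
# (feature name, feature value) -> video image fragment (a file name here).
FEATURE_DB = {
    ("expression", "smile"): "face_smile.png",
    ("expression", "neutral"): "face_neutral.png",
    ("mouth", "open"): "mouth_open.png",
    ("mouth", "closed"): "mouth_closed.png",
}


def restore_frame(semantic_info: bytes, db: dict) -> list:
    """Look up the fragment for each received feature; skip unknown features."""
    features = json.loads(semantic_info)
    return [
        db[(name, value)]
        for name, value in sorted(features.items())
        if (name, value) in db
    ]


fragments = restore_frame(b'{"expression": "smile", "mouth": "open"}', FEATURE_DB)
```

A real synthesizer would composite the fragments into a frame; the sketch stops at selecting them, which is the part the mapping relationship determines.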
Optionally, the method further includes:
receiving and parsing control information to acquire an image data processing mode;
processing the received data according to the image data processing mode and outputting the result, where the image data processing mode includes: processing based on image semantic features, or processing based on image encoding, or processing based on speech analysis.
Optionally, if the image data processing mode is processing based on speech analysis, the method further includes: performing semantic analysis on the voice encoding information, converting it into image semantic feature information, and generating a video image according to the image semantic feature database.
Optionally, the method further includes: receiving normal video data, and establishing the image semantic feature database according to the normal video data.
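In the speech analysis mode, only voice arrives and the receiving end has to infer image features from it. The sketch below assumes that a transcript is already available from some speech recognizer (not shown) and uses a hypothetical keyword table; the source does not describe how the semantic analysis is actually performed.

```python
# Hypothetical keyword table linking spoken words to image semantic features.
KEYWORD_TO_FEATURES = {
    "great": {"expression": "smile"},
    "sorry": {"expression": "sad"},
}
DEFAULT_FEATURES = {"expression": "neutral"}


def features_from_transcript(transcript: str) -> dict:
    """Convert recognized speech into image semantic feature information."""
    features = dict(DEFAULT_FEATURES)
    for word in transcript.lower().split():
        # Later keywords overwrite earlier ones for the same feature.
        features.update(KEYWORD_TO_FEATURES.get(word.strip(".,!?"), {}))
    return features


inferred = features_from_transcript("That is great!")
```

The resulting dictionary could then be fed into the same database lookup that is used when feature information is received directly.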
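Building the database from normal video data amounts to recording, for each observed feature value, the image fragment that showed it. This is a minimal sketch assuming that feature extraction and fragment cutting have already happened; `build_feature_database` and the sample data are hypothetical.

```python
def build_feature_database(frames) -> dict:
    """Pair feature information with image fragments seen during normal communication.

    `frames` is an iterable of (features, fragments) pairs, where both are dicts
    keyed by feature name. The most recent fragment for a given feature value wins.
    """
    db = {}
    for features, fragments in frames:
        for name, value in features.items():
            if name in fragments:
                db[(name, value)] = fragments[name]
    return db


normal_frames = [
    ({"expression": "smile", "mouth": "open"},
     {"expression": "face_001.png", "mouth": "mouth_001.png"}),
    ({"expression": "neutral", "mouth": "open"},
     {"expression": "face_002.png", "mouth": "mouth_002.png"}),
]
database = build_feature_database(normal_frames)
```

Letting later frames overwrite earlier ones is one possible design choice; it keeps the stored fragments close to the user's current appearance.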
An embodiment of the present invention further provides a video communication method, including:
a transmitting end collects a video image and a voice signal; performs image semantic feature processing on the video image to acquire image semantic feature information; encodes the voice signal to acquire voice encoding information; and sends the image semantic feature information and the voice encoding information;
a receiving end receives the image semantic feature information and the voice encoding information; calls an image semantic feature database and generates a video image according to the image semantic feature information, where the image semantic feature database includes a mapping relationship between the image semantic feature information and video image fragments; generates a voice signal according to the voice encoding information; and outputs the video image and the voice signal.
An embodiment of the present invention further provides a video communication apparatus, including:
a collection module, configured to collect a video image and a voice signal;
a processing module, configured to perform image semantic feature processing on the video image to acquire image semantic feature information, and to encode the voice signal to acquire voice encoding information;
a sending module, configured to send the image semantic feature information and the voice encoding information.
Optionally, the apparatus further includes:
a determining module, configured to acquire channel information of the communication channel, and determine, according to the channel information, whether image semantic feature processing needs to be performed on the video image; if image semantic feature processing does not need to be performed on the video image, encode the video image to acquire image encoding information, and send the image encoding information and the voice encoding information; if image semantic feature processing needs to be performed on the video image, perform image semantic feature processing on the video image to acquire the image semantic feature information.
Optionally, the determining module is further configured to:
determine, according to the channel information, whether a condition for sending the image semantic feature information is met; if the condition is met, send the image semantic feature information together with the voice encoding information; if the condition is not met, send only the voice encoding information;
determine, according to the channel information, whether a condition for sending the image encoding information is met;
if the condition for sending the image encoding information is met, send the image encoding information and the voice encoding information;
if the condition for sending the image encoding information is not met, send only the voice encoding information.
Optionally, the apparatus further includes:
an encryption module, configured to receive a control operation of the user, and determine, according to the control operation, whether the user's image semantic features need to be kept secret; if the user's image semantic features need to be kept secret, trigger the processing module to perform image semantic feature processing on the video image, hide, replace, or blur the user's image semantic features, and generate the image semantic feature information; if the user's image semantic features do not need to be kept secret, trigger the processing module to encode the video image to acquire image encoding information, and send the image encoding information and the voice encoding information.
Optionally, the sending module is further configured to send the image data processing mode by means of control information, where the image data processing mode includes: processing based on image semantic features, or processing based on image encoding, or processing based on speech analysis.
An embodiment of the present invention provides a video communication apparatus, including:
a receiving module, configured to receive image semantic feature information and voice encoding information;
a restoration module, configured to call an image semantic feature database and generate a video image according to the image semantic feature information, where the image semantic feature database includes a mapping relationship between the image semantic feature information and video image fragments, and to generate a voice signal according to the voice encoding information;
an output module, configured to output the video image and the voice signal.
Optionally, the restoration module is further configured to receive and parse control information to acquire an image data processing mode, and to process the received data according to the image data processing mode and output the result, where the image data processing mode includes: processing based on image semantic features, or processing based on image encoding, or processing based on speech analysis.
Optionally, the restoration module is further configured to: if the image data processing mode is processing based on speech analysis, perform semantic analysis on the voice encoding information, convert it into image semantic feature information, and generate a video image according to the image semantic feature database.
Optionally, the apparatus further includes a training module, configured to receive normal video data and establish the image semantic feature database according to the normal video data.
An embodiment of the present invention further provides a video communication system, including a transmitting end and a receiving end;
the transmitting end includes the transmitting-side video communication apparatus according to any one of the foregoing, and the receiving end includes the receiving-side video communication apparatus according to any one of the foregoing.
An embodiment of the present invention further provides a computer-readable storage medium storing computer-executable instructions that, when executed, implement the video communication method on the transmitting-end side.
An embodiment of the present invention further provides a computer-readable storage medium storing computer-executable instructions that, when executed, implement the video communication method on the receiving-end side.
Beneficial effects of the embodiments of the present invention:
The embodiments of the present invention provide a new video communication method. The transmitting end separates the captured video data into video images and a speech signal, performs image semantic feature processing on the video images to obtain image semantic feature information, and sends the image semantic feature information together with the speech coding information. The receiving end invokes the image semantic feature database, restores the video images from the image semantic feature information, and outputs them together with the speech signal to complete video reception. Because only image semantic feature information and speech coding information are transmitted, this scheme requires far fewer communication resources than transmitting the video data directly, so the video call can continue even when channel quality is poor. This addresses the problem in the related art that video communication cannot proceed normally when channel quality is poor, and improves the user experience. Other aspects will become apparent upon reading and understanding the drawings and the detailed description.
Brief description of the drawings
FIG. 1 is a schematic structural diagram of a video communication apparatus according to Embodiment 1 of the present invention;
FIG. 2 is another schematic structural diagram of a video communication apparatus according to Embodiment 1 of the present invention;
FIG. 3 is a schematic structural diagram of a video communication system according to Embodiment 1 of the present invention;
FIG. 4 is a flowchart of a video communication method according to Embodiment 2 of the present invention;
FIG. 5 is a flowchart of the transmitting end of the video communication method according to Embodiment 2 of the present invention;
FIG. 6 is a flowchart of the receiving end of the video communication method according to Embodiment 2 of the present invention;
FIG. 7 is a schematic diagram of communication mode switching in Embodiment 3 of the present invention;
FIG. 8 is a flowchart of a video communication method according to Embodiment 3 of the present invention.
Detailed description
The present application is further explained below with reference to specific embodiments and the accompanying drawings.
Embodiment 1:
As shown in FIG. 1, an embodiment of the present invention provides a video communication apparatus 11, including:
an acquisition module 111, configured to capture video images and a speech signal;
a processing module 112, configured to perform image semantic feature processing on the video images to obtain image semantic feature information, and to encode the speech signal to obtain speech coding information; and
a sending module 113, configured to send the image semantic feature information and the speech coding information.
As shown in FIG. 1, optionally, in some embodiments, the video communication apparatus 11 (the transmitting end) further includes a determining module 114, configured to obtain channel information of the communication channel and determine, according to the channel information, whether image semantic feature processing needs to be performed on the video images; if not, the video images are encoded to obtain image coding information, and the image coding information and the speech coding information are sent; if so, image semantic feature processing is performed on the video images to obtain image semantic feature information.
Optionally, in some embodiments, the determining module 114 is further configured to:
determine, according to the channel information, whether the condition for sending image semantic feature information is met; if it is met, send the image semantic feature information together with the speech coding information; if it is not met, send only the speech coding information;
determine, according to the channel information, whether the condition for sending image coding information is met;
if the condition for sending image coding information is met, send the image coding information and the speech coding information;
if the condition for sending image coding information is not met, send only the speech coding information.
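The two-stage decision made by the determining module can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `ChannelInfo` fields, the threshold constants, and the function names are hypothetical, because the source does not specify concrete conditions.

```python
from dataclasses import dataclass
from enum import Enum


class Payload(Enum):
    IMAGE_CODING_AND_SPEECH = "image coding information + speech coding information"
    SEMANTIC_FEATURES_AND_SPEECH = "image semantic feature information + speech coding information"
    SPEECH_ONLY = "speech coding information only"


@dataclass
class ChannelInfo:
    # Hypothetical measurements; the source only says "channel information".
    bandwidth_kbps: float
    loss_rate: float


# Assumed thresholds, chosen only for illustration.
IMAGE_CODING_MIN_KBPS = 500.0
SEMANTIC_FEATURES_MIN_KBPS = 20.0
MAX_LOSS_RATE = 0.3


def needs_semantic_processing(ch: ChannelInfo) -> bool:
    """Stage 1: decide whether to use semantic feature processing instead of image coding."""
    return ch.bandwidth_kbps < IMAGE_CODING_MIN_KBPS


def choose_payload(ch: ChannelInfo) -> Payload:
    """Stage 2: check the sending condition for the image representation chosen in stage 1."""
    if needs_semantic_processing(ch):
        can_send = (ch.bandwidth_kbps >= SEMANTIC_FEATURES_MIN_KBPS
                    and ch.loss_rate <= MAX_LOSS_RATE)
        return Payload.SEMANTIC_FEATURES_AND_SPEECH if can_send else Payload.SPEECH_ONLY
    can_send = ch.loss_rate <= MAX_LOSS_RATE
    return Payload.IMAGE_CODING_AND_SPEECH if can_send else Payload.SPEECH_ONLY
```

In both branches the fallback is the same: when the sending condition fails, only the speech coding information goes out.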
Optionally, as shown in FIG. 1, in some embodiments, the video communication apparatus further includes an encryption module 115, configured to receive a control operation from the user and determine, according to the control operation, whether the user's image semantic features need to be kept confidential; if so, the processing module is triggered to perform image semantic feature processing on the video images, hiding, replacing, or blurring the user's image semantic features, to generate image semantic feature information; if not, the processing module is triggered to encode the video images to obtain image coding information, and the image coding information and the speech coding information are sent.
Optionally, in some embodiments, the sending module 113 is further configured to send the image data processing mode via control information, where the image data processing mode is one of: processing based on image semantic features, processing based on image encoding, or processing based on speech analysis.
As shown in FIG. 2, an embodiment of the present invention provides a video communication apparatus 12, including:
a receiving module 121, configured to receive image semantic feature information and speech coding information;
a restoration module 122, configured to invoke an image semantic feature database and generate a video image according to the image semantic feature information, where the image semantic feature database includes a mapping between image semantic feature information and video image fragments, and to generate a speech signal according to the speech coding information; and
an output module 123, configured to output the video image and the speech signal.
Optionally, in some embodiments, the restoration module 122 is further configured to receive and parse control information to obtain an image data processing mode, and to process the received data according to the image data processing mode and output the result, where the image data processing mode is one of: processing based on image semantic features, processing based on image encoding, or processing based on speech analysis. In this embodiment, the received data differs depending on the image data processing mode and may be one or more of the following: image semantic feature information, image coding information, or speech coding information.
Optionally, in some embodiments, the restoration module 122 is further configured to, if the image data processing mode is processing based on speech analysis, perform semantic analysis on the speech coding information, convert the result into image semantic feature information, and generate a video image according to the image semantic feature database.
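How the restoration module might branch on the three image data processing modes can be sketched as follows. Everything concrete here is an assumption made for illustration: the database is a toy dictionary from feature labels to string "fragments", `features_from_speech` is a hypothetical stand-in for semantic analysis of speech, and image decoding is replaced by a pass-through.

```python
from enum import Enum
from typing import Dict, List, Optional


class ProcessingMode(Enum):
    SEMANTIC_FEATURE = "semantic-feature"
    IMAGE_CODING = "image-coding"
    SPEECH_ANALYSIS = "speech-analysis"


# Toy image semantic feature database: feature label -> video image fragment.
# Fragments are plain strings here; a real database would hold image data.
FEATURE_DB: Dict[str, str] = {
    "mouth_open": "fragment<mouth_open>",
    "mouth_closed": "fragment<mouth_closed>",
}


def features_from_speech(decoded_speech: str) -> List[str]:
    """Hypothetical stand-in for semantic analysis of the speech."""
    return ["mouth_open"] if decoded_speech.strip() else ["mouth_closed"]


def restore_image(mode: ProcessingMode,
                  image_payload: Optional[List[str]],
                  decoded_speech: str) -> List[str]:
    """Return the picture as a list of fragments (or decoded frames)."""
    if mode is ProcessingMode.IMAGE_CODING:
        # Stand-in for a real video decoder: pass the payload through.
        return list(image_payload or [])
    if mode is ProcessingMode.SEMANTIC_FEATURE:
        features = list(image_payload or [])
    else:
        # Speech-analysis mode: derive the feature information from the speech.
        features = features_from_speech(decoded_speech)
    return [FEATURE_DB[f] for f in features]
```

The semantic-feature and speech-analysis branches share the final database lookup; they differ only in where the feature information comes from.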
Optionally, as shown in FIG. 2, in some embodiments, the video communication apparatus 12 further includes a training module 124, configured to receive normal video data and build the image semantic feature database from the normal video data.
As shown in FIG. 3, an embodiment of the present invention further provides a video communication system, including a transmitting end 1 and a receiving end 2, where
the transmitting end 1 includes any of the foregoing video communication apparatuses 11, and the receiving end 2 includes any of the foregoing video communication apparatuses 12.
Embodiment 2:
FIG. 4 is a flowchart of the video communication method according to Embodiment 2 of the present invention. As shown in FIG. 4, in this embodiment, the video communication method includes the following steps:
Step S201: the transmitting end captures video images and a speech signal; performs image semantic feature processing on the video images to obtain image semantic feature information; encodes the speech signal to obtain speech coding information; and sends the image semantic feature information and the speech coding information;
Step S202: the receiving end receives the image semantic feature information and the speech coding information; invokes the image semantic feature database and generates a video image according to the image semantic feature information, where the image semantic feature database includes a mapping between image semantic feature information and video image fragments; generates a speech signal according to the speech coding information; and outputs the video image and the speech signal.
As shown in FIG. 5, in this embodiment, the video communication method as performed at the transmitting end includes:
Step S301: capturing video images and a speech signal;
Step S302: performing image semantic feature processing on the video images to obtain image semantic feature information, and encoding the speech signal to obtain speech coding information;
Step S303: sending the image semantic feature information and the speech coding information.
Optionally, in some embodiments, before step S302, the method further includes: obtaining channel information of the communication channel, and determining, according to the channel information (for example, channel quality, delay, and packet loss rate), whether image semantic feature processing needs to be performed on the video images; if not, encoding the video images (a common codec such as H.264 or H.265 may be used) to obtain image coding information, and sending the image coding information and the speech coding information; if so, performing image semantic feature processing on the video images to obtain image semantic feature information.
That is, optionally, after the video images and the speech signal are captured, it may first be determined whether image semantic feature processing needs to be performed on the video images. If so, step S302 is performed; if not, execution moves to a separate flow, namely the flow that encodes the video images.
Optionally, in some embodiments, before step S303, the method further includes: determining, according to the channel information, whether the condition for sending the image semantic feature information is met;
if the condition is met, sending the image semantic feature information together with the speech coding information; if the condition is not met, sending only the speech coding information.
Before the image coding information and the speech coding information are sent, the method further includes:
determining, according to the channel information, whether the condition for sending image coding information is met; if the condition is met, sending the image coding information and the speech coding information; if the condition is not met, sending only the speech coding information.
Optionally, in some embodiments, before the image semantic feature processing of step S302 is performed on the video images, the method further includes: receiving a control operation from the user, and determining, according to the control operation, whether the user's image semantic features need to be kept confidential; if so, performing image semantic feature processing on the video images, hiding, replacing, or blurring the user's image semantic features, to generate image semantic feature information;
if not, encoding the video images to obtain image coding information, and sending the image coding information and the speech coding information.
Optionally, in some embodiments, the method further includes: sending the image data processing mode via control information, where the image data processing mode is one of: processing based on image semantic features, processing based on image encoding, or processing based on speech analysis.
Correspondingly, as shown in FIG. 6, the video communication method as performed at the receiving end includes:
Step S401: receiving image semantic feature information and speech coding information;
Step S402: invoking the image semantic feature database and generating a video image according to the image semantic feature information, where the image semantic feature database includes a mapping between image semantic feature information and video image fragments; and generating a speech signal according to the speech coding information;
Step S403: outputting the video image and the speech signal.
Optionally, in some embodiments, the method further includes: receiving and parsing control information to obtain an image data processing mode, and processing the received data according to the image data processing mode and outputting the result, where the image data processing mode is one of: processing based on image semantic features, processing based on image encoding, or processing based on speech analysis.
Optionally, in some embodiments, the method further includes: if the image data processing mode is processing based on speech analysis, performing semantic analysis on the speech coding information, converting the result into image semantic feature information, and generating a video image according to the image semantic feature database.
Optionally, in some embodiments, the method further includes: receiving normal video data, and building the image semantic feature database from the normal video data.
Embodiment 3:
The present application is now further explained with reference to a specific application scenario.
This embodiment proposes a wireless video communication system based on extracting and reproducing image semantic features of the video content. It keeps communication working normally under poor channel quality, and also gives users the opportunity to retouch and alter both the local video and the remote party's video, with the aim of a better user experience.
The design idea of this wireless video communication system is to add a set of video image semantic feature extraction and reproduction modules on top of an existing wireless communication system. Because feature extraction operates on a copy of the video signal, normal video communication is not affected; when channel quality deteriorates, the mode control module can freely switch to the secondary channel of the video image semantic feature extraction module to carry the video communication. The module set can be built in as part of the overall communication system or embedded into the wireless communication system as a plug-in, which increases flexibility of use and reduces the cost of retrofitting the wireless communication system.
The module set mainly includes functional modules for mode control, video image semantic feature extraction, the feature database, and feature synthesis. The video image semantic feature extraction modules at the transmitting end and the receiving end should provide the same functions, and their image detection, feature extraction, and so on follow the same algorithms and standards.
The mode control module controls the whole set of video image semantic feature extraction and reproduction modules. It receives channel quality feedback from the transmitting end and the receiving end (for example, signal strength information, channel quality information, delay information, buffer status information, and mobility status information) and is responsible for enabling or switching between the communication modes.
The video image feature information extraction module is configured to parse the video image signal; to perform feature detection, feature extraction, image segmentation, and similar operations on the scenes, people, expressions, and so on in the video picture; and to send the resulting feature prototypes and feature information into the database, or pass the feature information to the transmitting end. In one implementation, the video image feature information extraction module obtains a copy of the transmitted video directly from the upper layer of the transmitting end, then parses the video image signal according to the system configuration and extracts the required feature prototypes and feature information from the video images. This extraction can be one stage of the video transmission process, in which case only feature information is transmitted, or it can be independent of the video transmission process, in which case it only extracts feature prototypes and does not interfere with the video communication.
The feature database is configured to store the feature prototypes, feature information, and so on passed from the video semantic feature information extraction module, to classify and store the various feature prototypes and feature information according to the system configuration, and, when needed, to supply feature prototypes to the feature synthesis module in response to control signals (or feature information) passed from the feature synthesis module. A feature prototype can be a mathematical model or a cropped picture.
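A classified store of this kind could be sketched as below. The class name, the method names, the choice of string keys, and the use of `bytes` for a cropped-picture prototype are all assumptions for illustration; the source does not define a storage layout.

```python
from collections import defaultdict
from typing import Dict, Optional


class FeatureDatabase:
    """Stores feature prototypes, classified by category and keyed by feature information."""

    def __init__(self) -> None:
        # category (e.g. "expression", "background") -> feature info -> prototype
        self._store: Dict[str, Dict[str, bytes]] = defaultdict(dict)

    def put(self, category: str, feature_info: str, prototype: bytes) -> None:
        """Called by the extraction module to add or refresh a prototype."""
        self._store[category][feature_info] = prototype

    def get(self, category: str, feature_info: str) -> Optional[bytes]:
        """Called by the synthesis module; returns None if nothing is stored."""
        return self._store.get(category, {}).get(feature_info)


# Example: the extraction module stores a cropped picture, the synthesis module reads it back.
db = FeatureDatabase()
db.put("expression", "smile", b"<cropped smile picture>")
smile = db.get("expression", "smile")
missing = db.get("expression", "frown")
```

Returning `None` for an unknown feature leaves it to the synthesis side to decide on a fallback, which the source does not specify.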
The feature synthesis module, according to the system configuration, recombines the feature prototypes supplied by the feature database into a complete image, combines it with the speech signal, and sends the result to the video application to complete the video communication task.
As shown in FIG. 7, the video communication process is described below for the following communication modes:
1. Normal communication:
Transmitting end: the video application passes the video signal through the main channel directly to image encoding and speech encoding, and hands the result to the transmitting end, which sends it over the channel to the receiving end. In this mode the mode control module does not interfere with the video communication and does not pass a copy of the video to the video feature extraction module.
Receiving end: the received image coding information and speech coding information are handed directly to the video application to complete the video communication. At the same time, the mode control module keeps a copy of the video and passes it to the receiving end's video feature extraction module, which parses the video images (or images plus speech) according to information such as the system's default configuration, extracts image prototypes and feature information, and sends them into the feature database. This is mainly used to create and maintain the feature database.
2. Simulated communication:
The transmitting end or the receiving end continuously monitors channel quality, and the mode control module obtains channel feedback from the transmitting end or the receiving end at any time. When channel quality deteriorates, the mode control module can, based on this feedback, switch the system into the simulated communication mode at any time.
Transmitting end: the video image signal and speech signal handed down by the upper-layer video application are handled differently by the mode control module. The video images (or images plus speech) are passed to the video feature extraction module, which extracts feature information; the speech is speech-encoded to obtain speech coding information. The image feature information and the speech coding information are then handed to the transmitting end and sent over the channel. In this mode all video image information sent by the transmitting end comes from the feature extraction module.
Receiving end: after obtaining the video image feature information, the mode control module hands it to the feature synthesis module. The feature synthesis module uses the received video image feature information to analyze the current state of the video picture, then retrieves the pre-stored feature prototypes (image templates) from the feature database according to the received image feature information and synthesizes a complete video picture. The picture is then sent to the upper-layer video application together with the decoded speech signal. In addition, the speech signal can be fed to the feature synthesis module as an auxiliary input so that speech analysis improves the synthesis of the video picture, for example by making the picture (such as mouth shape) match the speech better.
3. Hybrid communication:
This mode applies when channel quality is unstable: the channel state is not good enough to support fully normal communication but is better than what simulated communication requires, or the channel state is fluctuating rapidly. In this case the mode control module enables the hybrid communication mode according to a decision criterion.
Transmitting end: the mode control module switches rapidly between normal communication and simulated communication according to a time parameter configuration, which can be determined from the channel state or set manually.
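One simple reading of the time parameter configuration is a repeating cycle with a fixed slot for normal communication followed by a fixed slot for simulated (feature-based) communication. The sketch below assumes exactly that; the function name and the two duration parameters are hypothetical, and the source does not say the switching must be periodic.

```python
def mode_at(t_seconds: float, normal_s: float, simulated_s: float) -> str:
    """Return the mode active at time t in a repeating normal/simulated cycle."""
    if normal_s <= 0 or simulated_s <= 0:
        raise ValueError("slot durations must be positive")
    position = t_seconds % (normal_s + simulated_s)
    return "normal" if position < normal_s else "simulated"
```

With `normal_s=2.0` and `simulated_s=3.0`, the first two seconds of every five-second cycle carry encoded video and the remaining three carry feature information. A worse channel could be handled by shortening `normal_s` relative to `simulated_s`.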
Receiving end: after the receiving end obtains the video information, two sub-modes are available. In the first sub-mode, the processing mode for the video images is switched according to the control information indicating normal or simulated communication. In the second sub-mode, the mode control module does not open the main channel even during normal communication (that is, when coded video image information is being transmitted over the channel). Instead, the decoded video images are sent to the video image feature extraction module, which continuously updates the feature database; the feature extraction module also sends the feature information to the feature synthesis module for simulated-communication picture synthesis, and the feature synthesis module sends the synthesized video images to the upper-layer video application. During simulated communication (that is, when video image feature information is being transmitted over the channel), the mode control module sends the received image feature information to the feature synthesis module for simulated-communication picture synthesis. The purpose of this sub-mode is to give the user consistent picture quality and avoid the poor user experience caused by rapid switching between normal and simulated communication.
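The second sub-mode can be pictured as a receiver in which every displayed picture comes out of the synthesis path, whichever kind of data arrived. The sketch below is illustrative only: frames and prototypes are strings, the database is a plain dictionary, and `extract_features`, `synthesize`, and `handle_packet` are hypothetical stand-ins for the extraction module, the synthesis module, and the mode control routing.

```python
from typing import Dict


def extract_features(decoded_frame: str) -> Dict[str, str]:
    """Hypothetical extraction: a frame written as 'smile|office' yields labelled features."""
    expression, background = decoded_frame.split("|")
    return {"expression": expression, "background": background}


def synthesize(features: Dict[str, str], db: Dict[str, str]) -> str:
    """Hypothetical synthesis: join the stored prototypes for each feature."""
    return "+".join(db[f"{k}:{v}"] for k, v in sorted(features.items()))


def handle_packet(kind: str, payload, db: Dict[str, str]) -> str:
    """Second sub-mode: the displayed picture always comes from the synthesis path."""
    if kind == "image_coding":
        # Normal-communication data: extract features from the decoded frame
        # and keep the feature database up to date.
        features = extract_features(payload)
        for k, v in features.items():
            db.setdefault(f"{k}:{v}", f"proto({v})")
    elif kind == "features":
        # Simulated-communication data: the payload already is feature information.
        features = payload
    else:
        raise ValueError(f"unknown packet kind: {kind}")
    return synthesize(features, db)


db: Dict[str, str] = {}
from_coded = handle_packet("image_coding", "smile|office", db)
from_features = handle_packet("features", {"expression": "smile", "background": "office"}, db)
```

Both calls yield the same synthesized picture, which is the point of this sub-mode: the viewer sees no change in picture style when the channel content switches.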
The feature prototypes and feature information pre-stored in the feature database may have been created and maintained during earlier normal communication, or may have been created in advance for particular users or dedicated channels, for example received or installed in the form of a file (package).
4. Very poor channel communication:
Transmitting end: in this state the mode control module turns off or ignores the video signal entirely, and only the encoded speech signal is sent over the channel to the receiving end.
Receiving end: the mode control module passes the received speech signal (obtained by decoding the speech coding information) to the feature synthesis module, which uses semantic analysis to infer the likely current state of the video picture, synthesizes the video picture directly from the feature information and image prototypes in the database, and sends it to the video application together with the speech signal, maintaining a minimal level of video communication. To support very poor channel communication, during normal, simulated, or hybrid communication the receiving end must also feed the speech signal into the feature extraction module when creating or maintaining the feature database, so as to establish the correspondence between speech-analysis-based feature information and image feature prototypes.
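The correspondence between speech-derived features and image prototypes could be built up by counting which prototype appears alongside each speech feature while pictures are still available. The source does not prescribe a technique; picking the most frequent co-occurrence is an assumption of this sketch, as are the class name, method names, and string labels.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional


class SpeechToPrototypeIndex:
    """Learns which image prototype usually accompanies a speech-derived feature."""

    def __init__(self) -> None:
        self._counts: Dict[str, Counter] = defaultdict(Counter)

    def observe(self, speech_feature: str, prototype_id: str) -> None:
        """Record that a speech feature and an image prototype occurred at the same time."""
        self._counts[speech_feature][prototype_id] += 1

    def lookup(self, speech_feature: str) -> Optional[str]:
        """Return the prototype most often seen with this speech feature, if any."""
        counts = self._counts.get(speech_feature)
        if not counts:
            return None
        return counts.most_common(1)[0][0]


# Training while pictures are available (normal, simulated, or hybrid communication).
index = SpeechToPrototypeIndex()
index.observe("laughter", "mouth_smile")
index.observe("laughter", "mouth_smile")
index.observe("laughter", "mouth_open")

# Very poor channel: only speech arrives, so the picture state is inferred from it.
inferred = index.lookup("laughter")
unknown = index.lookup("whisper")
```

An unseen speech feature returns `None`, so the synthesis side would need some default picture for that case.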
The above functions also apply to handover between wireless networks of different standards, such as GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), 3G, 4G, 5G, and WLAN (Wireless Local Area Network). In the receiver's video feature extraction module, video parsing includes detecting person features, facial expression features, background features, and so on in the video picture, then extracting the corresponding image feature prototypes and feature information and storing them in the feature database. At the same time, the speech content from the same moment is semantically analyzed, and the extracted semantic features are stored in one-to-one correspondence with the feature information of the video picture. The feature information pre-stored in the feature database may have been created and maintained during earlier normal communication, or may be feature information created or obtained in advance for different users or dedicated channels. For example, modeling of the feature database is completed before the user switches from a high-throughput network to a low-throughput network. Feature databases can be matched or merged according to information such as the caller's identity, geographic location, and time, or according to image recognition results, in order to maintain the feature database. This is useful when simulated communication or very poor channel communication must be entered as soon as the communication is initiated.
As shown in FIG. 7, once the feature database has been established, the communication modes can be switched flexibly.
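The switching between modes can be sketched as a small decision function. This is only an illustration under stated assumptions: the numeric quality score, the two threshold values, the names `Mode` and `select_mode`, and the fallback to normal communication when no feature database exists are all hypothetical and are not specified in this document. Hybrid communication is omitted for brevity.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"        # encoded video plus encoded speech
    SIMULATED = "simulated"  # image feature information plus encoded speech
    VERY_POOR = "very_poor"  # encoded speech only


def select_mode(quality: float, database_ready: bool,
                simulated_below: float = 0.5,
                very_poor_below: float = 0.2) -> Mode:
    """Pick a communication mode from a channel quality score in [0, 1].

    The thresholds are hypothetical placeholders for the preset thresholds.
    """
    if not database_ready:
        # Assumption: without a feature database the receiver cannot
        # synthesize pictures, so stay in normal communication.
        return Mode.NORMAL
    if quality < very_poor_below:
        return Mode.VERY_POOR
    if quality < simulated_below:
        return Mode.SIMULATED
    return Mode.NORMAL
```

For example, `select_mode(0.4, True)` selects simulated communication, while the same quality without a feature database stays in normal communication.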
The following describes specific application scenarios.
Scenario 1
Assume that in this scenario the user determines that the information in this communication must be kept confidential from other communicating parties that do not have the feature database.
As shown in FIG. 8, in this embodiment, the video communication method provided by the embodiment of the present invention includes:
Step S501: The user sets the communication mode.
Before video communication is formally established, the user of the sender's video application determines that the information in this communication must be kept confidential from other communicating parties that do not have the feature database (for example, all or specified eyes, all or specified faces, or all or a specified background are kept confidential). The video communication to be established may be a two-party call or a video conference, in particular a multi-party video conference. The mode control module passes the user configuration to the video image feature extraction module; an image capture device such as a camera is turned on and begins capturing the visual signal. The video first enters the video image feature extraction module and does not reach the transmitter through the main channel. Meanwhile, the sender issues a connection request to the receiver, requesting video communication.
Step S502: The sender transmits the video data confidentially.
The sender inspects each frame, locates the features to be encrypted according to the user's requirements, cuts the picture to extract the transmittable image, and then combines the extracted video image with the speech and delivers the result, as the final video information, to the transmitter for transmission over the channel.
In practice, this includes the following two video image processing methods:
Method 1: The sender's video image feature extraction module inspects each frame, locates the features to be kept confidential according to the user's requirements, cuts the picture, and extracts the transmittable image by hiding, replacing, or blurring the confidential features. The extracted video image is then encoded. The image feature extraction module also outputs image feature information, and the image coding information, the image feature information, and the speech coding information are sent to the channel together.
Method 2: The sender's video image feature extraction module extracts feature information from the video image, replaces the part of the feature information related to the confidential features with feature information that need not be kept confidential, and then delivers the feature information and the speech coding information to the transmitter for transmission over the channel.
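The substitution in Method 2 can be sketched as follows. This is a minimal illustration, assuming that feature information is represented as a mapping from a feature name to a feature code; the record layout, the function name `redact_features`, and the `"default"` placeholder are hypothetical and do not come from this document.

```python
from typing import Dict, Set

# Hypothetical feature record: feature name -> feature code.
Features = Dict[str, str]


def redact_features(features: Features, confidential: Set[str],
                    substitutes: Features) -> Features:
    """Replace confidential feature entries with non-confidential substitutes."""
    redacted: Features = {}
    for name, code in features.items():
        if name in confidential:
            # Fall back to a neutral placeholder when no substitute is configured.
            redacted[name] = substitutes.get(name, "default")
        else:
            redacted[name] = code
    return redacted
```

Only the redacted record leaves the sender, so a receiver without the original feature information cannot recover the confidential feature from the transmitted data.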
Step S503: The receiver receives the video data.
After power-on, the sender and the receiver continuously monitor the channel quality and feed it back promptly; the mode control module selects the corresponding communication mode according to the channel quality feedback.
When the receiver receives the connection request for encrypted communication and the channel quality is good, the video signal is, on the one hand, retouched in the mode control module (either by the user or by system default) and then passed through the main channel to the upper-layer video application; on the other hand, a copy of the video signal is fed to the receiver's video feature extraction module.
Corresponding to the sender, there are likewise two methods:
Method 1 (corresponding to Method 1 at the sender): The receiver decodes the received image coding information to obtain an image signal and, together with the image feature information, synthesizes the picture in the feature synthesis module. It also decodes the received speech coding information to obtain a speech signal, and finally outputs the image signal and the speech signal to an upper-layer application or an external device.
Method 2 (corresponding to Method 2 at the sender): The receiver passes the received image feature information to the feature synthesis module, which synthesizes the picture based on the feature database. It also decodes the received speech coding information to obtain a speech signal, and finally outputs the image signal and the speech signal to an upper-layer application or an external device.
Step S504: The receiver builds the feature database.
After obtaining the copy of the video signal, the receiver determines the current communication mode and feature extraction mode from the control information in the signal. Once it learns that the communication is encrypted, the module begins feature extraction, which includes cutting the video image. At the same time, it performs semantic analysis on the speech signal from the same moment to determine the user's tone-of-voice features, matches them with the image features from the same moment, pairs the image features and semantic features one-to-one, and passes them to the feature database, completing the modeling of the feature database.
The receiver's video application receives the video signal directly and carries on the communication.
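The one-to-one pairing of speech semantic features with image features in step S504 can be sketched as below. This is a simplified illustration, assuming both feature streams carry integer millisecond timestamps and that features are plain string labels; the function name `build_feature_database` and these representations are hypothetical.

```python
from typing import Dict, List, Tuple


def build_feature_database(
    image_features: List[Tuple[int, str]],
    speech_features: List[Tuple[int, str]],
) -> Dict[str, str]:
    """Map each speech semantic feature to the image feature seen at the same time.

    Each input item is (timestamp_ms, feature_label). Speech features without
    an image feature at the same timestamp are skipped. If a semantic feature
    occurs more than once, the most recent pairing wins.
    """
    image_by_time = dict(image_features)
    database: Dict[str, str] = {}
    for timestamp, semantic in speech_features:
        if timestamp in image_by_time:
            database[semantic] = image_by_time[timestamp]
    return database
```

A real system would match timestamps within a tolerance window rather than requiring exact equality; exact matching is used here only to keep the sketch short.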
Step S505: The communication mode switches to simulated communication, and video communication continues.
When the channel quality deteriorates below the preset threshold, the mode control module automatically switches the system to simulated communication.
After receiving the simulated communication instruction, the sender's video feature extraction module uses semantic analysis, image analysis, and similar means to determine the user's expression state and to extract the user's expression features from the video picture. It then replaces the current expression features with a pre-agreed feature code or feature representation, matches them with the speech signal, and passes the result to the transmitter.
The transmitter sends the compressed video signal delivered by the feature extraction module directly into the channel; at this point the sender's main channel carries no video information.
After obtaining the video signal, the receiver's mode selection module feeds the signal directly to the receiver's video feature extraction module and cuts off the main channel. Based on the code or feature representation in the signal, the receiver's feature extraction module retrieves from the feature database the user expression image template saved during normal communication and sends it to the feature synthesis module for image synthesis. The feature synthesis module synthesizes the image from the template according to the feature information, combines it with the speech, and delivers it directly to the video application, completing the communication.
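The exchange in step S505 can be sketched as an encoder and decoder that share a pre-agreed code table. This is an illustration only: the one-byte codes, the table `EXPRESSION_CODES`, the template file names, and the fallback behaviour for unknown codes are all hypothetical assumptions, not details taken from this document.

```python
from typing import Dict

# Hypothetical pre-agreed feature codes shared by sender and receiver.
EXPRESSION_CODES: Dict[str, int] = {"neutral": 0, "smile": 1, "frown": 2}


def encode_expression(label: str) -> bytes:
    """Sender side: replace the detected expression with its one-byte code."""
    return bytes([EXPRESSION_CODES[label]])


def decode_expression(payload: bytes, templates: Dict[int, str],
                      fallback: str) -> str:
    """Receiver side: look up the stored image template for the received code.

    Assumption: an empty payload or an unknown code falls back to a default
    template instead of interrupting the picture.
    """
    if not payload:
        return fallback
    return templates.get(payload[0], fallback)
```

Here a single byte per frame stands in for the encoded picture, which illustrates why simulated communication needs far less channel capacity than transmitting encoded video.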
Step S506: The communication mode switches to very poor channel communication, and video communication continues.
When the channel quality deteriorates below the preset threshold, the mode control module automatically switches the system to the very poor channel communication mode.
The sender's feature extraction module strips the picture information from the video signal entirely, compresses the speech signal heavily, and sends it through the transmitter directly into the channel.
After receiving the speech signal, the receiver's mode control module feeds it directly to the video feature extraction module and cuts off the main channel. The receiver's video feature processing module performs semantic analysis on the received speech signal, derives a feature code or feature representation, retrieves from the feature database the user expression image template saved during normal communication, and sends it to the feature synthesis module for image synthesis. The feature synthesis module synthesizes the image from the template according to the feature information, combines it with the speech, and delivers it directly to the receiver's video application, completing the communication.
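The speech-driven lookup in step S506 can be sketched with a deliberately naive stand-in for semantic analysis. Everything here is an assumption for illustration: the transcript is presumed to come from a separate speech recognizer that is not shown, and the keyword table `KEYWORD_TO_CODE` and the function `infer_feature_code` are hypothetical. The document does not specify how the semantic analysis is performed.

```python
from typing import Dict

# Hypothetical keyword table standing in for real semantic analysis.
KEYWORD_TO_CODE: Dict[str, str] = {
    "great": "smile",
    "thanks": "smile",
    "sorry": "frown",
    "problem": "frown",
}


def infer_feature_code(transcript: str, default: str = "neutral") -> str:
    """Return the feature code of the first known keyword in the transcript."""
    for word in transcript.lower().split():
        code = KEYWORD_TO_CODE.get(word.strip(".,!?"))
        if code is not None:
            return code
    return default
```

The resulting code would then be used to fetch an expression template from the feature database, in the same way as in simulated communication.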
Scenario 2
Assume that in this scenario the user has already established a feature database and determines that part of the information in this communication must be kept confidential from the recipient.
In this embodiment, the video communication method provided by the embodiment of the present invention includes:
Step S601: The user sets the communication mode.
Before video communication is formally established, the user of the sender's video application determines that part of the information in this communication must be kept confidential from the recipient (for example, all or specified eyes, all or specified faces, or all or a specified background are kept confidential). The video communication to be established may be a two-party call or a video conference, in particular a multi-party video conference. The mode control module passes the user configuration to the video image feature extraction module; an image capture device such as a camera is turned on and begins capturing the visual signal. The video first enters the video image feature extraction module and does not reach the transmitter through the main channel. Meanwhile, the sender issues a connection request to the receiver, requesting video communication.
Step S602: The sender encrypts and transmits the video data.
The sender inspects each frame, locates the features to be encrypted according to the user's requirements, cuts the picture, and extracts the transmittable image by hiding, replacing, or blurring the features to be encrypted. The extracted video image is then encoded and delivered, together with the speech coding information, to the transmitter for transmission over the channel.
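Hiding a confidential region before encoding can be sketched as below. This is a minimal illustration, assuming a grayscale frame stored as nested lists and a bounding box already supplied by some detector that is not shown; the names `Frame` and `mask_region` and the box convention are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[int]]  # grayscale pixels, row-major


def mask_region(frame: Frame, box: Tuple[int, int, int, int],
                fill: int = 0) -> Frame:
    """Return a copy of the frame with the boxed region overwritten.

    box is (top, left, bottom, right) with bottom and right exclusive.
    Parts of the box outside the frame are ignored.
    """
    top, left, bottom, right = box
    masked = [row[:] for row in frame]  # copy so the input frame is untouched
    for y in range(max(top, 0), min(bottom, len(masked))):
        row = masked[y]
        for x in range(max(left, 0), min(right, len(row))):
            row[x] = fill
    return masked
```

Replacing or blurring the region would follow the same pattern, with the fill value swapped for substitute pixels or a local average.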
Step S603: The receiver receives the video data.
The receiver decodes the received image coding information to obtain an image signal, decodes the received speech coding information to obtain a speech signal, and outputs the image signal and the speech signal to an upper-layer application or an external device.
In summary, the technical solutions of the embodiments of the present invention provide at least the following beneficial effects:
The embodiments of the present invention provide a new video communication method. The sender separates the captured video data into a video image and a speech signal, performs image semantic feature processing on the video image to obtain image semantic feature information, and sends the image semantic feature information and the speech coding information. The receiver invokes the image semantic feature database, restores the video image according to the image semantic feature information, and outputs it together with the speech signal to complete video reception. Because only the image semantic feature information and the speech coding information are transmitted, this solution greatly reduces the demand on communication resources compared with transmitting the video data directly. Video can continue normally even when the channel quality is poor, which solves the problem in the related art that video communication cannot proceed normally under poor channel quality, and improves the user experience.
An embodiment of the present invention further provides a computer-readable storage medium storing computer-executable instructions which, when executed, implement the video communication method on the sender side.
An embodiment of the present invention further provides a computer-readable storage medium storing computer-executable instructions which, when executed, implement the video communication method on the receiver side.
A person of ordinary skill in the art will understand that all or some of the steps of the above methods may be carried out by a program instructing the relevant hardware (for example, a processor), and that the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disc. Optionally, all or some of the steps of the above embodiments may also be implemented using one or more integrated circuits. Accordingly, each module or unit in the above embodiments may be implemented in hardware, for example by an integrated circuit that performs the corresponding function, or as a software function module, for example by a processor executing programs or instructions stored in a memory. This application is not limited to any particular combination of hardware and software. A person of ordinary skill in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of this application without departing from their spirit and scope, and that all such modifications and substitutions fall within the scope of the claims of this application.
Industrial applicability
The above technical solution greatly reduces the demand on communication resources. Video can continue normally even when the channel quality is poor, which solves the problem in the related art that video communication cannot proceed normally under poor channel quality, and improves the user experience.

Claims (20)

  1. A video communication method, comprising:
    capturing a video image and a speech signal;
    performing image semantic feature processing on the video image to obtain image semantic feature information; encoding the speech signal to obtain speech coding information; and
    sending the image semantic feature information and the speech coding information.
  2. The video communication method according to claim 1,
    further comprising, before performing image semantic feature processing on the video image: obtaining channel information of a communication channel, and determining, according to the channel information, whether image semantic feature processing needs to be performed on the video image;
    if image semantic feature processing does not need to be performed on the video image, encoding the video image to obtain image coding information, and sending the image coding information and the speech coding information;
    if image semantic feature processing needs to be performed on the video image, performing image semantic feature processing on the video image.
  3. The video communication method according to claim 2,
    further comprising, before sending the image semantic feature information and the speech coding information: determining, according to the channel information, whether a condition for sending the image semantic feature information is satisfied;
    if the condition for sending the image semantic feature information is satisfied, sending the image semantic feature information together with the speech coding information;
    if the condition for sending the image semantic feature information is not satisfied, sending only the speech coding information;
    and further comprising, before sending the image coding information and the speech coding information:
    determining, according to the channel information, whether a condition for sending the image coding information is satisfied;
    if the condition for sending the image coding information is satisfied, sending the image coding information and the speech coding information;
    if the condition for sending the image coding information is not satisfied, sending only the speech coding information.
  4. The video communication method according to claim 1, further comprising, before performing image semantic feature processing on the video image:
    receiving a control operation from a user, and determining, according to the control operation, whether the user's image semantic features need to be kept confidential;
    if the user's image semantic features need to be kept confidential, performing image semantic feature processing on the video image, hiding, replacing, or blurring the user's image semantic features, and generating the image semantic feature information;
    if the user's image semantic features do not need to be kept confidential, encoding the video image to obtain image coding information, and sending the image coding information and the speech coding information.
  5. The video communication method according to any one of claims 1 to 4, further comprising: sending an image data processing mode by means of control information;
    wherein the image data processing mode comprises: processing based on image semantic features, processing based on image coding, or processing based on speech analysis.
  6. A video communication method, comprising:
    receiving image semantic feature information and speech coding information;
    invoking an image semantic feature database and generating a video image according to the image semantic feature information, wherein the image semantic feature database comprises a mapping between the image semantic feature information and video image fragments;
    generating a speech signal according to the speech coding information; and
    outputting the video image and the speech signal.
  7. The video communication method according to claim 6, further comprising:
    receiving and parsing control information to obtain an image data processing mode; and
    processing the received data according to the image data processing mode and outputting the result, wherein the image data processing mode comprises: processing based on image semantic features, processing based on image coding, or processing based on speech analysis.
  8. The video communication method according to claim 7, wherein, if the image data processing mode comprises processing based on speech analysis, the method further comprises: performing semantic analysis on the speech coding information to convert it into image semantic feature information, and generating a video image according to the image semantic feature database.
  9. The video communication method according to any one of claims 6 to 8, further comprising: receiving normal video data, and building the image semantic feature database from the normal video data.
  10. A video communication method, comprising:
    a sender capturing a video image and a speech signal; performing image semantic feature processing on the video image to obtain image semantic feature information; encoding the speech signal to obtain speech coding information; and sending the image semantic feature information and the speech coding information; and
    a receiver receiving the image semantic feature information and the speech coding information; invoking an image semantic feature database and generating a video image according to the image semantic feature information, wherein the image semantic feature database comprises a mapping between the image semantic feature information and video image fragments; generating a speech signal according to the speech coding information; and outputting the video image and the speech signal.
  11. A video communication apparatus, comprising:
    a capture module, configured to capture a video image and a speech signal;
    a processing module, configured to perform image semantic feature processing on the video image to obtain image semantic feature information, and to encode the speech signal to obtain speech coding information; and
    a sending module, configured to send the image semantic feature information and the speech coding information.
  12. The video communication apparatus according to claim 11, further comprising:
    a determining module, configured to obtain channel information of a communication channel and determine, according to the channel information, whether image semantic feature processing needs to be performed on the video image; if image semantic feature processing does not need to be performed on the video image, to encode the video image to obtain image coding information and send the image coding information and the speech coding information; and if image semantic feature processing needs to be performed on the video image, to perform image semantic feature processing on the video image to obtain the image semantic feature information.
  13. The video communication apparatus according to claim 12, wherein the determining module is further configured to:
    determine, according to the channel information, whether a condition for sending the image semantic feature information is satisfied; if the condition is satisfied, send the image semantic feature information together with the speech coding information; and if the condition is not satisfied, send only the speech coding information;
    determine, according to the channel information, whether a condition for sending the image coding information is satisfied;
    if the condition for sending the image coding information is satisfied, send the image coding information and the speech coding information;
    if the condition for sending the image coding information is not satisfied, send only the speech coding information.
  14. The video communication apparatus according to claim 11, further comprising:
    an encryption module, configured to receive a control operation from a user and determine, according to the control operation, whether the user's image semantic features need to be kept confidential; if the user's image semantic features need to be kept confidential, to trigger the processing module to perform image semantic feature processing on the video image, hide, replace, or blur the user's image semantic features, and generate the image semantic feature information; and if the user's image semantic features do not need to be kept confidential, to trigger the processing module to encode the video image to obtain image coding information and send the image coding information and the speech coding information.
  15. The video communication apparatus according to any one of claims 11 to 14, wherein the sending module is further configured to send an image data processing mode by means of control information, wherein the image data processing mode comprises: processing based on image semantic features, processing based on image coding, or processing based on speech analysis.
  16. A video communication apparatus, comprising:
    a receiving module, configured to receive image semantic feature information and speech coding information;
    a restoration module, configured to invoke an image semantic feature database and generate a video image according to the image semantic feature information, wherein the image semantic feature database comprises a mapping between the image semantic feature information and video image fragments, and to generate a speech signal according to the speech coding information; and
    an output module, configured to output the video image and the speech signal.
  17. The video communication device according to claim 16, wherein the restoration module is further configured to receive and parse control information to obtain an image data processing mode, and to process and output the received data according to the image data processing mode, the image data processing mode being one of: processing based on image semantic features, processing based on image encoding, or processing based on speech analysis.
  18. The video communication device according to claim 17, wherein the restoration module is further configured to, if the image data processing mode is processing based on speech analysis, perform semantic analysis on the speech encoding information, convert it into image semantic feature information, and generate a video image according to the image semantic feature database.
  19. The video communication device according to any one of claims 16 to 18, further comprising a training module configured to receive normal video data and to build the image semantic feature database from the normal video data.
  20. A video communication system, comprising a transmitting end and a receiving end, wherein:
    the transmitting end comprises the video communication device according to any one of claims 11 to 15, and the receiving end comprises the video communication device according to any one of claims 16 to 19.
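
As an illustration only, the receiving-side behavior described in claims 16 to 18 can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation disclosed in the application: the names `ProcessingMode`, `ReceivedData`, and `restore_image`, the use of string labels as semantic features, and the use of byte strings as video image fragments are all hypothetical. The image semantic feature database is modeled as a plain dictionary, and the speech analysis step is passed in as a caller-supplied function.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Optional


class ProcessingMode(Enum):
    """The three image data processing modes named in claims 15 and 17."""
    SEMANTIC_FEATURE = 1
    IMAGE_ENCODING = 2
    SPEECH_ANALYSIS = 3


@dataclass
class ReceivedData:
    """Hypothetical container for what the receiving module hands over."""
    mode: ProcessingMode                    # parsed from the control information
    speech: bytes                           # speech encoding information
    features: Optional[List[str]] = None    # image semantic feature information
    encoded_image: Optional[bytes] = None   # image encoding information


def restore_image(
    data: ReceivedData,
    feature_db: Dict[str, bytes],
    speech_to_features: Callable[[bytes], List[str]],
) -> List[bytes]:
    """Return video image fragments selected according to the processing mode."""
    if data.mode is ProcessingMode.IMAGE_ENCODING:
        # Conventional path: a real system would run a video decoder here.
        return [data.encoded_image] if data.encoded_image is not None else []
    if data.mode is ProcessingMode.SPEECH_ANALYSIS:
        # Claim 18: derive the semantic features from the speech itself.
        features = speech_to_features(data.speech)
    else:
        features = data.features or []
    # Claim 16: map each known feature to its fragment; skip unknown features.
    return [feature_db[f] for f in features if f in feature_db]


# Example usage with placeholder fragments.
feature_db = {"smile": b"<smile fragment>", "nod": b"<nod fragment>"}
data = ReceivedData(ProcessingMode.SEMANTIC_FEATURE, speech=b"", features=["smile", "nod"])
frames = restore_image(data, feature_db, lambda speech: [])
```

Keeping the database a simple lookup table reflects the claim language, which only requires a mapping between feature information and image fragments; how the fragments are composed into a displayed video image is left open here.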
PCT/CN2016/095549 2015-09-25 2016-08-16 Video communication method, apparatus, and system WO2017050067A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510623739.2A CN106559636A (en) 2015-09-25 2015-09-25 A kind of video communication method, apparatus and system
CN201510623739.2 2015-09-25

Publications (1)

Publication Number Publication Date
WO2017050067A1 (en) 2017-03-30

Family

ID=58385849

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/095549 WO2017050067A1 (en) 2015-09-25 2016-08-16 Video communication method, apparatus, and system

Country Status (2)

Country Link
CN (1) CN106559636A (en)
WO (1) WO2017050067A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113256751A (en) * 2021-06-01 2021-08-13 平安科技(深圳)有限公司 Voice-based image generation method, device, equipment and storage medium

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109698850B (en) * 2017-10-23 2022-06-07 北京京东尚科信息技术有限公司 Processing method and system
CN109831638B (en) * 2019-01-23 2021-01-08 广州视源电子科技股份有限公司 Video image transmission method and device, interactive intelligent panel and storage medium
CN111934823B (en) * 2020-08-12 2022-08-02 中国联合网络通信集团有限公司 Data transmission method, radio access network equipment and user plane functional entity
US20220374637A1 (en) * 2021-05-20 2022-11-24 Nvidia Corporation Synthesizing video from audio using one or more neural networks
CN113246991B (en) * 2021-06-29 2021-11-30 新石器慧通(北京)科技有限公司 Data transmission method and device for remote driving end of unmanned vehicle
CN114866192A (en) * 2022-05-31 2022-08-05 电子科技大学 A signal transmission method based on features and related information
CN115223566A (en) * 2022-07-19 2022-10-21 中国电信股份有限公司 A voice transmission method, system and device
CN116029340B (en) * 2023-01-13 2023-06-02 香港中文大学(深圳) A method of image and semantic information transmission based on deep learning network

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20040025029A (en) * 2002-09-18 2004-03-24 (주)아이엠에이테크놀로지 Image Data Transmission Method through Inputting Data of Letters in Wired/Wireless Telecommunication Devices
US20090315974A1 (en) * 2008-06-23 2009-12-24 Lucent Technologies, Inc. Video conferencing device for a communications device and method of manufacturing and using the same
CN102271241A (en) * 2011-09-02 2011-12-07 北京邮电大学 Image communication method and system based on facial expression/action recognition
CN103841358A (en) * 2012-11-23 2014-06-04 中兴通讯股份有限公司 Low-code-stream video conference system and method, sending-end device and receiving-end device
CN104333730A (en) * 2014-11-26 2015-02-04 北京奇艺世纪科技有限公司 Video communication method and video communication device
CN104618721A (en) * 2015-01-28 2015-05-13 山东大学 Ultra-low code rate face video coding and decoding method based on feature modeling
CN104902212A (en) * 2015-04-30 2015-09-09 努比亚技术有限公司 Video communication method and apparatus

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8264521B2 (en) * 2007-04-30 2012-09-11 Cisco Technology, Inc. Media detection and packet distribution in a multipoint conference
CN101764987A (en) * 2008-12-08 2010-06-30 新奥特硅谷视频技术有限责任公司 Method of remote court trial and device thereof

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113256751A (en) * 2021-06-01 2021-08-13 平安科技(深圳)有限公司 Voice-based image generation method, device, equipment and storage medium
CN113256751B (en) * 2021-06-01 2023-09-29 平安科技(深圳)有限公司 Voice-based image generation method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN106559636A (en) 2017-04-05

Similar Documents

Publication Publication Date Title
WO2017050067A1 (en) Video communication method, apparatus, and system
US20230245661A1 (en) Video conference captioning
EP3008903B1 (en) Screen map and standards-based progressive codec for screen content coding
US20190342241A1 (en) Systems and methods for manipulating and/or concatenating videos
CN111935443B (en) Method and device for sharing instant messaging tool in real-time live broadcast of video conference
CN110430441B (en) Cloud mobile phone video acquisition method, system, device and storage medium
US7996540B2 (en) Method and system for replacing media stream in a communication process of a terminal
CN108932948B (en) Audio data processing method and device, computer equipment and computer readable storage medium
JP2006262484A (en) Image composition method and apparatus during image communication
JP2005033664A (en) Communication device and its operation control method
CN108040061A (en) A kind of cloud meeting live broadcasting method
US7508413B2 (en) Video conference data transmission device and data transmission method adapted for small display of mobile terminals
WO2014048352A1 (en) Method and terminal for transmitting information used in instant messaging applications
CN112135155A (en) Audio and video connecting and converging method and device, electronic equipment and storage medium
CN102223406A (en) System and method for network-based digitalized real-time transmission of video information
US11165989B2 (en) Gesture and prominence in video conferencing
WO2014155710A1 (en) Communication control system, communication control method, communication control program, terminal, and program for terminal
CN113709528B (en) Play control method, play configuration device, electronic equipment and storage medium
EP3985989A1 (en) Detection of modification of an item of content
KR100703354B1 (en) Image data transmission method in video call mode of mobile terminal
CN115967818B (en) Cloud device live broadcast method, system and computer readable storage medium
CN117880253B (en) Method and device for processing call captions, electronic equipment and storage medium
EP4375947A1 (en) Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
CN117675960A (en) Communication system and communication data processing method
Liang¹ et al. Layered System Architecture for Covert

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
    Ref document number: 16847949; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 EP: PCT application non-entry in European phase
    Ref document number: 16847949; Country of ref document: EP; Kind code of ref document: A1