CN112751849B - Conference control method, device, equipment and medium based on video networking
- Publication number: CN112751849B (application CN202011590205.1A)
- Authority: CN (China)
- Prior art keywords: audio stream, terminal, participant terminal, video, conference
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- H—ELECTRICITY; H04—ELECTRIC COMMUNICATION TECHNIQUE; H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/403—Arrangements for multi-party communication, e.g. for conferences
- H04L65/75—Media network packet handling
- H04L69/08—Protocols for interworking; Protocol conversion
Abstract
The embodiment of the invention provides a conference control method, device, electronic equipment and storage medium based on video networking, applied in the technical field of communication and aimed at improving the efficiency of conference control. The system comprises a conference control end located in the Internet and a server connected with the conference control end. The method is applied to the server and comprises the following steps: receiving an encoder control instruction obtained after video networking protocol conversion is performed on a mute request sent by the conference control end; determining a first participant terminal and at least one second participant terminal corresponding to the encoder control instruction; searching for a target mixed audio stream sent to the first participant terminal among the current multiple mixed audio streams; and updating the decoding marks of the audio streams belonging to the second participant terminals in the target mixed audio stream to a preset mark, so that the first participant terminal discards the audio streams belonging to the second participant terminals in the target mixed audio stream according to the preset mark.
Description
Technical Field
The present invention relates to the field of network communications technologies, and in particular, to a method, an apparatus, a device, and a medium for controlling a conference based on video networking.
Background
The video networking adopts the world's advanced Vision Vera real-time high-definition video switching technology to realize the real-time transmission of full-network high-definition video that the current Internet cannot achieve. It integrates dozens of services, such as high-definition video conferencing, video monitoring, remote training, intelligent monitoring and analysis, emergency command, video telephony, live broadcast, television mail and information release, into one system platform, and realizes real-time interconnection and intercommunication of high-definition-quality video communication through various terminal devices.
In the related art, in conferences held across the video network and the Internet, it is sometimes required during the conference that certain participant terminals cannot hear part of the speech of certain important terminals, or that certain terminals are not permitted to speak. To meet such requirements, the microphone of the terminal that is not allowed to speak is usually turned off manually, or the audio channel between the important terminal and the other terminals is cut off. However, this approach requires professional conference service personnel and involves complicated operation, which affects the normal progress of the conference.
Disclosure of Invention
In view of the foregoing, embodiments of the present invention provide a conference control method, apparatus, electronic device and storage medium based on video networking, so as to overcome or at least partially solve the foregoing problems.
The first aspect of the embodiments of the invention discloses a conference control method based on video networking, applied to a system comprising a conference control end located in the Internet and a server connected with the conference control end; the method is applied to the server and comprises the following steps:
receiving an encoder control instruction obtained after video networking protocol conversion is performed on a mute request sent by the conference control end;
determining a first participant terminal and at least one second participant terminal corresponding to the encoder control instruction;
searching a target mixed audio stream sent to the first participant terminal from the current multipath mixed audio streams; each mixed audio stream in the multiple mixed audio streams is an audio stream obtained by mixing audio streams sent by a plurality of current speaking terminals;
updating the decoding mark of the audio stream belonging to the at least one second participant terminal in the target mixed audio stream to a preset mark, so that the first participant terminal discards the audio stream belonging to the at least one second participant terminal in the target mixed audio stream according to the preset mark.
Optionally, updating the decoding mark of the audio stream belonging to the at least one second participant terminal in the target mixed audio stream to a preset mark includes:
determining the target audio streams belonging to the at least one second participant terminal in the target mixed audio stream;
setting the decoding mark of each target audio stream to the preset mark, and keeping the decoding marks of the other audio streams in the target mixed audio stream, apart from the target audio streams, unchanged.
Optionally, the mute request further carries a mute duration, and the encoder control instruction includes the mute duration; the method further comprises the steps of:
generating an audio recovery instruction when it is determined that the sending duration of the target mixed audio stream with the updated decoding mark reaches the mute duration;
and based on the audio recovery instruction, restoring the decoding mark of the audio stream belonging to the at least one second participant terminal in the target mixed audio stream to a decodable mark, so that the first participant terminal decodes and plays the audio stream belonging to the at least one second participant terminal in the target mixed audio stream.
Optionally, the method further comprises:
receiving a video networking speaking control instruction obtained after video networking protocol conversion is performed on the speaking control instruction sent by the conference control end; the video networking speaking control instruction carries the identification and speaking duration of a third participant terminal to be controlled;
responding to the video networking speaking control instruction, and recording, from the current moment, the sending duration of the audio stream belonging to the third participant terminal in the multiple mixed audio streams;
when the sending duration of the audio stream of the third participant terminal reaches the speaking duration, updating the decoding mark of the audio stream belonging to the third participant terminal in the multiple mixed audio streams to the preset mark, so that the other participant terminals in the video conference discard the audio stream belonging to the third participant terminal according to the preset mark.
Optionally, after updating the decoding mark of the audio stream belonging to the third participant terminal in the multiple mixed audio streams to the preset mark, the method further includes:
receiving a video networking speaking instruction obtained after video networking protocol conversion is performed on the speaking instruction sent by the conference control end, wherein the speaking instruction is sent to the conference control end by the third participant terminal;
and responding to the video networking speaking instruction, restoring the decoding marks of the audio streams belonging to the third participant terminal in the multiple mixed audio streams to decodable marks, so that the other participant terminals in the video conference decode and play the audio streams belonging to the third participant terminal.
Optionally, the method further comprises:
receiving a video networking conference exit instruction obtained after video networking protocol conversion is performed on the conference exit instruction sent by the conference control end, wherein the video networking conference exit instruction carries an identification of a fourth participant terminal exiting the video conference;
and responding to the video networking conference exit instruction, closing the audio channel with the fourth participant terminal, so that each mixed audio stream no longer contains the audio stream of the fourth participant terminal.
In a second aspect of the embodiments of the present application, a conference control device based on video networking is provided, applied to a system including a conference control end located in the Internet and a server connected to the conference control end, where the device is located in the server and includes:
the request receiving module is used for receiving an encoder control instruction obtained after video networking protocol conversion is performed on the mute request sent by the conference control end;
the determining module is used for determining a first participant terminal and at least one second participant terminal corresponding to the encoder control instruction;
the instruction response module is used for searching for a target mixed audio stream sent to the first participant terminal among the current multiple mixed audio streams, where each of the multiple mixed audio streams is audio obtained by the video networking server by mixing the audio streams sent by a plurality of current speaking terminals;
and the mark changing module is used for updating the decoding mark of the audio stream belonging to the at least one second participant terminal in the target mixed audio stream to a preset mark, so that the first participant terminal discards the audio stream belonging to the at least one second participant terminal in the mixed audio stream according to the preset mark.
The embodiment of the invention also discloses an electronic device, which comprises a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the video-networking-based conference control method according to the first aspect.
The embodiment of the invention also discloses a computer-readable storage medium storing a computer program that causes a processor to execute the video-networking-based conference control method according to the first aspect of the embodiments of the invention.
The embodiment of the invention has the following advantages:
in this embodiment, the server may, based on the encoder control instruction, search the current multiple mixed audio streams for the target mixed audio stream sent to the first participant terminal, and update the decoding marks of the audio streams belonging to the second participant terminals in the target mixed audio stream to a preset mark, so that the first participant terminal discards the audio streams belonging to each second participant terminal in the target mixed audio stream according to the preset mark. As a result, the first participant terminal cannot play the audio streams of the second participant terminals, which mutes the second participant terminals with respect to the first participant terminal. Because whether an audio stream is muted for a given receiving end can be controlled simply by setting its decoding mark, the whole muting process is carried out automatically by the terminals; this removes the high labor cost of manually closing audio channels and improves conference control efficiency.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments of the present application will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a networking of the present invention for a video networking;
FIG. 2 is a schematic diagram of the hardware architecture of a node server according to the present invention;
FIG. 3 is a schematic diagram of the hardware architecture of an access switch of the present invention;
fig. 4 is a schematic hardware structure of an Ethernet protocol conversion gateway according to the present invention;
FIG. 5 is a schematic diagram of a communication architecture of the present invention;
fig. 6 is a schematic flow chart of steps of a conference control method based on video networking of the present invention;
fig. 7 is a schematic structural diagram of a conference control device based on video networking in the present invention.
Detailed Description
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
For a better understanding of embodiments of the present invention, the following description of the video networking is presented to one skilled in the art:
the video networking is an important milestone in network development. It is a real-time network that can realize real-time transmission of high-definition video, pushing numerous Internet applications toward high-definition video and high-definition face-to-face communication.
The video networking adopts real-time high-definition video switching technology, and can integrate dozens of required services, such as high-definition video conferencing, video monitoring, intelligent monitoring and analysis, emergency command, digital broadcast television, delayed television, network teaching, live broadcast, VOD, television mail, personal video recording (PVR), intranet (self-operated) channels, intelligent video playing control and information release, into one system platform, realizing high-definition-quality video playing through a television or computer.
some of the technologies applied in the video networking are as follows:
Network technology (Network Technology)
The network technology innovation of the video networking improves on the traditional Ethernet to face the potentially huge video traffic on the network. Unlike pure network packet switching (Packet Switching) or network circuit switching (Circuit Switching), the video networking technology employs packet switching to meet streaming requirements. The video networking technology has the flexibility, simplicity and low cost of packet switching, together with the quality and security guarantees of circuit switching, thereby realizing the seamless connection of whole-network switched virtual circuits and a unified data format.
Exchange technology (Switching Technology)
The video network combines the two advantages of Ethernet, asynchrony and packet switching. On the premise of full compatibility it eliminates Ethernet's defects, provides seamless end-to-end connection across the whole network, connects directly to user terminals, and directly carries IP data packets. User data needs no format conversion anywhere in the network. The video networking is a higher-level form of Ethernet and a real-time exchange platform; it can realize the real-time transmission of large-scale whole-network high-definition video that the current Internet cannot, pushing numerous network video applications toward high definition and unification.
Server technology (Server Technology)
The server technology of the video networking and unified video platform differs from servers in the traditional sense: its streaming media transmission is connection-oriented, its data processing capability is independent of traffic and communication time, and a single network layer can contain both signaling and data transmission. For voice and video services, streaming media processing on the video networking and unified video platform is much simpler than general data processing, and efficiency is improved to more than a hundred times that of a traditional server.
Storage technology (Storage Technology)
The ultra-high-speed storage technology of the unified video platform adopts the most advanced real-time operating system to suit ultra-large-capacity, ultra-large-traffic media content. Program information in a server instruction is mapped to specific hard disk space, and media content no longer passes through the server but is delivered instantly and directly to the user terminal, with user waiting time generally below 0.2 seconds. Optimized sector distribution greatly reduces the mechanical seek movement of the hard disk head; resource consumption is only 20% of that of an IP Internet system of the same grade, while the concurrent traffic generated is 3 times that of a traditional hard disk array, improving comprehensive efficiency by more than 10 times.
Network security technology (Network Security Technology)
The structural design of the video networking thoroughly solves, by structure, the network security problems that plague the Internet, by means such as independent permission for each service and complete isolation of equipment and user data. It generally needs no antivirus programs or firewalls, shuts out attacks by hackers and viruses, and provides users with a structurally worry-free secure network.
Service innovation technology (Service Innovation Technology)
The unified video platform fuses services with transmission: whether for a single user, private network users or a network aggregate, it connects automatically on demand. The user terminal, set-top box or PC connects directly to the unified video platform to obtain a variety of multimedia video services. The unified video platform adopts a menu-style table-configuration mode in place of traditional complex application programming, so that complex applications can be realized with very little code, enabling 'unlimited' new service innovation.
Networking of the video networking is as follows:
the video networking is a network structure with centralized control, which may be a tree network, star network, ring network, etc.; on this basis, a centralized control node is required in the network to control the whole network.
As shown in fig. 1, the video network is divided into an access network and a metropolitan area network.
The devices of the access network part can be mainly divided into 3 classes: node server, access switch, terminal (including various set-top boxes, code boards, memories, etc.). The node server is connected with an access switch, which can be connected with a plurality of terminals and can be connected with an Ethernet.
The node server is a node with a centralized control function in the access network, and can control the access switch and the terminal. The node server may be directly connected to the access switch or may be directly connected to the terminal.
Similarly, devices of the metropolitan area network portion can also be classified into 3 categories: metropolitan area server, node switch, node server. The metro server is connected to a node switch, which may be connected to a plurality of node servers.
The node server here is the same node server as in the access network part; that is, the node server belongs both to the access network part and to the metropolitan area network part.
The metropolitan area server is a node with centralized control function in the metropolitan area network, and can control a node switch and a node server. The metropolitan area server may be directly connected to the node switch or directly connected to the node server.
Thus, the whole video network is a hierarchical centralized control network structure, and the network controlled by the node server and the metropolitan area server can be in various structures such as tree, star, ring and the like.
The access network part can be vividly called a unified video platform (the part inside the dotted circle), and a plurality of unified video platforms can form a video network; the unified video platforms can be interconnected through metropolitan area and wide area video networking.
1. Video networking device classification
1.1 Devices in the video network according to the embodiment of the present invention may be mainly classified into 3 types: servers, switches (including Ethernet protocol conversion gateways), and terminals (including various set-top boxes, coding boards, memories, etc.). The video networking as a whole can be divided into a metropolitan area network (or a national network, global network, etc.) and an access network.
1.2 Devices in the access network part can be mainly classified into 3 classes: node servers, access switches (including Ethernet protocol conversion gateways), and terminals (including various set-top boxes, coding boards, memories, etc.).
The specific hardware structure of each access network device is as follows:
the node server:
as shown in fig. 2, the device mainly comprises a network interface module 201, a switching engine module 202, a CPU module 203 and a disk array module 204;
Wherein, the network interface module 201, the CPU module 203 and the disk array module 204 all feed into the switching engine module 202; the switching engine module 202 performs a look-up in the address table 205 on each incoming packet, thereby obtaining its guiding information, and stores the packet into the corresponding queue of the packet buffer 206 according to that guiding information; the packet is discarded if the queue of the packet buffer 206 is nearly full. The switching engine module 202 polls all packet buffer queues and forwards if the following conditions are met: 1) the port send buffer is not full; 2) the queue packet counter is greater than zero. The disk array module 204 mainly controls the hard disk, including initialization and read/write operations; the CPU module 203 is mainly responsible for protocol processing with the access switches and terminals (not shown), configuration of the address table 205 (including a downstream protocol packet address table, an upstream protocol packet address table, and a data packet address table), and configuration of the disk array module 204.
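For illustration only, the polling rule just described can be sketched as follows. This is a minimal model in Go; the type and field names (Queue, PacketCount, PortBufFull) are assumptions of the sketch, not anything defined by the patent.

```go
package main

import "fmt"

// Queue models one packet-buffer queue behind a switching engine port.
type Queue struct {
	PacketCount int  // queue packet counter
	PortBufFull bool // whether the port's send buffer is full
}

// pollAndForward applies the node server's two forwarding conditions:
// 1) the port send buffer is not full; 2) the queue packet counter > 0.
func pollAndForward(queues []*Queue) {
	for i, q := range queues {
		if !q.PortBufFull && q.PacketCount > 0 {
			q.PacketCount-- // forward one packet from this queue
			fmt.Printf("queue %d: forwarded one packet, %d left\n", i, q.PacketCount)
		}
	}
}

func main() {
	queues := []*Queue{
		{PacketCount: 3, PortBufFull: false},
		{PacketCount: 0, PortBufFull: false}, // nothing queued: skipped
		{PacketCount: 5, PortBufFull: true},  // port busy: skipped
	}
	pollAndForward(queues)
}
```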
Access switch:
as shown in fig. 3, mainly includes a network interface module (a downstream network interface module 301, an upstream network interface module 302), a switching engine module 303, and a CPU module 304;
Wherein, the packet (uplink data) coming in from the downlink network interface module 301 enters the packet detection module 305; the packet detection module 305 detects whether the Destination Address (DA), source Address (SA), packet type, and packet length of the packet meet the requirements, if so, allocates a corresponding stream identifier (stream-id), and enters the switching engine module 303, otherwise, discards the packet; the packets (downstream data) coming in from the upstream network interface module 302 enter the switching engine module 303; the data packet coming in from the CPU module 304 enters the switching engine module 303; the switching engine module 303 performs an operation of looking up an address table 306 on an incoming packet, thereby obtaining packet guiding information; if a packet entering the switching engine module 303 is sent from the downstream network interface to the upstream network interface, the packet is stored in the queue of the corresponding packet buffer 307 in association with a stream identifier (stream-id); discarding if the queue of the packet buffer 307 is nearly full; if the packet entering the switching engine module 303 is not sent from the downlink network interface to the uplink network interface, storing the data packet into the queue of the corresponding packet buffer 307 according to the packet guiding information; if the queue of the packet buffer 307 is nearly full, it is discarded.
The switch engine module 303 polls all packet buffer queues, which may include two scenarios:
If the queue is sent from the downlink network interface to the uplink network interface, the following conditions are satisfied for forwarding: 1) The port sending buffer is not full; 2) The queue packet counter is greater than zero; 3) Obtaining a token generated by a code rate control module;
if the queue is not addressed by the downstream network interface to the upstream network interface, the following condition is satisfied for forwarding: 1) The port sending buffer is not full; 2) The queue packet counter is greater than zero.
The rate control module 308 is configured by the CPU module 304 to generate tokens for all packet buffer queues from the downstream network interface to the upstream network interface at programmable intervals to control the rate of upstream forwarding.
The CPU module 304 is mainly responsible for protocol processing with the node server, configuration of the address table 306, and configuration of the rate control module 308.
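The two forwarding scenarios and the token-based rate control can likewise be sketched as below, again as a minimal Go model under assumed names, with the rate control module reduced to a token counter per downlink-to-uplink queue.

```go
package main

import "fmt"

// BufferQueue models a packet-buffer queue in the access switch.
type BufferQueue struct {
	Upstream    bool // true if fed by a downlink interface and sent to the uplink
	PacketCount int  // queue packet counter
	PortBufFull bool // whether the destination port's send buffer is full
	Tokens      int  // tokens granted by the rate control module (upstream only)
}

// canForward applies the two forwarding scenarios described above: upstream
// queues additionally need a token from the rate control module.
func canForward(q *BufferQueue) bool {
	if q.PortBufFull || q.PacketCount == 0 {
		return false
	}
	if q.Upstream {
		return q.Tokens > 0 // condition 3: a rate-control token is available
	}
	return true
}

// grantTokens models the rate control module adding tokens to all
// downlink-to-uplink queues at a programmable interval.
func grantTokens(queues []*BufferQueue, n int) {
	for _, q := range queues {
		if q.Upstream {
			q.Tokens += n
		}
	}
}

func main() {
	up := &BufferQueue{Upstream: true, PacketCount: 2}
	down := &BufferQueue{Upstream: false, PacketCount: 2}
	fmt.Println(canForward(up), canForward(down)) // false true: no token yet
	grantTokens([]*BufferQueue{up, down}, 1)
	fmt.Println(canForward(up), canForward(down)) // true true
}
```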
Ethernet protocol conversion gateway:
As shown in fig. 4, the device mainly includes a network interface module (a downstream network interface module 401 and an upstream network interface module 402), a switching engine module 403, a CPU module 404, a packet detection module 405, a rate control module 408, an address table 406, a packet buffer 407, a MAC adding module 409, and a MAC deleting module 410.
Wherein, a data packet coming in from the downlink network interface module 401 enters the packet detection module 405; the packet detection module 405 detects whether the Ethernet MAC DA, Ethernet MAC SA, Ethernet length or frame type, video network destination address DA, video network source address SA, video network packet type and packet length of the data packet meet the requirements, and if so, allocates a corresponding stream identifier (stream-id); the MAC DA, MAC SA and length or frame type (2 bytes) are then stripped by the MAC deleting module 410 and the packet enters the corresponding receive buffer; otherwise the packet is discarded;
the downlink network interface module 401 detects the send buffer of the port; if there is a packet, it acquires the Ethernet MAC DA of the corresponding terminal according to the video networking destination address DA of the packet, adds the Ethernet MAC DA of the terminal, the MAC SA of the Ethernet protocol conversion gateway, and the Ethernet length or frame type, and sends the packet.
The functions of the other modules in the Ethernet protocol conversion gateway are similar to those of the access switch.
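A hedged sketch of the MAC add/delete behavior described above: the 14-byte Ethernet header (MAC DA 6 + MAC SA 6 + length/frame type 2) is stripped on the way into the video network and re-attached on the way back out. Function and parameter names are illustrative assumptions.

```go
package main

import (
	"bytes"
	"fmt"
)

// stripEthernetHeader removes MAC DA, MAC SA and the 2-byte length/frame
// type, leaving the bare video networking packet (MAC deleting module).
func stripEthernetHeader(frame []byte) ([]byte, error) {
	if len(frame) < 14 {
		return nil, fmt.Errorf("frame too short: %d bytes", len(frame))
	}
	return frame[14:], nil
}

// addEthernetHeader prepends the terminal's MAC DA, the gateway's MAC SA and
// the length/frame type before sending toward the Ethernet side (MAC adding
// module).
func addEthernetHeader(pkt []byte, termMAC, gwMAC [6]byte, etherType [2]byte) []byte {
	var buf bytes.Buffer
	buf.Write(termMAC[:])
	buf.Write(gwMAC[:])
	buf.Write(etherType[:])
	buf.Write(pkt)
	return buf.Bytes()
}

func main() {
	frame := append(make([]byte, 14), []byte("video-network-packet")...)
	pkt, _ := stripEthernetHeader(frame)
	out := addEthernetHeader(pkt, [6]byte{0xAA}, [6]byte{0xBB}, [2]byte{0x08, 0x00})
	fmt.Println(len(pkt), len(out)) // 20 34
}
```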
And (3) a terminal:
the system mainly comprises a network interface module, a service processing module and a CPU module; for example, the set top box mainly comprises a network interface module, a video/audio encoding/decoding engine module and a CPU module; the coding board mainly comprises a network interface module, a video and audio coding engine module and a CPU module; the memory mainly comprises a network interface module, a CPU module and a disk array module.
1.3 Devices of the metropolitan area network part can be mainly classified into 3 classes: node servers, node switches, and metropolitan area servers. The node switch mainly comprises a network interface module, a switching engine module and a CPU module; the metropolitan area server mainly comprises a network interface module, a switching engine module and a CPU module.
2. Video networking data packet definition
2.1 Access network packet definition
As shown in the following table, the data packet of the access network mainly includes the following parts: Destination Address (DA), Source Address (SA), reserved bytes, payload (PDU), and CRC.
DA | SA | Reserved | Payload | CRC |
wherein:
the Destination Address (DA) is composed of 8 bytes; the first byte represents the type of the data packet (e.g. various protocol packets, multicast data packets, unicast data packets, etc.), allowing at most 256 possibilities; the second to sixth bytes are the metropolitan area network address; and the seventh and eighth bytes are the access network address;
the Source Address (SA) is also composed of 8 bytes (bytes), defined identically to the Destination Address (DA);
the reserved bytes consist of 2 bytes;
the payload has a different length according to the type of datagram: 64 bytes for the various protocol packets and 32+1024=1056 bytes for a unicast packet, although it is of course not limited to these 2 types;
the CRC consists of 4 bytes and its calculation method follows the standard ethernet CRC algorithm.
2.2 metropolitan area network packet definition
The topology of the metropolitan area network is a graph, so there may be 2 or even more connections between two devices, i.e., there may be more than 2 connections between node switches and node servers, and between node switches and node switches. However, the metropolitan area network address of a metropolitan area network device is unique; therefore, in order to accurately describe the connection relationship between metropolitan area network devices, the embodiment of the present invention introduces a parameter: a label, to uniquely describe a metropolitan area network device.
The definition of the label here is similar to that of MPLS (Multi-Protocol Label Switching): assuming there are two connections between device A and device B, a packet from device A to device B has 2 labels, and a packet from device B to device A also has 2 labels. Labels are divided into in-labels and out-labels: assuming the label of a packet entering device A (the in-label) is 0x0000, the label of the packet when it leaves device A (the out-label) may become 0x0001. The network access process of the metropolitan area network is a process under centralized control, that is, address allocation and label allocation in the metropolitan area network are both directed by the metropolitan area server, and the node switches and node servers execute passively; this differs from MPLS, in which label allocation is the result of mutual negotiation between switch and server.
As shown in the following table, the data packet of the metropolitan area network mainly includes the following parts:
DA | SA | Reserved | Label | Payload | CRC |
i.e. Destination Address (DA), Source Address (SA), reserved bytes (Reserved), label, payload (PDU), and CRC. The format of the label may be defined as follows: the label is 32 bits, with the high 16 bits reserved and only the low 16 bits used; it sits between the reserved bytes and the payload of the packet.
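The metropolitan area network packet, with its 32-bit label and the in-label/out-label rewrite described above, might be modeled like this; the label map pushed by the metropolitan area server is an assumed representation.

```go
package main

import "fmt"

// MetroPacket adds a 32-bit label between the reserved bytes and the payload:
// DA | SA | Reserved | Label | Payload | CRC. Only the low 16 bits are used.
type MetroPacket struct {
	DA, SA   [8]byte
	Reserved [2]byte
	Label    uint32 // high 16 bits reserved, low 16 bits significant
	Payload  []byte
	CRC      [4]byte
}

// swapLabel models a metro device rewriting the in-label to the out-label
// assigned by the metropolitan area server (e.g. 0x0000 in, 0x0001 out).
func swapLabel(p *MetroPacket, labelMap map[uint16]uint16) {
	in := uint16(p.Label) // low 16 bits carry the label
	if out, ok := labelMap[in]; ok {
		p.Label = uint32(out)
	}
}

func main() {
	p := &MetroPacket{Label: 0x0000}
	swapLabel(p, map[uint16]uint16{0x0000: 0x0001}) // table pushed by the metro server
	fmt.Printf("out label: %#04x\n", p.Label)
}
```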
Based on the characteristics of the video networking, the applicant proposes the following core technical idea for solving the technical problem: a mute request sent by the conference control end is converted into an encoder control instruction of the video networking protocol, and according to the encoder control instruction the server updates the decoding marks of the audio streams belonging to the second participant terminals in the target mixed audio stream sent to the first participant terminal to a preset mark, so that the first participant terminal discards the audio streams belonging to the second participant terminals in the target mixed audio stream, thereby muting the second participant terminals with respect to the first participant terminal.
First, an implementation environment of the embodiment of the present application will be described, which specifically includes a conference control end located in the internet, and a server connected to the conference control end. The server has the function of supporting the video-networking conference controlled by the conference control end, and specifically, the server can be a single server or a server cluster formed by a plurality of servers, wherein in the server cluster, different servers can execute different tasks in the video-networking conference.
Referring to fig. 5, a schematic diagram of a communication architecture in an embodiment of the present invention is shown. As shown in fig. 5, the communication architecture includes a conference control end located in the Internet and, connected to it, a server implemented as a server cluster, which may specifically include a protocol conversion server and a video networking server, where the protocol conversion server is configured to convert various information sent by the conference control end into the video networking protocol and then send it to the video networking server.
The conference control end can be a terminal on which a conference control client is installed; it can be understood as a multipoint control unit of the traditional Internet protocols, used for creating and controlling conferences organized by video conference terminals of the H.323 protocol, and can perform conference control operations such as muting and screen mixing in a conference. The conference control end may control a plurality of participant terminals participating in the conference; as shown in fig. 5, it may control participant terminals 1 to 5.
Among the plurality of participant terminals participating in the conference, some may be located in the Internet and the rest in the video network; as shown in fig. 5, participant terminals 1 and 2 of participant terminals 1 to 5 are located in the Internet, and the remaining participant terminals are located in the video network. A participant terminal can be a set-top box, smartphone, network television, camera, microphone or other device.
When the server includes the protocol conversion server, the protocol conversion server is responsible for communication between the video network and the Internet, that is, for connecting devices in the video network with devices in the Internet. During communication it can perform streaming media protocol conversion and package audio and video data packets; in the embodiment of the application, it is responsible for converting the H.245 part of the H.323 protocol suite used in the Internet into the video networking protocol.
When the server comprises a protocol conversion server, a participant terminal in the Internet can establish a communication link with the protocol conversion server so as to send its own audio to the video networking server through the protocol conversion server.
When the server comprises a video networking server, the video networking server can process the audio and video sent by each participant terminal in the conference and then transmit the audio and video data among the participant terminals so as to keep the conference running normally. For example, the video networking server may mix the audio received from the speaking terminals among the participant terminals and send the mixed audio to the remaining participant terminals, or may receive video from each participant terminal and send the received video to the remaining participant terminals.
Next, referring to fig. 6, a flowchart of the steps of the video-networking-based conference control method according to the embodiment of the present application is shown. The method is applied to a server and may include the following steps:
step S601: and receiving an encoder control instruction obtained after video networking protocol conversion of the mute request sent by the conference control terminal.
In an example, as shown in the communication architecture of fig. 5, the protocol conversion server may receive a mute request sent by the conference control end, perform video networking protocol conversion on the received mute request to obtain an encoder control instruction, and then send the encoder control instruction to the video networking server.
Of course, in one example, if the server is a single server, the server may itself receive the mute request sent by the conference control end and convert it into an encoder control instruction of the video networking protocol.
In this embodiment, the mute request sent by the conference control end may be generated according to a mute operation performed by the user on the conference control end. In practice, a terminal list of the participant terminals currently in the conference may be presented on the conference control end, and the user may select in this list a first participant terminal and the second participant terminals to be muted toward it: the first participant terminal must not play the audio streams of the second participant terminals, so that the second participant terminals are muted with respect to the first participant terminal.
The mute request sent by the conference control end may be request signaling of the H.245 protocol. H.245 is a multimedia communication control protocol, the control signaling protocol in the H.323 multimedia communication system, mainly used for end-to-end exchange of H.245 messages between H.323 endpoints or terminals in communication.
In this embodiment, since the mute request is sent by the conference control end in the Internet, it is a mute request conforming to the Internet protocols, specifically the H.245 protocol. In order for a server in the video network to respond to it, the protocol conversion server needs to convert the mute request into an encoder control instruction of the video networking protocol and send the encoder control instruction to the video networking server, so that the video networking server can respond to the mute request. The encoder control instruction may instruct the video networking server to change the decoding mark of the corresponding audio stream in the mixed audio stream sent to the first participant terminal.
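As a minimal sketch of this conversion step (all type and field names are assumptions; the real instruction formats are not specified here), the protocol conversion server's role reduces to re-wrapping the request payload in the video networking protocol:

```go
package main

import "fmt"

// MuteRequest stands in for the H.245-style request signaling arriving from
// the conference control end in the Internet.
type MuteRequest struct {
	FirstTerminalID   string   // the terminal that must not hear the others
	SecondTerminalIDs []string // the terminals to be muted toward it
	MuteSeconds       int      // optional mute duration
}

// EncoderControlInstruction is the video networking protocol counterpart
// forwarded to the video networking server.
type EncoderControlInstruction struct {
	FirstTerminalID   string
	SecondTerminalIDs []string
	MuteSeconds       int
}

// convertMuteRequest performs the protocol conversion step: the payload is
// carried over, only the protocol envelope changes.
func convertMuteRequest(req MuteRequest) EncoderControlInstruction {
	return EncoderControlInstruction{
		FirstTerminalID:   req.FirstTerminalID,
		SecondTerminalIDs: req.SecondTerminalIDs,
		MuteSeconds:       req.MuteSeconds,
	}
}

func main() {
	instr := convertMuteRequest(MuteRequest{
		FirstTerminalID:   "terminal-1",
		SecondTerminalIDs: []string{"terminal-2", "terminal-3"},
		MuteSeconds:       60,
	})
	fmt.Printf("%+v\n", instr)
}
```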
Step S602: And determining a first participant terminal and at least one second participant terminal corresponding to the encoder control instruction.
In this embodiment, the mute request may carry the identifier of the first participant terminal and the identifiers of the second participant terminals, and the encoder control instruction may accordingly carry them as well. The video networking server in the server cluster may then determine the first participant terminal and the second participant terminals according to these identifiers.
In the video network, the identifier of a participant terminal can be a MAC address or a video networking number.
Step S603: And searching for the target mixed audio stream sent to the first participant terminal among the current multiple mixed audio streams.
Each mixed audio stream in the multiple mixed audio streams is audio obtained by mixing audio streams sent by a plurality of current speaking terminals.
In this embodiment, the video networking server in the server cluster may determine, based on the encoder control instruction, that each second participant terminal in the current conference needs to be muted with respect to the first participant terminal, that is, the first participant terminal must not hear the sound of any second participant terminal. The video networking server can therefore search the current multiple mixed audio streams for the target mixed audio stream sent to the first participant terminal.
In the current conference, there are typically multiple speaking terminals, and an audio stream sent by each speaking terminal needs to be listened to by other participant terminals. Thus, for each participant terminal there is a mixed audio stream sent to that participant terminal, and thus there are multiple mixed audio streams. For each participant terminal, the video network server side may mix other audio streams except the audio stream of the participant terminal (transmission object) among the received audio streams of the plurality of speaking terminals, so as to obtain a mixed audio stream for the participant terminal.
For example, there are 5 current speaking terminals, namely, terminal 1, terminal 2, terminal 3, terminal 4 and terminal 5, and if the terminal that needs to receive audio is terminal 3, the video networking server mixes the audio streams of terminal 1, terminal 2, terminal 4 and terminal 5 and sends the mixed audio streams to terminal 3. Similarly, mixed audio streams for terminal 1, terminal 2, terminal 4 and terminal 5 can be obtained.
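The per-receiver mixing described above (every speaking terminal's audio except the receiver's own) can be sketched as a "mix-minus" sum; summing saturated PCM samples is an assumption for illustration, not the patent's mixing algorithm:

```go
package main

import "fmt"

// mixForReceiver builds the mixed audio stream for one participant by summing
// the samples of every current speaking terminal except the receiver itself.
func mixForReceiver(receiver string, streams map[string][]int16) []int16 {
	var mixed []int16
	for id, samples := range streams {
		if id == receiver {
			continue // a terminal never receives its own audio back
		}
		for i, s := range samples {
			if i >= len(mixed) {
				mixed = append(mixed, s)
				continue
			}
			// saturating 16-bit addition to avoid wrap-around clipping
			sum := int32(mixed[i]) + int32(s)
			if sum > 32767 {
				sum = 32767
			} else if sum < -32768 {
				sum = -32768
			}
			mixed[i] = int16(sum)
		}
	}
	return mixed
}

func main() {
	streams := map[string][]int16{
		"terminal-1": {100, 100}, "terminal-2": {10, 10}, "terminal-3": {1, 1},
	}
	// terminal-3 receives terminal-1 + terminal-2 only.
	fmt.Println(mixForReceiver("terminal-3", streams)) // [110 110]
}
```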
In this way, in the server cluster, when the video networking server receives the encoder control instruction, it can search for the target mixed audio stream sent to the first participant terminal and then process the decoding marks of the corresponding audio streams in the target mixed audio stream.
Step S604: updating the decoding mark of the audio stream belonging to the at least one second participant terminal in the target mixed audio stream to a preset mark, so that the first participant terminal discards the audio stream belonging to the at least one second participant terminal in the target mixed audio stream according to the preset mark.
In this embodiment, since the first participant terminal must be unable to play the audio streams of the second participant terminals, so that the second participant terminals are muted with respect to the first participant terminal, the decoding mark of each audio stream belonging to a second participant terminal in the target mixed audio stream can be updated to the preset mark. The decoding mark of an audio stream characterizes whether the audio stream is prohibited from being decoded; if decoding is prohibited, the audio cannot be decoded and can be discarded.
In implementation, each mixed audio stream may carry its own audio decoder instruction, which stores the decoding address and a decoding switch flag for each audio in the mixed audio stream; this decoding switch flag is the decoding mark. When a participant terminal is expected to receive a certain audio, the decoder instruction contains the decoding address of that audio and a flag enabling decoding, which may be 1; a flag of 0 indicates that decoding is prohibited. Thus, the receiving end can adaptively decode and play the received mixed audio stream according to the audio decoder instruction.
In this embodiment, the video networking server in the server cluster may update the decoding mark of each audio stream belonging to a second participant terminal to the preset mark, which indicates that the audio stream is prohibited from being decoded. The first participant terminal can therefore discard every audio stream in the target mixed audio stream whose decoding mark is the preset mark, so that the audio streams of the second participant terminals are not played at the first participant terminal, realizing the muting of the second participant terminals with respect to the first participant terminal.
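A hedged sketch of the decoder instruction and the preset mark: each entry pairs a decoding address with a decoding switch flag, the server flips the flag to 0 for the muted terminals, and the receiver decodes only the entries still flagged 1. All Go names are assumptions of this sketch:

```go
package main

import "fmt"

// AudioEntry is one audio stream's record in the mixed stream's decoder
// instruction: a decoding address plus the decoding switch flag
// (1 = decode, 0 = prohibited, i.e. the "preset mark").
type AudioEntry struct {
	TerminalID string
	DecodeAddr uint32
	DecodeFlag byte
}

// muteTerminals updates the decoding flag to 0 for every entry belonging to
// a second participant terminal, leaving all other entries untouched.
func muteTerminals(entries []AudioEntry, muted map[string]bool) {
	for i := range entries {
		if muted[entries[i].TerminalID] {
			entries[i].DecodeFlag = 0
		}
	}
}

// playable is what the receiving terminal does: decode entries flagged 1,
// discard entries flagged 0.
func playable(entries []AudioEntry) []string {
	var out []string
	for _, e := range entries {
		if e.DecodeFlag == 1 {
			out = append(out, e.TerminalID)
		}
	}
	return out
}

func main() {
	entries := []AudioEntry{
		{"terminal-2", 0x1000, 1}, {"terminal-3", 0x2000, 1}, {"terminal-4", 0x3000, 1},
	}
	muteTerminals(entries, map[string]bool{"terminal-2": true, "terminal-3": true})
	fmt.Println(playable(entries)) // [terminal-4]
}
```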
It will be appreciated that the present application only alters the decoding marks of the audio streams belonging to the second participant terminals in the target mixed audio stream sent to the first participant terminal; in the mixed audio streams sent to participant terminals other than the first, the decoding marks of the audio streams belonging to the second participant terminals still indicate that decoding is allowed, so the other participant terminals can still listen to the audio of the second participant terminals normally.
For example, as shown in fig. 5, there are 5 current speaking terminals, namely, terminal 1, terminal 2, terminal 3, terminal 4 and terminal 5, wherein it is required to make the terminal 1 unable to hear the audio of terminal 2 and terminal 3, so that in the mixed audio stream sent to terminal 1, terminal 1 cannot decode and play the audio of terminal 2 and terminal 3, but can play the audio of terminal 4 and terminal 5 normally, and as for other terminals 4 and terminal 5, they can hear the audio of terminal 2 and terminal 3 normally.
In this embodiment, the decoding marks may be carried in the mixed audio stream itself. By setting the decoding mark of each audio stream, some speaking terminals can be muted toward a designated participant terminal while the other participant terminals, for which no muting was designated, continue to listen normally. This avoids the high labor cost and low efficiency of manually closing the corresponding audio channels during the conference, thereby improving conference control efficiency.
In combination with the above embodiments, in one example, when updating the decoding flag of the audio stream belonging to the at least one second participant terminal in the target mixed audio stream to the preset flag, the target audio stream belonging to the at least one second participant terminal may be determined in the target mixed audio stream; and keeping the decoding marks of other audio streams in the target mixed audio stream unchanged, and setting the decoding mark of each target audio stream as a preset mark. Wherein the other audio streams are audio streams other than the target audio stream in the target mixed audio stream.
In this embodiment, the mixed audio stream generally contains the audio streams generated by all speaking terminals in the current conference, and the at least one second participant terminal may be all of the speaking terminals or only some of them. Where the second participant terminals are only some of the speaking terminals, the first participant terminal still needs to listen normally to the audio of the other speaking terminals, so the decoding marks of those other audio streams are kept unchanged, i.e. kept as marks that allow decoding. This ensures that, within the target mixed audio stream, the audio streams of the second participant terminals cannot be listened to by the first participant terminal while the remaining audio streams can.
In combination with the above embodiment, in one example, the mute periods of the second participant terminals to the first participant terminal may be set, so that the first participant terminal may be controlled not to decode and play the audio of each second participant terminal during the mute periods.
Correspondingly, when the server cluster comprises the video networking server and the protocol conversion server, the video networking server can send the target mixed audio stream with the updated decoding mark to the protocol conversion server, which sends it on to the first participant terminal. In this case, based on the mute request, the protocol conversion server in the server cluster may record the duration for which the target mixed audio stream with the updated decoding mark has been sent to the first participant terminal, and generate an audio recovery instruction when that duration reaches the mute duration.
In this way, when receiving the audio recovery instruction, the video networking server in the server cluster may restore the decoding mark of the audio stream belonging to the at least one second participant terminal in the target mixed audio stream to a decodable mark, so that the first participant terminal decodes and plays the audio stream belonging to the at least one second participant terminal in the target mixed audio stream.
In this embodiment, the mute duration refers to how long the second participant terminals are muted with respect to the first participant terminal, that is, the period during which the first participant terminal cannot listen to their audio streams. In practice, when the user sets, through the conference control end, that the first participant terminal cannot listen to the audio of the second participant terminals, the user can restrict this to a designated period, which improves the flexibility of mute control.
Specifically, the protocol conversion server can record the duration for which the target mixed audio stream with the updated decoding mark has been sent to the first participant terminal. Since the target mixed audio stream with the changed decoding mark is exactly the stream in which the second participant terminals are muted toward the first participant terminal, recording its sending duration measures how long the muting has been in effect.
When the sending duration of the target mixed audio stream whose decoding mark was changed to the preset mark reaches the mute duration, the muting of the second participant terminals toward the first participant terminal should be cancelled so that the first participant terminal can again listen to their audio streams. The protocol conversion server therefore generates an audio recovery instruction of the video networking protocol and sends it to the video networking server, so that the video networking server restores the decoding mark of each second participant terminal's audio stream, in the target mixed audio stream currently to be sent to the first participant terminal, to a decodable mark, for example from "0" back to "1". Thereafter, upon receiving the target mixed audio stream, the first participant terminal can decode and play the audio streams of the second participant terminals.
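A minimal sketch of the mute-duration handling, assuming a wall-clock timer stands in for counting the stream's actual sending duration and that restore delivers the effect of the audio recovery instruction:

```go
package main

import (
	"fmt"
	"time"
)

// RestoreFunc restores the decoding marks of the muted terminals' audio
// streams to the decodable mark (e.g. back from "0" to "1").
type RestoreFunc func(firstTerminal string, mutedTerminals []string)

// scheduleAudioRecovery models the protocol conversion server tracking how
// long the flag-updated target mixed stream has been sent and emitting an
// audio recovery instruction once the mute duration is reached.
func scheduleAudioRecovery(first string, muted []string, muteDur time.Duration, restore RestoreFunc) *time.Timer {
	return time.AfterFunc(muteDur, func() {
		// The audio recovery instruction is delivered to the video
		// networking server, which flips the marks back.
		restore(first, muted)
	})
}

func main() {
	done := make(chan struct{})
	scheduleAudioRecovery("terminal-1", []string{"terminal-2"}, 50*time.Millisecond,
		func(first string, muted []string) {
			fmt.Printf("restore decodable mark for %v toward %s\n", muted, first)
			close(done)
		})
	<-done
}
```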
In combination with the above embodiments, in one embodiment, the speaking duration of each speaking terminal may also be limited in a conference: when a speaking terminal's speaking duration reaches the specified length, the speaking terminal is muted with respect to the other terminals. Accordingly, the method may also include the following procedure:
first, a video networking speaking control instruction obtained after video networking protocol conversion is performed on the speaking control instruction sent by the conference control end can be received.
The video networking speaking control instruction carries the identification and speaking duration of the third participant terminal to be controlled.
In this embodiment of the present application, as shown in fig. 5, the protocol conversion server in the server cluster may perform video networking protocol conversion on the speaking control instruction sent by the conference control end to obtain a video networking speaking control instruction, and then send it to the video networking server.
Then, the video networking server in the server cluster may, in response to the video networking speaking control instruction, record from the current moment the sending duration of the audio stream belonging to the third participant terminal in each mixed audio stream.
When the sending duration of the audio stream of the third participant terminal reaches the speaking duration, the video networking server in the server cluster can update the decoding flag of the audio stream belonging to the third participant terminal in each of the multiple mixed audio streams to a preset flag, so that the other participant terminals in the video conference discard the audio stream belonging to the third participant terminal according to the preset flag.
In practice, in an audio/video conference, a speaking duration often needs to be set for a speaking terminal, for example, limiting each participant terminal's speech to within 5 minutes. Ordinarily, this limit is enforced by the on-site conference presenter, which increases the presenter's workload. Moreover, where the speaking duration must be controlled strictly, manually controlled speaking durations carry a relatively large error, and manual control cannot prevent a speaker from continuing to speak after the allotted time has been reached and thereby delaying the subsequent conference schedule.
Therefore, the conference control terminal of the present application may set the speaking duration of each participant terminal; of course, different participant terminals may have different speaking durations or the same speaking duration, which is not limited herein. After the speaking duration is set, the conference control terminal can generate a speaking control instruction and send it to a cooperative server in the server cluster, and the cooperative server then converts this Internet-side speaking control instruction into a video networking speaking control instruction of the video networking protocol. The speaking control instruction carries the identification of the third participant terminal whose speaking duration is to be controlled and the speaking duration allowed for that terminal; it can be understood that the identification of the third participant terminal to be controlled and the speaking duration are likewise carried in the video networking speaking control instruction.
Then, upon receiving the video networking speaking control instruction, the video networking server in the server cluster can record the sending duration of the audio stream belonging to the third participant terminal in the current mixed audio streams. In implementation, because the audio stream of the third participant terminal may be contained in multiple mixed audio streams, the sending duration is recorded for each of those mixed audio streams.
When the sending duration reaches the speaking duration, the decoding flag of the audio stream belonging to the third participant terminal in each mixed audio stream can be updated to the preset flag, for example, changed from "1" to "0", so that the other participant terminals in the video conference discard the audio stream belonging to the third participant terminal according to the preset flag. Therefore, once the speaking duration of the third participant terminal reaches the specified limit, even if the speaker at the third participant terminal keeps talking, the other participants can no longer hear the audio; this ensures that the participant terminals scheduled to speak next can speak normally and that the conference proceeds on schedule.
Of course, in an example, after updating the decoding flag of the audio stream belonging to the third participant terminal in each of the multiple mixed audio streams to the preset flag, the video networking server in the server cluster may send a muted notification to the third participant terminal for display, so that the user of the third participant terminal knows that his or her speech can no longer be heard by the other terminals.
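A minimal sketch of this enforcement step, under the same assumed per-source flag representation used in the earlier sketches (function and callback names are illustrative, not taken from the application):

```python
import time

MUTED, DECODABLE = "0", "1"

def enforce_speaking_limit(mixed_streams, speaker_id, speaking_duration_s,
                           started_at, notify_speaker):
    """mixed_streams maps each receiving terminal to the decoding flags of
    the sources mixed into its stream; notify_speaker is an assumed callback
    that sends the muted notification to the speaking terminal for display."""
    if time.monotonic() - started_at < speaking_duration_s:
        return False  # speaking time not yet used up
    for flags in mixed_streams.values():
        if speaker_id in flags:
            flags[speaker_id] = MUTED  # receivers discard this speaker's audio
    notify_speaker("speaking duration reached; your audio is now muted")
    return True

# Streams sent to T1 and T4 both contain T3's audio; T3's limit has elapsed.
streams = {"T1": {"T2": DECODABLE, "T3": DECODABLE}, "T4": {"T3": DECODABLE}}
enforce_speaking_limit(streams, "T3", 0, time.monotonic(), print)
```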
In combination with the above embodiment, in one example, if a participant terminal whose speaking duration has reached the specified limit wants to speak again, it may send a speaking request to the conference control terminal to reapply to speak.
Accordingly, after the decoding flag of the audio stream belonging to the third participant terminal in each mixed audio stream has been updated to the preset flag, the speech of the third participant terminal can be restored under the control of the conference control terminal, that is, it can again be received by the other participant terminals.
Specifically, a video networking speaking instruction obtained after video networking protocol conversion is performed on the request speaking instruction sent by the conference control terminal may be received, where the request speaking instruction is sent to the conference control terminal by the third participant terminal. In this embodiment, the cooperative server in the server cluster performs video networking protocol conversion on the request speaking instruction sent by the conference control terminal, and then sends the converted video networking speaking instruction to the video networking server.
Then, in response to the video networking speaking instruction, the video networking server in the server cluster can restore the decoding flag of the audio stream belonging to the third participant terminal in each of the multiple mixed audio streams to a decodable flag, so that the other participant terminals in the video conference decode and play the audio stream belonging to the third participant terminal.
In this embodiment, after the third participant terminal is muted, if it wants to continue speaking, it needs to apply to the conference control terminal. In a specific implementation, the third participant terminal sends a request speaking instruction to the conference control terminal, and the conference control terminal forwards the request speaking instruction to the cooperative server.
In one scenario, the third participant terminal may be a participant terminal in the Internet, in which case the cooperative server needs to convert the request speaking instruction into a video networking speaking instruction of the video networking protocol before sending it to the video networking server.
In another scenario, the third participant terminal may be a participant terminal in the video network, in which case the cooperative server may send the request speaking instruction directly to the video networking server.
In response to the video networking speaking instruction, the video networking server can restore the decoding flag of the audio stream belonging to the third participant terminal in each mixed audio stream to a decodable flag, so that the other participant terminals in the video conference decode and play the audio stream belonging to the third participant terminal. For example, the decoding flag of the third participant terminal's audio stream in the mixed audio stream is changed from "0" to "1", so that the other participant terminals can again listen to the speech of the third participant terminal.
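The corresponding restore step can be sketched in the same style; the names and dict-based data shapes below are assumptions for illustration, not the patented implementation.

```python
MUTED, DECODABLE = "0", "1"

def handle_speak_request(mixed_streams, speaker_id):
    """Sketch of the video networking server handling the converted request
    speaking instruction: restore the speaker's decoding flag in every mixed
    stream that carries its audio, so receivers decode and play it again."""
    for flags in mixed_streams.values():
        if flags.get(speaker_id) == MUTED:
            flags[speaker_id] = DECODABLE

streams = {"T1": {"T3": MUTED}, "T4": {"T3": MUTED}}
handle_speak_request(streams, "T3")
assert all(flags["T3"] == DECODABLE for flags in streams.values())
```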
In combination with the above embodiment, in an example, if a participant terminal needs to join or exit the conference while the conference is in progress, a corresponding instruction may be sent to the server; specifically, to exit the conference, a video networking conference exit instruction may be sent to the video networking server by the cooperative server.
Correspondingly, the server can receive a video networking conference exit instruction obtained after video networking protocol conversion is performed on the conference exit instruction sent by the conference control terminal, wherein the video networking conference exit instruction carries the identification of a fourth participant terminal exiting the video conference.
In this embodiment of the present application, the cooperative server may perform video networking protocol conversion on the conference exit instruction sent by the conference control terminal and send the resulting video networking conference exit instruction to the video networking server. In response to the video networking conference exit instruction, the video networking server may close the audio channel between itself and the fourth participant terminal, where the fourth participant terminal is the terminal to exit the video networking conference, so that each mixed audio stream no longer contains the audio stream of the fourth participant terminal. With this embodiment, when a participant terminal needs to exit the video conference, the network resources allocated to it can be released in time and reallocated to participant terminals newly joining the video conference, which improves the utilization of video networking resources.
In this embodiment, the cooperative server may, in response to an instruction from the conference control terminal, pull some participant terminals into the conference or withdraw some participant terminals from the conference. When a participant terminal exits the conference, video networking resources no longer need to be allocated to it, so the video networking server can close the audio channel between itself and the fourth participant terminal; once that audio channel is closed, the video networking server no longer receives the audio sent by the fourth participant terminal, and therefore each mixed audio stream no longer contains the audio stream of the fourth participant terminal.
In one scenario, the fourth participant terminal may actively send a conference exit request to the conference control terminal, and the conference control terminal forwards the request to the cooperative server.
In another scenario, the conference control terminal may designate the fourth participant terminal to exit the conference; in that case the conference control terminal generates a conference exit instruction according to the user's operation of removing the fourth participant terminal from the conference, and then sends the conference exit instruction to the cooperative server.
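Both scenarios end in the same server-side cleanup, sketched below with assumed data shapes; the application only requires that the audio channel be closed and that the mixes no longer contain the leaving terminal's audio.

```python
def handle_conference_exit(audio_channels, mixed_streams, leaving_id):
    """Sketch of exit handling: close the audio channel to the leaving
    terminal, stop sending it a mixed stream, and drop its audio from every
    remaining mix so its video networking resources can be reallocated."""
    audio_channels.pop(leaving_id, None)  # close the audio channel
    mixed_streams.pop(leaving_id, None)   # stop sending it a mixed stream
    for flags in mixed_streams.values():
        flags.pop(leaving_id, None)       # exclude its audio from other mixes

channels = {"T1": "channel-1", "T5": "channel-5"}
streams = {"T1": {"T5": "1"}, "T5": {"T1": "1"}}
handle_conference_exit(channels, streams, "T5")
assert "T5" not in channels and "T5" not in streams["T1"]
```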
By adopting the technical solution of this embodiment, while the audio channels of the participant terminals in the audio/video conference are preserved, the speech audio of selected participant terminals can be made inaudible to a given participant terminal, and the speaking duration of each participant terminal can also be set. Speaking in the conference can therefore be controlled flexibly without manual intervention, which improves the efficiency of the video conference.
It should be noted that, for simplicity of description, the method embodiments are shown as a series of acts, but it should be understood by those skilled in the art that the embodiments are not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred embodiments, and that the acts are not necessarily required by the embodiments of the invention.
Referring to fig. 7, a video networking-based conference control device is shown. The device is located in a server, and the server is connected to a conference control terminal located in the Internet. The device may specifically include the following modules:
a request receiving module 701, configured to receive the encoder control instruction obtained after performing the video networking protocol conversion on the mute request sent by the conference control end;
a determining module 702, configured to determine a first participant terminal and at least one second participant terminal corresponding to the encoder control instruction;
an instruction response module 703, configured to find a target mixed audio stream sent to the first participant terminal from the current multiple mixed audio streams; each mixed audio stream in the multiple mixed audio streams is an audio stream obtained by the video networking server by mixing the audio streams sent by a plurality of currently speaking terminals;
And a flag changing module 704, configured to update the decoding flag of the audio stream belonging to the at least one second participant terminal in the target mixed audio stream to a preset flag, so that the first participant terminal discards the audio stream belonging to the at least one second participant terminal in the target mixed audio stream according to the preset flag.
Optionally, the flag changing module 704 may specifically include the following units:
a determining unit configured to determine a target audio stream belonging to the at least one second participant terminal in the target mixed audio stream;
and the changing unit is used for setting the decoding mark of the target audio stream as a preset mark and keeping the decoding marks of other audio streams except the target audio stream in the target mixed audio stream unchanged.
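A one-function sketch of the changing unit's contract follows: mute only the target audio streams and leave every other flag untouched. The dict representation is an assumption carried over from the earlier sketches.

```python
MUTED = "0"

def set_preset_flag(decoding_flags, muted_terminal_ids):
    """Return updated flags: the target audio streams (those belonging to
    the second participant terminals) get the preset flag; all other
    streams in the target mixed audio stream keep their current flag."""
    return {tid: (MUTED if tid in muted_terminal_ids else flag)
            for tid, flag in decoding_flags.items()}

# Only T2 is muted; T3's flag is unchanged.
assert set_preset_flag({"T2": "1", "T3": "1"}, {"T2"}) == {"T2": "0", "T3": "1"}
```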
Optionally, the mute request further carries a mute duration, and the encoder control instruction also includes the mute duration; the apparatus may further comprise the following modules:
the first instruction receiving module is used for generating an audio recovery instruction when determining that the transmission duration of the target mixed audio stream after updating the decoding mark reaches the mute duration;
and the mark recovery module is used for recovering the decoding mark of the audio stream belonging to the at least one second participant terminal in the target mixed audio stream into a decodable mark based on the audio recovery instruction, so that the first participant terminal decodes and plays the audio stream belonging to the at least one second participant terminal in the target mixed audio stream.
Optionally, the apparatus may further include the following modules:
the second instruction receiving module is used for receiving a video networking speaking control instruction obtained after video networking protocol conversion is performed on the speaking control instruction sent by the conference control terminal; the video networking speaking control instruction carries the identification and speaking duration of a third participant terminal to be controlled;
the instruction response module 703 is further configured to record, in response to the video networking speaking control instruction, the sending duration of the audio stream belonging to the third participant terminal in the multiple mixed audio streams from the current moment;
the flag changing module 704 is specifically further configured to update, when the sending duration of the audio stream of the third participant terminal reaches the speaking duration, a decoding flag of the audio stream belonging to the third participant terminal in each of the multiple mixed audio streams to a preset flag, so that other participant terminals in the video conference discard the audio stream belonging to the third participant terminal according to the preset flag.
Optionally, the apparatus may further include the following modules:
the third instruction receiving module is used for receiving a video networking speaking instruction obtained after video networking protocol conversion is carried out on the request speaking instruction sent by the conference control terminal, wherein the request speaking instruction is sent to the conference control terminal by the third participant terminal;
And the speaking instruction response module is used for responding to the video networking speaking instruction, and respectively recovering the decoding marks of the audio streams belonging to the third participant terminal in each mixed audio stream in the multiple mixed audio streams to be decodable marks so as to enable other participant terminals in the video conference to decode and play the audio streams belonging to the third participant terminal.
Optionally, the apparatus may further include the following modules:
the fourth instruction receiving module is used for receiving a video networking conference exit instruction obtained after video networking protocol conversion is performed on the conference exit instruction sent by the conference control terminal; the video networking conference exit instruction carries the identification of a fourth participant terminal exiting the video conference;
and the conference instruction response module is used for closing, in response to the video networking conference exit instruction, the audio channel between the server and the fourth participant terminal, so that each mixed audio stream does not contain the audio stream of the fourth participant terminal.
It should be noted that, the device embodiment is similar to the method embodiment, so the description is simpler, and the relevant places refer to the method embodiment.
The embodiment of the invention also provides electronic equipment, which comprises:
a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the video networking-based conference control method according to the embodiments of the invention.
The embodiment of the invention also provides a computer-readable storage medium storing a computer program that causes a processor to execute the video networking-based conference control method according to any one of the above embodiments of the invention.
In this specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another.
It will be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the invention may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or the block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the functions specified in the flowchart flow or flows and/or the block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or the block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article or terminal device comprising the element.
The foregoing describes in detail a video networking-based conference control method, apparatus, device and storage medium. Specific examples are used herein to illustrate the principles and embodiments of the present invention, and the description of the above examples is only intended to help understand the method and core idea of the present invention. Meanwhile, those skilled in the art may make changes to the specific embodiments and the scope of application in accordance with the idea of the present invention; in view of the above, the content of this description should not be construed as limiting the present invention.
Claims (9)
1. The conference control method based on the video networking is characterized by being applied to a server, wherein the server is connected with a conference control end positioned in the Internet, and the conference control method comprises the following steps:
receiving an encoder control instruction obtained after video networking protocol conversion is carried out on a mute request sent by the conference control terminal;
determining a first participant terminal and at least one second participant terminal corresponding to the encoder control instruction;
searching for a target mixed audio stream sent to the first participant terminal from the current multiple mixed audio streams; each mixed audio stream in the multiple mixed audio streams is an audio stream obtained by mixing audio streams sent by a plurality of currently speaking terminals;
Updating the decoding mark of the audio stream belonging to the at least one second participant terminal in the target mixed audio stream to a preset mark, so that the first participant terminal discards the audio stream belonging to the at least one second participant terminal in the target mixed audio stream according to the preset mark.
2. The method according to claim 1, wherein updating the decoding flag of the audio stream belonging to the at least one second participant terminal in the target mixed audio stream to a preset flag comprises:
determining a target audio stream belonging to the at least one second participant terminal in the target mixed audio stream;
setting the decoding mark of the target audio stream as a preset mark, and keeping the decoding marks of other audio streams in the target mixed audio stream unchanged; wherein the other audio streams are audio streams other than the target audio stream in the target mixed audio stream.
3. The method of claim 1, wherein the mute request further carries a mute duration, and wherein the encoder control instruction includes the mute duration; the method further comprises the steps of:
generating an audio recovery instruction when determining that the transmission time length of the target mixed audio stream after updating the decoding mark reaches the mute time length;
And based on the audio restoration instruction, restoring the decoding mark of the audio stream belonging to the at least one second participant terminal in the target mixed audio stream to a decodable mark so that the first participant terminal decodes and plays the audio stream belonging to the at least one second participant terminal in the target mixed audio stream.
4. The method according to claim 1, wherein the method further comprises:
receiving a video networking speaking control instruction obtained after video networking protocol conversion is performed on the speaking control instruction sent by the conference control end; the video networking speaking control instruction carries the identification and speaking duration of a third participant terminal to be controlled;
responding to the video networking speaking control instruction, and recording, from the current moment, the sending duration of the audio stream belonging to the third participant terminal in the multiple mixed audio streams;
when the sending duration of the audio stream of the third participant terminal reaches the speaking duration, updating the decoding flag of the audio stream belonging to the third participant terminal in each mixed audio stream of the multiple mixed audio streams to the preset flag, so that other participant terminals in the video conference discard the audio stream belonging to the third participant terminal according to the preset flag.
5. The method according to claim 4, wherein after updating the decoding flag of the audio stream belonging to the third participant terminal in the multiple mixed audio streams to the preset flag, the method further comprises:
receiving a video networking speaking instruction obtained after video networking protocol conversion is performed on the request speaking instruction sent by the conference control terminal, wherein the request speaking instruction is sent to the conference control terminal by the third participant terminal;
and responding to the video networking speaking instruction, respectively recovering the decoding marks of the audio streams belonging to the third participant terminal in each mixed audio stream in the multiple mixed audio streams to be decodable marks so as to enable other participant terminals in the video conference to decode and play the audio streams belonging to the third participant terminal.
6. The method according to claim 1, wherein the method further comprises:
receiving a video networking conference exit instruction obtained after video networking protocol conversion is performed on the conference exit instruction sent by the conference control terminal, wherein the video networking conference exit instruction carries the identification of a fourth participant terminal exiting the video conference;
and responding to the video networking conference exit instruction, and closing the audio channel between the server and the fourth participant terminal, so that each mixed audio stream does not contain the audio stream of the fourth participant terminal.
7. A conference control device based on the video networking, wherein the device is located in a server, and the server is connected with a conference control terminal located in the Internet, the device comprising:
the request receiving module is used for receiving an encoder control instruction obtained after video networking protocol conversion is carried out on the mute request sent by the conference control terminal;
the determining module is used for determining a first participant terminal and at least one second participant terminal corresponding to the encoder control instruction;
the instruction response module is used for searching for a target mixed audio stream sent to the first participant terminal from the current multiple mixed audio streams; each mixed audio stream in the multiple mixed audio streams is an audio stream obtained by the video networking server by mixing the audio streams sent by a plurality of currently speaking terminals;
and the flag changing module is used for updating the decoding flag of the audio stream belonging to the at least one second participant terminal in the target mixed audio stream to a preset flag, so that the first participant terminal discards the audio stream belonging to the at least one second participant terminal in the target mixed audio stream according to the preset flag.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the video networking-based conference control method of any of claims 1-6.
9. A computer-readable storage medium, characterized in that a computer program stored therein causes a processor to execute the video networking-based conference control method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011590205.1A CN112751849B (en) | 2020-12-28 | 2020-12-28 | Conference control method, device, equipment and medium based on video networking |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011590205.1A CN112751849B (en) | 2020-12-28 | 2020-12-28 | Conference control method, device, equipment and medium based on video networking |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112751849A CN112751849A (en) | 2021-05-04 |
CN112751849B true CN112751849B (en) | 2024-04-09 |
Family
ID=75646556
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011590205.1A Active CN112751849B (en) | 2020-12-28 | 2020-12-28 | Conference control method, device, equipment and medium based on video networking |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112751849B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113489736A (en) * | 2021-07-14 | 2021-10-08 | 海南视联通信技术有限公司 | Method, device, equipment and storage medium for realizing multimedia conference |
CN117459672A (en) * | 2022-07-18 | 2024-01-26 | 中兴通讯股份有限公司 | Video conference canceling method, device and computer readable storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102025972A (en) * | 2010-12-16 | 2011-04-20 | 中兴通讯股份有限公司 | Mute indication method and device applied for video conference |
CN111182258A (en) * | 2020-02-11 | 2020-05-19 | 视联动力信息技术股份有限公司 | Data transmission method and device for network conference |
CN111225177A (en) * | 2020-02-20 | 2020-06-02 | 视联动力信息技术股份有限公司 | Video conference processing method and device, electronic equipment and storage medium |
CN112104836A (en) * | 2020-11-23 | 2020-12-18 | 全时云商务服务股份有限公司 | Audio mixing method, system, storage medium and equipment for audio server |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050062843A1 (en) * | 2003-09-22 | 2005-03-24 | Bowers Richard D. | Client-side audio mixing for conferencing |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102025972A (en) * | 2010-12-16 | 2011-04-20 | 中兴通讯股份有限公司 | Mute indication method and device applied for video conference |
CN111182258A (en) * | 2020-02-11 | 2020-05-19 | 视联动力信息技术股份有限公司 | Data transmission method and device for network conference |
CN111225177A (en) * | 2020-02-20 | 2020-06-02 | 视联动力信息技术股份有限公司 | Video conference processing method and device, electronic equipment and storage medium |
CN112104836A (en) * | 2020-11-23 | 2020-12-18 | 全时云商务服务股份有限公司 | Audio mixing method, system, storage medium and equipment for audio server |
Non-Patent Citations (2)
Title |
---|
Research on a System and Method for Implementing Large-Capacity Conference Services Based on a Single Server; Zhang Long et al.; Electronic Design Engineering; Vol. 25, No. 6; full text *
Zhang Long et al. Research on a System and Method for Implementing Large-Capacity Conference Services Based on a Single Server. Electronic Design Engineering. 2017, Vol. 25, No. 6, full text. *
Also Published As
Publication number | Publication date |
---|---|
CN112751849A (en) | 2021-05-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108574688B (en) | Method and device for displaying participant information | |
CN110049271B (en) | Video networking conference information display method and device | |
CN109640028B (en) | Method and device for carrying out conference combining on multiple video networking terminals and multiple Internet terminals | |
CN109302576B (en) | Conference processing method and device | |
CN110475090B (en) | Conference control method and system | |
CN109120879B (en) | Video conference processing method and system | |
CN110417877B (en) | Audio and video service processing method and device | |
CN109547728B (en) | Recorded broadcast source conference entering and conference recorded broadcast method and system | |
CN109246486B (en) | Method and device for framing | |
CN110545395B (en) | Video networking conference switching method and device | |
CN110460804B (en) | Conference data transmitting method, system, device and computer readable storage medium | |
CN109862307B (en) | Video conference initiating method and device | |
CN109194902B (en) | Hierarchical conference scheduling method and system | |
CN111327868B (en) | Method, terminal, server, equipment and medium for setting conference speaking party roles | |
CN112751849B (en) | Conference control method, device, equipment and medium based on video networking | |
CN111478884B (en) | Method, device and storage medium for data transmission among different video networks | |
CN111131760B (en) | Video recording method and device | |
CN110149305B (en) | Video network-based multi-party audio and video playing method and transfer server | |
CN109286775B (en) | Multi-person conference control method and system | |
CN111641800A (en) | Method and device for realizing conference | |
CN109005378B (en) | Video conference processing method and system | |
CN109587433B (en) | Point adjusting method and point adjusting device | |
CN110719432A (en) | Data transmission method and device, electronic equipment and storage medium | |
CN111654659B (en) | Conference control method and device | |
CN110769179A (en) | Audio and video data stream processing method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CP03 | Change of name, title or address |
Address after: 33rd Floor, No.1 Huasheng Road, Yuzhong District, Chongqing 400013
Patentee after: VISIONVERA INFORMATION TECHNOLOGY Co.,Ltd.
Country or region after: China
Address before: 100000 Beijing Dongcheng District Qinglong Hutong 1 Song Hua Building A1103-1113
Patentee before: VISIONVERA INFORMATION TECHNOLOGY Co.,Ltd.
Country or region before: China