WO2002025908A2 - Packet-based conferencing - Google Patents
Packet-based conferencing
- Publication number
- WO2002025908A2 (PCT/CA2001/001298)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- media
- packet
- signals
- talkers
- talker
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/56—Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/40—Support for services or applications
- H04L65/403—Arrangements for multi-party communication, e.g. for conferences
- H04L65/4046—Arrangements for multi-party communication, e.g. for conferences with distributed floor control
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/60—Network streaming of media packets
- H04L65/75—Media network packet handling
- H04L65/764—Media network packet handling at the destination
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M7/00—Arrangements for interconnection between switching centres
- H04M7/006—Networks other than PSTN/ISDN providing telephone service, e.g. Voice over Internet Protocol (VoIP), including next generation networks with a packet-switched transport layer
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/56—Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
- H04M3/568—Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities audio processing specific to telephonic conferencing, e.g. spatial distribution, mixing of participants
- H04M3/569—Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities audio processing specific to telephonic conferencing, e.g. spatial distribution, mixing of participants using the instant speaker's algorithm
Definitions
- This invention relates generally to packet-based media communications and more specifically to media conferencing within a packet-based communication network.
- a standard telephone switch 20 is coupled to a plurality of telephone terminals 22 to be included within a conference session as well as a central conference bridge 24. It is noted that these telephone terminals 22 are coupled to the telephone switch 20 via numerous other telephone switches (not shown). The telephone switch 20 forwards any voice communications received from the terminals 22 to the central conference bridge 24, which then utilizes a standard algorithm to control the conference session.
- PCM Pulse Code Modulation
- One such algorithm used to control a conference session comprises the steps of mixing the voice communications received from each telephone terminal 22 within the conference session and further distributing the result to each of the telephone terminals 22 for broadcasting.
- a problem with this algorithm is the amount of noise that is combined during the mixing step, this noise comprising a background noise source corresponding to each of the telephone terminals 22 within the conference session.
- An improved algorithm for controlling a conference session is disclosed within U.S. patent application 08/987216 entitled "Method of Providing Conferencing in Telephony" by Dal Farra et al., filed on December 9, 1997, assigned to the assignee of the present invention, and herein incorporated by reference.
- This algorithm comprises the steps of selecting primary and secondary talkers, mixing the voice communications from these two talkers and forwarding the result of the mixing to all the participants within the conference session except for the primary and secondary talkers.
- the primary and secondary talkers receive the voice communications corresponding to the secondary and primary talkers respectively.
- the selection and mixing of only two talkers at any one time can reduce the background noise level within the conference session when compared to the "party line" approach described above.
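The two-talker distribution rule described above can be sketched as a small routing table. This is only an illustration under stated assumptions: the function name `route_outputs` and the string labels are hypothetical and do not come from the referenced application.

```python
def route_outputs(participants, primary, secondary):
    """Return, per participant, which signal a two-talker bridge forwards.

    Hypothetical helper: labels are "mix", "primary" or "secondary".
    """
    routing = {}
    for p in participants:
        if p == primary:
            routing[p] = "secondary"   # the primary talker hears only the secondary talker
        elif p == secondary:
            routing[p] = "primary"     # the secondary talker hears only the primary talker
        else:
            routing[p] = "mix"         # everyone else hears the mix of both talkers
    return routing
```

Because neither talker receives the mix, no talker hears his or her own voice echoed back.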
- VoIP handsets 26 are coupled to a packet-based network, an IP network 28 in this case.
- a packet-based voice communication central bridge, in this case a VoIP central conference bridge 30, must be coupled to the IP network 28.
- This VoIP central conference bridge 30 has a number of problems. These problems include the latency inherently created within the conference bridge 30, the considerable amount of signal processing power required, the cost of the conference bridge, the provisioning of the conference bridges within a network and the maintenance and management of the conference bridges that are required.
- FIGURE 3A is a logical block diagram of a well-known VoIP central conference bridge design while FIGURE 3B is a logical block diagram of a well-known VoIP terminal design.
- the conference bridge 30 comprises an inputting block 32, a talker selection and mixing block 34, and an outputting block 36. Typically all three of these blocks are implemented in software.
- the inputting block 32 comprises, for each participant within the voice conference, a protocol stack (P.S.) 38 coupled in series with a jitter buffer (J.B.) 40 and a decompression block (DECOMP.) 42, each of the decompression blocks 42 further being coupled to the talker selection and mixing block 34.
- the protocol stacks 38 in this design perform numerous functions including receiving packets comprising compressed voice signals, hereinafter referred to as voice data packets; stripping off the packet overhead required for transmitting the voice data packet through the IP network 28; and outputting the compressed voice signals contained within the packets to the respective jitter buffer 40.
- the jitter buffers 40 receive these compressed voice signals, ensure that the compressed voice signals are within the proper sequence, (i.e.
- each of the jitter buffers 40 is a series of compressed voice signals within the proper order that are then fed into the respective decompression block 42.
- the decompression blocks 42 receive these compressed voice signals, convert them into standard PCM format and output the resulting voice signals (that are in Pulse Code Modulation) to the talker selection and mixing block 34.
- the talker selection and mixing block 34 preferably performs almost identical functionality to the central conference bridge 24 within FIGURE 1.
- the key to the design of a VoIP central conference bridge 30 as depicted in FIGURE 3A is the inputting block 32 transforming the packet-based voice communications into PCM voice communications so the well-known conferencing algorithms can be utilized within the block 34.
- the resulting output from the talker selection and mixing block 34 is a voice communication consisting of a mix between the voice communications received from a primary talker and a secondary talker, the primary and secondary talkers being determined within the block 34. Further outputs from the talker selection and mixing block 34 include the unmixed voice communications of the primary and secondary talkers that are to be forwarded, as described previously, to the secondary and primary talkers respectively.
- the outputting block 36 comprises three compression blocks 44 and a plurality of transmitters 46.
- the compression blocks 44 receive respective ones of the three outputs from the talker selection and mixing block 34, compress the received voice signals, and independently output the results to the appropriate transmitters 46.
- the mixed voice signals, after being compressed, are forwarded to all the transmitters 46 with the exception of the transmitters directed to the primary and secondary talkers.
- the transmitters directed to the primary and secondary talkers receive the appropriate unmixed voice signals.
- Each of the transmitters 46 after receiving a compressed voice signal, subsequently performs a protocol stack operation on the compressed voice signal, encapsulates the compressed voice signal within the packet-based format required for transmission on the IP network 28 and transmits a voice data packet comprising the compressed voice signal to the appropriate VoIP terminal 26 within the conference session.
- Voice data packets sent from the central conference bridge 30 are received at the protocol stack 47 which subsequently removes the packet overhead from the received voice data packets, leaving only the compressed voice signal sent from the packet-based central conference bridge 30.
- the jitter buffer 48 next performs numerous functions similar to those performed by the jitter buffers 40 including ensuring that the compressed voice signals are within the proper sequence, buffering the compressed voice signals to ensure smooth playback, and ideally implementing packet loss concealment.
- the decompression block 49 receives the compressed voice signals, decompresses them into PCM format, and forwards the voice signals to the speaker within the particular terminal 26 for broadcasting the voice signals audibly.
- a further problem results from the considerable latency that the processing within the VoIP central conference bridge 30 and the processing within the individual terminals 26 create.
- the combined latency of this processing can result in a significant delay between when the talker(s) speaks and when the other participants in the conference session hear the speech. This delay can be noticeable to the participants if it is beyond the perceived real-time limits of human hearing. This could result in participants talking while not realizing that another participant is speaking.
- Yet another key problem with the design depicted in FIGURES 3A and 3B is the considerable amount of signal processing power that is required to implement the conference bridge 30. As stated previously, each of the components shown within FIGURE 3A is normally simply a software algorithm being run on DSP component(s). This considerable amount of required signal processing power is expensive.
- the present invention is directed to methods and apparatus that can be utilized within a packet-based media communication system for media conferences.
- Packet-based apparatus are described that can be coupled within a packet-based network such that a media conference can be established without the use of a conference bridge.
- These packet-based apparatus can receive media data packets from each of the other packet-based apparatus within the media conference, determine a set of talkers within the media conference and process the received media data packets appropriately for the selected set of talkers so as to output media signals corresponding to the talkers.
- the removal of the conference bridge can allow the packet-based apparatus to become independent from the packet-based network administration. Further, the removal of the conference bridge allows a reduction in transcoding and hence allows a better quality signal to be received at the individual apparatus.
- the present invention is a packet-based apparatus including a receiver capable of being coupled to a network, an energy detection and talker selection unit and an output unit.
- the receiver operates to receive a media data packet from at least two sources forming a media conference, each media data packet defining a media signal.
- the energy detection and talker selection unit operates to process the media signals including selecting a set of the sources within the media conference as talkers.
- the output unit operates to output media signals that correspond to the talkers.
- the present invention is a method for outputting media signals within a media conference. In this aspect, the method initially receives a media data packet from at least two sources forming a media conference, each media data packet defining a media signal.
- the method includes processing the received media data packets including selecting a set of the sources within the media conference as talkers. Finally, the method includes outputting media signals that correspond to the talkers.
- the present invention is a packet-based network comprising a plurality of packet-based apparatus. In this aspect, at least two of the plurality of packet-based apparatus operate to output media data packets comprising media signals. These packet-based apparatus together form a media conference.
- At least one of the packet-based apparatus within the media conference operates to receive the media data packets from the packet-based apparatus within the media conference; to process the media signals corresponding to the received media data packets including selecting a set of the packet-based apparatus within the media conference as talkers; and to output media signals that correspond to the talkers.
- media data packets or media signals corresponding to the packet-based apparatus are received at the energy detection and talker selection unit and are considered as a source when selecting a set of the sources within the media conference as talkers.
- the packet-based apparatus according to the above described aspects is a packet-based terminal while, in other embodiments, the packet-based apparatus is a packet-based network interface arranged to be coupled, via a non-packet-based network, to a non-packet-based terminal.
- FIGURE 1 is a simplified block diagram illustrating a well-known circuit switched network with a voice conferencing capability;
- FIGURE 2 is a simplified block diagram illustrating a well-known packet-based network with a voice conferencing capability;
- FIGURES 3A and 3B are logical block diagrams illustrating a well-known packet-based central conference bridge and a well-known packet-based terminal respectively implemented within the packet-based network of FIGURE 2;
- FIGURE 4 is a simplified functional block diagram illustrating a packet-based terminal according to an embodiment of the present invention;
- FIGURE 5 is a flow chart illustrating the operations performed by a packet receipt block and an energy detection and talker selection block implemented within the packet-based terminal of FIGURE 4;
- FIGURE 6 is a flow chart illustrating the operations performed by an output generator implemented within the packet-based terminal of FIGURE 4;
- FIGURE 7 is a more detailed functional block diagram of the block diagram of FIGURE 4 during a sample operation;
- FIGURE 8 is a detailed functional block diagram illustrating an alternative embodiment of the packet-based terminal of FIGURE 4 during a sample operation; and
- FIGURE 9 is a simplified block diagram illustrating a well-known packet-based network coupled to a well-known PCM telephone network with a voice conferencing capability.
- control plane that performs administrative functions such as access approval and buildup/tear-down of telephone sessions and/or conference sessions
- media plane which performs the signal processing required on media (voice or video) streams such as format conversions and mixing operations.
- the present invention is applicable to modifications within the media plane which could be implemented with a variety of different control planes while remaining within the scope of the present invention.
- Embodiments of the present invention described herein below are directed to packet-based apparatus coupled within a packet-based network that enable media conferences between numerous sources of media signals. These sources of media signals can be any device in which a person can output media data for transmission to the packet-based apparatus.
- the packet-based apparatus are packet-based terminals coupled together within a packet-based network, each of the packet-based terminals being a source for media signals for the other packet-based apparatus.
- one or more of the packet-based apparatus are packet-based network interfaces which couple standard non-packet-based terminals, such as PCM or analog telephone terminals, to a packet-based network, each of the non-packet-based terminals being a source for media signals for the packet-based apparatus.
- PCM non-packet-based terminals
- IP network 28 a packet-based network interface, in this case IP Gateway 152.
- a number of standard PCM telephone handsets 154 are coupled to the PCM telephone network 150, these PCM telephone terminals 154 possibly being considered as sources of media signals within embodiments of the present invention. Further, sources of media signals could be other devices that allow for the outputting of media data, this media data being in the form of media data packets when it is received at the packet-based apparatus described for preferred embodiments of the present invention.
- A packet-based network, according to some embodiments of the present invention, that is capable of establishing voice conferences is now described with reference to FIGURES 4 through 8.
- conference sessions are initiated and maintained without the use of a central conference bridge, with each participant within the voice conference forwarding voice data packets generated at its particular packet-based terminal to all of the other participants within the voice conference.
- This forwarding of voice data packets from one point to multipoint could be done with a plurality of unicast transmissions or with a single multicast transmission to which each participant within the voice conference tunes in.
- FIGURE 4 illustrates a simplified block diagram of a packet-based terminal according to some embodiments of the present invention.
- This packet-based terminal preferably replaces within FIGURE 2, the well-known packet-based terminal depicted within FIGURE 3B.
- the packet-based terminal depicted in FIGURE 4 comprises a packet receipt block 50, an energy detection and talker selection block 60 and an output generator 70.
- although the blocks within FIGURE 4 are depicted as separate components, these blocks are meant to be logical representations of algorithms which are hereinafter referred to collectively as conference processing logic.
- some or all of the conference processing logic is essentially software algorithms operating within a single control component such as a DSP.
- some or all of the conference processing logic is comprised of hard logic and/or discrete components.
- the operations of the packet receipt block 50 and the energy detection and talker selection block 60 will be described with reference to FIGURE 5.
- the operation of the output generator 70 will be described with reference to FIGURE 6.
- FIGURE 5 is a flow chart that depicts the steps performed by the packet receipt block 50 and the energy detection and talker selection block 60. This flow chart depicts the processing that occurs for a single voice data packet received by the packet-based terminal. It should be understood that multiple packets could proceed through this procedure at any one time which could possibly result in more than one packet being processed at the same step at the same time. Since these steps are preferably software operations, the situation in which a multiple number of packets operate at a common step within the procedure simply indicates that the software is being used by different packets in parallel.
- the first step 80 has the packet receipt block 50 receive a voice data packet from the packet-based network coupled to the packet-based terminal.
- This packet may be an IP packet or a packet of another format that can be transported on the packet-based network.
- the packet is sent from another packet-based terminal being used within a voice conference (more generally referred to as a source for media signals) and contains a compressed voice signal that corresponds to a participant that is speaking at the particular terminal.
- the packet receipt block 50 removes the packet overhead from the received voice data packet.
- This overhead may include the actual packet header and footer utilized, as well as any other transport protocol wrapper.
- the removal of the packet overhead results in only the compressed voice signal within the received packet being forwarded on for further processing.
- information contained within the packet overhead, such as the source address, is still preferably used by the control plane to identify the source terminal and the voice conference to which this particular voice signal corresponds.
- a time stamp within an RTP header of the packet header is preferably extracted and used in later processing within the media plane as described below.
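As a rough illustration of stripping the overhead and keeping the RTP time stamp, the sketch below parses the fixed 12-byte RTP header laid out in RFC 3550. It assumes the lower-layer headers have already been removed (for example by a UDP socket), ignores RTP padding, and uses the SSRC field as a stand-in source identifier; the function name `split_rtp` is hypothetical.

```python
import struct

def split_rtp(packet: bytes):
    """Split an RTP packet into (timestamp, ssrc, payload).

    Hypothetical helper: assumes `packet` starts at the RTP header.
    """
    if len(packet) < 12:
        raise ValueError("shorter than the fixed RTP header")
    # Byte 0: version/padding/extension/CSRC count; byte 1: marker/payload type.
    first, _marker_pt, _seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    if first >> 6 != 2:
        raise ValueError("not RTP version 2")
    offset = 12 + 4 * (first & 0x0F)            # skip any CSRC identifiers
    if first & 0x10:                            # header extension present
        (ext_words,) = struct.unpack("!H", packet[offset + 2:offset + 4])
        offset += 4 + 4 * ext_words
    return timestamp, ssrc, packet[offset:]     # payload = compressed voice signal
```

The returned payload is what would be handed on for energy detection, while the time stamp is kept for the later jitter buffer and mixing steps.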
- the compressed voice signal is subsequently processed by the energy detection and talker selection block 60 as depicted at steps 82 through 90. Firstly within this processing, the block 60 determines if the compressed voice signal contains speech at step 82 by performing an energy detection operation. A compressed voice signal containing speech indicates that the source of the corresponding voice data packet has a local participant who is speaking.
- a Voice Activity Detection (VAD) operation is enabled at the packet-based terminal that sent the voice data packet.
- the VAD operation alternatively is enabled at the packet-based network interface if the source of media signals is a non-packet-based telephone terminal.
- packets (and therefore compressed voice signals) that can contain speech can be distinguished from packets that do not by the number of bytes contained within the packet.
- the size of the compressed voice signal can determine whether it contains speech. For example, in the case that the G.723.1 VoIP standard is utilized, voice data packets containing voice would contain a compressed voice signal of 24 bytes while voice data packets containing essentially silence would contain a compressed voice signal of 4 bytes.
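The size-based test can be sketched in a few lines. The two constants are the G.723.1 figures quoted above; the function name and the idea of passing the accepted sizes as a parameter are assumptions made for illustration, and sizes for other codecs or rates would need to be supplied by the caller.

```python
SPEECH_FRAME_BYTES = 24    # G.723.1 frame carrying voice, per the example above
SILENCE_FRAME_BYTES = 4    # G.723.1 frame carrying essentially silence

def contains_speech_by_size(compressed: bytes, speech_sizes=(SPEECH_FRAME_BYTES,)) -> bool:
    """Hypothetical check: treat a frame as speech only if its size is a known speech size."""
    return len(compressed) in speech_sizes
```

This works only when VAD is enabled at the sender, since otherwise silent frames are the same size as frames containing speech.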
- the block 60 determines if there is speech within the compressed voice signal by monitoring a pitch-related sector within the corresponding voice data packet.
- the pitch sector is an 18-bit field that contains pitch lag information for all subframes.
- the block 60 uses the pitch sector to generate a pitch value for each subframe. If the pitch value is within a particular predetermined range, the corresponding compressed voice signal is said to contain speech. If not, the compressed voice signal is said to not contain speech.
- This predetermined range can be determined by experimentation or alternatively calculated mathematically. It is noted that many current VoIP standard codecs include pitch information as part of the transmitted packet and a similar comparison of pitch values with a predetermined range can be used with these standards. It is further noted that the energy determination operations which determine whether a particular compressed voice signal contains speech should not be limited to the above described embodiments. If the compressed voice signal at step 82 is deemed to not contain speech, the particular signal is discarded at step 83. The frequency with which signals are discarded from a signal source based upon their lack of speech affects the de-selection of talkers for the voice conference as will be described herein below.
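A minimal sketch of the pitch-range comparison follows. It assumes the per-subframe pitch values have already been decoded from the pitch sector, and it assumes that every subframe must fall inside the range for the frame to count as speech; the text does not say how subframes are combined, and the bounds `low` and `high` are hypothetical placeholders for the predetermined range.

```python
def contains_speech_by_pitch(pitch_values, low, high) -> bool:
    """Hypothetical check: True when all subframe pitch values lie within [low, high]."""
    # An empty list means no pitch information was decoded, so report no speech.
    return bool(pitch_values) and all(low <= p <= high for p in pitch_values)
```

A frame for which this returns False would be discarded at step 83.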
- the energy detection and talker selection block 60 proceeds to determine at step 84 whether the compressed voice signal is from a packet-based terminal (more generally a source of media data packets) selected to be a talker, voice signals from talkers being the only voice signals heard by the participant(s) at the particular packet-based terminal.
- the selection and de-selection of terminals as talkers is performed by a talker selection algorithm within the block 60.
- although it is the terminal that is referenced as the source for the voice data packets containing speech, for simplicity herein below, the description will refer to the talker selection algorithm determining which participants are speaking rather than referring to which terminals have participants that are speaking. It should be recognized that a reference to a participant speaking indicates that the voice data packet received from the terminal corresponding to the particular participant has been deemed to contain speech. There are preferably three main situations which would result in different operations for the talker selection algorithm, these situations being no participants speaking, only one participant speaking, and two or more participants speaking at once.
- the talker selection algorithm preferably has no terminals selected as talkers, thus removing the need for any further processing to take place.
- the talker selection algorithm preferably has only one terminal selected as a talker, that terminal being the one corresponding to the speaking participant. In this situation, the single talker is hereinafter referred to as a "lone talker".
- the talker selection algorithm preferably has one terminal selected as a "primary talker" and a second terminal selected as a "secondary talker" for the voice conference.
- the talker selection algorithm selects the primary and secondary talkers using a predetermined selection parameter.
- this selection parameter is the order in which the participants began to speak.
- the selection parameter takes into consideration the volume level of the participants (i.e. comparing the energy levels of the talkers).
- a control mechanism is in place that automatically selects a participant to be the primary or secondary talker.
- this control mechanism could be utilized in cases where there is a moderator and/or a scheduled speaker for the voice conference.
- a control mechanism is in place that allows a user of a packet-based terminal to customize his/her personal settings in order to block out a particular participant or always select a particular participant as a talker.
- selection parameters are not meant to limit the scope of the present invention.
- the key to this portion of the preferable packet-based apparatus is the selection of talkers while the parameter used for this selection and the number of talkers selected is not directly relevant to the present invention.
- the talker selection algorithm comprises a software algorithm that is continuously operating during a voice conference with the determination of those speaking and the selection of no talkers, a lone talker, or primary and secondary talkers being dynamic during the receiving of voice data packets as will be described with reference to steps 84 through 90.
- the talker selection algorithm preferably performs operations to deselect talkers continuously during the voice conference. These de-selection operations preferably include the steps of determining the length of time between voice data packets containing speech coming from the talker(s) and de-selecting any talker if the length of time between voice data packets containing speech exceeds a threshold level.
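The de-selection rule can be sketched as a per-talker timer. The class name `TalkerTimeout`, its methods, and the use of seconds for the threshold are assumptions for illustration; the text specifies only that a talker is de-selected once the gap between speech packets exceeds a threshold.

```python
class TalkerTimeout:
    """Hypothetical tracker that de-selects talkers after a silent gap."""

    def __init__(self, threshold_s):
        self.threshold_s = threshold_s
        self.last_speech = {}    # talker id -> arrival time of last speech packet

    def note_speech(self, talker, now):
        # Called whenever a packet from a selected talker is deemed to contain speech.
        self.last_speech[talker] = now

    def expire(self, now):
        """Return and forget talkers whose silence exceeds the threshold."""
        stale = [t for t, seen in self.last_speech.items()
                 if now - seen > self.threshold_s]
        for t in stale:
            del self.last_speech[t]
        return stale
```

Because packets without speech are discarded at step 83, they never reach `note_speech`, which is how the discard frequency feeds into de-selection.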
- the above described talker selection algorithm for the case that the talker selection parameter is the order in which the participants begin to speak and a maximum of two talkers are selected at once, is implemented in steps 84 through 90.
- the energy detection and talker selection block 60 determines if the compressed voice signal is from a participant selected as a talker. If the compressed signal is from a talker, the talker selection algorithm determines, as depicted at step 85, if the talker is a lone talker, a primary talker, or a secondary talker.
- the output generator 70 processes the compressed voice signal differently depending on the "type" of talker it corresponds to.
- the talker selection algorithm proceeds to determine if there are currently two talkers selected at step 86. If there are two talkers already selected, the compressed voice signal is discarded at step 83. If there are not two talkers already selected at step 86, the talker selection algorithm determines if there is currently a lone talker selected at step 87. If there is not a lone talker already selected at step 87, the talker selection algorithm selects the participant corresponding to the particular compressed voice signal as the lone talker at step 88.
- the talker selection algorithm proceeds to set the participant corresponding to the particular compressed voice signal as the secondary talker at step 89 and to set the lone talker as the primary talker at step 90.
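Steps 84 through 90 amount to a small state machine, sketched below for the case where the selection parameter is the order in which participants begin to speak. The class and method names are hypothetical, source identifiers are assumed to be non-None values, and de-selection is left out to keep the sketch short.

```python
class TalkerSelector:
    """Hypothetical sketch of steps 84-90 for a maximum of two talkers."""

    def __init__(self):
        self.lone = None
        self.primary = None
        self.secondary = None

    def classify(self, source):
        """Return 'lone', 'primary' or 'secondary', or None if the signal is discarded."""
        # Steps 84 and 85: is the signal from an already selected talker, and which type?
        if source == self.lone:
            return "lone"
        if source == self.primary:
            return "primary"
        if source == self.secondary:
            return "secondary"
        # Step 86: two talkers already selected, so discard (step 83).
        if self.primary is not None and self.secondary is not None:
            return None
        # Steps 87 and 88: no lone talker yet, so this source becomes the lone talker.
        if self.lone is None:
            self.lone = source
            return "lone"
        # Steps 89 and 90: the newcomer becomes secondary, the lone talker becomes primary.
        self.primary, self.secondary, self.lone = self.lone, source, None
        return "secondary"
```

The returned label is what the output generator would use to decide how to process the signal.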
- the procedure that occurs within the output generator 70 if the compressed voice signal corresponds to one of a lone talker, a primary talker, and a secondary talker will now be described with reference to FIGURE 6.
- the output generator 70 proceeds to perform jitter buffer operations on the compressed voice signal, hereinafter referred to as a secondary voice signal, as were previously described for jitter buffers 40, 48 within FIGURES 3A and 3B respectively.
- jitter buffer operations preferably include ensuring that the voice signals are within the proper sequence (i.e. time ordering signals) and buffering the signals to ensure smooth playback.
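The two jitter buffer duties named above, time ordering and buffering, can be sketched with a heap keyed on the time stamp. The class name and the fixed `depth` parameter are assumptions for illustration; a practical jitter buffer would also adapt its depth and handle time stamp wrap-around.

```python
import heapq

class JitterBuffer:
    """Hypothetical fixed-depth jitter buffer ordered by time stamp."""

    def __init__(self, depth):
        self.depth = depth    # frames held back to absorb network jitter
        self._heap = []

    def push(self, timestamp, frame):
        # Frames may arrive out of order; the heap keeps the oldest on top.
        heapq.heappush(self._heap, (timestamp, frame))

    def pop_ready(self):
        """Release the oldest frame once more than `depth` frames are buffered."""
        if len(self._heap) > self.depth:
            return heapq.heappop(self._heap)
        return None
```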
- the output generator determines at step 96 whether the secondary voice signal has previously been regenerated for (that is, whether a substitute signal was already generated in its place) by monitoring the time stamp associated with the secondary voice signal and comparing it to the time stamps associated with previously received secondary voice signals. If it is found that the voice signal was previously regenerated for, the secondary voice signal is discarded at step 98 and the conference processing logic returns to step 80.
- the secondary voice signal is decompressed (converting it into a decompressed voice signal that is preferably a PCM signal) and preferably temporarily saved within the output generator 70 in both compressed and decompressed formats.
- the secondary voice signal is saved within only one of the compressed and decompressed formats. Saving in only the compressed format would result in the need for a decompression operation at a subsequent step.
- the output generator 70 proceeds to perform jitter buffer operations on the compressed voice signal, hereinafter referred to as a primary voice signal, in similar fashion to that described above for step 94. Subsequently, at step 104, it is determined whether there is a secondary voice signal currently saved within the output generator 70 with a corresponding time stamp.
- a predetermined time T is a waiting period in which the output generator 70 will not utilize the primary voice signal as the procedure returns to step 104. This compensates for minor delays caused in the network by providing the voice data packets arriving from the secondary talker a limited amount of leeway after the arrival of a voice data packet corresponding to the primary talker. Preferably, if no voice data packets arrive from the secondary talker after the time T expires, the voice data packets corresponding to the primary talker are not subsequently delayed by this delay mechanism.
- a voice signal is generated for the secondary talker at step 108 with the use of a packet loss concealment algorithm.
- This generated voice signal is an approximation of what the secondary talker is saying based upon previous secondary voice data packets that were received.
- One such packet loss concealment algorithm is disclosed within U.S. patent application 09/353906 entitled "Apparatus and Method of Regenerating a Lost Audio Segment" by Gunduzhan, filed on July 15, 1999, assigned to the assignee of the present invention and herein incorporated by reference.
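The decision between mixing, waiting for the period T, and concealing a missing secondary frame can be sketched in Python as follows. This is only a sketch under stated assumptions: the function name, its parameters and the millisecond units are hypothetical, and the concealment shown (replaying the previous frame at half amplitude) is a simple stand-in, not the algorithm of the referenced Gunduzhan application.

```python
def pick_secondary_frame(primary_ts, saved_frames, waited_ms, wait_limit_ms, last_frame):
    """Decide what to do for the secondary talker at time stamp primary_ts.

    saved_frames maps time stamps to decoded secondary frames (lists of ints).
    Returns a pair (action, frame), where action is "mix", "wait" or "conceal".
    """
    if primary_ts in saved_frames:
        # A secondary frame with a corresponding time stamp is available.
        return "mix", saved_frames[primary_ts]
    if waited_ms < wait_limit_ms:
        # Still inside the waiting period T: hold the primary frame back.
        return "wait", None
    # T has expired: approximate the missing frame from the previous one.
    # Stand-in concealment only (assumption): previous frame at half amplitude.
    concealed = [int(sample * 0.5) for sample in last_frame]
    return "conceal", concealed


saved = {160: [10, -10, 20]}
on_time = pick_secondary_frame(160, saved, 0, 20, [8, -8, 16])
still_waiting = pick_secondary_frame(320, saved, 5, 20, [8, -8, 16])
expired = pick_secondary_frame(320, saved, 20, 20, [8, -8, 16])
```

A caller would invoke the function again after a short delay whenever it returns "wait", which mirrors the procedure returning to step 104.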
- a number of operations are preferably performed by the output generator 70. These operations include decompressing the compressed primary voice signal (and the secondary voice signal, if not previously done), hence converting it into an uncompressed voice signal that is preferably a PCM signal; mixing the primary voice signal with the secondary voice signal using a well-known mixing algorithm as is currently used for combining two uncompressed voice signals such as PCM signals, the primary and secondary voice signals being combined into a single uncompressed voice signal (preferably a PCM signal); and sending the result of the mixing operation to a speaker within the terminal for conversion into an audible form.
- the packet-based apparatus is a packet-based network interface
- the result of the mixing operation would in fact be forwarded via the non-packet-based network, such as PCM telephone network 150, to a non-packet-based terminal, such as PCM terminal 154, for broadcasting on a speaker.
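The description leaves the mixing algorithm open ("a well-known mixing algorithm"). One common choice, shown here purely as an assumption, is sample-wise addition of two 16-bit linear PCM frames with saturation to the 16-bit range. The function name `mix_pcm16` is hypothetical.

```python
PCM16_MIN = -32768
PCM16_MAX = 32767


def mix_pcm16(primary, secondary):
    """Mix two equal-rate 16-bit PCM frames by saturating addition.

    Both arguments are sequences of ints. If the frames differ in length,
    the result is truncated to the shorter one.
    """
    mixed = []
    for a, b in zip(primary, secondary):
        total = a + b
        # Clamp to the 16-bit range so that loud passages do not wrap around.
        mixed.append(max(PCM16_MIN, min(PCM16_MAX, total)))
    return mixed


mixed_frame = mix_pcm16([1000, 30000, -30000], [2000, 10000, -10000])
print(mixed_frame)  # [3000, 32767, -32768]
```

The second and third samples show the clamping: without it, the sums 40000 and -40000 would fall outside the 16-bit range.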
- the conference processing logic returns to step 80 within FIGURE 5. If the compressed voice signal was determined to correspond to a lone talker, the output generator 70 preferably, as depicted at step 112, performs jitter buffer operations in the same manner as is done in steps 94 and 102.
- the compressed voice signal is decompressed, hence converting it into an uncompressed voice signal that is preferably a PCM signal, and the result is sent to a speaker within the terminal for conversion into an audible form.
- the uncompressed voice signal would be forwarded, via a non-packet-based network, to a non-packet-based terminal for broadcasting on a speaker.
- the conference processing logic returns to step 80 within FIGURE 5.
- FIGURE 7 is a more detailed functional block diagram of the block diagram of FIGURE 4 for the case that the talker selection algorithm determines that there are two or more speakers and further selects primary and secondary talkers.
- the terminal of FIGURE 4 logically comprises protocol stacks 52 for receiving voice data packets from each of the other participants within a voice conference (in this case participants B through Z), energy detection blocks 62 that are each coupled to a respective one of the protocol stacks 52, and a talker selection block 64 coupled to all of the energy detection blocks 62.
- voice data packets from each of the participants, participants A through Z in this case, are input to a respective protocol stack 52.
- these protocol stacks 52 are the only logical component within the packet receipt block 50.
- the protocol stacks 52 remove the packet overhead from the received voice data packets and output voice signals in compressed format.
- the protocol stacks 52 together comprise a single software algorithm that is run for each received packet.
- the software algorithm is possibly run multiple times in parallel as numerous packets from different participants can be received at one time.
- the compressed voice signal output from each of the protocol stacks 52 is subsequently received by a corresponding energy detection block 62.
- These energy detection blocks 62 are one of the logical components within the energy detection and talker selection block 60 of FIGURE 4, with the energy detection blocks 62 together comprising a single software algorithm that is run for each compressed voice signal. The energy detection blocks 62 determine, for each of the voice signals within the received voice data packets, whether the voice signal contains speech, and these determinations are forwarded to the talker selection block 64.
- the talker selection block 64 preferably receives the determinations of which of the received voice signals contain speech and, in the case of two or more speakers, determines which participants are the primary and secondary talkers.
- FIGURE 7 depicts the case that there are at least two current talkers in the voice conference and the talker selection block 64 has selected two participants to be the primary and secondary talkers.
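The energy detection and talker selection stages can be illustrated with the following Python sketch. It assumes, for simplicity, that decoded samples are available for each participant (as in the alternative where decompression is performed in the packet receipt block), and it ranks talkers by mean-square energy. The function names, the threshold value and the ranking rule are hypothetical; the selection criteria are not specified in this passage.

```python
def frame_energy(samples):
    """Mean-square energy of one frame of decoded samples."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def select_talkers(frames, speech_threshold=1.0e5):
    """Pick (primary, secondary) talkers from a dict {participant: samples}.

    A participant counts as speaking when the frame energy reaches the
    threshold (an assumed value). Either result may be None.
    """
    energies = {pid: frame_energy(frame) for pid, frame in frames.items()}
    speaking = [pid for pid, energy in energies.items() if energy >= speech_threshold]
    # Loudest first: a simple stand-in for the talker selection algorithm.
    speaking.sort(key=lambda pid: energies[pid], reverse=True)
    primary = speaking[0] if speaking else None
    secondary = speaking[1] if len(speaking) > 1 else None
    return primary, secondary


frames = {"B": [1000] * 4, "C": [10] * 4, "D": [500] * 4}
selection = select_talkers(frames)
print(selection)  # ('B', 'D')
```

Participant C falls below the threshold and is treated as silent, so only B and D are passed on to the jitter buffers, decompression blocks and mixer.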
- jitter buffers 72 independently coupled to the talker selection block 64, two decompression blocks 74 coupled to respective ones of the jitter buffers 72, and a mixer 76 coupled to both the decompression blocks 74.
- the jitter buffers 72 operate, as described in steps 94 and 102, to ensure that the voice signals are within the proper sequence (i.e. time-ordering the voice signals) and to buffer the voice signals to ensure smooth playback.
- the primary and secondary compressed voice signals are decompressed such that they are preferably in PCM format at decompression blocks 74 and mixed together at mixer 76.
- the mixer 76 then subsequently sends the mixed signal to a speaker (not shown) within the terminal that converts the voice signal into an audible form.
- the packet-based apparatus is a packet-based network interface rather than a packet-based terminal, the mixed signal will be sent via a non-packet-based network to a non-packet-based terminal for broadcasting on a speaker.
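The two roles attributed to the jitter buffers 72, time-ordering the voice signals and holding a small reserve for smooth playback, can be sketched with a min-heap keyed on the time stamp. The class name `JitterBuffer` and the fixed depth of three frames are hypothetical choices made for this example, using Python's standard `heapq` module.

```python
import heapq


class JitterBuffer:
    """Reorders frames by time stamp and releases them once enough are queued."""

    def __init__(self, depth=3):
        self.depth = depth     # frames held back to absorb network jitter (assumed)
        self._heap = []        # min-heap of (timestamp, frame) pairs

    def push(self, timestamp, frame):
        """Store a frame; arrival order does not matter."""
        heapq.heappush(self._heap, (timestamp, frame))

    def pop_ready(self):
        """Return the oldest (timestamp, frame) pair, or None while still filling."""
        if len(self._heap) >= self.depth:
            return heapq.heappop(self._heap)
        return None


buffer = JitterBuffer(depth=3)
for ts, frame in [(320, "c"), (0, "a"), (160, "b")]:   # out-of-order arrival
    buffer.push(ts, frame)
first = buffer.pop_ready()     # the frame with the smallest time stamp
second = buffer.pop_ready()    # only two frames remain, so nothing is released
```

A production buffer would also adapt its depth and discard frames that arrive too late; those refinements are omitted here.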
- FIGURE 8 has any voice signals generated at the particular terminal possibly affecting the selection of primary and secondary talkers.
- voice signals in uncompressed format such as PCM format are output from a microphone (not shown) and received at a compression block 120 which compresses the voice signals and outputs them to an energy detection block 122 and a transmitter 124.
- the energy detection block 122, which is preferably simply a software algorithm the same as blocks 62, determines the energy level and outputs this energy information to the talker selection block 64.
- the participant at the particular terminal is considered a source of media signals and could be selected as the primary or secondary talker for the terminal.
- the output generator 70 treats the other participant selected to be a talker, if any, as a lone talker. If the participant at the terminal in question is selected to be a lone talker, the terminal discards all received voice signals and no voice signals are sent to the speaker. It should be understood that this alternative embodiment could also apply to the case in which the packet-based apparatus is a packet-based network interface.
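The effect of allowing the local participant to be selected as a talker can be summarised in a small Python sketch. The function name `signals_for_speaker` and the use of `None` for an unfilled talker slot are hypothetical conventions for this example.

```python
def signals_for_speaker(primary, secondary, local_id):
    """Return the talkers whose voice signals are played at this terminal.

    The local participant's own signal is never played back locally, so it
    is dropped from the selection. An empty list means nothing is played.
    """
    selected = [talker for talker in (primary, secondary) if talker is not None]
    return [talker for talker in selected if talker != local_id]


both_remote = signals_for_speaker("B", "C", local_id="A")   # mix B and C
one_remote = signals_for_speaker("A", "C", local_id="A")    # C is a lone talker
only_local = signals_for_speaker("A", None, local_id="A")   # nothing is played
```

A result with two entries corresponds to the mixing path, one entry to the lone-talker path, and an empty result to the case in which all received voice signals are discarded.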
- the compressed voice signal could be encapsulated within transmitter 124 and then subsequently received at the packet receipt block 50 of FIGURE 4.
- the voice data packet output from the transmitter 124 would be received at a protocol stack 52 and would be treated in similar fashion to packets from other packet-based terminals within the voice conference.
- the compression block 120 and the transmitter 124 combined can be considered a media data packet generation unit.
- a participant that is selected as the primary or secondary speaker on most other terminals within the voice conference may be hearing two other speakers. This may allow the particular participant to hear another participant that is effectively muted by some terminals. This inconsistency could cause confusion; for instance, if the particular participant replies to a comment made by the muted participant.
- the compression block 120 and the transmitter 124 would preferably be included in these packet-based terminals. They are left off these figures for simplicity.
- Yet another alternative embodiment within the packet-based terminal is the moving of the jitter buffer and/or the decompression operations to another position within the conference processing logic.
- the advantage of having the jitter buffer and decompression operations after the talker selection block 64 is the reduced number of jitter buffer and decompression operations that are required to be performed as they only must be performed on the voice signals corresponding to the primary and secondary talkers.
- the jitter buffer and/or decompression operations occur within the packet receipt block 50 directly after the protocol stack operation. In this case, the jitter buffer and/or decompression operations are required to be performed for every one of the participants in the voice conference.
- the decompression operation is moved to the packet receipt block 50, the alternative depicted within FIGURE 8 could still be implemented with a slight modification.
- the compression block 120 is not necessary and uncompressed voice signals output from the microphone (not shown) would be received at the energy detection and talker selection block 60 along with the uncompressed voice signals output from the packet receipt block 50.
- the packet-based terminal of embodiments as described herein above is not specific to any one packet-based voice communications standard (such as VoIP G.711, G.729, G.723, etc.), as it can be modified such that it can be used for numerous different standards.
- the packet-based terminal is a multi-mode terminal that allows for voice conferences of a number of different standards to utilize the single packet-based terminal.
- Another possible advantage of the present invention is the reduced number of compression and decompression operations that are required. Only a single decompression operation is required in the packet-based terminal of FIGURE 4 with no compression operations. Hence, no transcoding is required and an improved signal quality is possible.
- the traditional voice conferencing techniques have a decompression and compression operation within the central conference bridge as well as a further decompression operation within the individual terminals.
- Yet another possible advantage of the present invention is the increased bandwidth distribution within the packet-based network due to the lack of a central point at which all voice data packets within a voice conference must meet, that central point traditionally being the conference bridge.
- the preferable implementation described above entails having a conference processing logic that is distributed amongst the packet-based terminals of the voice conference.
- Even another possible advantage of the present invention is a possible reduction in latency due to a possible reduction in equipment that voice data packets must traverse.
- a voice data packet from a talker must traverse a first set of equipment to reach the conference bridge and, after being processed by the conference bridge, must traverse a second set of equipment to reach other packet-based terminals within the voice conference.
- with the implementation of the present invention, if any of the equipment of the first and second sets is the same, it may be possible to reduce the amount of equipment a voice data packet traverses, hence reducing its latency. This advantage is especially important over implementations in which the conference bridge is either physically remote from the packet-based terminals of the voice conference or implemented on a separate network from the packet-based terminals.
- the embodiments of the present invention described above are specific to packet-based networks comprising a plurality of packet-based terminals (or packet-based apparatus in general) that each perform talker selection operations, the present invention should not be limited to such embodiments.
- less than all of the packet-based terminals within a packet-based network perform talker selection and, if necessary, mixing operations.
- only one packet-based terminal performs talker selection and mixing operations, this packet-based terminal acting as a conference bridge for the other packet-based terminals.
- the packet-based terminal performing the talker selection and mixing operations outputs compressed voice signals respective of the selected talker(s) to the other packet-based terminals, similar to the operation of a conference bridge.
- a modified control plane could be used such that a number of operations could be controlled with the transmission of control packets between participants and possibly a moderator.
- One such operation could have a moderator established as a permanent talker throughout the voice conference, possibly as a permanent secondary talker or possibly as a third selected talker.
- Another operation that could be controlled through use of a modified control plane is the manual selection of primary and/or secondary talkers. This may be useful in cases where a particular participant is scheduled to speak.
- Yet another possible operation that could be maintained with use of a modified control plane is a sidebar operation.
- in a sidebar operation, at least two of the participants within a voice conference can form a subset of participants smaller than the set that defines the entire voice conference. With this setup, one participant within the subset can choose to communicate with the entire voice conference or with only the members of the subset.
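The sidebar behaviour can be pictured as a choice of audience for each sender. The Python sketch below models only that choice; the function name, the boolean flag and the use of sets are hypothetical, and the control packets that would carry the choice over the control plane are not modelled.

```python
def recipients(sender, conference, sidebar, to_sidebar_only):
    """Return the sorted list of participants who should receive the sender's media.

    conference and sidebar are sets of participant identifiers, with sidebar
    a subset of conference. Only sidebar members may restrict their audience.
    """
    if to_sidebar_only and sender in sidebar:
        audience = sidebar
    else:
        audience = conference
    return sorted(audience - {sender})   # the sender does not receive its own media


conference = {"A", "B", "C", "D"}
sidebar = {"A", "B"}
private = recipients("A", conference, sidebar, to_sidebar_only=True)
public = recipients("A", conference, sidebar, to_sidebar_only=False)
outsider = recipients("C", conference, sidebar, to_sidebar_only=True)
```

Participant C is not a member of the subset, so the request to address only the sidebar is ignored and C's media goes to the whole conference.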
- Another feature that could be added to the present invention described herein above is the sending of video streams via video data packets within the packet-based network.
- the video data packets would replace or supplement the voice data packets within the above described implementations.
- an embodiment with this feature would operate the same as described herein above, with these video signals preferably corresponding to the primary talker.
- a manual control within the control plane could be added so that each participant or a moderator could select which video stream to view.
- a picture-in-picture feature could be used such that two or more video streams could be shown at once. In the case of there being primary and secondary talkers, the picture-in-picture operation could be equivalent to the mixing of the corresponding voice signals.
- voice data packets and voice signals can be referred to broadly as media data packets and media signals, respectively.
- media data packets are any data packets that are transmitted via the media plane, these media data packets preferably being either audio or audio/video data packets.
- voice data packets are specific to the preferred embodiments in which the audio signals are voice.
- video data packets may incorporate audio data packets.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computer Networks & Wireless Communication (AREA)
- Telephonic Communication Services (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP01973878A EP1323286A2 (en) | 2000-09-18 | 2001-09-13 | Packet-based conferencing |
CA002422448A CA2422448A1 (en) | 2000-09-18 | 2001-09-13 | Packet-based conferencing |
AU2001293542A AU2001293542A1 (en) | 2000-09-18 | 2001-09-13 | Packet-based conferencing |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US66445000A | 2000-09-18 | 2000-09-18 | |
US09/664,450 | 2000-09-18 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2002025908A2 true WO2002025908A2 (en) | 2002-03-28 |
WO2002025908A3 WO2002025908A3 (en) | 2003-04-10 |
Family
ID=24666017
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CA2001/001298 WO2002025908A2 (en) | 2000-09-18 | 2001-09-13 | Packet-based conferencing |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP1323286A2 (en) |
AU (1) | AU2001293542A1 (en) |
CA (1) | CA2422448A1 (en) |
WO (1) | WO2002025908A2 (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2323246A (en) * | 1997-03-15 | 1998-09-16 | Ibm | Internet telephony signal conversion |
EP0969687A1 (en) * | 1998-07-02 | 2000-01-05 | AT&T Corp. | Internet based IP multicast conferencing and reservation system |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2224541C (en) * | 1997-01-07 | 2008-06-17 | Northern Telecom Limited | Method of providing conferencing in telephony |
EP1001596B1 (en) * | 1998-11-16 | 2013-10-02 | Siemens Enterprise Communications GmbH & Co. KG | Multimedia terminal for telephony enabling multipoint connections |
-
2001
- 2001-09-13 WO PCT/CA2001/001298 patent/WO2002025908A2/en not_active Application Discontinuation
- 2001-09-13 AU AU2001293542A patent/AU2001293542A1/en not_active Abandoned
- 2001-09-13 EP EP01973878A patent/EP1323286A2/en not_active Withdrawn
- 2001-09-13 CA CA002422448A patent/CA2422448A1/en not_active Abandoned
Non-Patent Citations (2)
Title |
---|
LINDBERGH D: "THE H.324 MULTIMEDIA COMMUNICATION STANDARD" IEEE COMMUNICATIONS MAGAZINE, IEEE SERVICE CENTER. PISCATAWAY, N.J, US, vol. 34, no. 12, 1 December 1996 (1996-12-01), pages 46-51, XP000636453 ISSN: 0163-6804 * |
See also references of EP1323286A2 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2008112002A2 (en) | 2007-03-14 | 2008-09-18 | Hewlett-Packard Development Company, L.P. | Connecting collaboration nodes |
WO2008112002A3 (en) * | 2007-03-14 | 2008-12-11 | Hewlett Packard Development Co | Connecting collaboration nodes |
US8024486B2 (en) | 2007-03-14 | 2011-09-20 | Hewlett-Packard Development Company, L.P. | Converting data from a first network format to non-network format and from the non-network format to a second network format |
Also Published As
Publication number | Publication date |
---|---|
AU2001293542A1 (en) | 2002-04-02 |
WO2002025908A3 (en) | 2003-04-10 |
CA2422448A1 (en) | 2002-03-28 |
EP1323286A2 (en) | 2003-07-02 |
Legal Events
Code | Title | Description |
---|---|---|
AK | Designated states | Kind code of ref document: A2 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PH PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW |
AL | Designated countries for regional patents | Kind code of ref document: A2 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
WWE | Wipo information: entry into national phase | Ref document number: 2422448 Country of ref document: CA |
WWE | Wipo information: entry into national phase | Ref document number: 2001973878 Country of ref document: EP |
WWP | Wipo information: published in national office | Ref document number: 2001973878 Country of ref document: EP |
REG | Reference to national code | Ref country code: DE Ref legal event code: 8642 |
NENP | Non-entry into the national phase | Ref country code: JP |
WWW | Wipo information: withdrawn in national office | Ref document number: 2001973878 Country of ref document: EP |