
WO2003098598A1 - Transcoding of speech in a packet network environment - Google Patents

Transcoding of speech in a packet network environment

Info

Publication number
WO2003098598A1
Authority
WO
WIPO (PCT)
Prior art keywords
speech
parameters
bit
transcoder
stream
Prior art date
Application number
PCT/US2003/006335
Other languages
French (fr)
Inventor
Adil Benyassine
Eyal Shlomot
Huan-Yu Su
Jes Thyssen
Yang Gao
Original Assignee
Conexant Systems, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Conexant Systems, Inc.
Priority to KR10-2004-7017694A (KR20040104701A)
Priority to JP2004506009A (JP2005531017A)
Priority to EP03713828A (EP1504441A4)
Priority to AU2003217859A (AU2003217859A1)
Publication of WO2003098598A1
Priority to IL16514704A (IL165147A0)

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W88/00: Devices specially adapted for wireless communication networks, e.g. terminals, base stations or access point devices
    • H04W88/18: Service support devices; Network management devices
    • H04W88/181: Transcoding devices; Rate adaptation devices
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16: Vocoder architecture
    • G10L19/173: Transcoding, i.e. converting between two coded representations avoiding cascaded coding-decoding
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • H: ELECTRICITY
    • H03: ELECTRONIC CIRCUITRY
    • H03M: CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00: Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30: Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00: Data switching networks
    • H04L12/66: Arrangements for connecting between networks having differing types of switching systems, e.g. gateways

Definitions

  • the present invention relates generally to the field of speech coding and, more particularly, to transcoding of speech in a packet network environment.
  • VoP: voice-over-packet
  • voice data encoded according to one standard from a transmitting participant communicating in one network has to be converted to the standard used by the receiving participant communicating under the guidelines of another network.
  • a transmitting participant's speech may be encoded according to G.723.1 specifications while the receiving participant uses G.729.
  • the bit-stream from the transmitting participant has to be converted from G.723.1 format to G.729 format.
  • encoded data from the transmitting participant is decoded according to the coding method used by the transmitting participant.
  • the decoded data is then re-encoded in accordance with the coding method used by the receiving participant.
  • the data is transmitted to the receiving participant.
  • Known transcoding schemes suffer numerous serious inadequacies.
  • the decoding and re-encoding of the speech signal reduces the quality of the speech.
  • the tandem operation of the post-filter, common in low bit-rate speech decoders, can generate objectionable spectral distortion and degrade the speech quality significantly.
  • a speech transcoder capable of transcoding a first bit-stream generated from a speech signal.
  • the transcoder includes a decoder configured to receive the first bit-stream, which has been encoded based on a first coding scheme.
  • the speech signal may have been encoded according to G.711, G.723.1, G.726 or G.729, and may be parametric or non-parametric.
  • the decoder extracts a plurality of first speech parameters from the first bit-stream, which may include, for example, parameters relating to spectral characteristics, energy, pitch and/or pitch gain of the speech signal.
  • the decoder also decodes the first bit-stream according to the first coding scheme and generates a plurality of first speech samples.
  • the decoder may include a post-filter element, which may be disabled to reduce system complexity and to improve the speech-quality of a speech signal generated by a subsequent re-encoding process.
  • the plurality of first speech samples and the plurality of first speech parameters are then transmitted to a converter capable of converting the plurality of first speech samples and the plurality of first speech parameters to a plurality of second speech samples and a plurality of second speech parameters for use according to a second coding scheme.
  • the second coding scheme may be G.711, G.723.1, G.726 or G.729, for example, and may be parametric or non-parametric.
  • the plurality of second speech samples and the plurality of second speech parameters are transmitted to an encoder.
  • the encoder receives the plurality of second speech samples and the plurality of second speech parameters and generates a second bit-stream, the second bit-stream being encoded based on the second coding scheme.
  • the encoder may include a noise suppressor element, which may be disabled to reduce system complexity and to improve the speech quality of a speech signal. It is appreciated that extracting the speech parameters from the first bit-stream, converting them, and providing the converted speech parameters to the encoder avoids a re-evaluation of speech parameters during the encoding process, achieving advantageous results such as reduced system complexity and less delay.
  • FIG. 1 illustrates a block diagram of a packet-based network in which various aspects of the present invention may be implemented
  • FIG. 2 illustrates a block diagram of a transcoding system in accordance with one embodiment
  • FIG. 3 illustrates a block diagram of a conference bridge utilizing a transcoding system in accordance with one embodiment
  • FIG. 4 illustrates a block diagram of a component of a conference bridge utilizing a transcoding system in accordance with one embodiment
  • FIG. 5 illustrates an exemplary flow diagram of a transcoding method utilizing the transcoding system of FIG. 2.
  • the present invention may be described herein in terms of functional block components and various processing steps. It should be appreciated that such functional blocks may be realized by any number of hardware components and/or software components configured to perform the specified functions.
  • the present invention may employ various integrated circuit components, e.g., memory elements, digital signal processing elements, logic elements, and the like, which may carry out a variety of functions under the control of one or more microprocessors or other control devices.
  • the present invention may employ any number of conventional techniques for data transmission, signaling, signal processing and conditioning, tone generation and detection and the like. Such general techniques that may be known to those skilled in the art are not described in detail herein.
  • FIG. 1 depicts an exemplary communication environment 100 that is capable of supporting the transmission of packetized voice information.
  • a packet network 102, e.g., a network conforming to the Internet Protocol ("IP"), may support Internet telephony applications that enable a number of participants to conduct voice calls in accordance with conventional voice-over-packet techniques.
  • packet network 102 may communicate with conventional telephone networks, local area networks, wide area networks, public branch exchanges, and/or home networks in a manner that enables participation by users that may have different communication devices and different communication service providers.
  • Participant 1 and Participant 2 communicate with packet network 102 (either directly or indirectly) via the transmission of packets that contain voice data.
  • Participant 3 communicates with packet network 102 via a gateway 104
  • Participant 4 communicates with packet network 102 via a gateway 106.
  • a gateway is a functional element that converts voice data into packet data.
  • a gateway may be considered to be a conversion element that converts conventional voice information into a packetized form that can be transmitted over a packet network.
  • a gateway may be implemented in a central office, in a peripheral device (such as a telephone), in a local switch (e.g., one associated with a public branch exchange), or the like. The functionality and operation of such gateways are well known to those skilled in the art and will therefore not be described in detail. It will be appreciated that the present invention can be implemented in conjunction with a variety of conventional gateway designs.
  • a transcoder 108 may be included in packet network 102.
  • Transcoder 108 may be implemented in a central office or maintained by an Internet service provider ("ISP").
  • ISP: Internet service provider
  • the voice data from a number of packet-based participants, e.g., Participants 1 and 2, can be processed by transcoder 108 without having to perform the conversions normally performed by gateways.
  • a transcoder 110 may be associated with or included in a gateway, e.g., gateway 104.
  • transcoder 110 may be capable of receiving and processing voice-over-packet data and conventional voice signals.
  • gateway 104 enables Participant 3, through transcoder 110, to communicate with packet network 102 and a participant coupled to packet network 102, e.g., Participant 1 or 2.
  • a packet-based transcoder may be deployed in a telephony system to facilitate communication between participants using different standards or techniques of speech coding.
  • a given packet-based voice channel may employ one of a number of different speech coding/compression standards.
  • Various speech coding standards are generally known to those skilled in the art and may include, for example, G.711, G.726, G.728, G.729(A), G.723.1, Global System for Mobile Communication ("GSM"), selectable mode vocoder ("SMV"), and adaptive multi rate ("AMR") coding, the specifications for which are hereby incorporated by reference.
  • transcoder 108 or 110 may be capable of handling speech that has been encoded by various standards.
  • transcoder should be capable of handling speech that has not been encoded.
  • FIG. 2 illustrates exemplary communication system 200 for transcoding in accordance with one embodiment of the present invention.
  • communication system 200 includes a first participant, i.e., Participant 1, a second participant, i.e., Participant 2, and transcoder 206.
  • Participant 1 is coupled to transcoder 206 via channel 204
  • Participant 2 is coupled to transcoder 206 via channel 216.
  • voice data from Participant 1 may be encoded by encoder 202 and sent to transcoder 206 via channel 204.
  • the voice data from Participant 1 may need to be compressed and encoded by encoder 202 using a suitable coding standard.
  • channel 204 may be an Internet-based packet network, in which case encoder 202 may use a suitable packet format to packetize the voice data.
  • the output data from encoder 202 transmitted over channel 204 will include encoded digital data, in the form of a bit-stream, in accordance with one or more encoding standards, e.g., G.723.1 or G.729.
  • channel 204 may function as a local link, coupling Participant 1 to transcoder 206, in which case encoder 202 may digitize the voice data from Participant 1 without encoding, and digitized data is transmitted over channel 204.
  • the bit-stream from Participant 1 arriving at transcoder 206 via channel 204 is initially input to, and processed by, decoder 208, which is configured to decode the bit-stream according to the coding method used by the transmitting participant, i.e., Participant 1.
  • decoder 208 would decode the bit-stream accordingly.
  • post-filter element of decoder 208 may be disabled or its capabilities reduced to minimize the degradation frequently found with conventional decoding algorithms utilizing post-filtering.
  • decoder 208 is also configured to extract certain speech parameters from the bit-stream.
  • the speech parameters, which are also referred to as "side information" in the present application, may include, for example, the energy, the spectral characteristics, the pitch and the pitch gain of the speech signal.
  • the speech parameters are transmitted by decoder 208 to converter 212.
  • the speech samples and speech parameters inputted into converter 212 are suitably processed and converted for eventual encoding by an encoder according to the standard suitable for the receiving participant.
  • the conversion performed by converter 212 may be based on the speech samples and/or at least one of the parameters, for example, received from decoder 208.
  • the speech samples may be modified into a format suitable for re-encoding by encoder 214. For example, in instances where Participants 1 and 2 are using coding standards having different frame structures, converter 212 may resize the frames, to provide the speech samples according to a proper frame size for use by encoder 214.
  • the speech information, comprising the converted speech samples and speech parameters, is transmitted to encoder 214.
  • decoder 208 may only provide the speech samples to converter 212 and no speech parameters (or side information).
  • converter 212 receives the speech samples from decoder 208 and converts the speech samples to provide them according to a proper frame size for use by encoder 214.
  • Encoder 214 is configured to encode the speech information according to the standard used by the receiving participant, i.e., Participant 2 in the present example. Thus, if Participant 2 utilizes a selectable mode vocoder ("SMV"), for example, then encoder 214 would encode the bit-stream according to the SMV standard.
  • encoder 214 can be configured to encode speech information using the speech parameters extracted by decoder 208 and processed by converter 212. In this manner, parameters such as the energy, the spectral characteristics, the pitch and pitch gain of the speech signal, which are conventionally needed to re-encode the speech information by encoder 214, do not have to be re-extracted from the speech samples by encoder 214.
  • encoder 214 does not have to perform such parameter estimation tasks as spectral analysis, pitch analysis, and the like, or encoder 214 may only have to perform lower complexity parameter estimation tasks.
  • the transcoding scheme of various embodiments of the present invention substantially reduces the processing power, minimizes delay, and reduces overall system complexity when compared to conventional transcoding schemes.
  • the noise suppression capability of encoder 214 may be disabled in order to further reduce the system's complexity.
  • because the speech parameters are extracted during the initial decoding step for use during the re-encoding step, degradation of the signal resulting from, for example, spectral and pitch re-evaluation is avoided.
  • the bit-stream is transmitted to the receiving participant, i.e., Participant 2, via channel 216 in the format suitable for use by decoder 218, which then decodes the bit-stream.
  • exemplary communication system 300 is used to illustrate a conference bridge using transcoding techniques of the present invention, in accordance with one embodiment. More particularly, communication system 300 shows how the present invention can be used to transcode and mix speech signals from two or more transmitting participants to a receiving participant, where each transmitting participant may be using a different coding scheme from the other.
  • Participants 1, 2 and 3 are coupled to conference bridge 306 via channels 304, 316 and 322, respectively. It is appreciated that in the present example, Participants 1 and 3 are both communicating with Participant 2 at the same time.
  • speech from Participant 1 is encoded by encoder 302 into a format suitable for transmission over channel 304 to decoder 308.
  • encoder 320 encodes speech from Participant 3 into a format suitable for transmission over channel 322 to decoder 324.
  • Both decoders 308 and 324 can be configured to decode the incoming bit-streams, such as those coming from Participants 1 and 3, according to the coding schemes used by the transmitting participants and to generate speech samples from the bit-streams. Decoders 308 and 324 may also extract speech parameters from the bit-streams, or generate the speech parameters if the speech was originally encoded according to a non-parametric standard.
  • Converter/mixer 312 can be configured to convert, combine and mix the inputted speech samples and the speech parameters to generate a single set of speech information suitable for encoding according to the coding scheme used by the receiving participant, i.e., Participant 2.
  • converter/mixer 312 may need to take into account frame size and other factors in order to generate a bit-stream suitable for encoding by the receiving participant.
  • G.723.1 uses a frame size of 30 ms
  • G.729 uses a frame size of 10 ms.
  • a common frame structure may be established to enable effective mixing of the speech samples from decoders 308 and 324. For example, if at least one of the input channels is encoded using G.723.1, then a 30 ms frame may be established. Alternatively, a frame size equal to the least common multiple might be used.
  • a 60 ms frame may be established. Once a frame size is determined, the speech samples and the speech parameters can be properly interpolated and aligned during mixing.
  • once converter/mixer 312 has converted and mixed the signal from decoder 308 with the signal from decoder 324 to generate a combined bit-stream, the bit-stream is transmitted to encoder 314.
  • Converter/mixer 312 can also provide encoder 314 with the speech parameters extracted from the inputted speech signals.
  • Encoder 314 can be configured to re-encode the bit-stream according to the same coding standard used by Participant 2. For example, if Participant 2 uses G.726, then encoder 314 would re-encode the speech information according to G.726.
  • Encoder 314 may use the parameters extracted by decoders 308 and 324 in order to re-encode the speech information, thus bypassing the need for spectral and pitch re-evaluation during the re-encoding process. In this manner, the complexity, processing demands, and time delay associated with such re-evaluation steps are avoided.
  • the speech signal is transmitted via channel 316 to Participant 2, where decoder 318 decodes the signal.
  • exemplary communication system 400 is used to illustrate a component of a conference bridge using transcoding techniques of the present invention, in accordance with one embodiment. More particularly, communication system 400 shows how the present invention provides an effective means for transcoding inputted speech signals having been encoded according to a non-parametric coding standard, such as G.711, G.726, and G.728, for example. As shown in FIG. 4, communication system 400 includes channel 404, conference bridge 406 and channel 416. It is appreciated that channels 404 and 416 are respectively equivalent to channels 204 and 216 of communication system 200 illustrated in FIG. 2.
  • a speech signal transmitted to conference bridge 406 via channel 404 is decoded by decoder 408 to generate speech samples from the incoming bit-stream.
  • Decoder 408 may also extract speech parameters from the bit-stream in instances where the speech was originally encoded using a parametric standard, such as G.729 or G.723.1.
  • non-parametric speech coding standards, for example G.711, G.726 and G.728, typically do not quantize various speech-related parameters, such as the signal pitch and spectrum. As a result, these parameters may not be extracted by decoder 408 directly from the bit-stream during the decoding process. In such instances, as shown in FIG. 4, the parameters are obtained by parameter extraction module 410, which extracts the desired speech-related parameters (or the side information) for subsequent use by encoder 414, as described below.
  • parameter extraction module 410 can be configured to extract data regarding the signal energy, spectral characteristics, pitch and pitch gain, and the like, and to provide such parameters to converter/mixer 412.
  • converter/mixer 412 receives speech samples and the speech parameters (or side information) 420 from other decoding devices (not shown).
  • Converter/mixer 412 can be configured to combine and mix the speech samples and the speech parameters from decoder 408 and parameter extraction module 410 with speech samples and speech parameters 420 into a combined bit-stream suitable for use by encoder 414 in the re-encoding process. For example, in order to combine and mix the signals, converter/mixer 412 may resize the frames of the speech samples in order to establish a common frame structure suitable for encoder 414.
  • Converter/mixer 412 can also provide encoder 414 with the speech parameters (or side information) for use in re-encoding the bit-stream.
  • the combined speech samples and extracted parameters provided by converter/mixer 412 can be used by encoder 414 to re-encode the speech signal according to the coding standard used by the receiving participant (not shown). Accordingly, by using the speech parameters (or side information) provided by converter/mixer 412, encoder 414 bypasses the need for spectral and pitch re-evaluation during the re-encoding process. In this manner, the complexity, processing demands, and time delay associated with such re-evaluation steps are avoided.
  • the encoded signal is transmitted to the receiving participant via channel 416.
  • FIG. 5 illustrates exemplary transcoding method 500 in accordance with one embodiment. It is appreciated that transcoding method 500 can be performed by a transcoder such as transcoder 206 in FIG. 2, for example. As shown, transcoding method 500 begins at step 510 and continues to step 512 where the bit-stream from a first participant is received.
  • a parameter set is extracted from the bit-stream.
  • the parameter set may include the signal energy, spectral characteristics, pitch and pitch gain, and the like.
  • the bit-stream is decoded according to the coding scheme used by the first participant and speech samples are generated.
  • the received bit-stream may be encoded according to G.723.1, in which case the bit-stream is decoded at step 516 according to G.723.1.
  • transcoding method 500 proceeds to step 518 where the speech samples and parameter set are converted into a suitable form for re-encoding.
  • the form to which the speech samples and parameter set are converted may depend on the particular coding scheme used by the receiving participant.
  • the converted speech samples are re-encoded in accordance with the coding scheme used by the receiving participant, i.e., the second participant in the present example. As such, if the second participant in the present description uses G.729, for example, then the re-encoding performed at step 520 would be done according to G.729.
  • the re-encoding performed at step 520 can utilize the parameter set extracted from the bit-stream at step 516.
  • transcoding method 500 provides a number of advantages over conventional transcoding approaches, including lower processing needs, minimal delay, and a reduction in overall system complexity.
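
The common frame structure and mixing described above for the conference bridge can be illustrated with a minimal Python sketch. It is not the patented implementation: the 8 kHz sample rate, the 16-bit clipping range, the 20 ms frame size in the second call, and all function names are assumptions made for illustration. Only the 30 ms (G.723.1) and 10 ms (G.729) frame sizes come from the text.

```python
from functools import reduce
from math import gcd

SAMPLE_RATE_HZ = 8000  # assumption: narrowband telephony sampling rate


def common_frame_ms(frame_sizes_ms):
    """Least common multiple of the per-codec frame sizes, in milliseconds."""
    return reduce(lambda a, b: a * b // gcd(a, b), frame_sizes_ms)


def reframe(samples, frame_ms):
    """Split a flat list of samples into whole frames; an incomplete tail is dropped."""
    n = SAMPLE_RATE_HZ * frame_ms // 1000
    return [samples[i:i + n] for i in range(0, len(samples) - n + 1, n)]


def mix_frames(frames_a, frames_b):
    """Add aligned frames sample by sample, clipping to the 16-bit range (assumed)."""
    return [
        [max(-32768, min(32767, a + b)) for a, b in zip(fa, fb)]
        for fa, fb in zip(frames_a, frames_b)
    ]


# G.723.1 (30 ms) mixed with G.729 (10 ms): the common frame is 30 ms.
frame_ms = common_frame_ms([30, 10])
# A hypothetical 20 ms codec mixed with G.723.1 would give a 60 ms frame.
frame_ms_alt = common_frame_ms([30, 20])
```

Once both decoded streams are cut into frames of the same length, `mix_frames` can combine them frame by frame before the result is handed to the encoder.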

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Theoretical Computer Science (AREA)
  • Telephonic Communication Services (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

There is provided transcoding of speech in a packet network environment. A decoder is configured to receive a first bit-stream encoded according to a first coding scheme. The decoder decodes the bit-stream according to the first coding scheme, generates a plurality of first speech samples, and extracts a plurality of first speech parameters, which may include spectral characteristics, energy, pitch and/or pitch gain. A converter then converts the plurality of first speech samples and the plurality of first speech parameters to a plurality of second speech samples and a plurality of second speech parameters for use according to a second coding scheme. The first and second coding schemes may be, for example, G.711, G.723.1, G.726 or G.729, and may be parametric or non-parametric. An encoder receives the plurality of second speech samples and the plurality of second speech parameters and generates a second bit-stream according to the second coding scheme.

Description

TRANSCODING OF SPEECH IN A PACKET NETWORK ENVIRONMENT
RELATED APPLICATIONS
The present application is a continuation-in-part of United States application serial number 09/547,832, filed April 12, 2000, which claims the benefit of provisional United States application serial number 60/128,873, filed April 12, 1999, which are hereby fully incorporated by reference in the present application.
BACKGROUND OF THE INVENTION
1. FIELD OF THE INVENTION
The present invention relates generally to the field of speech coding and, more particularly, to transcoding of speech in a packet network environment.
2. RELATED ART
The explosive growth of the Internet has been accompanied by a growing interest in using this traditionally data-oriented network for voice communication in accordance with voice-over-packet ("VoP"). The packetizing of voice signals for transmission over a packet network has been recognized as a less expensive, yet effective, alternative to traditional telephone service. The term VoP is an umbrella term that can include, for example, VoIP and other types of services utilizing packetized voice data.
One challenge facing the expansion of VoP is the need to connect diverse types of networks with greater effectiveness. More specifically, because different networks may be using different standards to encode, compress and packetize speech, a transcoding procedure has to be performed in order for a meaningful connection between networks to be achieved. Typically, voice data encoded according to one standard from a transmitting participant communicating in one network has to be converted to the standard used by the receiving participant communicating under the guidelines of another network. For example, a transmitting participant's speech may be encoded according to G.723.1 specifications while the receiving participant uses G.729. In order for the data from the transmitting participant to be understood by the receiving participant, the bit-stream from the transmitting participant has to be converted from G.723.1 format to G.729 format.
In conventional transcoding approaches, encoded data from the transmitting participant is decoded according to the coding method used by the transmitting participant. The decoded data is then re-encoded in accordance with the coding method used by the receiving participant. In the re-encoded form, the data is transmitted to the receiving participant. Known transcoding schemes, however, suffer numerous serious inadequacies. For example, the decoding and re-encoding of the speech signal (a "tandem" process) reduces the quality of the speech. More particularly, the tandem operation of the post-filter, common in low bit-rate speech decoders, can generate objectionable spectral distortion and degrade the speech quality significantly.
Another drawback of known transcoding schemes is the undesirable delay resulting from the re-encoding step. Typically, re-encoding of the decoded bit-stream requires that the speech signal characteristics be evaluated. As such, parameters including energy, spectral characteristics and pitch, for example, have to be extracted from the bit-stream and used to re-encode the signal. Furthermore, in addition to delay, the need to extract these parameters as part of the re-encoding step introduces greater complexity to the system.
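
To give a sense of the evaluation work referred to above, the following Python sketch estimates two of the named parameters, energy and pitch, from one frame of decoded samples. It is a generic autocorrelation-style estimator, not the analysis procedure of any particular standard; the function names, the lag search window of 20 to 143 samples, and the synthetic test frame are assumptions for illustration.

```python
import math


def frame_energy(frame):
    """Mean squared amplitude of one frame of speech samples."""
    return sum(s * s for s in frame) / len(frame)


def estimate_pitch_lag(frame, min_lag=20, max_lag=143):
    """Return the lag (in samples) whose delayed copy best matches the frame.

    The score is the cross-correlation between the frame and its delayed
    copy, normalized by the square root of the delayed segment's energy.
    The lag window is an assumed search range, not taken from the source.
    """
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        num = sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
        den = sum(frame[n - lag] ** 2 for n in range(lag, len(frame)))
        score = num / math.sqrt(den) if den > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Synthetic periodic frame: a sine with a 50-sample period, 240 samples long.
frame = [math.sin(2 * math.pi * n / 50) for n in range(240)]
lag = estimate_pitch_lag(frame)
energy = frame_energy(frame)
```

The nested loops over lags and samples hint at why repeating this analysis for every re-encoded frame adds both computation and delay.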
Thus, there is an intense need in the art for a transcoding method, and related system, which can overcome the shortcomings of known transcoding schemes and provide for more effective means by which transcoding between networks can be achieved.
SUMMARY OF THE INVENTION
In accordance with the purpose of the present invention as broadly described herein, there is provided transcoding of speech in a packet network environment. In one exemplary aspect of the present invention, a speech transcoder capable of transcoding a first bit-stream generated from a speech signal is disclosed. The transcoder includes a decoder configured to receive the first bit- stream, which has been encoded based on a first coding scheme. For example, the speech signal may have been encoded according to G.711, G.723.1, G726 or G.729, and may be parametric or non- parametric. The decoder extracts a plurality of first speech parameters from the first bit-stream, which may include, for example, parameters relating to spectral characteristics, energy, pitch and/or pitch gain of the speech signal. The decoder also decodes the first bit-stream according to the first coding scheme and generates a plurality of first speech samples. In certain configurations, the decoder may include a post-filter element, which may be disabled to reduce system complexity and to improve the speech-quality of a speech signal generated by a subsequent re-encoding process.
The plurality of first speech samples and the plurality of first speech parameters are then transmitted to a converter capable of converting the plurality of first speech samples and the plurality of first speech parameters to a plurality of second speech samples and a plurality of second speech parameters for use according to a second coding scheme. The second coding scheme may be G.711, G.723.1, G.726 or G.729, for example, and may be parametric or non-parametric. Following conversion by the converter, the plurality of second speech samples and the plurality of second speech parameters are transmitted to an encoder. The encoder receives the plurality of second speech samples and the plurality of second speech parameters and generates a second bit-stream, the second bit-stream being encoded based on the second coding scheme. In certain configurations, the encoder may include a noise suppressor element, which may be disabled to reduce system complexity and to improve the speech quality of a speech signal. It is appreciated that extracting the speech parameters from the first bit-stream, converting the speech parameters, and providing the converted speech parameters to the encoder avoids a re-evaluation of speech parameters during the encoding process, achieving many advantageous results, such as reduced system complexity and less delay.
These and other aspects of the present invention will become apparent with further reference to the drawings and specification, which follow. It is intended that all such additional systems, methods, features and advantages be included within this description, be within the scope of the present invention, and be protected by the accompanying claims.
BRIEF DESCRIPTION OF THE DRAWINGS
The features and advantages of the present invention will become more readily apparent to those ordinarily skilled in the art after reviewing the following detailed description and accompanying drawings, wherein:
FIG. 1 illustrates a block diagram of a packet-based network in which various aspects of the present invention may be implemented;
FIG. 2 illustrates a block diagram of a transcoding system in accordance with one embodiment;
FIG. 3 illustrates a block diagram of a conference bridge utilizing a transcoding system in accordance with one embodiment;
FIG. 4 illustrates a block diagram of a component of a conference bridge utilizing a transcoding system in accordance with one embodiment; and
FIG. 5 illustrates an exemplary flow diagram of a transcoding method utilizing the transcoding system of FIG. 2.
DESCRIPTION OF EXEMPLARY EMBODIMENTS
The present invention may be described herein in terms of functional block components and various processing steps. It should be appreciated that such functional blocks may be realized by any number of hardware components and/or software components configured to perform the specified functions. For example, the present invention may employ various integrated circuit components, e.g., memory elements, digital signal processing elements, logic elements, and the like, which may carry out a variety of functions under the control of one or more microprocessors or other control devices. Further, it should be noted that the present invention may employ any number of conventional techniques for data transmission, signaling, signal processing and conditioning, tone generation and detection and the like. Such general techniques that may be known to those skilled in the art are not described in detail herein.
It should be appreciated that the particular implementations shown and described herein are merely exemplary and are not intended to limit the scope of the present invention in any way. Indeed, for the sake of brevity, conventional data transmission, signaling and signal processing and other functional and technical aspects of the communication system (and components of the individual operating components of the system) may not be described in detail herein. Furthermore, the connecting lines shown in the various figures contained herein are intended to represent exemplary functional relationships and/or physical couplings between the various elements. It should be noted that many alternative or additional functional relationships or physical connections may be present in a practical communication system.
FIG. 1 depicts an exemplary communication environment 100 that is capable of supporting the transmission of packetized voice information. A packet network 102, e.g., a network conforming to the Internet Protocol ("IP"), may support Internet telephony applications that enable a number of participants to conduct voice calls in accordance with conventional voice-over-packet techniques. In a practical environment 100, packet network 102 may communicate with conventional telephone networks, local area networks, wide area networks, public branch exchanges, and/or home networks in a manner that enables participation by users that may have different communication devices and different communication service providers. For example, in FIG. 1, Participant 1 and Participant 2 communicate with packet network 102 (either directly or indirectly) via the transmission of packets that contain voice data. Participant 3 communicates with packet network 102 via a gateway 104, while Participant 4 communicates with packet network 102 via a gateway 106.
In the context of this description, a gateway is a functional element that converts voice data into packet data. Thus, a gateway may be considered to be a conversion element that converts conventional voice information into a packetized form that can be transmitted over a packet network. A gateway may be implemented in a central office, in a peripheral device (such as a telephone), in a local switch (e.g., one associated with a public branch exchange), or the like. The functionality and operation of such gateways are well known to those skilled in the art and will therefore not be described in detail. It will be appreciated that the present invention can be implemented in conjunction with a variety of conventional gateway designs.
Environment 100 may include any number of transcoders that enable communication between participants using different speech coding standards. For example, a transcoder 108 may be included in packet network 102. Transcoder 108 may be implemented in a central office or maintained by an Internet service provider ("ISP"). In this manner, the voice data from a number of packet-based participants, e.g., Participants 1 and 2, can be processed by transcoder 108 without having to perform the conversions normally performed by gateways.
As another example, a transcoder 110 may be associated with or included in a gateway, e.g., gateway 104. In this configuration, transcoder 110 may be capable of receiving and processing voice-over-packet data and conventional voice signals. Eventually, gateway 104 enables Participant 3, through transcoder 110, to communicate with packet network 102 and a participant coupled to packet network 102, e.g., Participant 1 or 2.
In accordance with the present invention, a packet-based transcoder may be deployed in a telephony system to facilitate communication between participants using different standards or techniques of speech coding. As is known, a given packet-based voice channel, for example, may employ one of a number of different speech coding/compression standards. Various speech coding standards are generally known to those skilled in the art and may include, for example, G.711, G.726, G.728, G.729(A), G.723.1, Global System for Mobile Communication ("GSM"), selectable mode vocoder ("SMV"), and adaptive multi rate ("AMR") coding, the specifications for which are hereby incorporated by reference.
The particular standard utilized for a given call may depend on the participant's Internet service provider, telephone service provider, design of the participant's peripheral device, and other factors. Consequently, a practical transcoder, such as transcoder 108 or 110, may be capable of handling speech that has been encoded by various standards. In addition, such a transcoder should be capable of handling speech that has not been encoded.
FIG. 2 illustrates exemplary communication system 200 for transcoding in accordance with one embodiment of the present invention. As shown in communication system 200, a first participant (i.e., Participant 1) is communicating with a second participant (i.e., Participant 2) through transcoder 206. Participant 1 is coupled to transcoder 206 via channel 204, and Participant 2 is coupled to transcoder 206 via channel 216.
In the illustrated embodiment, voice data from Participant 1 may be encoded by encoder 202 and sent to transcoder 206 via channel 204. As discussed above, depending on such factors as the participant's Internet service or telephony service, for example, the voice data from Participant 1 may need to be compressed and encoded by encoder 202 using a suitable coding standard. For example, channel 204 may be an Internet-based packet network, in which case encoder 202 may use a suitable packet format to packetize the voice data. In such a case, the output data from encoder 202 transmitted over channel 204 will include encoded digital data, in the form of a bit-stream, in accordance with one or more encoding standards, e.g., G.723.1 or G.729. Alternatively, channel 204 may function as a local link, coupling Participant 1 to transcoder 206, in which case encoder 202 may digitize the voice data from Participant 1 without encoding, and the digitized data is transmitted over channel 204.
The bit-stream from Participant 1 arriving at transcoder 206 via channel 204 is initially inputted into, and processed by, decoder 208, which is configured to decode the bit-stream according to the coding method for the transmitting participant, i.e., Participant 1. Thus, if the voice data from Participant 1 was encoded by encoder 202 using G.723.1, for instance, then decoder 208 would decode the bit-stream accordingly. In one embodiment, a post-filter element of decoder 208 (not shown) may be disabled or its capabilities reduced to minimize the degradation frequently found with conventional decoding algorithms utilizing post-filtering.
In addition to generating speech samples from the bit-stream (i.e., the decoded bit-stream), decoder 208 is also configured to extract certain speech parameters from the bit-stream. The speech parameters, which are also referred to as "side information" in the present application, may include, for example, the energy, the spectral characteristics, the pitch and pitch gain of the speech signal. Thereafter, in addition to the speech samples, the speech parameters (or the side information) are transmitted by decoder 208 to converter 212.
Continuing with FIG. 2, the speech samples and speech parameters inputted into converter 212 are suitably processed and converted for eventual encoding by an encoder according to the standard suitable for the receiving participant. The conversion performed by converter 212 may be based on the speech samples and/or at least one of the parameters, for example, received from decoder 208. As part of the conversion process, the speech samples may be modified into a format suitable for re-encoding by encoder 214. For example, in instances where Participants 1 and 2 are using coding standards having different frame structures, converter 212 may resize the frames, to provide the speech samples according to a proper frame size for use by encoder 214. Following conversion by converter 212, the speech information, comprising the converted speech samples and speech parameters, is transmitted to encoder 214. It should be noted that in some embodiments, decoder 208 may only provide the speech samples to converter 212 and no speech parameters (or side information). For example, when the speech signal is coded according to a non-parametric coding scheme, such as G.711, G.726, G.728, etc., converter 212 receives the speech samples from decoder 208 and converts the speech samples to provide them according to a proper frame size for use by encoder 214.
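By way of illustration only, the frame resizing attributed to converter 212 might be sketched as follows. The sketch assumes 8 kHz sampling, so that a 30 ms G.723.1 frame holds 240 samples and a 10 ms G.729 frame holds 80 samples. The function name `reframe` and its buffering strategy are hypothetical and are not taken from the present application.

```python
from typing import Iterable, Iterator, List


def reframe(frames: Iterable[List[int]], out_len: int) -> Iterator[List[int]]:
    """Regroup a stream of decoded frames into frames of out_len samples."""
    if out_len <= 0:
        raise ValueError("out_len must be positive")
    buffer: List[int] = []
    for frame in frames:
        buffer.extend(frame)
        # Emit every complete output frame the buffer currently holds.
        while len(buffer) >= out_len:
            yield buffer[:out_len]
            buffer = buffer[out_len:]
    # Samples left over at the end (fewer than out_len) are dropped here;
    # a practical converter would likely carry them into the next call.


# Assumed 8 kHz sampling: 30 ms -> 240 samples, 10 ms -> 80 samples.
g7231_frame = list(range(240))
g729_frames = list(reframe([g7231_frame], 80))
```

In this sketch one 240-sample input frame yields three 80-sample output frames with the sample order preserved. Conversion in the opposite direction (10 ms to 30 ms) uses the same function with `out_len=240`, at the cost of waiting for three input frames.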
Encoder 214 is configured to encode the speech information according to the standard used by the receiving participant, i.e., Participant 2 in the present example. Thus, if Participant 2 utilizes a selectable mode vocoder ("SMV"), for example, then encoder 214 would encode the bit-stream according to the SMV standard. According to the present invention, encoder 214 can be configured to encode speech information using the speech parameters extracted by decoder 208 and processed by converter 212. In this manner, parameters such as the energy, the spectral characteristics, the pitch and pitch gain of the speech signal, which are conventionally needed to re-encode the speech information by encoder 214, do not have to be re-extracted from the speech samples by encoder 214. Thus, encoder 214 does not have to perform such parameter estimation tasks as spectral analysis, pitch analysis, and the like, or encoder 214 may only have to perform lower complexity parameter estimation tasks. As a result, the transcoding scheme of various embodiments of the present invention substantially reduces the processing power, minimizes delay, and reduces overall system complexity when compared to conventional transcoding schemes. In one embodiment, the noise suppression capability of encoder 214 may be disabled in order to further reduce the system's complexity. Additionally, because the speech parameters are extracted during the initial decoding step for use during the re-encoding step, degradation of the signal resulting from, for example, spectral and pitch re-evaluation is avoided. Following coding by encoder 214, the bit-stream is transmitted to the receiving participant, i.e., Participant 2, via channel 216 in the format suitable for use by decoder 218, which then decodes the bit-stream.
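As a hedged illustration of how an encoder might reuse the side information and fall back to its own estimation only for parameters that were not supplied, consider the following sketch. The names `resolve_params`, `frame_energy`, and the dictionary keys are assumptions made for the example and do not appear in the present application.

```python
from typing import Callable, Dict, Mapping, Optional, Sequence

Estimator = Callable[[Sequence[float]], float]


def resolve_params(
    samples: Sequence[float],
    side_info: Optional[Mapping[str, float]],
    estimators: Mapping[str, Estimator],
) -> Dict[str, float]:
    """Return encoder parameters, estimating only those the side info lacks."""
    params = dict(side_info) if side_info else {}
    for name, estimate in estimators.items():
        if name not in params:
            # Fallback path: no side information for this parameter,
            # so it is re-evaluated from the speech samples.
            params[name] = estimate(samples)
    return params


def frame_energy(samples: Sequence[float]) -> float:
    """Hypothetical estimator: mean squared amplitude of the frame."""
    return sum(s * s for s in samples) / len(samples)


frame = [1.0, -1.0, 2.0, -2.0]
# Side information present: the estimator is never called.
with_side_info = resolve_params(frame, {"energy": 9.0}, {"energy": frame_energy})
# No side information (e.g. a non-parametric source): energy is estimated.
without_side_info = resolve_params(frame, None, {"energy": frame_energy})
```

When side information is supplied, the corresponding estimators are skipped entirely, which is the source of the complexity and delay savings described above.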
Referring now to FIG. 3, exemplary communication system 300 is used to illustrate a conference bridge using transcoding techniques of the present invention, in accordance with one embodiment. More particularly, communication system 300 shows how the present invention can be used to transcode and mix speech signals from two or more transmitting participants to a receiving participant, where each transmitting participant may be using a different coding scheme from the other. In communication system 300, Participants 1, 2 and 3 are coupled to conference bridge 306 via channels 304, 316 and 322, respectively. It is appreciated that in the present example, Participants 1 and 3 are both communicating with Participant 2 at the same time.
Continuing with FIG. 3, speech from Participant 1 is encoded by encoder 302 into a format suitable for transmission over channel 304 to decoder 308. Similarly, encoder 320 encodes speech from Participant 3 into a format suitable for transmission over channel 322 to decoder 324. Both decoders 308 and 324 can be configured to decode the incoming bit-streams, such as those coming from Participants 1 and 3, according to the coding schemes used by the transmitting participants and to generate speech samples from the bit-streams. Decoders 308 and 324 may also extract speech parameters from the bit-streams, or generate the speech parameters if the speech was originally encoded according to a non-parametric standard.
Following decoding, the speech samples and the speech parameters for both Participants 1 and 3 are inputted into converter/mixer 312. Converter/mixer 312 can be configured to convert, combine and mix the inputted speech samples and the speech parameters to generate a single speech information suitable for encoding according to the coding scheme used by the receiving participant, i.e., Participant 2.
Depending on the various coding methods used by the transmitting participants, converter/mixer 312 may need to take into account frame size and other factors in order to generate a bit-stream suitable for encoding by the receiving participant. For example, G.723.1 uses a frame size of 30 ms, and G.729 uses a frame size of 10 ms. Thus, a common frame structure may be established to enable effective mixing of the speech samples from decoders 308 and 324. For example, if at least one of the input channels is encoded using G.723.1, then a 30 ms frame may be established. Alternatively, a frame size equal to the least common multiple might be used. In a case where one channel is encoded using G.723.1 (30 ms frame), for example, and another is encoded using G.4k (20 ms frame), a 60 ms frame may be established. Once a frame size is determined, the speech samples and the speech parameters can be properly interpolated and aligned during mixing.
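The least-common-multiple choice of frame size described above can be checked with a short sketch. The function name `common_frame_ms` is hypothetical; frame durations are assumed to be whole numbers of milliseconds.

```python
from functools import reduce
from math import gcd
from typing import Sequence


def common_frame_ms(frame_sizes_ms: Sequence[int]) -> int:
    """Least common multiple of the input frame durations, in milliseconds."""
    return reduce(lambda a, b: a * b // gcd(a, b), frame_sizes_ms)


print(common_frame_ms([30, 20]))  # 60
print(common_frame_ms([30, 10]))  # 30
```

A 30 ms channel mixed with a 20 ms channel gives a 60 ms common frame, matching the example in the text, while 30 ms and 10 ms channels share a 30 ms frame. A larger common frame adds buffering delay, so the choice is a trade-off rather than a requirement.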
Application serial number 09/547,832, filed April 12, 2000, which is incorporated by reference into the present application, discloses methods by which speech parameters are mixed and interpolated. Such methods may be used by converter/mixer 312 to mix the speech parameters inputted from decoders 308 and 324. For example, the spectrums of two signals may be summed using a weighted addition. A similar method may be used to mix other parameters, such as pitch and energy.
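A weighted addition of two spectra might, purely as an illustration, look like the sketch below. The text does not specify how the weights are chosen; normalising them so that they sum to one, and using each channel's frame energy as its weight, are assumptions of this example, as is the function name `mix_weighted`. The incorporated application may describe a different scheme.

```python
from typing import List, Sequence


def mix_weighted(
    a: Sequence[float], b: Sequence[float], weight_a: float, weight_b: float
) -> List[float]:
    """Weighted addition of two equal-length parameter vectors."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    total = weight_a + weight_b
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    # Assumption: weights are normalised so the result stays in range.
    wa, wb = weight_a / total, weight_b / total
    return [wa * x + wb * y for x, y in zip(a, b)]


# Hypothetical spectral magnitudes for two talkers over the same bins.
spectrum_1 = [1.0, 2.0, 0.0]
spectrum_3 = [3.0, 4.0, 4.0]
equal_mix = mix_weighted(spectrum_1, spectrum_3, 1.0, 1.0)
# Weighting by (assumed) frame energies favours the louder talker.
energy_mix = mix_weighted(spectrum_1, spectrum_3, 3.0, 1.0)
```

The same function could be applied to other scalar or vector parameters, such as energy, although parameters like pitch may call for selection of the dominant talker rather than averaging.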
Once converter/mixer 312 has converted and mixed the signal from decoder 308 with the signal from decoder 324 to generate a combined bit-stream, the bit-stream is transmitted to encoder 314. Converter/mixer 312 can also provide encoder 314 with the speech parameters extracted from the inputted speech signals. Encoder 314 can be configured to re-encode the bit-stream according to the same coding standard used by Participant 2. For example, if Participant 2 uses G.726, then encoder 314 would re-encode the speech information according to G.726. Encoder 314 may use the parameters extracted by decoders 308 and 324 in order to re-encode the speech information, thus bypassing the need for spectral and pitch re-evaluation during the re-encoding process. In this manner, the complexity, processing demands, and time delay associated with such re-evaluation steps are avoided. Following re-encoding by encoder 314, the speech signal is transmitted via channel 316 to Participant 2, where decoder 318 decodes the signal.
Referring now to FIG. 4, exemplary communication system 400 is used to illustrate a component of a conference bridge using transcoding techniques of the present invention, in accordance with one embodiment. More particularly, communication system 400 shows how the present invention provides an effective means for transcoding inputted speech signals having been encoded according to a non-parametric coding standard, such as G.711, G.726, and G.728, for example. As shown in FIG. 4, communication system 400 includes channel 404, conference bridge 406 and channel 416. It is appreciated that channels 404 and 416 are respectively equivalent to channels 204 and 216 of communication system 200 illustrated in FIG. 2.
As shown, a speech signal transmitted to conference bridge 406 via channel 404 is decoded by decoder 408 to generate speech samples from the incoming bit-stream. Decoder 408 may also extract speech parameters from the bit-stream in instances where the speech was originally encoded using a parametric standard, such as G.729 or G.723.1. However, it is appreciated that non-parametric speech coding standards, for example G.711, G.726 and G.728, typically do not quantize various speech-related parameters, such as the signal pitch and spectrum. As a result, these parameters may not be extracted by decoder 408 directly from the bit-stream during the decoding process. In such instances, as shown in FIG. 4, the speech samples may be diverted to parameter extraction module 410, which extracts the desired speech-related parameters (or the side information) for subsequent use by encoder 414, as described below. Thus, parameter extraction module 410 can be configured to extract data regarding the signal energy, spectral characteristics, pitch and pitch gain, and the like, and to provide such parameters to converter/mixer 412.
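The following sketch suggests how a module such as parameter extraction module 410 might derive energy, pitch and pitch gain directly from decoded samples. It uses a plain normalised autocorrelation search, which is a common textbook technique and is not stated in the present application to be the method used. The function name `extract_side_info`, the lag range of 20 to 147 samples (assuming 8 kHz sampling), and the dictionary keys are all assumptions. Spectral analysis is omitted for brevity.

```python
import math
from typing import Dict, Sequence


def extract_side_info(
    samples: Sequence[float], min_lag: int = 20, max_lag: int = 147
) -> Dict[str, float]:
    """Estimate energy, pitch lag and pitch gain from one frame of samples."""
    n = len(samples)
    if n <= max_lag:
        raise ValueError("frame too short for the chosen lag range")
    energy = sum(s * s for s in samples) / n

    best_lag, best_score, best_gain = min_lag, float("-inf"), 0.0
    for lag in range(min_lag, max_lag + 1):
        # Correlation between the frame and a copy delayed by `lag` samples.
        corr = sum(samples[i] * samples[i - lag] for i in range(lag, n))
        norm = sum(samples[i - lag] ** 2 for i in range(lag, n))
        if norm <= 0.0:
            continue
        score = corr / math.sqrt(norm)  # normalised to limit bias across lags
        if score > best_score:
            best_lag, best_score, best_gain = lag, score, corr / norm

    return {"energy": energy, "pitch_lag": float(best_lag), "pitch_gain": best_gain}


# Synthetic test tone with a period of 40 samples (200 Hz at 8 kHz).
tone = [math.sin(2.0 * math.pi * i / 40.0) for i in range(240)]
info = extract_side_info(tone)
```

For the synthetic tone the search settles on a lag of 40 samples with a pitch gain close to one. Real speech would call for additional safeguards, such as voicing decisions and pitch-doubling checks, which this sketch does not attempt.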
The decoded speech samples from decoder 408 and the speech parameters from either decoder 408 or parameter extraction module 410 are inputted into converter/mixer 412. As shown in FIG. 4, converter/mixer 412 also receives speech samples and speech parameters (or side information) 420 from other decoding devices (not shown). Converter/mixer 412 can be configured to combine and mix the speech samples and the speech parameters from decoder 408 and parameter extraction module 410 with speech samples and speech parameters 420 into a combined bit-stream suitable for use by encoder 414 in the re-encoding process. For example, in order to combine and mix the signals, converter/mixer 412 may resize the frames of the speech samples in order to establish a common frame structure suitable for encoder 414. Converter/mixer 412 can also provide encoder 414 with the speech parameters (or side information) for use in re-encoding the bit-stream.
The combined speech samples and extracted parameters provided by converter/mixer 412 can be used by encoder 414 to re-encode the speech signal according to the coding standard used by the receiving participant (not shown). Accordingly, by using the speech parameters (or side information) provided by converter/mixer 412, encoder 414 bypasses the need for spectral and pitch re-evaluation during the re-encoding process. In this manner, the complexity, processing demands, and time delay associated with such re-evaluation steps are avoided. Following the coding step, the encoded signal is transmitted to the receiving participant via channel 416.
Reference is now made to FIG. 5, which illustrates exemplary transcoding method 500 in accordance with one embodiment. It is appreciated that transcoding method 500 can be performed by a transcoder such as transcoder 206 in FIG. 2, for example. As shown, transcoding method 500 begins at step 510 and continues to step 512 where the bit-stream from a first participant is received.
Next, at step 514, a parameter set is extracted from the bit-stream. For example, the parameter set may include the signal energy, spectral characteristics, pitch and pitch gain, and the like. Then, at step 516, the bit-stream is decoded according to the coding scheme used by the first participant and speech samples are generated. For example, the received bit-stream may be encoded according to G.723.1, in which case the bit-stream is decoded at step 516 according to G.723.1.
After the speech samples have been generated at step 516, transcoding method 500 proceeds to step 518 where the speech samples and parameter set are converted into a suitable form for re-encoding. The form to which the speech samples and parameter set are converted may depend on the particular coding scheme used by the receiving participant. At step 520, the converted speech samples are re-encoded in accordance with the coding scheme used by the receiving participant, i.e., the second participant in the present example. As such, if the second participant in the present description uses G.729, for example, then the re-encoding performed at step 520 would be done according to G.729. The re-encoding performed at step 520 can utilize the parameter set extracted from the bit-stream at step 514. Therefore, at step 520, re-encoding can be effectively achieved without having to perform, for example, spectral and pitch re-evaluation, since the information is already available. In this manner, transcoding method 500 provides a number of advantages over conventional transcoding approaches, including lower processing needs, minimal delay, and a reduction in overall system complexity.
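The sequence of steps 512 through 520 can be summarised in a minimal sketch such as the one below. The class name `Transcoder`, its four callable fields, and the toy stand-ins used to exercise it are hypothetical; no actual codec is implemented, and a practical system would supply real G.723.1 or G.729 routines in their place.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]
Samples = List[int]


@dataclass
class Transcoder:
    extract_params: Callable[[bytes], Params]                     # step 514
    decode: Callable[[bytes], Samples]                            # step 516
    convert: Callable[[Samples, Params], Tuple[Samples, Params]]  # step 518
    encode: Callable[[Samples, Params], bytes]                    # step 520

    def transcode(self, first_bitstream: bytes) -> bytes:
        """Run steps 514 to 520 on a received bit-stream (step 512)."""
        params = self.extract_params(first_bitstream)
        samples = self.decode(first_bitstream)
        new_samples, new_params = self.convert(samples, params)
        # The encoder receives the converted parameters, so it does not
        # need to re-evaluate them from the samples.
        return self.encode(new_samples, new_params)


# Toy stand-ins: each byte is treated as one sample and passed through.
seen_params: List[Params] = []


def toy_encode(samples: Samples, params: Params) -> bytes:
    seen_params.append(params)  # record what the encoder was given
    return bytes(samples)


toy = Transcoder(
    extract_params=lambda bs: {"energy": sum(b * b for b in bs) / len(bs)},
    decode=lambda bs: list(bs),
    convert=lambda s, p: (s, p),
    encode=toy_encode,
)
second_bitstream = toy.transcode(b"\x01\x02\x03")
```

Keeping the four stages as separate callables mirrors the decoder, converter and encoder blocks of FIG. 2 and makes it straightforward to substitute a parameter extraction module when the source coding scheme is non-parametric.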
The methods and systems presented above may reside in software, hardware, or firmware on the device, which can be implemented on a microprocessor, digital signal processor, application specific IC, or field programmable gate array ("FPGA"), or any combination thereof, without departing from the spirit of the invention. Furthermore, the present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive.

Claims

What is claimed is:
1. A speech transcoder capable of transcoding a first bit-stream generated from a speech signal, said speech transcoder comprising: a decoder configured to receive said first bit-stream encoded based on a first coding scheme, wherein said decoder extracts a plurality of first speech parameters from said first bit-stream, and wherein said decoder decodes said first bit-stream according to said first coding scheme and generates a plurality of first speech samples; a converter configured to receive said plurality of first speech samples and said plurality of first speech parameters, wherein said converter converts said plurality of first speech samples to a plurality of second speech samples and converts said plurality of first speech parameters to a plurality of second speech parameters for use according to a second coding scheme; and an encoder configured to receive said plurality of second speech samples and said plurality of second speech parameters, wherein said encoder generates a second bit-stream encoded based on said second coding scheme.
2. The transcoder of claim 1, wherein said converter converts a first frame size of said plurality of first speech samples to a second frame size, wherein said encoder uses said second frame size to generate said second bit-stream according to said second coding scheme.
3. The transcoder of claim 1, wherein said converter transmits said plurality of second speech parameters to said encoder to avoid a re-evaluation of parameters by said encoder and thereby to reduce delay.
4. The transcoder of claim 1, wherein said decoder includes a post-filter element, and wherein said post-filter element is disabled.
5. The transcoder of claim 1, wherein said encoder includes a noise suppressor, and wherein said noise suppressor is disabled.
6. The transcoder of claim 1, wherein said plurality of second speech parameters includes at least one parameter relating to an energy of said speech signal.
7. The transcoder of claim 1, wherein said plurality of first speech parameters includes at least one parameter relating to spectral characteristics of said speech signal.
8. The transcoder of claim 1, wherein said plurality of first speech parameters includes at least one parameter relating to a pitch of said speech signal.
9. The transcoder of claim 1, wherein said plurality of first speech parameters includes at least one parameter relating to a pitch gain of said speech signal.
10. The transcoder of claim 1, wherein said converter transmits said plurality of second speech parameters to said encoder to avoid a re-evaluation of parameters by said encoder and thereby reduce degradation of a speech signal generated from said second bit-stream.
11. A method for transcoding a first bit-stream generated from a speech signal, said method comprising: extracting a plurality of first speech parameters from said first bit-stream; decoding said first bit-stream according to a first coding scheme to generate a plurality of first speech samples; converting said plurality of first speech samples to a plurality of second speech samples for use according to a second coding scheme; converting said plurality of first speech parameters to a plurality of second speech parameters for use according to said second coding scheme; and encoding said plurality of second speech samples based on said plurality of second speech parameters to generate a second bit-stream encoded based on said second coding scheme.
12. The method of claim 11 further comprising: converting a first frame size of said plurality of first speech samples to a second frame size for use according to said second coding scheme.
13. The method of claim 11, wherein said converting said plurality of first speech parameters to said plurality of second speech parameters is performed to avoid a re-evaluation of parameters during said encoding to reduce delay and complexity.
14. The method of claim 11 further comprising: disabling post-filtering during said decoding.
15. The method of claim 11 further comprising: disabling noise suppression during said encoding.
16. The method of claim 11, wherein said plurality of second speech parameters includes at least one parameter relating to an energy of said speech signal.
17. The method of claim 11, wherein said plurality of first speech parameters includes at least one parameter relating to spectral characteristics of said speech signal.
18. The method of claim 11, wherein said plurality of first speech parameters includes at least one parameter relating to a pitch of said speech signal.
19. The method of claim 11, wherein said plurality of first speech parameters includes at least one parameter relating to a pitch gain of said speech signal.
20. The method of claim 11, wherein said converting said plurality of first speech parameters to said plurality of second speech parameters is performed to avoid a re-evaluation of parameters during said encoding and thereby reduce degradation of a speech signal generated from said second bit-stream.
21. A speech transcoder capable of transcoding a first bit-stream generated from a speech signal, said speech transcoder comprising: a decoder configured to receive said first bit-stream encoded based on a first coding scheme, wherein said decoder decodes said first bit-stream according to said first coding scheme and generates a plurality of first speech samples; a parameter extractor module configured to receive said plurality of first speech samples, wherein said parameter extractor module extracts a plurality of first speech parameters from said plurality of first speech samples; a converter/mixer configured to receive said plurality of first speech samples and said plurality of first speech parameters, wherein said converter/mixer converts and mixes said plurality of first speech samples to generate a plurality of second speech samples and converts and mixes said plurality of first speech parameters to generate a plurality of second speech parameters for use according to a second coding scheme; and an encoder configured to receive said plurality of second speech samples and said plurality of second speech parameters, wherein said encoder generates a second bit-stream encoded based on said second coding scheme.
22. The transcoder of claim 21, wherein said converter transmits said plurality of second speech parameters to said encoder to avoid a re-evaluation of parameters by said encoder and thereby to reduce delay.
23. The transcoder of claim 21, wherein said decoder includes a post-filter element, and wherein said post-filter element is disabled.
24. The transcoder of claim 21, wherein said encoder includes a noise suppressor, and said noise suppressor is disabled.
25. The transcoder of claim 21, wherein said plurality of second speech parameters includes at least one parameter relating to an energy of said speech signal.
26. The transcoder of claim 21, wherein said plurality of first speech parameters includes at least one parameter relating to spectral characteristics of said speech signal.
27. The transcoder of claim 21, wherein said plurality of first speech parameters includes at least one parameter relating to a pitch of said speech signal.
28. The transcoder of claim 21, wherein said plurality of first speech parameters includes at least one parameter relating to a pitch gain of said speech signal.
29. The transcoder of claim 21, wherein said converter transmits said plurality of second speech parameters to said encoder to avoid a re-evaluation of parameters by said encoder and thereby reduce degradation of a speech signal generated from said second bit-stream.
30. A speech transcoder capable of transcoding a first bit-stream generated from a speech signal, said speech transcoder comprising: a decoder configured to receive said first bit-stream encoded based on a first coding scheme, wherein said decoder decodes said first bit-stream according to said first coding scheme and generates a plurality of first speech samples from said bit-stream; a converter configured to receive said plurality of first speech samples, wherein said converter converts said plurality of first speech samples to a plurality of second speech samples for use according to a second coding scheme; and an encoder configured to receive said plurality of second speech samples, wherein said encoder generates a second bit-stream encoded based on said second coding scheme.
31. The transcoder of claim 30, wherein said converter converts a first frame size of said plurality of first speech samples to a second frame size, wherein said encoder uses said second frame size to generate said second bit-stream according to said second coding scheme.
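
As an illustration only, the data flow recited in claims 21, 22, 25 and 31 can be sketched in Python. Nothing below is taken from the specification: the function names (`extract_parameters`, `convert`, `encode`, `transcode`), the `Frame` type, the choice of mean-square energy as the speech parameter, and the frame sizes (80 samples merged into 160, i.e. 10 ms into 20 ms at an assumed 8 kHz sampling rate) are all hypothetical. The decoder and encoder are reduced to stubs so that only the extract, convert, and reuse steps are shown.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Frame:
    samples: List[int]  # decoded PCM samples for one frame
    energy: float       # mean-square energy (illustrative speech parameter)


def extract_parameters(samples: Sequence[int]) -> Frame:
    """Parameter extractor: compute energy once, on the decoder output."""
    energy = sum(s * s for s in samples) / len(samples)
    return Frame(list(samples), energy)


def convert(first_frames: Sequence[Frame], ratio: int) -> List[Frame]:
    """Converter: merge `ratio` first-scheme frames into one second-scheme frame.

    Samples are concatenated. Energy is converted by averaging, which equals
    the mean-square energy of the merged frame when all input frames have the
    same length. Trailing frames that do not fill a whole group are dropped
    in this sketch.
    """
    second_frames = []
    for i in range(0, len(first_frames) - ratio + 1, ratio):
        group = first_frames[i:i + ratio]
        samples = [s for f in group for s in f.samples]
        energy = sum(f.energy for f in group) / ratio
        second_frames.append(Frame(samples, energy))
    return second_frames


def encode(frame: Frame) -> dict:
    """Encoder stub: reuses the supplied energy instead of re-evaluating it."""
    return {"n_samples": len(frame.samples), "energy": frame.energy}


def transcode(decoded_frames: Sequence[Sequence[int]], ratio: int = 2) -> List[dict]:
    """Run extractor, converter and encoder stub over already-decoded frames."""
    first = [extract_parameters(s) for s in decoded_frames]
    return [encode(f) for f in convert(first, ratio)]


if __name__ == "__main__":
    # Four assumed 80-sample frames become two 160-sample frames.
    for packet in transcode([[1] * 80, [3] * 80, [2] * 80, [2] * 80]):
        print(packet)
```

In this sketch the encoder stub never touches the samples to obtain the energy; it takes the value handed over by the converter, which is the kind of parameter reuse that claims 22 and 29 associate with reduced delay and reduced degradation.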
PCT/US2003/006335 2002-05-13 2003-02-26 Transcoding of speech in a packet network environment WO2003098598A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
KR10-2004-7017694A KR20040104701A (en) 2002-05-13 2003-02-26 Transcoding of speech in a packet network environment
JP2004506009A JP2005531017A (en) 2002-05-13 2003-02-26 Voice transcoding in packet network environments.
EP03713828A EP1504441A4 (en) 2002-05-13 2003-02-26 Transcoding of speech in a packet network environment
AU2003217859A AU2003217859A1 (en) 2002-05-13 2003-02-26 Transcoding of speech in a packet network environment
IL16514704A IL165147A0 (en) 2002-05-13 2004-11-10 Transcoding of speech in a packet network environment

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US14553302A 2002-05-13 2002-05-13
US10/145,533 2002-05-13

Publications (1)

Publication Number Publication Date
WO2003098598A1 (en) 2003-11-27

Family

ID=29548267

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2003/006335 WO2003098598A1 (en) 2002-05-13 2003-02-26 Transcoding of speech in a packet network environment

Country Status (7)

Country Link
EP (1) EP1504441A4 (en)
JP (1) JP2005531017A (en)
KR (1) KR20040104701A (en)
CN (1) CN1653515A (en)
AU (1) AU2003217859A1 (en)
IL (1) IL165147A0 (en)
WO (1) WO2003098598A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100945245B1 (en) * 2007-08-10 2010-03-03 한국전자통신연구원 Method and apparatus for secure and efficient voice packet partial encryption

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5497396A (en) * 1992-12-30 1996-03-05 Societe Dite Alcatel N.V. Method of transmitting data between communication equipments connected to a communication infrastructure
US5771452A (en) * 1995-10-25 1998-06-23 Northern Telecom Limited System and method for providing cellular communication services using a transcoder
US6006178A (en) * 1995-07-27 1999-12-21 Nec Corporation Speech encoder capable of substantially increasing a codebook size without increasing the number of transmitted bits

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5694519A (en) * 1992-02-18 1997-12-02 Lucent Technologies, Inc. Tunable post-filter for tandem coders
US5995923A (en) * 1997-06-26 1999-11-30 Nortel Networks Corporation Method and apparatus for improving the voice quality of tandemed vocoders
US6260009B1 (en) * 1999-02-12 2001-07-10 Qualcomm Incorporated CELP-based to CELP-based vocoder packet translation
US7006787B1 (en) * 2000-02-14 2006-02-28 Lucent Technologies Inc. Mobile to mobile digital wireless connection having enhanced voice quality

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP1504441A4 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1544848A2 (en) * 2003-12-18 2005-06-22 Nokia Corporation Audio enhancement in coded domain
EP1544848A3 (en) * 2003-12-18 2005-09-21 Nokia Corporation Audio enhancement in coded domain
CN100369108C (en) * 2003-12-18 2008-02-13 诺基亚公司 Method and device for audio enhancement in the coding domain
US7613607B2 (en) 2003-12-18 2009-11-03 Nokia Corporation Audio enhancement in coded domain
JP2008026372A (en) * 2006-07-18 2008-02-07 Kddi Corp Coding rule conversion method and apparatus for coded data
JP4721355B2 (en) * 2006-07-18 2011-07-13 Kddi株式会社 Coding rule conversion method and apparatus for coded data
US8416962B2 (en) 2007-12-28 2013-04-09 Panasonic Corporation Audio mixing/reproducing device

Also Published As

Publication number Publication date
AU2003217859A1 (en) 2003-12-02
EP1504441A4 (en) 2005-12-14
IL165147A0 (en) 2005-12-18
CN1653515A (en) 2005-08-10
EP1504441A1 (en) 2005-02-09
JP2005531017A (en) 2005-10-13
KR20040104701A (en) 2004-12-10

Similar Documents

Publication Publication Date Title
US6463414B1 (en) Conference bridge processing of speech in a packet network environment
CN1326415C (en) Method for conducting code conversion to audio-frequency signals code converter, network unit, wivefree communication network and communication system
CN101427551B (en) System and method of conferencing endpoints
US8271026B2 (en) Mobile communication device providing N-way communication through a plurality of communication services
USRE47345E1 (en) Apparatus and method for converting control information
JP2003521132A (en) Method and apparatus for voice packet communication
US7522586B2 (en) Method and system for tunneling wideband telephony through the PSTN
EP1504441A1 (en) Transcoding of speech in a packet network environment
KR100917546B1 (en) Method and apparatus for voice transcoding in a voip environment
US7715365B2 (en) Vocoder and communication method using the same
US20030013465A1 (en) System and method for pseudo-tunneling voice transmissions
EP1083762A1 (en) Mobile telecommunication terminal with a codec and additional decoders
US7460671B1 (en) Encryption processing apparatus and method for voice over packet networks
Falsafi High Definition Voice Rollout will Benefit all Mobile Users
RU2283545C2 (en) Method and device for permitting discrepancy in communication networks umts
WO2001024549A2 (en) Voice over pcm technique providing compatibility between pcm data and lower rate vocoder data

Legal Events

Date Code Title Description
AK Designated states Kind code of ref document: A1; Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SK SL TJ TM TN TR TT TZ UA UG UZ VN YU ZA ZM ZW
AL Designated countries for regional patents Kind code of ref document: A1; Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
121 EP: the EPO has been informed by WIPO that EP was designated in this application
WWE WIPO information: entry into national phase Ref document number: 2004506009; Country of ref document: JP
WWE WIPO information: entry into national phase Ref document number: 1020047017694; Country of ref document: KR
WWE WIPO information: entry into national phase Ref document number: 2003713828; Country of ref document: EP; Ref document number: 2003810962X; Country of ref document: CN
WWP WIPO information: published in national office Ref document number: 1020047017694; Country of ref document: KR
WWP WIPO information: published in national office Ref document number: 2003713828; Country of ref document: EP
WWW WIPO information: withdrawn in national office Ref document number: 2003713828; Country of ref document: EP