US6704707B2 - Method for automatically and dynamically switching between speech technologies - Google Patents
- Publication number: US6704707B2
- Application number: US09/808,699
- Authority: US (United States)
- Prior art keywords: recognition, switch, request, speech, configuring
- Prior art date: 2001-03-14
- Legal status: Expired - Lifetime, expires (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L15/26—Speech to text systems
- G10L15/28—Constructional details of speech recognition systems
- G10L15/32—Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems
- G10L2015/226—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
- G10L2015/228—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of application context
Abstract
A method for switching between speech recognition technologies. The method includes reception of an initial recognition request accompanied by control information. Recognition characteristics are determined using the control information, and a switch is then configured based upon those characteristics. Alternatively, the switch may be configured based upon system load levels and resource constraints.
Description
1. Field
This disclosure relates to speech recognition technology, more particularly for methods to utilize multiple speech technologies.
2. Background
Speech recognition technology has become more prevalent in recent times. Beyond dictation machines and speech-to-text conversion, speech can be used to navigate and give commands through several different types of systems. It is particularly attractive to highly mobile users, who may want to make travel reservations, leave messages, access e-mail and perform other tasks using any available phone. Navigating these types of systems using voice commands, as well as dictating text messages for electronic mail systems, is therefore very attractive.
Throughout this document, the terms ‘speech recognition technology’ or ‘recognizer’ will be used to describe the software that carries out the conversion of digital audio to text. The terms ‘speech recognition system’ and ‘system’ will refer to systems that incorporate one or more speech recognition technology, or recognizer, components. In the discussions that follow, differentiation will be made between speech recognition technologies based on capabilities and performance.
Capabilities, as used here, will refer to the type of speech that the technology is capable of recognizing. This might be a very small domain, such as digits only, or a very large one, such as a large vocabulary dictation system. Performance may be measured along a number of orthogonal axes including the accuracy of conversion from speech to text, the resource requirement of the conversion process, the latency of conversion and other factors. Note that in addition to items such as computation, memory, etc., resource requirements may include items such as licenses for the recognizer used.
Current speech recognition systems typically use one type of speech technology. The system designers must select a technology based on their required system capabilities and performance and the target technology's capabilities, costs and performance. Inexpensive, lower capability technologies provide high accuracy in only a limited range of capabilities, but do not require large resource commitments. Midrange technologies have increased capabilities with a commensurate increased resource requirement. For speech recognition tasks that only need the lower capabilities, the system can bog down unnecessarily if higher capability recognition technologies are used, providing unneeded features. Higher requirement tasks, such as dictation, cannot obtain the desired accuracy with the lower capability technologies. High capability, high performance technologies offer the highest accuracy and the widest range of tasks they can complete, but may be too expensive to implement system-wide, or may require too high a level of resources for some enterprises.
Therefore, some method of using several different kinds of speech recognition technology in one system would seem helpful, as would ways to manage the utilization of these different technologies.
The invention may be best understood by reading the disclosure with reference to the drawings, wherein:
FIG. 1 shows a flowchart of one embodiment of a method to switch between multiple speech technologies, in accordance with the invention.
FIG. 2 shows a flowchart of an alternative embodiment of a method to switch between multiple speech technologies, in accordance with the invention.
FIG. 3 shows a block diagram of one embodiment of an architecture for a speech recognition switching circuit, in accordance with the invention.
As speech recognition systems and technologies evolve, demand for more tailored speech systems will increase. Complex speech systems often perform numerous tasks with differing capability and performance requirements. For example, a first user may only require simple voice mail retrieval, a task with very low capability requirements, as the user will only need recognition of numerical choices from predefined menus. At the same time, a second user may want to dictate an e-mail using the speech system. These are two tasks that must be handled by the same speech system.
Currently, most speech systems use one type of speech recognition technology. This can result in a mismatch between the desired performance and cost and the actual performance and cost. Systems may be low cost but not have the higher level of capabilities and performance desired for some tasks. Alternatively, the system may have higher capabilities and performance than desired for some tasks, and will be less cost-effective than a system more closely matched to those particular tasks. A designer is forced either to sacrifice features and/or performance to fit in a low cost envelope, or to select a premium technology that is not cost-effective for many of the tasks.
Generally, speech recognition technology can be grouped into three different types, with the understanding that these types are not to be considered all encompassing or intended to limit application of the invention in any way. Small-domain speech recognition obtains high accuracy in a small domain of possible speech objects, such as letters and digits. It is relatively inexpensive, but has very limited application. An example of this type of speech recognition is found in Nuance Express™, from Nuance Communications, Inc.
Grammar-based speech recognition technology generally costs more, both in computational resources and in licensing costs. It supports speech recognition of phrases defined in a standard grammar but does not typically support dictation. It is optimized for high throughput so that systems can handle many simultaneous users. One such example can be found in SpeechWorks 6™, by SpeechWorks, Inc.
The most computationally intensive and most expensive speech recognition technology type is dictation. Currently, users of speaker-dependent recognition technologies must spend time ‘training’ their dictation packages to their voices and can then dictate directly into their computers. Speaker-independent recognition technologies perform reasonably for a wide spectrum of users without this additional user burden. These packages also may have the capability of learning as the speakers pronounce new words and correct mistakes in the translation. Examples of this type of technology include ViaVoice™ by IBM, and DragonDictate™ by Dragon Systems, Inc. These examples are speaker-dependent technologies. There are currently no commercially available speaker-independent dictation-targeted recognizers, though there are a wide variety of academic and commercial research systems.
In order to provide a cost-effective solution, a company deploying a telephony-based voice portal currently must use a grammar-based recognition technology for their entire system. This limits the functions that the system can perform, eliminating the possibility of dictating email, for example. Additionally, in many places, a less capable recognition technology would be sufficient to implement the voice portal. This can result in extra expense, both in terms of licensing costs and computational requirements.
For example, there may be a desire for a voice system that allows users to navigate through voice mail messages and dictate e-mails. As the user first enters the system, login and PIN-based user authentication will require only letter and digit recognition. Using grammar-based speech recognition technology as described above for these simpler tasks is a waste of capabilities and more expensive than it needs to be. The approach also provides less accuracy than would be possible using a more domain-specific recognition technology.
Navigation of email can be performed effectively using grammar-based speech recognition technology. Tasks such as email dictation are simply not possible using grammar-based technologies. It must be noted that the description using these particular terms for speech recognition technology is only intended as an example and is in no way intended to limit application of the invention.
FIG. 1 shows a flowchart for one embodiment of a method to employ multiple speech recognition technologies and switch between them as necessary. At 10, the recognition request is received. Accompanying the recognition request will be control information, either within the recognition request or in a separate path, as will be discussed later. Once the request is received, the recognition characteristics are determined at 12 using the control information as well as other aspects of the request.
Recognition characteristics are those aspects of the recognition request that can be used to direct and manage the request. The characteristics identify the type of task desired by the user, such as dictation, alphanumeric conversion, or grammar-based command and control. The system uses the characteristics to match the request to the recognizer with the appropriate attributes. As will be discussed further, this may also be affected by the current system load balance. Recognition characteristics include the type, such as alphanumeric digits, command/control grammars, freeform entry; priority, such as the importance of the task or the service level of the user; and control information for the recognition engines, such as grammars, language models, dictionaries, etc. A switch that will route the request to a particular recognition engine and return the results to the requester will then be configured at 14. Finally, the speech recognition technology residing in the selected recognition engine will be applied at 16 and the results returned to the requester.
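By way of illustration only, the FIG. 1 flow can be summarized in the following minimal Python sketch, where receiving the request corresponds to 10, determining characteristics to 12, configuring the switch to 14, and applying the technology to 16. The class names, task labels, and engine registry here are assumptions made for demonstration and are not part of the disclosure.

```python
# Illustrative sketch of the FIG. 1 flow; all names below are assumptions.
from dataclasses import dataclass, field

@dataclass
class RecognitionRequest:
    audio: bytes
    control_info: dict = field(default_factory=dict)  # e.g. {"task": "digits"}

class StubEngine:
    """Stand-in for a recognition engine employing one speech technology."""
    def __init__(self, technology: str):
        self.technology = technology
    def recognize(self, audio: bytes) -> str:
        return f"<text from {self.technology} engine>"

# Hypothetical switch configuration: recognition characteristic -> engine.
ENGINES = {
    "digits": StubEngine("small-domain"),
    "command": StubEngine("grammar-based"),
    "dictation": StubEngine("dictation"),
}

def handle_request(request: RecognitionRequest) -> str:
    # 12: determine recognition characteristics from the control information.
    task = request.control_info.get("task", "command")
    # 14: configure the switch by selecting the engine with matching attributes.
    engine = ENGINES.get(task, ENGINES["command"])
    # 16: apply the selected speech technology and return the results.
    return engine.recognize(request.audio)
```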
For purposes of discussion, and not with any intention of limiting application of the invention, assume a user accesses a voice-activated system using a telephone. The user may log in using a spoken numeric user id and personal identification number (PIN). The small-domain speech recognition technology may be used for this task. Once the user has logged in, the system may switch the input stream of that recognition request to a higher-level speech recognition technology, for example, the grammar-based technologies mentioned above. This may allow the user to navigate through the site using specific command words and phrases. If the user's commands indicate a desire to dictate an e-mail or some other sort of dictation task, the recognition engine for dictation would then be applied. The control signals may be generated from the system as the characteristics of the incoming speech changes.
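Continuing the hypothetical sketch above, the telephone session just described might progress through stages whose regenerated control information reconfigures the switch for the same input stream. The stage names and the stage-to-task mapping are invented for illustration:

```python
# Hypothetical session stages; each change in the character of the incoming
# speech yields new control information and thus a new switch configuration.
session = [
    ("login",      {"task": "digits"}),     # spoken numeric user id and PIN
    ("navigation", {"task": "command"}),    # grammar-based command words
    ("dictation",  {"task": "dictation"}),  # free-form e-mail dictation
]

for stage, control_info in session:
    # Each stage reconfigures the switch before recognition proceeds.
    print(stage, handle_request(RecognitionRequest(b"<audio>", control_info)))
```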
This embodiment of the invention provides the speech system with expanded functionality, in this case the ability to use dictation technology within the framework traditionally utilizing only a grammar-based speech technology.
Alternative embodiments of this method may address other issues. For example, the speech recognition system may have system load balancing features. As an example, a speech recognition system having two different types of speech recognition engines is assumed. The first offers moderate accuracy with low computation requirements, and the second offers high accuracy with substantial computational requirements. Both target the same domain, offering the same capabilities at substantially different performance levels. A single server may handle dozens of concurrent users for the first technology, but only a few for the second.
The first customers on an idle system such as the one described in this example may be served using the more accurate technology. As the number of concurrent users increases past a given threshold, new users would be directed to the less accurate technology. If the number of users reaches a second threshold, users accessing the more accurate technology may be migrated to the less expensive and less computationally intensive technology to increase system capacity. Additionally, users may have a priority assigned to them, based upon the service level subscribed to by the user or some other factor, such as the importance or potential value of the task at hand. This priority can also be used to determine which users are assigned to which technologies.
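A minimal sketch of such a threshold policy follows; the threshold values and the priority rule are placeholders, since the text specifies only the general behavior:

```python
# Threshold-based assignment; the numbers and the priority rule are invented.
def assign_technology(active_users: int, priority: int = 0,
                      first_threshold: int = 20) -> str:
    """Choose which of two same-domain technologies serves a new user."""
    if active_users < first_threshold or priority > 0:
        return "high-accuracy"       # idle system, or a premium user
    return "moderate-accuracy"       # default once the system is loaded

def streams_to_migrate(active_users: int, high_accuracy_streams: list,
                       second_threshold: int = 40) -> list:
    # Past the second threshold, low-priority users of the expensive engine
    # are migrated to the cheaper one to increase overall capacity.
    if active_users >= second_threshold:
        return [s for s in high_accuracy_streams if s.get("priority", 0) == 0]
    return []
```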
It must be noted that less accurate technologies could also include less accurate levels of the same type, such as small domain, grammar-based, or dictation. For example, Dragon Systems has several different levels of complexity in their DragonDictate™ products, such as a professional package as well as variations targeted to the terminology-intensive areas of medical and legal dictation. Users in the above example of a higher level of dictation technology may also be migrated to other dictation technologies that are not as accurate or do not have as many features as their currently assigned technology.
Additional factors may affect the configuration of the switch and the subsequent application of speech technology. For example, the characteristics of the recognizer engine employing a level of speech technology may be used. These characteristics include capability information, performance information, accuracy and cost information. Cost information may include the licensing costs, as will be discussed below.
A flow chart of one embodiment similar to the scenario discussed above is shown in FIG. 2. In this example, new processes are inserted into a flow similar to that shown in FIG. 1. For example, the input audio stream may be buffered to allow smooth migration from one technology or server to another, as well as the handling of several requests received simultaneously. Requests not being processed immediately would be buffered until the system could acknowledge them. Additionally, the buffer allows the speech-recognition engines to be time-shared. As an example, buffering is shown being accomplished just after the reception of the request. The placement of these additional processes in FIG. 2 is only intended as an example and is not intended to limit application of the invention.
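As an illustrative assumption only, the buffering step could be realized as a simple FIFO queue ahead of the switch, which both holds requests until they can be acknowledged and lets the recognition engines be time-shared:

```python
# Assumed FIFO buffer for requests received at substantially the same time.
from collections import deque

pending = deque()                # buffered recognition requests

def receive(request) -> None:
    pending.append(request)      # buffer immediately after reception

def dispatch(free_engines: list) -> list:
    # Drain buffered requests as engines become available (time-sharing).
    results = []
    while free_engines and pending:
        engine = free_engines.pop()
        request = pending.popleft()
        results.append(engine.recognize(request.audio))
    return results
```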
In addition to utilizing recognition characteristics to configure the switch, this embodiment also monitors resource constraints and system load levels at 22 and configures the switch accordingly at 14. An alternative embodiment may monitor the resource constraints and/or system load levels at 22 on an ongoing basis, perhaps in a separate process independent of the arrival of recognition requests. If this process finds load or resource balancing issues, the switch may be reconfigured to route existing audio streams to use different speech technologies. The resource constraints of the system in conjunction with the current load will determine the course of action taken by the system. At least one of these factors will be used in configuring the switch.
The resource constraints monitored include such things as computational intensity and accuracy, as mentioned above. The cost of licensing a particular technology may also be considered a resource constraint. Two different types of speech recognition technologies may have comparable computational intensity and accuracy, but one may cost more per use than the other. If both technologies were available, it may be more desirable for the system to use the less expensive technology, using cost as a resource constraint.
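A short sketch of these two ideas together, monitoring as an ongoing process and licensing cost treated as a resource constraint, might read as follows. The field names, the budget test, and the selection rule are assumptions, and the switch object is assumed to expose a route() method as in the FIG. 3 sketch later on:

```python
# Assumed cost-aware selection and ongoing rebalancing; names are illustrative.
def pick_engine(candidates: list, min_accuracy: float = 0.90) -> dict:
    """Among engines of acceptable accuracy, prefer the lowest per-use cost."""
    viable = [c for c in candidates if c["accuracy"] >= min_accuracy]
    return min(viable, key=lambda c: c["cost_per_use"])

def rebalance_once(switch, load: dict, limits: dict, candidates: list) -> None:
    # Runs periodically, independent of request arrival. If the system is
    # over its computational budget, existing (acknowledged) streams are
    # rerouted to the cheapest engine that is still accurate enough.
    if load["cpu_utilization"] > limits["cpu_budget"]:
        target = pick_engine(candidates)
        for stream_id in load["reroutable_streams"]:
            switch.route(stream_id, target["name"])
```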
Additionally, as mentioned above, the load balancing may switch a user whose audio stream is already being processed, referred to as an acknowledged stream, to another technology. In this case, the act of ‘receiving a recognition request’ may mean that the system is receiving requests generated for the new data in the acknowledged stream, rather than an audio stream from a new user. The above examples are merely demonstrations of application of the invention and are not intended to limit the scope of the invention.
A user's priority code, mentioned above, might also be used to determine to what technology a user is initially assigned. After the system becomes loaded and rebalancing is required, the user's priority may also be used to determine to what recognizer the user's stream is switched, if any.
These tasks will likely be handled by software loaded onto a general system architecture having some sort of processor, such as a digital signal processor, general use central processing unit or controller, all of which would be included in the term processor. In this embodiment of the invention, the method of the invention would be implemented on some sort of article having stored software instructions that, when executed by a processor, would result in the tasks of the invention.
An example of such a system is shown in FIG. 3. The input audio stream 30 enters the system at the input switch 34. As mentioned above, buffer 32 may buffer it. Control signals accompanying the input audio stream are received by a director module 38, which has the responsibility of setting the input switch 34 to direct the audio stream to the appropriate recognition engine. The recognition engines 36 a through 36 n will have engines for at least two different speech technologies. There may also be redundant engines for each technology. The recognition engines may pass a status signal back to the director, which may in turn provide status of each recognition session to other parts of the system or to the user.
Once the selected recognition engine processes a given input audio signal, the output text is returned to the other parts of the system, such as to an e-mail manager as an example. The output text may be handled by an output switch 40, which, in cooperation with the input switch 34, facilitates multitasking by the recognition engines to handle more than one input audio stream at a time. The director module 38 is responsible for setting the output switch 40 appropriately to route the results from the correct recognizer to the other parts of the system.
The switches 34 and 40 are shown in the diagram of FIG. 3 as being separate switches. In practice, these may actually be the same switch. In that case, the functionalities of taking multiple speech requests and routing them to the appropriate technology and taking multiple output streams and routing them to the appropriate user would be accomplished on the same device.
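A structural sketch of the FIG. 3 arrangement, treating switches 34 and 40 as one device per the preceding paragraph, might look like the following. Only the element roles come from the text (buffer 32, input switch 34, engines 36a through 36n, director 38, output switch 40); every class and method name is invented:

```python
# Structural sketch of FIG. 3; all identifiers are assumptions.
class Director:                                    # element 38
    """Receives control signals and configures the switch accordingly."""
    def __init__(self, switch):
        self.switch = switch
    def on_control(self, stream_id, control_info: dict) -> None:
        # Map the requested task to a recognition engine and set the route.
        engine_id = {"digits": 0, "command": 1, "dictation": 2}.get(
            control_info.get("task"), 1)
        self.switch.route(stream_id, engine_id)

class CombinedSwitch:                              # elements 34 and 40 merged
    """Routes each input stream to its engine and the output text back out."""
    def __init__(self, engines: list):
        self.engines = engines                     # 36a through 36n
        self.routes = {}                           # stream_id -> engine index
    def route(self, stream_id, engine_id: int) -> None:
        self.routes[stream_id] = engine_id
    def process(self, stream_id, audio: bytes) -> str:
        engine = self.engines[self.routes[stream_id]]
        return engine.recognize(audio)             # returned to the requester
```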
In this manner, several speech technologies may be used in one speech recognition system. This has advantages for the users, as mentioned above, as well as for system management.
Thus, although there has been described to this point a particular embodiment for a method and apparatus for using multiple speech technologies in one speech recognition system, it is not intended that such specific references be considered as limitations upon the scope of this invention except in-so-far as set forth in the following claims.
Claims (30)
1. A method for switching between speech recognition technologies, the method comprising:
a) receiving a recognition request, wherein the recognition request is accompanied by control information;
b) determining recognition characteristics of the recognition request using the control information;
c) configuring a switch based upon the recognition characteristics; and
d) applying a speech recognition technology determined by the configuration of the switch.
2. The method of claim 1 , the method further comprising monitoring at least one of resource constraints and load levels, wherein configuring the switch is also based upon at least one of the resource constraints and load levels.
3. The method of claim 2 , the method further comprising reconfiguring the switch as necessary based upon at least one of the resource constraints and load levels after the initial application of a speech technology.
4. The method of claim 3 , wherein the method further comprises buffering signals from the recognition request and voice data prior to configuring the switch.
5. The method of claim 1 , wherein the method is applied to several recognition requests received by the system at substantially the same time.
6. The method of claim 1 , wherein the control information includes a priority rating usable to determine the speech technology to apply.
7. The method of claim 1 , wherein the request is from a new audio stream.
8. The method of claim 1 , wherein the request is from new data in an acknowledged audio stream.
9. The method of claim 1 , wherein configuring a switch is also based upon recognizer characteristics.
10. An article containing software instructions that, when executed, result in:
a) reception of a recognition request, wherein the recognition request is accompanied by control information;
b) determination of recognition characteristics of the recognition request using the control information;
c) configuration of a switch based upon the recognition characteristics; and
d) application of a speech technology determined by the configuration of the switch.
11. The article of claim 10 , the article further including instructions that, when executed, result in monitoring at least one of resource constraints and load levels and configuring a switch based upon at least one of the resource constraints and load levels.
12. The article of claim 10 , wherein the instructions are executed for several recognition requests received by the system at substantially the same time.
13. The article of claim 10 , wherein the control information includes a priority rating usable to determine the speech technology to apply.
14. The article of claim 10 , the article further including instructions that, when executed, result in reconfiguration of the switch as necessary based upon at least one of resource constraints and load levels after a speech technology has been applied.
15. The article of claim 10 , wherein the article further includes instructions that, when executed, result in buffering of signals from the recognition request and voice data prior to reconfiguring the switch.
16. A speech recognition manager, comprising:
a) a control module operable to receive input audio streams and indicate recognition characteristics of the input audio streams;
b) a director operable to receive control information accompanying the input audio stream;
c) a switch operable to direct the input audio streams in accordance with configuration instructions from the director; and
d) at least two recognition engines operable to receive an input audio stream from the switch when the switch is configured to direct the stream to that recognition engine, and to perform speech conversion.
17. The manager of claim 16 wherein the manager further comprises an output switch operable to select between at least two output streams from the recognition engines.
18. The manager of claim 16 , wherein the at least two recognition engines further comprises an array of recognition engines.
19. A method for switching between speech recognition technologies, the method comprising:
a) receiving a recognition request;
b) monitoring at least one of resource constraints and system load levels;
c) configuring a switch based upon at least one of the resource constraints and system load levels; and
d) applying a speech recognition technology determined by the configuration of the switch.
20. The method of claim 19 , wherein the request is from a new audio stream.
21. The method of claim 20 , wherein a priority code accompanying the request is used in configuring the switch.
22. The method of claim 19 , wherein monitoring at least one of the resource constraints and system load levels is an ongoing process.
23. The method of claim 19 , wherein the method is applied to several recognition requests received by the system at substantially the same time.
24. The method of claim 19 , wherein the method further comprises buffering signals from the recognition request and voice data prior to configuring the switch.
25. The method of claim 19 , wherein the request is from new data in an acknowledged audio stream.
26. The method of claim 21 , wherein a priority code accompanying the request is used in configuring the switch.
27. The method of claim 19 , wherein configuring the switch is also based upon recognizer characteristics.
28. An article containing software instructions that, when executed, result in:
a) reception of a recognition request;
b) monitoring at least one of resource constraints and load levels;
c) configuring a switch based upon at least one of the resource constraints and system load levels; and
d) applying a speech recognition technology determined by the configuration of the switch.
29. The article of claim 28 , wherein the request is from a new audio stream.
30. The article of claim 28 , wherein the request is from new data in an acknowledged audio stream.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/808,699 US6704707B2 (en) | 2001-03-14 | 2001-03-14 | Method for automatically and dynamically switching between speech technologies |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/808,699 US6704707B2 (en) | 2001-03-14 | 2001-03-14 | Method for automatically and dynamically switching between speech technologies |
Publications (2)
Publication Number | Publication Date |
---|---|
US20020133337A1 (en) | 2002-09-19 |
US6704707B2 (en) | 2004-03-09 |
Family
ID=25199460
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/808,699 Expired - Lifetime US6704707B2 (en) | 2001-03-14 | 2001-03-14 | Method for automatically and dynamically switching between speech technologies |
Country Status (1)
Country | Link |
---|---|
US (1) | US6704707B2 (en) |
Cited By (36)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020107816A1 (en) * | 2000-12-05 | 2002-08-08 | James Craig | Method and system for securely recording a verbal transaction |
US20030139925A1 (en) * | 2001-12-31 | 2003-07-24 | Intel Corporation | Automating tuning of speech recognition systems |
US20050274797A1 (en) * | 2002-03-12 | 2005-12-15 | Cassandra Mollett | Systems and methods for determining an authorization |
US20050288935A1 (en) * | 2004-06-28 | 2005-12-29 | Yun-Wen Lee | Integrated dialogue system and method thereof |
US7082392B1 (en) * | 2000-02-22 | 2006-07-25 | International Business Machines Corporation | Management of speech technology modules in an interactive voice response system |
US7191941B1 (en) | 2002-03-12 | 2007-03-20 | First Data Corporation | Systems and methods for determining a need for authorization |
US20070162282A1 (en) * | 2006-01-09 | 2007-07-12 | Gilad Odinak | System and method for performing distributed speech recognition |
US20080221889A1 (en) * | 2007-03-07 | 2008-09-11 | Cerra Joseph P | Mobile content search environment speech processing facility |
US20080221897A1 (en) * | 2007-03-07 | 2008-09-11 | Cerra Joseph P | Mobile environment speech processing facility |
US20080243515A1 (en) * | 2007-03-29 | 2008-10-02 | Gilad Odinak | System and method for providing an automated call center inline architecture |
US20090030685A1 (en) * | 2007-03-07 | 2009-01-29 | Cerra Joseph P | Using speech recognition results based on an unstructured language model with a navigation system |
US20090030687A1 (en) * | 2007-03-07 | 2009-01-29 | Cerra Joseph P | Adapting an unstructured language model speech recognition system based on usage |
US20090030697A1 (en) * | 2007-03-07 | 2009-01-29 | Cerra Joseph P | Using contextual information for delivering results generated from a speech recognition facility using an unstructured language model |
US20090030684A1 (en) * | 2007-03-07 | 2009-01-29 | Cerra Joseph P | Using speech recognition results based on an unstructured language model in a mobile communication facility application |
US20090030696A1 (en) * | 2007-03-07 | 2009-01-29 | Cerra Joseph P | Using results of unstructured language model based speech recognition to control a system-level function of a mobile communications facility |
US20090030691A1 (en) * | 2007-03-07 | 2009-01-29 | Cerra Joseph P | Using an unstructured language model associated with an application of a mobile communication facility |
US20090030688A1 (en) * | 2007-03-07 | 2009-01-29 | Cerra Joseph P | Tagging speech recognition results based on an unstructured language model for use in a mobile communication facility application |
US20100106497A1 (en) * | 2007-03-07 | 2010-04-29 | Phillips Michael S | Internal and external speech recognition use with a mobile communication facility |
US20100185448A1 (en) * | 2007-03-07 | 2010-07-22 | Meisel William S | Dealing with switch latency in speech recognition |
US7769638B1 (en) | 2002-03-12 | 2010-08-03 | First Data Corporation | Systems and methods for verifying authorization for electronic commerce |
US20110054897A1 (en) * | 2007-03-07 | 2011-03-03 | Phillips Michael S | Transmitting signal quality information in mobile dictation application |
US20110054898A1 (en) * | 2007-03-07 | 2011-03-03 | Phillips Michael S | Multiple web-based content search user interface in mobile search application |
US20110054896A1 (en) * | 2007-03-07 | 2011-03-03 | Phillips Michael S | Sending a communications header with voice recording to send metadata for use in speech recognition and formatting in mobile dictation application |
US20110054899A1 (en) * | 2007-03-07 | 2011-03-03 | Phillips Michael S | Command and control utilizing content information in a mobile voice-to-speech application |
US20110054895A1 (en) * | 2007-03-07 | 2011-03-03 | Phillips Michael S | Utilizing user transmitted text to improve language model in mobile dictation application |
US20110060587A1 (en) * | 2007-03-07 | 2011-03-10 | Phillips Michael S | Command and control utilizing ancillary information in a mobile voice-to-speech application |
US20110066634A1 (en) * | 2007-03-07 | 2011-03-17 | Phillips Michael S | Sending a communications header with voice recording to send metadata for use in speech recognition, formatting, and search in mobile search application |
CN102014278A (en) * | 2010-12-21 | 2011-04-13 | 四川大学 | Intelligent video monitoring method based on voice recognition technology |
US20110137648A1 (en) * | 2009-12-04 | 2011-06-09 | At&T Intellectual Property I, L.P. | System and method for improved automatic speech recognition performance |
US8139755B2 (en) | 2007-03-27 | 2012-03-20 | Convergys Cmg Utah, Inc. | System and method for the automatic selection of interfaces |
US8386250B2 (en) | 2010-05-19 | 2013-02-26 | Google Inc. | Disambiguation of contact information using historical data |
US8949266B2 (en) | 2007-03-07 | 2015-02-03 | Vlingo Corporation | Multiple web-based content category searching in mobile search application |
US20150120287A1 (en) * | 2013-10-28 | 2015-04-30 | At&T Intellectual Property I, L.P. | System and method for managing models for embedded speech and language processing |
US9666188B2 (en) | 2013-10-29 | 2017-05-30 | Nuance Communications, Inc. | System and method of performing automatic speech recognition using local private data |
US10056077B2 (en) | 2007-03-07 | 2018-08-21 | Nuance Communications, Inc. | Using speech recognition results based on an unstructured language model with a music system |
US11363128B2 (en) | 2013-07-23 | 2022-06-14 | Google Technology Holdings LLC | Method and device for audio input routing |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7363228B2 (en) * | 2003-09-18 | 2008-04-22 | Interactive Intelligence, Inc. | Speech recognition system and method |
US20050288926A1 (en) * | 2004-06-25 | 2005-12-29 | Benco David S | Network support for wireless e-mail using speech-to-text conversion |
KR101211796B1 (en) * | 2009-12-16 | 2012-12-13 | 포항공과대학교 산학협력단 | Apparatus for foreign language learning and method for providing foreign language learning service |
US8983836B2 (en) | 2012-09-26 | 2015-03-17 | International Business Machines Corporation | Captioning using socially derived acoustic profiles |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6014624A (en) * | 1997-04-18 | 2000-01-11 | Nynex Science And Technology, Inc. | Method and apparatus for transitioning from one voice recognition system to another |
US6094476A (en) * | 1997-03-24 | 2000-07-25 | Octel Communications Corporation | Speech-responsive voice messaging system and method |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6094476A (en) * | 1997-03-24 | 2000-07-25 | Octel Communications Corporation | Speech-responsive voice messaging system and method |
US6377662B1 (en) * | 1997-03-24 | 2002-04-23 | Avaya Technology Corp. | Speech-responsive voice messaging system and method |
US6385304B1 (en) * | 1997-03-24 | 2002-05-07 | Avaya Technology Corp. | Speech-responsive voice messaging system and method |
US6522726B1 (en) * | 1997-03-24 | 2003-02-18 | Avaya Technology Corp. | Speech-responsive voice messaging system and method |
US6539078B1 (en) * | 1997-03-24 | 2003-03-25 | Avaya Technology Corporation | Speech-responsive voice messaging system and method |
US6014624A (en) * | 1997-04-18 | 2000-01-11 | Nynex Science And Technology, Inc. | Method and apparatus for transitioning from one voice recognition system to another |
Cited By (71)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7082392B1 (en) * | 2000-02-22 | 2006-07-25 | International Business Machines Corporation | Management of speech technology modules in an interactive voice response system |
US20020107816A1 (en) * | 2000-12-05 | 2002-08-08 | James Craig | Method and system for securely recording a verbal transaction |
US6928421B2 (en) * | 2000-12-05 | 2005-08-09 | Diaphonics, Inc. | Method and system for securely recording a verbal transaction |
US20030139925A1 (en) * | 2001-12-31 | 2003-07-24 | Intel Corporation | Automating tuning of speech recognition systems |
US7203644B2 (en) * | 2001-12-31 | 2007-04-10 | Intel Corporation | Automating tuning of speech recognition systems |
US20050274797A1 (en) * | 2002-03-12 | 2005-12-15 | Cassandra Mollett | Systems and methods for determining an authorization |
US7182255B2 (en) * | 2002-03-12 | 2007-02-27 | First Data Corporation | Systems and methods for determining an authorization |
US7191941B1 (en) | 2002-03-12 | 2007-03-20 | First Data Corporation | Systems and methods for determining a need for authorization |
US7769638B1 (en) | 2002-03-12 | 2010-08-03 | First Data Corporation | Systems and methods for verifying authorization for electronic commerce |
US8473351B1 (en) | 2002-03-12 | 2013-06-25 | First Data Corporation | Systems and methods for verifying authorization |
US20050288935A1 (en) * | 2004-06-28 | 2005-12-29 | Yun-Wen Lee | Integrated dialogue system and method thereof |
US20070162282A1 (en) * | 2006-01-09 | 2007-07-12 | Gilad Odinak | System and method for performing distributed speech recognition |
US20110066634A1 (en) * | 2007-03-07 | 2011-03-17 | Phillips Michael S | Sending a communications header with voice recording to send metadata for use in speech recognition, formatting, and search in mobile search application |
US8996379B2 (en) | 2007-03-07 | 2015-03-31 | Vlingo Corporation | Speech recognition text entry for software applications |
US20080221879A1 (en) * | 2007-03-07 | 2008-09-11 | Cerra Joseph P | Mobile environment speech processing facility |
US20080221900A1 (en) * | 2007-03-07 | 2008-09-11 | Cerra Joseph P | Mobile local search environment speech processing facility |
US20080221902A1 (en) * | 2007-03-07 | 2008-09-11 | Cerra Joseph P | Mobile browser environment speech processing facility |
US20080221898A1 (en) * | 2007-03-07 | 2008-09-11 | Cerra Joseph P | Mobile navigation environment speech processing facility |
US20080221899A1 (en) * | 2007-03-07 | 2008-09-11 | Cerra Joseph P | Mobile messaging environment speech processing facility |
US20090030685A1 (en) * | 2007-03-07 | 2009-01-29 | Cerra Joseph P | Using speech recognition results based on an unstructured language model with a navigation system |
US20090030687A1 (en) * | 2007-03-07 | 2009-01-29 | Cerra Joseph P | Adapting an unstructured language model speech recognition system based on usage |
US20090030697A1 (en) * | 2007-03-07 | 2009-01-29 | Cerra Joseph P | Using contextual information for delivering results generated from a speech recognition facility using an unstructured language model |
US20090030684A1 (en) * | 2007-03-07 | 2009-01-29 | Cerra Joseph P | Using speech recognition results based on an unstructured language model in a mobile communication facility application |
US20090030696A1 (en) * | 2007-03-07 | 2009-01-29 | Cerra Joseph P | Using results of unstructured language model based speech recognition to control a system-level function of a mobile communications facility |
US20090030691A1 (en) * | 2007-03-07 | 2009-01-29 | Cerra Joseph P | Using an unstructured language model associated with an application of a mobile communication facility |
US20090030688A1 (en) * | 2007-03-07 | 2009-01-29 | Cerra Joseph P | Tagging speech recognition results based on an unstructured language model for use in a mobile communication facility application |
US20100106497A1 (en) * | 2007-03-07 | 2010-04-29 | Phillips Michael S | Internal and external speech recognition use with a mobile communication facility |
US20100185448A1 (en) * | 2007-03-07 | 2010-07-22 | Meisel William S | Dealing with switch latency in speech recognition |
US20110054897A1 (en) * | 2007-03-07 | 2011-03-03 | Phillips Michael S | Transmitting signal quality information in mobile dictation application |
US20110054898A1 (en) * | 2007-03-07 | 2011-03-03 | Phillips Michael S | Multiple web-based content search user interface in mobile search application |
US20110054896A1 (en) * | 2007-03-07 | 2011-03-03 | Phillips Michael S | Sending a communications header with voice recording to send metadata for use in speech recognition and formatting in mobile dictation application |
US20110054899A1 (en) * | 2007-03-07 | 2011-03-03 | Phillips Michael S | Command and control utilizing content information in a mobile voice-to-speech application |
US20110054895A1 (en) * | 2007-03-07 | 2011-03-03 | Phillips Michael S | Utilizing user transmitted text to improve language model in mobile dictation application |
US20110060587A1 (en) * | 2007-03-07 | 2011-03-10 | Phillips Michael S | Command and control utilizing ancillary information in a mobile voice-to-speech application |
US20080221897A1 (en) * | 2007-03-07 | 2008-09-11 | Cerra Joseph P | Mobile environment speech processing facility |
US10056077B2 (en) | 2007-03-07 | 2018-08-21 | Nuance Communications, Inc. | Using speech recognition results based on an unstructured language model with a music system |
US9619572B2 (en) | 2007-03-07 | 2017-04-11 | Nuance Communications, Inc. | Multiple web-based content category searching in mobile search application |
US9495956B2 (en) | 2007-03-07 | 2016-11-15 | Nuance Communications, Inc. | Dealing with switch latency in speech recognition |
US20080221884A1 (en) * | 2007-03-07 | 2008-09-11 | Cerra Joseph P | Mobile environment speech processing facility |
US8949130B2 (en) | 2007-03-07 | 2015-02-03 | Vlingo Corporation | Internal and external speech recognition use with a mobile communication facility |
US8949266B2 (en) | 2007-03-07 | 2015-02-03 | Vlingo Corporation | Multiple web-based content category searching in mobile search application |
US8886540B2 (en) | 2007-03-07 | 2014-11-11 | Vlingo Corporation | Using speech recognition results based on an unstructured language model in a mobile communication facility application |
US8886545B2 (en) | 2007-03-07 | 2014-11-11 | Vlingo Corporation | Dealing with switch latency in speech recognition |
US20080221889A1 (en) * | 2007-03-07 | 2008-09-11 | Cerra Joseph P | Mobile content search environment speech processing facility |
US8880405B2 (en) | 2007-03-07 | 2014-11-04 | Vlingo Corporation | Application text entry in a mobile environment using a speech processing facility |
US8838457B2 (en) | 2007-03-07 | 2014-09-16 | Vlingo Corporation | Using results of unstructured language model based speech recognition to control a system-level function of a mobile communications facility |
US8635243B2 (en) | 2007-03-07 | 2014-01-21 | Research In Motion Limited | Sending a communications header with voice recording to send metadata for use in speech recognition, formatting, and search mobile search application |
US8139755B2 (en) | 2007-03-27 | 2012-03-20 | Convergys Cmg Utah, Inc. | System and method for the automatic selection of interfaces |
US9484035B2 (en) * | 2007-03-29 | 2016-11-01 | Intellisist, Inc. | System and method for distributed speech recognition |
US8521528B2 (en) * | 2007-03-29 | 2013-08-27 | Intellisist, Inc. | System and method for distributed speech recognition |
US20170047070A1 (en) * | 2007-03-29 | 2017-02-16 | Intellisist, Inc. | Computer-Implemented System And Method For Performing Distributed Speech Recognition |
US10121475B2 (en) * | 2007-03-29 | 2018-11-06 | Intellisist, Inc. | Computer-implemented system and method for performing distributed speech recognition |
US20080243515A1 (en) * | 2007-03-29 | 2008-10-02 | Gilad Odinak | System and method for providing an automated call center inline architecture |
US20120253806A1 (en) * | 2007-03-29 | 2012-10-04 | Gilad Odinak | System And Method For Distributed Speech Recognition |
US8204746B2 (en) | 2007-03-29 | 2012-06-19 | Intellisist, Inc. | System and method for providing an automated call center inline architecture |
US9224389B2 (en) * | 2007-03-29 | 2015-12-29 | Intellisist, Inc. | System and method for performing distributed speech recognition |
US20130346080A1 (en) * | 2007-03-29 | 2013-12-26 | Intellisist, Inc. | System And Method For Performing Distributed Speech Recognition |
US8346549B2 (en) * | 2009-12-04 | 2013-01-01 | At&T Intellectual Property I, L.P. | System and method for supplemental speech recognition by identified idle resources |
US20110137648A1 (en) * | 2009-12-04 | 2011-06-09 | At&T Intellectual Property I, L.P. | System and method for improved automatic speech recognition performance |
US9431005B2 (en) | 2009-12-04 | 2016-08-30 | At&T Intellectual Property I, L.P. | System and method for supplemental speech recognition by identified idle resources |
US8688450B2 (en) * | 2010-05-19 | 2014-04-01 | Google Inc. | Disambiguation of contact information using historical and context data |
US8694313B2 (en) | 2010-05-19 | 2014-04-08 | Google Inc. | Disambiguation of contact information using historical data |
US8386250B2 (en) | 2010-05-19 | 2013-02-26 | Google Inc. | Disambiguation of contact information using historical data |
CN102014278A (en) * | 2010-12-21 | 2011-04-13 | 四川大学 | Intelligent video monitoring method based on voice recognition technology |
US11876922B2 (en) | 2013-07-23 | 2024-01-16 | Google Technology Holdings LLC | Method and device for audio input routing |
US11363128B2 (en) | 2013-07-23 | 2022-06-14 | Google Technology Holdings LLC | Method and device for audio input routing |
US9773498B2 (en) | 2013-10-28 | 2017-09-26 | At&T Intellectual Property I, L.P. | System and method for managing models for embedded speech and language processing |
US9530416B2 (en) * | 2013-10-28 | 2016-12-27 | At&T Intellectual Property I, L.P. | System and method for managing models for embedded speech and language processing |
US20150120287A1 (en) * | 2013-10-28 | 2015-04-30 | At&T Intellectual Property I, L.P. | System and method for managing models for embedded speech and language processing |
US9905228B2 (en) | 2013-10-29 | 2018-02-27 | Nuance Communications, Inc. | System and method of performing automatic speech recognition using local private data |
US9666188B2 (en) | 2013-10-29 | 2017-05-30 | Nuance Communications, Inc. | System and method of performing automatic speech recognition using local private data |
Also Published As
Publication number | Publication date |
---|---|
US20020133337A1 (en) | 2002-09-19 |
Similar Documents
Publication | Title
---|---|
US6704707B2 (en) | Method for automatically and dynamically switching between speech technologies
US7085723B2 (en) | System and method for determining utterance context in a multi-context speech application
US7899673B2 (en) | Automatic pruning of grammars in a multi-application speech recognition interface
US7624018B2 (en) | Speech recognition using categories and speech prefixing
US6487534B1 (en) | Distributed client-server speech recognition system
US7917364B2 (en) | System and method using multiple automated speech recognition engines
EP1181684B1 (en) | Client-server speech recognition
US11188289B2 (en) | Identification of preferred communication devices according to a preference rule dependent on a trigger phrase spoken within a selected time from other command data
US5893063A (en) | Data processing system and method for dynamically accessing an application using a voice command
US5732187A (en) | Speaker-dependent speech recognition using speaker independent models
US6178401B1 (en) | Method for reducing search complexity in a speech recognition system
US20020194000A1 (en) | Selection of a best speech recognizer from multiple speech recognizers using performance prediction
JP5062171B2 (en) | Speech recognition system, speech recognition method, and speech recognition program
US10431225B2 (en) | Speaker identification assisted by categorical cues
US10170122B2 (en) | Speech recognition method, electronic device and speech recognition system
US20020138272A1 (en) | Method for improving speech recognition performance using speaker and channel information
CN104299623A (en) | Automated confirmation and disambiguation modules in voice applications
US6345254B1 (en) | Method and apparatus for improving speech command recognition accuracy using event-based constraints
US11562747B2 (en) | Speech-to-text transcription with multiple languages
US6745165B2 (en) | Method and apparatus for recognizing from here to here voice command structures in a finite grammar speech recognition system
US20060069563A1 (en) | Constrained mixed-initiative in a voice-activated command system
US5897618A (en) | Data processing system and method for switching between programs having a same title using a voice command
JP2022522926A (en) | Recognition of unknown words in direct acoustic word speech recognition using acoustic word embedding
CN113486661A (en) | Text understanding method, system, terminal equipment and storage medium
CN112767916A (en) | Voice interaction method, device, equipment, medium and product of intelligent voice equipment
Legal Events
Code | Title | Description
---|---|---|
AS | Assignment | Owner name: INTEL CORPORATION, CALIFORNIA; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: ANDERSON, ANDREW V.; BENNETT, STEVEN M.; REEL/FRAME: 011624/0015; Effective date: 20010313
FEPP | Fee payment procedure | Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY
STCF | Information on status: patent grant | Free format text: PATENTED CASE
FPAY | Fee payment | Year of fee payment: 4
REMI | Maintenance fee reminder mailed |
FPAY | Fee payment | Year of fee payment: 8
FPAY | Fee payment | Year of fee payment: 12