WO2018219105A1 - Speech recognition method and related products - Google Patents
Speech recognition method and related products
- Publication number
- WO2018219105A1 (application PCT/CN2018/086205)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- recognition
- dialect
- mobile terminal
- algorithm
- recognition result
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/005—Language recognition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Definitions
- The present invention relates to the field of computer technology, and in particular to a speech recognition method and related products.
- Speech recognition technology allows a machine to convert a speech signal into corresponding text or commands through a process of recognition and understanding.
- Speech recognition technology mainly covers three aspects: feature extraction, pattern matching criteria, and model training.
- Speech recognition has also been widely applied in the Internet of Vehicles: for example, a driver can set a destination and start navigation simply by dictating it, which is safe and convenient.
- Speech recognition is an interdisciplinary subject. Over the past two decades it has made significant progress and has begun to move from the laboratory to the market. It is expected that within the next 10 years it will enter fields such as industry, home appliances, communications, automotive electronics, medical care, home services, and consumer electronics. The fields it draws on include signal processing, pattern recognition, probability theory and information theory, the mechanisms of speech production and hearing, and artificial intelligence.
- Embodiments of the present invention provide a speech recognition method and related products for improving the recognition accuracy of non-standard speech.
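- An embodiment of the present invention provides a speech recognition method, including:
- acquiring a geographic location of a mobile terminal; determining a dialect type corresponding to the geographic location; acquiring a recognition algorithm corresponding to the dialect type as a target algorithm; and, after voice data is collected, performing speech recognition on the voice data using the target algorithm to obtain a recognition result.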
- Acquiring the geographic location of the mobile terminal includes:
- after the mobile terminal is started, acquiring a history record set, where the history record set is obtained by the mobile terminal recording its location information each time it is started; and analyzing the history record set to obtain the geographic area to which the mobile terminal belongs as the geographic location.
- Before the dialect type corresponding to the geographic location is determined, the method further includes:
- establishing a database of correspondences between geographic areas and dialect types, in which one geographic area corresponds to one or more dialect types.
- Before the recognition algorithm corresponding to the dialect type is acquired as the target algorithm, the method further includes:
- establishing a database of correspondences between dialect types and recognition algorithms, in which each dialect type corresponds to one recognition algorithm.
- When more than one dialect type is determined, acquiring the recognition algorithm corresponding to the dialect type as the target algorithm includes:
- acquiring the recognition algorithm corresponding to each dialect type as a target algorithm.
- Performing speech recognition on the voice data using the target algorithm to obtain a recognition result includes:
- performing speech recognition on the voice data using each acquired target algorithm, and taking the recognition result with the highest accuracy probability as the final recognition result.
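- The method further includes:
- sorting the recognition results by accuracy probability; outputting at least two recognition results whose accuracy probability is greater than or equal to a preset threshold; receiving a selection instruction; and, after the selection instruction specifies the accurate recognition result among the at least two recognition results, changing the target algorithm to the recognition algorithm corresponding to that recognition result.
- The method further includes: recording the recognition result in a recognition result set, determining the recognition result with the highest accuracy in the recognition result set, and using the recognition algorithm corresponding to that recognition result as the recognition algorithm for subsequent speech recognition.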
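The record-and-select step above can be sketched in Python as follows. This is only a minimal illustration: the `RecognitionRecord` structure, the algorithm names, and the probability values are hypothetical, since the source does not specify how the recognition result set is stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecognitionRecord:
    """One recorded recognition result (hypothetical structure)."""
    algorithm: str      # name of the recognition algorithm that produced the result
    text: str           # recognized text
    probability: float  # accuracy probability reported for this result


def best_algorithm(result_set):
    """Return the algorithm behind the most accurate recorded result."""
    if not result_set:
        raise ValueError("recognition result set is empty")
    best = max(result_set, key=lambda record: record.probability)
    return best.algorithm


# Illustrative values only; not taken from the source.
result_set = [
    RecognitionRecord("recognizer_a", "text one", 0.91),
    RecognitionRecord("recognizer_b", "text two", 0.64),
    RecognitionRecord("recognizer_a", "text three", 0.87),
]
follow_up_algorithm = best_algorithm(result_set)
```

Here the single highest-probability record decides the follow-up algorithm, which matches the wording above; averaging per algorithm would be an alternative design that the source does not describe.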
- The speech recognition algorithm can thus be adjusted dynamically. On the one hand, it is adjusted according to the geographic location; more importantly, after the recognition algorithm has been adjusted several times, the recognition results can be used to determine a better-optimized recognition algorithm as the final recognition algorithm. For private devices, this gives higher accuracy and a higher recognition speed.
- The steps of "acquiring the geographic location of the mobile terminal, determining the dialect type corresponding to the geographic location, and acquiring the recognition algorithm corresponding to the dialect type as the target algorithm" may be performed.
- A second embodiment of the present invention further provides a mobile terminal, including a processing unit and an input/output unit.
- The input/output unit is configured to receive input data and to output data.
- The processing unit is configured to acquire a geographic location of the mobile terminal, determine a dialect type corresponding to the geographic location, acquire a recognition algorithm corresponding to the dialect type as a target algorithm, and, after voice data is collected, perform speech recognition on the voice data using the target algorithm to obtain a recognition result.
- The processing unit is further configured to: after the mobile terminal is started, collect location information of the mobile terminal to obtain a history record set; and analyze the history record set to obtain the geographic area to which the mobile terminal belongs as the geographic location.
- A third embodiment of the present invention further provides a mobile terminal, including one or more processors, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs comprising instructions for performing the steps of any of the methods provided by the embodiments of the present invention.
- The present invention further provides a computer-readable storage medium storing a computer program for electronic data exchange, wherein the computer program causes a computer to perform the method of any one of claims 1-6, and the computer includes a mobile terminal.
- In the embodiments of the present invention, the dialect type used in the area to which the mobile terminal belongs is determined from the geographic location of the mobile terminal, so that the corresponding recognition algorithm can be used, thereby improving the recognition accuracy of non-standard speech.
- FIG. 1 is a schematic flow chart of a method provided by an embodiment of the present invention.
- FIG. 2 is a schematic diagram of an interface according to an embodiment of the present invention.
- FIG. 3 is a schematic structural diagram of a voice recognition device according to an embodiment of the present invention.
- FIG. 4 is a schematic structural diagram of a mobile terminal according to an embodiment of the present invention.
- FIG. 5 is a schematic structural diagram of a mobile terminal according to an embodiment of the present invention.
- FIG. 6 is a schematic structural diagram of a mobile terminal according to an embodiment of the present invention.
- References to "an embodiment" herein mean that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention.
- The appearances of this phrase in various places in the specification do not necessarily refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive with other embodiments. Those skilled in the art will understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.
- The mobile terminal involved in the embodiments of the present invention may include various mobile handheld devices, in-vehicle devices, wearable devices, computing devices, or other processing devices connected to a wireless modem, as well as various forms of user equipment (UE), mobile stations (MS), terminal devices, and the like.
- The devices mentioned above are collectively referred to as mobile terminals.
- Non-standard speech is defined relative to standard speech. Standard speech may be, for example, the Mandarin pronunciation of Chinese, or certain dialect pronunciations that have been included in the standard. This is not repeated below.
- FIG. 1 is a schematic flowchart of a speech recognition method according to an embodiment of the present invention, applied to a mobile terminal.
- The speech recognition method includes:
- The geographic location may be expressed as latitude and longitude, as an administrative division, or as a preset dialect area; it is not limited to latitude and longitude.
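One way to relate the latitude/longitude representation to a preset dialect area is a simple bounding-box lookup. The sketch below is an assumption for illustration only: the `DialectArea` class, the area names, and the coordinate ranges are hypothetical and do not come from the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DialectArea:
    """A preset dialect area modelled as a latitude/longitude box (hypothetical)."""
    name: str
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


# Illustrative boxes only; real dialect areas are not rectangular.
PRESET_AREAS = [
    DialectArea("area_a", 22.0, 24.0, 112.0, 115.0),
    DialectArea("area_b", 27.0, 29.0, 111.0, 114.0),
]


def to_dialect_area(lat: float, lon: float, areas=PRESET_AREAS) -> Optional[str]:
    """Map a coordinate to the first preset area containing it, else None."""
    for area in areas:
        if area.contains(lat, lon):
            return area.name
    return None
```

A coordinate outside every preset area returns `None`, leaving the caller to decide on a fallback such as the standard-speech recognizer.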
- The dialect type refers to the category to which a dialect belongs. At present, there are mainly seven types in China, including:
- Cantonese (abbreviation: Yue);
- Hunan dialect (abbreviation: Xiang);
- Hakka.
- Step 102: Acquire the recognition algorithm corresponding to the dialect type as the target algorithm.
- MIT Media Lab Speech Dataset
- Pitch and Voicing Estimates for Aurora 2 (pitch period and voicing estimates for the Aurora 2 speech corpus)
- Congressional Speech Data and Mandarin Speech Frame Data
- voice data used to test blind source separation algorithms, and the like.
- Different dialect types may have different recognition algorithms. In particular, different recognition algorithms may correspond to speech databases of the standard pronunciation of different dialect types, so that recognition speed and accuracy can be improved specifically for the determined dialect type.
- The voice data may be produced by a person speaking to the terminal device; a sound pickup device of the terminal device, for example a microphone, collects the voice data input by the user.
- Once the speech recognition algorithm, that is, the target algorithm, is determined, the specific recognition process is not described in detail in the embodiments of the present invention. It can be understood that, for different dialects, a speech database of the corresponding dialect can be used together with the recognition algorithm.
- In this embodiment, the dialect type used in the area to which the mobile terminal belongs is determined from the geographic location of the mobile terminal, so that the corresponding recognition algorithm can be used, thereby improving the recognition accuracy of non-standard speech.
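The overall flow of FIG. 1 (location, then dialect type, then target algorithm, then recognition) can be sketched as a small pipeline. Everything named here is hypothetical: `dialect_of`, `algorithm_for`, the `"standard"` fallback, and the stub recognizer stand in for components the source leaves unspecified.

```python
def recognize_speech(location, voice_data, dialect_of, algorithm_for):
    """Run the three stages in order and return the recognition result."""
    dialect = dialect_of(location)              # location -> dialect type
    target_algorithm = algorithm_for(dialect)   # dialect type -> target algorithm
    return target_algorithm(voice_data)         # recognize the collected voice data


def dialect_of(location):
    # Hypothetical lookup; falls back to "standard" for unknown locations.
    return {"area_a": "dialect_a"}.get(location, "standard")


def algorithm_for(dialect):
    # Stub recognizer: a real one would decode audio with a dialect-specific model.
    def recognizer(voice_data):
        return f"[{dialect}] {len(voice_data)} bytes recognized"
    return recognizer


result = recognize_speech("area_a", b"\x00\x01\x02\x03", dialect_of, algorithm_for)
```

Passing the two lookups in as callables keeps the pipeline independent of how the location and algorithm tables are actually stored.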
- Because the location information obtained by the terminal device is not necessarily its usual or real location (for example, the mobile terminal of a traveller), the embodiment of the present invention provides the following solution.
- Acquiring the geographic location of the mobile terminal includes:
- after the mobile terminal is started, acquiring a history record set, where the history record set is obtained by recording the location information of the mobile terminal each time it is started; and analyzing the history record set to obtain the geographic area to which the mobile terminal belongs as the geographic location.
- Using the history record set to determine the area to which the terminal device belongs avoids inaccurate judgments caused by the mobile terminal moving frequently between dialect areas.
- The history record set may be analyzed as follows: determine the geographic area in which the terminal device has stayed for the longest time; this area is the most likely real location of the mobile terminal, for example the place where the car is parked most often, or the place where the phone is located at night.
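The longest-stay analysis described above can be sketched as follows. The `LocationRecord` structure, its `hours_stayed` field, and the area labels are assumptions made for illustration; the source only says that the area where the device stays longest is taken as its real location.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class LocationRecord:
    """One entry of the history record set (hypothetical structure)."""
    area: str            # geographic area label, e.g. an administrative division
    hours_stayed: float  # how long the terminal stayed in that area


def infer_home_area(history):
    """Return the area with the longest total dwell time, or None if empty."""
    totals = Counter()
    for record in history:
        totals[record.area] += record.hours_stayed
    if not totals:
        return None
    # most_common(1) yields the (area, total_hours) pair with the largest total.
    return totals.most_common(1)[0][0]


# Illustrative history only.
history = [
    LocationRecord("area_a", 10.0),
    LocationRecord("area_b", 2.0),
    LocationRecord("area_a", 8.5),
    LocationRecord("area_c", 1.0),
]
home_area = infer_home_area(history)
```

Short visits to other areas, such as a trip, add little dwell time and therefore do not change the inferred area.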
- The embodiment of the present invention further provides an implementation in which a database is established in advance to improve recognition speed and accuracy, as follows. Before the dialect type corresponding to the geographic location is determined, the method further includes:
- establishing a database of correspondences between geographic areas and dialect types, in which one geographic area corresponds to one or more dialect types.
- With such a database, finer-grained dialects can be recognized more accurately. For example:
- Wu is also known as the Jiangsu-Zhejiang dialect or the Jiangnan dialect.
- Formerly, the Suzhou dialect was regarded as its representative.
- As the population speaking the Shanghai dialect has grown, the Shanghai dialect is now regarded as the representative of Wu.
- Wu is spoken mainly in the part of Jiangsu province south of the Yangtze River and east of Zhenjiang, in a small part of Nantong, and in most of Shanghai and Zhejiang. It can be divided into five subgroups, for example:
- the Wuzhou subgroup, represented by the Jinhua dialect.
- Before the recognition algorithm corresponding to the dialect type is acquired as the target algorithm, the method further includes:
- establishing a database of correspondences between dialect types and recognition algorithms, in which each dialect type corresponds to one recognition algorithm.
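The two correspondence databases can be illustrated with plain dictionaries. This is a sketch under stated assumptions: the area names, dialect labels, and algorithm identifiers are hypothetical placeholders, and a real system would likely use a persistent store.

```python
# Hypothetical tables: one area maps to one or more dialect types,
# and each dialect type maps to exactly one recognition algorithm.
AREA_TO_DIALECTS = {
    "area_a": ["dialect_x"],
    "area_b": ["dialect_x", "dialect_y", "dialect_z"],
}
DIALECT_TO_ALGORITHM = {
    "dialect_x": "recognizer_1",
    "dialect_y": "recognizer_2",
    "dialect_z": "recognizer_2",  # two dialect types may share one algorithm
}


def target_algorithms(area):
    """Return the distinct recognition algorithms for an area, in lookup order."""
    algorithms = []
    for dialect in AREA_TO_DIALECTS.get(area, []):
        algorithm = DIALECT_TO_ALGORITHM.get(dialect)
        if algorithm is not None and algorithm not in algorithms:
            algorithms.append(algorithm)
    return algorithms
```

For `"area_b"` the lookup yields two algorithms for three dialect types, because `dialect_y` and `dialect_z` share a recognizer.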
- Because the dialect situation at a geographic location may be complicated, more than one dialect type may be determined for it.
- This embodiment provides the following solution. Acquiring the recognition algorithm corresponding to the dialect type as the target algorithm includes:
- acquiring the recognition algorithm corresponding to each dialect type as a target algorithm.
- Different dialect types may correspond to different recognition algorithms, but several dialect types may also share one recognition algorithm, so the number of recognition algorithms may be smaller than the number of dialect types.
- Because several different recognition algorithms may be used, several different recognition results may be produced.
- This embodiment provides the following solution. Performing speech recognition on the voice data using the target algorithm to obtain a recognition result includes:
- performing speech recognition on the voice data using each acquired target algorithm, and taking the recognition result with the highest accuracy probability as the final recognition result.
- Each recognition result obtained by a recognition algorithm has an associated accuracy probability, so the recognition result with the largest probability value can be used as the final recognition result.
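Selecting the result with the largest accuracy probability can be sketched as below. The recognizers are stubs that return fixed `(text, probability)` pairs; their names and values are hypothetical, and a real recognizer would compute the probability from the audio.

```python
def recognize_with_all(voice_data, algorithms):
    """Run every target algorithm and keep the most probable result.

    Each algorithm is assumed to return a (text, probability) tuple.
    """
    if not algorithms:
        raise ValueError("at least one target algorithm is required")
    results = [algorithm(voice_data) for algorithm in algorithms]
    return max(results, key=lambda result: result[1])


def recognizer_a(voice_data):
    return ("result from a", 0.72)  # stub output, illustrative only


def recognizer_b(voice_data):
    return ("result from b", 0.88)  # stub output, illustrative only


final_text, final_probability = recognize_with_all(
    b"\x00\x01", [recognizer_a, recognizer_b]
)
```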
- The embodiment of the present invention further provides a selection scheme for correcting the recognition algorithm, as shown in FIG. 2. After speech recognition is performed on the voice data using the target algorithm to obtain the recognition result, the method further includes:
- sorting the recognition results by accuracy probability; outputting at least two recognition results whose accuracy probability is greater than or equal to a preset threshold; receiving a selection instruction; and, after the selection instruction specifies the accurate recognition result among the at least two recognition results, changing the target algorithm to the recognition algorithm corresponding to that recognition result.
- In FIG. 2, two recognition results are displayed. The two recognition results may be displayed as text or played back as speech; if played back as speech, they may further be played in the corresponding dialect.
- After voice data is collected, two or more recognition results are obtained using one or more algorithms; the recognition result that the user confirms as more accurate then indicates which algorithm is better.
- This solution is well suited to devices such as mobile phones, which are relatively private or whose users have similar accents, and it can improve the recognition accuracy of non-standard speech while maintaining recognition speed.
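The selection scheme in this document (sort by accuracy probability, show the results at or above a preset threshold, let the user pick one, then switch the target algorithm) can be sketched as follows. The dictionary layout, algorithm names, threshold value, and `selected_index` parameter are all hypothetical.

```python
def candidates_to_show(results, threshold):
    """Keep results at or above the threshold, most probable first."""
    kept = [r for r in results if r["probability"] >= threshold]
    return sorted(kept, key=lambda r: r["probability"], reverse=True)


def update_target_algorithm(current_algorithm, candidates, selected_index):
    """Switch to the algorithm behind the result the user selected.

    With fewer than two candidates there is nothing to choose between,
    so the current algorithm is kept.
    """
    if len(candidates) < 2:
        return current_algorithm
    return candidates[selected_index]["algorithm"]


# Illustrative results only.
results = [
    {"algorithm": "algo_a", "text": "candidate one", "probability": 0.81},
    {"algorithm": "algo_b", "text": "candidate two", "probability": 0.86},
    {"algorithm": "algo_c", "text": "candidate three", "probability": 0.40},
]
shown = candidates_to_show(results, threshold=0.8)
# The user picks the second displayed candidate as the accurate one.
new_target = update_target_algorithm("algo_b", shown, selected_index=1)
```

In this example the most probable result is not the one the user confirms, so the target algorithm moves from `algo_b` to `algo_a`, which is the correction the scheme is meant to capture.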
- The speech recognition device may be a mobile terminal, and specifically includes:
- a location obtaining unit 301, configured to acquire a geographic location of the mobile terminal;
- a type determining unit 302, configured to determine a dialect type corresponding to the geographic location;
- an algorithm obtaining unit 303, configured to acquire a recognition algorithm corresponding to the dialect type as a target algorithm; and
- a recognition unit 304, configured to, after voice data is collected, perform speech recognition on the voice data using the target algorithm to obtain a recognition result.
- The location obtaining unit 301 being configured to acquire the geographic location of the mobile terminal includes:
- after the mobile terminal is started, acquiring a history record set, where the history record set is obtained by recording the location information of the mobile terminal each time it is started; and analyzing the history record set to obtain the geographic area to which the mobile terminal belongs as the geographic location.
- Different dialect types may correspond to different recognition algorithms, but several dialect types may also share one recognition algorithm, so the number of recognition algorithms may be smaller than the number of dialect types.
- Because several recognition algorithms may be used, several different recognition results may be produced.
- This embodiment provides the following solution:
- The speech recognition device further includes a data establishing unit 305, configured to establish, before the dialect type corresponding to the geographic location is determined, a database of correspondences between geographic areas and dialect types, in which one geographic area corresponds to one or more dialect types.
- The data establishing unit 305 is further configured to establish a database of correspondences between dialect types and recognition algorithms, in which each dialect type corresponds to one recognition algorithm.
- When the type determining unit 302 determines more than one dialect type, the algorithm obtaining unit 303 being configured to acquire the recognition algorithm corresponding to the dialect type as the target algorithm includes:
- acquiring the recognition algorithm corresponding to each dialect type as a target algorithm.
- The recognition unit 304 being configured to perform speech recognition on the voice data using the target algorithm to obtain a recognition result includes:
- performing speech recognition on the voice data using each acquired target algorithm, and taking the recognition result with the highest accuracy probability as the final recognition result.
- Each recognition result obtained by a recognition algorithm has an associated accuracy probability, so the recognition result with the largest probability value can be used as the final recognition result.
- The embodiment of the present invention further provides a selection scheme for correcting the recognition algorithm, as shown in FIG. 2. Specifically, the speech recognition device further includes:
- an algorithm modifying unit 306, configured to: after speech recognition is performed on the voice data using the target algorithm to obtain the recognition result, sort the recognition results by accuracy probability; output at least two recognition results whose accuracy probability is greater than or equal to a preset threshold; receive a selection instruction; and, after the selection instruction specifies the accurate recognition result among the at least two recognition results, change the target algorithm to the recognition algorithm corresponding to that recognition result.
- In FIG. 2, two recognition results are displayed. The two recognition results may be displayed as text or played back as speech; if played back as speech, they may further be played in the corresponding dialect.
- After voice data is collected, two or more recognition results are obtained using one or more algorithms; the recognition result that the user confirms as more accurate then indicates which algorithm is better.
- This solution is well suited to devices such as mobile phones, which are relatively private or whose users have similar accents, and it can improve the recognition accuracy of non-standard speech while maintaining recognition speed.
- An embodiment of the present invention further provides a mobile terminal, including a processing unit 402 and an input/output unit 403.
- The processing unit 402 is configured to control and manage actions of the terminal device.
- The processing unit 402 is configured to support the terminal device in performing steps 101-103 of FIG. 1 or other processes of the techniques described herein.
- The input/output unit 403 is configured to support data input and output.
- The terminal device may further include a storage unit 401 for storing program code and data of the terminal device.
- The processing unit 402 may be a processor or a controller, for example a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It can implement or execute the various illustrative logical blocks, modules, and circuits described in connection with the present disclosure.
- The processor may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
- The input/output unit 403 may be a microphone, an earpiece, a speaker, or the like, and the storage unit 401 may be a memory.
- The input/output unit 403 is configured to receive input data and to output data.
- The processing unit 402 is configured to acquire a geographic location of the mobile terminal, determine a dialect type corresponding to the geographic location, and acquire a recognition algorithm corresponding to the dialect type as a target algorithm; and, after voice data is collected, perform speech recognition on the voice data using the target algorithm to obtain a recognition result.
- The processing unit 402 is further configured to: after the mobile terminal is started, acquire a history record set, where the history record set is obtained by the mobile terminal recording its location information each time it is started; and analyze the history record set to obtain the geographic area to which the mobile terminal belongs as the geographic location.
- For other processes performed by the processing unit 402, reference may be made to the foregoing method embodiments; details are not described here again.
- FIG. 5 is a schematic structural diagram of a mobile terminal according to an embodiment of the present invention.
- The mobile terminal includes one or more processors, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs including instructions for performing the following steps:
- acquiring a geographic location of the mobile terminal; determining a dialect type corresponding to the geographic location; acquiring a recognition algorithm corresponding to the dialect type as a target algorithm; and, after voice data is collected, performing speech recognition on the voice data using the target algorithm to obtain a recognition result.
- The geographic location may be expressed as latitude and longitude, as an administrative division, or as a preset dialect area; it is not limited to latitude and longitude.
- The dialect type refers to the category to which a dialect belongs. At present, there are mainly seven types in China.
- Different dialect types may have different recognition algorithms. In particular, different recognition algorithms may correspond to speech databases of the standard pronunciation of different dialect types, so that recognition speed and accuracy can be improved specifically for the determined dialect type.
- The voice data may be produced by a person speaking to the terminal device; a sound pickup device of the terminal device, for example a microphone, collects the voice data input by the user.
- Once the speech recognition algorithm, that is, the target algorithm, is determined, the specific recognition process is not described in detail in the embodiments of the present invention. It can be understood that, for different dialects, a speech database of the corresponding dialect can be used together with the recognition algorithm.
- In this embodiment, the dialect type used in the area to which the mobile terminal belongs is determined from the geographic location of the mobile terminal, so that the corresponding recognition algorithm can be used, thereby improving the recognition accuracy of non-standard speech.
- Because the location information obtained by the terminal device is not necessarily its usual or real location (for example, the mobile terminal of a traveller), the embodiment of the present invention provides the following solution.
- Acquiring the geographic location of the mobile terminal includes:
- after the mobile terminal is started, acquiring a history record set, where the history record set is obtained by recording the location information of the mobile terminal each time it is started; and analyzing the history record set to obtain the geographic area to which the mobile terminal belongs as the geographic location.
- Using the history record set to determine the area to which the terminal device belongs avoids inaccurate judgments caused by the mobile terminal moving frequently between dialect areas.
- The history record set may be analyzed as follows: determine the geographic area in which the terminal device has stayed for the longest time; this area is the most likely real location of the mobile terminal, for example the place where the car is parked most often, or the place where the phone is located at night.
- The embodiment of the present invention further provides an implementation in which a database is established in advance to improve recognition speed and accuracy, as follows. Before the dialect type corresponding to the geographic location is determined, the method further includes:
- establishing a database of correspondences between geographic areas and dialect types, in which one geographic area corresponds to one or more dialect types.
- Finer-grained dialects can be recognized more accurately, and a single dialect type is itself divided into several more detailed branches, so establishing the corresponding database can further improve the accuracy of speech recognition.
- The embodiment of the present invention further provides an implementation in which a database of correspondences between dialect types and recognition algorithms is established in advance to improve recognition speed and accuracy, as follows. Before the recognition algorithm corresponding to the dialect type is acquired as the target algorithm, the method further includes:
- establishing a database of correspondences between dialect types and recognition algorithms, in which each dialect type corresponds to one recognition algorithm.
- because the dialects of the region in which a geographical location lies may be complex, more than one dialect type may be determined.
- the embodiment provides the following solution: acquiring the recognition algorithm corresponding to the above dialect type as the target algorithm includes:
- when more than one dialect type is determined, the recognition algorithm corresponding to each dialect type is acquired as a target algorithm.
- a recognition algorithm may be obtained for each of the different dialect types; several dialect types may correspond to the same recognition algorithm, so the number of recognition algorithms may be smaller than the number of dialect types.
- because several different recognition algorithms are used, several different recognition results may be produced.
- this embodiment provides the following solution: performing speech recognition on the voice data using the target algorithm to obtain the recognition result includes:
- speech recognition is performed on the voice data using each of the acquired target algorithms, and the recognition result with the highest probability of being correct is taken as the final recognition result.
- based on probability theory, each recognition result corresponds to a probability of being correct; since the result obtained by each recognition algorithm has such a probability, the result with the largest probability value can be taken as the final recognition result.
- the embodiment of the present invention further provides a scheme for further correcting the choice of recognition algorithm, as follows: after speech recognition is performed on the voice data using the target algorithm to obtain the recognition result, the method further includes:
- after the voice data is collected, two or more recognition results are obtained using one or more algorithms, and the result that the user confirms as more accurate then shows which algorithm is better;
- the solution is well suited to devices such as mobile phones that are relatively private or whose users have similar accents, and it can improve the recognition accuracy for non-standard speech while maintaining recognition speed.
- the mobile terminal includes corresponding hardware structures and/or software modules for performing various functions.
- the present invention can be implemented in hardware or in a combination of hardware and computer software, in conjunction with the units and algorithm steps of the examples described in the embodiments disclosed herein. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present invention.
- the embodiment of the present invention may divide the mobile terminal into functional units according to the foregoing method examples.
- each functional unit may be divided according to each function, or two or more functions may be integrated into one processing unit.
- the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit. It should be noted that the division of the unit in the embodiment of the present invention is schematic, and is only a logical function division, and the actual implementation may have another division manner.
- the embodiment of the present invention further provides another mobile terminal.
- the mobile terminal can be any terminal device, including a mobile phone, a tablet computer, a PDA (Personal Digital Assistant), a POS (Point of Sales) terminal, or an in-vehicle computer; a mobile phone is taken as the example below:
- FIG. 6 is a block diagram of part of the structure of a mobile phone related to the mobile terminal provided by an embodiment of the present invention.
- the mobile phone includes components such as a radio frequency (RF) circuit 910, a memory 920, an input unit 930, a display unit 940, a sensor 950, an audio circuit 960, a wireless fidelity (WiFi) module 970, a processor 980, and a power supply 990.
- RF radio frequency
- the RF circuit 910 can be used for receiving and transmitting information.
- RF circuit 910 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like.
- LNA Low Noise Amplifier
- RF circuitry 910 can also communicate with the network and other devices via wireless communication.
- the above wireless communication may use any communication standard or protocol, including but not limited to Global System of Mobile communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), e-mail, Short Messaging Service (SMS), and the like.
- GSM Global System of Mobile communication
- GPRS General Packet Radio Service
- CDMA Code Division Multiple Access
- WCDMA Wideband Code Division Multiple Access
- LTE Long Term Evolution
- SMS Short Messaging Service
- the memory 920 can be used to store software programs and modules, and the processor 980 executes various functional applications and data processing of the mobile phone by running software programs and modules stored in the memory 920.
- the memory 920 can mainly include a program storage area and a data storage area; the program storage area can store an operating system, an application required for at least one function, and the like, and the data storage area can store data created through use of the mobile phone (such as usage parameters of an application).
- the memory 920 may include a high-speed random access memory, and may also include a non-volatile memory such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
- the input unit 930 can be configured to receive input numeric or character information and to generate key signal inputs related to user settings and function controls of the handset.
- the input unit 930 may include a fingerprint sensor 931 and other input devices 932.
- the fingerprint sensor 931 can collect fingerprint data of the user.
- the input unit 930 may also include other input devices 932.
- the other input device 932 may include, but is not limited to, one or more of a touch screen, a physical button, a function key (such as a volume control button, a switch button, etc.), a trackball, a mouse, a joystick, and the like.
- the display unit 940 can be used to display information input by the user or information provided to the user as well as various menus of the mobile phone.
- the display unit 940 can include a display screen 941.
- the display screen 941 can be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like.
- although the fingerprint sensor 931 and the display screen 941 are shown as two separate components implementing the input and output functions of the mobile phone, in some embodiments the fingerprint sensor 931 can be integrated with the display screen 941 to implement the input and output functions of the mobile phone.
- the handset may also include at least one type of sensor 950, such as a light sensor, a motion sensor, and other sensors.
- the light sensor may include an ambient light sensor and a proximity sensor; the ambient light sensor may adjust the brightness of the display screen 941 according to the ambient light level, and the proximity sensor may turn off the display screen 941 and/or the backlight when the mobile phone is moved to the ear.
- as one kind of motion sensor, the accelerometer can detect the magnitude of acceleration in all directions (usually three axes) and, when stationary, the magnitude and direction of gravity.
- it can be used in applications that recognize the attitude of the mobile phone (such as switching between landscape and portrait, related games, and magnetometer attitude calibration) and in vibration-recognition functions (such as a pedometer or tap detection); other sensors that may also be configured in the mobile phone, such as gyroscopes, barometers, hygrometers, thermometers, and infrared sensors, are not described here.
- An audio circuit 960, a speaker 961, and a microphone 962 can provide an audio interface between the user and the handset.
- on the one hand, the audio circuit 960 can transmit the electrical signal converted from received audio data to the speaker 961, which converts it into a sound signal for playback; on the other hand, the microphone 962 converts a collected sound signal into an electrical signal, which the audio circuit 960 receives and converts into audio data; the audio data is then output to the processor 980 for processing and sent via the RF circuit 910 to, for example, another mobile phone, or output to the memory 920 for further processing.
- WiFi is a short-range wireless transmission technology
- the mobile phone can help users to send and receive emails, browse web pages, and access streaming media through the WiFi module 970, which provides users with wireless broadband Internet access.
- although FIG. 6 shows the WiFi module 970, it is understood that the module is not an essential part of the mobile phone and can be omitted as needed without changing the essence of the invention.
- the processor 980 is the control center of the handset. It connects the various parts of the entire handset through various interfaces and lines, and performs the handset's functions and processes data by running or executing software programs and/or modules stored in the memory 920 and invoking data stored in the memory 920, thereby monitoring the handset as a whole.
- the processor 980 may include one or more processing units; preferably, the processor 980 may integrate an application processor and a modem processor, where the application processor mainly processes an operating system, a user interface, an application, and the like.
- the modem processor primarily handles wireless communications. It will be appreciated that the above described modem processor may also not be integrated into the processor 980.
- the handset also includes a power source 990 (such as a battery) that supplies power to the various components.
- a power source 990 such as a battery
- the power source can be logically coupled to the processor 980 through a power management system, so that charging, discharging, and power consumption are managed through the power management system.
- the mobile phone may further include a camera, a Bluetooth module, and the like, and details are not described herein again.
- in the embodiment shown in FIG. 1, the method flow of each step can be implemented based on the structure of the mobile phone.
- in the embodiments shown in FIGS. 3 and 4, the function of each unit can be implemented based on the structure of the mobile phone.
- the embodiment of the present invention further provides a computer storage medium, wherein the computer storage medium stores a computer program for electronic data exchange, the computer program causing the computer to perform some or all of the steps of any of the methods described in the foregoing method embodiments.
- the above computer includes a mobile terminal.
- the embodiment of the present invention further provides a computer program product comprising a non-transitory computer-readable storage medium that stores a computer program, the computer program being operable to cause a computer to perform some or all of the steps of any of the methods described in the foregoing method embodiments.
- the computer program product can be a software installation package, and the computer includes a mobile terminal.
- the disclosed apparatus may be implemented in other ways.
- the device embodiments described above are merely illustrative.
- the division of the above units is only a division by logical function; in actual implementation there may be other ways of dividing them. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
- the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be electrical or otherwise.
- the units described above as separate components may or may not be physically separated.
- the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
- each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
- the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
- if the above integrated unit is implemented in the form of a software functional unit and sold or used as a standalone product, it can be stored in a computer-readable memory. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a memory and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods of the various embodiments of the present invention.
- the foregoing memory includes various media that can store program code, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
- ROM Read-Only Memory
- RAM Random Access Memory
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Telephonic Communication Services (AREA)
- Telephone Function (AREA)
Abstract
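A speech recognition method and related products are provided. The method includes: acquiring the geographical location of a mobile terminal and determining the dialect type corresponding to that geographical location (101); acquiring the recognition algorithm corresponding to the dialect type as the target algorithm (102); and, after voice data is collected, performing speech recognition on the voice data using the target algorithm to obtain a recognition result (103). The geographical location of the mobile device is used to determine which dialect types are spoken in the region to which the mobile terminal belongs, so that the corresponding recognition algorithm can be used to improve the accuracy of speech recognition, thereby improving the recognition accuracy for non-standard speech.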
Description
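The present invention claims priority to prior application No. 201710401786.1, entitled "Speech Recognition Method and Related Products" and filed on May 31, 2017, the content of which is incorporated herein by reference.
The present invention relates to the field of computer technology, and in particular to a speech recognition method and related products.
Communicating with machines by voice, and having the machine understand what is said, has long been a dream. The China IoT School-Enterprise Alliance (中国物联网校企联盟) vividly likens speech recognition to the auditory system of a machine. Speech recognition technology enables a machine to convert speech signals into corresponding text or commands through a process of recognition and understanding.
Speech recognition technology mainly comprises three aspects: feature extraction, pattern matching criteria, and model training. Speech recognition has also been widely applied in the Internet of Vehicles; for example, a destination can be set by voice alone for direct navigation, which is safe and convenient.
Speech recognition is an interdisciplinary field. Over the past two decades, speech recognition technology has made significant progress and has begun moving from the laboratory to the market. It is expected that within the next 10 years, speech recognition technology will enter fields such as industry, home appliances, communications, automotive electronics, medical care, home services, and consumer electronics. The fields involved in speech recognition technology include signal processing, pattern recognition, probability theory and information theory, the mechanisms of speech production and hearing, artificial intelligence, and so on.
Improving the accuracy and speed of speech recognition is the goal that those skilled in the art are working toward. At present, because people speak with accents, and some dialects differ greatly from one another, speech recognition faces considerable difficulty, and a solution is therefore needed.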
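Summary of the Invention
Embodiments of the present invention provide a speech recognition method and related products for improving the recognition accuracy for non-standard speech.
In a first aspect, an embodiment of the present invention provides a speech recognition method, including:
acquiring the geographical location of a mobile terminal, and determining the dialect type corresponding to the geographical location;
acquiring the recognition algorithm corresponding to the dialect type as the target algorithm;
after voice data is collected, performing speech recognition on the voice data using the target algorithm to obtain a recognition result.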
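In a possible implementation, acquiring the geographical location of the mobile terminal includes:
after the mobile terminal is started, acquiring a history record set, where the history record set is obtained by recording the location information of the mobile terminal each time the mobile terminal is started; and analyzing the history record set to obtain the geographical region to which the mobile terminal belongs as the geographical location.
In a possible implementation, before determining the dialect type corresponding to the geographical location, the method further includes:
establishing a database of the correspondence between geographical regions and dialect types, in which one geographical region corresponds to one or more dialect types.
In a possible implementation, before acquiring the recognition algorithm corresponding to the dialect type as the target algorithm, the method further includes:
establishing a database of the correspondence between dialect types and recognition algorithms, in which one dialect type corresponds to one recognition algorithm.
In a possible implementation, acquiring the recognition algorithm corresponding to the dialect type as the target algorithm includes:
when more than one dialect type is determined, acquiring the recognition algorithm corresponding to each dialect type as a target algorithm.
In a possible implementation, performing speech recognition on the voice data using the target algorithm to obtain the recognition result includes:
performing speech recognition on the voice data using each acquired target algorithm, and taking the recognition result with the highest probability of being correct as the final recognition result.
In a possible implementation, after performing speech recognition on the voice data using the target algorithm to obtain the recognition result, the method further includes:
sorting the recognition results in descending order of their probability of being correct;
outputting at least two recognition results whose probability of being correct is greater than or equal to a preset threshold;
receiving a selection instruction;
after the selection instruction designates the correct recognition result among the at least two recognition results, correcting the target algorithm to the recognition algorithm corresponding to that recognition result.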
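In a possible implementation, the method further includes: recording recognition results in a recognition result set, determining the class of recognition results with the highest accuracy in the recognition result set, and using the recognition algorithm corresponding to that class as the recognition algorithm for subsequent speech recognition.
In this embodiment, the speech recognition algorithm can be adjusted dynamically. On the one hand, it is adjusted according to the geographical location; more importantly, based on the recognition results obtained after the algorithm has been adjusted dynamically several times, a better-optimized recognition algorithm can be determined as the final recognition algorithm. For a personal device, this gives high accuracy and high recognition speed. Afterwards, the steps described above ("acquiring the geographical location of a mobile terminal, and determining the dialect type corresponding to the geographical location; acquiring the recognition algorithm corresponding to the dialect type as the target algorithm") no longer need to be executed.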
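The result-set idea above can be sketched as follows. This is only one possible reading, not the method the source specifies: it assumes each logged entry pairs an algorithm name with the probability that its result was correct, and it treats the "most accurate class" as the algorithm with the highest mean probability. The function name `most_accurate_algorithm` and the log format are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Iterable, Optional, Tuple


def most_accurate_algorithm(
    result_log: Iterable[Tuple[str, float]]
) -> Optional[str]:
    """Return the algorithm whose logged results have the highest mean probability.

    result_log holds (algorithm_name, probability) pairs; this format is an
    assumption made for illustration.
    """
    by_algorithm = defaultdict(list)
    for algorithm, probability in result_log:
        by_algorithm[algorithm].append(probability)
    if not by_algorithm:
        return None  # nothing recorded yet, so no algorithm can be preferred
    return max(by_algorithm, key=lambda name: mean(by_algorithm[name]))
```

Once such an algorithm has been chosen, a private device could skip the location lookup on later recognitions, which is the speed benefit the embodiment describes.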
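In a second aspect, an embodiment of the present invention further provides a mobile terminal, including a processing unit and an input/output unit;
the input/output unit is configured to receive input data and to output data;
the processing unit is configured to acquire the geographical location of the mobile terminal and determine the dialect type corresponding to the geographical location; acquire the recognition algorithm corresponding to the dialect type as the target algorithm; and, after voice data is collected, perform speech recognition on the voice data using the target algorithm to obtain a recognition result.
In a possible implementation, the processing unit is further configured to, after the mobile terminal is started, record the location information of the mobile terminal to obtain a history record set, and to analyze the history record set to obtain the geographical region to which the mobile terminal belongs as the geographical location.
In a third aspect, an embodiment of the present invention further provides a mobile terminal, including one or more processors, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the programs include instructions for performing the steps of any of the methods provided by the embodiments of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program for electronic data interchange, where the computer program causes a computer to execute the method according to any one of claims 1 to 6, the computer including a mobile terminal.
It can be seen that, in the embodiments of the present invention, the geographical location of the mobile device is used to determine which dialect types are spoken in the region to which the mobile terminal belongs, so that the corresponding recognition algorithm can be used to improve the accuracy of speech recognition, thereby improving the recognition accuracy for non-standard speech.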
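The drawings involved in the embodiments of the present invention are briefly described below.
FIG. 1 is a schematic flowchart of a method provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of an interface according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a speech recognition device according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a mobile terminal according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a mobile terminal according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a mobile terminal according to an embodiment of the present invention.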
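To enable those skilled in the art to better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
The terms "first", "second", and the like in the specification, the claims, and the above drawings are used to distinguish different objects, not to describe a particular order. In addition, the terms "include" and "have" and any variations of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that contains a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to the process, method, product, or device.
Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present invention. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor to a separate or alternative embodiment that is mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.
The mobile terminal involved in the embodiments of the present invention may include various mobile handheld devices, in-vehicle devices, wearable devices, computing devices, or other processing devices connected to a wireless modem, as well as various forms of User Equipment (UE), Mobile Station (MS), terminal device, and so on. For convenience of description, the devices mentioned above are collectively referred to as mobile terminals.
The accuracy of speech recognition has always been a major challenge. Various algorithms are currently used to improve it, but users of mobile terminals vary widely; the language itself is easy to distinguish, whereas regional dialects cause great difficulty.
In the embodiments of the present invention, non-standard speech is defined relative to standard speech. Standard speech may be the Mandarin (Putonghua) pronunciation of Chinese, or the pronunciation of certain dialects that are treated as standards. This is not repeated in the subsequent embodiments.
The embodiments of the present invention are described below with reference to the drawings.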
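Referring to FIG. 1, FIG. 1 is a schematic flowchart of a speech recognition method provided by an embodiment of the present invention, applied to a mobile terminal. As shown in the figure, the speech recognition method includes:
101: acquiring the geographical location of the mobile terminal, and determining the dialect type corresponding to the geographical location;
In this embodiment, the geographical location may be expressed by latitude and longitude, by administrative division, or by a preset division into dialect regions; it is not limited to latitude and longitude.
The dialect type is the category to which a dialect belongs. At present there are seven main types in China:
1. Northern dialects (short name: Beiyu);
2. Guangdong dialects (short name: Yue);
3. Jiangsu-Zhejiang dialects (short name: Wu);
4. Fujian dialects (short name: Min);
5. Hunan dialects (short name: Xiang);
6. Jiangxi dialects (short name: Gan);
7. Hakka dialects (short name: Keyu).
There are many other dialect types besides these, which are not listed one by one here.
102: acquiring the recognition algorithm corresponding to the dialect type as the target algorithm;
In the course of speech recognition research, researchers have designed and built speech databases for Chinese (including different dialects), English, and other languages according to the pronunciation characteristics of each language. Examples of such speech databases include the MIT Media lab Speech Dataset, Pitch and Voicing Estimates for Aurora 2 (pitch period and voicing estimates for the Aurora 2 speech corpus), Congressional speech data, Mandarin Speech Frame Data, and speech data for testing blind source separation algorithms.
Therefore, different dialect types can each have a corresponding recognition algorithm; in particular, different recognition algorithms can correspond to speech databases of the standard pronunciation of different dialect types. For a determined dialect type, recognition speed and accuracy can thus be improved in a targeted way.
103: after voice data is collected, performing speech recognition on the voice data using the target algorithm to obtain a recognition result.
Voice data may be collected when a person speaks to the terminal device and the voice pickup component of the terminal device, for example a microphone, captures the voice data input by the user. Once the speech recognition algorithm, namely the target algorithm, is determined, the specific recognition process is not described in the embodiments of the present invention. It is understood that, for different dialects, speech databases of those dialects can be used together with the recognition algorithms.
In this embodiment, the geographical location of the mobile device is used to determine which dialect types are spoken in the region to which the mobile terminal belongs, so that the corresponding recognition algorithm can be used to improve the accuracy of speech recognition, thereby improving the recognition accuracy for non-standard speech.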
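Steps 101 to 103 can be sketched as two table lookups followed by a call to the selected recognizer. The sketch below is illustrative only: the recognizer functions are stand-ins that return placeholder strings, the table names are hypothetical, and falling back to a standard recognizer when no dialect is known is an assumption that the source does not state.

```python
from typing import Callable, Dict

Recognizer = Callable[[bytes], str]


def recognize_standard(audio: bytes) -> str:
    # Stand-in for a real standard-speech recognizer (assumed fallback).
    return f"standard:{len(audio)} bytes"


def recognize_yue(audio: bytes) -> str:
    # Stand-in for a recognizer tuned to the Yue dialect type.
    return f"yue:{len(audio)} bytes"


def recognize_wu(audio: bytes) -> str:
    # Stand-in for a recognizer tuned to the Wu dialect type.
    return f"wu:{len(audio)} bytes"


# Hypothetical lookup tables; a real system would load these from a database.
REGION_TO_DIALECT: Dict[str, str] = {"Guangdong": "Yue", "Shanghai": "Wu"}
DIALECT_TO_ALGORITHM: Dict[str, Recognizer] = {
    "Yue": recognize_yue,
    "Wu": recognize_wu,
}


def target_algorithm(region: str) -> Recognizer:
    dialect = REGION_TO_DIALECT.get(region)  # step 101: region -> dialect type
    # Step 102: dialect type -> recognition algorithm, with an assumed fallback.
    return DIALECT_TO_ALGORITHM.get(dialect, recognize_standard)


def recognize(region: str, audio: bytes) -> str:
    # Step 103: run the target algorithm on the collected voice data.
    return target_algorithm(region)(audio)
```

Keeping the two mappings separate mirrors the two databases that the later embodiments describe, so either one can be refined without touching the other.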
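In an optional implementation, the geographical location acquired at a given moment is not necessarily the usual location of the terminal device, or one that truly reflects its dialect region (for example, the mobile terminal of a user on a business trip). The embodiment of the present invention therefore provides the following solution: acquiring the geographical location of the mobile terminal includes:
after the mobile terminal is started, acquiring a history record set, where the history record set is obtained by recording the location information of the mobile terminal each time the mobile terminal is started; and analyzing the history record set to obtain the geographical region to which the mobile terminal belongs as the geographical location.
In this embodiment, the history record set is used to determine the region to which the terminal device actually belongs, which avoids inaccurate judgments caused by the mobile terminal moving frequently between different dialect regions.
The history record set may be analyzed, for example, as follows: the geographical region in which the terminal device stays the longest is determined, and that region is taken as the most likely true geographical region of the mobile terminal. Examples include the location where a car is parked most often or the location where a phone most often is at night.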
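The "longest stay" analysis of the history record set can be illustrated with a short sketch. It assumes each history record is a pair of a region name and a dwell time in seconds; the record format and the function name `home_region` are hypothetical and are not taken from the source.

```python
from collections import defaultdict
from typing import Iterable, Optional, Tuple


def home_region(history: Iterable[Tuple[str, float]]) -> Optional[str]:
    """Return the region with the largest total dwell time, or None if empty.

    Each record is (region, dwell_seconds); this format is assumed for
    illustration only.
    """
    totals = defaultdict(float)
    for region, dwell_seconds in history:
        totals[region] += dwell_seconds
    if not totals:
        return None  # no history yet, so the region cannot be inferred
    return max(totals, key=totals.get)
```

Summing dwell time over all records means that one long business trip does not outweigh many ordinary nights spent in the home region.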
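In an optional implementation, the embodiment of the present invention further provides an implementation in which a database is established in advance to improve recognition speed and accuracy, as follows: before determining the dialect type corresponding to the geographical location, the method further includes:
establishing a database of the correspondence between geographical regions and dialect types, in which one geographical region corresponds to one or more dialect types.
In this embodiment, because a database of dialect types is established, finer-grained dialects can be recognized more accurately. For example:
The Wu dialect is also called the Jiangsu-Zhejiang dialect or the Jiangnan dialect. In the past it was represented by the Suzhou dialect; with the economic development of Shanghai, the number of speakers of the Shanghai dialect has grown steadily and more and more people understand it, so the Shanghai dialect is now the representative of Wu. Wu is spoken mainly in Jiangsu Province south of the Yangtze River and east of Zhenjiang, in a small part of Nantong, in Shanghai, and in most of Zhejiang. It can be divided into five branches:
(1) the Taihu branch, represented by the Shanghai dialect, spoken in Shanghai and in the Changzhou, Hangzhou, and Ningbo areas.
(2) the Taizhou branch, represented by the Linhai dialect.
(3) the Dong'ou branch, represented by the Wenzhou dialect.
(4) the Wuzhou branch, represented by the Jinhua dialect.
(5) the Liqu branch, represented by the Lishui dialect.
As can be seen, even a single dialect type is divided into several finer branches, so establishing the corresponding database can further improve the accuracy of speech recognition.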
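A region-to-dialect database with finer branches could be sketched with an in-memory SQLite table, as below. The schema, the function names, and the placeholder row `region_x` (which shows one region mapped to two dialect types) are assumptions made for illustration; only the Wu branch rows follow the examples given in the text.

```python
import sqlite3
from typing import List, Optional, Tuple


def build_database() -> sqlite3.Connection:
    """Create an in-memory table mapping regions to dialect types and branches."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE region_dialect (region TEXT, dialect TEXT, branch TEXT)"
    )
    rows = [
        ("Shanghai", "Wu", "Taihu"),
        ("Changzhou", "Wu", "Taihu"),
        ("Hangzhou", "Wu", "Taihu"),
        ("Ningbo", "Wu", "Taihu"),
        ("Wenzhou", "Wu", "Dong'ou"),
        # Placeholder region, not real data: one region, two dialect types.
        ("region_x", "Yue", None),
        ("region_x", "Hakka", None),
    ]
    conn.executemany("INSERT INTO region_dialect VALUES (?, ?, ?)", rows)
    return conn


def dialect_types(
    conn: sqlite3.Connection, region: str
) -> List[Tuple[str, Optional[str]]]:
    """Return every (dialect, branch) pair recorded for the region."""
    cursor = conn.execute(
        "SELECT dialect, branch FROM region_dialect "
        "WHERE region = ? ORDER BY dialect",
        (region,),
    )
    return cursor.fetchall()
```

Storing the branch as its own column lets the lookup stay coarse (dialect type only) or become finer (dialect type plus branch) without changing the table.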
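In an optional implementation, before acquiring the recognition algorithm corresponding to the dialect type as the target algorithm, the method further includes:
establishing a database of the correspondence between dialect types and recognition algorithms, in which one dialect type corresponds to one recognition algorithm.
In an optional implementation, because the dialects of the region in which a geographical location lies may be complex, more than one dialect type may be determined. This embodiment provides the following solution: acquiring the recognition algorithm corresponding to the dialect type as the target algorithm includes:
when more than one dialect type is determined, acquiring the recognition algorithm corresponding to each dialect type as a target algorithm.
In this embodiment, a recognition algorithm may be obtained for each of the different dialect types; several dialect types may correspond to the same recognition algorithm, so the number of recognition algorithms may be smaller than the number of dialect types.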
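Because several dialect types may share one recognition algorithm, the set of target algorithms can be smaller than the set of dialect types. A minimal sketch of that de-duplication follows; the dialect and algorithm names are hypothetical identifiers, and the mapping is assumed to contain an entry for every dialect type passed in.

```python
from typing import Dict, List, Sequence


def target_algorithms(
    dialects: Sequence[str], dialect_to_algorithm: Dict[str, str]
) -> List[str]:
    """Return the distinct algorithms for the given dialect types.

    Order of first appearance is preserved, and each algorithm is listed
    once even when several dialect types map to it.
    """
    return list(dict.fromkeys(dialect_to_algorithm[d] for d in dialects))
```

Running each distinct algorithm only once avoids repeating the same recognition pass for dialect types that share a model.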
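In an optional implementation, because several recognition algorithms are used, several different recognition results may be produced. This embodiment provides the following solution: performing speech recognition on the voice data using the target algorithm to obtain the recognition result includes:
performing speech recognition on the voice data using each acquired target algorithm, and taking the recognition result with the highest probability of being correct as the final recognition result.
Based on probability theory, each recognition result corresponds to a probability of being correct; since the result obtained by each recognition algorithm has such a probability, the result with the largest probability value can be taken as the final recognition result.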
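Selecting the final result then reduces to taking the maximum over the candidates' probabilities. The sketch below assumes each algorithm reports its output together with a probability of being correct; the `Result` type and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Result:
    algorithm: str      # name of the recognition algorithm that produced it
    text: str           # recognized text
    probability: float  # estimated probability that the text is correct


def best_result(results: Sequence[Result]) -> Result:
    """Return the candidate with the highest probability of being correct."""
    if not results:
        raise ValueError("at least one recognition result is required")
    return max(results, key=lambda r: r.probability)
```

When two candidates tie, `max` keeps the first one encountered, so the order in which the algorithms are run acts as the tie-breaker in this sketch.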
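In an optional implementation, the embodiment of the present invention further provides a scheme for further correcting the choice of recognition algorithm, as shown in FIG. 2 and described as follows: after speech recognition is performed on the voice data using the target algorithm to obtain the recognition result, the method further includes:
sorting the recognition results in descending order of their probability of being correct;
outputting at least two recognition results whose probability of being correct is greater than or equal to a preset threshold;
receiving a selection instruction;
after the selection instruction designates the correct recognition result among the at least two recognition results, correcting the target algorithm to the recognition algorithm corresponding to that recognition result.
FIG. 2 shows two recognition results. The two results can be displayed as text or played back as speech; if played back as speech, they can additionally be played in the corresponding dialect.
In this embodiment, after the voice data is collected, two or more recognition results are obtained using one or more algorithms, and the result that the user confirms as more accurate then shows which algorithm is better. The solution is well suited to devices such as mobile phones that are relatively private or whose users have similar accents, and it can improve the recognition accuracy for non-standard speech while maintaining recognition speed.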
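The correction scheme (sort, filter by threshold, let the user choose, then switch algorithms) can be sketched as follows. The `Result` tuple, the function names, and the rule of keeping the current algorithm when fewer than two candidates pass the threshold are assumptions made for illustration.

```python
from typing import List, NamedTuple, Sequence


class Result(NamedTuple):
    algorithm: str      # recognition algorithm that produced the result
    text: str           # recognized text shown or played to the user
    probability: float  # estimated probability that the text is correct


def candidates(results: Sequence[Result], threshold: float) -> List[Result]:
    """Sort by probability (descending) and keep results at or above threshold."""
    ranked = sorted(results, key=lambda r: r.probability, reverse=True)
    return [r for r in ranked if r.probability >= threshold]


def corrected_algorithm(
    results: Sequence[Result], threshold: float, chosen_index: int, current: str
) -> str:
    """Return the algorithm behind the result the user selected.

    chosen_index is the position, in the displayed candidate list, of the
    result the user marked as correct. If fewer than two candidates pass the
    threshold there is nothing to choose between, so the current algorithm
    is kept (an assumed rule).
    """
    shown = candidates(results, threshold)
    if len(shown) < 2:
        return current
    return shown[chosen_index].algorithm
```

The user's choice overrides the probability ranking here, which is the point of the scheme: the speaker knows which transcription was right even when the scores are close.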
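FIG. 3 shows a speech recognition device provided by an embodiment of the present invention. The speech recognition device may be a mobile terminal and specifically includes:
a location acquisition unit 301, configured to acquire the geographical location of the mobile terminal;
a type determining unit 302, configured to determine the dialect type corresponding to the geographical location;
an algorithm acquisition unit 303, configured to acquire the recognition algorithm corresponding to the dialect type as the target algorithm;
a recognition unit 304, configured to, after voice data is collected, perform speech recognition on the voice data using the target algorithm to obtain a recognition result.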
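In an optional implementation, because the dialects of the region in which a geographical location lies may be complex, more than one dialect type may be determined. This embodiment provides the following solution: the location acquisition unit 301 being configured to acquire the geographical location of the mobile terminal includes:
after the mobile terminal is started, acquiring a history record set, where the history record set is obtained by recording the location information of the mobile terminal each time the mobile terminal is started; and analyzing the history record set to obtain the geographical region to which the mobile terminal belongs as the geographical location.
In this embodiment, a recognition algorithm may be obtained for each of the different dialect types; several dialect types may correspond to the same recognition algorithm, so the number of recognition algorithms may be smaller than the number of dialect types.
In an optional implementation, because several recognition algorithms are used, several different recognition results may be produced. This embodiment provides the following solution:
The speech recognition device further includes a data establishing unit 305, configured to, before the dialect type corresponding to the geographical location is determined:
establish a database of the correspondence between geographical regions and dialect types, in which one geographical region corresponds to one or more dialect types.
In this embodiment, because a database of dialect types is established, finer-grained dialects can be recognized more accurately. A single dialect type is also divided into several finer branches, so establishing the corresponding database can further improve the accuracy of speech recognition.
The data establishing unit 305 is further configured to establish a database of the correspondence between dialect types and recognition algorithms, in which one dialect type corresponds to one recognition algorithm.
In an optional implementation, because the dialects of the region in which a geographical location lies may be complex, more than one dialect type may be determined. This embodiment provides the following solution: the algorithm acquisition unit 303 being configured to acquire the recognition algorithm corresponding to the dialect type as the target algorithm includes:
when more than one dialect type is determined, acquiring the recognition algorithm corresponding to each dialect type as a target algorithm.
The recognition unit 304 being configured to perform speech recognition on the voice data using the target algorithm to obtain the recognition result includes:
performing speech recognition on the voice data using each acquired target algorithm, and taking the recognition result with the highest probability of being correct as the final recognition result.
Based on probability theory, each recognition result corresponds to a probability of being correct; since the result obtained by each recognition algorithm has such a probability, the result with the largest probability value can be taken as the final recognition result.
In an optional implementation, the embodiment of the present invention further provides a scheme for further correcting the choice of recognition algorithm, as shown in FIG. 2 and described as follows: the speech recognition device further includes:
an algorithm correction unit 306, configured to, after speech recognition is performed on the voice data using the target algorithm to obtain the recognition result, sort the recognition results in descending order of their probability of being correct; output at least two recognition results whose probability of being correct is greater than or equal to a preset threshold; receive a selection instruction; and, after the selection instruction designates the correct recognition result among the at least two recognition results, correct the target algorithm to the recognition algorithm corresponding to that recognition result.
FIG. 2 shows two recognition results. The two results can be displayed as text or played back as speech; if played back as speech, they can additionally be played in the corresponding dialect.
In this embodiment, after the voice data is collected, two or more recognition results are obtained using one or more algorithms, and the result that the user confirms as more accurate then shows which algorithm is better. The solution is well suited to devices such as mobile phones that are relatively private or whose users have similar accents, and it can improve the recognition accuracy for non-standard speech while maintaining recognition speed.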
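As shown in FIG. 4, an embodiment of the present invention further provides a mobile terminal, including a processing unit 402 and an input/output unit 403. The processing unit 402 is configured to control and manage the actions of the terminal device; for example, the processing unit 402 is configured to support the terminal device in performing steps 101 to 103 in FIG. 1 or other processes of the techniques described herein. The input/output unit 403 is configured to support data input and output. The terminal device may further include a storage unit 401 for storing program code and data of the terminal device.
The processing unit 402 may be a processor or a controller, for example a Central Processing Unit (CPU), a general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination of these. It may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with the disclosure of the present invention. The processor may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor. The input/output unit 403 may be a microphone, an earpiece, a loudspeaker, or the like, and the storage unit 401 may be a memory.
The input/output unit 403 is configured to receive input data and to output data;
the processing unit 402 is configured to acquire the geographical location of the mobile terminal and determine the dialect type corresponding to the geographical location; acquire the recognition algorithm corresponding to the dialect type as the target algorithm; and, after voice data is collected, perform speech recognition on the voice data using the target algorithm to obtain a recognition result.
In an optional implementation, the processing unit 402 is further configured to, after the mobile terminal is started, acquire a history record set, where the history record set is obtained by recording the location information of the mobile terminal each time the mobile terminal is started, and to analyze the history record set to obtain the geographical region to which the mobile terminal belongs as the geographical location.
For other processes that the processing unit 402 is further configured to perform, refer to the foregoing method embodiments; they are not repeated here.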
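Referring to FIG. 5, FIG. 5 is a schematic structural diagram of a mobile terminal provided by an embodiment of the present invention. As shown in the figure, the mobile terminal includes one or more processors, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the programs include instructions for performing the following steps:
acquiring the geographical location of the mobile terminal, and determining the dialect type corresponding to the geographical location; acquiring the recognition algorithm corresponding to the dialect type as the target algorithm; and, after voice data is collected, performing speech recognition on the voice data using the target algorithm to obtain a recognition result.
In this embodiment, the geographical location may be expressed by latitude and longitude, by administrative division, or by a preset division into dialect regions; it is not limited to latitude and longitude. The dialect type is the category to which a dialect belongs. At present there are seven main types in China.
Therefore, different dialect types can each have a corresponding recognition algorithm; in particular, different recognition algorithms can correspond to speech databases of the standard pronunciation of different dialect types. For a determined dialect type, recognition speed and accuracy can thus be improved in a targeted way.
Voice data may be collected when a person speaks to the terminal device and the voice pickup component of the terminal device, for example a microphone, captures the voice data input by the user. Once the speech recognition algorithm, namely the target algorithm, is determined, the specific recognition process is not described in the embodiments of the present invention. It is understood that, for different dialects, speech databases of those dialects can be used together with the recognition algorithms.
In this embodiment, the geographical location of the mobile device is used to determine which dialect types are spoken in the region to which the mobile terminal belongs, so that the corresponding recognition algorithm can be used to improve the accuracy of speech recognition, thereby improving the recognition accuracy for non-standard speech.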
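In an optional implementation, the geographical location acquired at a given moment is not necessarily the usual location of the terminal device, or one that truly reflects its dialect region (for example, the mobile terminal of a user on a business trip). The embodiment of the present invention therefore provides the following solution: acquiring the geographical location of the mobile terminal includes:
after the mobile terminal is started, acquiring a history record set, where the history record set is obtained by recording the location information of the mobile terminal each time the mobile terminal is started; and analyzing the history record set to obtain the geographical region to which the mobile terminal belongs as the geographical location.
In this embodiment, the history record set is used to determine the region to which the terminal device actually belongs, which avoids inaccurate judgments caused by the mobile terminal moving frequently between different dialect regions.
The history record set may be analyzed, for example, as follows: the geographical region in which the terminal device stays the longest is determined, and that region is taken as the most likely true geographical region of the mobile terminal. Examples include the location where a car is parked most often or the location where a phone most often is at night.
In an optional implementation, the embodiment of the present invention further provides an implementation in which a database is established in advance to improve recognition speed and accuracy, as follows: before determining the dialect type corresponding to the geographical location, the steps further include:
establishing a database of the correspondence between geographical regions and dialect types, in which one geographical region corresponds to one or more dialect types.
In this embodiment, because a database of dialect types is established, finer-grained dialects can be recognized more accurately. A single dialect type is also divided into several finer branches, so establishing the corresponding database can further improve the accuracy of speech recognition.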
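In an optional implementation, the embodiment of the present invention further provides an implementation in which a database of the correspondence between dialect types and recognition algorithms is established in advance to improve recognition speed and accuracy, as follows: before acquiring the recognition algorithm corresponding to the dialect type as the target algorithm, the steps further include:
establishing a database of the correspondence between dialect types and recognition algorithms, in which one dialect type corresponds to one recognition algorithm.
In an optional implementation, because the dialects of the region in which a geographical location lies may be complex, more than one dialect type may be determined. This embodiment provides the following solution: acquiring the recognition algorithm corresponding to the dialect type as the target algorithm includes:
when more than one dialect type is determined, acquiring the recognition algorithm corresponding to each dialect type as a target algorithm.
In this embodiment, a recognition algorithm may be obtained for each of the different dialect types; several dialect types may correspond to the same recognition algorithm, so the number of recognition algorithms may be smaller than the number of dialect types.
In an optional implementation, because several recognition algorithms are used, several different recognition results may be produced. This embodiment provides the following solution: performing speech recognition on the voice data using the target algorithm to obtain the recognition result includes:
performing speech recognition on the voice data using each acquired target algorithm, and taking the recognition result with the highest probability of being correct as the final recognition result.
Based on probability theory, each recognition result corresponds to a probability of being correct; since the result obtained by each recognition algorithm has such a probability, the result with the largest probability value can be taken as the final recognition result.
In an optional implementation, the embodiment of the present invention further provides a scheme for further correcting the choice of recognition algorithm, as follows: after speech recognition is performed on the voice data using the target algorithm to obtain the recognition result, the steps further include:
sorting the recognition results in descending order of their probability of being correct; outputting at least two recognition results whose probability of being correct is greater than or equal to a preset threshold; receiving a selection instruction; and, after the selection instruction designates the correct recognition result among the at least two recognition results, correcting the target algorithm to the recognition algorithm corresponding to that recognition result.
In this embodiment, after the voice data is collected, two or more recognition results are obtained using one or more algorithms, and the result that the user confirms as more accurate then shows which algorithm is better. The solution is well suited to devices such as mobile phones that are relatively private or whose users have similar accents, and it can improve the recognition accuracy for non-standard speech while maintaining recognition speed.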
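The foregoing describes the solutions of the embodiments of the present invention mainly from the perspective of the method-side execution process. It is understood that, to implement the above functions, the mobile terminal includes hardware structures and/or software modules corresponding to each function. Those skilled in the art will readily appreciate that, in conjunction with the units and algorithm steps of the examples described in the embodiments disclosed herein, the present invention can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present invention.
The embodiments of the present invention may divide the mobile terminal into functional units according to the foregoing method examples. For example, each functional unit may correspond to one function, or two or more functions may be integrated into one processing unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit. It should be noted that the division into units in the embodiments of the present invention is schematic and is only a division by logical function; other divisions are possible in actual implementation.
An embodiment of the present invention further provides another mobile terminal, as shown in FIG. 6. For ease of description, only the parts related to the embodiment of the present invention are shown; for specific technical details that are not disclosed, refer to the method part of the embodiments of the present invention. The mobile terminal may be any terminal device, including a mobile phone, a tablet computer, a PDA (Personal Digital Assistant), a POS (Point of Sales) terminal, or an in-vehicle computer; a mobile phone is taken as the example below:
FIG. 6 is a block diagram of part of the structure of a mobile phone related to the mobile terminal provided by an embodiment of the present invention. Referring to FIG. 6, the mobile phone includes components such as a Radio Frequency (RF) circuit 910, a memory 920, an input unit 930, a display unit 940, a sensor 950, an audio circuit 960, a Wireless Fidelity (WiFi) module 970, a processor 980, and a power supply 990. Those skilled in the art will understand that the structure shown in FIG. 6 does not limit the mobile phone, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
The components of the mobile phone are described in detail below with reference to FIG. 6: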
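The RF circuit 910 can be used to receive and send information. Generally, the RF circuit 910 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like. In addition, the RF circuit 910 can communicate with networks and other devices via wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to Global System of Mobile communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), e-mail, Short Messaging Service (SMS), and the like.
The memory 920 can be used to store software programs and modules; the processor 980 executes the various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 920. The memory 920 can mainly include a program storage area and a data storage area; the program storage area can store an operating system, an application required for at least one function, and the like, and the data storage area can store data created through use of the mobile phone (such as usage parameters of an application). In addition, the memory 920 may include a high-speed random access memory, and may also include a non-volatile memory such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The input unit 930 can be used to receive input numeric or character information and to generate key signal input related to user settings and function control of the mobile phone. Specifically, the input unit 930 may include a fingerprint sensor 931 and other input devices 932. The fingerprint sensor 931 can collect the fingerprint data of a user placed on it. In addition to the fingerprint sensor 931, the input unit 930 may include other input devices 932. Specifically, the other input devices 932 may include, but are not limited to, one or more of a touch screen, physical buttons, function keys (such as volume control keys and a power key), a trackball, a mouse, a joystick, and the like.
The display unit 940 can be used to display information input by the user or provided to the user, as well as the various menus of the mobile phone. The display unit 940 may include a display screen 941; optionally, the display screen 941 may be configured as a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED) display, or the like. Although in FIG. 6 the fingerprint sensor 931 and the display screen 941 are two separate components implementing the input and output functions of the mobile phone, in some embodiments the fingerprint sensor 931 can be integrated with the display screen 941 to implement the input and output functions of the mobile phone.
The mobile phone may also include at least one type of sensor 950, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor; the ambient light sensor may adjust the brightness of the display screen 941 according to the ambient light level, and the proximity sensor may turn off the display screen 941 and/or the backlight when the mobile phone is moved to the ear. As one kind of motion sensor, the accelerometer can detect the magnitude of acceleration in all directions (usually three axes) and, when stationary, the magnitude and direction of gravity; it can be used in applications that recognize the attitude of the mobile phone (such as switching between landscape and portrait, related games, and magnetometer attitude calibration) and in vibration-recognition functions (such as a pedometer or tap detection). Other sensors that may also be configured in the mobile phone, such as gyroscopes, barometers, hygrometers, thermometers, and infrared sensors, are not described here.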
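The audio circuit 960, a speaker 961, and a microphone 962 can provide an audio interface between the user and the mobile phone. On the one hand, the audio circuit 960 can transmit the electrical signal converted from received audio data to the speaker 961, which converts it into a sound signal for playback; on the other hand, the microphone 962 converts a collected sound signal into an electrical signal, which the audio circuit 960 receives and converts into audio data; the audio data is then output to the processor 980 for processing and sent via the RF circuit 910 to, for example, another mobile phone, or output to the memory 920 for further processing.
WiFi is a short-range wireless transmission technology. Through the WiFi module 970, the mobile phone can help the user send and receive e-mail, browse web pages, access streaming media, and so on; it provides the user with wireless broadband Internet access. Although FIG. 6 shows the WiFi module 970, it is understood that the module is not an essential part of the mobile phone and can be omitted as needed without changing the essence of the invention.
The processor 980 is the control center of the mobile phone. It connects the various parts of the entire mobile phone through various interfaces and lines, and performs the mobile phone's functions and processes data by running or executing software programs and/or modules stored in the memory 920 and invoking data stored in the memory 920, thereby monitoring the mobile phone as a whole. Optionally, the processor 980 may include one or more processing units; preferably, the processor 980 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, applications, and the like, and the modem processor mainly handles wireless communication. It is understood that the modem processor may also not be integrated into the processor 980.
The mobile phone also includes a power supply 990 (such as a battery) that supplies power to the components. Preferably, the power supply can be logically connected to the processor 980 through a power management system, so that charging, discharging, and power consumption are managed through the power management system.
Although not shown, the mobile phone may further include a camera, a Bluetooth module, and the like, which are not described here.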
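In the embodiment shown in FIG. 1, the method flow of each step can be implemented based on the structure of the mobile phone.
In the embodiments shown in FIGS. 3 and 4, the function of each unit can be implemented based on the structure of the mobile phone.
An embodiment of the present invention further provides a computer storage medium that stores a computer program for electronic data interchange, where the computer program causes a computer to perform some or all of the steps of any of the methods described in the foregoing method embodiments, the computer including a mobile terminal.
An embodiment of the present invention further provides a computer program product comprising a non-transitory computer-readable storage medium that stores a computer program, the computer program being operable to cause a computer to perform some or all of the steps of any of the methods described in the foregoing method embodiments. The computer program product may be a software installation package, and the computer includes a mobile terminal.
It should be noted that, for brevity, the foregoing method embodiments are each described as a series of combined actions. Those skilled in the art should understand, however, that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, refer to the related descriptions of other embodiments.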
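In the several embodiments provided in this application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division of the above units is only a division by logical function, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, apparatus, or unit, and may be electrical or take another form.
The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as a standalone product, it can be stored in a computer-readable memory. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a memory and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods of the various embodiments of the present invention. The foregoing memory includes various media that can store program code, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
A person of ordinary skill in the art can understand that all or part of the steps of the methods of the above embodiments can be completed by a program instructing related hardware. The program can be stored in a computer-readable memory, which may include a flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or the like.
The embodiments of the present invention are described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is only intended to help in understanding the method of the present invention and its core idea. At the same time, a person of ordinary skill in the art may, following the idea of the present invention, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.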
Claims (20)
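- A speech recognition method, characterized by comprising: acquiring the geographical location of a mobile terminal, and determining the dialect type corresponding to the geographical location; acquiring the recognition algorithm corresponding to the dialect type as the target algorithm; and, after voice data is collected, performing speech recognition on the voice data using the target algorithm to obtain a recognition result.
- The method according to claim 1, characterized in that acquiring the geographical location of the mobile terminal comprises: after the mobile terminal is started, acquiring a history record set, where the history record set is obtained by recording the location information of the mobile terminal each time the mobile terminal is started; and analyzing the history record set to obtain the geographical region to which the mobile terminal belongs as the geographical location.
- The method according to claim 2, characterized in that, before determining the dialect type corresponding to the geographical location, the method further comprises: establishing a database of the correspondence between geographical regions and dialect types, in which one geographical region corresponds to one or more dialect types.
- The method according to claim 3, characterized in that, before acquiring the recognition algorithm corresponding to the dialect type as the target algorithm, the method further comprises: establishing a database of the correspondence between dialect types and recognition algorithms, in which one dialect type corresponds to one recognition algorithm.
- The method according to claim 4, characterized in that acquiring the recognition algorithm corresponding to the dialect type as the target algorithm comprises: when more than one dialect type is determined, acquiring the recognition algorithm corresponding to each dialect type as a target algorithm.
- The method according to claim 5, characterized in that performing speech recognition on the voice data using the target algorithm to obtain the recognition result comprises: performing speech recognition on the voice data using each acquired target algorithm, and taking the recognition result with the highest probability of being correct as the final recognition result.
- The method according to claim 5, characterized in that, after performing speech recognition on the voice data using the target algorithm to obtain the recognition result, the method further comprises: sorting the recognition results in descending order of their probability of being correct; outputting at least two recognition results whose probability of being correct is greater than or equal to a preset threshold; receiving a selection instruction; and, after the selection instruction designates the correct recognition result among the at least two recognition results, correcting the target algorithm to the recognition algorithm corresponding to that recognition result.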
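- A mobile terminal, characterized by comprising one or more processors, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the programs include instructions for performing the following operations: acquiring the geographical location of the mobile terminal, and determining the dialect type corresponding to the geographical location; acquiring the recognition algorithm corresponding to the dialect type as the target algorithm; and, after voice data is collected, performing speech recognition on the voice data using the target algorithm to obtain a recognition result.
- The mobile terminal according to claim 8, characterized in that, in acquiring the geographical location of the mobile terminal, the instructions in the programs are specifically used to perform the following operations: after the mobile terminal is started, acquiring a history record set, where the history record set is obtained by recording the location information of the mobile terminal each time the mobile terminal is started; and analyzing the history record set to obtain the geographical region to which the mobile terminal belongs as the geographical location.
- The mobile terminal according to claim 9, characterized in that the programs further include instructions for performing the following operation: establishing a database of the correspondence between geographical regions and dialect types, in which one geographical region corresponds to one or more dialect types.
- The mobile terminal according to claim 10, characterized in that the programs further include instructions for performing the following operation: establishing a database of the correspondence between dialect types and recognition algorithms, in which one dialect type corresponds to one recognition algorithm.
- The mobile terminal according to claim 11, characterized in that, in acquiring the recognition algorithm corresponding to the dialect type as the target algorithm, the instructions in the programs are specifically used to perform the following operation: when more than one dialect type is determined, acquiring the recognition algorithm corresponding to each dialect type as a target algorithm.
- The mobile terminal according to claim 12, characterized in that, in performing speech recognition on the voice data using the target algorithm to obtain the recognition result, the instructions in the programs are specifically used to perform the following operation: performing speech recognition on the voice data using each acquired target algorithm, and taking the recognition result with the highest probability of being correct as the final recognition result.
- The mobile terminal according to claim 12, characterized in that the programs further include instructions for performing the following operations: sorting the recognition results in descending order of their probability of being correct; outputting at least two recognition results whose probability of being correct is greater than or equal to a preset threshold; receiving a selection instruction; and, after the selection instruction designates the correct recognition result among the at least two recognition results, correcting the target algorithm to the recognition algorithm corresponding to that recognition result.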
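- A computer-readable storage medium, characterized in that it stores a computer program for electronic data interchange, where the computer program causes a computer to perform the following operations: acquiring the geographical location of a mobile terminal, and determining the dialect type corresponding to the geographical location; acquiring the recognition algorithm corresponding to the dialect type as the target algorithm; and, after voice data is collected, performing speech recognition on the voice data using the target algorithm to obtain a recognition result.
- The computer-readable storage medium according to claim 15, characterized in that, when acquiring the geographical location of the mobile terminal, the computer specifically performs the following operations: after the mobile terminal is started, acquiring a history record set, where the history record set is obtained by recording the location information of the mobile terminal each time the mobile terminal is started; and analyzing the history record set to obtain the geographical region to which the mobile terminal belongs as the geographical location.
- The computer-readable storage medium according to claim 16, characterized in that, before determining the dialect type corresponding to the geographical location, the computer further performs the following operation: establishing a database of the correspondence between geographical regions and dialect types, in which one geographical region corresponds to one or more dialect types.
- The computer-readable storage medium according to claim 17, characterized in that, before acquiring the recognition algorithm corresponding to the dialect type as the target algorithm, the computer further performs the following operation: when more than one dialect type is determined, acquiring the recognition algorithm corresponding to each dialect type as a target algorithm.
- The computer-readable storage medium according to claim 18, characterized in that, when performing speech recognition on the voice data using the target algorithm to obtain the recognition result, the computer specifically performs the following operation: performing speech recognition on the voice data using each acquired target algorithm, and taking the recognition result with the highest probability of being correct as the final recognition result.
- The computer-readable storage medium according to claim 18, characterized in that, after performing speech recognition on the voice data using the target algorithm to obtain the recognition result, the computer further performs the following operations: sorting the recognition results in descending order of their probability of being correct; outputting at least two recognition results whose probability of being correct is greater than or equal to a preset threshold; receiving a selection instruction; and, after the selection instruction designates the correct recognition result among the at least two recognition results, correcting the target algorithm to the recognition algorithm corresponding to that recognition result.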
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710401786.1 | 2017-05-31 | | |
CN201710401786.1A CN107274885B (zh) | 2017-05-31 | 2017-05-31 | Speech recognition method and related products |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018219105A1 (zh) | 2018-12-06 |
Family
ID=60064910
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2018/086205 WO2018219105A1 (zh) | 2017-05-31 | 2018-05-09 | Speech recognition method and related products |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN107274885B (zh) |
WO (1) | WO2018219105A1 (zh) |
Families Citing this family (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
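CN107274885B (zh) * | 2017-05-31 | 2020-05-26 | Oppo广东移动通信有限公司 | Speech recognition method and related products |
CN108417203A (zh) * | 2018-01-31 | 2018-08-17 | 广东聚晨知识产权代理有限公司 | Human speech recognition and transmission method and system |
CN108346426B (zh) * | 2018-02-01 | 2020-12-08 | 威盛电子(深圳)有限公司 | Speech recognition device and speech recognition method |
CN110797014B (zh) * | 2018-07-17 | 2024-06-07 | 中兴通讯股份有限公司 | Speech recognition method, device, and computer storage medium |
CN110909134A (zh) * | 2018-09-18 | 2020-03-24 | 奇酷互联网络科技(深圳)有限公司 | Voice conversion method, mobile terminal, and readable storage medium |
CN109377990A (zh) * | 2018-09-30 | 2019-02-22 | 联想(北京)有限公司 | Information processing method and electronic device |
CN109410935A (zh) * | 2018-11-01 | 2019-03-01 | 平安科技(深圳)有限公司 | Destination search method and device based on speech recognition |
CN109493848A (zh) * | 2018-12-17 | 2019-03-19 | 深圳市沃特沃德股份有限公司 | Speech recognition method, system, and electronic device |
CN111951808B (zh) * | 2019-04-30 | 2023-09-08 | 深圳市优必选科技有限公司 | Voice interaction method, device, terminal equipment, and medium |
CN112116909A (zh) * | 2019-06-20 | 2020-12-22 | 杭州海康威视数字技术股份有限公司 | Speech recognition method, device, and system |
CN110491368B (zh) * | 2019-07-23 | 2023-06-16 | 平安科技(深圳)有限公司 | Speech recognition method, device, computer equipment, and storage medium based on dialect background |
CN110570837B (zh) * | 2019-08-28 | 2022-03-11 | 卓尔智联(武汉)研究院有限公司 | Voice interaction method, device, and storage medium |
CN111142999A (zh) * | 2019-12-24 | 2020-05-12 | 深圳市元征科技股份有限公司 | Device language selection method, system, device, and computer storage medium |
CN111291154B (zh) * | 2020-01-17 | 2022-08-23 | 厦门快商通科技股份有限公司 | Dialect sample data extraction method, device, equipment, and storage medium |
CN112749543B (zh) * | 2020-12-22 | 2022-08-05 | 浙江吉利控股集团有限公司 | Matching method, device, equipment, and storage medium for an information parsing process |
CN114165819A (zh) * | 2021-11-26 | 2022-03-11 | 珠海格力电器股份有限公司 | Range hood and its control method, module, and computer-readable medium |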
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2005331608A (ja) * | 2004-05-18 | 2005-12-02 | Matsushita Electric Ind Co Ltd | Information processing device and information processing method |
CN103037117A (zh) * | 2011-09-29 | 2013-04-10 | 中国电信股份有限公司 | Speech recognition method, system, and voice access platform |
CN103903611A (zh) * | 2012-12-24 | 2014-07-02 | 联想(北京)有限公司 | Method and device for recognizing speech information |
CN104575493A (zh) * | 2010-05-26 | 2015-04-29 | 谷歌公司 | Acoustic model adaptation using geographic information |
CN105225665A (zh) * | 2015-10-15 | 2016-01-06 | 桂林电子科技大学 | Speech recognition method and speech recognition device |
CN105931643A (zh) * | 2016-06-30 | 2016-09-07 | 北京海尔广科数字技术有限公司 | Speech recognition method and device |
CN106057204A (zh) * | 2016-05-05 | 2016-10-26 | 刘世超 | Method and system for online call service |
CN106228974A (zh) * | 2016-08-19 | 2016-12-14 | 镇江惠通电子有限公司 | Control method, device, and system based on speech recognition |
CN107274885A (zh) * | 2017-05-31 | 2017-10-20 | 广东欧珀移动通信有限公司 | Speech recognition method and related products |
CN107316637A (zh) * | 2017-05-31 | 2017-11-03 | 广东欧珀移动通信有限公司 | Speech recognition method and related products |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106128462A (zh) * | 2016-06-21 | 2016-11-16 | 东莞酷派软件技术有限公司 | Speech recognition method and system |
- 2017-05-31: CN application CN201710401786.1A, patent CN107274885B (zh); status: not active, Expired - Fee Related
- 2018-05-09: WO application PCT/CN2018/086205, publication WO2018219105A1 (zh); status: active, Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN107274885B (zh) | 2020-05-26 |
CN107274885A (zh) | 2017-10-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
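WO2018219105A1 (zh) | | Speech recognition method and related products |
CN111261144B (zh) | | Speech recognition method, device, terminal, and storage medium |
JP5996783B2 (ja) | | Method and terminal for updating a voiceprint feature model |
JP2021516786A (ja) | | Method, device, and computer program for separating the voices of multiple people |
CN108538320B (zh) | | Recording control method and device, readable storage medium, and terminal |
WO2018072543A1 (zh) | | Model generation method, speech synthesis method, and device |
US11274932B2 (en) | | Navigation method, navigation device, and storage medium |
CN107170454A (zh) | | Speech recognition method and related products |
CN106528545B (zh) | | Method and device for processing speech information |
CN111798821B (zh) | | Voice conversion method, device, readable storage medium, and electronic device |
CN107316637A (zh) | | Speech recognition method and related products |
CN111522592A (zh) | | Artificial-intelligence-based method and device for waking up a smart terminal |
CN110097895B (zh) | | Pure music detection method, device, and storage medium |
WO2018214760A1 (zh) | | Focusing method and related products |
CN106791010B (zh) | | Information processing method, device, and mobile terminal |
CN112948763B (zh) | | Parcel volume prediction method, device, electronic device, and storage medium |
CN112242143B (zh) | | Voice interaction method, device, terminal equipment, and storage medium |
WO2020102979A1 (zh) | | Speech information processing method, device, storage medium, and electronic device |
CN116597828B (zh) | | Model determination method, model application method, and related devices |
CN117012202B (zh) | | Voice channel identification method, device, storage medium, and electronic device |
CN117731288B (zh) | | AI psychological counseling method and system |
CN115995014A (zh) | | Speaker unit detection method, audio detection method, and related devices |
CN106847280A (zh) | | Audio information processing method, smart terminal, and voice control terminal |
CN118658464A (zh) | | Acoustic scene classification model generation method, acoustic scene classification method, device, storage medium, and electronic device |
CN117395339A (zh) | | Contact recommendation method, device, electronic device, and storage medium |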
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 18810404; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | EP: PCT application non-entry in European phase | Ref document number: 18810404; Country of ref document: EP; Kind code of ref document: A1 |