
WO2019024692A1 - Voice input method, apparatus, computer device and storage medium - Google Patents

Voice input method, apparatus, computer device and storage medium

Info

Publication number
WO2019024692A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
text information
information
input location
target input
Prior art date
Application number
PCT/CN2018/096412
Other languages
English (en)
French (fr)
Inventor
桂浩群
Original Assignee
深圳壹账通智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳壹账通智能科技有限公司 filed Critical 深圳壹账通智能科技有限公司
Publication of WO2019024692A1 publication Critical patent/WO2019024692A1/zh

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/26: Speech to text systems
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/28: Constructional details of speech recognition systems
    • G10L 15/30: Distributed recognition, e.g. in client-server systems, for mobile phones or network applications

Definitions

  • the present application relates to a voice input method, apparatus, computer device, and storage medium.
  • With the development of Internet and terminal technologies, users can socialize, shop, and manage finances through terminal pages such as web pages or application pages.
  • A terminal page generally provides an input box; the user enters the information to be submitted in the input box and submits it through an operation button on the page.
  • Typically, the user enters information on a terminal page by clicking in the blank area of the input box to place the cursor, invoking an input method application, and typing the required characters on a physical keyboard or on the input method's virtual keyboard.
  • This type of input is cumbersome, inefficient, and prone to errors.
  • For example, when transferring money or managing finances through online or mobile banking, the user usually needs to enter a bank card number; because the card number contains many digits, it must be typed while being read, which easily leads to input errors.
  • Although some input methods provide a voice recognition function, the user must manually select that function in the input method's interface after invoking the input method application, and must also manually select the input position, which makes the input process even more cumbersome. Therefore, how to simplify the input process and improve input accuracy has become a technical problem that needs to be solved.
  • a voice input method, apparatus, computer device, and storage medium are provided.
  • a voice input method includes: collecting voice information according to a preset voice collection instruction; recognizing the voice information according to a preset speech recognition learning algorithm to obtain recognized text information; determining a target input location for the text information; and entering at least part of the content of the text information at the target input location.
  • a voice input device comprising:
  • An acquisition module configured to collect voice information according to a preset voice collection instruction
  • An identification module configured to identify the voice information according to a preset voice recognition learning algorithm, and obtain the recognized text information
  • a determining module configured to determine a target input location of the text information
  • an input module configured to input at least part of the content of the text information at the target input location.
  • the text information includes an indication field and a field to be input
  • the determining module is configured to determine an input location associated with the indication field as a target input location
  • the input module is configured to input the to-be-entered field into the target input location.
  • a computer device includes a memory and one or more processors, the memory storing computer readable instructions that, when executed by the one or more processors, cause the one or more processors to perform the following steps: collecting voice information according to a preset voice collection instruction; recognizing the voice information according to a preset speech recognition learning algorithm to obtain recognized text information; determining a target input location for the text information; and entering at least part of the content of the text information at the target input location.
  • One or more non-transitory computer readable storage media store computer readable instructions that, when executed by one or more processors, cause the one or more processors to perform the same steps: collecting voice information according to a preset voice collection instruction; recognizing the voice information according to a preset speech recognition learning algorithm to obtain recognized text information; determining a target input location for the text information; and entering at least part of the content of the text information at the target input location.
  • FIG. 1 is an application environment diagram of a voice input method in accordance with one or more embodiments
  • FIG. 2 is an internal block diagram of a computer device in accordance with one or more embodiments
  • FIG. 3 is a schematic flow chart of a voice input method according to one or more embodiments.
  • FIG. 4 is a schematic flow chart of a voice recognition method according to one or more embodiments.
  • FIG. 5 is a block diagram of a voice input device in accordance with one or more embodiments.
  • FIG. 6 is a block diagram of a speech recognition device in accordance with one or more embodiments.
  • the voice input method provided in the embodiment of the present application can be applied to an application environment as shown in FIG. 1.
  • a communication connection is established between computer device 10 and server 20.
  • a speech recognition database is stored on the computer device 10 or the server 20, and the speech recognition database contains speech samples.
  • the computer device 10 stores a voice collection instruction, and when the voice collection instruction is triggered, the computer device 10 collects voice information input by the user.
  • the computer device 10 identifies the voice information according to the voice samples in the locally stored voice recognition database to obtain text information.
  • Alternatively, the computer device 10 establishes a communication connection with the server 20 and transmits the collected voice information to the server 20; the server 20 recognizes the voice information against the voice samples in the speech recognition database to obtain text information, and the computer device 10 acquires the text information recognized by the server 20.
  • the computer device 10 also determines a target input location of the text information and enters at least part of the content of the text information at the target input location, so that information can be entered on the page accurately and efficiently.
  • the computer device 10 is a terminal capable of collecting voice information, and may be a desktop computer, a notebook computer, a tablet computer, a palmtop computer, a sales terminal, or a smart phone.
  • a computer device which, as shown in FIG. 2, can include a processor, memory, and network interface coupled through a system bus.
  • the processor is used to provide computing and control capabilities to support the operation of the entire computer device.
  • the memory of the computer device includes a non-volatile storage medium, an internal memory.
  • the non-volatile storage medium stores an operating system, computer readable instructions, and a database.
  • the database can be a speech recognition database.
  • the non-volatile storage medium can be a non-transitory computer readable storage medium.
  • the internal memory provides an environment for operation of an operating system and computer readable instructions in a non-volatile storage medium.
  • the database of the computer device is used to store data such as voice samples.
  • the network interface of the computer device is used to communicate with an external terminal via a network connection.
  • the computer readable instructions are executed by a processor to implement a voice input method.
  • FIG. 2 is only a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
  • a voice input method 30 is provided.
  • the method is applied to the computer device 10 shown in FIG. 1 or FIG. 2 as an example, and specifically includes the following steps:
  • Step S302 collecting voice information according to a preset voice collection instruction.
  • the computer device pre-stores a voice collection instruction, and when the voice collection instruction is triggered, the voice information is collected in response to the voice collection instruction.
  • the voice collection instruction is triggered by a specific user operation.
  • an icon for invoking the voice capture instruction is provided in a page displayed by the computer device, and when the icon is clicked or touched, the voice capture instruction is triggered to collect voice information.
  • the icon can be set anywhere in the page, for example, can be set on the upper, lower, left or right side of the page.
  • the position of the icon is set according to the position of the input box in the page, for example, the icon is set in the input box, or the icon is set on one side of the input box.
  • the icon is a lip shaped icon.
  • the computer device provides a button for invoking the voice collection instruction; the button may be a physical button or a virtual button, and when the button is detected to be pressed or touched, the voice collection instruction is triggered to collect voice information.
  • the button can be selected and customized by the user from among the buttons of the computer device.
  • In one embodiment, when the computer device is detected to be shaken back and forth, the pre-stored voice collection instruction is triggered to collect voice information.
  • the shaking of the computer device is detected by a sensor provided inside the computer device.
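  • The trigger mechanisms described above (a lip-shaped icon, a dedicated button, or a shake detected by a built-in sensor) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the event names, the collect_voice callback, and the shake threshold are not specified in the application.

```python
# Hypothetical sketch: routing an icon tap, a button press, or a detected shake
# to the same pre-stored voice collection instruction.
from typing import Callable

SHAKE_THRESHOLD_G = 2.5  # assumed accelerometer magnitude, in g, that counts as a shake

def make_triggers(collect_voice: Callable[[], None]):
    """Return handlers that fire the voice collection instruction."""
    def on_icon_tap() -> None:              # lip-shaped icon in or beside the input box
        collect_voice()

    def on_button_press() -> None:          # physical or virtual button chosen by the user
        collect_voice()

    def on_accelerometer(reading_g: float) -> None:
        # A back-and-forth shake is approximated here by a single large reading.
        if abs(reading_g) > SHAKE_THRESHOLD_G:
            collect_voice()

    return on_icon_tap, on_button_press, on_accelerometer

if __name__ == "__main__":
    tap, press, shake = make_triggers(lambda: print("voice collection instruction triggered"))
    tap()          # icon clicked or touched
    shake(3.1)     # device shaken back and forth
```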
  • Step S304 identifying voice information according to a preset voice recognition learning algorithm, and acquiring the recognized text information.
  • the computer device is pre-configured with a speech recognition learning algorithm and a corresponding speech recognition database, and the collected voice information is compared with the voice samples in the speech recognition database according to the speech recognition learning algorithm to recognize the text information.
  • the computer device establishes a communication connection with the server and transmits the collected voice information to the server.
  • the computer device can establish a communication connection with the server through a network interface, wherein the network interface can be an Ethernet card or a wireless network card or the like.
  • the server recognizes the voice information according to the preset voice recognition learning algorithm, obtains the text information, and feeds the recognized text information to the computer device, and the computer device acquires the recognized text information from the server.
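  • As a rough sketch of this client-server variant, the client uploads the recorded audio and receives the recognized text in response. The endpoint URL, the payload layout, and the response fields below are placeholders invented for illustration; the application does not specify a transport protocol.

```python
# Hypothetical client-side sketch: send collected voice information to the server
# and receive the recognized text information back.
import requests  # third-party HTTP client, assumed to be available

RECOGNITION_URL = "https://example.com/api/recognize"  # placeholder endpoint

def recognize_remotely(audio_bytes: bytes, timeout_s: float = 10.0) -> str:
    """Upload the collected voice information and return the recognized text."""
    response = requests.post(
        RECOGNITION_URL,
        files={"voice": ("voice.wav", audio_bytes, "audio/wav")},
        timeout=timeout_s,
    )
    response.raise_for_status()
    # Assume the server replies with JSON such as {"text": "card number 6222 ..."}.
    return response.json()["text"]
```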
  • Step S306 determining a target input position of the text information.
  • If the page displayed by the computer device contains only one input location, that input location is determined as the target input location.
  • If the page displayed by the computer device contains at least two input locations, one implementation is to determine the target input location based on the recognized text information.
  • the recognized text information includes not only text information to be input, but also indication information for indicating a target input position, and the target input position may be determined according to the indication information.
  • the text information to be input is a field to be input, and the indication information is an indication field.
  • the computer device presets a keyword associated with each input location, and when the recognized text information includes an indication field that matches the keyword, determines an input location associated with the indication field as the target input location.
  • For example, the page displayed by the computer device includes a first input location for entering a card number and a second input location for entering a verification code, where the first input location is associated with the keyword "card number" and the second input location is associated with the keyword "verification code". When the recognized text information includes the keyword "card number", the first input location associated with that keyword is determined as the target input location.
  • If the page displayed by the computer device contains at least two input locations, before voice information is collected according to the preset voice collection instruction, the voice input method further includes: receiving a voice collection instruction that carries indication information of the target input location.
  • In that case, another implementation of step S306 is to determine the target input location according to the indication information in the voice collection instruction, either before or after step S304.
  • the voice collection instruction generated by the user by clicking a specific location in the page is received, and the voice collection instruction includes indication information generated according to the specific location.
  • the specific location may be an input box in the page or a location associated with the input box next to the input box.
  • For example, when the user clicks an input box in the page, a voice collection instruction carrying indication information of that input box is generated, and the input box can be determined as the target input location according to the indication information in the voice collection instruction.
  • Likewise, when the user clicks a preset icon inside the input box, or a preset icon next to and associated with the input box, a voice collection instruction carrying indication information of that input box is generated, and the input box can be determined as the target input location according to the indication information.
  • the preset icon may be a lip-shaped icon.
  • Step S308 input at least part of the content of the text information at the target input position.
  • Specifically, when the recognized text information includes a field to be input and an indication field, the field to be input is entered at the target input location; in other words, when the recognized text information includes an indication field, the indication field is filtered out and the remaining content of the text information is entered at the target input location.
  • When the recognized text information does not include an indication field, the recognized text information is entered directly at the target input location.
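  • A minimal sketch of steps S306 and S308 under this keyword-matching embodiment follows. The page model, the keyword table, and the helper names are assumptions made for illustration: each input location is associated with a keyword, an indication field matching a keyword selects the target location, and the remaining text is entered there.

```python
# Hypothetical sketch of steps S306/S308: choose the input location whose keyword
# appears in the recognized text, filter out the indication field, enter the rest.
from typing import Optional

KEYWORD_TO_LOCATION = {                    # assumed keyword/location associations
    "card number": "card_number_box",
    "verification code": "verification_code_box",
}

def determine_target_location(text: str, only_location: Optional[str] = None) -> Optional[str]:
    """Step S306: pick the location associated with the matching indication field."""
    for keyword, location in KEYWORD_TO_LOCATION.items():
        if keyword in text:
            return location
    return only_location                   # a single-input page uses its only location

def enter_text(text: str, page: dict) -> dict:
    """Step S308: strip the indication field and enter the remaining content."""
    target = determine_target_location(text, only_location=next(iter(page)))
    field_to_input = text
    for keyword in KEYWORD_TO_LOCATION:
        field_to_input = field_to_input.replace(keyword, "")   # filter the indication field
    page[target] = field_to_input.strip()
    return page

page = {"card_number_box": "", "verification_code_box": ""}
print(enter_text("card number 6222020200112233445", page))
```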
  • In this embodiment, the collected voice information is recognized, the target input location is determined automatically, and the recognized text information is entered at the target input location; the user can complete the input simply by reading the information out loud, without having to type while watching, which improves input efficiency and accuracy.
  • Moreover, because the target input location is determined automatically and no input method needs to be invoked, the input flow is simplified and input efficiency is further improved.
  • the bank card number can be entered using the voice input method described above.
  • some bank cards have a raised card pattern, and blind or poorly sighted people can enter the bank card number by touching the bank card and reading the card number.
  • As another example, when a user enters a card number through a computer device such as a desktop computer, the card cannot be scanned in, and the voice input method described above allows the card number to be entered conveniently and quickly.
  • As another example, when a bank's customer service staff repeats the card number read out by a customer during a service call, the card number can be entered accurately, avoiding input errors caused by mistyped keys.
  • a plurality of speech recognition databases are stored in the memory of the computer device, each speech recognition database storing speech samples of a different language type.
  • For example, speech recognition database A contains Mandarin speech samples, database B contains Cantonese speech samples, database C contains Chongqing-dialect speech samples, database D contains English speech samples, and so on.
  • the speech recognition database for each language contains speech samples for the ten digits 0 through 9.
  • the speech recognition database of each language also contains speech samples of common words in the banking field and the financial field.
  • the speech recognition database for each language also contains speech samples for terms such as "card number", "account", "debit card", "credit card", "withdrawal", "amount", "balance", and the names of the various banks.
  • the collected voice information includes a plurality of voice segments.
  • In one implementation of step S304, each voice segment is matched against each voice sample in each speech recognition database, the text character corresponding to the voice sample with the highest matching degree that also exceeds a preset threshold is taken as the recognition result for that segment, and the text information corresponding to the voice information is generated from the recognition results of all segments.
  • To improve recognition efficiency, another implementation of step S304 is: calculating, according to the preset speech recognition learning algorithm, the matching degree between at least one voice segment and the voice samples in a plurality of preset speech recognition databases; setting the speech recognition database containing the best-matching voice sample as the target speech recognition database; matching each voice segment against the voice samples in the target speech recognition database according to the speech recognition learning algorithm to obtain the text character corresponding to each segment; and generating the text information from the text characters corresponding to the segments. In other words, in step S304 the language type of the voice information is first determined from some of its voice segments, and the remaining segments are then recognized, using the preset speech recognition learning algorithm, against the voice samples in the speech recognition database corresponding to that language type.
  • Because the speech recognition databases corresponding to the other language types are filtered out, the amount of computation during recognition is reduced, so recognition efficiency improves. For example, when the user reads out "1, 2, 3, 4, 5, 6, 7, 8, 9" in Mandarin, the digit "1" is recognized first: its voice segment is matched against the voice samples in each speech recognition database, the best match among the samples whose matching degree exceeds the preset threshold is taken as the result for "1", and database A, where that match is found, is set as the target speech recognition database; when the remaining digits are recognized, their voice segments only need to be compared with the samples in database A.
  • The number of target speech recognition databases may be greater than one. For example, if several databases contain samples close to the Mandarin pronunciation of "1", multiple matches appear when the digit "1" is recognized, and the databases corresponding to those matches are all marked as primary target speech recognition databases. When the digit "2" is recognized, its voice segment is compared with the voice samples in the primary target databases; any primary target database that yields no match for "2" is filtered out, and the remaining ones are recorded as secondary target databases. When the digit "3" is recognized, its voice segment only needs to be compared with the samples in the secondary target databases, and so on: the databases that need to be searched are continually filtered, the amount of computation is reduced, and comparison efficiency improves.
  • Specifically, when the matching degree between a voice segment and a voice sample is calculated, the waveform similarity and the wavelength similarity between the segment and the sample may be computed separately, and the matching degree is then computed from the waveform similarity, the wavelength similarity, and a preset weight ratio.
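  • The weighted matching degree and the progressive database filtering can be sketched together as follows. The feature representation, the toy similarity function, the weights, and the threshold are all placeholders chosen for illustration; the application only states that waveform similarity and wavelength similarity are combined under a preset weight ratio and that non-matching databases are progressively filtered out.

```python
# Hypothetical sketch: weighted matching degree plus progressive filtering of the
# language-specific speech recognition databases (one reading of step S304/S404).
WAVEFORM_WEIGHT, WAVELENGTH_WEIGHT = 0.7, 0.3   # assumed preset weight ratio
MATCH_THRESHOLD = 0.8                           # assumed preset threshold

def similarity(a, b):
    """Toy similarity in (0, 1]; stands in for a real waveform/wavelength comparison."""
    distance = sum(abs(x - y) for x, y in zip(a, b)) / max(len(a), 1)
    return 1.0 / (1.0 + distance)

def matching_degree(segment, sample):
    # segment and sample are assumed to be (waveform_features, wavelength_features) pairs
    return (WAVEFORM_WEIGHT * similarity(segment[0], sample[0])
            + WAVELENGTH_WEIGHT * similarity(segment[1], sample[1]))

def recognize(segments, databases):
    """databases: {"A": {"1": sample, ...}, "B": {...}} keyed by language type."""
    candidates = dict(databases)                 # every database is a candidate at first
    characters = []
    for segment in segments:
        matched = {}                             # language -> (score, char) above threshold
        for lang, samples in candidates.items():
            score, char = max((matching_degree(segment, s), c) for c, s in samples.items())
            if score > MATCH_THRESHOLD:
                matched[lang] = (score, char)
        if not matched:
            characters.append("?")               # a real system would offer candidates here
            continue
        characters.append(max(matched.values())[1])                 # best match overall
        candidates = {lang: candidates[lang] for lang in matched}   # filter the databases
    return "".join(characters)
```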
  • In one embodiment, if no voice sample has a matching degree above the preset threshold, the text characters corresponding to the two or more best-matching voice samples are taken as candidate characters, and the candidate characters are output for the user to select.
  • In one embodiment, the voice input method 30 further includes: when the recognized text information includes at least two candidate text strings, detecting the user's selection of one candidate; in this case, step S308 fills at least part of the content of the candidate text selected by the user into the target input location.
  • In one embodiment, after step S308, the voice input method 30 further includes: storing the voice information in association with the text information as a new sample for the speech recognition learning algorithm; and updating the speech recognition learning algorithm according to the newly added samples.
  • When multiple candidate text strings are recognized from the voice information, the text selected by the user and the collected voice information are stored in association as a new sample for the speech recognition learning algorithm.
  • If speech recognition is performed by the server in step S304, the collected voice information and the text selected by the user are uploaded to the server, so that the server stores them in association as a new sample in the corresponding speech recognition database.
  • Preferably, the computer device or the server updates the speech recognition algorithm at a fixed time interval or according to the accumulated number of newly added samples. Taking the latter as an example, the accumulated count is incremented by one for each new sample; when the count reaches a preset update threshold, the speech recognition algorithm is updated and the accumulated count is reset to zero.
  • the accuracy of the speech recognition can be improved by continuously increasing the speech samples in the speech sample library and updating the speech recognition learning algorithm.
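  • The sample-accumulation and update logic can be sketched as below; the storage layout, the retrain callback, and the threshold value are illustrative assumptions. Each confirmed (voice, text) pair is stored in association as a new sample, and once the accumulated count reaches the preset update threshold the learning algorithm is updated and the counter cleared.

```python
# Hypothetical sketch: accumulate confirmed samples and refresh the recognizer.
from typing import Callable, List, Tuple

UPDATE_THRESHOLD = 100      # assumed preset update threshold

class SampleStore:
    def __init__(self, retrain: Callable[[List[Tuple[bytes, str]]], None]):
        self.samples: List[Tuple[bytes, str]] = []   # (voice information, text information)
        self.new_count = 0
        self.retrain = retrain                        # updates the learning algorithm

    def add(self, voice_info: bytes, text_info: str) -> None:
        """Store the voice information in association with the confirmed text."""
        self.samples.append((voice_info, text_info))
        self.new_count += 1
        if self.new_count >= UPDATE_THRESHOLD:
            self.retrain(self.samples)   # update the speech recognition learning algorithm
            self.new_count = 0           # clear the accumulated new-sample count

store = SampleStore(retrain=lambda samples: print(f"retraining on {len(samples)} samples"))
store.add(b"\x00\x01", "6222020200112233445")
```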
  • In one embodiment, before step S308, the voice input method 30 further includes: determining whether the field format of the recognized text information matches the field format specified for the target input location; if so, step S308 is executed; otherwise, prompt information is generated to notify the user of the input error.
  • For example, if the field format specified for a verification code input box is six digits and the recognized text information is not six digits, for example it contains non-numeric characters or consists of seven digits, prompt information is generated to notify the user of the input error.
  • the prompt information may be in the form of one or more of popup information, voice information, and vibration information.
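  • The pre-input format check can be sketched as follows; the regular expressions, the location names, and the prompt text are illustrative assumptions. Each input location specifies a field format, and recognized text that does not match it triggers a prompt instead of being entered.

```python
# Hypothetical sketch of the format check performed before step S308.
import re

FIELD_FORMATS = {                                        # assumed per-location formats
    "verification_code_box": re.compile(r"\d{6}"),       # exactly six digits
    "card_number_box": re.compile(r"\d{16,19}"),         # typical card-number lengths
}

def validate_and_enter(location: str, text: str, page: dict) -> bool:
    pattern = FIELD_FORMATS.get(location)
    if pattern is not None and not pattern.fullmatch(text):
        # In the application this would be a pop-up, voice, or vibration prompt.
        print(f"Input error: '{text}' does not match the format required by {location}")
        return False
    page[location] = text
    return True

page = {"verification_code_box": ""}
validate_and_enter("verification_code_box", "12345a", page)   # rejected: not six digits
validate_and_enter("verification_code_box", "123456", page)   # accepted and entered
```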
  • In this embodiment, information that the user has entered incorrectly can be accurately identified and the user promptly notified, preventing wrong information from being entered and submitted and improving the accuracy of information input.
  • a voice recognition method 40 is provided.
  • the method is applied to the server 20 shown in FIG. 2 as an example, and specifically includes the following steps: step S402, establishing a communication connection with a computer device and receiving voice information uploaded by the computer device; step S404, recognizing the voice information according to a preset speech recognition learning algorithm to obtain the recognized text information; and step S406, sending the recognized text information to the computer device.
  • the server is pre-configured with a speech recognition learning algorithm and a corresponding speech recognition database, and the received speech information is compared with the speech samples in the speech recognition database according to the speech recognition learning algorithm to identify the text information.
  • the server stores a plurality of speech recognition databases, each storing speech samples of a different language type.
  • For example, speech recognition database A contains Mandarin speech samples, database B contains Cantonese speech samples, database C contains Chongqing-dialect speech samples, database D contains English speech samples, and so on.
  • the speech recognition database for each language contains speech samples for the ten digits 0 through 9.
  • the speech recognition database of each language also contains speech samples of common words in the banking field and the financial field.
  • the speech recognition database for each language also contains speech samples for terms such as "card number", "account", "debit card", "credit card", "withdrawal", "amount", "balance", and the names of the various banks.
  • the received voice information includes a plurality of voice segments.
  • In one implementation of step S404, each voice segment is matched against each voice sample in each speech recognition database, the text character corresponding to the voice sample with the highest matching degree that also exceeds a preset threshold is taken as the recognition result for that segment, and the text information corresponding to the voice information is generated from the recognition results of all segments.
  • To improve recognition efficiency, another implementation of step S404 is: calculating, according to the preset speech recognition learning algorithm, the matching degree between at least one voice segment and the voice samples in a plurality of preset speech recognition databases; setting the speech recognition database containing the best-matching voice sample as the target speech recognition database; matching the voice segment of each character against the voice samples in the target speech recognition database according to the speech recognition learning algorithm to obtain the text character corresponding to each segment; and generating the text information from the text characters corresponding to the segments.
  • That is, in step S404 the language type of the voice information is first determined from some of its voice segments, and the remaining segments are then recognized against the voice samples in the speech recognition database corresponding to that language type; because the databases for the other language types are filtered out, the amount of computation during recognition is reduced, so recognition efficiency improves.
  • For example, when the user reads out "1, 2, 3, 4, 5, 6, 7, 8, 9" in Mandarin, the digit "1" is recognized first: its voice segment is matched against the voice samples in each speech recognition database, the best match among the samples whose matching degree exceeds the preset threshold is taken as the result for "1", and database A, where that match is found, is set as the target speech recognition database; when the remaining digits are recognized, their voice segments only need to be compared with the samples in database A.
  • The number of target speech recognition databases may be greater than one. For example, if several databases contain samples close to the Mandarin pronunciation of "1", multiple matches appear when the digit "1" is recognized, and the databases corresponding to those matches are all marked as primary target speech recognition databases. When the digit "2" is recognized, its voice segment is compared with the voice samples in the primary target databases; any primary target database that yields no match for "2" is filtered out, and the remaining ones are recorded as secondary target databases. When the digit "3" is recognized, its voice segment only needs to be compared with the samples in the secondary target databases, and so on: the databases that need to be searched are continually filtered, the amount of computation is reduced, and comparison efficiency improves.
  • Specifically, when the matching degree between a voice segment and a voice sample is calculated, the waveform similarity and the wavelength similarity between the segment and the sample may be computed separately, and the matching degree is then computed from the waveform similarity, the wavelength similarity, and a preset weight ratio.
  • the received voice information is identified and the recognized text information is sent to the computer device, so that the user can implement voice input through the computer device, without typing while watching, thereby improving input efficiency and accuracy.
  • In one embodiment, if no voice sample has a matching degree with a voice segment above the preset threshold, the text characters corresponding to the two or more best-matching voice samples may be taken as candidate characters, at least two candidate text strings are generated from them, and the candidate text strings are sent to the computer device for the user to select.
  • In one embodiment, before step S406, the voice recognition method 40 further includes: determining whether the field format of the recognized text information matches the field format specified for the target input location; if so, step S406 is executed; otherwise, prompt information is sent to the computer device to notify the user of the input error.
  • In one embodiment, the voice information received by the server carries indication information of the target input location, the field format specified for the target input location is determined from that indication information, and after the text information is recognized it is checked whether the field format of the text information matches the format specified for the target input location.
  • the server pre-stores a plurality of keywords and a prescribed field format corresponding to each keyword, wherein the keyword is used to indicate a target input location.
  • After the text information is recognized, if it contains a pre-stored keyword, the server determines the corresponding specified field format and target input location from that keyword, and then checks whether the field to be input, that is, the portion of the text information other than the keyword, matches the specified field format; if so, step S406 is performed; otherwise, prompt information is sent to the computer device to notify the user of the input error.
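  • A sketch of this server-side lookup is below; the keyword table and the return values are assumptions. The keyword found in the recognized text determines both the target input location and its required field format, and the remaining field to be input is checked against that format before step S406.

```python
# Hypothetical sketch of the server-side keyword lookup and format check before S406.
import re

KEYWORD_RULES = {   # keyword -> (target input location, required field format); assumed
    "card number": ("card_number_box", re.compile(r"\d{16,19}")),
    "verification code": ("verification_code_box", re.compile(r"\d{6}")),
}

def check_recognized_text(text: str) -> dict:
    """Return the target location and field if the format check passes, else an error."""
    for keyword, (location, pattern) in KEYWORD_RULES.items():
        if keyword in text:
            field = text.replace(keyword, "").strip()    # the field to be input
            if pattern.fullmatch(field):
                return {"location": location, "field": field}
            return {"error": f"'{field}' does not match the format required by {location}"}
    return {"location": None, "field": text}             # no keyword: pass the text through

print(check_recognized_text("verification code 123456"))   # passes the check
print(check_recognized_text("verification code 12345"))    # fails: only five digits
```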
  • In this embodiment, information that the user has entered incorrectly can be accurately identified and the user promptly notified, preventing wrong information from being entered and submitted and improving the accuracy of information input.
  • It should be understood that, although the steps in FIGS. 3 and 4 are shown sequentially as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, there is no strict restriction on the order of these steps, and they may be performed in other orders. Moreover, at least some of the steps in FIGS. 3 and 4 may include multiple sub-steps or stages, which are not necessarily completed at the same time but may be performed at different times; the order in which these sub-steps or stages are performed is also not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
  • a voice input device is provided.
  • the voice input device 50 is applied to the computer device 10 shown in FIG. 1 or FIG. 2 as an example.
  • the voice input device 50 includes an acquisition module 502, an identification module 504, a determination module 506, and an input module 508, wherein:
  • the collecting module 502 is configured to collect voice information according to a preset voice collection instruction.
  • the identification module 504 is configured to identify the voice information according to the preset voice recognition learning algorithm, and obtain the recognized text information.
  • the determining module 506 is configured to determine a target input location of the text information.
  • the input module 508 is configured to input at least part of the content of the text information at the target input location.
  • the text information includes an indication field and a field to be input; the determining module 506 is further configured to determine an input location associated with the indication field as a target input location; and the input module 508 is further configured to input the field to be input Target input location.
  • In one embodiment, the voice input device 50 further includes: an instruction receiving module configured to receive a voice collection instruction that carries indication information of the target input location; the input module 508 is further configured to determine the target input location of the text information according to the indication information.
  • In one embodiment, the voice input device 50 further includes: a detecting module configured to detect the user's selection of one candidate text string when the text information corresponding to the voice information includes at least two candidates; the input module 508 is further configured to enter at least part of the content of the candidate text selected by the user into the target input location.
  • the voice input device 50 further includes: a storage module, configured to store the voice information and the text information as a new sample of the voice recognition learning algorithm; and an update module, configured to update the voice recognition learning algorithm according to the newly added sample .
  • the voice information includes a plurality of voice segments;
  • the recognition module 504 includes: a calculating unit configured to calculate, according to the preset speech recognition learning algorithm, the matching degree between at least one voice segment and the voice samples in a plurality of preset speech recognition databases; a setting unit configured to set the speech recognition database containing the best-matching voice sample as the target speech recognition database; a matching unit configured to match each voice segment against the voice samples in the target speech recognition database according to the speech recognition learning algorithm to obtain the text character corresponding to each segment; and a generating unit configured to generate the text information from the text characters corresponding to the segments.
  • Each module in the above voice input device may be implemented in whole or in part by software, hardware, or a combination thereof; the network interface may be an Ethernet card or a wireless network card.
  • the above modules may be embedded in hardware form in, or be independent of, the processor in the server, or may be stored in software form in the memory in the server, so that the processor can invoke and perform the operations corresponding to each module.
  • the processor can be a central processing unit (CPU), a microprocessor, a microcontroller, or the like.
  • a voice recognition device is provided.
  • the voice recognition device 60 is applied to the server 20 shown in FIG. 2, and the voice recognition device 60 includes a communication module 602 and an identification module 604.
  • the communication module 602 is configured to establish a communication connection with the computer device and receive voice information uploaded by the computer device.
  • the identification module 604 is configured to identify the voice information according to the preset voice recognition learning algorithm, and obtain the recognized text information.
  • the communication module 602 is further configured to send the recognized text information to the computer device.
  • In one embodiment, the received voice information includes a plurality of voice segments; the identification module 604 is configured to match each voice segment against each voice sample in each speech recognition database, take the text character corresponding to the voice sample with the highest matching degree that also exceeds the preset threshold as the recognition result for that segment, and generate the text information corresponding to the voice information from the recognition results of all segments.
  • To improve recognition efficiency, the identification module 604 is configured to: calculate, according to the preset speech recognition learning algorithm, the matching degree between at least one voice segment and the voice samples in a plurality of preset speech recognition databases; set the speech recognition database containing the best-matching voice sample as the target speech recognition database; match the voice segment of each character against the voice samples in the target speech recognition database according to the preset speech recognition learning algorithm to obtain the text character corresponding to each segment; and generate the text information from the text characters corresponding to the segments.
  • In one embodiment, the voice recognition device 60 further includes a determining module configured to determine whether the field format of the recognized text information matches the field format specified for the target input location; if so, the communication module 602 sends the recognized text information to the computer device; otherwise, the communication module 602 sends prompt information to the computer device to notify the user of the input error.
  • A person of ordinary skill in the art will understand that all or part of the processes in the above method embodiments can be implemented by computer readable instructions instructing the relevant hardware; the instructions may be stored in a non-volatile computer readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A voice input method includes: collecting voice information according to a preset voice collection instruction (S302); recognizing the voice information according to a preset speech recognition learning algorithm to obtain recognized text information (S304); determining a target input location for the text information (S306); and entering at least part of the content of the text information at the target input location (S308). With this voice input method, the target input location can be determined automatically from the collected voice information and the recognized text information entered at that location, so the user can complete the input simply by reading the information out loud, which simplifies the input flow and improves input efficiency. A voice input apparatus, a computer device, and a computer readable storage medium are also provided.

Description

语音输入方法、装置、计算机设备和存储介质
本申请要求于2017年08月02日提交中国专利局,申请号为2017106533198,申请名称为“语音输入方法、装置、计算机设备和介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及一种语音输入方法、装置、计算机设备和存储介质。
背景技术
随着互联网技术和终端技术的发展,用户可以通过终端页面如网页或应用程序页面等进行社交、购物、理财等活动。一般终端页面提供输入框,用户在输入框中输入所需提交的信息,并通过页面中的操作按钮提交所输入的信息。
通常用户在终端页面中输入信息的方式为:点击输入框空白处插入光标,同时调用输入法应用,通过物理按键或输入法应用的虚拟键盘输入所需输入的字符。这种输入方式比较繁琐、效率较低,而且容易输入错误。例如,用户在通过网银或手机银行进行转账、理财等业务时,通常需要输入银行卡号,由于银行卡号包含的数字较多,需要边看边输入,容易导致输入错误。虽然目前有些输入法能提供语音识别功能,但其需要在调用输入法应用后,由用户在输入法应用的操作界面中手动选择语音识别功能,还需用户手动选择输入位置,使得输入信息的过程更加繁琐。因此,如何简化输入流程及提升输入准确率成为目前需要解决的一个技术问题。
发明内容
根据本申请公开的各种实施例,提供一种语音输入方法、装置、计算机设备和存储介质。
一种语音输入方法,包括:
根据预设的语音采集指令采集语音信息;
根据预设的语音识别学习算法识别所述语音信息,获取识别出的文本信息;
确定所述文本信息的目标输入位置;及
在所述目标输入位置输入所述文本信息的至少部分内容。
一种语音输入装置,包括:
采集模块,用于根据预设的语音采集指令采集语音信息;
识别模块,用于根据预设的语音识别学习算法识别所述语音信息,获取识别出的文本信息;
确定模块,用于确定所述文本信息的目标输入位置;及
输入模块,用于在所述目标输入位置输入所述文本信息的至少部分内容。
在其中一个实施例中,所述文本信息中包括指示字段和待输入字段;
所述确定模块,用于确定与所述指示字段相关联的输入位置为目标输入位置;
所述输入模块,用于将所述待输入字段输入所述目标输入位置。
一种计算机设备,包括存储器和一个或多个处理器,所述存储器中储存有计算机可读指令,所述计算机可读指令被所述处理器执行时,使得所述一个或多个处理器执行以下步骤:
根据预设的语音采集指令采集语音信息;
根据预设的语音识别学习算法识别所述语音信息,获取识别出的文本信息;
确定所述文本信息的目标输入位置;及
在所述目标输入位置输入所述文本信息的至少部分内容。
一个或多个存储有计算机可读指令的非易失性计算机可读存储介质,计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器执行以下步骤:
根据预设的语音采集指令采集语音信息;
根据预设的语音识别学习算法识别所述语音信息,获取识别出的文本信 息;
确定所述文本信息的目标输入位置;及
在所述目标输入位置输入所述文本信息的至少部分内容。
本申请的一个或多个实施例的细节在下面的附图和描述中提出。本申请的其它特征和优点将从说明书、附图以及权利要求书变得明显。
附图说明
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其它的附图。
图1为根据一个或多个实施例中语音输入方法的应用环境图;
图2为根据一个或多个实施例中计算机设备的内部框图;
图3为根据一个或多个实施例中语音输入方法的流程示意图;
图4为根据一个或多个实施例中语音识别方法的流程示意图;
图5为根据一个或多个实施例中语音输入装置的框图;
图6为根据一个或多个实施例中语音识别装置的框图。
具体实施方式
为了使本申请的目的、技术方案及优点更加清楚明白,以下结合附图及实施例,对本申请进行进一步详细说明。应当理解,此处描述的具体实施例仅仅用以解释本申请,并不用于限定本申请。
本申请实施例中所提供的语音输入方法,可应用于如图1所示的应用环境中。参考图1,计算机设备10与服务器20之间建立通信相连。计算机设备10或服务器20上存储有语音识别数据库,语音识别数据库中包含语音样本。计算机设备10存储有语音采集指令,当语音采集指令被触发时,计算机设备10采集用户输入的语音信息。可选地,计算机设备10根据本地存储的语音识别数据库中的语音样本对语音信息进行识别,得到文本信息。或者, 计算机设备10与服务器20建立通信连接,向服务器20发送采集到的语音信息,由服务器20根据语音识别数据库中的语音样本对语音信息进行识别,得到文本信息,计算机设备10获取服务器20识别出的文本信息。其中,计算机设备10还确定文本信息的目标输入位置,在目标输入位置输入文本信息的至少部分内容。由此准确、高效地在页面中输入信息。其中,计算机设备10是能够采集语音信息的终端,可以是台式电脑、笔记本电脑、平板电脑、掌上电脑、销售终端或者智能手机等。
在一个实施例中,提供了一种计算机设备,如图2所示,该计算机设备10可以包括通过系统总线连接的处理器、存储器和网络接口。其中,该处理器用于提供计算和控制能力,支撑整个计算机设备的运行。该计算机设备的存储器包括非易失性存储介质、内存储器。该非易失性存储介质存储有操作系统、计算机可读指令和数据库。该数据库可以是语音识别数据库。该非易失性存储介质可以是非易失性计算机可读存储介质。该内存储器为非易失性存储介质中的操作系统和计算机可读指令的运行提供环境。该计算机设备的数据库用于存储语音样本等数据。该计算机设备的网络接口用于与外部的终端通过网络连接通信。该计算机可读指令被处理器执行时以实现一种语音输入方法。本领域技术人员可以理解,图2中示出的结构,仅仅是与本申请方案相关的部分结构的框图,并不构成对本申请方案所应用于其上的计算机设备的限定,具体的计算机设备可以包括比图中所示更多或更少的部件,或者组合某些部件,或者具有不同的部件布置。
在一个实施例中,如图3所示,提供了一种语音输入方法30,以该方法应用于图1或图2所示的计算机设备10为例进行说明,具体包括以下步骤:
步骤S302,根据预设的语音采集指令采集语音信息。
其中,计算机设备预先存储语音采集指令,当该语音采集指令被触发时,响应该语音采集指令采集语音信息。其中,该语音采集指令通过特定的用户操作触发。
在一个实施例中,计算机设备所显示的页面中提供用于调用该语音采集指令的图标,当该图标被点击或触控时,触发该语音采集指令以采集语音信 息。其中,该图标可设置于页面中的任意位置,例如可设置于页面的上部、下部、左侧或右侧等。优选地,该图标的位置根据页面中的输入框的位置设置,例如该图标设置于输入框内,或该图标设置于输入框的一侧。为了便于使用者理解该图标代表语音输入,优选地,该图标为嘴唇形状的图标。
在一个实施例中,计算机设备提供用于调用该语音采集指令的按键,该按键可选的为物理按键或虚拟按键,当检测到该按键被按压或触控时,触发该语音采集指令以采集语音信息。其中,该按键可由用户在计算机设备的多个按键中自定义选择设置。
在一个实施例中,当检测到计算机设备被来回摇晃时,触发预存的语音采集指令以采集语音信息。具体地,通过计算机设备内部设置的传感器检测计算机设备的晃动。
步骤S304,根据预设的语音识别学习算法识别语音信息,获取识别出的文本信息。
在一个实施例中,计算机设备本地预先设置有语音识别学习算法及对应的语音识别数据库,根据语音识别学习算法,将采集得到的语音信息与语音识别数据库中的语音样本进行对比计算,识别出文本信息。
在一个实施例中,计算机设备与服务器建立通信连接,将采集得到的语音信息发送至服务器。例如,计算机设备可通过网络接口与服务器建立通信连接,其中,网络接口可以是以太网卡或无线网卡等。服务器根据预设的语音识别学习算法识别语音信息,得到文本信息,并将识别出的文本信息反馈给计算机设备,计算机设备从服务器获取识别出的文本信息。
步骤S306,确定文本信息的目标输入位置。
其中,若计算机设备所显示的页面中仅包括一个输入位置,则将该输入位置确定为目标输入位置。若计算机设备显示的页面中包含至少两个输入位置,一种实施方式是,根据识别出的文本信息确定目标输入位置。具体地,识别出的文本信息中不仅包括待输入的文本信息,还包括用于指示目标输入位置的指示信息,可根据指示信息确定目标输入位置。例如待输入的文本信息为待输入字段,指示信息为指示字段。计算机设备预先设置各输入位置所 关联的关键词,当识别出的文本信息包括与关键词相匹配的指示字段时,将与该指示字段相关联的输入位置确定为目标输入位置。
举例来说,计算机设备显示的页面中包括用于输入卡号的第一输入位置和用于输入验证码的第二输入位置,其中第一输入位置被设置与关键词“卡号”相关联,第二输入位置被设置与关键词“验证码”相关联,当识别出的文本信息中包括“卡号”这一关键词时,则将与“卡号”关键词相关联的第一输入位置确定为目标输入位置。
若计算机设备显示的页面中包含至少两个输入位置,在根据预设的语音采集指令采集语音信息之前,该语音输入方法还包括:接收语音采集指令,语音采集指令中携带有目标输入位置的指示信息。此时,步骤S306的另一种实施方式是,在步骤S304之前或之后,根据语音采集指令中的指示信息确定目标输入位置。具体地,接收用户通过点击页面中的特定位置生成的语音采集指令,该语音采集指令中包括根据该特定位置生成的指示信息。其中,该特定位置可以是页面中的输入框,或者是在输入框旁边与输入框相关联的位置。例如,当用户点击页面中的输入框时,生成携带有该输入框的指示信息的语音采集指令,根据语音采集指令中的指示信息可将该输入框确定为目标输入位置。又如,当用户点击输入框中的预设图标或点击输入框旁边与该输入框相关联的预设图标时,生成携带有该输入框的指示信息的语音采集指令,根据语音采集指令中的指示信息可将该输入框确定为目标输入位置。优选地,为了便于用户理解预设图标代表语音输入,预设图标可以为嘴唇形状的图标。
步骤S308,在目标输入位置输入文本信息的至少部分内容。
具体地,当识别出的文本信息包括待输入字段和指示字段时,在目标输入位置输入该待输入字段;或者说,当识别出的文本信息包括指示字段时,过滤掉文本信息中的指示字段,将文本信息中除指示字段之外的其他内容输入目标输入位置。当识别出的文本信息不包括指示字段时,在目标输入位置输入直接输入识别出的文本信息。
本实施例中,对采集到的语音信息进行识别,自动确定目标输入位置并将识别出的文本信息输入到目标输入位置,用户通过读出语音信息即可实现 输入,不用边看边打字,从而提升输入效率及准确率。而且,由于自动确定目标输入位置且无需调用输入法,因此能够简化输入流程,提升输入效率。
在一个实施例中,可采用上述语音输入方法输入银行卡号。例如,一些银行卡的卡号为凸起图案,盲人或视力不佳人群可通过触摸银行卡并读出卡号来输入银行卡号。又如,用户通过电脑等计算机设备输入卡号时,无法扫卡输入,采用上述语音识别方法可方便快捷地输入卡号。又如,银行的客服人员在进行客户服务时,复述客户读出的卡号即可准确输入卡号,避免由于按键盘出错导致输入错误。
在一个实施例中,计算机设备的存储器中存有多个语音识别数据库,每个语音识别数据库存有不同语言类型的语音样本。例如,A语音识别数据库中存有普通话的语音样本,B语音识别数据库中存有粤语的语音样本,C语音识别数据库中存有重庆话的语音样本,D语音识别数据库中存有英语的语音样本等。其中,每种语言的语音识别数据库中存有0~9共10个数字的语音样本。优选地,每种语言的语音识别数据库中还存有银行领域、金融领域中常见词汇的语音样本。例如,每种语言的语音识别数据库中还存有“卡号”、“账号”、“借记卡”、“信用卡”、“取款”、“金额”、“余额”及各银行的名称等的语音样本。
其中,采集到的语音信息中包括多个语音片段;步骤S304的一种实施方式是,将各语音片段分别与各语音识别数据库中的各语音样本进行匹配度计算,将匹配度最高且高于预设阈值的语音样本所对应的文本字符作为该语音片段的识别结果,根据各语音片段的识别结果生成语音信息所对应的文本信息。
为了提升识别效率,步骤S304的另一种实施方式是,根据预设的语音识别学习算法计算至少一个语音片段与多个预设的语音识别数据库中的语音样本的匹配度;将匹配度最高的语音样本所在的语音识别数据库设置为目标语音识别数据库;根据该语音识别学习算法将各语音片段与目标语音识别数据库中的语音样本进行匹配,获取各语音片段对应的文本字符;根据各语音片段对应的文本字符生成文本信息。即,在步骤S304中,先根据语音信息中的 部分语音片段确定语音信息的语言类型,进而采用预设的语音识别学习算法,根据该语言类型对应的语音识别数据库中的语音样本对其他语音片段进行语音识别,由于过滤了其他语言类型对应的语音识别数据库,减少了语音识别时的计算量,因此能提高语音识别效率。举例来说,当用户采用普通话顺序读出“1、2、3、4、5、6、7、8、9”时,根据顺序,先识别数字“1”,将数字“1”的语音片段与各语音识别数据库中的语音样本进行匹配度计算,在匹配度大于预设阈值的语音样本中选择匹配度最高的一个作为数字“1”的匹配结果,将该匹配结果所在的A语音识别数据库确定为目标语音识别数据库,后续识别其他数字时,将其他数字的语音片段与A语音识别数据库中的样本进行比对即可。其中,目标语音识别数据库的数量可以大于1,例如,当几个语音识别数据库中都存在与“1”的普通话发音相近的语音样本时,识别数字“1”时出现多个匹配结果,则先将该多个匹配结果对应的语音识别数据库都确认为初级目标语音识别数据库;识别数字“2”时,将数字“2”的语音片段与初级目标语音识别数据库中的语音样本进行对比,当一些初级目标语音识别数据库中不存在数字“2”的匹配结果时,将其过滤掉,余下的记为二级目标语音识别数据库;识别数字“3”时,数字“3”的语音片段与二级目标语音识别数据库中的语音样本进行对比即可,以此类推,可以不断过滤需要对比的语音识别数据库,减少语音识别的计算量,从而提高对比效率。
具体地,计算语音片段与语音样本的匹配度时,可分别计算语音片段与语音样本的波形相似度和波长相似度,根据波形相似度、波长相似度及预设的权重比例,计算语音片段与语音样本的匹配度。
在一个实施例中,若不存在匹配度大于预设阈值的语音样本,则将匹配度最高的至少两种语音样本对应的文本字符作为待选字符,输出待选字符以供用户选择。
在一个实施例中,该语音输入方法30还包括:当识别出的文本信息包括至少两个待选文本信息时,检测用户对一个待选文本信息的选择操作;此时,步骤S308为:将用户选择的待选文本信息的至少部分内容填入目标输入位置。
在一个实施例中,步骤S308之后,该语音输入方法30还包括:将语音信息与文本信息关联存储为语音识别学习算法的新增样本;根据新增样本更新语音识别学习算法。其中,当语音信息识别出多个待选文本信息时,将用户所选择的文本信息与采集到的语音信息管理存储为语音识别学习算法的新增样本。其中,若步骤S304中通过服务器进行语音识别,则将采集到的语音信息与用户选择的文本信息上传至服务器,以使服务器将该语音信息及用户选择的文本信息关联存储为对应的语音识别数据库中的新增样本。优选地,计算机设备或服务器按照一定的时间间隔或根据累计新增样本数更新语音识别算法;以根据累计新增样本数更新语音识别算法为例,每新增一个样本,则累计新增样本数加一,当累计新增样本数达到预设更新阈值时,更新语音识别算法并将累计新增样本数清零。
本实施例中,通过不断增加语音样本库中的语音样本及更新语音识别学习算法,能够提升语音识别的准确率。
在一个实施例中,步骤S308之前,该语音输入方法30还包括:判断识别出的文本信息的字段格式与目标输入位置规定的字段格式是否一致,是则执行步骤S308,否则生成提示信息,以提示用户输入错误。举例来说,验证码输入框规定的字段格式为6位数字,若识别出的文本信息不是6位数字,例如识别出的文本信息包括非数字字符、或识别出的文本信息为7位数字等,则生成提示信息,以提示用户输入错误。可选地,提示信息的形式可以是弹窗信息、语音信息、震动信息中的一种或多种。
本实施例中,能准确识别用户错误输入的信息并及时提示用户,避免输入及提交错误的信息,提升信息输入的准确性。
在一个实施例中,如图4所示,提供了一种语音识别方法40,以该方法应用于图2所示的服务器20为例进行说明,具体包括以下步骤:
S402,与计算机设备建立通信连接,并接收计算机设备上传的语音信息。
S404,根据预设的语音识别学习算法识别语音信息,得到识别出的文本信息。
S406,向计算机设备发送识别出的文本信息。
在一个实施例中,服务器预先设置有语音识别学习算法及对应的语音识别数据库,根据语音识别学习算法,将接收到的语音信息与语音识别数据库中的语音样本进行对比计算,识别出文本信息。
具体地,服务器存有多个语音识别数据库,每个语音识别数据库存有不同语言类型的语音样本。例如,A语音识别数据库中存有普通话的语音样本,B语音识别数据库中存有粤语的语音样本,C语音识别数据库中存有重庆话的语音样本,D语音识别数据库中存有英语的语音样本等。其中,每种语言的语音识别数据库中存有0~9共10个数字的语音样本。优选地,每种语言的语音识别数据库中还存有银行领域、金融领域中常见词汇的语音样本。例如,每种语言的语音识别数据库中还存有“卡号”、“账号”、“借记卡”、“信用卡”、“取款”、“金额”、“余额”及各银行的名称等的语音样本。
其中,接收到的语音信息中包括多个语音片段;步骤S404的一种实施方式是,将各语音片段分别与各语音识别数据库中的各语音样本进行匹配度计算,将匹配度最高且高于预设阈值的语音样本所对应的文本字符作为该语音片段的识别结果,根据各语音片段的识别结果生成语音信息所对应的文本信息。
为了提升识别效率,步骤S404的另一种实施方式是,根据预设的语音识别学习算法计算至少一个语音片段与多个预设的语音识别数据库中的语音样本的匹配度;将匹配度最高的语音样本所在的语音识别数据库设置为目标语音识别数据库;根据该语音识别学习算法将各字符的语音片段与目标语音识别库语音识别数据库中的语音样本进行匹配,获取各字符的语音片段对应的文本字符;根据各语音片段对应的文本字符生成文本信息。即,在步骤S404中,先根据语音信息中的部分语音片段确定语音信息的语言类型,进而根据该语言类型对应的语音识别数据库中的语音样本对其他语音片段进行语音识别,由于过滤了其他语言类型对应的语音识别数据库,减少了语音识别时的计算量,因此能提高语音识别效率。举例来说,当用户采用普通话顺序读出“1、2、3、4、5、6、7、8、9”时,根据顺序,先识别数字“1”,将数字“1”的语音片段与各语音识别数据库中的语音样本进行匹配度计算,在匹配度大 于预设阈值的语音样本中选择匹配度最高的一个作为数字“1”的匹配结果,将该匹配结果所在的A语音识别数据库确定为目标语音识别数据库,后续识别其他数字时,将其他数字的语音片段与A语音识别数据库中的样本进行比对即可。其中,目标语音识别数据库的数量可以大于1,例如,当几个语音识别数据库中都存在与“1”的普通话发音相近的语音样本时,识别数字“1”时出现多个匹配结果,则先将该多个匹配结果对应的语音识别数据库都确认为初级目标语音识别数据库;识别数字“2”时,将数字“2”的语音片段与初级目标语音识别数据库中的语音样本进行对比,当一些初级目标语音识别数据库中不存在数字“2”的匹配结果时,将其过滤掉,余下的记为二级目标语音识别数据库;识别数字“3”时,数字“3”的语音片段与二级目标语音识别数据库中的语音样本进行对比即可,以此类推,可以不断过滤需要对比的语音识别数据库,减少语音识别的计算量,从而提高对比效率。
具体地,计算语音片段与语音样本的匹配度时,可分别计算语音片段与语音样本的波形相似度和波长相似度,根据波形相似度、波长相似度及预设的权重比例,计算语音片段与语音样本的匹配度。
本实施例中,对接收到的语音信息进行识别并向计算机设备发送识别出的文本信息,使得用户可通过计算机设备实现语音输入,不用边看边打字,从而提升输入效率及准确率。
在一个实施例中,若不存在与语音片段的匹配度大于预设阈值的语音样本,可将与语音片段的匹配度最高的至少两种语音样本对应的文本字符作为待选字符,根据至少两个待选字符生成待选的至少两条文本信息,向计算机设备发送待选的至少两条文本信息以供用户选择。
在一个实施例中,在步骤S406之前,语音识别方法40还包括:判断识别出的文本信息的字段格式与目标输入位置的规定字段格式是否一致,是则执行步骤S406,否则向计算机设备发送提示信息,以提示用户输入错误。
在一个实施例中,服务器接收到的语音信息中携带目标输入位置的指示信息,根据该指示信息确定目标输入位置的规定字段格式,在识别出文本信息之后,判断文本信息的字段格式与模板输入位置的规定字段格式是否一致。
在一个实施例中,服务器预先存储多个关键词及各关键词对应的规定字段格式,其中该关键词用于指示目标输入位置。识别出文本信息之后,若文本信息中包括预先存储的关键词,服务器根据该关键词确定对应的规定字段格式及目标输入位置,进而判断文本信息中除关键词之外的待输入字段与该规定字段格式是否一致,是则执行步骤S406,否则向计算机设备发送提示信息,以提示用户输入错误。本实施例中,能准确识别用户错误输入的信息并及时提示用户,避免输入及提交错误的信息,提升信息输入的准确性。
应该理解的是,虽然图3和图4的流程图中的各个步骤按照箭头的指示依次显示,但是这些步骤并不是必然按照箭头指示的顺序依次执行。除非本文中有明确的说明,这些步骤的执行并没有严格的顺序限制,这些步骤可以以其它的顺序执行。而且,图3和图4中的至少一部分步骤可以包括多个子步骤或者多个阶段,这些子步骤或者阶段并不必然是在同一时刻执行完成,而是可以在不同的时刻执行,这些子步骤或者阶段的执行顺序也不必然是依次进行,而是可以与其它步骤或者其它步骤的子步骤或者阶段的至少一部分轮流或者交替地执行。
在一个实施例中,如图5所示,提供了一种语音输入装置,以该语音输入装置50应用于图1或图2所示的计算机设备10为例进行说明,该语音输入装置50包括:采集模块502、识别模块504、确定模块506及输入模块508,其中:
采集模块502,用于根据预设的语音采集指令采集语音信息。
识别模块504,用于根据预设的语音识别学习算法识别语音信息,获取识别出的文本信息。
确定模块506,用于确定文本信息的目标输入位置。
输入模块508,用于在目标输入位置输入文本信息的至少部分内容。
在一个实施例中,文本信息中包括指示字段和待输入字段;确定模块506,还用于确定与指示字段相关联的输入位置为目标输入位置;输入模块508,还用于将待输入字段输入目标输入位置。
在一个实施例中,语音输入装置50还包括:指令接收模块,用于接收语 音采集指令,语音采集指令中携带有目标输入位置的指示信息;输入模块508还用于:根据指示信息确定文本信息的目标输入位置。
在一个实施例中,语音输入装置50还包括:检测模块,用于当语音信息对应的文本信息包括至少两个待选文本信息时,检测用户对一个待选文本信息的选择操作;输入模块508还用于将用户选择的待选文本信息中的至少部分内容输入目标输入位置。
在一个实施例中,语音输入装置50还包括:存储模块,用于将语音信息与文本信息关联存储为语音识别学习算法的新增样本;更新模块,用于根据新增样本更新语音识别学习算法。
在一个实施例中,语音信息包括多个语音片段;识别模块504包括:计算单元,用于根据预设的语音识别学习算法计算至少一个语音片段与多个预设的语音识别数据库中的语音样本的匹配度;设置单元,用于将匹配度最高的语音样本所在的语音识别数据库设置为目标语音识别数据库;匹配单元,用于根据语音识别学习算法将各语音片段与目标语音识别数据库中的语音样本进行匹配,获取各语音片段对应的文本字符;生成单元,用于根据各语音片段对应的文本字符生成文本信息。
上述语音输入装置中的各个模块可全部或部分通过软件、硬件及其组合来实现。其中,网络接口可以是以太网卡或无线网卡等。上述各模块可以硬件形式内嵌于或独立于服务器中的处理器中,也可以以软件形式存储于服务器中的存储器中,以便于处理器调用执行以上各个模块对应的操作。该处理器可以为中央处理单元(CPU)、微处理器、单片机等。
在一个实施例中,如图6所示,提供了一种语音识别装置,以该语音识别装置60应用于图2所示的服务器20为例进行说明,该语音识别装置60包括通信模块602及识别模块604。
通信模块602用于与计算机设备建立通信连接,并接收计算机设备上传的语音信息。识别模块604,用于根据预设的语音识别学习算法识别语音信息,得到识别出的文本信息。
通信模块602还用于向计算机设备发送识别出的文本信息。
在一个实施例中,接收到的语音信息中包括多个语音片段;识别模块604用于:用于将各语音片段分别与各语音识别数据库中的各语音样本进行匹配度计算,将匹配度最高且高于预设阈值的语音样本所对应的文本字符作为该语音片段的识别结果,根据各语音片段的识别结果生成语音信息所对应的文本信息。
为了提升识别效率,识别模块604用于:根据预设的语音识别学习算法计算至少一个语音片段与多个预设的语音识别数据库中的语音样本的匹配度;将匹配度最高的语音样本所在的语音识别数据库设置为目标语音识别数据库;根据预设的语音识别学习算法将各字符的语音片段与目标语音识别库语音识别数据库中的语音样本进行匹配,获取各字符的语音片段对应的文本字符;根据各语音片段对应的文本字符生成文本信息。
在一个实施例中,语音识别装置60还包括判断模块,用于判断识别出的文本信息的字段格式与目标输入位置的规定字段格式是否一致,是则通信模块602向计算机设备发送识别出的文本信息,否则通信模块602向计算机设备发送提示信息,以提示用户输入错误。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机可读指令来指令相关的硬件来完成,所述的程序可存储于非易失性计算机可读取存储介质中,该程序在执行时,可包括如上述各方法的实施例的流程。其中,所述的存储介质可为磁碟、光盘、只读存储记忆体(Read-Only Memory,ROM)等。
以上所述实施例的各技术特征可以进行任意的组合,为使描述简洁,未对上述实施例中的各个技术特征所有可能的组合都进行描述,然而,只要这些技术特征的组合不存在矛盾,都应当认为是本说明书记载的范围。
以上所述实施例仅表达了本申请的几种实施方式,其描述较为具体和详细,但并不能因此而理解为对发明专利范围的限制。应当指出的是,对于本领域的普通技术人员来说,在不脱离本申请构思的前提下,还可以做出若干变形和改进,这些都属于本申请的保护范围。因此,本申请专利的保护范围应以所附权利要求为准。

Claims (20)

  1. 一种语音输入方法,包括:
    根据预设的语音采集指令采集语音信息;
    根据预设的语音识别学习算法识别所述语音信息,获取识别出的文本信息;
    确定所述文本信息的目标输入位置;及
    在所述目标输入位置输入所述文本信息的至少部分内容。
  2. 根据权利要求1所述的方法,其特征在于,所述文本信息中包括指示字段和待输入字段;
    所述确定所述文本信息的目标输入位置包括:确定与所述指示字段相关联的输入位置为目标输入位置;及
    所述在所述目标输入位置输入所述文本信息的至少部分内容包括:将所述待输入字段输入所述目标输入位置。
  3. 根据权利要求1所述的方法,其特征在于,在所述根据预设的语音采集指令采集语音信息的步骤之前,所述方法还包括:
    接收语音采集指令,所述语音采集指令中携带有目标输入位置的指示信息;及
    所述确定所述文本信息的目标输入位置包括:根据所述指示信息确定所述文本信息的目标输入位置。
  4. 根据权利要求1所述的方法,其特征在于,所述方法还包括:
    当所述语音信息对应的文本信息包括至少两个待选文本信息时,检测用户对一个所述待选文本信息的选择操作;及
    所述在所述目标输入位置输入所述文本信息的至少部分内容包括:将用户选择的所述待选文本信息中的至少部分内容输入所述目标输入位置。
  5. 根据权利要求1所述的方法,其特征在于,在所述目标输入位置输入所述文本信息的至少部分内容之后,所述方法还包括:
    将所述语音信息与所述文本信息关联存储为所述语音识别学习算法的新增样本;及
    根据所述新增样本更新所述语音识别学习算法。
  6. 根据权利要求1所述的方法,其特征在于,所述语音信息包括多个语音片段;所述根据预设的语音识别学习算法识别所述语音信息,获取识别出的文本信息包括:
    根据预设的语音识别学习算法计算至少一个语音片段与多个预设的语音识别数据库中的语音样本的匹配度;
    将所述匹配度最高的语音样本所在的语音识别数据库设置为目标语音识别数据库;
    根据所述语音识别学习算法将各所述语音片段与所述目标语音识别数据库中的语音样本进行匹配,获取各所述语音片段对应的文本字符;及
    根据各语音片段对应的文本字符生成所述文本信息。
  7. 一种语音输入装置,包括:
    采集模块,用于根据预设的语音采集指令采集语音信息;
    识别模块,用于根据预设的语音识别学习算法识别所述语音信息,获取识别出的文本信息;
    确定模块,用于确定所述文本信息的目标输入位置;及
    输入模块,用于在所述目标输入位置输入所述文本信息的至少部分内容。
  8. 根据权利要求7所述的装置,其特征在于,所述文本信息中包括指示字段和待输入字段;
    所述确定模块,用于确定与所述指示字段相关联的输入位置为目标输入位置;
    所述输入模块,用于将所述待输入字段输入所述目标输入位置。
  9. 一种计算机设备,包括存储器和一个或多个处理器,存储器中储存有计算机可读指令,计算机可读指令被处理器执行时,使得一个或多个处理器执行以下步骤:
    根据预设的语音采集指令采集语音信息;
    根据预设的语音识别学习算法识别所述语音信息,获取识别出的文本信息;
    确定所述文本信息的目标输入位置;及
    在所述目标输入位置输入所述文本信息的至少部分内容。
  10. 根据权利要求9所述的计算机设备,其特征在于,所述文本信息中包括指示字段和待输入字段;所述计算机可读指令被处理器执行时,使得一个或多个处理器还执行以下步骤:
    确定与所述指示字段相关联的输入位置为目标输入位置;及
    将所述待输入字段输入所述目标输入位置。
  11. 根据权利要求9所述的计算机设备,其特征在于,在所述根据预设的语音采集指令采集语音信息的步骤之前,所述计算机可读指令被处理器执行时,使得一个或多个处理器还执行以下步骤:
    接收语音采集指令,所述语音采集指令中携带有目标输入位置的指示信息;及
    根据所述指示信息确定所述文本信息的目标输入位置。
  12. 根据权利要求9所述的计算机设备,其特征在于,所述计算机可读指令被处理器执行时,使得一个或多个处理器还执行以下步骤:
    接收语音采集指令,所述语音采集指令中携带有目标输入位置的指示信息;及
    根据所述指示信息确定所述文本信息的目标输入位置。
  13. 根据权利要求9所述的计算机设备,其特征在于,所述计算机可读指令被处理器执行时,使得一个或多个处理器还执行以下步骤:
    当所述语音信息对应的文本信息包括至少两个待选文本信息时,检测用户对一个所述待选文本信息的选择操作;及
    将用户选择的所述待选文本信息中的至少部分内容输入所述目标输入位置。
  14. 根据权利要求9所述的计算机设备,其特征在于,所述语音信息包括多个语音片段;所述计算机可读指令被处理器执行时,使得一个或多个处理器还执行以下步骤:
    根据预设的语音识别学习算法计算至少一个语音片段与多个预设的语音 识别数据库中的语音样本的匹配度;
    将所述匹配度最高的语音样本所在的语音识别数据库设置为目标语音识别数据库;
    根据所述语音识别学习算法将各所述语音片段与所述目标语音识别数据库中的语音样本进行匹配,获取各所述语音片段对应的文本字符;及
    根据各语音片段对应的文本字符生成所述文本信息。
  15. 一个或多个存储有计算机可读指令的非易失性计算机可读存储介质,计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器执行以下步骤:
    根据预设的语音采集指令采集语音信息;
    根据预设的语音识别学习算法识别所述语音信息,获取识别出的文本信息;
    确定所述文本信息的目标输入位置;及
    在所述目标输入位置输入所述文本信息的至少部分内容。
  16. 根据权利要求15所述的非易失性计算机可读存储介质,其特征在于,所述文本信息中包括指示字段和待输入字段;所述计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器还执行以下步骤:
    确定与所述指示字段相关联的输入位置为目标输入位置;及
    将所述待输入字段输入所述目标输入位置。
  17. 根据权利要求15所述的非易失性计算机可读存储介质,其特征在于,在所述根据预设的语音采集指令采集语音信息的步骤之前,所述计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器还执行以下步骤:
    接收语音采集指令,所述语音采集指令中携带有目标输入位置的指示信息;及
    根据所述指示信息确定所述文本信息的目标输入位置。
  18. 根据权利要求15所述的非易失性计算机可读存储介质,其特征在于,所述计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器还执行以下步骤:
    接收语音采集指令,所述语音采集指令中携带有目标输入位置的指示信息;及
    根据所述指示信息确定所述文本信息的目标输入位置。
  19. 根据权利要求15所述的非易失性计算机可读存储介质,其特征在于,所述计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器还执行以下步骤:
    当所述语音信息对应的文本信息包括至少两个待选文本信息时,检测用户对一个所述待选文本信息的选择操作;及
    将用户选择的所述待选文本信息中的至少部分内容输入所述目标输入位置。
  20. 根据权利要求15所述的非易失性计算机可读存储介质,其特征在于,所述语音信息包括多个语音片段;所述计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器还执行以下步骤:
    根据预设的语音识别学习算法计算至少一个语音片段与多个预设的语音识别数据库中的语音样本的匹配度;
    将所述匹配度最高的语音样本所在的语音识别数据库设置为目标语音识别数据库;
    根据所述语音识别学习算法将各所述语音片段与所述目标语音识别数据库中的语音样本进行匹配,获取各所述语音片段对应的文本字符;及
    根据各语音片段对应的文本字符生成所述文本信息。
PCT/CN2018/096412 2017-08-02 2018-07-20 语音输入方法、装置、计算机设备和存储介质 WO2019024692A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710653319.8A CN107785021B (zh) 2017-08-02 2017-08-02 语音输入方法、装置、计算机设备和介质
CN201710653319.8 2017-08-02

Publications (1)

Publication Number Publication Date
WO2019024692A1 true WO2019024692A1 (zh) 2019-02-07

Family

ID=61438223

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/096412 WO2019024692A1 (zh) 2017-08-02 2018-07-20 语音输入方法、装置、计算机设备和存储介质

Country Status (2)

Country Link
CN (1) CN107785021B (zh)
WO (1) WO2019024692A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112951232A (zh) * 2021-03-02 2021-06-11 深圳创维-Rgb电子有限公司 语音输入方法、装置、设备及计算机可读存储介质

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107785021B (zh) * 2017-08-02 2020-06-02 深圳壹账通智能科技有限公司 语音输入方法、装置、计算机设备和介质
CN108491179A (zh) * 2018-03-13 2018-09-04 黄玉玲 一种文字输入的方法及系统
CN108417208B (zh) * 2018-03-26 2020-09-11 宇龙计算机通信科技(深圳)有限公司 一种语音输入方法和装置
CN110875034B (zh) * 2018-09-03 2024-03-22 嘉楠明芯(北京)科技有限公司 用于语音识别的模板训练方法、语音识别方法及其系统
CN109286554B (zh) * 2018-09-14 2021-07-13 腾讯科技(深圳)有限公司 社交应用中社交功能解锁方法及装置
CN111198936B (zh) * 2018-11-20 2023-09-15 北京嘀嘀无限科技发展有限公司 一种语音搜索方法、装置、电子设备及存储介质
CN109410923B (zh) * 2018-12-26 2022-06-10 中国联合网络通信集团有限公司 语音识别方法、装置、系统及存储介质
CN109920408B (zh) * 2019-01-17 2024-05-28 平安科技(深圳)有限公司 基于语音识别的字典项设置方法、装置、设备和存储介质
CN110569017A (zh) * 2019-09-12 2019-12-13 四川长虹电器股份有限公司 基于语音的文本输入方法
CN111126009A (zh) * 2019-12-12 2020-05-08 深圳追一科技有限公司 表单填写方法、装置、终端设备及存储介质
CN111782171A (zh) * 2020-06-22 2020-10-16 Oppo(重庆)智能科技有限公司 一种信息输入方法、装置、设备及存储介质
CN111883134B (zh) * 2020-07-24 2024-06-04 北京贝塔科技有限公司 一种语音输入方法、装置、电子设备及存储介质
CN112073785A (zh) * 2020-09-07 2020-12-11 深圳创维-Rgb电子有限公司 文字输入方法、装置、智能电视及计算机可读存储介质
CN112214997A (zh) * 2020-10-09 2021-01-12 深圳壹账通智能科技有限公司 语音信息录入方法、装置、电子设备及存储介质

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009236960A (ja) * 2008-03-25 2009-10-15 Nec Corp 音声認識装置、音声認識方法及びプログラム
CN103186247A (zh) * 2011-12-31 2013-07-03 北大方正集团有限公司 公式输入方法和系统
CN103428336A (zh) * 2012-05-17 2013-12-04 西安闻泰电子科技有限公司 手机语音输入方法
CN103914209A (zh) * 2014-03-28 2014-07-09 联想(北京)有限公司 一种信息处理方法及电子设备
CN105786797A (zh) * 2016-02-23 2016-07-20 北京云知声信息技术有限公司 一种基于语音输入的信息处理方法及装置
CN107785021A (zh) * 2017-08-02 2018-03-09 上海壹账通金融科技有限公司 语音输入方法、装置、计算机设备和介质
CN107945841A (zh) * 2017-11-28 2018-04-20 浙江和仁科技股份有限公司 基于语音输入的电子病历系统及利用该系统生成电子病历的方法
CN108091370A (zh) * 2017-12-15 2018-05-29 上海京颐科技股份有限公司 信息录入方法及装置、计算机可读存储介质

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103106186A (zh) * 2013-01-22 2013-05-15 百度在线网络技术(北京)有限公司 一种表单校验方法及系统
CN105931642B (zh) * 2016-05-31 2020-11-10 北京京东尚科信息技术有限公司 语音识别方法、设备及系统

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009236960A (ja) * 2008-03-25 2009-10-15 Nec Corp 音声認識装置、音声認識方法及びプログラム
CN103186247A (zh) * 2011-12-31 2013-07-03 北大方正集团有限公司 公式输入方法和系统
CN103428336A (zh) * 2012-05-17 2013-12-04 西安闻泰电子科技有限公司 手机语音输入方法
CN103914209A (zh) * 2014-03-28 2014-07-09 联想(北京)有限公司 一种信息处理方法及电子设备
CN105786797A (zh) * 2016-02-23 2016-07-20 北京云知声信息技术有限公司 一种基于语音输入的信息处理方法及装置
CN107785021A (zh) * 2017-08-02 2018-03-09 上海壹账通金融科技有限公司 语音输入方法、装置、计算机设备和介质
CN107945841A (zh) * 2017-11-28 2018-04-20 浙江和仁科技股份有限公司 基于语音输入的电子病历系统及利用该系统生成电子病历的方法
CN108091370A (zh) * 2017-12-15 2018-05-29 上海京颐科技股份有限公司 信息录入方法及装置、计算机可读存储介质

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112951232A (zh) * 2021-03-02 2021-06-11 深圳创维-Rgb电子有限公司 语音输入方法、装置、设备及计算机可读存储介质
CN112951232B (zh) * 2021-03-02 2024-06-04 深圳创维-Rgb电子有限公司 语音输入方法、装置、设备及计算机可读存储介质

Also Published As

Publication number Publication date
CN107785021B (zh) 2020-06-02
CN107785021A (zh) 2018-03-09

Similar Documents

Publication Publication Date Title
WO2019024692A1 (zh) 语音输入方法、装置、计算机设备和存储介质
US8751972B2 (en) Collaborative gesture-based input language
US20190251471A1 (en) Machine learning device
CN104834996B (zh) 一种填单方法及装置
US20120330662A1 (en) Input supporting system, method and program
US11164564B2 (en) Augmented intent and entity extraction using pattern recognition interstitial regular expressions
WO2020159572A1 (en) System and method for information extraction with character level features
US20190294912A1 (en) Image processing device, image processing method, and image processing program
CN108227565A (zh) 一种信息处理方法、终端及计算机可读介质
KR20190095099A (ko) 거래 시스템 에러 검출 방법, 장치, 저장 매체 및 컴퓨터 장치
CN110999264A (zh) 用于将消息内容集成到目标数据处理设备中的系统和方法
US20190027149A1 (en) Documentation tag processing system
US20230351409A1 (en) Intelligent merchant onboarding
KR102308062B1 (ko) 창업을 위한 정보를 제공하기 위한 전자 장치 및 그 동작 방법
CN109034199B (zh) 数据处理方法及装置、存储介质和电子设备
US11423219B2 (en) Generation and population of new application document utilizing historical application documents
JP2019133218A (ja) 帳票対応システム、帳票対応方法及び帳票対応プログラム
KR102019752B1 (ko) 컴퓨터 수행 가능한 ui/ux 전략제공방법 및 이를 수행하는 ui/ux 전략제공장치
CN115050042A (zh) 一种理赔资料录入方法、装置、计算机设备及存储介质
EP4386651A2 (en) Product identification assistance techniques in an electronic marketplace application
JP2017514225A (ja) コンテキスト依存型ワークフローのためのスマート光入出力(i/o)拡張部
TWM584476U (zh) 轉帳伺服系統
US11481859B2 (en) Methods and systems for scheduling a user transport
US20180137178A1 (en) Accessing data and performing a data processing command on the data with a single user input
TW202046217A (zh) 轉帳資料建立伺服端及轉帳資料建立方法、轉帳伺服系統及電腦程式產品

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18841975

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 29.05.2020)

122 Ep: pct application non-entry in european phase

Ref document number: 18841975

Country of ref document: EP

Kind code of ref document: A1