
CN108831441B - Training method and apparatus for a speech recognition model - Google Patents

Training method and apparatus for a speech recognition model

Info

Publication number
CN108831441B
CN108831441B (application CN201810433323.8A)
Authority
CN
China
Prior art keywords
text
matching degree
degree value
speech recognition
recognition model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810433323.8A
Other languages
Chinese (zh)
Other versions
CN108831441A (en)
Inventor
张卓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai ituzhian artificial intelligence Co.,Ltd.
Shanghai Yitu Information Technology Co ltd
Shanghai Yitu Technology Co ltd
Shenzhen Yitu Information Technology Co ltd
Original Assignee
Shanghai Map Intelligent Network Technology Co Ltd
Shenzhen Yi Chart Information Technology Co Ltd
Shanghai Is According To Figure Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Map Intelligent Network Technology Co Ltd, Shenzhen Yi Chart Information Technology Co Ltd, and Shanghai Is According To Figure Network Technology Co Ltd
Priority to CN201810433323.8A priority Critical patent/CN108831441B/en
Publication of CN108831441A publication Critical patent/CN108831441A/en
Application granted granted Critical
Publication of CN108831441B publication Critical patent/CN108831441B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063Training

Landscapes

  • Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The present application relates to the field of artificial intelligence, and in particular to a training method and apparatus for a speech recognition model. An embodiment of the application provides a training method for a speech recognition model, comprising: inputting speech into the speech recognition model; obtaining, from the output side of the speech recognition model, N texts corresponding to the speech; matching each of the N texts against multiple texts stored in a preset database to obtain N matching-degree values corresponding to the N texts; determining, according to the N matching-degree values and a preset condition, the text whose matching-degree value satisfies the preset condition as the target text; and training the speech recognition model with the speech and the target text as training data. Because the N texts produced by the speech recognition model can be filtered directly to determine the target text, manual annotation is no longer needed, which saves labor cost.

Description

Training method and apparatus for a speech recognition model
Technical field
Embodiments of the present application relate to the field of artificial intelligence, and in particular to a training method and apparatus for a speech recognition model.
Background technique
Artificial intelligence (AI) is a new technical science that studies and develops theories, methods, techniques and application systems for simulating and extending human intelligence. As a branch of computer science, research in the field of artificial intelligence includes robotics, speech recognition, image recognition and natural language processing. Among these, speech recognition is an important technology in the field of artificial intelligence and is applied in the internet, communications, smart home and many other industries.
To obtain a speech recognition model, a large amount of speech data and corresponding text data must be prepared for training. In the prior art, the text data is obtained in the following way: a large number of people are organized to listen to the speech data and write down the correct text. However, as algorithms and computing capacity advance, a speech recognition model can take ever more speech data and corresponding text data into training to improve its accuracy, which makes labor cost the bottleneck of resource investment.
Summary of the invention
Embodiments of the present application provide a training method and apparatus for a speech recognition model, for saving labor cost.
An embodiment of the present application provides a training method for a speech recognition model, comprising: inputting speech into the speech recognition model; obtaining, from the output side of the speech recognition model, N texts corresponding to the speech, N being a positive integer; matching each of the N texts against multiple texts stored in a preset database to obtain N matching-degree values corresponding one-to-one to the N texts; determining, according to the N matching-degree values and a preset condition, the text whose matching-degree value satisfies the preset condition as the target text; and training the speech recognition model with the speech and the target text as training data. Because the N texts produced by the speech recognition model can be filtered directly, using their N matching-degree values and the preset condition, to determine the target text, manual annotation is no longer needed to obtain the target text, which saves labor cost.
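The selection step described above can be sketched in a few lines. This is an illustrative sketch, not the patent's implementation; the function name and the use of an exact-occurrence count as the matching-degree value are assumptions for the example.

```python
def select_target_text(candidate_texts, database_texts, threshold):
    """Filter N candidate transcripts by their matching-degree values.

    The matching-degree value here is simply the number of database texts
    identical to the candidate (one variant the description allows).
    Candidates whose value does not exceed the threshold are dropped; among
    the survivors, the candidate with the largest value is the target text.
    """
    scored = [(t, sum(1 for d in database_texts if d == t)) for t in candidate_texts]
    kept = [(t, s) for t, s in scored if s > threshold]
    if not kept:
        return None  # no candidate satisfies the preset condition
    return max(kept, key=lambda pair: pair[1])[0]
```

Pairing the returned text with the speech that produced it yields one new training example without manual transcription.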
Optionally, matching each of the N texts against the multiple texts stored in a preset database to obtain the N matching-degree values, and determining, according to the N matching-degree values and the preset condition, the text whose matching-degree value satisfies the preset condition as the target text, comprises: matching each of the N texts against multiple texts stored in a first preset database to obtain N first matching-degree values corresponding one-to-one to the N texts; determining, from the N first matching-degree values, M first matching-degree values greater than a first threshold, M being a positive integer not greater than N; and determining the target text from the M texts corresponding to the M first matching-degree values.
Optionally, for each of the N texts, the first matching-degree value of the text is determined according to the number of texts stored in the first preset database that match the text.
Optionally, determining the target text from the M texts corresponding to the M first matching-degree values comprises: matching the M texts against multiple texts stored in a second preset database to obtain M second matching-degree values corresponding one-to-one to the M texts; determining, from the M second matching-degree values, K second matching-degree values greater than a second threshold, K being a positive integer not greater than M; and determining the target text from the K texts corresponding to the K second matching-degree values.
Optionally, for each of the M texts, the second matching-degree value of the text is determined according to the number of texts stored in the second preset database that match the text, and the data in the second preset database includes training data corresponding to the output side of the speech recognition model.
An embodiment of the present application provides a training apparatus for a speech recognition model, comprising: an acquiring unit, configured to input speech into the speech recognition model and obtain, from the output side of the speech recognition model, N texts corresponding to the speech, N being a positive integer; a determination unit, configured to match each of the N texts against multiple texts stored in a preset database to obtain N matching-degree values corresponding one-to-one to the N texts, and to determine, according to the N matching-degree values and a preset condition, the text whose matching-degree value satisfies the preset condition as the target text; and a training unit, configured to train the speech recognition model with the speech and the target text as training data.
Optionally, the determination unit is specifically configured to: match each of the N texts against multiple texts stored in a first preset database to obtain N first matching-degree values corresponding one-to-one to the N texts; determine, from the N first matching-degree values, M first matching-degree values greater than a first threshold, M being a positive integer not greater than N; and determine the target text from the M texts corresponding to the M first matching-degree values.
Optionally, for each of the N texts, the first matching-degree value of the text is determined according to the number of texts stored in the first preset database that match the text.
Optionally, the determination unit is specifically configured to match the M texts corresponding to the M first matching-degree values against multiple texts stored in a second preset database to obtain M second matching-degree values corresponding one-to-one to the M texts, determine from the M second matching-degree values K second matching-degree values greater than a second threshold, K being a positive integer not greater than M, and determine the target text from the K texts corresponding to the K second matching-degree values.
Optionally, for each of the M texts, the second matching-degree value of the text is determined according to the number of texts stored in the second preset database that match the text, and the data in the second preset database includes training data corresponding to the output side of the speech recognition model.
An embodiment of the present application provides a computer storage medium storing computer-executable instructions that, when invoked by a computer, cause the computer to execute the above method.
An embodiment of the present application provides a computer program product comprising instructions that, when run on a computer, cause the computer to execute the above method.
Detailed description of the invention
To explain the technical solutions in the embodiments of the present application more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of a speech recognition model provided by an embodiment of the present application;
Fig. 2 is a flow diagram of a training method for a speech recognition model provided by an embodiment of the present application;
Fig. 3 is a flow diagram of a training method for a speech recognition model provided by an embodiment of the present application;
Fig. 4 is a structural schematic diagram of a training apparatus for a speech recognition model provided by an embodiment of the present application.
Specific embodiment
To make the purpose, technical solutions and beneficial effects of the embodiments of the present application clearer, the embodiments are further described below with reference to the drawings. It should be understood that the specific embodiments described here are only intended to explain, not to limit, the embodiments of the present application.
In the embodiments of the present application, the training of the speech recognition model can be divided into two stages: the first stage obtains the speech recognition model, and the second stage trains that model further. In an optional embodiment of obtaining the speech recognition model, the speech needed for the first stage and the text corresponding to that speech are prepared first. The speech needed for the first stage may be recorded with a recording device or downloaded directly from the internet, and the corresponding text may be written down manually by staff after listening to the speech. Obtained in this way, the matching degree between the first-stage speech and its corresponding text is very high.
Optionally, the speech needed for the first stage is taken as input X0 and the corresponding text as output Y0, and the speech recognition model is trained from them. Fig. 1 is a schematic diagram of a speech recognition model applicable to an embodiment of the present application; since the inputs and outputs are known, the speech recognition model can be obtained. Because the speech recognition model obtained in the first stage can be regarded as an initial model, and the speech and text used are limited in quantity, more speech and text are needed to train the model so that it can be applied to different scenarios.
The second stage of the application, i.e. training the speech recognition model, is described below. Fig. 2 is a flow diagram of a training method for a speech recognition model applicable to an embodiment of the present application, comprising:
Step 201: input speech into the speech recognition model and obtain, from the output side of the speech recognition model, N texts corresponding to the speech, N being a positive integer;
Step 202: match each of the N texts against multiple texts stored in a preset database to obtain N matching-degree values corresponding one-to-one to the N texts;
Step 203: according to the N matching-degree values and a preset condition, determine the text whose matching-degree value satisfies the preset condition as the target text;
Step 204: train the speech recognition model with the speech and the target text as training data.
In the embodiments of the present application, the speech recognition model can be trained on a terminal device or on a server; therefore, the execution subject of steps 201 to 204 may be a terminal device or a server.
The speech input into the speech recognition model in step 201 is the second-stage speech X1, which is different from the first-stage speech X0; for example, the texts corresponding to the speech of the two stages are different, and the volumes of speech in the two stages are different. After a speech input enters the input side of the speech recognition model, N texts Y11, Y12, Y13, ..., Y1N can be obtained from the output side. In other words, one speech input may correspond to one or more texts. This is because the speech recognition model is still the initial model and cannot uniquely determine one output value, so all possible texts are taken as output. Optionally, the N texts corresponding to one speech input may differ in only a few words.
In the embodiments of the present application, the volume of second-stage speech can be very large. Therefore, before inputting it into the speech recognition model, the terminal device or server can cut the speech into several speech segments according to a preset duration. For example, if the total duration of the speech is 10 minutes, it can be cut into 5-second segments. The terminal device or server can then input the segments into the speech recognition model one by one, and each segment yields its corresponding N texts.
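The segmentation step above can be illustrated as follows. This is a hypothetical sketch assuming the utterance is available as a flat sequence of PCM samples; it is not code from the patent.

```python
def split_speech(samples, sample_rate, segment_seconds=5):
    """Cut a long utterance into fixed-duration segments (e.g. 5 s each)
    so they can be fed to the speech recognition model one by one.
    The final segment may be shorter than segment_seconds."""
    seg_len = sample_rate * segment_seconds
    return [samples[i:i + seg_len] for i in range(0, len(samples), seg_len)]
```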
Since the training samples included in the first stage of the speech recognition model are insufficient, the N texts Y11, Y12, Y13, ..., Y1N obtained for speech X1 can be filtered to determine the target text.
In a first optional embodiment of determining the target text, each of the N texts can be matched against multiple texts stored in a first preset database to obtain N first matching-degree values corresponding one-to-one to the N texts. M first matching-degree values greater than a first threshold are determined from the N first matching-degree values, M being a positive integer not greater than N, and the target text is determined from the M texts corresponding to those M first matching-degree values. Optionally, the first preset database can be a database obtained by collecting texts on the internet, such as public chat records, texts shared on personal blogs, or copywriting. The first matching-degree value of each of the N texts is determined according to the number of texts stored in the first preset database that match the text.
For example, suppose a speech input corresponds to 3 texts: text No. 1 Y11, text No. 2 Y12 and text No. 3 Y13.
In an optional embodiment, suppose a text identical to text No. 1 Y11 occurs 10000 times in the first preset database; then the first matching-degree value of text No. 1 can be 10000. Suppose a text identical to text No. 2 Y12 occurs 500 times in the first preset database; then the first matching-degree value of text No. 2 is 500. Suppose a text identical to text No. 3 Y13 occurs 2000 times in the first preset database; then the first matching-degree value of text No. 3 is 2000. If the first threshold is set to 1000, the first matching-degree values of texts No. 1 and No. 3 are greater than the first threshold, and the server or terminal device can determine the target text from these 2 texts. Optionally, the text with the larger first matching-degree value (text No. 1) can be determined as the target text, or both texts can be determined as target texts.
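Counting exact occurrences, as in this example, can be done with a `Counter`. This is a sketch under the assumption that the first preset database fits in memory as a list of strings.

```python
from collections import Counter

def first_matching_degree(candidates, first_database):
    """First matching-degree value of each candidate text: the number of
    texts in the first preset database identical to the candidate."""
    counts = Counter(first_database)
    return {text: counts[text] for text in candidates}
```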
In another optional embodiment, if 12000 texts in the first preset database match text No. 1 with a matching degree that satisfies a preset probability, then the first matching-degree value of text No. 1 can be 12000. Specifically, suppose text No. 1 has 20 words and the preset probability is 95%; that is, each of the 12000 texts must contain 19 words that are identical to 19 words of text No. 1 and appear at the same positions in that text as the 19 words do in text No. 1. Similarly, if 700 texts in the first preset database match text No. 2 with a matching degree that satisfies the preset probability, the first matching-degree value of text No. 2 can be 700; if 3000 texts match text No. 3, the first matching-degree value of text No. 3 can be 3000. If the first threshold is set to 1500, the first matching-degree values of texts No. 1 and No. 3 are greater than the first threshold, and the server or terminal device can determine the target text from these 2 texts. Optionally, the text with the larger first matching-degree value (text No. 1) can be determined as the target text, or both texts can be determined as target texts.
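The 95% positional criterion in this variant can be written out directly. A sketch only; word-level tokenization by whitespace is an assumption, as the patent does not fix a tokenization.

```python
def satisfies_preset_probability(candidate, db_text, prob=0.95):
    """True if the database text agrees with the candidate at a fraction
    of word positions at least equal to the preset probability.
    With a 20-word candidate and prob=0.95, at least 19 words must match
    in both content and position."""
    a, b = candidate.split(), db_text.split()
    if len(a) != len(b):
        return False
    same = sum(1 for x, y in zip(a, b) if x == y)
    return same / len(a) >= prob
```

The first matching-degree value of a candidate is then the count of database texts for which this predicate holds.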
In a second optional embodiment, after the server or terminal device determines the M texts from the N texts according to their first matching-degree values as in the first embodiment, it can also match the M texts corresponding to the M first matching-degree values against multiple texts stored in a second preset database to obtain M second matching-degree values corresponding one-to-one to the M texts; determine, from the M second matching-degree values, K second matching-degree values greater than a second threshold, K being a positive integer not greater than M; and determine the target text from the K texts corresponding to the K second matching-degree values. Optionally, the second matching-degree value of each of the M texts is determined according to the number of texts stored in the second preset database that match the text, where the data in the second preset database includes training data corresponding to the output side of the speech recognition model.
In the embodiments of the present application, since determining K texts from the M texts still belongs to the second stage of the application, i.e. the stage of training the speech recognition model (in other words, the preparation stage before training), the data in the second preset database can be the training data Y0 at the output side of the first-stage speech recognition model. The server or terminal device can obtain the second matching-degree value of each of the M texts according to the second preset database. Optionally, the more texts a given one of the M texts matches among all texts of the second preset database, the larger its second matching-degree value; the fewer it matches, the smaller its second matching-degree value. Optionally, the second matching-degree values of the M texts sum to 1.
Following the above example, texts No. 1 and No. 3 are determined from the 3 texts. The server or terminal device matches texts No. 1 and No. 3 against the multiple texts stored in the second preset database. Suppose the second matching-degree value of text No. 1 is 80% and that of text No. 3 is 20%; if the second threshold is 60%, the second matching-degree value of text No. 1 is greater than the second threshold, so text No. 1 can be determined as the target text. Suppose instead the second matching-degree value of text No. 1 is 52% and that of text No. 3 is 48%; if the second threshold is 45%, the K texts are texts No. 1 and No. 3, and the target text is determined from them. Optionally, the text with the larger second matching-degree value is determined as the target text.
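The second matching-degree values in these examples are normalized shares that sum to 1. A sketch, assuming raw per-candidate match counts against the second preset database are already available; the function name is an assumption.

```python
def second_matching_degree(match_counts):
    """Normalize per-candidate match counts against the second preset
    database so that the values over the surviving texts sum to 1."""
    total = sum(match_counts.values())
    if total == 0:
        return {t: 0.0 for t in match_counts}
    return {t: c / total for t, c in match_counts.items()}
```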
In a third optional embodiment, before the server or terminal device outputs the N texts from the output side of the speech recognition model, each of the N texts can be matched against the multiple texts stored in the second preset database to obtain the second matching-degree value of each of the N texts, where the second matching-degree values of the N texts sum to 1. For example, among the 3 texts corresponding to a speech input, the second matching-degree value of text No. 1 Y11 is 70%, that of text No. 2 Y12 is 20%, and that of text No. 3 Y13 is 10%. If the second threshold is 60%, text No. 1 is retained and then matched against the multiple texts stored in the first preset database to obtain its first matching-degree value. If the first matching-degree value of text No. 1 is 3000, greater than the first threshold of 2000, text No. 1 is determined as the target text, i.e. the text corresponding to the speech.
In the embodiments of the present application, if the N texts first go through the comparison of the first matching-degree value with the first threshold and then the comparison of the second matching-degree value with the second threshold, yielding multiple texts, the text corresponding to the larger second matching-degree value can be determined as the target text. If the N texts first go through the comparison of the second matching-degree value with the second threshold and then the comparison of the first matching-degree value with the first threshold, yielding multiple texts, the text corresponding to the larger first matching-degree value can be determined as the target text.
In step 204, speech X1 and the target text can be used as training data to train the speech recognition model; alternatively, speech X1 and speech X0 together with the target text and text Y0 can be used as training data to train the speech recognition model.
In the prior art, all texts must be written down after someone listens to the speech. The present application obtains the texts corresponding to the second-stage speech through the speech recognition model obtained in the first stage, deletes the texts that do not meet the requirements through the above embodiments, and trains the speech recognition model with the qualifying texts and speech as new training data. Although the accuracy at the beginning is not as good as human transcription, as the number of training iterations increases, the accuracy of the speech recognition model becomes higher and higher, reaching the required accuracy.
Fig. 3 is a flow diagram of a training method for a speech recognition model applicable to an embodiment of the present application, comprising:
Step 301: input speech into the speech recognition model and obtain, from the output side of the speech recognition model, N texts corresponding to the speech, N being a positive integer;
Step 302: match each of the N texts against multiple texts stored in a first preset database to obtain N first matching-degree values corresponding to the N texts;
Step 303: compare the N first matching-degree values with a first threshold; if M of the N first matching-degree values are greater than the first threshold, go to step 304; if N-M first matching-degree values are not greater than the first threshold, delete the N-M texts corresponding to those N-M first matching-degree values;
Step 304: match the M texts against multiple texts stored in a second preset database to obtain M second matching-degree values corresponding to the M texts;
Step 305: compare the M second matching-degree values with a second threshold; if K of the M second matching-degree values are greater than the second threshold, go to step 306; if M-K second matching-degree values are not greater than the second threshold, delete the M-K texts corresponding to those M-K second matching-degree values;
Step 306: determine the text corresponding to the largest of the K second matching-degree values as the target text;
Step 307: train the speech recognition model with the speech and the target text as training data.
After step 204 or step 307, the speech recognition model can also be trained again according to steps 201-204 or steps 301-307, so that the model contains more and more samples and the accuracy of the text obtained after inputting speech becomes higher and higher.
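The two-stage filter of steps 301-307 can be condensed into one function. A hypothetical sketch; the scoring functions are passed in as parameters because the patent leaves their exact form open.

```python
def two_stage_filter(candidates, first_score, second_score, thr1, thr2):
    """Steps 303-306: drop texts failing the first threshold, score the
    survivors against the second preset database, drop those failing the
    second threshold, and return the text with the largest second value."""
    stage1 = [t for t in candidates if first_score[t] > thr1]        # step 303
    second = second_score(stage1)                                    # step 304
    stage2 = [t for t in stage1 if second[t] > thr2]                 # step 305
    return max(stage2, key=lambda t: second[t]) if stage2 else None  # step 306
```

The returned text, paired with the input speech, becomes the training example used in step 307.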
Based on the above embodiments and the same idea, Fig. 4 shows a structural schematic diagram of a training apparatus for a speech recognition model provided by an embodiment of the present application. As shown in Fig. 4, the training apparatus 400 may include an acquiring unit 401, a determination unit 402 and a training unit 403.
An embodiment of the present application provides a training apparatus for a speech recognition model, comprising: an acquiring unit, configured to input speech into the speech recognition model and obtain, from the output side of the speech recognition model, N texts corresponding to the speech, N being a positive integer; a determination unit, configured to match each of the N texts against multiple texts stored in a preset database to obtain N matching-degree values corresponding one-to-one to the N texts, and to determine, according to the N matching-degree values and a preset condition, the text whose matching-degree value satisfies the preset condition as the target text; and a training unit, configured to train the speech recognition model with the speech and the target text as training data.
In an optional embodiment, the determination unit is specifically configured to: match each of the N texts against the multiple texts stored in a first preset database to obtain N first matching degree values corresponding one to one to the N texts; determine, from the N first matching degree values, M first matching degree values greater than a first threshold, M being a positive integer no greater than N; and determine the target text from the M texts corresponding to the M first matching degree values.
In an optional embodiment, for each of the N texts, the first matching degree value corresponding to the text is determined according to the number of the multiple texts stored in the first preset database that match the text.
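The counting rule above leaves "match" undefined. One plausible reading, sketched below, is to count stored texts whose similarity to the candidate exceeds a cutoff; the use of `difflib.SequenceMatcher` and the `cutoff` value are assumptions for illustration, not part of the patent.

```python
import difflib


def first_matching_degree(text, stored_texts, cutoff=0.8):
    """Illustrative first matching degree value: the number of stored
    texts that 'match' the candidate.  Here a stored text matches when
    its difflib similarity ratio to the candidate is at least `cutoff`
    (an assumed criterion; the patent does not define matching)."""
    def similar(a, b):
        return difflib.SequenceMatcher(None, a, b).ratio() >= cutoff

    return sum(1 for stored in stored_texts if similar(text, stored))
```

Under this reading, a candidate text scores higher the more database entries resemble it, so the first-threshold filter in the method keeps only candidates that are well supported by the database.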
In an optional embodiment, the determination unit is specifically configured to: match the M texts corresponding to the M first matching degree values against the multiple texts stored in a second preset database to obtain M second matching degree values corresponding one to one to the M texts; determine, from the M second matching degree values, K second matching degree values greater than a second threshold, K being a positive integer no greater than M; and determine the target text from the K texts corresponding to the K second matching degree values.
In an optional embodiment, for each of the M texts, the second matching degree value corresponding to the text is determined according to the number of the multiple texts stored in the second preset database that match the text, and the data in the second preset database include the training data corresponding to the output side of the speech recognition model.
For a detailed description of the training device for a speech recognition model provided by the embodiments of the present application, reference may be made to the training method for a speech recognition model provided by the above embodiments, which is not repeated here.
It should be noted that the division of units in the embodiments of the present application is schematic and is merely a logical functional division; there may be other division manners in actual implementation. The functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
In the above embodiments, the implementation may be wholly or partly by software, hardware, firmware or any combination thereof. When implemented with a software program, it may be realized wholly or partly in the form of a computer program product. A computer program product includes one or more instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network or another programmable device. The instructions may be stored in a computer storage medium, or transmitted from one computer storage medium to another; for example, the instructions may be transmitted from one web site, computer, server or data center to another web site, computer, server or data center by wired (such as coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (such as infrared, radio, microwave) means. The computer storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium (for example, floppy disk, hard disk, tape, magneto-optical disk (MO)), an optical medium (for example, CD, DVD, BD, HVD), or a semiconductor medium (such as ROM, EPROM, EEPROM, non-volatile memory (NAND FLASH), solid state disk (Solid State Disk, SSD)). Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system (or device) or a computer program product. Therefore, the embodiments of the present application may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical memory) containing computer-usable program code.
The embodiments of the present application are described with reference to flowcharts and/or block diagrams of the method, apparatus and computer program product according to the embodiments of the present application. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks therein, can be implemented by instructions. These instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or other programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a device for realizing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
These instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a specific way, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, which realizes the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
These instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thus provide steps for realizing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
Obviously, those skilled in the art can make various modifications and variations to the embodiments of the present application without departing from the spirit and scope of the present application. In this way, if these modifications and variations of the embodiments of the present application fall within the scope of the claims of the present application and their equivalent technologies, the present application is also intended to include these modifications and variations.

Claims (8)

1. A training method for a speech recognition model, characterized by comprising:
inputting voice into the speech recognition model, and obtaining N texts corresponding to the voice from the output side of the speech recognition model, wherein N is a positive integer;
matching each of the N texts against multiple texts stored in a preset database to obtain N matching degree values corresponding to the N texts, wherein the N texts and the N matching degree values correspond one to one;
determining, according to the N matching degree values and a preset condition, the text corresponding to the matching degree value that meets the preset condition among the N matching degree values as a target text, which specifically comprises:
matching each of the N texts against multiple texts stored in a first preset database to obtain N first matching degree values corresponding to the N texts, wherein the N texts and the N first matching degree values correspond one to one;
determining, from the N first matching degree values, M first matching degree values greater than a first threshold, wherein M is a positive integer no greater than N; and
determining the target text from the M texts corresponding to the M first matching degree values; and
using the voice and the target text as training data of the speech recognition model, and training the speech recognition model.
2. The method according to claim 1, characterized in that, for each of the N texts, the first matching degree value corresponding to the text is determined according to the number of the multiple texts stored in the first preset database that match the text.
3. The method according to claim 1, characterized in that determining the target text from the M texts corresponding to the M first matching degree values comprises:
matching the M texts corresponding to the M first matching degree values against multiple texts stored in a second preset database to obtain M second matching degree values corresponding to the M texts, wherein the M texts and the M second matching degree values correspond one to one;
determining, from the M second matching degree values, K second matching degree values greater than a second threshold, wherein K is a positive integer no greater than M; and
determining the target text from the K texts corresponding to the K second matching degree values.
4. The method according to claim 3, characterized in that, for each of the M texts, the second matching degree value corresponding to the text is determined according to the number of the multiple texts stored in the second preset database that match the text; and
the data in the second preset database include the training data corresponding to the output side of the speech recognition model.
5. A training device for a speech recognition model, characterized by comprising:
an acquiring unit, configured to input voice into the speech recognition model and obtain N texts corresponding to the voice from the output side of the speech recognition model, wherein N is a positive integer;
a determination unit, configured to match each of the N texts against multiple texts stored in a preset database to obtain N matching degree values corresponding to the N texts, wherein the N texts and the N matching degree values correspond one to one, and to determine, according to the N matching degree values and a preset condition, the text corresponding to the matching degree value that meets the preset condition among the N matching degree values as a target text; the determination unit being specifically configured to: match each of the N texts against multiple texts stored in a first preset database to obtain N first matching degree values corresponding to the N texts, wherein the N texts and the N first matching degree values correspond one to one; determine, from the N first matching degree values, M first matching degree values greater than a first threshold, wherein M is a positive integer no greater than N; and determine the target text from the M texts corresponding to the M first matching degree values; and
a training unit, configured to use the voice and the target text as training data of the speech recognition model and to train the speech recognition model.
6. The device according to claim 5, characterized in that, for each of the N texts, the first matching degree value corresponding to the text is determined according to the number of the multiple texts stored in the first preset database that match the text.
7. The device according to claim 6, characterized in that the determination unit is specifically configured to:
match the M texts corresponding to the M first matching degree values against multiple texts stored in a second preset database to obtain M second matching degree values corresponding to the M texts, wherein the M texts and the M second matching degree values correspond one to one; determine, from the M second matching degree values, K second matching degree values greater than a second threshold, wherein K is a positive integer no greater than M; and determine the target text from the K texts corresponding to the K second matching degree values.
8. The device according to claim 7, characterized in that, for each of the M texts, the second matching degree value corresponding to the text is determined according to the number of the multiple texts stored in the second preset database that match the text; and the data in the second preset database include the training data corresponding to the output side of the speech recognition model.
CN201810433323.8A 2018-05-08 2018-05-08 A kind of training method and device of speech recognition modeling Active CN108831441B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810433323.8A CN108831441B (en) 2018-05-08 2018-05-08 A kind of training method and device of speech recognition modeling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810433323.8A CN108831441B (en) 2018-05-08 2018-05-08 A kind of training method and device of speech recognition modeling

Publications (2)

Publication Number Publication Date
CN108831441A CN108831441A (en) 2018-11-16
CN108831441B true CN108831441B (en) 2019-08-13

Family

ID=64148524

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810433323.8A Active CN108831441B (en) 2018-05-08 2018-05-08 A kind of training method and device of speech recognition modeling

Country Status (1)

Country Link
CN (1) CN108831441B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109830229A (en) * 2018-12-11 2019-05-31 平安科技(深圳)有限公司 Audio corpus intelligence cleaning method, device, storage medium and computer equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103165129A (en) * 2011-12-13 2013-06-19 北京百度网讯科技有限公司 Method and system for optimizing voice recognition acoustic model
CN103187052A (en) * 2011-12-29 2013-07-03 北京百度网讯科技有限公司 Method and device for establishing linguistic model for voice recognition
CN106558306A (en) * 2015-09-28 2017-04-05 广东新信通信息系统服务有限公司 Method for voice recognition, device and equipment
CN107657947A (en) * 2017-09-20 2018-02-02 百度在线网络技术(北京)有限公司 Method of speech processing and its device based on artificial intelligence

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7529668B2 (en) * 2004-08-03 2009-05-05 Sony Corporation System and method for implementing a refined dictionary for speech recognition

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103165129A (en) * 2011-12-13 2013-06-19 北京百度网讯科技有限公司 Method and system for optimizing voice recognition acoustic model
CN103187052A (en) * 2011-12-29 2013-07-03 北京百度网讯科技有限公司 Method and device for establishing linguistic model for voice recognition
CN106558306A (en) * 2015-09-28 2017-04-05 广东新信通信息系统服务有限公司 Method for voice recognition, device and equipment
CN107657947A (en) * 2017-09-20 2018-02-02 百度在线网络技术(北京)有限公司 Method of speech processing and its device based on artificial intelligence

Also Published As

Publication number Publication date
CN108831441A (en) 2018-11-16

Similar Documents

Publication Publication Date Title
CN110807515B (en) Model generation method and device
CN112699991B (en) Method, electronic device, and computer-readable medium for accelerating information processing for neural network training
CN109492128B (en) Method and apparatus for generating a model
CN108734293B (en) Task management system, method and device
US11521038B2 (en) Electronic apparatus and control method thereof
US20180336207A1 (en) Data clustering
CN110019263B (en) Information storage method and device
CN112541353A (en) Video generation method, device, equipment and medium
CN109829164B (en) Method and device for generating text
CN109977905B (en) Method and apparatus for processing fundus images
CN110147433A (en) A kind of text template extracting method based on dictionary tree
CN110399499A (en) Corpus generation method and device, electronic equipment and readable storage medium
CN109214785A (en) Implementation method, server and the system of workflow
US20220156503A1 (en) Reinforcement Learning Techniques for Automated Video Summarization
CN108831441B (en) A kind of training method and device of speech recognition modeling
WO2022134946A1 (en) Model training method, apparatus, storage medium, and device
CN114462582A (en) Data processing method, device and equipment based on convolutional neural network model
CN110717539A (en) Dimension reduction model training method, retrieval method and device based on artificial intelligence
US20200012649A1 (en) System and method for adaptive information storage management
WO2024139703A1 (en) Object recognition model updating method and apparatus, electronic device, storage medium, and computer program product
CN113741864A (en) Automatic design method and system of semantic service interface based on natural language processing
CN113568888A (en) Index recommendation method and device
CN111353860A (en) Product information pushing method and system
CN112182111A (en) Block chain based distributed system layered processing method and electronic equipment
US20210124748A1 (en) System and a method for resource data classification and management

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 200233 Yizhou Road, Xuhui District, Shanghai, 180, 1, first, 01, 02 rooms.

Co-patentee after: Shanghai ituzhian artificial intelligence Co.,Ltd.

Patentee after: SHANGHAI YITU TECHNOLOGY Co.,Ltd.

Co-patentee after: SHENZHEN YITU INFORMATION TECHNOLOGY Co.,Ltd.

Address before: 200233 Yizhou Road, Xuhui District, Shanghai, 180, 1, first, 01, 02 rooms.

Co-patentee before: SHANGHAI TUZHIAN NETWORK TECHNOLOGY Co.,Ltd.

Patentee before: SHANGHAI YITU TECHNOLOGY Co.,Ltd.

Co-patentee before: SHENZHEN YITU INFORMATION TECHNOLOGY Co.,Ltd.

TR01 Transfer of patent right

Effective date of registration: 20190926

Address after: Room 1901E, 488 Yaohua Road, Pudong New Area, Shanghai 201125

Patentee after: Shanghai Yitu Information Technology Co.,Ltd.

Address before: 200233 Yizhou Road, Xuhui District, Shanghai, 180, 1, first, 01, 02 rooms.

Co-patentee before: Shanghai ituzhian artificial intelligence Co.,Ltd.

Patentee before: SHANGHAI YITU TECHNOLOGY Co.,Ltd.

Co-patentee before: SHENZHEN YITU INFORMATION TECHNOLOGY Co.,Ltd.
