CN111916057A - Language identification method and device, electronic equipment and computer readable storage medium - Google Patents
Language identification method and device, electronic equipment and computer readable storage medium Download PDFInfo
- Publication number
- CN111916057A CN111916057A CN202010569842.4A CN202010569842A CN111916057A CN 111916057 A CN111916057 A CN 111916057A CN 202010569842 A CN202010569842 A CN 202010569842A CN 111916057 A CN111916057 A CN 111916057A
- Authority
- CN
- China
- Prior art keywords
- language
- target
- module
- identification model
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/005—Language recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
Abstract
An embodiment of the present application provides a language identification method and apparatus, an electronic device, and a computer-readable storage medium. The method comprises the following steps: identifying the acquired language as a first target language, wherein the acquired language is voice information; matching a language identification model corresponding to the first target language category, and judging the content of the first target language according to the language identification model; and outputting a matched first target answer according to the judged language content. With this technical scheme, a language identification model can be constructed using a neural network, the language category can be identified more accurately, and a response can be output in the same category of language as the input, improving the user's product experience and the system's affinity.
Description
Technical Field
The invention relates to the technical field of intelligent decision making, and in particular to a language identification method, a language identification apparatus, an electronic device, and a computer-readable storage medium.
Background
The traditional mode of manual customer-service dialing struggles to meet the business scenarios of many companies because of its low efficiency and high cost. With the development of artificial intelligence and natural language understanding technology, and with progress in traditional outbound-call technology, intelligent outbound systems have gradually replaced manual customer-service dialing in many traditional service scenarios thanks to higher concurrency and lower cost overhead. However, for user groups spread over multiple geographic regions with complex dialect situations, a single speech recognition model has a low recognition rate across different dialects, and the intelligent outbound system cannot cope well with multi-dialect scenarios.
Disclosure of Invention
The present application aims to solve at least one of the above technical drawbacks. The technical scheme adopted by the application is as follows:
in a first aspect, an embodiment of the present application provides a language identification method, where the method includes:
identifying the acquired language as a first target language; wherein the acquired language is voice information;
matching a language identification model corresponding to the first target language type, and judging the content of the first target language according to the language identification model;
and outputting the matched first target answer according to the judged language content.
Optionally, the language category includes a national language or a local dialect.
Optionally, the language identification model includes: pre-acquiring training data of at least one language;
and carrying out training processing by utilizing a convolutional neural network model and the training data of the at least one language to obtain the language identification model.
Optionally, the obtaining the language identification model by performing the training process using the convolutional neural network model and the training data of the at least one language further includes:
converting the training data of the at least one language into two-dimensional spectrograms, and generating a training set and a test set respectively;
inputting the two-dimensional spectrogram of the training set into an initialized convolutional neural network model for model training to form a language identification model;
and testing the language identification model by using a regression classifier and the two-dimensional spectrograms of the test set.
Optionally, the outputting the matched first target answer according to the determined language content specifically includes:
according to the judged language content, the language identification model outputs a second target answer in a matched text form;
processing the second target answer in the text form into a first target answer; wherein the first target answer is a voice of the same kind as the first target language.
In a second aspect, an embodiment of the present invention provides a language identification apparatus, comprising: an input module, an identification module, a matching module, a judgment module, and an output module; wherein,
the recognition module is used for recognizing the language acquired by the input module as a first target language; wherein the acquired language is voice information;
the matching module is used for matching the language identification model corresponding to the first target language type;
the judging module is used for judging the content of the first target language according to the language identification model;
and the output module is used for outputting the matched first target answer according to the judged language content.
Optionally, the language category includes a national language or a local dialect.
Optionally, the language identification model includes:
the input module acquires training data of at least one language in advance;
and carrying out training processing by utilizing a convolutional neural network model and the training data of the at least one language to obtain the language identification model.
Optionally, the apparatus further comprises a language processing module;
according to the language content judged by the judging module, the output module outputs a second target answer in a matched text form;
the language processing module is used for processing the second target answer in the text form into a first target answer; wherein the first target answer is a voice of the same kind as the first target language.
In a third aspect, an embodiment of the present invention provides an electronic device, including a processor and a memory;
the memory is used for storing operation instructions;
the processor is used for executing the language identification method by calling the operation instruction.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium having a computer program stored thereon which, when executed by a processor, implements the above language identification method.
The technical scheme provided by the embodiment of the application has the following beneficial effects:
according to the scheme provided by the embodiment of the application, the acquired language is identified as the first target language; wherein the acquired language is voice information; matching a language identification model corresponding to the first target language type, and judging the content of the first target language according to the language identification model; and outputting the matched first target answer according to the judged language content. Based on the scheme, the language identification model can be constructed by utilizing the neural network, the language type can be identified more accurately, the language of the same type as the input language is output to respond, and the product experience and affinity of a user are improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.
Fig. 1 is a schematic flowchart of a language identification method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a method for dialect recognition using a convolutional neural network according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of an intelligent outbound system design based on dialect type identification according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a language identification apparatus according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term "and/or" includes all or any combination of one or more of the associated listed items.
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
A current intelligent outbound system can be configured with only one language identification model, for example a unified Mandarin recognition model. But when calling dialect users in places such as Sichuan, Shanghai, or Guangdong, many users speak the local dialect, and the unified Mandarin model's recognition rate for these dialects is low, which disrupts the question-and-answer business flow and seriously harms the user experience. Moreover, training a single model that can recognize multiple dialects simultaneously is difficult, and its accuracy cannot be guaranteed. The intelligent outbound system therefore needs a solution that can handle various dialects, Mandarin, and even foreign languages at the same time while ensuring a high recognition rate. Aimed at these problems, the invention designs an intelligent outbound method and system based on identifying different language categories, which can greatly improve the poor experience encountered by dialect user groups in different regions.
The embodiment of the application provides a language identification method, a language identification device, an electronic device and a computer-readable storage medium, which aim to solve at least one of the above technical problems in the prior art.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Fig. 1 shows a schematic flowchart of a language identification method provided in an embodiment of the present application, and as shown in fig. 1, the method mainly includes:
step S101, identifying the acquired language as a first target language; wherein the acquired language is voice information;
step S102, matching a language identification model corresponding to the first target language type, and judging the content of the first target language according to the language identification model;
and S103, outputting the matched first target answer according to the judged language content.
Optionally, the language category includes a national language or a local dialect.
In a specific embodiment of the present application: IVR (Interactive Voice Response) is a powerful automatic telephone service system. In this embodiment, the IVR acquires the language and identifies it as a first target language, the acquired language being voice information; that is, the dialect user's voice information is obtained from the IVR, and the language identification model of the corresponding dialect is selected according to the language category, i.e., the dialect-type identification result. Intent judgment is then performed on the speech recognition result through natural language processing, the content of the obtained dialect speech is judged, and the matched text answer is returned. Finally, the speech synthesis model of the corresponding dialect is selected according to the dialect-type identification result, the text answer is synthesized into the corresponding speech, i.e., synthesized dialect speech, and the dialect answer is played to the user through the IVR.
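The selection-by-category flow above can be sketched as a model registry keyed by the detected dialect category. This is a minimal illustrative sketch: the registry names, the lambda stand-ins for the recognition and synthesis models, and the function `answer_call` are all hypothetical, standing in for the CNN classifier, speech recognizer, natural language processing module, and synthesizer of the embodiment.

```python
# Hypothetical model registry keyed by the detected language/dialect category.
# The lambdas are placeholders for real ASR and TTS engines.
ASR_MODELS = {
    "mandarin": lambda audio: "text in Mandarin",
    "sichuan":  lambda audio: "text in Sichuan dialect",
}
TTS_MODELS = {
    "mandarin": lambda text: b"mandarin-audio",
    "sichuan":  lambda text: b"sichuan-audio",
}

def answer_call(audio, category):
    """Select the recognition and synthesis models matching the caller's
    dialect category, judge the content, and return speech of the same
    category (the 'first target answer')."""
    text = ASR_MODELS[category](audio)       # speech -> text
    reply_text = f"reply to: {text}"         # stand-in for the NLP intent step
    return TTS_MODELS[category](reply_text)  # text -> same-category speech

print(answer_call(b"...", "sichuan"))
```

In a real system each registry entry would wrap an actual engine; the dictionary lookup is what "matching a language identification model corresponding to the first target language category" reduces to.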
Optionally, the language identification model includes: pre-acquiring training data of at least one language;
and performing training processing by using a convolutional neural network model and the training data of the at least one language to obtain the language identification model. Optionally, obtaining the language identification model by performing the training processing using the convolutional neural network model and the training data of the at least one language further includes: converting the training data of the at least one language into two-dimensional spectrograms, and generating a training set and a test set respectively; inputting the two-dimensional spectrograms of the training set into an initialized convolutional neural network model for model training to form the language identification model; and testing the language identification model by using a regression classifier and the two-dimensional spectrograms of the test set.
In the embodiments of the present application, dialect is used as the example language for describing the embodiments. The user's dialect speech is acquired, the user's dialect type is judged by identifying a key value in the received dialect, and the dialect defaults to Mandarin when the judgment value falls below a judgment threshold. Once the dialect type is judged, the corresponding language recognition model is selected for subsequent speech recognition and speech synthesis, which improves the recognition rate and allows speech audio of the corresponding dialect to be synthesized. Dialect-type identification uses a CNN (convolutional neural network) for model training and dialect classification. The training data of the at least one language is converted into two-dimensional spectrograms, and a training set and a test set are generated respectively; the two-dimensional spectrograms of the training set are input into an initialized convolutional neural network model for training to form the language identification model. Specifically, labeled dialect audio files in wav format, comprising single characters, words, and sentences, are converted into two-dimensional spectrograms through windowing, framing, and short-time Fourier transform to obtain the training set and the test set. The CNN model is initialized, trained with the training set, and a Softmax regression classifier performs classification verification of the language category. After the trained model is obtained, it is tested with the test set. The two-dimensional spectrogram converts the speech signal to the frequency domain, which has the advantages of avoiding interference caused by noise and better reflecting the characteristics of speech.
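As an illustration of the classification step (convolution over a spectrogram followed by a Softmax regression classifier), the forward pass can be sketched in plain NumPy. This is a toy sketch under assumed shapes (a 32x32 spectrogram patch, four random 3x3 filters, three dialect classes); a real implementation would use a trained deep-learning framework model, and all weights here are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(img, kernel):
    """'Valid' 2-D convolution (cross-correlation, as in CNN layers)."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(spec, kernels, W_out):
    """Conv -> ReLU -> global average pool -> Softmax over dialect classes."""
    feats = np.array([np.maximum(conv2d(spec, k), 0).mean() for k in kernels])
    return softmax(W_out @ feats)

spec = rng.standard_normal((32, 32))       # assumed spectrogram patch
kernels = rng.standard_normal((4, 3, 3))   # 4 random 3x3 filters
W_out = rng.standard_normal((3, 4))        # 3 hypothetical dialect classes
probs = forward(spec, kernels, W_out)
print(probs)  # class probabilities, summing to 1
```

The class with the highest probability would be taken as the dialect-type key value; in practice the filters and output weights come from training on the labeled spectrogram training set.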
The two-dimensional spectrogram is obtained by performing a short-time Fourier transform (STFT) on the continuous speech signal: the long signal is windowed and framed (selecting, e.g., a Hamming or rectangular window), a fast Fourier transform (FFT) is applied to each frame, and the per-frame results are stacked along the other dimension to obtain the spectrogram. The specific steps are as follows:

Let the discrete time-domain sampled signal be x(n), n = 0, 1, ..., N-1, where n is the time-domain sample index and N is the total number of samples. After windowing and framing, x(n) is expressed as x_n(m), m = 0, 1, ..., N-1, where n is the frame index, m is the sample index within a frame, and N is the number of samples per frame. The short-time Fourier transform of x(n) is then:

X(n, k) = \sum_{m=0}^{N-1} x_n(m) \, w(m) \, e^{-j 2\pi k m / N}

where w(m) is the selected window function and 0 <= k <= N-1. Then |X(n, k)| is the spectral estimate of x(n), and the spectral energy density function at frame n (the two-dimensional spectrogram) is:

P(n, k) = |X(n, k)|^2
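The windowing/framing, per-frame FFT, and stacking steps above can be sketched directly in NumPy. The frame length, hop size, and the 440 Hz test tone are illustrative choices, not values taken from the embodiment.

```python
import numpy as np

def spectrogram(x, frame_len=256, hop=128):
    """Two-dimensional spectrogram P(n, k) = |X(n, k)|**2: window and frame
    the long signal, FFT each frame, and stack the per-frame results."""
    window = np.hamming(frame_len)                 # window function w(m)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])
    X = np.fft.rfft(frames * window, axis=1)       # X(n, k), one row per frame n
    return np.abs(X) ** 2                          # spectral energy density P(n, k)

# Illustrative input: a 1-second 440 Hz tone sampled at 8 kHz.
fs = 8000
t = np.arange(fs) / fs
P = spectrogram(np.sin(2 * np.pi * 440 * t))
print(P.shape)  # (number of frames, frame_len // 2 + 1)
```

Each row of P is one frame's energy spectrum; the peak of the first row lands near bin 440 * frame_len / fs, i.e., the tone's frequency.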
Fig. 2 is a schematic diagram of the dialect recognition method using a convolutional neural network: a dialect user answers an outbound call, the call arrives through the operator, the dialect user's voice signal is obtained through the IVR, and a short-time Fourier transform of the signal yields a two-dimensional spectrogram, on which the trained CNN model performs dialect-type identification.
Optionally, in an embodiment of the present application, outputting the matched first target answer according to the judged language content specifically includes: according to the judged language content, the language identification model outputs a matched second target answer in text form; and processing the second target answer in text form into a first target answer, where the first target answer is speech of the same category as the first target language. Specifically, taking dialects as the example in a real-life scenario: the user answers the outbound call by voice (i.e., voice information is received), the dialect the user speaks is identified through dialect-type identification (the speech category is identified through the language identification model), and the corresponding key value is determined. The key value selects the language identification model matching the user's own dialect, the speech is transcribed to text with high accuracy, the result is processed by natural language technology to analyze the user's intent and produce response text, and finally the corresponding speech synthesis model for the dialect is selected through the key value, synthesizing dialect speech (the target answer) for the user to hear. Optionally, to improve response efficiency, dialect-type identification may determine the valid dialect-type key value only on the dialect user's first response and cache the key value in the IVR; when the user responds again, the valid dialect-type key value is read directly from the cache for speech recognition and speech synthesis, without performing dialect-type identification again.
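The cache-on-first-response optimization described above can be sketched as follows. `identify_dialect` is a hypothetical stand-in for the CNN dialect-type classifier, and `IvrSession` is an assumed name for per-call IVR state, not an API of any real IVR product.

```python
def identify_dialect(audio):
    """Placeholder for the CNN dialect-type classifier."""
    return "sichuan"

class IvrSession:
    """Per-call state: classify once, then reuse the cached key value."""
    def __init__(self):
        self.dialect_key = None   # no valid key value cached yet
        self.identifications = 0  # count classifier invocations

    def dialect_for(self, audio):
        if self.dialect_key is None:               # first response: classify
            self.dialect_key = identify_dialect(audio)
            self.identifications += 1              # ...and cache the result
        return self.dialect_key                    # later turns: read cache

session = IvrSession()
for _ in range(3):
    key = session.dialect_for(b"utterance")
print(key, session.identifications)  # classifier ran only once
```

Across three simulated turns the classifier runs once; subsequent speech recognition and synthesis reuse the cached key value, which is the stated efficiency gain.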
FIG. 3 is a schematic diagram of the design of an intelligent outbound system based on dialect-type identification:
1. The outbound platform of the intelligent outbound system places an outbound call to a specific dialect user through a telecom operator and plays a welcome message;
2. the dialect user responds to the welcome message (the language is acquired);
3. if the IVR has not cached a valid dialect-type key value, dialect-type identification is performed on the dialect user's response speech through the IVR;
4. the identified valid dialect-type key value is cached in the IVR for this call;
5. if a valid dialect-type key value exists, the language identification model corresponding to the key value is selected (matching the corresponding language identification model according to the first target language category), and the dialect user's response speech is converted into text;
6. the recognized text is sent to the natural language processing module, which judges the content of the dialect user's speech and returns the matching response text (according to the judged language content, the language identification model outputs a second target answer in matched text form);
7. the outbound response text is sent to the speech synthesis module, the valid dialect-type key value is read from the cache, the corresponding speech synthesis model is selected, and the corresponding audio data is synthesized (the second target answer in text form is processed into a first target answer, where the first target answer is speech of the same category as the first target language);
8. the synthesized outbound response speech is played to the dialect user through the IVR and the operator network, completing one round of interactive response with the dialect user.
Fig. 4 is a schematic structural diagram of a language identification apparatus according to the present invention, the apparatus comprising: an input module 401, a recognition module 402, a matching module 403, a judgment module 404, and an output module 405; wherein,
the recognition module is used for recognizing the language acquired by the input module as a first target language; wherein the acquired language is voice information;
the matching module is used for matching the language identification model corresponding to the first target language type;
the judging module is used for judging the content of the first target language according to the language identification model;
and the output module is used for outputting the matched first target answer according to the judged language content.
Optionally, the language category includes a national language or a local dialect.
Optionally, the language identification model includes:
the input module acquires training data of at least one language in advance;
and carrying out training processing by utilizing a convolutional neural network model and the training data of the at least one language to obtain the language identification model.
Optionally, the apparatus further comprises a language processing module;
according to the language content judged by the judging module, the output module outputs a second target answer in a matched text form;
the language processing module is used for processing the second target answer in the text form into a first target answer; wherein the first target answer is a voice of the same kind as the first target language.
It is understood that the above modules of the language identification device in the present embodiment have functions of implementing the corresponding steps of the method in the embodiment shown in fig. 1. The function can be realized by hardware, and can also be realized by executing corresponding software by hardware. The hardware or software includes one or more modules corresponding to the functions described above. The modules can be software and/or hardware, and each module can be implemented independently or by integrating a plurality of modules. For the functional description of each module, reference may be specifically made to the corresponding description of the method in the embodiment shown in fig. 1, and details are not repeated here.
The embodiment of the application provides an electronic device, which comprises a processor and a memory;
a memory for storing operating instructions;
and the processor is used for executing the language identification method provided by any embodiment of the application by calling the operation instruction.
As an example, fig. 5 shows a schematic structural diagram of an electronic device to which an embodiment of the present application is applicable, and as shown in fig. 5, the electronic device 2000 includes: a processor 2001 and a memory 2003. Wherein the processor 2001 is coupled to a memory 2003, such as via a bus 2002. Optionally, the electronic device 2000 may also include a transceiver 2004. It should be noted that the transceiver 2004 is not limited to one in practical applications, and the structure of the electronic device 2000 is not limited to the embodiment of the present application.
The processor 2001 is applied to the embodiment of the present application to implement the method shown in the above method embodiment. The transceiver 2004 may include a receiver and a transmitter, and the transceiver 2004 is applied to the embodiments of the present application to implement the functions of the electronic device of the embodiments of the present application to communicate with other devices when executed.
The Processor 2001 may be a CPU (Central Processing Unit), general Processor, DSP (Digital Signal Processor), ASIC (Application Specific Integrated Circuit), FPGA (Field Programmable Gate Array) or other Programmable logic device, transistor logic device, hardware component, or any combination thereof. Which may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. The processor 2001 may also be a combination of computing functions, e.g., comprising one or more microprocessors, DSPs and microprocessors, etc.
The Memory 2003 may be a ROM (Read Only Memory) or other type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical Disc storage, optical Disc storage (including Compact Disc, laser Disc, optical Disc, digital versatile Disc, blu-ray Disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to these.
Optionally, the memory 2003 is used for storing application program code for performing the disclosed aspects, and is controlled in execution by the processor 2001. The processor 2001 is configured to execute the application program code stored in the memory 2003 to implement the language identification method provided in any of the embodiments of the present application.
The electronic device provided by the embodiment of the application is applicable to any embodiment of the method, and is not described herein again.
The embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the computer program implements the language identification method shown in the above method embodiment.
The computer-readable storage medium provided in the embodiments of the present application is applicable to any of the embodiments of the foregoing method, and is not described herein again.
According to the solution provided in the embodiments of the present application, the acquired speech is identified as a first target language, where the acquired input is voice information; a language identification model corresponding to the first target language is matched, and the content of the first target language is judged according to that model; and a matched first target answer is output according to the judged content. On this basis, a language identification model can be built with a neural network, the language type can be identified more accurately, and a response can be output in the same language as the input, improving the user's product experience and the product's approachability.
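The flow in this summary can be sketched in a few lines; every name below (`detect_language`, `MODEL_REGISTRY`, the stub recognizers, the echoed answer) is a hypothetical placeholder standing in for the language detector, the per-language identification models, and the answer matching described above, not an API from this application:

```python
def detect_language(audio):
    """Placeholder detector: pretend the input already carries a
    language label (a real system would classify the waveform)."""
    return audio["lang"]

# Hypothetical registry mapping a language label to its matched
# identification model (here, stub recognizers returning fixed text).
MODEL_REGISTRY = {
    "mandarin": lambda audio: "ni hao",
    "cantonese": lambda audio: "nei hou",
}

def answer(audio):
    lang = detect_language(audio)     # first target language
    model = MODEL_REGISTRY[lang]      # matched identification model
    text = model(audio)               # judged language content
    # Respond in the same language as the input; speech synthesis of
    # the text answer is stubbed out here.
    return {"lang": lang, "text": f"echo: {text}"}
```

For example, `answer({"lang": "mandarin"})` dispatches to the Mandarin stub and returns a Mandarin-labelled response, which is the same-language-response behaviour the summary describes.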
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not bound to a strict order and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may comprise multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times; their execution order is likewise not necessarily sequential, and they may be performed in turn or in alternation with other steps, or with at least some of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and such modifications and refinements shall also fall within the protection scope of the present invention.
Claims (10)
1. A method of language identification, the method comprising:
identifying the acquired language as a first target language; wherein the acquired language is voice information;
matching a language identification model corresponding to the first target language type, and judging the content of the first target language according to the language identification model;
and outputting the matched first target answer according to the judged language content.
2. The method according to claim 1, wherein the language includes a national language or a local dialect.
3. The language identification method according to claim 1, wherein the language identification model is obtained by:
pre-acquiring training data of at least one language;
and performing training with a convolutional neural network model and the training data of the at least one language to obtain the language identification model.
4. The method of claim 3, wherein obtaining the language identification model by training with the convolutional neural network model and the training data of the at least one language further comprises:
converting the training data of the at least one language into two-dimensional spectrograms, and generating a training set and a test set, respectively;
inputting the two-dimensional spectrograms of the training set into an initialized convolutional neural network model for model training to form the language identification model;
and testing the language identification model with a regression classifier and the two-dimensional spectrograms of the test set.
5. The method according to claim 1 or 4, wherein the outputting the matched first target answer according to the determined language content specifically includes:
according to the judged language content, the language identification model outputs a second target answer in a matched text form;
processing the second target answer in text form into the first target answer; wherein the first target answer is speech in the same language as the first target language.
6. A language identification apparatus, the apparatus comprising: an input module, a recognition module, a matching module, a judgment module, and an output module; wherein,
the recognition module is used for recognizing the language acquired by the input module as a first target language; wherein the acquired language is voice information;
the matching module is used for matching the language identification model corresponding to the first target language type;
the judging module is used for judging the content of the first target language according to the language identification model;
and the output module is used for outputting the matched first target answer according to the judged language content.
7. The language identification apparatus of claim 6, wherein the language identification model is obtained by:
the input module pre-acquiring training data of at least one language;
and performing training with a convolutional neural network model and the training data of the at least one language to obtain the language identification model.
8. The language identification apparatus of claim 6 or 7, wherein the apparatus further comprises a language processing module;
according to the language content judged by the judging module, the output module outputs a second target answer in a matched text form;
the language processing module is used for processing the second target answer in text form into the first target answer; wherein the first target answer is speech in the same language as the first target language.
9. An electronic device comprising a processor and a memory;
the memory is used for storing operation instructions;
the processor is used for executing the method of any one of claims 1-5 by calling the operation instruction.
10. A computer-readable storage medium, characterized in that the storage medium has stored thereon a computer program which, when being executed by a processor, carries out the method of any one of claims 1-5.
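Claims 3 and 4 above turn training audio into two-dimensional spectrograms before feeding a convolutional neural network. The spectrogram step can be sketched with plain NumPy; the frame length, hop size, and the synthetic 440 Hz test tone are illustrative assumptions, and the CNN training and regression-classifier test stages are out of scope of this sketch:

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Convert a 1-D waveform into a 2-D log-magnitude spectrogram
    (time frames x frequency bins), the representation claims 3-4
    feed into the convolutional neural network. Frame and hop sizes
    are illustrative choices, not values from the patent."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    mag = np.abs(np.fft.rfft(frames, axis=1))  # magnitude spectrum per frame
    return np.log1p(mag)                       # compress dynamic range

# Toy input: one second of a 440 Hz tone sampled at 8 kHz.
t = np.arange(8000) / 8000.0
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
# spec is a 2-D array: one row per frame, one column per frequency bin.
```

Stacking such 2-D arrays (one per utterance) yields the training-set and test-set inputs the claims describe; the per-frame energy peak sits near bin 440 / (8000 / 256) ≈ 14, which is what a CNN would pick up on.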
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010569842.4A CN111916057A (en) | 2020-06-20 | 2020-06-20 | Language identification method and device, electronic equipment and computer readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111916057A true CN111916057A (en) | 2020-11-10 |
Family
ID=73226088
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010569842.4A Pending CN111916057A (en) | 2020-06-20 | 2020-06-20 | Language identification method and device, electronic equipment and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111916057A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114495931A (en) * | 2022-01-28 | 2022-05-13 | 达闼机器人股份有限公司 | Voice interaction method, system, device, equipment and storage medium |
CN117995166A (en) * | 2024-01-26 | 2024-05-07 | 长沙通诺信息科技有限责任公司 | Natural language data analysis method and system based on voice recognition |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150006148A1 (en) * | 2013-06-27 | 2015-01-01 | Microsoft Corporation | Automatically Creating Training Data For Language Identifiers |
CN104391673A (en) * | 2014-11-20 | 2015-03-04 | 百度在线网络技术(北京)有限公司 | Voice interaction method and voice interaction device |
CN105957516A (en) * | 2016-06-16 | 2016-09-21 | 百度在线网络技术(北京)有限公司 | Switching method and device for multiple voice identification models |
US20190096396A1 (en) * | 2016-06-16 | 2019-03-28 | Baidu Online Network Technology (Beijing) Co., Ltd. | Multiple Voice Recognition Model Switching Method And Apparatus, And Storage Medium |
CN109256118A (en) * | 2018-10-22 | 2019-01-22 | 江苏师范大学 | End-to-end Chinese dialects identifying system and method based on production auditory model |
CN110211565A (en) * | 2019-05-06 | 2019-09-06 | 平安科技(深圳)有限公司 | Accent recognition method, apparatus and computer readable storage medium |
CN110827793A (en) * | 2019-10-21 | 2020-02-21 | 成都大公博创信息技术有限公司 | Language identification method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111667814B (en) | Multilingual speech synthesis method and device | |
CN111048064B (en) | Voice cloning method and device based on single speaker voice synthesis data set | |
CN110782872A (en) | Language identification method and device based on deep convolutional recurrent neural network | |
CN111477216A (en) | Training method and system for pronunciation understanding model of conversation robot | |
CN103377651B (en) | The automatic synthesizer of voice and method | |
CN103514882A (en) | Voice identification method and system | |
CN111312292A (en) | Emotion recognition method and device based on voice, electronic equipment and storage medium | |
CN111508466A (en) | Text processing method, device and equipment and computer readable storage medium | |
CN114416989A (en) | Text classification model optimization method and device | |
CN111916057A (en) | Language identification method and device, electronic equipment and computer readable storage medium | |
US7844459B2 (en) | Method for creating a speech database for a target vocabulary in order to train a speech recognition system | |
CN116631412A (en) | Method for judging voice robot through voiceprint matching | |
CN115116458A (en) | Voice data conversion method and device, computer equipment and storage medium | |
CN111640423B (en) | Word boundary estimation method and device and electronic equipment | |
CN113724698B (en) | Training method, device, equipment and storage medium of voice recognition model | |
CN117351948A (en) | Training method of voice recognition model, voice recognition method, device and equipment | |
Zhu | [Retracted] Multimedia Recognition of Piano Music Based on the Hidden Markov Model | |
CN117496960A (en) | Training method and device of voice recognition model, electronic equipment and storage medium | |
CN113823271B (en) | Training method and device for voice classification model, computer equipment and storage medium | |
CN115798456A (en) | Cross-language emotion voice synthesis method and device and computer equipment | |
CN114566156A (en) | Keyword speech recognition method and device | |
CN113921042A (en) | Voice desensitization method and device, electronic equipment and storage medium | |
CN111899738A (en) | Dialogue generating method, device and storage medium | |
CN113505612B (en) | Multi-user dialogue voice real-time translation method, device, equipment and storage medium | |
CN117672221B (en) | Information transmission communication control method and system through voice control |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right |

Effective date of registration: 20220914 Address after: 12 / F, 15 / F, 99 Yincheng Road, Pudong New Area pilot Free Trade Zone, Shanghai, 200120 Applicant after: Jianxin Financial Science and Technology Co.,Ltd. Address before: 25 Financial Street, Xicheng District, Beijing 100033 Applicant before: CHINA CONSTRUCTION BANK Corp. Applicant before: Jianxin Financial Science and Technology Co.,Ltd. |

RJ01 | Rejection of invention patent application after publication |

Application publication date: 20201110 |