
CN111916057A - Language identification method and device, electronic equipment and computer readable storage medium - Google Patents

Language identification method and device, electronic equipment and computer readable storage medium Download PDF

Info

Publication number
CN111916057A
CN111916057A (application CN202010569842.4A)
Authority
CN
China
Prior art keywords
language
target
module
identification model
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010569842.4A
Other languages
Chinese (zh)
Inventor
张浩
李志福
艾巍
鹿江锋
杨邻瑞
谢隆飞
邵小亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CCB Finetech Co Ltd
Original Assignee
China Construction Bank Corp
CCB Finetech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp, CCB Finetech Co Ltd filed Critical China Construction Bank Corp
Priority to CN202010569842.4A priority Critical patent/CN111916057A/en
Publication of CN111916057A publication Critical patent/CN111916057A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/005 Language recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)

Abstract

The embodiments of the present application provide a language identification method and device, an electronic device, and a computer-readable storage medium. The method comprises the following steps: identifying the acquired language as a first target language, wherein the acquired language is voice information; matching a language identification model corresponding to the type of the first target language, and judging the content of the first target language according to the language identification model; and outputting the matched first target answer according to the judged language content. The technical solution of the invention constructs the language identification model with a neural network, identifies the language type more accurately, and responds in a language of the same type as the input, improving the user's product experience and sense of rapport.

Description

Language identification method and device, electronic equipment and computer readable storage medium
Technical Field
The invention relates to the technical field of intelligent decision making, in particular to a language identification method, a language identification device, electronic equipment and a computer readable storage medium.
Background
The traditional mode of manual customer-service dialing, with its low efficiency and high cost, struggles to meet the business scenarios of many companies. With the development of artificial intelligence and natural-language-understanding technology and the progress of traditional outbound-calling technology, intelligent outbound systems, offering higher concurrency and lower cost overhead, have gradually replaced manual customer-service dialing in many traditional service scenarios. However, when user groups span multiple geographic regions and dialects, a single language recognition model has a low recognition rate across different dialects, and the intelligent outbound system cannot cope well with multi-dialect scenarios.
Disclosure of Invention
The present application aims to overcome at least one of the above technical drawbacks. The technical solution adopted by the present application is as follows:
in a first aspect, an embodiment of the present application provides a language identification method, where the method includes:
identifying the acquired language as a first target language; wherein the acquired language is voice information;
matching a language identification model corresponding to the first target language type, and judging the content of the first target language according to the language identification model;
and outputting the matched first target answer according to the judged language content.
Optionally, the language category includes a national language or a local dialect.
Optionally, the language identification model includes: pre-acquiring training data of at least one language;
and carrying out training processing by utilizing a convolutional neural network model and the training data of the at least one language to obtain the language identification model.
Optionally, the obtaining the language identification model by performing the training process using the convolutional neural network model and the training data of the at least one language further includes:
converting the training data of the at least one language into a two-dimensional spectrogram, and respectively producing a training set and a test set;
inputting the two-dimensional spectrogram of the training set into an initialized convolutional neural network model for model training to form a language identification model;
and testing the language identification model by using a regression classifier and a test set two-dimensional spectrogram.
Optionally, the outputting the matched first target answer according to the determined language content specifically includes:
according to the judged language content, the language identification model outputs a second target answer in a matched text form;
processing the second target answer in the text form into a first target answer; wherein the first target answer is a voice of the same kind as the first target language.
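As a minimal sketch, the three method steps can be pictured as a dispatch table from the identified language-type key to a per-dialect model. The keys, the `EchoRecognizer` placeholder, and the answer format below are illustrative assumptions, not the patent's implementation:

```python
# Sketch of the first-aspect steps: the identified dialect-type key selects
# the matching recognition model, which judges content and yields an answer.
# All class names, keys, and string formats here are illustrative stand-ins.

MANDARIN, SICHUAN, CANTONESE = "zh-cmn", "zh-sichuan", "zh-yue"

class EchoRecognizer:
    """Placeholder for a per-dialect language identification model."""
    def __init__(self, dialect_key):
        self.dialect_key = dialect_key

    def judge_content(self, voice_info):
        # A real model would transcribe audio; here we just tag the input.
        return f"[{self.dialect_key}] {voice_info}"

RECOGNIZERS = {k: EchoRecognizer(k) for k in (MANDARIN, SICHUAN, CANTONESE)}

def answer(first_target_language_key, voice_info):
    # Fall back to Mandarin when no dialect-specific model matches,
    # mirroring the sub-threshold default described in the embodiments.
    model = RECOGNIZERS.get(first_target_language_key, RECOGNIZERS[MANDARIN])
    content = model.judge_content(voice_info)
    return f"answer-for:{content}"
```

The Mandarin fallback mirrors the default chosen in the embodiments when the judgment value falls below the threshold.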
In a second aspect, an embodiment of the present invention provides a language identification apparatus, comprising: an input module, an identification module, a matching module, a judgment module and an output module; wherein,
the recognition module is used for recognizing the language acquired by the input module as a first target language; wherein the acquired language is voice information;
the matching module is used for matching the language identification model corresponding to the first target language type;
the judging module is used for judging the content of the first target language according to the language identification model;
and the output module is used for outputting the matched first target answer according to the judged language content.
Optionally, the language category includes a national language or a local dialect.
Optionally, the language identification model includes:
the input module acquires training data of at least one language in advance;
and carrying out training processing by utilizing a convolutional neural network model and the training data of the at least one language to obtain the language identification model.
Optionally, the apparatus further comprises a language processing module;
according to the language content judged by the judging module, the output module outputs a second target answer in a matched text form;
the language processing module is used for processing the second target answer in the text form into a first target answer; wherein the first target answer is a voice of the same kind as the first target language.
In a third aspect, an embodiment of the present invention provides an electronic device, including a processor and a memory;
the memory is used for storing operation instructions;
the processor is used for executing the language identification method by calling the operation instruction.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above language identification method.
The technical scheme provided by the embodiment of the application has the following beneficial effects:
according to the scheme provided by the embodiment of the application, the acquired language is identified as the first target language; wherein the acquired language is voice information; matching a language identification model corresponding to the first target language type, and judging the content of the first target language according to the language identification model; and outputting the matched first target answer according to the judged language content. Based on the scheme, the language identification model can be constructed by utilizing the neural network, the language type can be identified more accurately, the language of the same type as the input language is output to respond, and the product experience and affinity of a user are improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.
Fig. 1 is a schematic flowchart of a language identification method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a method for dialect recognition using a convolutional neural network according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of an intelligent outbound system design based on dialect type identification according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a language identification apparatus according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Current intelligent outbound systems can set only one language identification model, for example a unified Mandarin recognition model. But when calling users in regions such as Sichuan, Shanghai or Guangdong, many users speak the local dialect, and a unified Mandarin model has a low recognition rate on different dialects, which disrupts the question-and-answer business flow and seriously degrades the user experience. Moreover, training a single model that recognizes multiple dialects simultaneously is difficult, and its accuracy cannot be guaranteed. The intelligent outbound system therefore needs a solution that can handle various dialects, Mandarin, and even foreign languages at once while ensuring a high recognition rate. Aiming at these problems, the invention designs an intelligent outbound method and system based on identifying different language types, which markedly improves the poor experience of user groups in different regions who use their local dialects.
The embodiment of the application provides a language identification method, a language identification device, an electronic device and a computer-readable storage medium, which aim to solve at least one of the above technical problems in the prior art.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Fig. 1 shows a schematic flowchart of a language identification method provided in an embodiment of the present application, and as shown in fig. 1, the method mainly includes:
step S101, identifying the acquired language as a first target language; wherein the acquired language is voice information;
step S102, matching a language identification model corresponding to the first target language type, and judging the content of the first target language according to the language identification model;
and S103, outputting the matched first target answer according to the judged language content.
Optionally, the language category includes a national language or a local dialect.
A specific embodiment of the present application is described as follows. IVR (Interactive Voice Response) is a powerful automatic telephone service system. In this embodiment, the IVR acquires the language, which is identified as a first target language; the acquired language is voice information. That is, the voice information of the dialect user is obtained from the IVR, and a language identification model for the corresponding dialect is selected according to the language type, i.e. the dialect-type identification result. The intention behind the speech-recognition result is judged through natural language processing, the content of the obtained dialect is judged, and a first target answer, i.e. the corresponding text answer, is returned. A speech synthesis model for the corresponding dialect is then selected according to the dialect-type identification result, the first target answer (the text answer) is synthesized into the corresponding voice, i.e. the synthesized dialect, and the dialect answer is played to the user through the IVR.
Optionally, the language identification model includes: pre-acquiring training data of at least one language;
and carrying out training processing by utilizing a convolutional neural network model and the training data of the at least one language to obtain the language identification model. Optionally, the obtaining the language identification model by performing the training process using the convolutional neural network model and the training data of the at least one language further includes: converting the training data of the at least one language into a two-dimensional spectrogram, and respectively producing a training set and a test set; inputting the two-dimensional spectrogram of the training set into an initialized convolutional neural network model for model training to form a language identification model; and testing the language identification model by using a regression classifier and a test set two-dimensional spectrogram.
In the embodiments of the present application, a dialect is used as the language for describing the embodiments. The dialect voice of the user is obtained, and the user's dialect type is judged by identifying a key value in the received dialect; the dialect defaults to Mandarin when the judgment value is below a judgment threshold. Once the dialect type is judged, the corresponding language recognition model is selected for subsequent speech recognition and speech synthesis, which improves the speech-recognition rate and allows speech audio to be synthesized in the corresponding dialect. Dialect-type identification uses a CNN (convolutional neural network) for model training and dialect classification. The training data of the at least one language are converted into two-dimensional spectrograms, and a training set and a test set are produced respectively; the two-dimensional spectrograms of the training set are input into an initialized convolutional neural network model for model training to form the language identification model. Specifically, labeled dialect audio files in wav format, comprising single characters, words and sentences, are converted into two-dimensional spectrograms through windowing, framing and short-time Fourier transform to obtain the training set and the test set. The CNN network model is initialized, the model is trained with the training set, and a Softmax regression classifier performs classification verification of the language type. After the trained model is obtained, it is tested with the test set. The two-dimensional spectrogram represents the voice signal in the frequency domain, which has the advantages of avoiding interference caused by noise and better reflecting the characteristics of the voice.
The two-dimensional spectrogram is obtained by performing a short-time Fourier transform (STFT) on the continuous voice signal: the long signal is windowed and framed (selecting, for example, a Hamming window or a rectangular window), a fast Fourier transform (FFT) is applied to each frame, and the per-frame results are stacked along the other dimension to obtain the spectrogram. The specific steps are as follows:

Let the discrete time-domain sampled signal be $x(n)$, $n = 0, 1, \ldots, N-1$, where $n$ is the time-domain sample index and $N$ is the total number of samples. After windowing and framing, the signal is written as $x_n(m)$, $m = 0, 1, \ldots, N-1$, where $n$ is the frame index, $m$ is the sample index within a frame, and $N$ is the number of samples in one frame. The short-time Fourier transform of $x(n)$ is then

$$X_n(e^{j\omega}) = \sum_{m=-\infty}^{\infty} x(m)\, w(n-m)\, e^{-j\omega m},$$

where $w(n)$ is the selected window function. Sampling in frequency, the discrete time-domain Fourier transform of the signal $x(n)$ is

$$X(n,k) = \sum_{m=0}^{N-1} x_n(m)\, w(m)\, e^{-j 2\pi k m / N},$$

where $0 \le k \le N-1$. Then $|X(n,k)|$ is the spectral estimate of $x(n)$, and the spectral energy density function (the two-dimensional spectrogram) is

$$P(n,k) = |X(n,k)|^2.$$
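A minimal NumPy sketch of the spectrogram computation just described; the frame length, hop size, and the 440 Hz test tone are illustrative choices, not values from the patent:

```python
import numpy as np

def spectrogram(x, frame_len=256, hop=128):
    """P(n, k) = |X(n, k)|^2 via windowing, framing, and a per-frame FFT."""
    w = np.hamming(frame_len)                  # window function w(m)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])
    X = np.fft.rfft(frames * w, axis=1)        # FFT of each windowed frame
    return np.abs(X) ** 2                      # spectral energy density

# A 440 Hz tone sampled at 8 kHz: with frame_len = 256 the bin spacing is
# 8000 / 256 = 31.25 Hz, so the energy should peak near bin 440 / 31.25 ~ 14.
fs = 8000
t = np.arange(fs) / fs
P = spectrogram(np.sin(2 * np.pi * 440 * t))
peak_bin = int(np.argmax(P.mean(axis=0)))
```

Stacking the per-frame rows of `P` yields the two-dimensional image that the CNN consumes.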
fig. 2 is a schematic diagram of the dialect recognition method using a convolutional neural network: a dialect user answers the outbound call, which arrives through the operator; the voice signal of the dialect user is obtained through the IVR, and a short-time Fourier transform is applied to the signal to obtain a two-dimensional spectrogram, on which the trained CNN model then performs dialect-type identification.
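The training and classification procedure described above can be sketched end to end. The example below substitutes synthetic 8x8 "spectrogram" patches for real dialect audio and trains only the Softmax regression classification stage; the convolutional feature extractor is omitted, and the class structure and all hyperparameters are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for two dialect classes: flattened 8x8 spectrogram
# patches whose energy sits in different frequency bands. A real system
# would use STFT spectrograms of labeled wav clips, as described above.
def make_samples(n, hot_rows):
    x = rng.normal(0.0, 0.1, size=(n, 8, 8))
    x[:, hot_rows, :] += 1.0
    return x.reshape(n, -1)

X = np.vstack([make_samples(200, slice(0, 4)), make_samples(200, slice(4, 8))])
y = np.array([0] * 200 + [1] * 200)

# Train/test split.
idx = rng.permutation(len(X))
train, test = idx[:300], idx[300:]

# Softmax regression classifier, playing the role of the classification
# verification stage from the description.
W = np.zeros((64, 2))
b = np.zeros(2)
for _ in range(200):
    logits = X[train] @ W + b
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    grad_logits = (p - np.eye(2)[y[train]]) / len(train)
    W -= 0.5 * (X[train].T @ grad_logits)   # gradient descent step
    b -= 0.5 * grad_logits.sum(axis=0)

accuracy = float((np.argmax(X[test] @ W + b, axis=1) == y[test]).mean())
```

On this cleanly separable toy data the held-out accuracy is essentially perfect; real dialect spectrograms would of course require the convolutional layers.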
Optionally, in an embodiment of the present application, outputting the matched first target answer according to the judged language content specifically comprises: according to the judged language content, the language identification model outputs a matched second target answer in text form; the second target answer in text form is processed into a first target answer, wherein the first target answer is a voice of the same type as the first target language.

Specifically, the technical solution is further described taking dialects as an example, combined with a real-life scenario. The user answers the outbound call in his or her own dialect (i.e. the voice information is received); dialect-type identification determines which dialect the user speaks (the language identification model identifies the voice type) and judges the corresponding key value. The language identification model to use, namely the dialect speech model corresponding to the user's own dialect, is selected through the key value, and the voice is transcribed into text with high accuracy. The result is then processed by natural-language technology to analyze the user's intention and produce a textual response. Finally, the corresponding speech-synthesis model (dialect model) is selected through the key value, and dialect voice corresponding to the dialect user (the target answer) is synthesized for the user to hear.

Optionally, to improve response efficiency, dialect-type identification may determine the valid dialect-type key value only when the dialect user responds for the first time and cache that key value in the IVR; when the user responds again, the valid dialect-type key value is read directly from the cache for speech recognition and speech synthesis, and dialect-type identification is not performed again.
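The caching optimization just described can be sketched as follows; the session identifiers, key values, and the stub classifier are all illustrative assumptions:

```python
# Dialect-type identification runs only on the caller's first response;
# later turns in the same call reuse the cached key value.

dialect_cache = {}    # session_id -> valid dialect-type key value
classifier_calls = [] # records how often the CNN classifier actually runs

def classify_dialect(audio):
    classifier_calls.append(audio)  # in reality: spectrogram + CNN inference
    return "zh-sichuan"             # stand-in for the predicted key value

def dialect_key_for(session_id, audio):
    if session_id not in dialect_cache:
        dialect_cache[session_id] = classify_dialect(audio)
    return dialect_cache[session_id]

k1 = dialect_key_for("call-1", b"first-response")
k2 = dialect_key_for("call-1", b"second-response")
```

Both turns return the same key, but the classifier is invoked only once, which is exactly the saving the embodiment targets.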
FIG. 3 is a schematic diagram of the design of an intelligent outbound system based on dialect-type identification. The flow is as follows:

1. The outbound platform of the intelligent outbound system makes an outbound call to a specific dialect user through a telecom operator and plays a welcome message.
2. The dialect user responds to the welcome message (the language is acquired).
3. If the IVR has not cached a valid dialect-type key value, dialect-type identification is performed on the dialect user's response voice through the IVR.
4. The identified valid dialect-type key value is cached in the IVR for this call.
5. Given a valid dialect-type key value, the language identification model corresponding to the key value is selected (the corresponding language identification model is matched according to the first target language type), and the dialect user's response voice is converted into text.
6. The recognized text is sent to the natural-language-processing module, which judges the voice content of the dialect user and returns response text corresponding to that content (according to the judged language content, the language identification model outputs a matched second target answer in text form).
7. The outbound response text is sent to the speech-synthesis module; the valid dialect-type key value is obtained from the cache, the corresponding speech-synthesis model is selected, and the corresponding audio data are synthesized (the second target answer in text form is processed into the first target answer, a voice of the same type as the first target language).
8. The synthesized outbound response voice is played to the dialect user through the IVR and the operator network, completing one turn of interactive response with the dialect user.
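One interactive turn of the flow in Fig. 3 can be sketched by composing stub ASR, natural-language-processing, and speech-synthesis stages in order; every function name and string format here is hypothetical:

```python
# Stub stages for one turn of the outbound flow. A real system would call
# the dialect-specific recognition, NLP, and synthesis models selected by
# the cached dialect-type key value.

def asr(dialect_key, audio):
    return f"text({dialect_key})"           # speech -> text

def nlp_answer(text):
    return f"reply-to:{text}"               # second target answer (text form)

def tts(dialect_key, text):
    return f"audio[{dialect_key}]:{text}"   # first target answer (voice form)

def outbound_turn(dialect_key, user_audio):
    text = asr(dialect_key, user_audio)     # convert response voice to text
    reply_text = nlp_answer(text)           # judge content, produce reply text
    return tts(dialect_key, reply_text)     # synthesize reply in same dialect
```

The same `dialect_key` threads through recognition and synthesis, so the reply comes back in the caller's own dialect.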
Fig. 4 is a schematic structural diagram of the language identification apparatus provided by the present invention, the apparatus comprising: an input module 401, an identification module 402, a matching module 403, a judgment module 404 and an output module 405; wherein,
the recognition module is used for recognizing the language acquired by the input module as a first target language; wherein the acquired language is voice information;
the matching module is used for matching the language identification model corresponding to the first target language type;
the judging module is used for judging the content of the first target language according to the language identification model;
and the output module is used for outputting the matched first target answer according to the judged language content.
Optionally, the language category includes a national language or a local dialect.
Optionally, the language identification model includes:
the input module acquires training data of at least one language in advance;
and carrying out training processing by utilizing a convolutional neural network model and the training data of the at least one language to obtain the language identification model.
Optionally, the apparatus further comprises a language processing module;
according to the language content judged by the judging module, the output module outputs a second target answer in a matched text form;
the language processing module is used for processing the second target answer in the text form into a first target answer; wherein the first target answer is a voice of the same kind as the first target language.
It is understood that the above modules of the language identification device in the present embodiment have functions of implementing the corresponding steps of the method in the embodiment shown in fig. 1. The function can be realized by hardware, and can also be realized by executing corresponding software by hardware. The hardware or software includes one or more modules corresponding to the functions described above. The modules can be software and/or hardware, and each module can be implemented independently or by integrating a plurality of modules. For the functional description of each module, reference may be specifically made to the corresponding description of the method in the embodiment shown in fig. 1, and details are not repeated here.
The embodiment of the application provides an electronic device, which comprises a processor and a memory;
a memory for storing operating instructions;
and the processor is used for executing the language identification method provided by any embodiment of the application by calling the operation instruction.
As an example, fig. 5 shows a schematic structural diagram of an electronic device to which an embodiment of the present application is applicable, and as shown in fig. 5, the electronic device 2000 includes: a processor 2001 and a memory 2003. Wherein the processor 2001 is coupled to a memory 2003, such as via a bus 2002. Optionally, the electronic device 2000 may also include a transceiver 2004. It should be noted that the transceiver 2004 is not limited to one in practical applications, and the structure of the electronic device 2000 is not limited to the embodiment of the present application.
The processor 2001 is applied to the embodiment of the present application to implement the method shown in the above method embodiment. The transceiver 2004 may include a receiver and a transmitter, and the transceiver 2004 is applied to the embodiments of the present application to implement the functions of the electronic device of the embodiments of the present application to communicate with other devices when executed.
The Processor 2001 may be a CPU (Central Processing Unit), general Processor, DSP (Digital Signal Processor), ASIC (Application Specific Integrated Circuit), FPGA (Field Programmable Gate Array) or other Programmable logic device, transistor logic device, hardware component, or any combination thereof. Which may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. The processor 2001 may also be a combination of computing functions, e.g., comprising one or more microprocessors, DSPs and microprocessors, etc.
Bus 2002 may include a path that conveys information between the aforementioned components. The bus 2002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 2002 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 5, but this is not intended to represent only one bus or type of bus.
The Memory 2003 may be a ROM (Read Only Memory) or other type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical Disc storage, optical Disc storage (including Compact Disc, laser Disc, optical Disc, digital versatile Disc, blu-ray Disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to these.
Optionally, the memory 2003 is used for storing application program code for performing the disclosed aspects, and is controlled in execution by the processor 2001. The processor 2001 is configured to execute the application program code stored in the memory 2003 to implement the language identification method provided in any of the embodiments of the present application.
The electronic device provided by the embodiment of the application is applicable to any embodiment of the method, and is not described herein again.
The embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the computer program implements the language identification method shown in the above method embodiment.
The computer-readable storage medium provided in the embodiments of the present application is applicable to any of the embodiments of the foregoing method, and is not described herein again.
According to the scheme provided by the embodiment of the application, the acquired language is identified as the first target language; wherein the acquired language is voice information; matching a language identification model corresponding to the first target language type, and judging the content of the first target language according to the language identification model; and outputting the matched first target answer according to the judged language content. Based on the scheme, the language identification model can be constructed by utilizing the neural network, the language type can be identified more accurately, the language of the same type as the input language is output to respond, and the product experience and affinity of a user are improved.
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not bound to a strict order and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different times, and their order of execution is not necessarily sequential; they may be performed in turn or in alternation with other steps, or with at least some of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the principle of the present invention, and such modifications and improvements shall also fall within the protection scope of the present invention.

Claims (10)

1. A method of language identification, the method comprising:
identifying the acquired language as a first target language; wherein the acquired language is voice information;
matching a language identification model corresponding to the first target language type, and judging the content of the first target language according to the language identification model;
and outputting the matched first target answer according to the judged language content.
2. The method according to claim 1, wherein the language includes a national language or a local dialect.
3. The language identification method according to claim 1, wherein the language identification model comprises:
pre-acquiring training data of at least one language;
and carrying out training processing by utilizing a convolutional neural network model and the training data of the at least one language to obtain the language identification model.
4. The method of claim 3, wherein the obtaining the language identification model by training with the convolutional neural network model and the training data of the at least one language further comprises:
converting the training data of the at least one language into two-dimensional spectrograms, and generating a training set and a test set respectively;
inputting the two-dimensional spectrogram of the training set into an initialized convolutional neural network model for model training to form a language identification model;
and testing the language identification model by using a regression classifier and a test set two-dimensional spectrogram.
5. The method according to claim 1 or 4, wherein the outputting the matched first target answer according to the determined language content specifically includes:
according to the judged language content, the language identification model outputs a second target answer in a matched text form;
processing the second target answer in the text form into a first target answer; wherein the first target answer is a voice of the same kind as the first target language.
6. A language identification apparatus, the apparatus comprising: an input module, an identification module, a matching module, a judgment module and an output module; wherein,
the recognition module is used for recognizing the language acquired by the input module as a first target language; wherein the acquired language is voice information;
the matching module is used for matching the language identification model corresponding to the first target language type;
the judging module is used for judging the content of the first target language according to the language identification model;
and the output module is used for outputting the matched first target answer according to the judged language content.
7. The language identification apparatus of claim 6, wherein the language identification model comprises:
the input module acquires training data of at least one language in advance;
and carrying out training processing by utilizing a convolutional neural network model and the training data of the at least one language to obtain the language identification model.
8. The language identification apparatus of claim 6 or 7, wherein the apparatus further comprises a language processing module;
according to the language content judged by the judging module, the output module outputs a second target answer in a matched text form;
the language processing module is used for processing the second target answer in the text form into a first target answer; wherein the first target answer is a voice of the same kind as the first target language.
9. An electronic device comprising a processor and a memory;
the memory is used for storing operation instructions;
the processor is used for executing the method of any one of claims 1-5 by calling the operation instruction.
10. A computer-readable storage medium, characterized in that the storage medium has stored thereon a computer program which, when being executed by a processor, carries out the method of any one of claims 1-5.
CN202010569842.4A 2020-06-20 2020-06-20 Language identification method and device, electronic equipment and computer readable storage medium Pending CN111916057A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010569842.4A CN111916057A (en) 2020-06-20 2020-06-20 Language identification method and device, electronic equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010569842.4A CN111916057A (en) 2020-06-20 2020-06-20 Language identification method and device, electronic equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN111916057A 2020-11-10

Family

ID=73226088

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010569842.4A Pending CN111916057A (en) 2020-06-20 2020-06-20 Language identification method and device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111916057A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114495931A (en) * 2022-01-28 2022-05-13 达闼机器人股份有限公司 Voice interaction method, system, device, equipment and storage medium
CN117995166A (en) * 2024-01-26 2024-05-07 长沙通诺信息科技有限责任公司 Natural language data analysis method and system based on voice recognition


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150006148A1 (en) * 2013-06-27 2015-01-01 Microsoft Corporation Automatically Creating Training Data For Language Identifiers
CN104391673A (en) * 2014-11-20 2015-03-04 百度在线网络技术(北京)有限公司 Voice interaction method and voice interaction device
CN105957516A (en) * 2016-06-16 2016-09-21 百度在线网络技术(北京)有限公司 Switching method and device for multiple voice identification models
US20190096396A1 (en) * 2016-06-16 2019-03-28 Baidu Online Network Technology (Beijing) Co., Ltd. Multiple Voice Recognition Model Switching Method And Apparatus, And Storage Medium
CN109256118A (en) * 2018-10-22 2019-01-22 江苏师范大学 End-to-end Chinese dialects identifying system and method based on production auditory model
CN110211565A (en) * 2019-05-06 2019-09-06 平安科技(深圳)有限公司 Accent recognition method, apparatus and computer readable storage medium
CN110827793A (en) * 2019-10-21 2020-02-21 成都大公博创信息技术有限公司 Language identification method


Similar Documents

Publication Publication Date Title
CN111667814B (en) Multilingual speech synthesis method and device
CN111048064B (en) Voice cloning method and device based on single speaker voice synthesis data set
CN110782872A (en) Language identification method and device based on deep convolutional recurrent neural network
CN111477216A (en) Training method and system for pronunciation understanding model of conversation robot
CN103377651B (en) The automatic synthesizer of voice and method
CN103514882A (en) Voice identification method and system
CN111312292A (en) Emotion recognition method and device based on voice, electronic equipment and storage medium
CN111508466A (en) Text processing method, device and equipment and computer readable storage medium
CN114416989A (en) Text classification model optimization method and device
CN111916057A (en) Language identification method and device, electronic equipment and computer readable storage medium
US7844459B2 (en) Method for creating a speech database for a target vocabulary in order to train a speech recognition system
CN116631412A (en) Method for judging voice robot through voiceprint matching
CN115116458A (en) Voice data conversion method and device, computer equipment and storage medium
CN111640423B (en) Word boundary estimation method and device and electronic equipment
CN113724698B (en) Training method, device, equipment and storage medium of voice recognition model
CN117351948A (en) Training method of voice recognition model, voice recognition method, device and equipment
Zhu [Retracted] Multimedia Recognition of Piano Music Based on the Hidden Markov Model
CN117496960A (en) Training method and device of voice recognition model, electronic equipment and storage medium
CN113823271B (en) Training method and device for voice classification model, computer equipment and storage medium
CN115798456A (en) Cross-language emotion voice synthesis method and device and computer equipment
CN114566156A (en) Keyword speech recognition method and device
CN113921042A (en) Voice desensitization method and device, electronic equipment and storage medium
CN111899738A (en) Dialogue generating method, device and storage medium
CN113505612B (en) Multi-user dialogue voice real-time translation method, device, equipment and storage medium
CN117672221B (en) Information transmission communication control method and system through voice control

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 20220914
Address after: 12/F, 15/F, 99 Yincheng Road, Pudong New Area Pilot Free Trade Zone, Shanghai, 200120
Applicant after: Jianxin Financial Science and Technology Co.,Ltd.
Address before: 25 Financial Street, Xicheng District, Beijing 100033
Applicant before: CHINA CONSTRUCTION BANK Corp.
Applicant before: Jianxin Financial Science and Technology Co.,Ltd.
RJ01 Rejection of invention patent application after publication
Application publication date: 20201110