CN1746970A - Speech recognition of mobile telecommunication terminal - Google Patents

Info

Publication number
CN1746970A
Authority
CN
China
Prior art keywords
mentioned
data
triphones
voice
telephone number
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CNA2004100514745A
Other languages
Chinese (zh)
Inventor
金勋
金正熙
具东昱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
LG Electronics Huizhou Co Ltd
Original Assignee
LG Electronics Huizhou Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by LG Electronics Huizhou Co Ltd
Priority to CNA2004100514745A
Publication of CN1746970A
Legal status: Pending

Landscapes

  • Telephonic Communication Services (AREA)

Abstract

A speech recognition method for a mobile communication terminal comprises: building an acoustic model based on triphones; registering a telephone number; generating and storing triphone data for the name or business name associated with the registered telephone number; receiving the user's voice input; comparing the input voice data with the stored triphone data; and, if recognition succeeds, dialing the telephone number corresponding to the input voice.

Description

Speech recognition method for a mobile communication terminal
[Technical field]
This invention relates to speech recognition, and in particular to a speech recognition method for a mobile communication terminal that implements phone book search through variable-vocabulary speech recognition.
[Background art]
Speech recognition is a general-purpose technology for operating many kinds of electrical and electronic equipment by voice; applied to a mobile phone, it can be a great convenience to the user.
The simplest form of speech recognition is speaker-dependent isolated-word recognition, which can recognize only speech it has been trained on. To place a call with this method, the user first registers the desired telephone number and then pronounces the name associated with it at least once or twice in his or her own voice. In other words, the user trains the recognizer with his or her own voice, and the corresponding speech feature parameters are stored so that recognition becomes possible.
However, speaker-dependent recognition requires voice training for every word the user registers, which is inconvenient; moreover, an error made during training significantly degrades recognition performance.
In addition, recent handsets can register hundreds to thousands of names and telephone numbers, so it is difficult to store features for all of them through voice training.
To solve these problems, speaker-independent speech recognition has come into wide use. Because it uses speech parameters collected and extracted from a large body of speech data, it requires no separate training by the user; and because there is no user training, the recognition rate does not vary with training quality, so a certain recognition rate can be guaranteed. However, the vocabulary of such a recognizer is relatively fixed, so it cannot be used in a voice-dialing phone whose recognition vocabulary (the names) changes from user to user.
What is needed, therefore, is speaker-independent variable-vocabulary speech recognition. In this technique, when the text of a word to be recognized is entered, a recognition network corresponding to that text is generated and matched against the required acoustic models; the user therefore only has to enter text for the word to become recognizable.
As described above, with speaker-independent variable-vocabulary recognition, voice dialing becomes available as soon as the user registers a telephone number with the terminal's keypad or enters one by download.
[Summary of the invention]
However, to generate a recognition network for arbitrary text, the speaker-independent variable-vocabulary method described above must store data for all Korean phonemes and must also offer fast processing, which makes it difficult to fit into constrained hardware such as a telephone handset.
To solve this problem, the invention uses a triphone-based hidden Markov model (HMM) and a recognition network. The name registered with each telephone number is converted to triphone data and stored; input speech is converted to triphone data using the HMM; and the converted triphone data is compared with the triphone data of the registered names to perform recognition. The invention thereby provides a speech recognition method for a mobile communication terminal that needs no additional training whenever the phone book changes, lets the user obtain the desired information by voice alone, and is convenient to use.
In addition, because the invention uses a hidden Markov model built on about 1000 triphone data items, it achieves faster recognition.
To achieve the above object, the speech recognition method for a mobile communication terminal according to this invention comprises: step 1, building an acoustic model based on triphones; step 2, registering a telephone number; step 3, generating and storing triphone data for the name or business name associated with the registered telephone number; step 4, receiving the user's voice input; step 5, comparing the predetermined data derived from the input voice with the triphone data stored in the phone book; and step 6, if the comparison shows that recognition succeeded, dialing the telephone number corresponding to the input voice.
As described above, the invention converts the name registered with each telephone number to triphone data and stores it, converts input speech to triphone data using the hidden Markov model, and performs recognition by comparing the two. As a result, even when the phone book contents change, no separate training is needed to obtain the desired information, which greatly improves convenience.
In addition, the invention uses a hidden Markov model built on 1000 triphone data items, which improves recognition speed.
[Description of the drawings]
Fig. 1 is a block diagram of the parts of a mobile communication terminal to which this invention applies.
Fig. 2 is a flowchart of the speech recognition method for a mobile communication terminal according to this invention.
[Embodiment]
An embodiment of the speech recognition method having the above features is described below with reference to the accompanying drawings.
Fig. 1 is a block diagram of a device to which this invention applies. As shown, it comprises: a microphone 10, through which the user speaks the name or business name entered when the telephone number was registered, for voice dialing; a codec 20, which converts the voice signal input through the microphone 10 into a pulse code modulation (PCM) signal or a μ-law PCM signal; a vocoder 30, which receives the PCM signal output by the codec 20 and compresses it into predetermined data (for example, a QCELP vocoder signal); a control device 40, which receives the predetermined data output by the vocoder 30 and performs speech functions such as speech recognition and speech synthesis; a data memory 50, which stores the predetermined voice data; a program memory 60, which stores all the programs for operating the terminal; and a speaker SP, which outputs sound.
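The μ-law PCM format mentioned for the codec 20 can be illustrated with the standard continuous μ-law companding curve. This is only a sketch under stated assumptions: the patent does not describe the codec's internals, the function names are hypothetical, and μ = 255 is assumed because it is the value commonly used in telephony. Practical codecs use a segmented 8-bit approximation of this curve rather than the floating-point formula shown here.

```python
import math

MU = 255  # assumed companding parameter (common telephony value)

def mu_law_compress(x: float, mu: int = MU) -> float:
    """Map a sample in [-1, 1] onto the companded range [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y: float, mu: int = MU) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

# Quiet samples get a larger share of the output range than loud ones.
quiet = mu_law_compress(0.01)
loud = mu_law_compress(0.5)
```

Companding of this kind spends more of the available code range on low-amplitude samples, which is why it suits speech captured through a handset microphone.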
Before the operation of the invention is explained, the acoustic model and the recognition network it requires are described.
In general, variable-vocabulary recognition, which can recognize new words without changing the acoustic model when the recognition vocabulary changes, requires an acoustic model that reflects all phonetic features of speech.
This invention uses a triphone-based hidden Markov model (HMM).
A typical triphone uses left and right context information derived from a large speech database (DB), and each triphone models a short segment of speech with a three-state structure: entry, steady, and transition states.
An example follows.
Basic phoneme representation of '?': hagggyo.
Triphone representation of '?': #_h_a h_a_g a_g_gg g_gg_y gg_y_o y_o_#
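The left_center_right notation used in this example can be produced mechanically from a phoneme sequence. The following is a minimal sketch, assuming `#` marks the word boundary as in the example; the function name `to_triphones` is hypothetical and does not come from the patent.

```python
def to_triphones(phonemes, boundary="#"):
    """Expand a phoneme sequence into left_center_right triphone labels."""
    padded = [boundary, *phonemes, boundary]
    # One triphone per original phoneme, with its left and right neighbors.
    return ["_".join(padded[i - 1:i + 2]) for i in range(1, len(padded) - 1)]

labels = to_triphones(["h", "a", "g", "gg", "y", "o"])
print(labels)
# ['#_h_a', 'h_a_g', 'a_g_gg', 'g_gg_y', 'gg_y_o', 'y_o_#']
```

Each phoneme yields exactly one triphone, so a six-phoneme word yields six labels.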
Assuming 40 basic phonemes, 40 × 40 × 40 triphones can be formed; after unpronounceable combinations are removed, about 2000 triphones can represent most of the vocabulary. Because this yields 60000 states, the invention reduces the model size by measuring the similarity between states and clustering them, tying similar states together, which brings the number down to about 1000. Each state has a codebook and is represented by its own distinct observation probability values.
The invention also includes a pronunciation conversion rule program, which converts the text of names or business names registered in the phone book into the acoustic model information described above. The pronunciation conversion rules include all the rules needed to transcribe Korean accurately into pronunciation symbols, such as the final-consonant rule, the initial-sound rule, consonant assimilation, and schwaization.
In the recognition network used by the invention, when a telephone number is registered, the user only has to enter the name or business name; the text is converted to triphone information according to the pronunciation rules and stored in a designated memory area.
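With 40 phonemes, the upper bound on triphones is 40 × 40 × 40 = 64000, which is why pruning and state tying matter on a handset. The patent does not say which similarity measure or clustering algorithm is used, so the following is only an illustrative sketch: it assumes each state is summarized by a mean vector, uses Euclidean distance, and ties states greedily against a hypothetical `threshold`.

```python
import math

def tie_states(state_means, threshold):
    """Greedy state tying: put each state in the first cluster whose
    representative lies within `threshold`, else open a new cluster."""
    representatives = []  # one mean vector per tied state
    assignment = []       # assignment[i] = tied-state index of state i
    for mean in state_means:
        for idx, rep in enumerate(representatives):
            if math.dist(mean, rep) <= threshold:
                assignment.append(idx)
                break
        else:
            representatives.append(mean)
            assignment.append(len(representatives) - 1)
    return representatives, assignment

# Four toy states collapse into two tied states.
reps, assign = tie_states([(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 5.1)], 0.5)
print(len(reps), assign)
```

After tying, the states in one cluster share a single set of parameters, so the stored model shrinks in proportion to the number of clusters.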
The operation of the invention, equipped with the acoustic model and recognition network described above, is explained below with reference to Fig. 2.
Fig. 2 is a flowchart of the speech recognition method for a mobile communication terminal according to this invention. As shown, it comprises: step 1, building an acoustic model based on triphones; step 2, registering a telephone number; step 3, generating and storing triphone data for the name or business name associated with the registered telephone number; step 4, receiving the user's voice input; step 5, comparing the predetermined data derived from the input voice with the triphone data stored in the phone book; and step 6, if the comparison shows that recognition succeeded, dialing the telephone number corresponding to the input voice.
The method further comprises the step of returning to the voice input step if the comparison shows that recognition failed.
The invention operates as follows. First, an HMM based on triphones is built (S10). Then the desired telephone number is registered through the phone book registration menu on the terminal (S20), together with the corresponding name or business name.
After the telephone number is registered, the recognition network stored in the program memory 60 is used to generate triphone data corresponding to the registered name or business name (S30), and the triphone data is stored in the memory that holds the registered telephone number (S40). In other words, when the user registers a telephone number and enters the corresponding name or business name as text, the recognition network stored in the program memory 60 and the triphone data stored in the data memory 50 are used to generate the triphone data for the entered text, which is then stored in the designated memory area.
If part of the triphone data duplicates triphone data already stored for another telephone number, the duplicated triphone data is shared and only the remaining data is stored. For example, suppose the name 'Hong Jitong' is already stored, so that its triphone data is in the designated memory area. If another telephone number is then registered under the name 'Hong Jizhu', the 'Hong Ji' in 'Hong Jizhu' has the same triphone data as the 'Hong Ji' in the stored 'Hong Jitong'; the triphone data for 'Hong Ji' is therefore shared, and only the triphone data for 'Zhu' is stored.
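One way to picture this sharing is a prefix tree in which names with a common beginning reuse the same stored nodes. The patent does not specify the storage layout, so this is a sketch under assumptions: the class names are hypothetical, each list element stands in for the triphone data of one syllable, and the telephone numbers are made up.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # unit -> TrieNode
        self.numbers = []   # telephone numbers whose name ends here

class SharedPhoneBook:
    def __init__(self):
        self.root = TrieNode()
        self.stored_units = 0  # units actually written to memory

    def register(self, units, number):
        """Store a name, reusing units already stored for earlier names."""
        node = self.root
        for unit in units:
            if unit not in node.children:
                node.children[unit] = TrieNode()
                self.stored_units += 1
            node = node.children[unit]
        node.numbers.append(number)

book = SharedPhoneBook()
book.register(["Hong", "Ji", "tong"], "010-0000-0001")  # hypothetical number
book.register(["Hong", "Ji", "zhu"], "010-0000-0002")   # hypothetical number
print(book.stored_units)  # 4: Hong, Ji, tong, zhu
```

Registering the second name adds only one new unit, which mirrors the 'Hong Ji' example in the text.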
After the user has registered all desired telephone numbers in this way, the user speaks the name corresponding to the desired telephone number into the microphone 10 (S50). The codec 20 converts the input voice to PCM data and outputs it; the vocoder 30 receives the PCM data output by the codec 20 and compresses it into predetermined data (for example, vocoder data).
The predetermined data output by the vocoder 30 includes coefficients representing the state of the input voice, information modeling the speech excitation signal, gain, and so on. For example, it may consist of line spectrum pair (LSP) coefficients, codebook indices, and gain.
When the control device 40 receives the predetermined data output by the vocoder 30, it uses the 1000 triphone data items stored in the data memory 50 to compose triphone data for the input, selecting for the predetermined data the triphone data with the highest observation probability value.
The composed triphone data is then compared with the triphone data of the registered names (S60). The triphone data for the predetermined data is formed in units of a fixed frame. For example, if the input voice is 'Hong Jitong' and one frame is one character, the triphone data for the first frame, 'Hong', is composed first and compared with the triphone data of the names recorded in the phone book; if entries matching 'Hong' exist, the triphone data of the second frame, 'Ji', is then compared with the triphone data of the names that matched 'Hong'. The input voice is compared in this way, and if a name matching the input voice exists, the corresponding telephone number is dialed (S80).
If there are two or more recognition results for the input voice, for example if the input is 'Hong Jitong' and the matching names are 'Hong Jitong home', 'Hong Jitong office', and 'Hong Jitong mobile phone', the results are shown in the terminal's display window and a preset voice prompt, for example 'The results for the input voice are as follows', is played to the user through the speaker SP.
The user can then select the telephone number to call from the results (list) shown in the display window.
Conversely, if no name matching the input voice is found (S70), a preset voice prompt, for example 'No telephone number matches the input voice. Please try again.', is played to the user through the speaker, and the user then repeats the voice input. If a telephone number corresponding to the repeated input exists, that number is dialed.
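Selecting the triphone data with the highest observation probability can be pictured as a per-frame argmax. This is a deliberately simplified sketch: the function names and the toy probabilities are hypothetical, and a full HMM recognizer would normally score whole state sequences rather than pick each frame independently.

```python
def best_triphone(observation_probs):
    """Return the triphone label with the highest observation probability."""
    return max(observation_probs, key=observation_probs.get)

def decode_frames(frames):
    """Pick the most probable triphone independently for every frame."""
    return [best_triphone(probs) for probs in frames]

# Toy observation probabilities for two frames (made-up values).
frames = [
    {"#_h_a": 0.7, "h_a_g": 0.2, "a_g_gg": 0.1},
    {"#_h_a": 0.1, "h_a_g": 0.6, "a_g_gg": 0.3},
]
decoded = decode_frames(frames)
print(decoded)  # ['#_h_a', 'h_a_g']
```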
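The frame-by-frame comparison narrows the candidate names one frame at a time. The sketch below assumes, as in the example, that one frame corresponds to one character and that each character stands in for its triphone data; the function name, the phone book layout, and the numbers are hypothetical.

```python
def match_by_frames(input_frames, phone_book):
    """Keep only the names whose leading units match the input, frame by frame.

    phone_book maps a name (tuple of per-frame units) to a telephone number.
    """
    candidates = list(phone_book)
    for position, unit in enumerate(input_frames):
        candidates = [name for name in candidates
                      if len(name) > position and name[position] == unit]
        if not candidates:
            break  # no registered name matches this frame
    return candidates

phone_book = {
    ("Hong", "Ji", "tong"): "010-0000-0001",  # hypothetical entries
    ("Hong", "Ji", "zhu"): "010-0000-0002",
    ("Kim", "Min"): "010-0000-0003",
}
print(match_by_frames(["Hong", "Ji", "tong"], phone_book))
# [('Hong', 'Ji', 'tong')]
```

Because each frame is compared only against names that survived the previous frame, the number of comparisons drops quickly as the input is processed.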
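The three outcomes described above (no match, exactly one match, several matches) can be summarized as a small decision function. This is a sketch, not the terminal's actual control logic: the function name, the action labels, and the sample numbers are hypothetical, while the prompt text follows the example in the description.

```python
RETRY_PROMPT = "No telephone number matches the input voice. Please try again."

def handle_recognition(matches):
    """Decide the next action from a list of (name, number) matches."""
    if not matches:
        return ("retry", RETRY_PROMPT)           # go back to voice input
    if len(matches) == 1:
        return ("dial", matches[0][1])           # dial the single match
    return ("choose", [name for name, _ in matches])  # show list to the user

print(handle_recognition([]))
print(handle_recognition([("Hong Jitong", "010-0000-0001")]))
print(handle_recognition([("Hong Jitong home", "010-0000-0001"),
                          ("Hong Jitong office", "010-0000-0002")]))
```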

Claims (6)

1. A speech recognition method for a mobile communication terminal, comprising:
Step 1, building an acoustic model based on triphones;
Step 2, registering a telephone number;
Step 3, generating and storing triphone data for the name or business name associated with the registered telephone number;
Step 4, receiving the user's voice input;
Step 5, comparing the predetermined data derived from the input voice with the triphone data stored in the phone book;
Step 6, if the comparison shows that recognition succeeded, dialing the telephone number corresponding to the input voice.
2. The speech recognition method for a mobile communication terminal as claimed in claim 1, characterized in that the method further comprises:
returning to the voice input step if the comparison shows that recognition failed.
3. The speech recognition method for a mobile communication terminal as claimed in claim 1, characterized in that
the acoustic model is a hidden Markov model (HMM).
4. The speech recognition method for a mobile communication terminal as claimed in claim 3, characterized in that the acoustic model uses a state clustering technique.
5. The speech recognition method for a mobile communication terminal as claimed in claim 1, characterized in that
when there are two or more recognition results for the input voice, the results are displayed and a preset voice prompt is played to the user.
6. The speech recognition method for a mobile communication terminal as claimed in claim 1, characterized in that the triphone data corresponding to the stored telephone number names or business names is shared.
CNA2004100514745A 2004-09-10 2004-09-10 Speech recognition of mobile telecommunication terminal Pending CN1746970A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNA2004100514745A CN1746970A (en) 2004-09-10 2004-09-10 Speech recognition of mobile telecommunication terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNA2004100514745A CN1746970A (en) 2004-09-10 2004-09-10 Speech recognition of mobile telecommunication terminal

Publications (1)

Publication Number Publication Date
CN1746970A (en) 2006-03-15

Family

ID=36166486

Family Applications (1)

Application Number Title Priority Date Filing Date
CNA2004100514745A Pending CN1746970A (en) 2004-09-10 2004-09-10 Speech recognition of mobile telecommunication terminal

Country Status (1)

Country Link
CN (1) CN1746970A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101141508B (en) * 2006-09-05 2012-02-22 美商富迪科技股份有限公司 communication system and voice recognition method
CN106663430A (en) * 2014-09-08 2017-05-10 高通股份有限公司 Keyword detection using speaker-independent keyword models for user-designated keywords
CN106663430B (en) * 2014-09-08 2021-02-26 高通股份有限公司 Keyword detection for speaker-independent keyword models using user-specified keywords
CN111899718A (en) * 2020-07-30 2020-11-06 北京字节跳动网络技术有限公司 Method, apparatus, device and medium for recognizing synthesized speech

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication