CN102136001A - Multi-media information fuzzy search method - Google Patents
- Publication number
- CN102136001A (application number CN201110073048)
- Authority
- CN
- China
- Prior art keywords
- information
- time point
- phoneme
- confidence
- retrieved
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a multimedia information fuzzy retrieval method comprising the following steps: collecting audio/video data; obtaining the Lattice result of the audio data; deriving confidence score information from the time point information and the matching likelihood score information; reranking the multiple-candidate information with a stronger speech model and outputting the optimal recognition result; building word-level and phoneme-level index databases and generating a raw information database; inputting the text to be retrieved and time point information, and converting the text to be retrieved into a phoneme sequence; obtaining similar phoneme sequences by means of a phoneme confusion matrix and splitting them into a plurality of phoneme combinations, which are looked up in the backward index database and then matched exactly in the raw information database; and returning the candidate positions. With this technical scheme, the number of items retrieved can be maximized and the retrieval speed greatly improved while system performance is maintained.
Description
Technical field
The present invention relates to the field of multimedia technology, and in particular to a multimedia information fuzzy retrieval method.
Background art
With the development of the information age, multimedia documents and news broadcast programs are growing to a massive scale. Compared with text such as traditional newspapers, magazines and books, and with the rich text of the Internet, multimedia documents such as audio and video data present content in richer and more vivid forms and are easier for people to take in. However, because multimedia documents are numerous and varied, obtaining the content of interest conveniently has become an urgent problem. The usual approach is to extract information from the data manually, which is time-consuming and laborious. Many techniques based on artificial intelligence have therefore been applied to this field in recent years, the most popular of which is speech recognition. Speech recognition converts speech to text, and once the speech has become text, search technology can be used for comprehensive indexing and retrieval.
However, speech recognition is not completely reliable, so a retrieval technique that compensates for and corrects its recognition errors is necessary. As automatic speech recognition has become practical and open-source, many companies have started to buy or build automatic speech recognition systems suited to their own fields and needs. Recognizing the speech in audio/video data yields the text information of that data, and once this text information is entered into a database it can be retrieved easily.
Conventional speech recognition provides only the final recognized Chinese characters. On the one hand, locating a specific index term precisely requires manual judgment, which is time-consuming and laborious; on the other hand, the accuracy of indexing and search is limited by recognition performance and is hard to control. For example, if "Beijing" at some position is recognized as "after all", a user searching for "Beijing" will not find that position. Sometimes "Beijing" may be pronounced as "Bei Jin" or "north is frightened", and again it cannot be found. The performance of traditional text-based search is therefore affected by speech recognition.
Summary of the invention
The object of the invention is to provide a multimedia information fuzzy retrieval method that maximizes the number of items retrieved and greatly improves retrieval speed while maintaining system performance.
To achieve this object, the invention adopts the following technical scheme:
A multimedia information fuzzy retrieval method, comprising the following steps:
A. collecting audio/video data;
B. obtaining the Lattice result of the audio data, which includes time point information and matching likelihood score information, and converting it into multiple-candidate information;
C. obtaining confidence score information from the time point information and the matching likelihood score information;
D. reranking the multiple-candidate information with a stronger speech model and outputting the optimal recognition result;
E. using the multiple-candidate information, time point information and confidence score information to build word-level and phoneme-level index databases, which together form the backward index database, and encoding the raw information to generate the raw information database;
F. inputting the text to be retrieved and time point information, converting the text to be retrieved into a phoneme sequence, using the phoneme confusion matrix to obtain similar phoneme sequences, and splitting them into no fewer than one phoneme combination;
G. querying the backward index database with the words and the phoneme sequences respectively to obtain a group of entry positions in the raw information database together with the corresponding confidence score information, returned in descending order of confidence score;
H. performing exact matching in the raw information database at each entry position, selecting a confidence threshold according to the number of entries and the confidence score information, and returning the candidate positions whose confidence exceeds the threshold.
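The lookup and filtering in steps G and H can be sketched as follows. This is only an illustration under stated assumptions: the `Entry` record, the dictionary-based index and raw store, and the threshold rule in `choose_threshold` (a stricter threshold when the index returns many entries) are hypothetical, since the text does not say how the threshold is derived from the number of entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    position: int      # entry position in the raw information database
    start_s: float     # time point of the hit, in seconds
    confidence: float  # confidence score in [0, 1]


def query_index(index, keys):
    """Step G: collect the entries for all keys, highest confidence first."""
    hits = {entry for key in keys for entry in index.get(key, [])}
    return sorted(hits, key=lambda e: (-e.confidence, e.position))


def choose_threshold(entries, few=3, low=0.3, high=0.6):
    """Hypothetical rule: be stricter when the index returns many entries."""
    return low if len(entries) <= few else high


def search(index, raw_store, keys, accepted_sequences):
    """Step H: exact match in the raw store, then filter by confidence."""
    entries = query_index(index, keys)
    threshold = choose_threshold(entries)
    return [
        e for e in entries
        if raw_store.get(e.position) in accepted_sequences
        and e.confidence > threshold
    ]
```

Here `accepted_sequences` stands for the set of phoneme sequences produced in step F (the query itself plus its confusable variants), so the exact match still admits the fuzzy alternatives.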
Step A further comprises the following step:
converting the audio data to WINDOWS WAV format with a sampling rate of 16 kHz.
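As a rough illustration of the target format, the sketch below writes samples to a WAV container at 16 kHz using Python's standard `wave` module. The text specifies only the WAV format and the 16 kHz sampling rate; the mono channel layout and 16-bit sample width are assumptions made here, and the function `write_wav_16k` is hypothetical. It does not resample, so the input is assumed to be sampled at 16 kHz already.

```python
import struct
import wave


def write_wav_16k(samples, dest):
    """Write float samples in [-1, 1] as 16-bit mono PCM WAV at 16 kHz.

    `dest` may be a file name or a seekable binary file object.
    Mono and 16-bit are assumptions; only 16 kHz WAV comes from the text.
    """
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    with wave.open(dest, "wb") as writer:
        writer.setnchannels(1)       # mono (assumed)
        writer.setsampwidth(2)       # 2 bytes = 16-bit samples (assumed)
        writer.setframerate(16000)   # 16 kHz, as stated in the text
        writer.writeframes(frames)
```

Reading the file back with `wave.open(..., "rb")` and checking `getframerate()` is a simple way to confirm that captured audio already matches the expected rate before recognition.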
In step A, audio data in television programs is captured with a computer and a TV card, and audio data in broadcast signals is captured with a radio and a sound card.
In step F, the text to be retrieved is converted into a phoneme sequence by letter-to-phone conversion.
The technical scheme of the invention addresses the types of speech recognition error that may occur by exploiting their similarity at the phoneme level; the fuzziness introduced by the phoneme confusion matrix maximizes the number of items retrieved. At the same time, to counter the high repetition rate at the phoneme level, indexes are built jointly from multiple phoneme combinations, which greatly improves retrieval speed while maintaining system performance.
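To make the phoneme-level similarity concrete, the following sketch computes a plain edit distance over phoneme sequences. It is an illustration only, not the method of the invention, which relies on a phoneme confusion matrix. The phoneme segmentation of "Beijing" into initials and finals (`b ei j ing`) and of the "Bei Jin" variant (`b ei j in`) is an assumption made for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of phoneme symbols."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,             # delete x
                    current[j - 1] + 1,          # insert y
                    previous[j - 1] + (x != y),  # substitute x with y
                )
            )
        previous = current
    return previous[-1]


beijing = "b ei j ing".split()
bei_jin = "b ei j in".split()
distance = edit_distance(beijing, bei_jin)  # one substituted final
```

The two sequences differ in a single phoneme, whereas their written forms would not match at all in a text index, which is the gap the phoneme-level index is meant to close.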
Description of drawings
Fig. 1 is a flowchart of multimedia information fuzzy retrieval in a specific embodiment of the invention.
Embodiment
The technical scheme of the invention is further described below with reference to the accompanying drawing and a specific embodiment.
Fig. 1 is a flowchart of multimedia information fuzzy retrieval in a specific embodiment of the invention. As shown in Fig. 1, the multimedia information retrieval flow comprises the following steps:
Because the recording formats of the TV card and the sound card are fixed, it is only necessary to program transcoding for those specific formats.
Unlike an ordinary recognition result, the recognition result in this embodiment is not the optimal result in the conventional sense (also called 1-Best) but the richer set of decoding paths retained during speech recognition, also called the Lattice-format result. The main features of this format are that it contains abundant time point and silence information together with matching likelihood score information, and that it can be converted into word-by-word multiple-candidate information, also called a confusion network, as well as into the optimal result; a confusion network can yield better performance than the optimal recognition result alone.
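One common way to turn likelihood scores into confidence scores is to normalize them among competing candidates that cover the same time span. The sketch below does this with a softmax over log-likelihoods. It is an assumption for illustration: the text does not give the actual confidence formula, the `Arc` record is hypothetical, and real lattice arcs rarely align to identical time spans without a clustering step.

```python
import math
from dataclasses import dataclass


@dataclass
class Arc:
    word: str
    start_s: float         # time point where the candidate starts
    end_s: float           # time point where the candidate ends
    log_likelihood: float  # matching likelihood score (log domain)


def confidences(arcs):
    """Return (arc, confidence) pairs, normalized within each time slot."""
    slots = {}
    for arc in arcs:
        slots.setdefault((arc.start_s, arc.end_s), []).append(arc)
    scored = []
    for slot_arcs in slots.values():
        # Subtract the peak before exponentiating for numerical stability.
        peak = max(a.log_likelihood for a in slot_arcs)
        weights = [math.exp(a.log_likelihood - peak) for a in slot_arcs]
        total = sum(weights)
        scored.extend((a, w / total) for a, w in zip(slot_arcs, weights))
    return scored
```

With this normalization, a candidate that has no competitor in its slot gets confidence 1.0, and two equally likely competitors get 0.5 each.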
In this step, following the principles of a search engine, the various kinds of information obtained in the preceding steps are used to index the basic index levels. Two index levels are used here, the word level and the phoneme level, where a phoneme can be understood simply as an initial or a final. This approach is rarely used in search engines. The phoneme-level index is added mainly because speech recognition may produce recognition errors, and these errors are still correlated with the correct text; for example, their phonemes remain similar. A phoneme confusion matrix is trained from common recognition errors, and the phoneme-level index is what makes it possible to use that matrix. Because phonemes occur far more frequently than single characters, indexing them individually would produce a large number of candidate results and lower search efficiency; indexing combinations of several phonemes therefore greatly improves search efficiency while maintaining search quality. The two index levels form the backward index database, which contains time point and confidence information; at the same time the raw information is efficiently encoded and compressed to generate the raw information database.
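A two-level index of this kind might look like the sketch below. It assumes that the backward index behaves like an inverted index, that phoneme combinations are contiguous n-grams (bigrams by default), and that each posting stores the entry position, time point and confidence; the function names and the tuple layout are hypothetical.

```python
from collections import defaultdict


def phoneme_ngrams(phonemes, n=2):
    """Split a phoneme sequence into contiguous combinations of n phonemes."""
    if len(phonemes) <= n:
        return [tuple(phonemes)]
    return [tuple(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)]


def build_backward_index(candidates, n=2):
    """Build word-level and phoneme-level indexes from recognition candidates.

    Each candidate is (word, phonemes, start_s, confidence); its list index
    serves as the entry position in the raw information database.
    """
    word_index = defaultdict(list)
    phoneme_index = defaultdict(list)
    for position, (word, phonemes, start_s, confidence) in enumerate(candidates):
        posting = (position, start_s, confidence)
        word_index[word].append(posting)
        for gram in phoneme_ngrams(phonemes, n):
            phoneme_index[gram].append(posting)
    return word_index, phoneme_index
```

Indexing bigrams rather than single phonemes keeps each posting list shorter, which is the efficiency argument made above: a single initial such as `b` would match a very large share of the corpus, while `(b, ei)` matches far less.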
Step 106: input the text to be retrieved and time point information, convert the text to be retrieved into a phoneme sequence by letter-to-phone conversion (Grapheme-to-Phoneme, G2P), use the phoneme confusion matrix to obtain similar phoneme sequences, and split them into a plurality of phoneme combinations.
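The query-side processing of this step can be sketched as below. Everything concrete in it is an assumption for illustration: the toy lexicon standing in for a G2P component, the confusion probabilities, the `min_prob` cut-off, and the use of contiguous bigrams as phoneme combinations. A real confusion matrix would be trained from common recognition errors, as the description states.

```python
from itertools import product

# Toy lexicon standing in for a G2P component (hypothetical entries).
TOY_LEXICON = {"bei": ["b", "ei"], "jing": ["j", "ing"]}

# Rows of a phoneme confusion matrix: P(recognized | spoken), made-up values.
CONFUSION = {
    "ei": {"ei": 0.85, "i": 0.15},
    "ing": {"ing": 0.80, "in": 0.20},
}


def to_phonemes(syllables):
    """Convert syllables of the query text into a flat phoneme sequence."""
    return [p for s in syllables for p in TOY_LEXICON[s]]


def similar_sequences(phonemes, min_prob=0.1):
    """Expand a sequence with every confusable phoneme above min_prob."""
    options = []
    for p in phonemes:
        row = CONFUSION.get(p, {p: 1.0})
        options.append([q for q, prob in row.items() if prob >= min_prob])
    return [list(seq) for seq in product(*options)]


def split_combinations(phonemes, n=2):
    """Split a sequence into at least one combination of up to n phonemes."""
    if len(phonemes) <= n:
        return [tuple(phonemes)]
    return [tuple(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)]
```

For the query `["bei", "jing"]` this yields four similar sequences, including `b ei j in`, each of which is then split into bigrams for lookup. The number of expansions grows multiplicatively with the number of confusable phonemes, so a cut-off such as `min_prob` is one way to keep the query set small.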
With this embodiment, multimedia information can be annotated and stored more completely, so that later queries can be finer-grained and can index and locate positions of interest more quickly. The phoneme-level index greatly increases the amount of multimedia information found, and the confidence information filters out poorly recognized multimedia information; together these two techniques effectively avoid retrieval errors caused by speech recognition errors.
The above is only a preferred embodiment of the invention, but the scope of protection of the invention is not limited to it. Any variation or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention. The scope of protection of the invention shall therefore be determined by the scope of the claims.
Claims (4)
1. A multimedia information fuzzy retrieval method, characterized in that it comprises the following steps:
A. collecting audio/video data;
B. obtaining the Lattice result of the audio data, which includes time point information and matching likelihood score information, and converting it into multiple-candidate information;
C. obtaining confidence score information from the time point information and the matching likelihood score information;
D. reranking the multiple-candidate information with a stronger speech model and outputting the optimal recognition result;
E. using the multiple-candidate information, time point information and confidence score information to build word-level and phoneme-level index databases, which together form the backward index database, and encoding the multimedia data to generate a multimedia database;
F. inputting the text to be retrieved and time point information, converting the text to be retrieved into a phoneme sequence, using the phoneme confusion matrix to obtain similar phoneme sequences, and splitting them into no fewer than one phoneme combination;
G. querying the backward index database with the words and the phoneme sequences respectively to obtain a group of entry positions in the raw information database together with the corresponding confidence score information, returned in descending order of confidence score;
H. performing exact matching in the raw information database at each entry position, selecting a confidence threshold according to the number of entries and the confidence score information, and returning the candidate positions whose confidence exceeds the threshold.
2. The multimedia information fuzzy retrieval method according to claim 1, characterized in that step A further comprises the following step:
converting the audio data to WINDOWS WAV format with a sampling rate of 16 kHz.
3. The multimedia information fuzzy retrieval method according to claim 1, characterized in that, in step A, audio data in television programs is captured with a computer and a TV card, and audio data in broadcast signals is captured with a radio and a sound card.
4. The multimedia information fuzzy retrieval method according to claim 1, characterized in that, in step F, the text to be retrieved is converted into a phoneme sequence by letter-to-phone conversion.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201110073048 CN102136001B (en) | 2011-03-25 | 2011-03-25 | Multi-media information fuzzy search method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201110073048 CN102136001B (en) | 2011-03-25 | 2011-03-25 | Multi-media information fuzzy search method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102136001A true CN102136001A (en) | 2011-07-27 |
CN102136001B CN102136001B (en) | 2012-12-26 |
Family
ID=44295787
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN 201110073048 Expired - Fee Related CN102136001B (en) | 2011-03-25 | 2011-03-25 | Multi-media information fuzzy search method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102136001B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103164403A (en) * | 2011-12-08 | 2013-06-19 | 深圳市北科瑞声科技有限公司 | Generation method of video indexing data and system |
CN103500579A (en) * | 2013-10-10 | 2014-01-08 | 中国联合网络通信集团有限公司 | Voice recognition method, device and system |
CN104008132A (en) * | 2014-05-04 | 2014-08-27 | 深圳市北科瑞声科技有限公司 | Voice map searching method and system |
CN112906369A (en) * | 2021-02-19 | 2021-06-04 | 脸萌有限公司 | Lyric file generation method and device |
CN113096242A (en) * | 2021-04-29 | 2021-07-09 | 平安科技(深圳)有限公司 | Virtual anchor generation method and device, electronic equipment and storage medium |
CN113744718A (en) * | 2020-05-27 | 2021-12-03 | 海尔优家智能科技(北京)有限公司 | Voice text output method and device, storage medium and electronic device |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101477527A (en) * | 2008-12-30 | 2009-07-08 | 深圳市迅雷网络技术有限公司 | Multimedia resource retrieval method and apparatus |
CN101552003A (en) * | 2009-02-25 | 2009-10-07 | 北京派瑞根科技开发有限公司 | Media information processing method |
CN101916251A (en) * | 2009-03-26 | 2010-12-15 | 富士通株式会社 | The storage medium of integrated indexing unit of multimedia and the integrated search program of multimedia |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103164403A (en) * | 2011-12-08 | 2013-06-19 | 深圳市北科瑞声科技有限公司 | Generation method of video indexing data and system |
CN103164403B (en) * | 2011-12-08 | 2016-03-16 | 深圳市北科瑞声科技有限公司 | The generation method and system of video index data |
CN103500579A (en) * | 2013-10-10 | 2014-01-08 | 中国联合网络通信集团有限公司 | Voice recognition method, device and system |
CN103500579B (en) * | 2013-10-10 | 2015-12-23 | 中国联合网络通信集团有限公司 | Audio recognition method, Apparatus and system |
CN104008132A (en) * | 2014-05-04 | 2014-08-27 | 深圳市北科瑞声科技有限公司 | Voice map searching method and system |
CN113744718A (en) * | 2020-05-27 | 2021-12-03 | 海尔优家智能科技(北京)有限公司 | Voice text output method and device, storage medium and electronic device |
CN112906369A (en) * | 2021-02-19 | 2021-06-04 | 脸萌有限公司 | Lyric file generation method and device |
CN113096242A (en) * | 2021-04-29 | 2021-07-09 | 平安科技(深圳)有限公司 | Virtual anchor generation method and device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN102136001B (en) | 2012-12-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101382937B (en) | Multimedia resource processing method based on speech recognition and on-line teaching system thereof | |
CN102122506B (en) | Method for recognizing voice | |
CN102136001B (en) | Multi-media information fuzzy search method | |
CN101464896B (en) | Voice fuzzy retrieval method and apparatus | |
CN103761261B (en) | A kind of media search method and device based on speech recognition | |
CN101326572B (en) | Speech recognition system with huge vocabulary | |
CN113326387B (en) | Intelligent conference information retrieval method | |
CN102667773A (en) | Search device, search method, and program | |
CN103730115A (en) | Method and device for detecting keywords in voice | |
CN105159870A (en) | Processing system for precisely completing continuous natural speech textualization and method for precisely completing continuous natural speech textualization | |
Levin et al. | Automated closed captioning for Russian live broadcasting | |
US20150371627A1 (en) | Voice dialog system using humorous speech and method thereof | |
CN114547373A (en) | Method for intelligently identifying and searching programs based on audio | |
JP2010262413A (en) | Voice information extraction device | |
JP2019020597A (en) | End-to-end japanese voice recognition model learning device and program | |
CN106550268B (en) | Video processing method and video processing device | |
WO2008100037A1 (en) | The system and method for generating indexing information of multimedia data file using vocal data and retrieving indexing information of multimedia data file | |
CN115455946A (en) | Voice recognition error correction method and device, electronic equipment and storage medium | |
Servan et al. | Conceptual decoding from word lattices: application to the spoken dialogue corpus media | |
CN102117335B (en) | Method for retrieving multimedia information | |
Salimbajevs | Creating Lithuanian and Latvian speech corpora from inaccurately annotated web data | |
US20050125224A1 (en) | Method and apparatus for fusion of recognition results from multiple types of data sources | |
WO2007105615A1 (en) | Request content identification system, request content identification method using natural language, and program | |
de Jong et al. | OLIVE: Speech-based video retrieval | |
Jong et al. | Language-based multimedia information retrieval |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
PE01 | Entry into force of the registration of the contract for pledge of patent right | Denomination of invention: Multi-media information fuzzy search method; Effective date of registration: 20130605; Granted publication date: 20121226; Pledgee: Zhongguancun Beijing technology financing Company limited by guarantee; Pledgor: TVMining (Beijing) Media Technology Co., Ltd.; Registration number: 2013990000345 |
PLDC | Enforcement, change and cancellation of contracts on pledge of patent right or utility model | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20121226; Termination date: 20210325 |