
CN102136001A - Multi-media information fuzzy search method - Google Patents

Multi-media information fuzzy search method

Info

Publication number
CN102136001A
Authority
CN
China
Prior art keywords
information
time point
phoneme
confidence
retrieved
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2011100730481A
Other languages
Chinese (zh)
Other versions
CN102136001B (en)
Inventor
伍昕
吴鹏
刘赵杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
TVMining Beijing Media Technology Co Ltd
Original Assignee
TVMining Beijing Media Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by TVMining Beijing Media Technology Co Ltd
Priority to CN 201110073048 (CN102136001B)
Publication of CN102136001A
Application granted
Publication of CN102136001B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a multimedia information fuzzy retrieval method comprising the following steps: collecting audio/video data; obtaining the Lattice result of the audio data; obtaining confidence score information from the time point information and the matching likelihood score information; re-ranking the multi-candidate information with a stronger speech model and outputting the optimal recognition result; building word-level and phoneme-level indexes that form a backward index database, and generating a raw information database; inputting the text to be retrieved and time point information, converting the text into a phoneme sequence, obtaining similar phoneme sequences by means of a phoneme confusion matrix, and splitting them into a plurality of phoneme combinations; querying the backward index database with these combinations, then performing exact matching in the raw information database and returning candidate positions. This technical scheme maximizes the number of results retrieved and greatly improves retrieval speed while maintaining system performance.

Description

A multimedia information fuzzy retrieval method
Technical field
The present invention relates to the field of multimedia technology, and in particular to a multimedia information fuzzy retrieval method.
Background
With the development of the information age, multimedia documents are multiplying, and news and broadcast programs are reaching massive scale. Compared with traditional text such as newspapers, magazines and books, and with the rich text of the modern Internet, multimedia documents such as audio and video present content in richer and more vivid forms and are easier for people to absorb. However, because multimedia documents are numerous and varied, conveniently finding the content of interest has become a pressing problem. The usual approach is to extract information from the data manually, which is very time-consuming and laborious. Many technologies based on artificial intelligence have therefore been applied to this field in recent years, the most popular being speech recognition. Speech recognition converts speech to text; once the audio has become text, search technology can be used to index and retrieve it comprehensively.
However, speech recognition is not completely reliable, so a retrieval technique that compensates for and corrects its recognition errors is necessary. As automatic speech recognition has become practical and open-source, many companies have begun to buy or build automatic speech recognition systems suited to their own fields and needs. Recognizing the speech in audio/video data yields the text of that data, and once this text is stored in a database it can be retrieved easily.
Conventional speech recognition provides only the final recognized Chinese characters. On the one hand, pinpointing the position of a specific index term requires manual judgment, which is time-consuming and laborious; on the other hand, because of the limits of recognition performance, the accuracy of indexing and search is hard to control. For example, if "Beijing" is misrecognized somewhere as a similar-sounding word (rendered here as "after all"), a user who searches for "Beijing" will not find that passage. Sometimes "Beijing" is pronounced as "Bei Jin" or as another near-homophone (rendered here as "north is frightened"), and again cannot be found. The performance of traditional text-based search is therefore limited by the speech recognition.
Summary of the invention
The object of the invention is to provide a multimedia information fuzzy retrieval method that maximizes the number of results retrieved and greatly improves retrieval speed while maintaining system performance.
To achieve this object, the present invention adopts the following technical scheme:
A multimedia information fuzzy retrieval method comprises the following steps:
A. collecting audio/video data;
B. obtaining the Lattice result of the audio data, which includes time point information and matching likelihood score information, and converting it into multi-candidate information;
C. obtaining confidence score information from the time point information and the matching likelihood score information;
D. re-ranking the multi-candidate information with a stronger speech model and outputting the optimal recognition result;
E. building word-level and phoneme-level indexes from the multi-candidate information, time point information and confidence score information to form a backward index database, and encoding the raw information to generate a raw information database;
F. inputting the text to be retrieved and time point information, converting the text to be retrieved into a phoneme sequence, obtaining similar phoneme sequences by means of a phoneme confusion matrix, and splitting them into no fewer than one phoneme combination;
G. querying the backward index database with the words and with the phoneme sequences respectively, obtaining a group of entry positions in the raw information database together with the corresponding confidence score information, and returning them in descending order of confidence score;
H. entering the raw information database at each entry position for exact matching, selecting a confidence threshold according to the number of entries and the confidence score information, and returning the candidate positions whose confidence exceeds the threshold.
Step A further comprises the following step:
converting the audio data into Windows WAV format at a sampling rate of 16 kHz.
In step A, audio data from television programs is captured with a computer and a TV card, and audio data from broadcast signals is captured with a radio and a sound card.
In step F, the text to be retrieved is converted into a phoneme sequence by letter-to-phoneme conversion.
The technical scheme of the invention targets the types of speech recognition error that may occur: it exploits the phoneme-level similarity between errors and the correct text, and the fuzziness introduced by the phoneme confusion matrix maximizes the number of results retrieved. To counter the high repetition rate of individual phonemes, it also indexes combinations of several phonemes together, which greatly improves retrieval speed while maintaining system performance.
Description of drawings
Fig. 1 is a flowchart of multimedia information fuzzy retrieval in an embodiment of the invention.
Embodiment
The technical scheme of the invention is further described below with reference to the accompanying drawing and an embodiment.
As shown in Fig. 1, the multimedia information retrieval flow comprises the following steps:
Step 101: collect audio/video data. Audio data from television programs is captured with a computer and a TV card, and audio data from broadcast signals is captured with a radio and a sound card; the audio data is then converted into Windows WAV format (uncompressed PCM) at a sampling rate of 16 kHz.
Because the recording formats of the TV card and the sound card are fixed, it suffices to program a transcoder for those specific formats.
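The format conversion in step 101 can be illustrated with a minimal sketch using only the Python standard library. It assumes mono 16-bit samples already decoded into a list of integers; the function names `resample_linear` and `write_pcm_wav` are hypothetical, and plain linear interpolation stands in for whatever resampler a production transcoder would use (it applies no anti-aliasing filter).

```python
import io
import struct
import wave

def resample_linear(samples, src_rate, dst_rate=16000):
    """Resample mono 16-bit samples by linear interpolation (illustrative only)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate          # position in the source signal
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(int(round(samples[left] * (1 - frac) + samples[right] * frac)))
    return out

def write_pcm_wav(samples, rate=16000):
    """Return the bytes of an uncompressed mono 16-bit PCM WAV file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)      # mono
        w.setsampwidth(2)      # 16-bit samples
        w.setframerate(rate)   # 16 kHz by default
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return buf.getvalue()

# Example: a 48 kHz capture converted to a 16 kHz WAV byte string.
captured = [0, 1000, 2000, 3000] * 12
wav_bytes = write_pcm_wav(resample_linear(captured, 48000))
```

Because the capture formats are fixed, a transcoder of this shape only needs one decoding front end per recording device.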
Step 102: obtain the Lattice result of the audio data, which includes time point information, silence information and matching likelihood score information, and convert it into multi-candidate information.
Unlike an ordinary recognition result, the recognition result of this embodiment is not the optimal result in the conventional sense (also called 1-best) but the richer set of decoding paths retained during speech recognition, also called a Lattice-format result. The main features of this format are that it contains abundant time point, silence and matching likelihood score information, and that it can be converted into per-word multi-candidate information, also called a confusion network. From the confusion network, a result that performs better than the conventional optimal recognition result can be obtained.
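A confusion network of the kind described in step 102 can be pictured as a sequence of time slots, each holding competing candidates. The sketch below is a simplified illustration, not the patent's data format: the `Slot` class, the pinyin tokens and the log-likelihood values are all assumed for the example.

```python
from dataclasses import dataclass
from itertools import product

@dataclass
class Slot:
    start: float       # slot start time in seconds
    end: float         # slot end time in seconds
    candidates: list   # list of (word, log_likelihood) pairs

def one_best(network):
    """Pick the highest-scoring candidate in every slot."""
    return [max(s.candidates, key=lambda c: c[1])[0] for s in network]

def n_best(network, n):
    """Enumerate full paths and keep the n with the highest total score."""
    paths = []
    for combo in product(*(s.candidates for s in network)):
        words = [w for w, _ in combo]
        score = sum(ll for _, ll in combo)
        paths.append((score, words))
    paths.sort(key=lambda p: p[0], reverse=True)
    return paths[:n]

network = [
    Slot(0.0, 0.4, [("bei", -1.0), ("bi", -1.5)]),
    Slot(0.4, 0.9, [("jing", -0.5), ("jin", -2.0)]),
]
print(one_best(network))
print(n_best(network, 2))
```

Exhaustive enumeration is only workable for tiny networks; it is used here to keep the relationship between the slots and the multi-candidate list easy to see.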
Step 103: from the time point information and the matching likelihood score information, compute a score that assesses recognition quality, also called confidence score information.
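The text does not give the confidence formula, so the following is only one plausible sketch: candidates whose time spans overlap a given interval are treated as competitors, and their likelihood scores are normalized into posterior-style confidences. The function name `arc_confidences` and the arc tuple layout are assumptions.

```python
import math
from collections import defaultdict

def arc_confidences(arcs, start, end):
    """arcs: list of (word, start, end, log_likelihood).

    Returns {word: confidence} for the words competing in [start, end),
    using a softmax over their log-likelihoods (an assumed scoring rule).
    """
    competing = [(w, ll) for w, s, e, ll in arcs if s < end and e > start]
    if not competing:
        return {}
    peak = max(ll for _, ll in competing)      # subtract the peak for numerical stability
    weights = defaultdict(float)
    for w, ll in competing:
        weights[w] += math.exp(ll - peak)      # merge repeated words
    total = sum(weights.values())
    return {w: v / total for w, v in weights.items()}

arcs = [
    ("bei", 0.0, 0.4, -1.0),
    ("jing", 0.4, 0.9, -0.5),
    ("jin", 0.4, 0.8, -2.0),
]
print(arc_confidences(arcs, 0.4, 0.9))
```

A candidate with few strong competitors in its time span receives a confidence close to 1, which is the property the later threshold filtering relies on.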
Step 104: re-rank the multi-candidate information with a stronger speech model and output the optimal recognition result.
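Re-ranking in step 104 can be sketched as adding a second-pass score to each first-pass hypothesis and sorting again. The "stronger model" here is a toy bigram table with invented log-probabilities; the names `bigram_score` and `rescore` and the interpolation weight are assumptions, not details from the text.

```python
# Hypothetical bigram log-probabilities standing in for the stronger model.
BIGRAMS = {("bei", "jing"): -0.1}
UNSEEN = -3.0   # assumed penalty for bigrams the model has not seen

def bigram_score(words):
    """Sum the log-probabilities of consecutive word pairs."""
    return sum(BIGRAMS.get(pair, UNSEEN) for pair in zip(words, words[1:]))

def rescore(hypotheses, model_score, weight=1.0):
    """hypotheses: list of (first_pass_score, words); returns them re-ranked."""
    rescored = [(first + weight * model_score(words), words)
                for first, words in hypotheses]
    rescored.sort(key=lambda h: h[0], reverse=True)
    return rescored

hypotheses = [(-2.0, ["bi", "jing"]), (-2.2, ["bei", "jing"])]
best_score, best_words = rescore(hypotheses, bigram_score)[0]
print(best_words)
```

In this toy case the first pass prefers the wrong hypothesis by a small margin, and the second-pass score reverses the order.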
Step 105: build word-level and phoneme-level indexes from the multi-candidate information, time point information and confidence score information to form a backward index database, and encode the raw information to generate a raw information database.
In this step, following search-engine principles, the various kinds of information obtained in the preceding steps are indexed at basic index levels. Two index levels are used here, the word level and the phoneme level, where a phoneme can be understood simply as an initial or a final of a Chinese syllable. Phoneme-level indexing is seldom used in search engines. It is added here mainly because speech recognition may produce recognition errors, and these errors are still correlated with the correct text: for example, their phonemes remain fairly similar. A phoneme confusion matrix is trained from common recognition errors, and the phoneme-level index is what makes it possible to use that matrix. Because phonemes occur far more frequently than individual characters, indexing single phonemes would produce a large number of candidate results and reduce search efficiency; the embodiment therefore indexes combinations of several phonemes, which greatly improves search efficiency while maintaining search quality. The two index levels form the backward index database, which contains time point and confidence information; at the same time the raw information is efficiently encoded and compressed to generate the raw information database.
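The two-level index can be sketched as two dictionaries that map a key to a posting list of (entry id, time point, confidence). This is a minimal illustration under assumed data shapes: the entries, the romanized words and the choice of two-phoneme combinations are invented for the example, and no encoding or compression of the raw information is shown.

```python
from collections import defaultdict

def phoneme_ngrams(phonemes, n):
    """All runs of n consecutive phonemes, as tuples."""
    return [tuple(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)]

def build_indexes(entries, n=2):
    """entries: list of (word, phonemes, start_time, confidence).

    Returns (word_index, phoneme_index); both map a key to postings
    of the form (entry_id, start_time, confidence).
    """
    word_index = defaultdict(list)
    phoneme_index = defaultdict(list)
    for entry_id, (word, phonemes, start, conf) in enumerate(entries):
        posting = (entry_id, start, conf)
        word_index[word].append(posting)
        for gram in set(phoneme_ngrams(phonemes, n)):   # one posting per distinct combination
            phoneme_index[gram].append(posting)
    return word_index, phoneme_index

entries = [
    ("beijing", ["b", "ei", "j", "ing"], 1.2, 0.9),
    ("bijing", ["b", "i", "j", "ing"], 5.0, 0.6),
    ("nanjing", ["n", "an", "j", "ing"], 8.5, 0.8),
]
word_index, phoneme_index = build_indexes(entries)
print(len(phoneme_index[("j", "ing")]), len(phoneme_index[("b", "ei")]))
```

The final line shows the selectivity argument in miniature: a common combination such as ("j", "ing") matches every entry, while ("b", "ei") narrows the candidates to one, and a single phoneme would be less selective still.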
Step 106: input the text to be retrieved and time point information, convert the text to be retrieved into a phoneme sequence by letter-to-phoneme conversion (Grapheme-to-Phoneme, G2P), obtain similar phoneme sequences by means of the phoneme confusion matrix, and split them into a plurality of phoneme combinations.
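The query-side processing of step 106 can be sketched as three small functions: a lookup that stands in for G2P, an expansion through the confusion matrix, and a split into phoneme combinations. The lexicon, the confusion probabilities and the pruning threshold `min_prob` are hypothetical values chosen for illustration; a real matrix would be trained from common recognition errors as the description states.

```python
from itertools import product

# Hypothetical toy data for illustration only.
LEXICON = {"beijing": ["b", "ei", "j", "ing"]}
CONFUSION = {
    "ei": {"ei": 0.7, "i": 0.3},
    "ing": {"ing": 0.8, "in": 0.2},
}

def to_phonemes(text, lexicon=LEXICON):
    """Stand-in for G2P: look the query up in a pronunciation lexicon."""
    return list(lexicon[text])

def similar_sequences(phonemes, confusion=CONFUSION, min_prob=0.1):
    """Expand each phoneme into its confusable alternatives and keep
    the sequences whose joint probability is at least min_prob."""
    options = [list(confusion.get(p, {p: 1.0}).items()) for p in phonemes]
    results = []
    for combo in product(*options):
        prob = 1.0
        for _, p in combo:
            prob *= p
        if prob >= min_prob:
            results.append(([ph for ph, _ in combo], prob))
    results.sort(key=lambda r: r[1], reverse=True)
    return results

def split_combinations(phonemes, n=2):
    """Split a phoneme sequence into overlapping n-phoneme combinations."""
    return [tuple(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)]

for seq, prob in similar_sequences(to_phonemes("beijing")):
    print(seq, split_combinations(seq))
```

Pruning by joint probability keeps the expansion from growing combinatorially when several phonemes in the query are confusable.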
Step 107: query the backward index database with the words and with the phoneme sequences respectively, obtain a group of entry positions in the raw information database together with the corresponding confidence score information, and return them in descending order of confidence score.
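The index lookup of step 107 can be sketched as collecting the postings for every query key, removing duplicates, and ordering by confidence. The index contents below are invented, and the posting layout (entry id, time point, confidence) is an assumption carried through this example only.

```python
def lookup(index, keys):
    """Collect the postings for all keys, de-duplicate by entry id,
    and return them in descending order of confidence."""
    hits = {}
    for key in keys:
        for entry_id, start, conf in index.get(key, []):
            hits[entry_id] = (entry_id, start, conf)
    return sorted(hits.values(), key=lambda h: h[2], reverse=True)

# Hypothetical index contents: key -> [(entry_id, start_time, confidence)].
word_index = {"beijing": [(0, 1.2, 0.9)]}
phoneme_index = {
    ("b", "ei"): [(0, 1.2, 0.9)],
    ("j", "ing"): [(0, 1.2, 0.9), (1, 5.0, 0.6), (2, 8.5, 0.8)],
}

word_hits = lookup(word_index, ["beijing"])
phoneme_hits = lookup(phoneme_index, [("b", "ei"), ("ei", "j"), ("j", "ing")])
print(word_hits)
print(phoneme_hits)
```

The same function serves both index levels because only the key type differs: a word string for the word level and a tuple of phonemes for the phoneme level.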
Step 108: enter the raw information database at each entry position for exact matching, select a confidence threshold according to the number of entries and the confidence score information, and return the candidate positions whose confidence exceeds the threshold for the user to browse, completing one retrieval.
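The text does not specify how the threshold depends on the number of entries, so the rule below is purely an assumed example: a base threshold is used when few entries match, and the threshold rises when there are more matches than a chosen cap. The names `choose_threshold` and `exact_match`, the base value and the cap are all hypothetical.

```python
def choose_threshold(confidences, base=0.5, max_results=3):
    """Assumed rule: keep the base threshold for short result lists,
    otherwise raise it so that at most max_results exceed it."""
    if len(confidences) <= max_results:
        return base
    ranked = sorted(confidences, reverse=True)
    return max(base, ranked[max_results])

def exact_match(raw_db, hits, variants, base=0.5, max_results=3):
    """raw_db: {entry_id: phoneme list}; hits: (entry_id, time, confidence).

    Keep hits whose stored phonemes equal one of the query variants,
    then keep those whose confidence exceeds the chosen threshold."""
    wanted = {tuple(v) for v in variants}
    matched = [h for h in hits if tuple(raw_db[h[0]]) in wanted]
    threshold = choose_threshold([h[2] for h in matched], base, max_results)
    return [h for h in matched if h[2] > threshold]

raw_db = {
    0: ["b", "ei", "j", "ing"],
    1: ["b", "i", "j", "ing"],
    2: ["n", "an", "j", "ing"],
}
hits = [(0, 1.2, 0.9), (2, 8.5, 0.8), (1, 5.0, 0.6)]
variants = [["b", "ei", "j", "ing"], ["b", "i", "j", "ing"]]
print(exact_match(raw_db, hits, variants))
```

Entry 2 shares a phoneme combination with the query and so passes the index lookup, but it is rejected here because its full phoneme sequence matches none of the query variants.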
With this embodiment, multimedia information can be annotated and stored more completely, so that later queries are finer-grained and index and locate the positions of interest more quickly. The phoneme-level index greatly increases the amount of multimedia information that can be found, and the confidence information filters out poorly recognized multimedia information; together these two techniques effectively avoid retrieval errors caused by speech recognition errors.
The above is only a preferred embodiment of the invention, but the scope of protection of the invention is not limited thereto. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention. The scope of protection of the invention shall therefore be determined by the scope of protection of the claims.

Claims (4)

1. A multimedia information fuzzy retrieval method, characterized in that it comprises the following steps:
A. collecting audio/video data;
B. obtaining the Lattice result of the audio data, which includes time point information and matching likelihood score information, and converting it into multi-candidate information;
C. obtaining confidence score information from the time point information and the matching likelihood score information;
D. re-ranking the multi-candidate information with a stronger speech model and outputting the optimal recognition result;
E. building word-level and phoneme-level indexes from the multi-candidate information, time point information and confidence score information to form a backward index database, and encoding the multimedia data to generate a multimedia database;
F. inputting the text to be retrieved and time point information, converting the text to be retrieved into a phoneme sequence, obtaining similar phoneme sequences by means of a phoneme confusion matrix, and splitting them into no fewer than one phoneme combination;
G. querying the backward index database with the words and with the phoneme sequences respectively, obtaining a group of entry positions in the raw information database together with the corresponding confidence score information, and returning them in descending order of confidence score;
H. entering the raw information database at each entry position for exact matching, selecting a confidence threshold according to the number of entries and the confidence score information, and returning the candidate positions whose confidence exceeds the threshold.
2. The multimedia information fuzzy retrieval method according to claim 1, characterized in that step A further comprises the following step:
converting the audio data into Windows WAV format at a sampling rate of 16 kHz.
3. The multimedia information fuzzy retrieval method according to claim 1, characterized in that, in step A, audio data from television programs is captured with a computer and a TV card, and audio data from broadcast signals is captured with a radio and a sound card.
4. The multimedia information fuzzy retrieval method according to claim 1, characterized in that, in step F, the text to be retrieved is converted into a phoneme sequence by letter-to-phoneme conversion.
CN 201110073048 2011-03-25 2011-03-25 Multi-media information fuzzy search method Expired - Fee Related CN102136001B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110073048 CN102136001B (en) 2011-03-25 2011-03-25 Multi-media information fuzzy search method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201110073048 CN102136001B (en) 2011-03-25 2011-03-25 Multi-media information fuzzy search method

Publications (2)

Publication Number Publication Date
CN102136001A true CN102136001A (en) 2011-07-27
CN102136001B CN102136001B (en) 2012-12-26

Family

ID=44295787

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110073048 Expired - Fee Related CN102136001B (en) 2011-03-25 2011-03-25 Multi-media information fuzzy search method

Country Status (1)

Country Link
CN (1) CN102136001B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101477527A (en) * 2008-12-30 2009-07-08 深圳市迅雷网络技术有限公司 Multimedia resource retrieval method and apparatus
CN101552003A (en) * 2009-02-25 2009-10-07 北京派瑞根科技开发有限公司 Media information processing method
CN101916251A (en) * 2009-03-26 2010-12-15 富士通株式会社 The storage medium of integrated indexing unit of multimedia and the integrated search program of multimedia

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103164403A (en) * 2011-12-08 2013-06-19 深圳市北科瑞声科技有限公司 Generation method of video indexing data and system
CN103164403B (en) * 2011-12-08 2016-03-16 深圳市北科瑞声科技有限公司 The generation method and system of video index data
CN103500579A (en) * 2013-10-10 2014-01-08 中国联合网络通信集团有限公司 Voice recognition method, device and system
CN103500579B (en) * 2013-10-10 2015-12-23 中国联合网络通信集团有限公司 Audio recognition method, Apparatus and system
CN104008132A (en) * 2014-05-04 2014-08-27 深圳市北科瑞声科技有限公司 Voice map searching method and system
CN113744718A (en) * 2020-05-27 2021-12-03 海尔优家智能科技(北京)有限公司 Voice text output method and device, storage medium and electronic device
CN112906369A (en) * 2021-02-19 2021-06-04 脸萌有限公司 Lyric file generation method and device
CN113096242A (en) * 2021-04-29 2021-07-09 平安科技(深圳)有限公司 Virtual anchor generation method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN102136001B (en) 2012-12-26

Similar Documents

Publication Publication Date Title
CN101382937B (en) Multimedia resource processing method based on speech recognition and on-line teaching system thereof
CN102122506B (en) Method for recognizing voice
CN102136001B (en) Multi-media information fuzzy search method
CN101464896B (en) Voice fuzzy retrieval method and apparatus
CN103761261B (en) A kind of media search method and device based on speech recognition
CN101326572B (en) Speech recognition system with huge vocabulary
CN113326387B (en) Intelligent conference information retrieval method
CN102667773A (en) Search device, search method, and program
CN103730115A (en) Method and device for detecting keywords in voice
CN105159870A (en) Processing system for precisely completing continuous natural speech textualization and method for precisely completing continuous natural speech textualization
Levin et al. Automated closed captioning for Russian live broadcasting
US20150371627A1 (en) Voice dialog system using humorous speech and method thereof
CN114547373A (en) Method for intelligently identifying and searching programs based on audio
JP2010262413A (en) Voice information extraction device
JP2019020597A (en) End-to-end japanese voice recognition model learning device and program
CN106550268B (en) Video processing method and video processing device
WO2008100037A1 (en) The system and method for generating indexing information of multimedia data file using vocal data and retrieving indexing information of multimedia data file
CN115455946A (en) Voice recognition error correction method and device, electronic equipment and storage medium
Servan et al. Conceptual decoding from word lattices: application to the spoken dialogue corpus media
CN102117335B (en) Method for retrieving multimedia information
Salimbajevs Creating Lithuanian and Latvian speech corpora from inaccurately annotated web data
US20050125224A1 (en) Method and apparatus for fusion of recognition results from multiple types of data sources
WO2007105615A1 (en) Request content identification system, request content identification method using natural language, and program
de Jong et al. OLIVE: Speech-based video retrieval
Jong et al. Language-based multimedia information retrieval

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Multi-media information fuzzy search method

Effective date of registration: 20130605

Granted publication date: 20121226

Pledgee: Zhongguancun Beijing technology financing Company limited by guarantee

Pledgor: TVMining (Beijing) Media Technology Co., Ltd.

Registration number: 2013990000345

PLDC Enforcement, change and cancellation of contracts on pledge of patent right or utility model
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20121226

Termination date: 20210325
