CN105005630B - Method for multi-dimensional detection of a specific target in full media - Google Patents
Method for multi-dimensional detection of a specific target in full media
- Publication number
- CN105005630B (application CN201510515893.8A)
- Authority
- CN
- China
- Prior art keywords
- data
- engine
- search
- retrieval
- detection
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/40—Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
- G06F16/43—Querying
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A method for multi-dimensional detection of a specific target in full media, comprising the following steps: according to the search condition sample, determine the data type of the target reference sample data that the search engines and the detection and recognition engines are to retrieve and recognize; according to that data type, select the matching detection and recognition engines; analyze the result of each detection and recognition engine to obtain search keywords and target feature data, and send them to the search engines as search conditions; each relevant search engine retrieves the qualifying data from the input target retrieval data and records the data fragments and the positions where they occur; because each search engine searches different data and obtains different results, the retrieval results are then aggregated and output by category. Retrieval in multiple ways and along different dimensions improves the recall and precision of the data.
Description
Technical field
The present invention relates to a method for detecting the occurrence of a specific target in full-media data, and more particularly to a method for multi-dimensional detection of a specific target in full media.
Background art
Full-media information includes data in many forms, such as text, speech, pictures and video. Finding a specific target (a person or an object) in this information involves several technologies, including voiceprint recognition, speech recognition, image recognition, video fingerprinting and text analysis, and is a complex systems-engineering task. Moreover, because voiceprint recognition, speech recognition, image recognition and video fingerprinting are all still developing, no single technology can meet the expected recall and precision targets. The voiceprint, speech, image, video-fingerprint and text information in media is internally related. For example, video generally contains text, sound and video frames; sound data contains speech that can be recognized as text, as well as biometric characteristics that distinguish the speaker from other people. Content analysis can establish relationships among these kinds of information, which provides the technical basis for retrieving a common target in several ways.
Based on long-term study of voiceprint, speech, image, video-fingerprint and text information, we have found that statistical analysis can extract features or descriptive content common to two, three or more of these kinds of information. The result of one retrieval mode can then be extended to coordinated retrieval in several modes, yielding an integrated retrieval result. For example, voiceprint detection determines who the speaker is and extracts the segments in which that person speaks; once the speaker is known, content relating to the speaker can be found through speech recognition; pictures and video segments of the speaker can also be queried; and related text information can be found as well.
Speech recognition, image recognition and video fingerprint recognition use techniques such as DNN and HMM, most of which are based on statistical analysis models. These techniques have certain defects, and no single technique achieves the expected recognition performance. Improving the performance of a single technique requires greatly increasing the amount of data in the sample model library used for statistical analysis. Even so, external factors such as background noise and the speaker's accent, speaking rate and gender affect speech and voiceprint recognition, while the illumination, resolution and background complexity of captured images and video strongly affect image recognition and video fingerprint recognition. Because no single technique gives satisfactory results, several techniques must be combined to improve the recall of recognition.
Summary of the invention
The present invention retrieves different types of feature vectors of full-media information in several ways, such as text keywords, voiceprint, speech content, image color and image semantics, and aggregates the items of information about the target being queried. It can thereby obtain more comprehensively the metadata segments related to the search target and record the positions of that metadata. Retrieval in multiple ways and along different dimensions improves the recall and precision of the data.
To achieve the above object, the technical solution adopted by the present invention is a method for multi-dimensional detection of a specific target in full media, with the following steps:
S1: According to the search condition sample, such as text keywords, voiceprint feature speech, content speech, feature pictures or feature video, determine the data type of the target reference sample data that the search engines and the detection and recognition engines are to retrieve and recognize;
S2: According to that data type, select the matching detection and recognition engines, such as a keyword recognition engine, a voiceprint recognition engine, a speech semantic recognition engine or a shape recognition engine;
S3: Analyze the result of each detection and recognition engine to obtain search keywords and target feature data, and send them to the search engines as search conditions;
S4: Each relevant search engine retrieves the qualifying data from the input target retrieval data and records the data fragments and the positions where they occur;
S5: Because each search engine searches different data and obtains different results, aggregate the retrieval results and output them by category.
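Steps S1 to S5 can be read as a small dispatch pipeline. The following is a minimal sketch in Python under stated assumptions: the toy `keyword_engine` and `text_search_engine`, the registry dictionaries and the `detect` function are hypothetical names introduced only for illustration, and real engines would wrap voiceprint, speech or image models rather than substring matching.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types for this sketch (not defined by the patent).
Condition = Tuple[str, str]   # (data type, search keyword)
Hit = Tuple[str, str, int]    # (data type, matched fragment, position)

def keyword_engine(sample: str) -> List[str]:
    """Toy detection and recognition engine: every word of a text sample is a keyword."""
    return sample.lower().split()

# Registry of detection and recognition engines, keyed by data type (assumption).
RECOGNITION_ENGINES: Dict[str, Callable[[str], List[str]]] = {"text": keyword_engine}

def text_search_engine(keyword: str, corpus: List[str]) -> List[Hit]:
    """Toy search engine: record each fragment containing the keyword and its index."""
    return [("text", doc, pos) for pos, doc in enumerate(corpus) if keyword in doc.lower()]

# Registry of search engines, keyed by data type (assumption).
SEARCH_ENGINES = {"text": text_search_engine}

def detect(samples: Dict[str, str], corpora: Dict[str, List[str]]) -> Dict[str, List[Hit]]:
    conditions: List[Condition] = []
    for data_type, sample in samples.items():          # S1: data type of each sample
        engine = RECOGNITION_ENGINES.get(data_type)    # S2: matching engine, if any
        if engine is None:
            continue                                   # no engine enabled for this type
        conditions += [(data_type, kw) for kw in engine(sample)]   # S3: search conditions
    summary: Dict[str, List[Hit]] = {}
    for data_type, keyword in conditions:              # S4: each relevant search engine
        for hit in SEARCH_ENGINES[data_type](keyword, corpora.get(data_type, [])):
            bucket = summary.setdefault(data_type, []) # S5: aggregate by category
            if hit not in bucket:
                bucket.append(hit)
    return summary
```

In this sketch a sample whose data type has no registered engine is simply skipped, and duplicate hits found through different keywords are reported once.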
Further, in step S2, if there are search conditions of several different data types, several detection and recognition engines are selected.
Further, in step S3, if the search condition contains three or more keywords, it is further decomposed into keyword groups.
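The text does not say how the keyword groups are formed. As an illustration only, the sketch below assumes that a condition with three or more keywords is decomposed into all pairs of keywords; the function name `keyword_groups` and the `group_size` parameter are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

def keyword_groups(keywords: List[str], group_size: int = 2) -> List[Tuple[str, ...]]:
    """Split a search condition into keyword groups.

    Fewer than three keywords are kept as a single group. Three or more are
    decomposed into all combinations of `group_size` keywords, which is an
    assumption of this sketch and not a rule stated in the source.
    """
    unique = list(dict.fromkeys(keywords))   # drop repeats, keep input order
    if len(unique) < 3:
        return [tuple(unique)]
    return list(combinations(unique, group_size))
```

Each group can then be sent to the search engines as a separate condition, so that a hit matching only part of a long condition is still found.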
Further, in step S3, if no corresponding recognition engine is enabled to process a given data item, the condition value for that item is set to null.
Further, the target retrieval data in step S4 comes from databases, data files or network streaming media, and includes text, speech, picture and video data.
Further, the retrieval result in step S5 is one or more of text, speech, picture and video; for speech and video results, the associated content segment is extracted, or its entry point and duration are recorded.
Further, in step S5, the retrieval result is obtained according to the following formula:
Variables and symbols in the formula:
SR: the retrieval result;
SE_i: a search engine;
i: the search engine number; for example, SE_1 represents the voiceprint search engine and SE_2 represents the speech search engine;
N: the number of data types in the full media;
RE_j: a detection and recognition engine. A detection and recognition engine has target detection and target recognition functions; for different data it may have both functions at once or only one of them, and different detection and recognition engines process different data content;
j: the detection and recognition engine number; for example, RE_1 represents the voiceprint recognition engine, which identifies who the speaker is, and RE_2 represents the speech recognition engine, which recognizes the content and keywords in speech;
k: the sample number in the sample library, which is also the sample recognition loop count;
M: the number of samples in the sample library, that is, how many samples are available for recognition and verification;
p_j: the target reference sample data that the search engines and the detection and recognition engines are to retrieve and recognize;
q_i: the retrieval object of search engine i, that is, the data from which the search engine retrieves the target information.
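The formula itself is not reproduced in this text, so the following Python sketch is only one possible reading of how the listed symbols could combine, not the formula of the patent. It assumes that each RE_j is applied to its reference samples p_j (loop index k over the M samples), that the resulting conditions are passed to every SE_i together with its retrieval object q_i, and that SR is the union of what the search engines return. The function name and the use of sets are assumptions.

```python
from typing import Callable, List, Sequence, Set

def retrieval_result(
    search_engines: Sequence[Callable[[Set[str], List[str]], Set[str]]],  # SE_i
    recognition_engines: Sequence[Callable[[str], Set[str]]],             # RE_j
    reference_samples: Sequence[Sequence[str]],   # p_j: the samples given to RE_j
    retrieval_objects: Sequence[List[str]],       # q_i: the data searched by SE_i
) -> Set[str]:
    """Assumed aggregation: SR is the union over i of SE_i(conditions, q_i)."""
    conditions: Set[str] = set()
    for re_j, p_j in zip(recognition_engines, reference_samples):   # index j
        for sample in p_j:                                          # index k = 1..M
            conditions |= re_j(sample)
    sr: Set[str] = set()                                            # SR
    for se_i, q_i in zip(search_engines, retrieval_objects):        # index i
        sr |= se_i(conditions, q_i)
    return sr
```

A union is used here because the text describes aggregating the results of all search engines; a weighted or ranked combination would be an equally plausible reading.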
Further, the detection and recognition engines have two layers of function, detection and recognition on one layer and retrieval on the other; the engines that process objects of different data types serve as different processing dimensions.
By adopting the above technical solution, the present invention achieves the following technical effects. It retrieves different types of feature vectors of full-media information in several ways, such as text keywords, voiceprint, speech content, image color and image semantics, and aggregates the items of information about the target being queried, so that the metadata segments related to the search target are obtained more comprehensively and the positions of the metadata are recorded. Retrieval in multiple ways and along different dimensions improves the recall and precision of the data. The method compensates for the low recall of a single recognition engine and improves the recall and precision of full-media retrieval; depending on the application environment and retrieval sample, recall can be improved by 10%-30%.
Brief description of the drawings
The present invention has one drawing:
Fig. 1 is a flow chart of the present invention.
Detailed description
The technical solution of the present invention is further explained below through a specific embodiment and with reference to the drawing.
As shown in Fig. 1, the present invention provides a method for multi-dimensional detection of a specific target in full media, with the following specific steps:
S1: According to the search condition sample, such as text keywords, text sentences, voiceprint feature speech (the voice of a speaker, or sound data emitted by another object to be retrieved), content speech (speech data in which the search target is mentioned), feature pictures (pictures characterizing a face, human figure, object shape, color or aggregation state) or feature video (a short video clip containing face, human figure, object shape, color or aggregation-state features), determine the data type of the target reference sample data that the search engines and the detection and recognition engines are to retrieve and recognize. The search condition sample is similar to the search keywords of an ordinary search engine, except that a full-media search condition may be one of, or a combination of, text, speech (segment), picture and video (segment). Text may be a combination of keywords, a text sentence, or mixed text in Chinese and other languages. Speech (segment) is a piece of input speech data; the method supports WAV format by default, speech data in other formats can be converted, and the content may be a complete sentence or a phrase. Pictures use the basic BMP format, and other formats can be converted to BMP; the picture must contain the person or object to be retrieved, the minimum resolution is 32x32, and color values are unrestricted. Video (segment) is based on AVI format, and other formats can be converted; it must contain the person or target to be retrieved, and the resolution of the target must be no less than 32x32 pixels.
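The input constraints of S1 (default formats WAV, BMP and AVI, and a minimum of 32x32 pixels) can be checked before any engine is selected. The sketch below is illustrative only: the `ConditionSample` class, its field names and the two helper functions are hypothetical, and reading the format from the file extension is an assumption.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Tuple

# Default formats and minimum size taken from the description of S1.
DEFAULT_FORMATS = {"speech": ".wav", "picture": ".bmp", "video": ".avi"}
MIN_TARGET_SIZE = (32, 32)   # minimum width and height in pixels

@dataclass
class ConditionSample:
    data_type: str                            # "text", "speech", "picture" or "video"
    source: str                               # text content, or a file name
    size: Optional[Tuple[int, int]] = None    # picture size, or target size in a video

def needs_conversion(sample: ConditionSample) -> bool:
    """True when the sample is not in the default format for its data type."""
    default = DEFAULT_FORMATS.get(sample.data_type)
    if default is None:                       # text has no container format
        return False
    return Path(sample.source).suffix.lower() != default

def is_usable(sample: ConditionSample) -> bool:
    """Apply the 32x32 minimum to pictures and video; other samples must be non-empty."""
    if sample.data_type in ("picture", "video"):
        if sample.size is None:
            return False
        return (sample.size[0] >= MIN_TARGET_SIZE[0]
                and sample.size[1] >= MIN_TARGET_SIZE[1])
    return bool(sample.source)
```

For example, an MP3 speech sample would be flagged for conversion to WAV, and a 16x64 picture would be rejected as too small.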
S2: According to the data type of the target reference sample data that the search engines and the detection and recognition engines are to retrieve and recognize, select the matching detection and recognition engines, such as a keyword recognition engine, a voiceprint recognition engine, a speech semantic recognition engine or a shape recognition engine. RE_1 ... RE_N in Fig. 1 represent different detection and recognition engines, which can detect or recognize features such as text keywords, voiceprint, speech semantics, video fingerprints, shape, object color and aggregation state.
S3: Analyze the result of each detection and recognition engine to obtain search keywords and target feature data, and send them to the search engines as search conditions. The detection and recognition engines produce the following results:
Keyword detection and recognition engine: extracts the keywords in a text sentence;
Voiceprint detection and recognition engine: identifies who the speaker is; the speaker's ID or name, as a keyword, and the voiceprint feature vector are used for searching;
Color detection and recognition engine: determines the main color of the target in a picture; the color value is used as a keyword for searching;
Shape detection and recognition engine: determines the shape of the target in a picture; the shape name, as a keyword, and the shape feature vector are used for searching;
Group event (aggregation state) detection and recognition engine: determines the aggregation state of the targets in a picture; the recognition result, as a keyword, and the aggregation-state feature vector are used for searching.
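The engine results listed above share one shape: a keyword, sometimes accompanied by a feature vector. A minimal sketch of such a search condition follows; the `SearchCondition` class, the helper functions and the use of `None` for the null value are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class SearchCondition:
    engine: str                               # which engine produced the condition
    keyword: Optional[str]                    # None stands for the null value
    feature_vector: List[float] = field(default_factory=list)

def from_voiceprint(speaker_id: Optional[str], vector: List[float]) -> SearchCondition:
    """Speaker ID or name as keyword, plus the voiceprint feature vector."""
    return SearchCondition("voiceprint", speaker_id, list(vector))

def from_color(color_value: str) -> SearchCondition:
    """The color engine contributes only a keyword (the color value)."""
    return SearchCondition("color", color_value)

def usable(conditions: List[SearchCondition]) -> List[SearchCondition]:
    """Keep the conditions that carry a keyword or a feature vector."""
    return [c for c in conditions if c.keyword is not None or c.feature_vector]
```

A condition with neither keyword nor feature vector corresponds to a data item for which no engine was enabled, and is filtered out before the search engines are called.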
S4: Each relevant search engine retrieves the qualifying data from the input target retrieval data and records the data fragments and the positions where they occur. The target retrieval data is a set of metadata such as text, speech, pictures and video, and may come from databases, data files or data streams; the present invention retrieves from this data according to the search condition sample.
Video and streaming media: AVI, MPEG-1/2/4, H.263/264/265, M-JPEG, MP4;
Speech data: WAV, MP3, PCM;
Pictures: BMP, JPG/JPEG, GIF, TIFF, PNG, PIC.
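The supported-format lists above can be held in a lookup table so that each input is routed to the search engines for its media type. The table and the `media_type_of` function below are a hypothetical sketch; the format names are taken from the lists above, with the abbreviated entries (MPEG-1/2/4, H.263/264/265) written out.

```python
from typing import Optional

# Format names from the description of S4, keyed by media type.
SUPPORTED_FORMATS = {
    "video": {"AVI", "MPEG-1", "MPEG-2", "MPEG-4",
              "H.263", "H.264", "H.265", "M-JPEG", "MP4"},
    "speech": {"WAV", "MP3", "PCM"},
    "picture": {"BMP", "JPG", "JPEG", "GIF", "TIFF", "PNG", "PIC"},
}

def media_type_of(format_name: str) -> Optional[str]:
    """Return the media type for a declared format name, or None if unsupported."""
    name = format_name.strip().upper()
    for media_type, formats in SUPPORTED_FORMATS.items():
        if name in formats:
            return media_type
    return None
```

An unsupported format returns `None`, which a caller could treat as a request to convert the data or to skip it.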
S5: Because each search engine searches different data and obtains different results, aggregate the retrieval results and output them by category. For the detection results on full-media information, the present invention outputs the following:
Text data: text fragments;
Sound and video data: the entry time at which the target appears, and the duration;
Picture files: the list of file names and the storage paths where the target appears.
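The classified output of S5 can be sketched as a grouping of hits by media type, with a different rendering for each type. The `Hit` record, its field names and the output strings below are assumptions chosen for illustration; only the three output categories come from the text above.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Hit:
    media_type: str          # "text", "sound", "video" or "picture"
    source: str              # file name or storage path where the target appears
    fragment: str = ""       # matched text fragment (text hits only)
    entry_s: float = 0.0     # entry time in seconds (sound and video hits only)
    duration_s: float = 0.0  # duration in seconds (sound and video hits only)

def classify_output(hits: List[Hit]) -> Dict[str, List[str]]:
    """Group hits by media type and render each one in the form listed for S5."""
    out: Dict[str, List[str]] = {}
    for h in hits:
        if h.media_type == "text":
            line = h.fragment
        elif h.media_type in ("sound", "video"):
            line = f"{h.source}: entry {h.entry_s:.1f}s, duration {h.duration_s:.1f}s"
        else:
            line = h.source                   # pictures: file name or storage path
        out.setdefault(h.media_type, []).append(line)
    return out
```

Recording only the entry time and duration for sound and video avoids copying the media itself, which matches the option of recording the entry point instead of extracting the segment.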
In step S2, if there are search conditions of several different data types, several detection and recognition engines are selected. In step S3, if the search condition is complex, it is further decomposed into keyword groups. Also in step S3, the relevant conditions in the search condition cannot be null (is this description accurate here?); if no corresponding recognition engine is enabled to process a given data item, the condition value for that item is set to null. The target retrieval data in step S4 comes from databases, data files or network streaming media, and includes text, speech, picture and video data. The retrieval result in step S5 is one or more of text, speech, picture and video; for speech and video results, the associated content segment is extracted, or its entry point and duration are recorded. In step S5, the retrieval result is obtained according to the following formula:
Variables and symbols in the formula:
SR: the retrieval result;
SE_i: a search engine; the different search engines take the results obtained from the detection and recognition engines and retrieve the specific target data in the media data;
i: the search engine number; for example, SE_1 represents the voiceprint search engine and SE_2 represents the speech search engine;
N: the number of data types in the full media;
RE_j: a detection and recognition engine. A detection and recognition engine has target detection and target recognition functions; for different data it may have both functions at once or only one of them, and different detection and recognition engines process different data content;
j: the detection and recognition engine number; for example, RE_1 represents the voiceprint recognition engine, which identifies who the speaker is, and RE_2 represents the speech recognition engine, which recognizes the content and keywords in speech;
k: the sample number in the sample library, which is also the sample recognition loop count;
M: the number of samples in the sample library, that is, how many samples are available for recognition and verification;
p_j: the target reference sample data that the search engines and the detection and recognition engines are to retrieve and recognize;
q_i: the retrieval object of search engine i, that is, the data from which the search engine retrieves the target information.
The detection and recognition engines have two layers of function, detection and recognition on one layer and retrieval on the other; the engines that process objects of different data types serve as different processing dimensions.
Because full-media information such as text, speech, pictures and video has complex data structures, carries a large amount of information and takes varied forms, a single data retrieval method cannot achieve satisfactory results, and the recall and precision of the data are low. Speech and images in particular have complex internal feature types, and retrieving different types of features yields different results; for example, retrieving the voiceprint features of speech data determines who the speaker is, while recognizing the semantic content of the speech yields the text of what was said.
The present invention provides a dynamic, multi-dimensional method for detecting full-media data. By determining the data type of the full-media information, it dynamically loads the retrieval and recognition engines that match that data type and examines the full-media data from different directions, obtaining the metadata segments associated with the query target and their storage locations. Full-media data (text, speech, pictures and images) contains information such as text keywords, voiceprint, speech content, language, image semantics, image color, target shape, target behavior and target aggregation state, and processing each kind of data requires a specific engine. Multi-dimensional detection organizes the engines for text keyword retrieval, voiceprint recognition, speech content recognition, language identification, image semantic analysis, image color recognition, image target detection, target shape recognition, target behavior recognition, target aggregation-state recognition and so on into two layers, detection and recognition on one layer and retrieval on the other. The engines that process different data objects serve as different processing dimensions, so that, depending on the data type of the object being examined, the specific target is detected, recognized and retrieved from different dimensions.
The foregoing is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or change made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solution and inventive concept, shall fall within the scope of protection of the present invention.
Claims (8)
1. the method for multi-dimensions test specific objective in full media, which is characterized in that be as follows:
S1:According to search condition sample, the data type of the object reference sample data of search and identification is determined;
S2:According to search engine and detection identification the engine data type of object reference sample data that retrieve and identify,
Select matched detection identification engine;
S3:The each detection identification engine of analysis as a result, search key, target characteristic amount data are obtained, as search condition
Search engine is sent to be retrieved;
Keyword detection identifies engine, extracts keyword in text sentence;
Vocal print detection identification engine, identifies that whom speaker is, the ID or name of speaker are referred to as keyword and vocal print Characteristic Vectors
Amount is for searching for;
Color detection identifies engine, judges the body color of target in picture, color value is as keyword for searching for;
SHAPE DETECTION identifies engine, judges the shape of target in picture, shape is as keyword and morphological feature vector for searching
Rope;
Social event detection identification engine, judges the coherent condition of object target in picture, coherent condition recognition result is as pass
Keyword and coherent condition characteristic vector are for searching for;
S4:Relevant each search engine retrieves qualified data from the target retrieval data of input, and records data
Segment and there is position;
S5:The different data of each search engine retrieving, obtain different retrieval results, these retrieval results are converged again
Always, classification output.
2. the method for multi-dimensions test specific objective in full media according to claim 1, which is characterized in that step S2 again
In, if any multiple and different data type search conditions, then select multiple detection identification engines.
3. the method for multi-dimensions test specific objective in full media according to claim 1, which is characterized in that step S3 again
In, as containing 3 or more keywords, then further decomposed into crucial phrase in search condition.
4. the method for multi-dimensions test specific objective in full media according to claim 3, which is characterized in that step S3 again
In, if certain item data is without enabling corresponding identification engines handle data, condition value is arranged to null value.
5. The method for multi-dimensional detection of a specific target in full media according to any one of claims 1-4, characterized in that the target retrieval data in step S4 comes from databases, data files or network streaming media, and includes text, speech, picture and video data.
6. The method for multi-dimensional detection of a specific target in full media according to claim 5, characterized in that the retrieval result in step S5 is one or more of text, speech, picture and video; for speech and video retrieval results, the associated content segment is extracted, or its entry point and duration are recorded.
7. The method for multi-dimensional detection of a specific target in full media according to claim 6, characterized in that, in step S5, the retrieval result is obtained according to the following formula:
Variables and symbols in the formula: SR, the retrieval result; SE_i, a search engine; i, the search engine number; N, the number of data types in the full media; RE_j, a detection and recognition engine; j, the detection and recognition engine number; k, the sample number in the sample library, which is also the sample recognition loop count; M, the number of samples in the sample library; p_j, the target reference sample data that the search engines and the detection and recognition engines are to retrieve and recognize; q_i, the retrieval object of the search engine.
8. The method for multi-dimensional detection of a specific target in full media according to claim 2 or 4, characterized in that the detection and recognition engines have two layers of function, detection and recognition on one layer and retrieval on the other, and the engines that process objects of different data types serve as different processing dimensions.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510515893.8A CN105005630B (en) | 2015-08-18 | 2015-08-18 | Method for multi-dimensional detection of a specific target in full media |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510515893.8A CN105005630B (en) | 2015-08-18 | 2015-08-18 | Method for multi-dimensional detection of a specific target in full media |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105005630A CN105005630A (en) | 2015-10-28 |
CN105005630B true CN105005630B (en) | 2018-07-13 |
Family ID=54378306
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510515893.8A Expired - Fee Related CN105005630B (en) | 2015-08-18 | 2015-08-18 | Method for multi-dimensional detection of a specific target in full media |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105005630B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105677799A (en) * | 2015-12-31 | 2016-06-15 | 宇龙计算机通信科技(深圳)有限公司 | Picture retrieval method and system |
CN106024013B (en) * | 2016-04-29 | 2022-01-14 | 努比亚技术有限公司 | Voice data searching method and system |
CN106101748B (en) * | 2016-07-20 | 2020-04-28 | 东软集团股份有限公司 | Program processing method and device |
CN109271533A (en) * | 2018-09-21 | 2019-01-25 | 深圳市九洲电器有限公司 | A kind of multimedia document retrieval method |
CN109299324B (en) * | 2018-10-19 | 2022-03-04 | 四川巧夺天工信息安全智能设备有限公司 | Method for searching label type video file |
CN110287384B (en) * | 2019-06-10 | 2021-08-31 | 北京百度网讯科技有限公司 | Intelligent service method, device and equipment |
CN110677716B (en) * | 2019-08-20 | 2022-02-01 | 咪咕音乐有限公司 | Audio processing method, electronic device, and storage medium |
CN114817717A (en) * | 2022-04-21 | 2022-07-29 | 国科华盾(北京)科技有限公司 | Search method, search device, computer equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101479728A (en) * | 2006-06-28 | 2009-07-08 | 微软公司 | Visual and Multidimensional Search |
CN102402593A (en) * | 2010-11-05 | 2012-04-04 | 微软公司 | Multi-modal approach to search query input |
CN102855317A (en) * | 2012-08-31 | 2013-01-02 | 王晖 | Multimode indexing method and system based on demonstration video |
CN104598585A (en) * | 2015-01-15 | 2015-05-06 | 百度在线网络技术(北京)有限公司 | Information search method and information search device |
Also Published As
Publication number | Publication date |
---|---|
CN105005630A (en) | 2015-10-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105005630B (en) | Method for multi-dimensional detection of a specific target in full media | |
CN111160017B (en) | Keyword extraction method, phonetics scoring method and phonetics recommendation method | |
CN108073568B (en) | Keyword extraction method and device | |
US9324323B1 (en) | Speech recognition using topic-specific language models | |
JP4880258B2 (en) | Method and apparatus for natural language call routing using reliability scores | |
WO2020228173A1 (en) | Illegal speech detection method, apparatus and device and computer-readable storage medium | |
US8996371B2 (en) | Method and system for automatic domain adaptation in speech recognition applications | |
KR102241972B1 (en) | Answering questions using environmental context | |
US20150100307A1 (en) | Text segmentation with multiple granularity levels | |
US20120124029A1 (en) | Cross media knowledge storage, management and information discovery and retrieval | |
Lowe et al. | Incorporating unstructured textual knowledge sources into neural dialogue systems | |
CN109902285B (en) | Corpus classification method, corpus classification device, computer equipment and storage medium | |
KR20050074991A (en) | Content retrieval based on semantic association | |
JP2019053126A (en) | Growth type interactive device | |
JP2013521567A (en) | System including client computing device, method of tagging media objects, and method of searching a digital database including audio tagged media objects | |
JP2004005600A (en) | Method and system for indexing and retrieving document stored in database | |
CN111428466B (en) | Legal document analysis method and device | |
CN112912873A (en) | Dynamically suppress query replies in search | |
CN113806588A (en) | Method and device for searching video | |
CN112307364A (en) | A Character Representation Oriented Extraction Method of News Text Occurrence | |
US8862582B2 (en) | System and method of organizing images | |
Dunn et al. | Language identification for austronesian languages | |
US9430800B2 (en) | Method and apparatus for trade interaction chain reconstruction | |
CN115935953A (en) | False news detection method, device, electronic device and storage medium | |
Gao et al. | Support for interactive identification of mentioned entities in conversational speech |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20180713; Termination date: 20190818 |