CN105005630B

CN105005630B - Method for multi-dimensional detection of a specific target in full media

Info

Publication number: CN105005630B
Application number: CN201510515893.8A
Authority: CN
Inventors: 薛丹; 陈淑珊; 张松涛; 迟立明
Original assignee: REDASEN TECHNOLOGY (DALIAN) CO Ltd
Current assignee: REDASEN TECHNOLOGY (DALIAN) CO Ltd
Priority date: 2015-08-18
Filing date: 2015-08-18
Publication date: 2018-07-13
Anticipated expiration: 2035-08-18
Also published as: CN105005630A

Abstract

A method for multi-dimensional detection of a specific target in full media comprises the following steps. According to the search-condition samples, the data type of the target reference sample data to be retrieved and recognized by the search engines and the detection-recognition engines is determined. According to that data type, matching detection-recognition engines are selected. The result of each detection-recognition engine is analyzed to obtain search keywords and target feature data, which are sent to the search engines as search conditions. Each relevant search engine retrieves the data meeting the conditions from the input target data and records the data fragments and the positions at which they occur. Because the search engines retrieve different data, they produce different retrieval results, which are then aggregated and output by category. Retrieval in multiple ways and along different dimensions improves both the recall and the precision of data retrieval.

Description

Method for multi-dimensional detection of a specific target in full media

Technical field

The present invention relates to a method for detecting the occurrence of a specific target in full-media data, and more particularly to a method for multi-dimensional detection of a specific target in full media.

Background

Full-media information includes data in many forms, such as text, speech, pictures and video. Finding a specific target (a person or an object) in this information involves several technologies, including voiceprint recognition, speech recognition, image recognition, video fingerprinting and text analysis, and is a complex systems-engineering task. Moreover, because voiceprint, speech and image recognition and video fingerprinting are all still developing, no single technology can meet the expected performance targets for recall and precision. The voiceprint, speech, image, video-fingerprint and text information in the media are, however, internally related. For example, video information generally contains text, sound and video frames; sound data contains speech that can be transcribed into text, as well as the biometric characteristics that distinguish the speaker from other people. Through content analysis, relationships can be established among these kinds of information, which provides the technical basis for retrieving a common target in several ways.

Based on long-term research on voiceprint, speech, image, video-fingerprint and text information, the inventors found that statistical analysis can extract the common features or descriptive content shared by two, three or more of these kinds of information, so that the result of one retrieval method can be extended into coordinated retrieval by several methods, yielding an integrated retrieval result. For example, voiceprint detection can determine who a speaker is and, at the same time, extract the segments in which that person speaks. Once the speaker is known, speech recognition can find content that mentions the speaker; pictures of the speaker and relevant video segments can also be queried; and related text information can be found as well.

Speech recognition, image recognition and video fingerprint recognition are mostly based on statistical analysis models built with techniques such as DNNs (deep neural networks) and HMMs (hidden Markov models). These techniques have inherent limitations, so a single technical means cannot reach the expected recognition performance. Improving a single technique requires a greatly enlarged sample library for statistical analysis. Yet external factors such as background noise and the speaker's accent, speaking rate and gender affect speech and voiceprint recognition, while the lighting, resolution and background complexity of captured images and video strongly affect image recognition and video fingerprint recognition. Since no single technical means achieves satisfactory results, several means must be combined to improve recognition recall.

Summary of the invention

The present invention retrieves different types of feature vectors from full-media information in several ways, for example text keywords, voiceprints, speech content, image color and image semantics, and aggregates all the information about the target to be queried. It can thereby obtain, more comprehensively, the information metadata fragments related to the searched target and record the positions of that metadata. Retrieval in multiple ways and along different dimensions improves the recall and precision of data retrieval.

To achieve the above object, the technical solution adopted by the present invention is as follows. The method for multi-dimensional detection of a specific target in full media comprises the following steps:

S1: According to the search-condition samples, such as text keywords, voiceprint-feature speech, content speech, feature pictures and feature video, determine the data type of the target reference sample data to be retrieved and recognized by the search engines and the detection-recognition engines;

S2: According to the data type of the target reference sample data to be retrieved and recognized by the search engines and the detection-recognition engines, select matching detection-recognition engines, such as a keyword recognition engine, a voiceprint recognition engine, a speech-semantics recognition engine or a shape recognition engine;

S3: Analyze the result of each detection-recognition engine to obtain search keywords and target feature data, and send them to the search engines as search conditions for retrieval;

S4: Each relevant search engine retrieves the data meeting the conditions from the input target data, and records the data fragments and the positions at which they occur;

S5: Because each search engine retrieves different data, different retrieval results are obtained; these retrieval results are then aggregated and output by category.
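Steps S1 to S5 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Pipeline` class, the toy engines and the tiny corpus are hypothetical, recognition engines are assumed to return plain keyword conditions, and search engines are assumed to return (source, position) hits.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical engine signatures (not defined in the patent):
# a recognition engine RE_j turns one reference sample into search conditions;
# a search engine SE_i turns one condition into hits (source id, position).
RecognitionEngine = Callable[[str], list[str]]
SearchEngine = Callable[[str], list[tuple[str, str]]]


@dataclass
class Pipeline:
    recognizers: dict[str, RecognitionEngine]  # data type -> recognition engine
    searchers: list[SearchEngine]

    def run(self, samples: dict[str, str]) -> dict[str, list[tuple[str, str]]]:
        # S1: each sample is keyed by its data type.
        # S2: pick the recognition engines matching those data types.
        selected = {t: self.recognizers[t] for t in samples if t in self.recognizers}
        # S3: recognition results become search conditions.
        conditions: list[str] = []
        for data_type, engine in selected.items():
            conditions.extend(engine(samples[data_type]))
        # S4: every search engine retrieves hits for every condition.
        summary: dict[str, list[tuple[str, str]]] = {}
        for cond in conditions:
            for search in self.searchers:
                summary.setdefault(cond, []).extend(search(cond))
        # S5: aggregate by condition, dropping conditions with no hits.
        return {cond: hits for cond, hits in summary.items() if hits}


# Toy engines and data, for illustration only.
def keyword_engine(sentence: str) -> list[str]:
    return [w for w in sentence.lower().split() if len(w) > 3]


def voiceprint_engine(audio_path: str) -> list[str]:
    # Pretend the speaker ID can be read from the file name.
    return ["speaker:" + audio_path.rsplit(".", 1)[0]]


CORPUS = {"news.txt": "alice visited the harbor", "clip.wav": "speaker:alice"}


def corpus_search(cond: str) -> list[tuple[str, str]]:
    hits = []
    for source, content in CORPUS.items():
        pos = content.find(cond)
        if pos >= 0:
            hits.append((source, f"offset {pos}"))
    return hits


pipeline = Pipeline({"text": keyword_engine, "voiceprint": voiceprint_engine}, [corpus_search])
result = pipeline.run({"text": "Where was Alice", "voiceprint": "alice.wav"})
print(result)
# {'alice': [('news.txt', 'offset 0'), ('clip.wav', 'offset 8')], 'speaker:alice': [('clip.wav', 'offset 0')]}
```

The keyword "where" finds nothing and is dropped in S5, while the voiceprint condition adds a hit the text keyword alone would not express, which is the cross-engine effect the method relies on.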

Further, in step S2, if there are search conditions of several different data types, several detection-recognition engines are selected.

Further, in step S3, if the search condition contains three or more keywords, it is further decomposed into keyword groups.

Further, in step S3, if no corresponding recognition engine is enabled to process a given data item, the condition value for that item is set to null.

Further, the target data to be searched in step S4 come from databases, data files and network streaming media, and include text, speech, picture and video data.

Further, the retrieval results in step S5 are one or more of text, speech, pictures and video; for speech and video retrieval results, the associated content fragments are extracted, or their access points and durations are recorded.

Further, in step S5, the retrieval result is obtained according to the following formula:

Variables and symbols in the formula:

SR: the retrieval result; SE_i: a search engine; i: the engine number, e.g. SE₁ denotes the voiceprint search engine and SE₂ the speech search engine; N: the number of data types in full media; RE_j: a detection-recognition engine, which provides target detection and target recognition functions; depending on the data, it may provide both functions or only one of them, and different detection-recognition engines process different data content; j: the detection-recognition engine number, e.g. RE₁ denotes the voiceprint recognition engine, which identifies who is speaking, and RE₂ the speech recognition engine, which recognizes the content and keywords in speech; k: the sample index in the sample library, which also serves as the sample-recognition loop counter; M: the number of samples in the sample library, i.e. how many samples are available for recognition and authentication; p_j: the target reference sample data to be retrieved and recognized by the search engines and detection-recognition engines; q_i: the retrieval object of a search engine, i.e. the data from which the search engine retrieves target information.

Further, the detection-recognition engines have two functional layers, detection-recognition and retrieval, and the engines that process different types of data objects serve as different processing dimensions.

By adopting the above technical solution, the present invention achieves the following technical effects. It retrieves different types of feature vectors from full-media information in several ways, such as text keywords, voiceprints, speech content, image color and image semantics, and aggregates all the information about the target to be queried, so that the information metadata fragments related to the searched target can be obtained more comprehensively and their positions recorded. Retrieval in multiple ways and along different dimensions improves the recall and precision of data retrieval. The method compensates for the low recall of a single recognition engine and improves the recall and precision of full-media retrieval; depending on the application environment and the retrieval samples, recall can be improved by 10% to 30%.

Description of the drawings

The present invention has one accompanying drawing:

Fig. 1 is the flowchart of the method of the present invention.

Specific embodiment

The technical solution of the present invention is further explained below through a specific embodiment with reference to the accompanying drawing.

As shown in Fig. 1, the present invention provides a method for multi-dimensional detection of a specific target in full media, with the following specific steps:

S1: According to the search-condition samples, such as text keywords, text sentences, voiceprint-feature speech (speech of the speaker, or sound data emitted by another object to be retrieved), content speech (speech data in which the searched target is mentioned), feature pictures (pictures showing features such as faces, human figures, object shapes, colors or gathering states) and feature video (short clips containing faces, human figures, object shapes, colors or gathering-state features), determine the data type of the target reference sample data to be retrieved and recognized by the search engines and the detection-recognition engines. A search-condition sample plays the role of the search keyword in an ordinary search engine, but the conditions of full-media retrieval may combine one or more of text, speech (clips), pictures and video (clips). Text may be a combination of keywords, a text sentence, or mixed text in Chinese and other languages. A speech (clip) sample is an input segment of speech data; the method supports WAV format by default, and speech data in other formats can be converted. The speech content may be a complete sentence or a phrase. Pictures use the basic BMP format, and other formats can be converted to BMP; a picture must contain the target person or object to be retrieved, with a minimum resolution of 32×32, and the color depth is unrestricted. Video (clips) use AVI format, and other formats can be converted; a clip must contain the person or target to be retrieved, with the target at a resolution of no less than 32×32 pixels.
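The type determination of S1 can be sketched as a lookup on the sample's file suffix. This is an illustration under stated assumptions: media samples are assumed to arrive as file paths, anything without a recognized media suffix is assumed to be a text sample, and the suffix table is built from the formats named in this document; the function names are hypothetical.

```python
from pathlib import Path

# File suffix -> data type, built from the formats named in the embodiment.
# Treating any other input as a text sample is an assumption.
MEDIA_TYPES = {
    ".wav": "voice", ".mp3": "voice", ".pcm": "voice",
    ".bmp": "picture", ".jpg": "picture", ".jpeg": "picture", ".png": "picture",
    ".avi": "video", ".mp4": "video", ".mpg": "video",
}
# Default sample formats in step S1; other formats are converted first.
DEFAULT_FORMAT = {"voice": ".wav", "picture": ".bmp", "video": ".avi"}
MIN_SIDE = 32  # minimum target resolution: 32 x 32 pixels


def classify_sample(sample: str) -> tuple[str, bool]:
    """Return (data type, needs conversion) for one search-condition sample."""
    suffix = Path(sample).suffix.lower()
    data_type = MEDIA_TYPES.get(suffix, "text")
    needs_conversion = data_type != "text" and suffix != DEFAULT_FORMAT[data_type]
    return data_type, needs_conversion


def target_large_enough(width: int, height: int) -> bool:
    """Check the 32 x 32 minimum for a target in a picture or video frame."""
    return width >= MIN_SIDE and height >= MIN_SIDE
```

For example, `classify_sample("clip.MP3")` returns `("voice", True)`: the sample is speech, but it must first be converted to WAV.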

S2: According to the data type of the target reference sample data to be retrieved and recognized by the search engines and the detection-recognition engines, select matching detection-recognition engines, such as a keyword recognition engine, a voiceprint recognition engine, a speech-semantics recognition engine or a shape recognition engine. In Fig. 1, RE₁ … RE_N denote the different detection-recognition engines, which can detect or recognize features such as text keywords, voiceprints, speech semantics, video fingerprints, shapes, object colors and gathering states.
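Engine selection in S2, including the case of several data types yielding several engines, can be pictured as a registry lookup. The engine names below follow the features listed above, but which engine serves which data type is an assumption, and `ENGINE_REGISTRY` and `select_engines` are hypothetical names.

```python
# Hypothetical registry: which detection-recognition engines (RE_j) apply to
# which sample data type. The mapping itself is an assumption.
ENGINE_REGISTRY: dict[str, list[str]] = {
    "text": ["keyword"],
    "voice": ["voiceprint", "speech_semantic"],
    "picture": ["shape", "color", "gathering_state"],
    "video": ["video_fingerprint", "shape", "color", "gathering_state"],
}


def select_engines(sample_types: list[str]) -> list[str]:
    """Step S2: return the engines matching every sample data type.

    Several data types yield several engines; an engine shared by two data
    types is selected only once, in first-seen order. Unknown types add none.
    """
    selected: list[str] = []
    for data_type in sample_types:
        for engine in ENGINE_REGISTRY.get(data_type, []):
            if engine not in selected:
                selected.append(engine)
    return selected
```

With a picture sample and a video sample, `select_engines(["picture", "video"])` loads the shape, color and gathering-state engines once each, plus the video-fingerprint engine.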

S3: Analyze the result of each detection-recognition engine to obtain search keywords and target feature data, and send them to the search engines as search conditions for retrieval. The detection-recognition engines process their results as follows:

Keyword detection-recognition engine: extracts the keywords from a text sentence;

Voiceprint detection-recognition engine: identifies who the speaker is; the speaker's ID or name is used as a keyword, and the voiceprint feature vector is used for searching;

Color detection-recognition engine: determines the color of the target object in a picture; the color value is used as a keyword for searching;

Shape detection-recognition engine: determines the shape of the target in a picture; the shape is used as a keyword, and the morphological feature vector is used for searching;

Social-event (gathering-state) detection-recognition engine: determines the gathering state of the target objects in a picture; the gathering-state recognition result is used as a keyword, and the gathering-state feature vector is used for searching.
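How engine results might become search conditions in S3, including the decomposition of three or more keywords into keyword groups and the null value for engines that were not enabled, is sketched below. The document does not say how keyword groups are formed, so the fixed group size of two is purely hypothetical, as are the function names.

```python
from typing import Optional

GROUP_SIZE = 2  # hypothetical; the patent does not say how groups are formed


def keyword_groups(keywords: list[str]) -> list[list[str]]:
    """Split a condition with 3 or more keywords into keyword groups."""
    if len(keywords) < 3:
        return [keywords]
    return [keywords[i:i + GROUP_SIZE] for i in range(0, len(keywords), GROUP_SIZE)]


def build_conditions(
    all_engines: list[str], results: dict[str, list[str]]
) -> dict[str, Optional[list[list[str]]]]:
    """Step S3: one condition slot per engine; engines that were not
    enabled (no entry in `results`) get a null (None) condition value."""
    return {
        engine: keyword_groups(results[engine]) if engine in results else None
        for engine in all_engines
    }
```

Keeping an explicit `None` slot for each disabled engine lets the search stage tell "not searched" apart from "searched with an empty condition".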

S4: Each relevant search engine retrieves the data meeting the conditions from the input target data, and records the data fragments and the positions at which they occur. The target data to be searched are collections of text, speech, picture, video and other metadata, which may come from databases, data files or data streams; the present invention retrieves from these data according to the search-condition samples. The following formats are supported:

Video and streaming media: AVI, MPEG-1/2/4, H.263/264/265, M-JPEG, MP4;

Speech data: WAV, MP3, PCM;

Pictures: BMP, JPG/JPEG, GIF, TIFF, PNG, PIC.
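Before S4 runs, target files could be routed to the matching search engines by checking them against the supported formats. This sketch checks file suffixes only; H.263/264/265 are codecs rather than suffixes and cannot be checked this way, and the suffixes chosen for MPEG and M-JPEG, the `.txt` entry for text, and the function name are assumptions.

```python
from pathlib import Path

# Target-data formats listed for step S4, as file suffixes. The MPEG and
# M-JPEG suffixes and the ".txt" entry for text are assumptions.
SUPPORTED = {
    "text": {".txt"},
    "video": {".avi", ".mpg", ".mpeg", ".mp4", ".mjpeg"},
    "voice": {".wav", ".mp3", ".pcm"},
    "picture": {".bmp", ".jpg", ".jpeg", ".gif", ".tif", ".tiff", ".png", ".pic"},
}


def route_targets(paths: list[str]) -> dict[str, list[str]]:
    """Group target files by data type so each goes to matching search engines."""
    routed: dict[str, list[str]] = {kind: [] for kind in SUPPORTED}
    routed["unsupported"] = []
    for path in paths:
        suffix = Path(path).suffix.lower()
        kind = next((k for k, exts in SUPPORTED.items() if suffix in exts), "unsupported")
        routed[kind].append(path)
    return routed
```

Files in the `unsupported` bucket would need conversion, or a further search engine, before they can be searched.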

S5: Because each search engine retrieves different data, different retrieval results are obtained; these retrieval results are then aggregated and output by category. For detection results on full-media information, the present invention outputs the following:

Text data: text fragments;

Sound and video data: the access-point time at which the target appears, and the duration;

Picture files: the list of file names and the storage paths in which the target appears.
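The classified output of S5 can be sketched as follows. The `Hit` record and its fields are hypothetical, and merging duplicate hits reported by several search engines is an assumed reading of "aggregated"; the per-type outputs follow the list above.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)  # frozen makes hits hashable, so duplicates can be merged
class Hit:
    data_type: str        # "text", "voice", "video" or "picture"
    source: str           # document id or file path where the target was found
    fragment: str = ""    # matched text fragment (text data only)
    start_s: float = 0.0  # access point in seconds (voice/video only)
    end_s: float = 0.0    # end of the matching segment (voice/video only)


def summarize(hits: list[Hit]) -> dict[str, list]:
    """Step S5: merge hits from all search engines and classify the output."""
    out: dict[str, list] = {"text": [], "voice_video": [], "picture": []}
    for h in dict.fromkeys(hits):  # drop duplicates, keep first-seen order
        if h.data_type == "text":
            out["text"].append(h.fragment)
        elif h.data_type in ("voice", "video"):
            # (source, access point, duration)
            out["voice_video"].append((h.source, h.start_s, h.end_s - h.start_s))
        elif h.data_type == "picture":
            path = PurePosixPath(h.source)
            out["picture"].append((path.name, str(path.parent)))
    return out
```

A voice hit from 12.5 s to 20.0 s in `clip.wav` is reported as `("clip.wav", 12.5, 7.5)`, i.e. access point and duration, and a picture hit at `/archive/2015/face.bmp` as `("face.bmp", "/archive/2015")`.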

Further, in step S2, if there are search conditions of several different data types, several detection-recognition engines are selected. In step S3, if the search condition is complex, it is further decomposed into keyword groups. Also in step S3, the relevant conditions in the search condition are not left unset: if no corresponding recognition engine is enabled to process a given data item, its condition value is set to null. The target data to be searched in step S4 come from databases, data files and network streaming media, and include text, speech, picture and video data. The retrieval results in step S5 are one or more of text, speech, pictures and video; for speech and video retrieval results, the associated content fragments are extracted, or their access points and durations are recorded. In step S5, the retrieval result is obtained according to the following formula:

Variables and symbols in the formula: SR: the retrieval result; SE_i: a search engine; each search engine takes the results obtained from the detection-recognition engines and retrieves the specific target data from the media data; i: the engine number, e.g. SE₁ denotes the voiceprint search engine and SE₂ the speech search engine; N: the number of data types in full media; RE_j: a detection-recognition engine, which provides target detection and target recognition functions; depending on the data, it may provide both functions or only one of them, and different detection-recognition engines process different data content; j: the detection-recognition engine number, e.g. RE₁ denotes the voiceprint recognition engine, which identifies who is speaking, and RE₂ the speech recognition engine, which recognizes the content and keywords in speech; k: the sample index in the sample library, which also serves as the sample-recognition loop counter; M: the number of samples in the sample library, i.e. how many samples are available for recognition and authentication; p_j: the target reference sample data to be retrieved and recognized by the search engines and detection-recognition engines; q_i: the retrieval object of a search engine, i.e. the data from which the search engine retrieves target information.

The detection-recognition engines have two functional layers, detection-recognition and retrieval, and the engines that process different types of data objects serve as different processing dimensions.

Because full-media information such as text, speech, picture and video data has complex structures, carries a large amount of information and takes many forms, a single data-retrieval method cannot achieve satisfactory results, and the recall and precision of the data are relatively low. Speech and images in particular contain complex types of internal features, and retrieving different feature types yields different results: for example, retrieving voiceprint features from speech data can determine who the speaker is, while recognizing the semantic content of the speech yields the text of what was said. The present invention provides a dynamic, multi-dimensional method for detecting full-media data information. By determining the data type of the full-media information, it dynamically loads the retrieval and recognition engines that match that data type, detects the full-media data from different directions, and obtains the metadata fragments associated with the query target and their data storage locations. Full-media text, speech, picture and image data contain information such as text keywords, voiceprints, speech content, language, image semantics, image color, target shape, target behavior and target gathering state, and each kind of data requires a specific engine. Multi-dimensional detection organizes engines such as text-keyword retrieval, voiceprint recognition, speech-content recognition, language identification, image-semantics analysis, image-color recognition, image-object detection, target-shape discrimination, target-behavior recognition and target-gathering-state recognition into two layers, detection-recognition and retrieval. The engines that process different data objects serve as different processing dimensions, so that, depending on the data type of the object being examined, the specific target is detected, recognized and retrieved along different dimensions.

The foregoing is only a preferred embodiment of the present invention, and the scope of protection of the present invention is not limited thereto. Any equivalent substitution or change made by a person skilled in the art within the technical scope disclosed herein, according to the technical solution and the inventive concept of the present invention, shall fall within the scope of protection of the present invention.

Claims

1. the method for multi-dimensions test specific objective in full media, which is characterized in that be as follows：

S1: according to the search-condition samples, determining the data type of the target reference sample data to be retrieved and recognized;

S2: according to the data type of the target reference sample data to be retrieved and recognized by the search engines and detection-recognition engines, selecting matching detection-recognition engines;

S3: analyzing the result of each detection-recognition engine to obtain search keywords and target feature data, and sending them to the search engines as search conditions for retrieval, wherein:

a keyword detection-recognition engine extracts the keywords from a text sentence;

a voiceprint detection-recognition engine identifies who the speaker is, the speaker's ID or name being used as a keyword and the voiceprint feature vector being used for searching;

a shape detection-recognition engine determines the shape of the target in a picture, the shape being used as a keyword and the morphological feature vector being used for searching;

a social-event detection-recognition engine determines the gathering state of the target objects in a picture, the gathering-state recognition result being used as a keyword and the gathering-state feature vector being used for searching;

S4: each relevant search engine retrieving the data meeting the conditions from the input target data, and recording the data fragments and the positions at which they occur;

S5: obtaining different retrieval results from the different data retrieved by each search engine, and then aggregating these retrieval results and outputting them by category.

2. The method for multi-dimensional detection of a specific target in full media according to claim 1, characterized in that, in step S2, if there are search conditions of several different data types, several detection-recognition engines are selected.

3. The method for multi-dimensional detection of a specific target in full media according to claim 1, characterized in that, in step S3, if the search condition contains three or more keywords, it is further decomposed into keyword groups.

4. The method for multi-dimensional detection of a specific target in full media according to claim 3, characterized in that, in step S3, if no corresponding recognition engine is enabled to process a given data item, the condition value is set to null.

5. The method for multi-dimensional detection of a specific target in full media according to any one of claims 1 to 4, characterized in that the target data to be searched in step S4 come from databases, data files and network streaming media, and include text, speech, picture and video data.

6. The method for multi-dimensional detection of a specific target in full media according to claim 5, characterized in that the retrieval results in step S5 are one or more of text, speech, pictures and video, and, for speech and video retrieval results, the associated content fragments are extracted, or their access points and durations are recorded.

7. The method for multi-dimensional detection of a specific target in full media according to claim 6, characterized in that, in step S5, the retrieval result is obtained according to the following formula:

Variables and symbols in the formula: SR: the retrieval result; SE_i: a search engine; i: the engine number; N: the number of data types in full media; RE_j: a detection-recognition engine; j: the detection-recognition engine number; k: the sample index in the sample library, which also serves as the sample-recognition loop counter; M: the number of samples in the sample library; p_j: the target reference sample data to be retrieved and recognized by the search engines and detection-recognition engines; q_i: the retrieval object of a search engine.

8. The method for multi-dimensional detection of a specific target in full media according to claim 2 or 4, characterized in that the detection-recognition engines have two functional layers, detection-recognition and retrieval, and the engines that process different types of data objects serve as different processing dimensions.