CN105005630B - Method for multi-dimensional detection of a specific target in full media

Info

Publication number
CN105005630B
Authority
CN
China
Prior art keywords
data
engine
search
retrieval
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201510515893.8A
Other languages
Chinese (zh)
Other versions
CN105005630A (en)
Inventor
薛丹
陈淑珊
张松涛
迟立明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
REDASEN TECHNOLOGY (DALIAN) CO Ltd
Original Assignee
REDASEN TECHNOLOGY (DALIAN) CO Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by REDASEN TECHNOLOGY (DALIAN) CO Ltd filed Critical REDASEN TECHNOLOGY (DALIAN) CO Ltd
Priority to CN201510515893.8A priority Critical patent/CN105005630B/en
Publication of CN105005630A publication Critical patent/CN105005630A/en
Application granted granted Critical
Publication of CN105005630B publication Critical patent/CN105005630B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43Querying

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method for multi-dimensional detection of a specific target in full media, comprising the following steps: according to the search condition samples, determining the data types of the target reference sample data that the search engines and the detection and recognition engines are to retrieve and recognize; selecting matching detection and recognition engines according to those data types; analyzing the result of each detection and recognition engine to obtain search keywords and target feature data, which are sent to the search engines as search conditions; each relevant search engine retrieving qualifying data from the input target retrieval data and recording the data segments and the positions at which they occur; and, since each search engine retrieves different data and obtains different results, aggregating these results and outputting them by category. Retrieval in multiple ways and along different dimensions improves the recall and precision of the data.

Description

Method for multi-dimensional detection of a specific target in full media
Technical field
The present invention relates to a method for detecting occurrences of a specific target in full media data, and more particularly to a method for multi-dimensional detection of a specific target in full media.
Background art
Full media information includes data in many forms, such as text, speech, pictures and video. Finding a specific target (a person or an object) in this information involves multiple technologies, such as voiceprint recognition, speech recognition, image recognition, video fingerprinting and text analysis, and is a complex systems engineering task. Moreover, because voiceprint, speech and image recognition and video fingerprinting are all still developing, no single technology can meet the expected recall and precision requirements. The voiceprint, speech, image, video fingerprint and text information in media are intrinsically related. For example, video information generally contains text, sound and video frames; audio data contains speech that can be recognized as text, and also contains biometric features that distinguish the speaker from other people. Through content analysis, relationships can be established among these kinds of information, which provides the technical basis for retrieving a common target in several ways.
Based on long-term research into voiceprint, speech, image, video fingerprint and text information, we have found that statistical analysis can extract features or descriptive content common to two, three or more of these kinds of information, so that the result of one retrieval method can be extended to coordinated retrieval by several methods, yielding an integrated retrieval result. For example, voiceprint detection determines who the speaker is and extracts the segments in which that person speaks; once the speaker is known, content relating to the speaker can be found through speech recognition; pictures and related video segments of the speaker can also be queried; and related text information can be found as well.
Speech recognition, image recognition and video fingerprint recognition use technologies such as DNN and HMM, most of which are based on statistical analysis models. These technologies have certain shortcomings, and a single technical means cannot achieve the expected recognition performance. Improving the performance of a single technology requires greatly increasing the amount of data in the sample model library used for statistical analysis. However, external factors such as background noise and the speaker's accent, speaking rate and gender affect the performance of speech and voiceprint recognition, and the illumination, resolution and background complexity of captured images and video strongly affect image recognition and video fingerprint recognition. No single technical means achieves a satisfactory result, so multiple means must be combined to improve recognition recall.
Summary of the invention
The present invention retrieves different types of feature vectors of full media information in several ways, for example text keywords, voiceprint, speech content, image color and image semantics, and aggregates the items of information about the target being queried. It can thereby more comprehensively obtain the metadata segments related to the retrieval target and record the positions of that metadata. Retrieval in multiple ways and along different dimensions improves the recall and precision of the data.
To achieve the above object, the technical solution adopted by the present invention is a method for multi-dimensional detection of a specific target in full media, with the following steps:
S1: According to the search condition samples, such as text keywords, voiceprint feature speech, content speech, feature pictures and feature video, determine the data types of the target reference sample data that the search engines and the detection and recognition engines are to retrieve and recognize;
S2: According to the data types determined in S1, select matching detection and recognition engines, such as a keyword recognition engine, a voiceprint recognition engine, a speech semantics recognition engine and a shape recognition engine;
S3: Analyze the result of each detection and recognition engine to obtain search keywords and target feature data, and send them to the search engines as search conditions;
S4: Each relevant search engine retrieves qualifying data from the input target retrieval data and records the data segments and the positions at which they occur;
S5: Each search engine retrieves different data and obtains different retrieval results; these results are then aggregated and output by category.
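Steps S1 to S5 can be sketched as a small pipeline. The following Python is only an illustrative sketch, not the patented implementation: the names `Condition`, `Hit`, `keyword_engine`, `text_search`, `RECOGNIZERS`, `SEARCHERS` and `detect` are hypothetical, only a toy text engine is registered, and the "longest word" rule merely stands in for real keyword extraction.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Condition:
    """Search condition produced by a detection/recognition engine (S3)."""
    keyword: str
    feature: Optional[List[float]] = None  # e.g. a voiceprint feature vector


@dataclass
class Hit:
    """One retrieval result: the matching segment and where it occurs (S4)."""
    data_type: str
    segment: str
    position: int


def keyword_engine(sample: str) -> Condition:
    # Toy rule (assumption): the longest word stands in for the keyword.
    return Condition(keyword=max(sample.split(), key=len))


def text_search(cond: Condition, corpus: List[str]) -> List[Hit]:
    # Record each matching document and the offset of the keyword in it.
    return [Hit("text", doc, doc.find(cond.keyword))
            for doc in corpus if cond.keyword in doc]


# Hypothetical registries: recognition engines keyed by sample data type,
# search engines keyed by the type of target data they search.
RECOGNIZERS: Dict[str, Callable[[str], Condition]] = {"text": keyword_engine}
SEARCHERS: Dict[str, Callable[[Condition, List[str]], List[Hit]]] = {
    "text": text_search,
}


def detect(samples: Dict[str, str],
           targets: Dict[str, List[str]]) -> Dict[str, List[Hit]]:
    # S1: the keys of `samples` are the data types of the reference samples.
    # S2: select the recognition engines that match those types.
    engines = {t: RECOGNIZERS[t] for t in samples if t in RECOGNIZERS}
    # S3: run each selected engine to obtain search conditions.
    conditions = [engine(samples[t]) for t, engine in engines.items()]
    # S4: each search engine searches its own target data with each condition.
    results: Dict[str, List[Hit]] = {}
    for data_type, search in SEARCHERS.items():
        for cond in conditions:
            hits = search(cond, targets.get(data_type, []))
            # S5: aggregate the hits and classify them by data type.
            results.setdefault(data_type, []).extend(hits)
    return results
```

In a fuller system, further engines (voiceprint, shape, color and so on) would be added to the two registries without changing `detect`.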
Further, in step S2, if there are search conditions of several different data types, multiple detection and recognition engines are selected.
Further, in step S3, if the search condition contains three or more keywords, it is further decomposed into keyword groups.
Further, in step S3, if the corresponding recognition engine is not enabled to process a given item of data, the condition value for that item is set to null.
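The null-value rule can be illustrated with a few lines of Python. This is a minimal sketch under stated assumptions: the list `DATA_TYPES` and the function `build_conditions` are hypothetical names, and `None` is used to represent the null condition value.

```python
from typing import Dict, Optional, Set

# Assumed set of condition slots, one per sample data type named in the text.
DATA_TYPES = ("text", "voiceprint", "speech", "picture", "video")


def build_conditions(samples: Dict[str, str],
                     enabled: Set[str]) -> Dict[str, Optional[str]]:
    """Return one condition slot per data type; None marks a null condition."""
    return {
        t: samples[t] if (t in enabled and t in samples) else None
        for t in DATA_TYPES
    }
```

For example, `build_conditions({"text": "Zhang", "picture": "face.bmp"}, {"text"})` leaves only the `text` slot filled, because no picture engine is enabled.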
Further, the target retrieval data in step S4 come from databases, data files and network streaming media, and include text, speech, picture and video data.
Further, the retrieval result in step S5 is one or more of text, speech, pictures and video; for speech and video results, the associated content segment is extracted, or its entry point and duration are recorded.
Further, in step S5, the retrieval result is obtained according to the following formula:
[Formula image not reproduced in the source text]
Variables and symbols in the formula:
SR: retrieval result;
SE_i: search engine;
i: engine number, e.g. SE_1 denotes the voiceprint search engine and SE_2 denotes the speech search engine;
N: the number of data types in the full media;
RE_j: detection and recognition engine. A detection and recognition engine has target detection and target recognition functions; for different data it may have both functions at once or only one of them, and different engines process different data content;
j: detection and recognition engine number, e.g. RE_1 denotes the voiceprint recognition engine, which identifies who the speaker is, and RE_2 denotes the speech recognition engine, which recognizes the content and keywords in speech;
k: the sample number in the sample library, which is also the sample recognition loop count;
M: the number of samples in the sample library, i.e. how many samples are available for recognition and verification;
p_j: the target reference sample data that the search engines and the detection and recognition engines are to retrieve and recognize;
q_i: the retrieval object of the search engine, i.e. the data in which the search engine looks for target information.
Further, the detection and recognition engines provide two layers of function, detection/recognition and retrieval, and the engines that process different types of data objects serve as different processing dimensions.
By adopting the above technical solution, the present invention achieves the following technical effects. It retrieves different types of feature vectors of full media information in several ways, such as text keywords, voiceprint, speech content, image color and image semantics, and aggregates the items of information about the target being queried, so that metadata segments related to the retrieval target are obtained more comprehensively and the positions of the metadata are recorded. Retrieval in multiple ways and along different dimensions improves the recall and precision of the data. The method compensates for the low recall of a single recognition engine and improves the recall and precision of full media retrieval; depending on the application environment and the retrieval samples, recall can be improved by 10% to 30%.
Description of the drawings
The present invention has one drawing:
Fig. 1 is a flow chart of the present invention.
Detailed description
The technical solution of the present invention is further explained below through a specific embodiment and with reference to the drawing.
As shown in Fig. 1, the present invention provides a method for multi-dimensional detection of a specific target in full media, with the following steps:
S1: According to the search condition samples, determine the data types of the target reference sample data that the search engines and the detection and recognition engines are to retrieve and recognize. Search condition samples include text keywords, text sentences, voiceprint feature speech (speech of the speaker, or audio data emitted by another object to be retrieved), content speech (speech data in which the retrieval target is mentioned), feature pictures (pictures with face, human figure, object shape, color or aggregation state features) and feature video (a short piece of video data containing face, human figure, object shape, color or aggregation state features). A search condition sample is analogous to the search keywords of an ordinary search engine, except that a full media search condition may be a combination of one or more of text, speech (segments), pictures and video (segments). Text may be a combination of keywords, a text sentence, or mixed text in Chinese and other languages. Speech (a segment) is a piece of input speech data; the method supports the WAV format by default, and speech data in other formats can be converted. The speech content may be a complete sentence or a phrase. Pictures use the basic BMP format, and other formats can be converted to BMP. The picture must contain the person or object to be retrieved; the minimum resolution is 32×32, and color values are unrestricted. Video (segments) are based on the AVI format, and other formats can be converted. The video must contain the person or target to be retrieved, and the target must have a resolution of no less than 32×32 pixels.
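The input constraints in S1 (default formats WAV, BMP and AVI, and a minimum target resolution of 32×32) lend themselves to a simple pre-check. The sketch below is an assumption-laden illustration: `Sample`, `needs_conversion` and `is_acceptable` are hypothetical names, and the actual format conversion is left out.

```python
from dataclasses import dataclass

MIN_SIDE = 32  # minimum width and height in pixels, as stated in S1
DEFAULT_FORMAT = {"speech": "WAV", "picture": "BMP", "video": "AVI"}


@dataclass
class Sample:
    kind: str        # "text", "speech", "picture" or "video"
    fmt: str = ""    # file format of the sample, e.g. "JPG"
    width: int = 0   # target resolution; ignored for text and speech
    height: int = 0


def needs_conversion(s: Sample) -> bool:
    """True if the sample is not already in the default format for its kind."""
    default = DEFAULT_FORMAT.get(s.kind)
    return default is not None and s.fmt.upper() != default


def is_acceptable(s: Sample) -> bool:
    """Pictures and video must show the target at 32x32 pixels or more."""
    if s.kind in ("picture", "video"):
        return s.width >= MIN_SIDE and s.height >= MIN_SIDE
    return True
```

A JPG picture of 64×64 pixels would pass `is_acceptable` but be flagged by `needs_conversion`, since BMP is the stated base format.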
S2: According to the data types determined in S1, select matching detection and recognition engines, such as a keyword recognition engine, a voiceprint recognition engine, a speech semantics recognition engine and a shape recognition engine. In Fig. 1, RE_1 ... RE_N denote different detection and recognition engines, which can detect or recognize features such as text keywords, voiceprint, speech semantics, video fingerprints, shape, object color and aggregation state.
S3: Analyze the result of each detection and recognition engine to obtain search keywords and target feature data, and send them to the search engines as search conditions. The detection and recognition engines produce the following results:
Keyword detection and recognition engine: extracts keywords from text sentences;
Voiceprint detection and recognition engine: identifies who the speaker is; the speaker's ID or name is used as a keyword and, together with the voiceprint feature vector, is used for searching;
Color detection and recognition engine: determines the main color of the target in a picture; the color value is used as a keyword for searching;
Shape detection and recognition engine: determines the shape of the target in a picture; the shape is used as a keyword and, together with the shape feature vector, is used for searching;
Group event (aggregation state) detection and recognition engine: determines the aggregation state of the targets in a picture; the recognition result is used as a keyword and, together with the aggregation state feature vector, is used for searching.
S4: Each relevant search engine retrieves qualifying data from the input target retrieval data and records the data segments and the positions at which they occur. The target retrieval data are sets of metadata such as text, speech, pictures and video, and may come from databases, data files or data streams; the present invention retrieves from these data according to the search condition samples.
Video and streaming media formats supported: AVI, MPEG-1/2/4, H.263/264/265, M-JPEG, MP4;
Speech data formats supported: WAV, MP3, PCM;
Picture formats supported: BMP, JPG/JPEG, GIF, TIFF, PNG, PIC.
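One way to route target files to the right search engine is to classify them by file extension. The mapping below is an assumption for illustration: the patent lists formats, not extensions, so only formats with an obvious conventional extension are included, and `EXTENSIONS` and `media_type` are hypothetical names.

```python
from pathlib import Path
from typing import Optional

# Assumed extension table derived from the format lists above.
EXTENSIONS = {
    "video": {".avi", ".mpg", ".mpeg", ".mp4"},
    "speech": {".wav", ".mp3", ".pcm"},
    "picture": {".bmp", ".jpg", ".jpeg", ".gif", ".tif", ".tiff", ".png", ".pic"},
}


def media_type(filename: str) -> Optional[str]:
    """Return "video", "speech" or "picture", or None if unsupported."""
    ext = Path(filename).suffix.lower()
    for kind, extensions in EXTENSIONS.items():
        if ext in extensions:
            return kind
    return None
```

Extension checks are only a first filter; a production system would more likely inspect the file contents, since raw PCM or H.264 streams often carry no reliable extension.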
S5: Each search engine retrieves different data and obtains different retrieval results; these results are then aggregated and output by category. For the detection results of full media information, the present invention outputs the following:
Text data: text fragments;
Audio and video data: the entry point time at which the target appears, and the duration;
Picture files: the file names and storage paths in which the target appears.
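The three output categories can be modelled as three record types plus a grouping function. This is a sketch under assumptions: the record names `TextResult`, `TimedResult` and `PictureResult`, their fields, and the use of seconds as the time unit are all illustrative choices, not taken from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class TextResult:
    fragment: str            # matching text fragment


@dataclass
class TimedResult:           # audio or video
    source: str              # file or stream in which the target appears
    entry_point_s: float     # time at which the target appears (seconds)
    duration_s: float        # how long the target is present (seconds)


@dataclass
class PictureResult:
    filename: str
    path: str                # storage path of the picture


Result = Union[TextResult, TimedResult, PictureResult]


def classify(results: List[Result]) -> Dict[str, List[Result]]:
    """Group aggregated results into the three output categories of S5."""
    groups: Dict[str, List[Result]] = {
        "text": [], "audio_video": [], "picture": [],
    }
    for r in results:
        if isinstance(r, TextResult):
            groups["text"].append(r)
        elif isinstance(r, TimedResult):
            groups["audio_video"].append(r)
        else:
            groups["picture"].append(r)
    return groups
```

Storing an entry point and a duration, rather than a copy of the clip, keeps the result small and lets a player seek directly to the relevant segment.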
Further, in step S2, if there are search conditions of several different data types, multiple detection and recognition engines are selected. In step S3, if the search condition is complex, it is further decomposed into keyword groups. Also in step S3, the relevant conditions in the search condition must not be null; if the corresponding recognition engine is not enabled to process a given item of data, the condition value for that item is set to null. The target retrieval data in step S4 come from databases, data files and network streaming media, and include text, speech, picture and video data. The retrieval result in step S5 is one or more of text, speech, pictures and video; for speech and video results, the associated content segment is extracted, or its entry point and duration are recorded. In step S5, the retrieval result is obtained according to the following formula:
[Formula image not reproduced in the source text]
Variables and symbols in the formula:
SR: retrieval result;
SE_i: search engine; the different search engines take the results obtained from the detection and recognition engines and retrieve the specific target data from the media data;
i: engine number, e.g. SE_1 denotes the voiceprint search engine and SE_2 denotes the speech search engine;
N: the number of data types in the full media;
RE_j: detection and recognition engine. A detection and recognition engine has target detection and target recognition functions; for different data it may have both functions at once or only one of them, and different engines process different data content;
j: detection and recognition engine number, e.g. RE_1 denotes the voiceprint recognition engine, which identifies who the speaker is, and RE_2 denotes the speech recognition engine, which recognizes the content and keywords in speech;
k: the sample number in the sample library, which is also the sample recognition loop count;
M: the number of samples in the sample library, i.e. how many samples are available for recognition and verification;
p_j: the target reference sample data that the search engines and the detection and recognition engines are to retrieve and recognize;
q_i: the retrieval object of the search engine, i.e. the data in which the search engine looks for target information.
The detection and recognition engines provide two layers of function, detection/recognition and retrieval, and the engines that process different types of data objects serve as different processing dimensions.
Because full media information such as text, speech, pictures and video is complex in data structure, large in information content and diverse in form, a single data retrieval method cannot achieve satisfactory results, and its recall and precision are low. Speech and images in particular contain complex types of intrinsic features, and retrieving different types of features yields different results. For example, retrieving the voiceprint features of speech data determines who the speaker is, while recognizing the semantic content of the speech yields the text of what was said. The present invention provides a dynamic, multi-dimensional method for detecting full media data. By determining the data types of the full media information, it dynamically loads the retrieval and recognition engines that match those data types, detects the full media data from different directions, and obtains the metadata segments associated with the query target and their storage locations. The text, speech, pictures and images in full media data contain information such as text keywords, voiceprint, speech content, language, image semantics, image color, target shape, target behavior and target aggregation state, and different data require specific engines for processing. In multi-dimensional detection, engines for text keyword retrieval, voiceprint recognition, speech content recognition, language identification, image semantic analysis, image color recognition, image target detection, target shape recognition, target behavior recognition, target aggregation state recognition and so on are organized into two layers, detection/recognition and retrieval. The engines that process different data objects serve as different processing dimensions, and the specific target is detected, recognized and retrieved from different dimensions according to the data type of the detection object.
The above is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited to it. Any equivalent substitution or modification made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solution and inventive concept, shall fall within the scope of protection of the present invention.

Claims (8)

1. A method for multi-dimensional detection of a specific target in full media, characterized by the following steps:
S1: According to the search condition samples, determine the data types of the target reference sample data to be searched and recognized;
S2: According to the data types of the target reference sample data that the search engines and the detection and recognition engines are to retrieve and recognize, select matching detection and recognition engines;
S3: Analyze the result of each detection and recognition engine to obtain search keywords and target feature data, and send them to the search engines as search conditions;
Keyword detection and recognition engine: extracts keywords from text sentences;
Voiceprint detection and recognition engine: identifies who the speaker is; the speaker's ID or name is used as a keyword and, together with the voiceprint feature vector, is used for searching;
Color detection and recognition engine: determines the main color of the target in a picture; the color value is used as a keyword for searching;
Shape detection and recognition engine: determines the shape of the target in a picture; the shape is used as a keyword and, together with the shape feature vector, is used for searching;
Group event detection and recognition engine: determines the aggregation state of the targets in a picture; the recognition result is used as a keyword and, together with the aggregation state feature vector, is used for searching;
S4: Each relevant search engine retrieves qualifying data from the input target retrieval data and records the data segments and the positions at which they occur;
S5: Each search engine retrieves different data and obtains different retrieval results; these results are then aggregated and output by category.
2. The method for multi-dimensional detection of a specific target in full media according to claim 1, characterized in that, in step S2, if there are search conditions of several different data types, multiple detection and recognition engines are selected.
3. The method for multi-dimensional detection of a specific target in full media according to claim 1, characterized in that, in step S3, if the search condition contains three or more keywords, it is further decomposed into keyword groups.
4. The method for multi-dimensional detection of a specific target in full media according to claim 3, characterized in that, in step S3, if the corresponding recognition engine is not enabled to process a given item of data, the condition value for that item is set to null.
5. The method for multi-dimensional detection of a specific target in full media according to any one of claims 1 to 4, characterized in that the target retrieval data in step S4 come from databases, data files and network streaming media, and include text, speech, picture and video data.
6. The method for multi-dimensional detection of a specific target in full media according to claim 5, characterized in that the retrieval result in step S5 is one or more of text, speech, pictures and video; for speech and video results, the associated content segment is extracted, or its entry point and duration are recorded.
7. The method for multi-dimensional detection of a specific target in full media according to claim 6, characterized in that, in step S5, the retrieval result is obtained according to the following formula:
[Formula image not reproduced in the source text]
Variables and symbols in the formula: SR: retrieval result; SE_i: search engine; i: engine number; N: the number of data types in the full media; RE_j: detection and recognition engine; j: detection and recognition engine number; k: the sample number in the sample library, which is also the sample recognition loop count; M: the number of samples in the sample library; p_j: the target reference sample data that the search engines and the detection and recognition engines are to retrieve and recognize; q_i: the retrieval object of the search engine.
8. The method for multi-dimensional detection of a specific target in full media according to claim 2 or 4, characterized in that the detection and recognition engines provide two layers of function, detection/recognition and retrieval, and the engines that process different types of data objects serve as different processing dimensions.
CN201510515893.8A 2015-08-18 2015-08-18 Method for multi-dimensional detection of a specific target in full media Expired - Fee Related CN105005630B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510515893.8A CN105005630B (en) 2015-08-18 2015-08-18 The method of multi-dimensions test specific objective in full media

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510515893.8A CN105005630B (en) 2015-08-18 2015-08-18 The method of multi-dimensions test specific objective in full media

Publications (2)

Publication Number Publication Date
CN105005630A CN105005630A (en) 2015-10-28
CN105005630B true CN105005630B (en) 2018-07-13

Family

ID=54378306

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510515893.8A Expired - Fee Related CN105005630B (en) 2015-08-18 2015-08-18 Method for multi-dimensional detection of a specific target in full media

Country Status (1)

Country Link
CN (1) CN105005630B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105677799A (en) * 2015-12-31 2016-06-15 宇龙计算机通信科技(深圳)有限公司 Picture retrieval method and system
CN106024013B (en) * 2016-04-29 2022-01-14 努比亚技术有限公司 Voice data searching method and system
CN106101748B (en) * 2016-07-20 2020-04-28 东软集团股份有限公司 Program processing method and device
CN109271533A (en) * 2018-09-21 2019-01-25 深圳市九洲电器有限公司 A kind of multimedia document retrieval method
CN109299324B (en) * 2018-10-19 2022-03-04 四川巧夺天工信息安全智能设备有限公司 Method for searching label type video file
CN110287384B (en) * 2019-06-10 2021-08-31 北京百度网讯科技有限公司 Intelligent service method, device and equipment
CN110677716B (en) * 2019-08-20 2022-02-01 咪咕音乐有限公司 Audio processing method, electronic device, and storage medium
CN114817717A (en) * 2022-04-21 2022-07-29 国科华盾(北京)科技有限公司 Search method, search device, computer equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101479728A (en) * 2006-06-28 2009-07-08 微软公司 Visual and Multidimensional Search
CN102402593A (en) * 2010-11-05 2012-04-04 微软公司 Multi-modal approach to search query input
CN102855317A (en) * 2012-08-31 2013-01-02 王晖 Multimode indexing method and system based on demonstration video
CN104598585A (en) * 2015-01-15 2015-05-06 百度在线网络技术(北京)有限公司 Information search method and information search device

Also Published As

Publication number Publication date
CN105005630A (en) 2015-10-28

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20180713

Termination date: 20190818