CN110866129A - Cross-media retrieval method based on cross-media uniform characterization model - Google Patents
- Publication number
- CN110866129A CN110866129A CN201911061277.4A CN201911061277A CN110866129A CN 110866129 A CN110866129 A CN 110866129A CN 201911061277 A CN201911061277 A CN 201911061277A CN 110866129 A CN110866129 A CN 110866129A
- Authority
- CN
- China
- Prior art keywords
- cross
- media
- data
- retrieval
- original domain
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/40—Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
- G06F16/48—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Library & Information Science (AREA)
- Multimedia (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a cross-media retrieval method based on a cross-media unified representation model, addressing the problem of cross-media retrieval. The method comprises the following steps: (1) constructing a cross-media database: establishing a large cross-media database oriented to the government-affairs news domain; (2) preprocessing cross-media data: preprocessing input data such as text, images, video and audio; (3) extracting original-domain features of the cross-media data: extracting original-domain feature vectors for each medium; (4) uniformly representing the cross-media data: extracting feature vectors of the cross-media data in a common representation space; (5) computing and ranking semantic similarity: computing the semantic similarity between the retrieval target and the data in the cross-media database, ranking by similarity, and outputting the results. The invention provides both a method for mutual retrieval across four types of media data and a unified representation model for multiple media, improves cross-media semantic retrieval precision, and has broad application prospects.
Description
Technical Field
The invention relates to a cross-media retrieval method based on a cross-media unified representation model, belonging to the technical fields of natural language processing, computer vision and cross-media data retrieval. The method comprises extracting original-domain features of multimedia data, uniformly representing the data through a cross-media unified representation model, constructing a cross-media database, and computing and ranking the similarity of cross-media data.
Background
With the advent of the big-data era, data in all industries has grown explosively. Intelligent applications represented by 5G and the Internet of Things generate a large amount of multimedia data at every moment, including massive unstructured data such as text, images, video and audio. How to better organize, retrieve and query cross-media data has become a major challenge and research focus in the field of information retrieval, for example retrieving images, video and audio through text, or retrieving text, audio and other media through video.
For multimedia collections of text, images, video and audio, most retrieval systems still rely on text-keyword search. For example, Google's image and video retrieval is still based on text keywords: the basic flow is to extract keyword labels from the unstructured data, where the labels may come from the text surrounding a picture, file names, data subject tags, object-detection labels and the like, plus a small amount of manual labels from the Internet. Because producers of multimedia information differ in cultural background and professional knowledge, the text associated with a picture is often highly unreliable. Moreover, for multimedia content such as images and video it is generally difficult to give an effective and accurate natural-language description, and the essential content and semantic relationships cannot be fully expressed. Retrieving pictures and videos by their associated text therefore struggles to satisfy users' query needs, and the search accuracy is very low.
To address cross-media data retrieval, semantic-embedding methods based on machine learning and deep learning have become a research focus. The VSE++ model learns a visual-semantic embedding through hard-negative mining and improves cross-media retrieval precision; the ACMR and CM-GANs models train with a generative adversarial idea and achieve good performance on the Wikipedia and NUS-WIDE datasets. Most existing cross-media retrieval methods with good results adopt deep neural network models, which are usually poorly interpretable. Moreover, adversarial models often assume that the transformation of data into the common representation space is a linearly invertible transformation and add an inverse-transformation constraint, which contradicts the nonlinear nature of neural network transformations.
Disclosure of Invention
To solve the above technical problems, the invention provides a cross-media retrieval method based on a cross-media unified representation model; the unified representation model supports retrieval across four types of media data and is used for cross-media data retrieval to improve retrieval precision.
The invention is realized by the following technical scheme.
The invention provides a cross-media retrieval method based on a cross-media uniform representation model, which comprises the following steps:
① constructing a cross-media database: establishing a cross-media database oriented to the government-affairs news domain;
② preprocessing cross-media data: preprocessing the input data of the cross-media database to obtain cross-media data;
③ extracting original-domain features of the cross-media data, namely extracting original-domain feature vectors of the cross-media data;
④ uniformly representing the cross-media data, namely generating, through adversarial training of a deep neural network model, a cross-media unified representation model supporting input of four types of media data, and extracting the common-space feature vectors output by the model;
⑤ computing and ranking retrieval semantic similarity, namely computing the cosine similarity between the common-space feature vector output by the cross-media unified representation model and the feature vectors of the data in the cross-media database, ranking by similarity, and outputting the top K most similar items as the retrieval result.
In step ①, the government-affairs news domain covers government news, political figures and political events, and the cross-media database stores four types of unstructured data: text, images, video and audio.
In step ②, the data formats and dimensions of the multimedia retrieval input data (text, image, video and audio) are preprocessed; the audio data is converted into a spectrogram image as the audio input, and the text is segmented to obtain a word-segmentation array.
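As an illustrative sketch only (not part of the claimed method), the preprocessing of step ② — converting audio to a spectrogram image and segmenting text — could look as follows; the function names `spectrogram` and `segment` are hypothetical, and a real system would use a proper Chinese word segmenter rather than whitespace splitting:

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Short-time Fourier magnitude spectrogram of a 1-D audio signal.
    Returns an array of shape (n_frames, frame_len // 2 + 1) that can be
    treated as a grayscale image input for a convolutional network."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def segment(text):
    """Toy whitespace 'word segmentation' placeholder."""
    return text.split()

audio = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s, 440 Hz tone
spec = spectrogram(audio)
tokens = segment("Nobel Prize 2019 Physiology Medicine")
```

The spectrogram magnitudes are non-negative and can be rescaled to an image range before being fed to the audio network.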
In step ③, a word2vec model is used to extract original-domain feature vectors from the text data, a deep convolutional network is used to extract original-domain features from the image data, a C3D (three-dimensional convolutional) network is used to extract original-domain features from the video data, and a deep convolutional network is used to extract original-domain feature vectors from the audio data.
The word-segmentation array of the text is obtained through word segmentation.
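A minimal sketch of the text branch of step ③, under the assumption that pretrained word2vec vectors are available (here replaced by a random toy vocabulary for illustration) and that the text feature is obtained by mean-pooling the word vectors — a common choice, though the patent does not specify the pooling:

```python
import numpy as np

# Hypothetical toy vocabulary; a real system would load word2vec vectors
# trained on the government-news corpus (e.g. via gensim).
rng = np.random.default_rng(0)
vocab = {w: rng.standard_normal(8) for w in
         ["nobel", "prize", "physiology", "medicine", "2019"]}

def text_features(tokens, dim=8):
    """Original-domain text feature: mean of the word vectors of the
    tokens present in the vocabulary (zero vector if none match)."""
    vecs = [vocab[t] for t in tokens if t in vocab]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

q1 = text_features(["nobel", "prize", "2019"])
```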
The invention has the beneficial effects that:
1. A method supporting unified representation of four types of media data (text, image, video and audio) is provided; the cross-media unified representation model is trained with a method based on the generative adversarial idea, reducing the semantic gap between the representations of different media;
2. A cross-media data retrieval method based on the cross-media unified representation model is provided, achieving mutual retrieval among the four types of media data.
Drawings
Fig. 1 is a flow chart of the present invention.
Detailed Description
The technical solution of the present invention is further described below, but the claimed scope of protection is not limited to what is described.
As shown in fig. 1, a cross-media retrieval method based on a cross-media uniform characterization model includes the following steps:
① constructing a cross-media database: establishing a cross-media database oriented to the government-affairs news domain;
② preprocessing cross-media data: preprocessing the input data of the cross-media database to obtain cross-media data;
③ extracting original-domain features of the cross-media data, namely extracting original-domain feature vectors of the cross-media data;
④ uniformly representing the cross-media data, namely generating, through adversarial training of a deep neural network model, a cross-media unified representation model supporting input of four types of media data, and extracting the common-space feature vectors output by the model;
⑤ computing and ranking retrieval semantic similarity, namely computing the cosine similarity between the common-space feature vector output by the cross-media unified representation model and the feature vectors of the data in the cross-media database, ranking by similarity, and outputting the top K most similar items as the retrieval result.
In step ①, the government-affairs news domain covers government news, political figures and political events, and the cross-media database stores four types of unstructured data: text, images, video and audio.
In step ②, the data formats and dimensions of the multimedia retrieval input data (text, image, video and audio) are preprocessed; the audio data is converted into a spectrogram image as the audio input, and the text is segmented to obtain a word-segmentation array.
In step ③, a word2vec model is used to extract original-domain feature vectors from the text data, yielding word-vector representations; a deep convolutional network is used to extract original-domain features from the image data; a C3D (three-dimensional convolutional) network is used to extract original-domain features from the video data, i.e., a sequence of images with a fixed number of frames is obtained by sampling the video and the C3D model then extracts the video features; and a deep convolutional network is used to extract original-domain feature vectors from the audio data, i.e., the audio spectrogram image is input to the deep convolutional network for feature extraction.
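The fixed-frame-count video sampling described above can be sketched as follows; this is an illustrative uniform-sampling scheme (the patent does not specify the sampling strategy), and `sample_frames` is a hypothetical name:

```python
import numpy as np

def sample_frames(video, n_frames=16):
    """Uniformly sample a fixed number of frames from a video tensor of
    shape (T, H, W, C), producing the fixed-length clip that a
    C3D-style three-dimensional convolutional network expects."""
    t = video.shape[0]
    idx = np.linspace(0, t - 1, n_frames).round().astype(int)
    return video[idx]

clip = np.zeros((300, 112, 112, 3), dtype=np.uint8)  # 300-frame dummy video
sampled = sample_frames(clip)
```

16 frames at 112x112 matches the input size commonly used with C3D, but both values are configurable.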
And acquiring a word segmentation array of the text through word segmentation.
Specifically, the cross-media unified representation model is trained with an adversarial training method. In training, the modality-discriminator loss is the cross-entropy loss over all samples across modalities:

L_{adv}(\theta_D) = -\frac{1}{n}\sum_{i=1}^{n}\Big(m_i\log D(v_i;\theta_D) + (1-m_i)\log\big(1-D(t_i;\theta_D)\big)\Big)

where D(\cdot;\theta_D) denotes the probability that an image or text sample is discriminated as an image or a text, and m_i is the ground-truth label indicating whether sample i is an image or a text;
the cross-media data characterization loss function is:
Lemd(θV,θTiθimd)=ω1×Limi+ω2×Limd+Lreg
wherein L isimiIs an inter-modal structure invariant loss function, LimdIs an intra-modal data class loss function, LregRegularizing term, ω, for model parameters1、ω2Is a model hyper-parameter;
the model training optimization function based on the generation of the countermeasure train is as follows:
wherein the threshold value of max is thetaD。
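A minimal numerical sketch of these losses, assuming the standard cross-entropy form for L_adv and treating the component losses L_imi, L_imd and L_reg as precomputed scalars (all function names are illustrative, not the patent's implementation):

```python
import numpy as np

def discriminator_loss(p_img, m):
    """Cross-entropy modality-discriminator loss L_adv: p_img is the
    predicted probability that each common-space sample is an image,
    m the true modality label (1 = image, 0 = text)."""
    eps = 1e-12  # numerical guard against log(0)
    return -np.mean(m * np.log(p_img + eps) + (1 - m) * np.log(1 - p_img + eps))

def embedding_loss(l_imi, l_imd, l_reg, w1=1.0, w2=1.0):
    """Combined representation loss L_emd = w1*L_imi + w2*L_imd + L_reg."""
    return w1 * l_imi + w2 * l_imd + l_reg

# The embedding networks minimize (L_emd - L_adv);
# the discriminator maximizes the same quantity over its own parameters.
p = np.array([0.9, 0.2, 0.8, 0.1])  # discriminator outputs
m = np.array([1.0, 0.0, 1.0, 0.0])  # true modality labels
l_adv = discriminator_loss(p, m)
objective = embedding_loss(0.5, 0.3, 0.01) - l_adv
```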
Examples
As described above, a cross-media retrieval method based on a cross-media uniform characterization model includes the following steps:
step 1: cross-media data pre-processing
The text input is: "The 2019 Nobel Prize in Physiology or Medicine was awarded to the American scientists William Kaelin and Gregg Semenza and the British scientist Peter Ratcliffe, in recognition of their contributions in discovering how cells sense and adapt to oxygen availability."
After text word-segmentation preprocessing, the segmentation result is: [2019; Nobel; Physiology; Medicine Prize; awarded; America; scientist; William Kaelin; Gregg Semenza; Britain; scientist; Peter Ratcliffe; recognition; their; discovering; cell; how; sense; adapt; oxygen; availability; aspect; contribution]
Step 2: cross-media data origin domain feature extraction
A text feature vector is obtained with the word2vec model: Q1 = [1, 1, 0, 0, 0, 1, 0, ...];
and step 3: unified characterization across media data
The feature vector Q2 of the text in the common representation space is obtained through the cross-media unified representation model;
and 4, step 4: data retrieval semantic similarity calculation and ranking
The cosine similarity between Q2 and the feature vectors of all cross-media data in the database {V1, V2, V3, ..., T1, T2, ...} is computed; the results are sorted by similarity and the retrieval result is output.
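The similarity computation and ranking of step 4 can be sketched as follows; the vectors below are toy three-dimensional examples rather than real model outputs, and `top_k` is an illustrative name:

```python
import numpy as np

def top_k(query, database, k=3):
    """Rank database items by cosine similarity to the query vector and
    return the indices of the k most similar ones plus all similarities."""
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    sims = db @ q                      # cosine similarity per database row
    return np.argsort(-sims)[:k], sims

q2 = np.array([1.0, 0.0, 1.0])         # common-space query vector
db = np.array([[1.0, 0.0, 1.0],        # identical direction -> similarity 1
               [0.0, 1.0, 0.0],        # orthogonal -> similarity 0
               [1.0, 0.0, 0.0]])       # partial overlap
idx, sims = top_k(q2, db, k=2)
```

Because all compared vectors live in the common representation space, the same routine ranks text, image, video and audio items together.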
Claims (5)
1. A cross-media retrieval method based on a cross-media uniform characterization model is characterized in that: the method comprises the following steps:
① constructing a cross-media database: establishing a cross-media database oriented to the government-affairs news domain;
② preprocessing cross-media data: preprocessing the input data of the cross-media database to obtain cross-media data;
③ extracting original-domain features of the cross-media data, namely extracting original-domain feature vectors of the cross-media data;
④ uniformly representing the cross-media data, namely generating, through adversarial training of a deep neural network model, a cross-media unified representation model supporting input of four types of media data, and extracting the common-space feature vectors output by the model;
⑤ computing and ranking retrieval semantic similarity, namely computing the cosine similarity between the common-space feature vector output by the cross-media unified representation model and the feature vectors of the data in the cross-media database, ranking by similarity, and outputting the top K most similar items as the retrieval result.
2. The cross-media retrieval method based on the cross-media uniform characterization model according to claim 1, wherein in step ①, the government-affairs news domain covers government news, political figures and political events, and the cross-media database stores four types of unstructured data: text, images, video and audio.
3. The method according to claim 1, wherein in step ②, the data formats and dimensions of the multimedia retrieval input data (text, image, video and audio) are preprocessed; the audio data is converted into a spectrogram image as the audio input, and the text is segmented to obtain a word-segmentation array.
4. The cross-media retrieval method based on the cross-media uniform characterization model according to claim 1, wherein in step ③, a word2vec model is used to extract original-domain feature vectors from the text data, a deep convolutional network is used to extract original-domain features from the image data, a C3D network is used to extract original-domain features from the video data, and a deep convolutional network is used to extract original-domain feature vectors from the audio data.
5. The cross-media retrieval method based on the cross-media uniform characterization model according to claim 3, wherein the word-segmentation array of the text is obtained through word segmentation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911061277.4A CN110866129A (en) | 2019-11-01 | 2019-11-01 | Cross-media retrieval method based on cross-media uniform characterization model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911061277.4A CN110866129A (en) | 2019-11-01 | 2019-11-01 | Cross-media retrieval method based on cross-media uniform characterization model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110866129A true CN110866129A (en) | 2020-03-06 |
Family
ID=69654308
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911061277.4A Pending CN110866129A (en) | 2019-11-01 | 2019-11-01 | Cross-media retrieval method based on cross-media uniform characterization model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110866129A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111813967A (en) * | 2020-07-14 | 2020-10-23 | Institute of Scientific and Technical Information of China | Retrieval method, retrieval device, computer equipment and storage medium |
CN111949806A (en) * | 2020-08-03 | 2020-11-17 | CETC Big Data Research Institute Co., Ltd. | Cross-media retrieval method based on Resnet-Bert network model |
CN112528127A (en) * | 2020-05-30 | 2021-03-19 | Shandong Technology and Business University | Big data-based plane design work matching degree analysis system |
CN112559820A (en) * | 2020-12-17 | 2021-03-26 | Aerospace Information Research Institute, Chinese Academy of Sciences | Sample data set intelligent question setting method, device and equipment based on deep learning |
CN115309941A (en) * | 2022-08-19 | 2022-11-08 | China Unicom Wo Music Culture Co., Ltd. | AI-based intelligent tag retrieval method and system |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104166684A (en) * | 2014-07-24 | 2014-11-26 | Peking University | Cross-media retrieval method based on uniform sparse representation |
CN105701225A (en) * | 2016-01-15 | 2016-06-22 | Peking University | Cross-media search method based on unification association supergraph protocol |
CN108319686A (en) * | 2018-02-01 | 2018-07-24 | Peking University Shenzhen Graduate School | Antagonism cross-media retrieval method based on limited text space |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104166684A (en) * | 2014-07-24 | 2014-11-26 | Peking University | Cross-media retrieval method based on uniform sparse representation |
CN105701225A (en) * | 2016-01-15 | 2016-06-22 | Peking University | Cross-media search method based on unification association supergraph protocol |
CN108319686A (en) * | 2018-02-01 | 2018-07-24 | Peking University Shenzhen Graduate School | Antagonism cross-media retrieval method based on limited text space |
Non-Patent Citations (2)
Title |
---|
WANG B ET AL.: "Adversarial Cross-Modal Retrieval", ACM * |
DONG Jianfeng: "Research on Relevance Computation in Cross-Modal Retrieval", China Doctoral Dissertations Full-text Database, Information Science and Technology Series * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112528127A (en) * | 2020-05-30 | 2021-03-19 | Shandong Technology and Business University | Big data-based plane design work matching degree analysis system |
CN111813967A (en) * | 2020-07-14 | 2020-10-23 | Institute of Scientific and Technical Information of China | Retrieval method, retrieval device, computer equipment and storage medium |
CN111813967B (en) * | 2020-07-14 | 2024-01-30 | Institute of Scientific and Technical Information of China | Retrieval method, retrieval device, computer equipment and storage medium |
CN111949806A (en) * | 2020-08-03 | 2020-11-17 | CETC Big Data Research Institute Co., Ltd. | Cross-media retrieval method based on Resnet-Bert network model |
CN112559820A (en) * | 2020-12-17 | 2021-03-26 | Aerospace Information Research Institute, Chinese Academy of Sciences | Sample data set intelligent question setting method, device and equipment based on deep learning |
CN115309941A (en) * | 2022-08-19 | 2022-11-08 | China Unicom Wo Music Culture Co., Ltd. | AI-based intelligent tag retrieval method and system |
CN115309941B (en) * | 2022-08-19 | 2023-03-10 | China Unicom Wo Music Culture Co., Ltd. | AI-based intelligent tag retrieval method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Kaur et al. | Comparative analysis on cross-modal information retrieval: A review | |
CN110866129A (en) | Cross-media retrieval method based on cross-media uniform characterization model | |
WO2023065617A1 (en) | Cross-modal retrieval system and method based on pre-training model and recall and ranking | |
CN108268600B (en) | AI-based unstructured data management method and device | |
CN111274790B (en) | Chapter-level event embedding method and device based on syntactic dependency graph | |
JP2006510114A (en) | Representation of content in conceptual model space and method and apparatus for retrieving it | |
CN110990597A (en) | Cross-modal data retrieval system based on text semantic mapping and retrieval method thereof | |
CN113392265A (en) | Multimedia processing method, device and equipment | |
CN116821696B (en) | Training method, device, equipment and storage medium for form question-answer model | |
CN116702091A (en) | Multi-mode ironic intention recognition method, device and equipment based on multi-view CLIP | |
CN117173730A (en) | Document image intelligent analysis and processing method based on multi-mode information | |
CN112182273B (en) | Cross-modal retrieval method and system based on semantic constraint matrix decomposition hash | |
CN117688220A (en) | Multi-mode information retrieval method and system based on large language model | |
CN117332103A (en) | Image retrieval method based on keyword extraction and multi-modal feature fusion | |
Pereira et al. | SAPTE: A multimedia information system to support the discourse analysis and information retrieval of television programs | |
CN107633259B (en) | Cross-modal learning method based on sparse dictionary representation | |
Perdana et al. | Instance-based deep transfer learning on cross-domain image captioning | |
CN105069136A (en) | Image recognition method in big data environment | |
Tian et al. | Research on image classification based on a combination of text and visual features | |
Chivadshetti et al. | Content based video retrieval using integrated feature extraction and personalization of results | |
CN109255098B (en) | Matrix decomposition hash method based on reconstruction constraint | |
CN115563311B (en) | Document labeling and knowledge base management method and knowledge base management system | |
CN117851654A (en) | Archives resource retrieval system based on artificial intelligence pronunciation and image recognition | |
Ronghui et al. | Application of Improved Convolutional Neural Network in Text Classification. | |
CN111506754B (en) | Picture retrieval method, device, storage medium and processor |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200306 ||