
CN110866129A - Cross-media retrieval method based on cross-media uniform characterization model - Google Patents

Cross-media retrieval method based on cross-media uniform characterization model

Info

Publication number
CN110866129A
CN110866129A (application CN201911061277.4A)
Authority
CN
China
Prior art keywords
cross
media
data
retrieval
original domain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911061277.4A
Other languages
Chinese (zh)
Inventor
王进
刘汪洋
曹扬
张秋悦
闫盈盈
宋荣伟
阚丹会
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Division Big Data Research Institute Co Ltd
Original Assignee
Division Big Data Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Division Big Data Research Institute Co Ltd filed Critical Division Big Data Research Institute Co Ltd
Priority to CN201911061277.4A
Publication of CN110866129A
Legal status: Pending (current)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/48Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a cross-media retrieval method based on a cross-media unified representation model, aimed at the problem of cross-media retrieval, which comprises the following steps: (1) constructing a cross-media database: establishing a large-scale cross-media database oriented to the government-affairs news domain; (2) preprocessing the cross-media data: preprocessing input data such as text, images, video and audio; (3) extracting original-domain features of the cross-media data: extracting the original-domain feature vectors of each type of media data; (4) uniformly representing the cross-media data: extracting feature vectors of the cross-media data in a common representation space; (5) computing and ranking semantic similarity: computing the semantic similarity between the retrieval target and the data in the cross-media database, ranking by similarity, and outputting the results. The invention not only provides a method for mutual retrieval among four types of media data, but also provides a unified representation model for multiple media types, improves cross-media semantic retrieval precision, and has broad application prospects.

Description

Cross-media retrieval method based on cross-media uniform characterization model
Technical Field
The invention relates to a cross-media retrieval method based on a cross-media unified representation model, and belongs to the technical fields of natural language processing, computer vision and cross-media data retrieval. The method involves extracting original-domain features of multimedia data, uniformly representing the data through a cross-media unified representation model, constructing a cross-media database, and computing and ranking similarity among cross-media data.
Background
With the arrival of the big-data era, data in all industries have grown explosively. Driven by intelligent applications represented by 5G and the Internet of Things, large amounts of multimedia data, including massive unstructured data such as text, images, video and audio, are generated every moment. How to better organize, retrieve and query cross-media data has become a major challenge and research focus in the field of information retrieval, for example retrieving images, video and audio through text, or retrieving text and audio through video.
For multimedia collections of text, images, video and audio, most retrieval systems still rely on text keyword search; for example, Google's image and video retrieval remains based on text keywords. The basic workflow extracts keyword labels from unstructured data, where the labels may come from text surrounding a picture, file names, data subject tags, object-detection labels and so on, with a small amount of manual labeling from the Internet. Because multimedia producers differ in cultural background and domain knowledge, the text associated with pictures is often highly unreliable and of uneven quality. Moreover, for multimedia information such as images and video, natural language generally cannot describe the content effectively and accurately, nor express its essential content and semantic relationships, so retrieving pictures and videos according to accompanying text struggles to satisfy users' query needs and yields very low search accuracy.
For the problem of cross-media data retrieval, semantic-embedding methods based on machine learning and deep learning have become a research focus. The VSE++ model learns a visual-semantic embedding through hard-negative mining and improves cross-media retrieval precision; the ACMR and CM-GANs models train with a generative adversarial idea and achieve good performance on the Wikipedia and NUS-WIDE datasets. Most existing, well-performing cross-media retrieval methods adopt deep neural network models, which are usually poorly interpretable; moreover, models based on the adversarial idea often assume that the transformation of data into the common representation space is a linear, invertible transformation and therefore add an inverse-transformation constraint, which contradicts the nonlinear nature of neural network transformations.
Disclosure of Invention
To solve the above technical problems, the invention provides a cross-media retrieval method based on a cross-media unified representation model; the unified representation model supports retrieval across four types of media data and is used for cross-media data retrieval to improve retrieval precision.
The invention is realized by the following technical scheme.
The invention provides a cross-media retrieval method based on a cross-media uniform representation model, which comprises the following steps:
① Constructing the cross-media database: establishing a cross-media database oriented to the government-affairs news domain;
② Preprocessing the cross-media data: preprocessing the input of the cross-media database to obtain the cross-media data;
③ Extracting original-domain features of the cross-media data: extracting the original-domain feature vectors of the cross-media data;
④ Uniformly representing the cross-media data: using adversarial training of a deep neural network to obtain a cross-media unified representation model that supports all four types of media data as input, and extracting the common-space feature vectors output by the model;
⑤ Computing and ranking the semantic similarity for retrieval: computing the cosine similarity between the common-space feature vector output by the cross-media unified representation model and the original-domain feature vectors of the cross-media data, ranking by similarity, and outputting the K most similar items as the retrieval result.
In step ①, the government-affairs news domain covers government news, political figures and political events, and the cross-media database stores four types of unstructured data: text, images, video and audio.
In step ②, the data formats and dimensions of the multimedia retrieval input data (text, image, video and audio) are preprocessed; the audio data are converted into spectrogram images to serve as the audio input, and the text is word-segmented to obtain a word-segmentation array.
In step ③, a word2vec model is used to extract original-domain feature vectors from the text data, a deep convolutional network is used to extract original-domain features from the image data, C3D is used to extract original-domain features from the video data, and a deep convolutional network is used to extract original-domain feature vectors from the audio data.
The word-segmentation array of the text is obtained through word segmentation.
The invention has the beneficial effects that:
1. A method supporting the unified representation of four types of media data, namely text, image, video and audio, is provided; the cross-media unified representation model adopts a training method based on the generative adversarial idea, reducing the semantic gap between the representations of different media;
2. A cross-media data retrieval method based on the cross-media unified representation model is provided, realizing mutual retrieval among the four types of media data.
Drawings
Fig. 1 is a flow chart of the present invention.
Detailed Description
The technical solution of the present invention is further described below, but the claimed scope of protection is not limited to this description.
As shown in fig. 1, a cross-media retrieval method based on a cross-media uniform characterization model includes the following steps:
① Constructing the cross-media database: establishing a cross-media database oriented to the government-affairs news domain;
② Preprocessing the cross-media data: preprocessing the input of the cross-media database to obtain the cross-media data;
③ Extracting original-domain features of the cross-media data: extracting the original-domain feature vectors of the cross-media data;
④ Uniformly representing the cross-media data: using adversarial training of a deep neural network to obtain a cross-media unified representation model that supports all four types of media data as input, and extracting the common-space feature vectors output by the model;
⑤ Computing and ranking the semantic similarity for retrieval: computing the cosine similarity between the common-space feature vector output by the cross-media unified representation model and the original-domain feature vectors of the cross-media data, ranking by similarity, and outputting the K most similar items as the retrieval result.
In step ①, the government-affairs news domain covers government news, political figures and political events, and the cross-media database stores four types of unstructured data: text, images, video and audio.
In step ②, the data formats and dimensions of the multimedia retrieval input data (text, image, video and audio) are preprocessed; the audio data are converted into spectrogram images to serve as the audio input, and the text is word-segmented to obtain a word-segmentation array.
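As an illustration of this preprocessing step, the sketch below converts an audio clip into a mel-spectrogram image and segments a text string into a word array. The patent does not name any specific tools; librosa, matplotlib and jieba are assumptions chosen here purely for illustration.

```python
# Hedged sketch of step ② (preprocessing): librosa, matplotlib and jieba are
# illustrative choices, not prescribed by the patent.
import jieba
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np


def audio_to_spectrogram(wav_path: str, out_png: str, sr: int = 16000, n_mels: int = 128):
    """Convert an audio file into a mel-spectrogram image used as the audio-branch input."""
    y, sr = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    mel_db = librosa.power_to_db(mel, ref=np.max)
    plt.figure(figsize=(4, 4))
    librosa.display.specshow(mel_db, sr=sr)
    plt.axis("off")
    plt.savefig(out_png, bbox_inches="tight", pad_inches=0)
    plt.close()
    return mel_db


def segment_text(text: str):
    """Word-segment a text string into the word-segmentation array described above."""
    return [w for w in jieba.lcut(text) if w.strip()]
```

Applied to the sample sentence in the Examples section below, segment_text would yield a token array similar to the one shown there.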
In step ③, a word2vec model is used to extract original-domain feature vectors from the text data, yielding a word-vector representation; a deep convolutional network is used to extract original-domain features from the image data; a C3D (three-dimensional convolutional) network is used to extract original-domain features from the video data, i.e., a fixed-length sequence of frames is obtained by sampling the video and the C3D model then extracts the video features; and a deep convolutional network is used to extract original-domain feature vectors from the audio data, i.e., the audio spectrogram image is fed into the deep convolutional network for feature extraction.
The word-segmentation array of the text is obtained through word segmentation.
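To make the original-domain feature extraction of step ③ concrete, the following sketch averages word2vec vectors for the segmented text and extracts a pooled CNN feature for an image (or for an audio spectrogram saved as an image). The patent only specifies "word2vec" and a "deep convolutional network"; gensim, torchvision and a ResNet-50 backbone are illustrative assumptions, and the C3D video branch is omitted for brevity.

```python
# Hedged sketch of step ③ (original-domain feature extraction): gensim,
# torchvision and ResNet-50 are illustrative assumptions only.
import numpy as np
import torch
from gensim.models import KeyedVectors
from PIL import Image
from torchvision import models, transforms


def text_features(words, w2v: KeyedVectors) -> np.ndarray:
    """Average the word2vec vectors of the segmented words -> text original-domain feature."""
    vecs = [w2v[w] for w in words if w in w2v]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)


# Pretrained CNN with the classification head removed, keeping the pooled feature.
_cnn = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
_cnn.fc = torch.nn.Identity()
_cnn.eval()
_preprocess = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])


@torch.no_grad()
def image_features(img_path: str) -> np.ndarray:
    """CNN feature for an image, or for an audio spectrogram saved as an image."""
    x = _preprocess(Image.open(img_path).convert("RGB")).unsqueeze(0)
    return _cnn(x).squeeze(0).numpy()
```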
Specifically, the cross-media unified representation model adopts an adversarial training method. In training the model, the modality-discriminator loss function is expressed as follows:
[Equation image in the original document: the expression for L_adv(θ_D).]
where L_adv(θ_D) denotes the cross-entropy loss over all samples across the different modalities, D(·; θ_D) denotes the probability that an image or text sample is discriminated as an image or a text, and m_i is the ground-truth label indicating whether a sample belongs to an image or a text;
the cross-media data characterization loss function is:
L_emd(θ_V, θ_T, θ_imi, θ_imd) = ω_1 × L_imi + ω_2 × L_imd + L_reg
where L_imi is the inter-modal structure-invariance loss, L_imd is the intra-modal data-class loss, L_reg is the model-parameter regularization term, and ω_1 and ω_2 are model hyper-parameters;
Further,
[Equation image in the original document: the detailed expressions for the above loss terms.]
The model-training optimization objective based on generative adversarial training is as follows:
[Equation images in the original document: the min-max optimization objectives combining L_emd and L_adv.]
where the maximization is taken over the discriminator parameters θ_D.
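To make the adversarial training scheme concrete, the sketch below follows the ACMR-style formulation that the patent's loss terms mirror: modality projectors into a common space, a modality discriminator trained with cross-entropy (L_adv), and an embedding loss combining inter-modal invariance (L_imi), an intra-modal semantic classification term (L_imd) and weight-decay regularization (L_reg). PyTorch, the layer sizes, the two-modality simplification and the exact loss forms are assumptions for illustration only, since the equations appear only as images in the original document.

```python
# Hedged sketch of step ④ (adversarial training of the unified representation
# model), assuming an ACMR-style formulation; the exact patent equations are
# not recoverable from the text. Only two of the four modalities are shown.
import torch
import torch.nn as nn
import torch.nn.functional as F

COMMON_DIM, N_CLASSES, N_MODALITIES = 256, 10, 4


class Projector(nn.Module):
    """Maps an original-domain feature vector into the common representation space."""

    def __init__(self, in_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(),
                                 nn.Linear(512, COMMON_DIM))

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)


proj_text, proj_image = Projector(300), Projector(2048)        # illustrative input dims
classifier = nn.Linear(COMMON_DIM, N_CLASSES)                   # used for L_imd
discriminator = nn.Sequential(nn.Linear(COMMON_DIM, 128), nn.ReLU(),
                              nn.Linear(128, N_MODALITIES))     # used for L_adv

gen_params = (list(proj_text.parameters()) + list(proj_image.parameters())
              + list(classifier.parameters()))
opt_gen = torch.optim.Adam(gen_params, lr=1e-4, weight_decay=1e-5)  # weight decay ~ L_reg
opt_disc = torch.optim.Adam(discriminator.parameters(), lr=1e-4)


def training_step(text_feat, image_feat, labels, w1=1.0, w2=1.0):
    """One alternation of the min-max game between the projectors and the discriminator."""
    z_t, z_v = proj_text(text_feat), proj_image(image_feat)
    b = z_t.size(0)
    mods = torch.cat([torch.zeros(b, dtype=torch.long), torch.ones(b, dtype=torch.long)])

    # Discriminator step: learn to tell which modality a common-space vector came from.
    opt_disc.zero_grad()
    l_adv = F.cross_entropy(discriminator(torch.cat([z_t, z_v]).detach()), mods)
    l_adv.backward()
    opt_disc.step()

    # Projector step: minimize the embedding loss while fooling the discriminator.
    l_imi = F.mse_loss(z_t, z_v)                                   # inter-modal invariance
    l_imd = F.cross_entropy(classifier(z_t), labels) + \
            F.cross_entropy(classifier(z_v), labels)               # intra-modal class loss
    l_adv_g = F.cross_entropy(discriminator(torch.cat([z_t, z_v])), mods)
    l_emd = w1 * l_imi + w2 * l_imd
    opt_gen.zero_grad()
    (l_emd - l_adv_g).backward()                                    # adversarial objective
    opt_gen.step()
    return float(l_emd), float(l_adv)
```

A full implementation would add projectors for the video and audio branches, share the same classifier and discriminator across all four modalities, and alternate these two steps over the cross-media training set.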
Examples
As described above, a cross-media retrieval method based on a cross-media uniform characterization model includes the following steps:
step 1: cross-media data pre-processing
The text input is: "The 2019 Nobel Prize in Physiology or Medicine is awarded to the American scientists William Kaelin and Gregg Semenza and the British scientist Peter Ratcliffe, in recognition of their contributions to discovering how cells sense and adapt to oxygen supply."
Text word-segmentation preprocessing is carried out, and the segmentation result is: [2019; Nobel; Physiology; Medicine Prize; awarded to; United States; scientist; William Kaelin; Gregg Semenza; United Kingdom; scientist; Peter Ratcliffe; in recognition of; their; discovering; cells; how; sense; adapt to; oxygen; supply; aspect; made; contribution]
Step 2: Cross-media data original-domain feature extraction
A text feature vector is obtained with the word2vec model: Q1 = (1, 1, 0, 0, 0, 1, 0, …);
Step 3: Unified representation of the cross-media data
The feature vector Q2 of the text in the common representation space is obtained through the cross-media unified representation model;
Step 4: Semantic similarity calculation and ranking for retrieval
The cosine similarity between Q2 and the feature vectors of all cross-media data in the database {V1, V2, V3, …, T1, T2, …} is computed; the results are sorted by similarity and output as the retrieval result.
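A minimal sketch of this similarity computation and top-K ranking is given below, assuming every database item already has a precomputed common-space feature vector; numpy and the 256-dimensional vectors are illustrative assumptions.

```python
# Minimal sketch of step ⑤: cosine similarity and top-K ranking, assuming all
# database items have precomputed common-space feature vectors.
import numpy as np


def cosine_similarities(query: np.ndarray, db: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and every row of the database matrix."""
    q = query / (np.linalg.norm(query) + 1e-12)
    m = db / (np.linalg.norm(db, axis=1, keepdims=True) + 1e-12)
    return m @ q


def retrieve_top_k(query_vec, db_vecs, db_ids, k=10):
    """Rank database items (text, image, video and audio entries) by similarity."""
    sims = cosine_similarities(query_vec, db_vecs)
    order = np.argsort(-sims)[:k]
    return [(db_ids[i], float(sims[i])) for i in order]


# Illustrative usage with random data standing in for Q2 and {V1, V2, V3, T1, T2}.
rng = np.random.default_rng(0)
Q2 = rng.normal(size=256)
db = rng.normal(size=(5, 256))
print(retrieve_top_k(Q2, db, ["V1", "V2", "V3", "T1", "T2"], k=3))
```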

Claims (5)

1. A cross-media retrieval method based on a cross-media unified representation model, characterized in that the method comprises the following steps:
① Constructing the cross-media database: establishing a cross-media database oriented to the government-affairs news domain;
② Preprocessing the cross-media data: preprocessing the input of the cross-media database to obtain the cross-media data;
③ Extracting original-domain features of the cross-media data: extracting the original-domain feature vectors of the cross-media data;
④ Uniformly representing the cross-media data: using adversarial training of a deep neural network to obtain a cross-media unified representation model that supports all four types of media data as input, and extracting the common-space feature vectors output by the model;
⑤ Computing and ranking the semantic similarity for retrieval: computing the cosine similarity between the common-space feature vector output by the cross-media unified representation model and the original-domain feature vectors of the cross-media data, ranking by similarity, and outputting the K most similar items as the retrieval result.
2. The cross-media retrieval method based on the cross-media unified representation model according to claim 1, wherein in step ①, the government-affairs news domain covers government news, political figures and political events, and the cross-media database stores four types of unstructured data: text, images, video and audio.
3. The cross-media retrieval method based on the cross-media unified representation model according to claim 1, wherein in step ②, the data formats and dimensions of the multimedia retrieval input data (text, image, video and audio) are preprocessed; the audio data are converted into spectrogram images to serve as the audio input, and the text is word-segmented to obtain a word-segmentation array.
4. The cross-media retrieval method based on the cross-media unified representation model according to claim 1, wherein in step ③, a word2vec model is used to extract original-domain feature vectors from the text data, a deep convolutional network is used to extract original-domain features from the image data, C3D is used to extract original-domain features from the video data, and a deep convolutional network is used to extract original-domain feature vectors from the audio data.
5. The cross-media retrieval method based on the cross-media unified representation model according to claim 3, wherein the word-segmentation array of the text is obtained through word segmentation.
CN201911061277.4A 2019-11-01 2019-11-01 Cross-media retrieval method based on cross-media uniform characterization model Pending CN110866129A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911061277.4A CN110866129A (en) 2019-11-01 2019-11-01 Cross-media retrieval method based on cross-media uniform characterization model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911061277.4A CN110866129A (en) 2019-11-01 2019-11-01 Cross-media retrieval method based on cross-media uniform characterization model

Publications (1)

Publication Number Publication Date
CN110866129A true CN110866129A (en) 2020-03-06

Family

ID=69654308

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911061277.4A Pending CN110866129A (en) 2019-11-01 2019-11-01 Cross-media retrieval method based on cross-media uniform characterization model

Country Status (1)

Country Link
CN (1) CN110866129A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111813967A (en) * 2020-07-14 2020-10-23 中国科学技术信息研究所 Retrieval method, retrieval device, computer equipment and storage medium
CN111949806A (en) * 2020-08-03 2020-11-17 中电科大数据研究院有限公司 Cross-media retrieval method based on Resnet-Bert network model
CN112528127A (en) * 2020-05-30 2021-03-19 山东工商学院 Big data-based plane design work matching degree analysis system
CN112559820A (en) * 2020-12-17 2021-03-26 中国科学院空天信息创新研究院 Sample data set intelligent question setting method, device and equipment based on deep learning
CN115309941A (en) * 2022-08-19 2022-11-08 联通沃音乐文化有限公司 AI-based intelligent tag retrieval method and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104166684A (en) * 2014-07-24 2014-11-26 北京大学 Cross-media retrieval method based on uniform sparse representation
CN105701225A (en) * 2016-01-15 2016-06-22 北京大学 Cross-media search method based on unification association supergraph protocol
CN108319686A (en) * 2018-02-01 2018-07-24 北京大学深圳研究生院 Antagonism cross-media retrieval method based on limited text space

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104166684A (en) * 2014-07-24 2014-11-26 北京大学 Cross-media retrieval method based on uniform sparse representation
CN105701225A (en) * 2016-01-15 2016-06-22 北京大学 Cross-media search method based on unification association supergraph protocol
CN108319686A (en) * 2018-02-01 2018-07-24 北京大学深圳研究生院 Antagonism cross-media retrieval method based on limited text space

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Wang B. et al.: "Adversarial Cross-Modal Retrieval", ACM *
Dong Jianfeng: "Research on Relevance Computation in Cross-Modal Retrieval", China Doctoral Dissertations Full-text Database (Information Science and Technology Series) *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112528127A (en) * 2020-05-30 2021-03-19 山东工商学院 Big data-based plane design work matching degree analysis system
CN111813967A (en) * 2020-07-14 2020-10-23 中国科学技术信息研究所 Retrieval method, retrieval device, computer equipment and storage medium
CN111813967B (en) * 2020-07-14 2024-01-30 中国科学技术信息研究所 Retrieval method, retrieval device, computer equipment and storage medium
CN111949806A (en) * 2020-08-03 2020-11-17 中电科大数据研究院有限公司 Cross-media retrieval method based on Resnet-Bert network model
CN112559820A (en) * 2020-12-17 2021-03-26 中国科学院空天信息创新研究院 Sample data set intelligent question setting method, device and equipment based on deep learning
CN115309941A (en) * 2022-08-19 2022-11-08 联通沃音乐文化有限公司 AI-based intelligent tag retrieval method and system
CN115309941B (en) * 2022-08-19 2023-03-10 联通沃音乐文化有限公司 AI-based intelligent tag retrieval method and system

Similar Documents

Publication Publication Date Title
Kaur et al. Comparative analysis on cross-modal information retrieval: A review
CN110866129A (en) Cross-media retrieval method based on cross-media uniform characterization model
WO2023065617A1 (en) Cross-modal retrieval system and method based on pre-training model and recall and ranking
CN108268600B (en) AI-based unstructured data management method and device
CN111274790B (en) Chapter-level event embedding method and device based on syntactic dependency graph
JP2006510114A (en) Representation of content in conceptual model space and method and apparatus for retrieving it
CN110990597A (en) Cross-modal data retrieval system based on text semantic mapping and retrieval method thereof
CN113392265A (en) Multimedia processing method, device and equipment
CN116821696B (en) Training method, device, equipment and storage medium for form question-answer model
CN116702091A (en) Multi-mode ironic intention recognition method, device and equipment based on multi-view CLIP
CN117173730A (en) Document image intelligent analysis and processing method based on multi-mode information
CN112182273B (en) Cross-modal retrieval method and system based on semantic constraint matrix decomposition hash
CN117688220A (en) Multi-mode information retrieval method and system based on large language model
CN117332103A (en) Image retrieval method based on keyword extraction and multi-modal feature fusion
Pereira et al. SAPTE: A multimedia information system to support the discourse analysis and information retrieval of television programs
CN107633259B (en) Cross-modal learning method based on sparse dictionary representation
Perdana et al. Instance-based deep transfer learning on cross-domain image captioning
CN105069136A (en) Image recognition method in big data environment
Tian et al. Research on image classification based on a combination of text and visual features
Chivadshetti et al. Content based video retrieval using integrated feature extraction and personalization of results
CN109255098B (en) Matrix decomposition hash method based on reconstruction constraint
CN115563311B (en) Document labeling and knowledge base management method and knowledge base management system
CN117851654A (en) Archives resource retrieval system based on artificial intelligence pronunciation and image recognition
Ronghui et al. Application of Improved Convolutional Neural Network in Text Classification.
CN111506754B (en) Picture retrieval method, device, storage medium and processor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200306)