[go: up one dir, main page]
More Web Proxy on the site http://driver.im/

CN114943017B - Cross-modal retrieval method based on similarity zero sample hash - Google Patents

Cross-modal retrieval method based on similarity zero sample hash Download PDF

Info

Publication number
CN114943017B
CN114943017B CN202210696434.4A CN202210696434A CN114943017B CN 114943017 B CN114943017 B CN 114943017B CN 202210696434 A CN202210696434 A CN 202210696434A CN 114943017 B CN114943017 B CN 114943017B
Authority
CN
China
Prior art keywords
similarity
hash
modal
cross
class
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210696434.4A
Other languages
Chinese (zh)
Other versions
CN114943017A (en
Inventor
舒振球
永凯玲
余正涛
高盛祥
毛存礼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology filed Critical Kunming University of Science and Technology
Priority to CN202210696434.4A priority Critical patent/CN114943017B/en
Publication of CN114943017A publication Critical patent/CN114943017A/en
Application granted granted Critical
Publication of CN114943017B publication Critical patent/CN114943017B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9014Indexing; Data structures therefor; Storage structures hash tables
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53Querying
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/903Querying
    • G06F16/90335Query processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/906Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a cross-modal retrieval method based on similarity zero sample hash. A new zero sample hash framework is provided to fully mine supervised semantic information, which combines intra-modal similarity, inter-modal similarity, semantic tags and class attributes to guide the learning of zero sample hash codes. In this framework intra-modality similarity and inter-modality similarity are considered at the same time. Intra-modality similarity is represented by manifold structure and feature similarity of multi-modality data, and inter-modality similarity is represented by semantic correlation between modalities. In addition, the semantic tags and class attributes are embedded into the hash codes, and more discriminative hash codes are learned for each instance. However, due to the embedding of class attributes, the relationship between the visible class and the invisible class can be captured well in the hash code, so that the knowledge of the attributes can be transferred from the visible class into the invisible class. The invention realizes higher-precision retrieval of zero sample cross-modal data.

Description

Cross-modal retrieval method based on similarity zero sample hash
Technical Field
The invention relates to a cross-modal retrieval method based on similarity zero sample hash, and belongs to the field of cross-modal hash retrieval.
Background
Most existing cross-modal hash retrieval methods are studied in a visible class dataset. However, with the explosive growth of multimedia data, a large number of new concepts (invisible classes) emerge. Retraining an existing cross-modal hash model by collecting new conceptual data is not feasible because it would consume a significant amount of time and space. It is therefore necessary to propose a cross-modal hash model in which the training data does not contain new concepts, but which can still handle the new concepts. However, zero sample learning can identify data categories that have never been seen. That is, the trained classifier is not only able to identify existing data categories in the training set, but also to distinguish data from unseen categories. This makes zero sample learning a research hotspot for invisible class retrieval tasks.
Over the past few years, zero sample learning has been widely used in single-mode retrieval tasks. Some researchers implement potential semantic transfers by projecting tags into a word embedding space. Some researchers have proposed a zero sample hash based on an asymmetric ratio similarity matrix to improve knowledge transfer capability from visible to invisible classes. Other researchers have proposed a zero-sample learning model for multi-label image retrieval that predicts data labels of invisible classes using an example concept consistency ranking algorithm. However, the above work is a study on a single-mode search task, and the study on an invisible cross-mode search task is still relatively poor. Under the big data age that new concepts are continuously emerging, the existing cross-mode retrieval method also has the following problems: (1) The existing method only considers visible class data, and ignores invisible class data. Therefore, such a model is not suitable for cross-modal data retrieval in the big data age. (2) Most methods do not use class attribute information in hash code learning, which is detrimental to the transfer of knowledge from visible classes to invisible classes. (3) The existing few zero sample cross-modal retrieval methods fail to train models by using intra-modal similarity, inter-modal similarity, class labels and class attributes at the same time.
Disclosure of Invention
In view of the challenges presented above, the present invention provides a cross-modal retrieval method based on similarity zero sample hashing. The invention is used for solving the problem of cross-mode retrieval containing invisible class data by fusing intra-mode similarity, inter-mode similarity, tag information and class attribute.
In order to achieve the purpose of the invention, the cross-modal retrieval method based on similarity zero sample hash has the technical scheme that: the invention provides a new zero sample hash framework for fully mining and supervising semantic information, which combines intra-mode similarity, inter-mode similarity, semantic tags and class attributes to guide the learning process of zero sample hash codes. In this framework intra-modality similarity and inter-modality similarity are considered at the same time. Intra-modal similarity represents features and semantic similarity among samples in a mode, and inter-modal similarity represents semantic correlation among modes. In addition, the semantic tags and class attributes are embedded into the hash codes, and more discriminative hash codes are learned for each instance. However, due to the embedding of class attributes, the relationship between the visible class and the invisible class can be captured well in the hash code, so that the supervision knowledge can be transferred from the visible class into the invisible class. The invention comprises the following steps:
step1, acquiring a cross-modal dataset, and extracting cross-modal dataset characteristics and class attribute vectors;
Step2, processing a cross-modal dataset: processing the existing cross-modal data set into a cross-modal zero-sample data set; the original data set is divided into a training set and a query set, 20% of the classes of the original data set are randomly selected as invisible classes, and the rest are visible classes. For a zero sample cross-modal retrieval scene, the invention takes a sample pair corresponding to an invisible class in an original query set as a new query set; taking a sample pair corresponding to the visible class in the original training set as a new training set; the search set consists of an original training set;
Step3, learning an objective function: the intra-modal similarity, inter-modal similarity, semantic tags, class attributes, hash codes and hash functions are fused to learn the same framework, so that an objective function is obtained, and hash codes with more discriminant are learned;
Step4, performing iterative updating of the objective function: the variable matrix in the objective function obtained in the last step is updated through iteration until the objective function converges or reaches the maximum iteration number, and a hash function and a hash code of a training set are obtained;
Step5, performing zero sample cross-modal retrieval: and inputting a query sample, and obtaining a hash code of the query sample according to the hash function obtained in Step 4. The hash codes of the query samples are substituted into the retrieval set to query, and because the query is performed in a binary space, the result of the query is obtained by calculating the Hamming distance between the query samples and each sample in the retrieval set. The sample corresponding to the minimum Hamming distance in the search set is the query result obtained for us.
Further, the cross-modality retrieval data set includes a plurality of sample pairs, each sample pair including: text, images, and corresponding semantic tags.
Further, in Step1, extracting image features through a VGG-16 model; extracting text features through a word bag model; extracting class attributes, extracting a corresponding word vector for each class name by a Glove method, and forming a class attribute matrix.
Further, in Step2, in order to ensure generalization capability of the model, each time the data enters the model for training, a random selection method should be used to process and divide the data set. Through multiple training, an average value is taken as the final result.
The intra-modal similarity in Step3 is divided into feature similarity and semantic similarity, wherein the feature similarity is calculated through Euclidean similarity, and the semantic similarity is measured through Jaccard similarity.
Further, the inter-modality similarity in Step3 refers to the semantic similarity between each instance of different modalities, and the semantic similarity is measured by the tag semantic information.
Further, the objective function obtained in Step3 includes two parts, i.e., hash code learning and hash function, where hash code learning refers to learning hash codes by combining intra-modal similarity, inter-modal similarity, semantic tags and class attributes; the learning of the hash function refers to learning the hash function by minimizing the least square regression problem, and the semantic relation between the hash code and the hash function is enhanced by putting the hash code learning and the hash function learning into the same model for learning, so that high-precision zero sample cross-modal retrieval is realized.
Further, the iterative update of the objective function in Step4 is updated by using the objective function obtained in Step4 as an original function. It is clear that the objective function is not optimal and that it needs to be optimized. Since the objective function is a non-convex problem, when other variables are fixed and a matrix variable is updated, the function at this time is a convex problem, so that the update of the objective function is facilitated. In the invention, the matrix variable is updated by adopting the alternate iterative algorithm until the objective function converges or the maximum iterative times are reached, and finally the optimal hash code and the hash function are obtained.
Further, in Step3, establishing intra-modality similarity, inter-modality similarity and hash code connection through a kernel-based supervised hash (KSH) optimization model, and enhancing semantic information in the hash code in a manner that the similarity is embedded in the hash code; establishing a relation between the semantic tag and the class attribute and the hash code in a tag reconstruction mode, embedding the tag into the hash code, and enhancing semantic information contained in the hash code; embedding class attributes in the hash code is realized by transferring attribute knowledge in the visible class to the invisible class so as to realize the retrieval of the invisible class.
Further, in Step4, since the master model is a non-convex problem, it is a difficult problem to directly optimize it. However, when the rest variables are fixed and only one variable is optimized, the new problem converted by the original model is a convex problem, and the optimization solution can be directly carried out. Similarly, each variable is optimized in this way until convergence or the maximum number of iterations is reached, resulting in an optimal result.
(1) The existing method only considers visible class data, and ignores invisible class data. Therefore, such a model is not suitable for cross-modal data retrieval in the big data age. (2) Most methods do not use class attribute information in hash code learning, which is detrimental to the transfer of knowledge from visible classes to invisible classes. (3) The existing few zero sample cross-modal retrieval methods fail to train models by using intra-modal similarity, inter-modal similarity, class labels and class attributes at the same time.
The beneficial effects of the invention are as follows:
The invention provides a cross-modal retrieval method based on similarity zero sample hash. The method overcomes the defect that the existing majority of cross-modal retrieval methods cannot solve the limitation of zero sample data. The hash codes are learned by simultaneously using intra-mode similarity, inter-mode similarity and class attributes, so that the relationship between the visible class and the invisible class can be well captured, and the monitoring knowledge is transferred from the visible class to the invisible class. Furthermore, to take into account the supervised tag information, the present invention improves accuracy by embedding the tag information into the attribute space. Thus, more discriminant hash codes can be generated from the model proposed by the present invention. In addition, the invention provides a discrete optimization scheme to solve the proposed model, thereby effectively avoiding quantization errors.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention.
FIG. 1 is a flow chart of a method according to an embodiment of the invention.
FIG. 2 is a flow chart of iterative updating of the SAZH model of the present invention.
Detailed Description
The following description is exemplary and is intended to further illustrate the embodiments of the present invention with reference to the accompanying drawings.
Example 1
FIG. 1 is a flow chart of a cross-modal retrieval method based on similarity zero sample hashing.
In this example, referring to fig. 1, the method of the present invention specifically includes the following processes:
1. And acquiring a cross-modal dataset, and extracting cross-modal dataset characteristics and class attribute vectors. In this example, the data set used includes both image and text modalities, as well as a label corresponding to one of them. In Step1, extracting the class attribute, extracting a corresponding word vector for each class name by using Glove method to form a class attribute matrix.
2. Processing of cross-modality datasets. Because the problem to be solved by the invention is zero sample cross-modal retrieval, the acquired cross-modal dataset cannot be directly used. The data set should be processed according to the application scenario of the zero sample so as to conform to the application scenario of the zero sample cross-modal retrieval. The specific treatment method comprises the following steps:
the original data set is divided into a training set and a query set, 20% of the classes of the original data set are randomly selected as invisible classes, and the rest are visible classes. For a zero sample cross-modal retrieval scene, the invention takes a sample pair corresponding to an invisible class in an original query set as a new query set; taking a sample pair corresponding to the visible class in the original training set as a new training set; the search set consists of the original training set.
In the present invention, a given set of multimodal data is: wherein/> Is a multi-modal data point,/>For the feature vector corresponding to the ith instance of the image modality,/>For the feature vector corresponding to the ith instance of the text mode, l i is the common label vector corresponding to the ith instance of the two modes, and n is the total instance number of the data set.
The invention is used for processing and dividing the data setTo represent multimodal data for the training set, where n s is the number of training samples.
3. The intra-modal similarity, inter-modal similarity, tag information, class attributes, hash codes and hash functions are fused to learn the same framework, so that an objective function is obtained, and hash codes with more discriminant are learned. The learning model of each module will be described in detail below:
3.1, modal similarity learning
Intra-modal similarity is divided into feature similarity, which is calculated by Euclidean similarity, and semantic similarity, which is measured by Jaccard similarity. The Euclidean distance is simple to calculate and reflects the distance between the two vectors, and the method adopts the Euclidean distance as the characteristic similarity measurement. First of all,And/>The Euclidean distance between the two is: /(I)Then/>And/>The similarity is: /(I)Wherein/>And/>The ith and jth samples, respectively, representing the tth modality, and t= {1,2} representation are illustrated in two modalities in the present invention.
Furthermore, we measure semantic similarity with Jaccard similarity as follows:
Wherein, Corresponding to the number of labels assigned to the ith instance in the nth modality. The label of an instance depends on its features, and the semantic similarity is positively correlated with the feature similarity of the respective instance. Therefore, we can combine the feature similarity with the semantic similarity between the data, and can get the following learning model:
Wherein, Is the overall similarity within the modality,/>Representing the feature similarity between two samples,/>Semantic similarity is measured by the Jaccard similarity method.
3.2 Inter-modality similarity learning
Inter-modality similarity refers to semantic similarity among instances among different modalities, and the semantic similarity is measured through label semantic information;
specifically, in the present invention, the inter-modality similarity is calculated by a class label matrix. Is provided with For the corresponding tag matrix, where L ij =1 means that X i* belongs to class j, otherwise 0. Furthermore, the inter-modality similarity matrix, expressed asCan be constructed from a tag matrix if/>X i* and X j* are described as similar. Otherwise, X i* and X j* are dissimilar. Where c represents the number of categories.
3.3 Hash function learning
The hash function in the present invention is learned by minimizing the following least squares regression problem:
Where β is a non-negative parameter, B 1 and B 2 correspond to the hash codes of the two modalities of image and text, respectively, and W 1 and W 2 correspond to the projection matrices of the two modalities of image and text, respectively.
3.4, Similarity preserving learning
The similarity preserving learning model provided by the invention comprehensively considers intra-mode similarity and inter-mode similarity by combining an optimization model based on Kernel Supervised Hash (KSH). The model expression is as follows:
Wherein S 11 and S 22 are intra-mode similarity matrices of two modes of the image and the text, respectively, and S 12 is an inter-mode similarity matrix of the two modes of the image and the text.
3.5 Class Attribute and Label embedding
The method embeds the tag information into the hash code, which is helpful for generating the optimized binary code by fully utilizing the tag information, and has stronger robustness when processing data in a large scale. Thus, an optimized hash code is obtained by optimizing the following model:
Where α is a non-negative parameter, C 1 and C 2 represent projection matrices that project the two modality hash codes of the image and text, respectively, into the tag.
In addition, class attribute information is added into the proposed model, so that not only can the generation of a hash code with more discriminant be facilitated, but also the transfer of attribute knowledge from visible class to invisible class is mainly realized, and the zero sample cross-modal retrieval problem is solved. The class attribute information is embedded by embedding a class attribute matrix corresponding to each class name into the projection matrix of formula (5). Thus, the tag information and the class attribute information can be simultaneously embedded into the learning of the hash code. Updating the formula (5) to:
Wherein V 1 and V 2 represent a transformation projection matrix for projecting two modality hash codes of an image and a text into a tag and adding class attribute information respectively.
3.6 Objective function
The objective function of the invention is obtained by combining the steps as follows:
Wherein, Regularization terms representing the model, the purpose of which is to prevent overfitting; gamma is a parameter that controls the regularization term. X (1) and X (2) are feature matrices of two modes of an image and a text respectively; y is a label matrix; a is a class attribute matrix; s 11 and S 22 are intra-mode similarity matrices of two modes of an image and a text respectively, and S 12 is an inter-mode similarity matrix of the two modes of the image and the text; w 1、W2、V1、V2 is a projection matrix; alpha and beta are non-negative parameters.
4. Performing iterative updating of the objective function: and iteratively updating the objective function obtained in the last step until the objective function converges or the maximum iteration number is reached, so as to obtain a hash function and a hash code of the training set.
The function (7) is not optimal and it needs to be updated iteratively. Obviously, the overall objective function is a non-convex optimization problem. We therefore propose an efficient iterative algorithm to solve this problem.
Specifically, referring to fig. 2, the optimization procedure for equation (7) is as follows:
B 1 -step: the variable W 1,W2,V1,V2,B2 is fixed, so for B 1, equation (7) can be reduced to:
By setting the partial derivative of B 1 to zero, a closed-loop solution for B 1 can be derived. The following are provided:
B 2 -step: similar to the update procedure of B 1, a closed solution of B 2 is obtained. The following are provided:
v 1 -step: the variable W 1,W2,V2,B1,B2 is fixed, so for V 1, equation (7) can be reduced to:
By setting the partial derivative of V 1 to zero, we can derive the following formula:
we define B11=AAT,/>However, equation (12) can be rewritten as:
A11V1+V1B11=C11 (13)
equation (13) is a Sylvester equation that can be solved using the Sylvester function in MATLAB.
V 2 -step: similarly, with respect to V 2, we have:
A22V2+V2B22=C22 (14)
Wherein, B22=AAT,/>
W 1 -step: similarly, with respect to W 1, we have:
A33W1+W1B33=C33 (15)
Wherein, B33=B1 TB1,/>
W 2 -step: similarly, with respect to W 2, we have:
A44W2+W2B44=C44 (16)
Wherein, B44=B2 TB2,/>
And (3) optimizing the formula (7) through the steps until the function converges or the maximum iteration number is reached, and stopping iteration.
5. Inquiring, and performing zero sample cross-modal retrieval: firstly, obtaining a hash code corresponding to a search set, inputting a query sample, and obtaining the hash code of the query sample according to the hash function obtained in the last step. And substituting the hash codes of the query samples into the retrieval set to query. The specific implementation steps are as follows:
the feature matrix corresponding to the query sample of a given image and text is And/>The projection matrices W 1 and W 2 obtained in the previous step are combined. By the formula/>And/>And obtaining the hash code corresponding to the query sample. In this embodiment, we do two main search tasks: image query text and text query images.
Because the query task of the invention is carried out in a binary space, the result of the query is obtained by calculating the Hamming distance between the query sample and each sample in the search set. The sample corresponding to the minimum Hamming distance in the search set is the query result obtained by the user.
In order to illustrate the effects of the present invention, the following describes the technical solution of the present invention through specific embodiments:
1. Simulation conditions
The invention uses Matlab software to perform experimental simulation. Experiments were conducted on a cross-modality dataset Wiki (containing both image and text modalities), the conducted experiments including two query tasks: (1) Text query image (Text 2 Img), (2) image query Text (Img 2 Text). The parameters in the experiment were set to α=1e-2, β=1e5, γ=1e-4.
2. Emulation content
The method is compared with the existing non-zero sample cross-mode hash retrieval method and the zero sample cross-mode hash retrieval method, and the non-zero sample cross-mode hash retrieval method is adopted as the comparison method: (1) Collaborative Matrix Factorization Hash (CMFH), (2) Joint and Individual Matrix Factorization Hash (JIMFH), (3) Discrete Robust Matrix Factorization Hash (DRMFH), (4) Asymmetric Supervised Consistent and Specific Hash (ASCSH), (5) tagged consistent matrix factorization hash (LCMFH); the zero sample single-mode hash retrieval method comprises the following steps: (1) Zero sample hashing (TSK) based on supervised knowledge transfer, (2) attribute hashing Algorithm (AH) for zero sample image retrieval; the zero sample cross-modal hash retrieval method comprises the following steps: (1) Cross-modal property hashing (CMAH), (2) orthogonal hashing algorithm (CHOP) for zero sample cross-modal retrieval. And for the zero sample single-mode hash retrieval method, the hash codes of the two modes of the image and the text are obtained through a single-mode model respectively, and then the following query task is carried out.
3. Simulation results
The comparison method and the experimental result of the method under the data set Wiki are respectively given in the simulation experiment. In order to meet the zero sample cross-modal retrieval scenario, 20% of classes in the random data set Wiki are selected as invisible classes. The total of 8 classes are contained in the data set Wiki, and according to the experimental setting, the embodiment randomly selects two classes as invisible classes from the classes, and the processing mode of the rest data sets is the same as that of the invention.
In this simulation, a widely used index was used to measure the performance of the SASH method and other comparative methods proposed by the present invention. I.e. the average value of the average accuracy (mAP). Given a query and a list of search results, the average Accuracy (AP) is defined as:
Where N is the number of related instances in the search set, P (r) is defined as the accuracy of the r-th search instance, and if the r-th search instance is the true neighbor of the query, δ (r) =1; otherwise δ (r) =0. All queried APs are then averaged to obtain the mAP. The evaluation rule is that the larger the mAP value is, the better the performance is.
The hash code lengths from the simulation experiments are 8 bits, 12 bits, 16 bits and 32 bits, and the corresponding mAP values of the SAZH method and other comparison methods proposed by the present invention are shown in Table 1.
Table 1 mAP values on Text query image (Text 2 Img) task for all methods on Wiki dataset
Table 2 mAP values on image query (Img 2 Text) task for all methods on Wiki dataset
As can be seen from tables 1 and 2, the mAP values in two query tasks of the SAZH method proposed by the present invention in the zero-sample cross-modal retrieval scenario of the Wiki dataset are higher than those of other comparative methods. Further proves the superiority of SAZH method in zero sample cross-modal search.
The foregoing examples merely illustrate specific embodiments of the invention, which are described in greater detail and are not to be construed as limiting the scope of the invention. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention.

Claims (5)

1. A cross-modal retrieval method based on similarity zero sample hash is characterized in that: the method comprises the following specific steps:
step1, acquiring a cross-modal dataset, and extracting features of the cross-modal dataset and class attributes;
step2, processing a cross-modal dataset: processing the existing cross-modal data set into a cross-modal zero-sample data set;
Step3, learning an objective function: the intra-modal similarity, inter-modal similarity, semantic tags, class attributes, hash codes and hash functions are fused to learn the same framework, so that an objective function is obtained, and hash codes with more discriminant are learned;
Step4, performing iterative updating of the objective function: the variable matrix in the objective function obtained by Step3 is updated iteratively until the objective function converges or reaches the maximum iteration number, so that a hash function and a hash code of a training set are obtained;
Step5: zero sample cross-modal retrieval: firstly, obtaining hash codes corresponding to a search set, then obtaining hash codes of a query set through a hash function obtained in Step4, putting the hash codes into the search set for query, obtaining a query result through calculating Hamming distances between the query set and each sample in the search set, and obtaining a final query result by the smallest Hamming distance;
In Step1, extracting class attributes, wherein a Glove method is adopted to extract a corresponding word vector for each class name to form a class attribute matrix;
The objective function obtained in Step3 comprises two parts of hash code learning and a hash function, wherein the hash code learning refers to learning the hash code by combining intra-mode similarity, inter-mode similarity, semantic tags and class attributes; learning of the hash function means that the hash function is learned by minimizing the least square regression problem, and the hash code learning and the hash function learning are put into the same model for learning, so that semantic relation between the hash code and the hash function is enhanced, and high-precision zero sample cross-modal retrieval is realized;
the objective function in Step3 is:
Wherein, Regularization terms representing the model to prevent overfitting; gamma is a parameter for controlling regularization term, and X (1) and X (2) are feature matrices of two modes of image and text respectively; y is a label matrix; a is a class attribute matrix; s 11 and S 22 are intra-mode similarity matrices of two modes of an image and a text respectively, and S 12 is an inter-mode similarity matrix of the two modes of the image and the text; w 1、W2、V1、V2 is a projection matrix; alpha and beta are non-negative parameters, n s is the number of training samples, and B 1 and B 2 correspond to hash codes of two modes of an image and a text respectively.
2. The cross-modal retrieval method based on similarity zero sample hashing according to claim 1, wherein: the specific method of Step2 is as follows: the original data set is divided into a training set and a query set firstly, then 20% of the classes in all classes of the original data set are randomly selected as invisible classes, and the rest classes are visible classes; for a zero sample cross-modal retrieval scene, taking a sample pair corresponding to an invisible class in an original query set as a new query set; taking a sample pair corresponding to the visible class in the original training set as a new training set; the search set consists of the original training set.
3. The cross-modal retrieval method based on similarity zero sample hashing according to claim 1, wherein: the intra-modal similarity in Step3 is divided into feature similarity and semantic similarity, wherein the feature similarity is calculated through Euclidean similarity, and the semantic similarity is measured through Jaccard similarity.
4. The cross-modal retrieval method based on similarity zero sample hashing according to claim 1, wherein: the inter-modality similarity in Step3 refers to the semantic similarity between each instance of different modalities, and the semantic similarity is measured through label semantic information.
5. The cross-modal retrieval method based on similarity zero sample hashing according to claim 1, wherein: the iterative updating of the objective function in Step4 is performed by taking the objective function obtained in Step4 as an original function, and obviously the objective function is not optimal, and the objective function needs to be optimized, because the objective function is a non-convex problem, when other variables are fixed and a matrix variable is updated, the function is a convex problem, so that the updating of the objective function is facilitated; and updating the matrix variables by adopting the alternate iterative algorithm until the objective function converges or the maximum iterative times are reached, and finally obtaining the optimal hash code and the hash function.
CN202210696434.4A 2022-06-20 2022-06-20 Cross-modal retrieval method based on similarity zero sample hash Active CN114943017B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210696434.4A CN114943017B (en) 2022-06-20 2022-06-20 Cross-modal retrieval method based on similarity zero sample hash

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210696434.4A CN114943017B (en) 2022-06-20 2022-06-20 Cross-modal retrieval method based on similarity zero sample hash

Publications (2)

Publication Number Publication Date
CN114943017A CN114943017A (en) 2022-08-26
CN114943017B true CN114943017B (en) 2024-06-18

Family

ID=82911208

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210696434.4A Active CN114943017B (en) 2022-06-20 2022-06-20 Cross-modal retrieval method based on similarity zero sample hash

Country Status (1)

Country Link
CN (1) CN114943017B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116244484B (en) * 2023-05-11 2023-08-08 山东大学 Federal cross-modal retrieval method and system for unbalanced data
CN116244483B (en) * 2023-05-12 2023-07-28 山东建筑大学 Large-scale zero sample data retrieval method and system based on data synthesis
CN117992805B (en) * 2024-04-07 2024-07-30 武汉商学院 Zero sample cross-modal retrieval method and system based on tensor product graph fusion diffusion

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111460077B (en) * 2019-01-22 2021-03-26 大连理工大学 Cross-modal Hash retrieval method based on class semantic guidance
CN110059198B (en) * 2019-04-08 2021-04-13 浙江大学 Discrete hash retrieval method of cross-modal data based on similarity maintenance
CN112364195B (en) * 2020-10-22 2022-09-30 天津大学 Zero sample image retrieval method based on attribute-guided countermeasure hash network
CN113342922A (en) * 2021-06-17 2021-09-03 北京邮电大学 Cross-modal retrieval method based on fine-grained self-supervision of labels

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Cross modal zero shot hashing;Xuanwu Liu等;《2019 IEEE International Conference on Data Mining (ICDM)》;20200130;1-9 *
跨模态哈希学习算法及其应用研究;庾骏;《中国博士学位论文全文数据库 信息科技辑》;20210415;I140-12 *

Also Published As

Publication number Publication date
CN114943017A (en) 2022-08-26

Similar Documents

Publication Publication Date Title
WO2023000574A1 (en) Model training method, apparatus and device, and readable storage medium
WO2022068196A1 (en) Cross-modal data processing method and device, storage medium, and electronic device
CN114943017B (en) Cross-modal retrieval method based on similarity zero sample hash
Xu et al. Learning low-rank label correlations for multi-label classification with missing labels
CN110674323B (en) Unsupervised cross-modal Hash retrieval method and system based on virtual label regression
Fang et al. Active learning for crowdsourcing using knowledge transfer
CN112364174A (en) Patient medical record similarity evaluation method and system based on knowledge graph
CN112199532B (en) Zero sample image retrieval method and device based on Hash coding and graph attention machine mechanism
CN109376796A (en) Image classification method based on active semi-supervised learning
CN107346327A (en) The zero sample Hash picture retrieval method based on supervision transfer
CN113535947B (en) Multi-label classification method and device for incomplete data with missing labels
CN109271486A (en) A kind of similitude reservation cross-module state Hash search method
CN111080551B (en) Multi-label image complement method based on depth convolution feature and semantic neighbor
CN111026887B (en) Cross-media retrieval method and system
Amiri et al. Automatic image annotation using semi-supervised generative modeling
CN116932722A (en) Cross-modal data fusion-based medical visual question-answering method and system
CN111368176B (en) Cross-modal hash retrieval method and system based on supervision semantic coupling consistency
CN109857892B (en) Semi-supervised cross-modal Hash retrieval method based on class label transfer
Pei et al. Efficient semantic image segmentation with multi-class ranking prior
CN108427730B (en) Social label recommendation method based on random walk and conditional random field
CN111506832B (en) Heterogeneous object completion method based on block matrix completion
CN111259176B (en) Cross-modal Hash retrieval method based on matrix decomposition and integrated with supervision information
CN109543114A (en) Heterogeneous Information network linking prediction technique, readable storage medium storing program for executing and terminal
CN111160398B (en) Missing label multi-label classification method based on example level and label level association
Li et al. More correlations better performance: Fully associative networks for multi-label image classification

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant